Prediction Intervals for Machine Learning

A prediction from a machine learning perspective is a single point that hides the uncertainty of that prediction.

Prediction intervals provide a way to quantify and communicate the uncertainty in a prediction. They are different from confidence intervals, which instead seek to quantify the uncertainty in a population parameter such as a mean or standard deviation. Prediction intervals describe the uncertainty for a single specific outcome.

In this tutorial, you will discover the prediction interval and how to calculate it for a simple linear regression model.

After completing this tutorial, you will know:

  • That a prediction interval quantifies the uncertainty of a single point prediction.
  • That prediction intervals can be estimated analytically for simple models, but are more challenging for nonlinear machine learning models.
  • How to calculate the prediction interval for a simple linear regression model.

Let’s get started.

Prediction Intervals for Machine Learning
Photo by Jim Bendon, some rights reserved.

Tutorial Overview

This tutorial is divided into 5 parts; they are:

  1. Why Calculate a Prediction Interval?
  2. What Is a Prediction Interval?
  3. How to Calculate a Prediction Interval
  4. Prediction Interval for Linear Regression
  5. Worked Example

Why Calculate a Prediction Interval?

In predictive modeling, a prediction or a forecast is a single outcome value given some input variables.

For example, we can write yhat = f(X), where yhat is the estimated outcome or prediction made by the trained model f for the given input data X.

This is a point prediction.

By definition, it is an estimate or an approximation and contains some uncertainty.

The uncertainty comes from the errors in the model itself and noise in the input data. The model is an approximation of the relationship between the input variables and the output variables.

Given the process used to choose and tune the model, it will be the best approximation made given available information, but it will still make errors. Data from the domain will naturally obscure the underlying and unknown relationship between the input and output variables. This will make it a challenge to fit the model, and will also make it a challenge for a fit model to make predictions.

Given these two main sources of error, a point prediction from a predictive model is insufficient for describing the true uncertainty of the prediction.

What Is a Prediction Interval?

A prediction interval is a quantification of the uncertainty on a prediction.

It provides probabilistic upper and lower bounds on the estimate of an outcome variable.

A prediction interval for a single future observation is an interval that will, with a specified degree of confidence, contain a future randomly selected observation from a distribution.

— Page 27, Statistical Intervals: A Guide for Practitioners and Researchers, 2017.

Prediction intervals are most commonly used when making predictions or forecasts with a regression model, where a quantity is being predicted.

An example of the presentation of a prediction interval is as follows:

Given a prediction of ‘y’ given ‘x’, there is a 95% likelihood that the range ‘a’ to ‘b’ covers the true outcome.

The prediction interval surrounds the prediction made by the model and hopefully covers the range of the true outcome.

The diagram below helps to visually understand the relationship between the prediction, prediction interval, and the actual outcome.

Relationship between prediction, actual value and prediction interval.
Taken from “Machine learning approaches for estimation of prediction interval for the model output”, 2006.

A prediction interval is different from a confidence interval.

A confidence interval quantifies the uncertainty on an estimated population parameter, such as the mean or standard deviation, whereas a prediction interval quantifies the uncertainty on a single observation estimated from the population.

In predictive modeling, a confidence interval can be used to quantify the uncertainty of the estimated skill of a model, whereas a prediction interval can be used to quantify the uncertainty of a single forecast.

A prediction interval is often larger than the confidence interval as it must take the confidence interval and the variance in the output variable being predicted into account.

Prediction intervals will always be wider than confidence intervals because they account for the uncertainty associated with e [error], the irreducible error.

— Page 103, An Introduction to Statistical Learning: with Applications in R, 2013.
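
To see the difference in width, the sketch below compares a confidence interval for a mean with a prediction interval for a single new observation drawn from the same Gaussian population. The sample parameters, the random seed, and the use of the Gaussian critical value 1.96 in place of a Student's t value are assumptions made for illustration only.

```python
# Compare a confidence interval for the mean with a prediction interval
# for one new observation (illustrative sketch, assumed Gaussian data).
import numpy as np

rng = np.random.default_rng(1)  # assumed seed
sample = rng.normal(loc=50.0, scale=5.0, size=100)  # assumed population

n = sample.size
mean = sample.mean()
s = sample.std(ddof=1)  # sample standard deviation
z = 1.96                # Gaussian critical value for a 95% interval

# Confidence interval half-width: uncertainty in the estimated mean only.
ci_half = z * s / np.sqrt(n)
# Prediction interval half-width: uncertainty in the mean plus the
# spread of an individual observation.
pi_half = z * s * np.sqrt(1.0 + 1.0 / n)

print('mean=%.3f' % mean)
print('95%% confidence interval: %.3f to %.3f' % (mean - ci_half, mean + ci_half))
print('95%% prediction interval: %.3f to %.3f' % (mean - pi_half, mean + pi_half))
```

The prediction interval is wider because it includes the variance of an individual observation, not just the uncertainty in the estimated mean.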

How to Calculate a Prediction Interval

A prediction interval is calculated as some combination of the estimated variance of the model and the variance of the outcome variable.

Prediction intervals are easy to describe, but difficult to calculate in practice.

In simple cases like linear regression, we can estimate the prediction interval directly.

In the cases of nonlinear regression algorithms, such as artificial neural networks, it is a lot more challenging and requires the choice and implementation of specialized techniques. General techniques such as the bootstrap resampling method can be used, but are computationally expensive to calculate.

The paper “A Comprehensive Review of Neural Network-based Prediction Intervals and New Advances” provides a reasonably recent study of prediction intervals for nonlinear models in the context of neural networks. The following list summarizes some methods that can be used for prediction uncertainty for nonlinear machine learning models:

  • The Delta Method, from the field of nonlinear regression.
  • The Bayesian Method, from Bayesian modeling and statistics.
  • The Mean-Variance Estimation Method, using estimated statistics.
  • The Bootstrap Method, using data resampling and developing an ensemble of models.
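
As a rough illustration of the bootstrap idea in the last item, the sketch below refits a model on resampled data many times and reads the interval off the percentiles of the resulting predictions. This is one simple percentile-style variant and is not claimed to be the specific procedure from the paper. The synthetic dataset, the straight-line model fit with np.polyfit, the number of resamples, and the input x_new are all assumptions for the example.

```python
# Percentile bootstrap prediction interval (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)  # assumed seed
# Assumed synthetic data: y depends linearly on x plus Gaussian noise.
x = 20 * rng.standard_normal(500) + 100
y = x + 10 * rng.standard_normal(500) + 50

x_new = 120.0   # hypothetical input to predict for
n_boot = 1000   # assumed number of bootstrap resamples
preds = np.empty(n_boot)

for b in range(n_boot):
    # Resample the training data with replacement.
    idx = rng.integers(0, x.size, size=x.size)
    xb, yb = x[idx], y[idx]
    # Fit a straight line; polyfit returns [slope, intercept] for deg=1.
    b1, b0 = np.polyfit(xb, yb, deg=1)
    # Add a randomly drawn residual so the spread reflects the noise in
    # the outcome as well as the uncertainty in the fitted model.
    resid = yb - (b0 + b1 * xb)
    preds[b] = b0 + b1 * x_new + rng.choice(resid)

lower, upper = np.percentile(preds, [2.5, 97.5])
print('95%% bootstrap prediction interval at x=%.1f: %.3f to %.3f' % (x_new, lower, upper))
```

The repeated refitting is what makes the approach computationally expensive for larger models such as neural networks.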

We can make the calculation of a prediction interval concrete with a worked example in the next section.

Prediction Interval for Linear Regression

A linear regression is a model that describes the linear combination of inputs to calculate the output variables.

For example, an estimated linear regression model may be written as:

    yhat = b0 + b1 * x

Where yhat is the prediction, b0 and b1 are coefficients of the model estimated from training data and x is the input variable.

We do not know the true values of the coefficients b0 and b1. We also do not know the true population parameters such as mean and standard deviation for x or y. All of these elements must be estimated, which introduces uncertainty into the use of the model in order to make predictions.

We can make some assumptions, such as that the distributions of x and y, and the prediction errors made by the model (called residuals), are Gaussian.

The prediction interval around yhat can be calculated as follows:

    yhat +/- z * sigma

Where yhat is the predicted value, z is the critical value from the Gaussian distribution (e.g. 1.96 for a 95% interval) and sigma is the standard deviation of the predicted distribution.

We do not know sigma in practice. We can calculate an unbiased estimate of the predicted standard deviation as follows (taken from “Machine learning approaches for estimation of prediction interval for the model output”):

    stdev = sqrt(1 / (n - 2) * sum(e(i)^2 for i = 1 to n))

Where stdev is an unbiased estimate of the standard deviation for the predicted distribution, n is the total number of predictions made, and e(i) is the difference between the ith prediction and actual value.
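
The two calculations above can be sketched as a pair of small helper functions. The function names residual_stdev and prediction_interval are hypothetical and chosen only for this illustration, as are the toy values used to exercise them.

```python
# Helper functions for the interval calculation described above
# (hypothetical names, illustrative sketch).
import numpy as np

def residual_stdev(y, yhat):
    """Estimate stdev as sqrt(sum(e(i)^2) / (n - 2))."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    n = y.size
    if n <= 2:
        raise ValueError('need more than two predictions')
    return np.sqrt(np.sum((y - yhat) ** 2) / (n - 2))

def prediction_interval(yhat_point, stdev, z=1.96):
    """Return (lower, upper) bounds as yhat +/- z * stdev."""
    half_width = z * stdev
    return yhat_point - half_width, yhat_point + half_width

# Toy values (assumed): every error is 0.5 in magnitude.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]
stdev = residual_stdev(y_true, y_pred)
lower, upper = prediction_interval(10.0, stdev)
print('stdev=%.4f, interval=%.4f to %.4f' % (stdev, lower, upper))
```

Note that the interval has the same width for every input, because a single stdev is estimated from all of the residuals.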

Worked Example

Let’s make the case of linear regression prediction intervals concrete with a worked example.

First, let’s define a simple two-variable dataset where the output variable (y) depends on the input variable (x) with some Gaussian noise.

The example below defines the dataset we will use for this example.
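
The listing is not reproduced in this text, so the following is a minimal sketch of what such a dataset could look like. The seed, the sample size of 1,000, and the means and standard deviations used for x and the noise are assumptions.

```python
# Generate two related variables with Gaussian noise (illustrative sketch).
from numpy import mean
from numpy import std
from numpy.random import randn
from numpy.random import seed

# Seed the random number generator so the run is repeatable.
seed(1)
# Prepare data: x is Gaussian, y is x plus Gaussian noise and an offset.
x = 20 * randn(1000) + 100
y = x + (10 * randn(1000) + 50)
# Summarize both variables.
print('x: mean=%.3f stdv=%.3f' % (mean(x), std(x)))
print('y: mean=%.3f stdv=%.3f' % (mean(y), std(y)))
```

A scatter plot of the two arrays can be drawn with Matplotlib's pyplot.scatter(x, y) followed by pyplot.show(); it is left out here so the sketch runs without a display.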

Running the example first prints the mean and standard deviations of the two variables.

A plot of the dataset is then created.

We can see the clear linear relationship between the variables with the spread of the points highlighting the noise or random error in the relationship.

Scatter Plot of Related Variables

Next, we can develop a simple linear regression that, given the input variable x, will predict the y variable. We can use the linregress() SciPy function to fit the model and return the b0 and b1 coefficients for the model.

We can use the coefficients to calculate the predicted y values, called yhat, for each of the input variables. The resulting points will form a line that represents the learned relationship.

The complete example is listed below.
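
As the listing is missing from this text, here is a minimal sketch of the fitting step. It assumes the same synthetic dataset as sketched earlier (seed, sample size, and distribution parameters are assumptions) and reads the coefficients from the result returned by scipy.stats.linregress().

```python
# Fit a simple linear regression with SciPy (illustrative sketch).
from numpy.random import randn
from numpy.random import seed
from scipy.stats import linregress

# Assumed synthetic dataset.
seed(1)
x = 20 * randn(1000) + 100
y = x + (10 * randn(1000) + 50)

# Fit the model; the result exposes the slope (b1) and intercept (b0).
result = linregress(x, y)
b1, b0 = result.slope, result.intercept
print('b0=%.3f, b1=%.3f' % (b0, b1))

# Predict y for every input to obtain the fitted line.
yhat = b0 + b1 * x
```

Plotting x against yhat as a line over a scatter plot of x and y shows the fitted relationship; the plotting calls are omitted to keep the sketch free of display requirements.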

Running the example fits the model and prints the coefficients.

The coefficients are then used with the inputs from the dataset to make a prediction. The resulting inputs and predicted y-values are plotted as a line on top of the scatter plot for the dataset.

We can clearly see that the model has learned the underlying relationship in the dataset.

Scatter Plot of Dataset with Line for Simple Linear Regression Model

We are now ready to make a prediction with our simple linear regression model and add a prediction interval.

We will fit the model as before. This time we will take one sample from the dataset to demonstrate the prediction interval. We will use the input to make a prediction, calculate the prediction interval for the prediction, and compare the prediction and interval to the known expected value.

First, let’s define the input, prediction, and expected values.

Next, we can estimate the standard deviation in the prediction direction.

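We can calculate this directly from the NumPy arrays: take the square root of the sum of squared differences between y and yhat, divided by the number of samples minus 2.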

Next, we can calculate the prediction interval for our chosen input as the critical value multiplied by the estimated standard deviation.

We will use a 95% interval, which corresponds to the Gaussian critical value of 1.96.

Once the interval is calculated, we can summarize the bounds on the prediction to the user.

We can tie all of this together. The complete example is listed below.
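
The listing is not included in this text, so the sketch below shows one way the steps could fit together. The synthetic dataset, the seed, and the choice of the first sample as the demonstration input are assumptions; the plot is omitted.

```python
# Linear regression prediction with a prediction interval (illustrative sketch).
from numpy import sqrt
from numpy import sum as arraysum
from numpy.random import randn
from numpy.random import seed
from scipy.stats import linregress

# Assumed synthetic dataset.
seed(1)
x = 20 * randn(1000) + 100
y = x + (10 * randn(1000) + 50)

# Fit the linear regression model and predict for all inputs.
result = linregress(x, y)
b1, b0 = result.slope, result.intercept
yhat = b0 + b1 * x

# Define the input, expected value, and prediction (first sample, assumed).
x_in = x[0]
y_out = y[0]
yhat_out = yhat[0]

# Estimate the standard deviation of the prediction errors.
sum_errs = arraysum((y - yhat) ** 2)
stdev = sqrt(sum_errs / (len(y) - 2))

# Calculate the 95% prediction interval around the prediction.
interval = 1.96 * stdev
lower, upper = yhat_out - interval, yhat_out + interval

print('Prediction Interval: %.3f' % interval)
print('95%% likelihood that the true value is between %.3f and %.3f' % (lower, upper))
print('True value: %.3f' % y_out)
```

Here interval is the half-width, so the full interval spans yhat_out minus interval to yhat_out plus interval.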

Running the example estimates the yhat standard deviation and then calculates the prediction interval.

Once calculated, the prediction interval is presented to the user for the given input variable. Because we contrived this example, we know the true outcome, which we also display. We can see that in this case, the 95% prediction interval does cover the true expected value.

A plot is also created showing the raw dataset as a scatter plot, the predictions for the dataset as a red line, and the prediction and prediction interval as a black dot and line respectively.

Scatter Plot of Dataset With Linear Model and Prediction Interval

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Summarize the difference between tolerance, confidence, and prediction intervals.
  • Develop a linear regression model for a standard machine learning dataset and calculate prediction intervals for a small test set.
  • Describe in detail how one nonlinear prediction interval method works.

If you explore any of these extensions, I’d love to know.

Summary

In this tutorial, you discovered the prediction interval and how to calculate it for a simple linear regression model.

Specifically, you learned:

  • That a prediction interval quantifies the uncertainty of a single point prediction.
  • That prediction intervals can be estimated analytically for simple models but are more challenging for nonlinear machine learning models.
  • How to calculate the prediction interval for a simple linear regression model.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

18 Responses to Prediction Intervals for Machine Learning

  1. Dorn May 31, 2018 at 3:56 am #

    Hi
    any advice on how to draw prediction intervals for time series in Keras?

  2. Vladislav Gladkikh June 8, 2018 at 11:20 pm #

    Inspired by your suggestions for extending your tutorial that I may wish to explore, I wrote some examples of calculating confidence and prediction intervals for linear and non-linear regression models using bootstrap: https://vladgladkikh.wordpress.com/2018/06/08/confidence-and-prediction-intervals-using-the-bootstrap/

  3. Anthony The Koala June 18, 2018 at 2:23 am #

    Dear Dr Jason,
    I have mentioned elsewhere on your blog the difference between pyplot.plot(x,y) and pyplot.scatter(x,y).

    This is another example of where to be careful when producing a scatter plot and a line plot of x and y as described above.

    Lesson be careful when using pyplot.plot(x,y) and pyplot.scatter(x,y) especially when the data is not in time order. I have described this elsewhere on this blog.

    Regards
    Anthony of Belfield

  4. Novin July 5, 2018 at 1:01 am #

    Thank you for the post.

    I ran the code and tried to draw the interval for every data point. Then I realized that the stdev for each data point is same and hence the interval is same for every one of them. There are a few points that I would like to ask you to please elaborate on:

    1) naturally we expect higher confidence and subsequently tighter prediction intervals for the regions where there is more density of the training example, so how could this be taken into account?

    2) probably the latter relates to the difference between methods for deriving the PI, i.e. frequentist vs Bayesian methods?

    3) It feels like the stdev used in the calculation of the interval is meant to be the standard error. was it?

    • Jason Brownlee July 5, 2018 at 7:55 am #

      I believe the calculation is correct, you can confirm this independently based on a description on wikipedia:
      https://en.wikipedia.org/wiki/Prediction_interval

      And in the “forecasting principles”:
      https://otexts.org/fpp2/prediction-intervals.html

      Although, there are variations on the calculation that you may wish to explore.

      Indeed, the standard deviation is supposed to capture a description of density, but it is crude. I would go for a nonparametric percentile-based approach myself in practice, as I rarely use linear regression on challenging problems.

      • Ricky Ho July 6, 2018 at 1:10 am #

        Hi Jason,

        Thanks for the great article. It illustrates a very important concept.
        Can this method be applied to all types of prediction models and not just for linear regression ? In other words, can we plug in any blackbox prediction model and this method of estimating prediction interval will still work ?

        However, my intuition is the confidence interval of output yhat should be dependent on the value of input x. Since the density of x in the training data set is different, at least if the blackbox model is KNN, it should be more confident in its estimation when the input x is in a high density region.

        I am thinking instead of calculating the stdev using the whole training data set e(i), should we use only a subset of e(j) where j is the neighborhood of the input x. In other words, we extract the neighborhood of input x from our testing data set and use their local stderr to estimate the confidence bound. Does it make more sense ?

        • Jason Brownlee July 6, 2018 at 6:43 am #

          Perhaps, if the distribution of the output variable is Gaussian.

  5. Jounlin August 23, 2018 at 5:31 am #

    Dear Julian,
    Thank you for the useful link you provided regarding comparing regression models. Please, I want to make a day ahead prediction, do you have a tutorial in that?

    Thank you so much for your efforts.
    KJ

  6. Simon October 4, 2018 at 10:43 pm #

    Hi!
    Thanks for the great tutorial.
    Do you know if there exists any open source implementations of the four methods you mentioned for constructing PIs for non-linear NN models?

    (The Delta Method, from the field of nonlinear regression.
    The Bayesian Method, from Bayesian modeling and statistics.
    The Mean-Variance Estimation Method, using estimated statistics.
    The Bootstrap Method, using data resampling and developing an ensemble of models.)

    Best regards

    • Jason Brownlee October 5, 2018 at 5:37 am #

      I have many examples of the bootstrap method on the blog that you can adapt.

  7. Sayon Bhattacharjee October 12, 2018 at 8:23 am #

    Hi,

    Thank you for the tutorial. In the beginning, you state that the prediction interval is the upper prediction limit – lower prediction limit. However, at the end you get a 95% prediction interval of 20 and then associate this with a predicted value ± 20, giving a total interval of 40. Does the calculated prediction interval give you the spread both ways or just one way?

    • Jason Brownlee October 12, 2018 at 11:17 am #

      Generally, the interval is centered around the point forecast (e.g. both ways).

      • Sayon Bhattacharjee October 13, 2018 at 5:13 am #

        so why was that not the case in your example? You received the interval only one way and had to multiply by two to get the total prediction interval.
