Understand Time Series Forecast Uncertainty Using Confidence Intervals with Python

Time series forecast models can both make predictions and provide a confidence interval for those predictions.

Confidence intervals provide an upper and lower expectation for the real observation. These can be useful for assessing the range of real possible outcomes for a prediction and for better understanding the skill of the model.

In this tutorial, you will discover how to calculate and interpret confidence intervals for time series forecasts with Python.

Specifically, you will learn:

  • How to make a forecast with an ARIMA model and gather forecast diagnostic information.
  • How to interpret a confidence interval for a forecast and configure different intervals.
  • How to plot the confidence interval in the context of recent observations.

Let’s dive in.

ARIMA Forecast

The ARIMA implementation in the statsmodels Python library can be used to fit an ARIMA model.

Fitting the model returns an ARIMAResults object. This object provides the forecast() function, which can be used to make predictions about future time steps and defaults to predicting the value at the next time step after the end of the training data.

Assuming we are predicting just the next time step, the forecast() method returns three values:

  • Forecast. The forecasted value in the units of the training time series.
  • Standard error. The standard error of the forecast.
  • Confidence interval. The 95% confidence interval for the forecast.

In this tutorial, we will better understand the confidence interval provided with an ARIMA forecast.

Before we dive in, let’s first look at the Daily Female Births dataset that we will use as the context for this tutorial.

Daily Female Births Dataset

This dataset describes the number of daily female births in California in 1959.

The units are a count and there are 365 observations. The source of the dataset is credited to Newton (1988).

You can learn more and download the dataset from the Data Market website.

Download the dataset and save it in your current working directory with the filename “daily-total-female-births.csv”.

The example below loads and graphs the dataset.
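A minimal sketch of loading and plotting the file with pandas and matplotlib. It assumes the CSV has a header row, the date in the first column, and the count in the second; load_births is a hypothetical helper name, and the plot is only drawn if the file is present in the working directory.

```python
from pathlib import Path

import pandas as pd
from matplotlib import pyplot


def load_births(path):
    """Load the CSV as a Series indexed by date, with the counts as values."""
    # Assumption: header row, date in the first column, count in the second.
    frame = pd.read_csv(path, header=0, index_col=0, parse_dates=True)
    return frame.iloc[:, 0]


path = Path("daily-total-female-births.csv")
if path.exists():
    series = load_births(path)
    print(series.head())
    # Draw the series as a line plot.
    series.plot()
    pyplot.show()
else:
    print("Dataset not found, download it first:", path)
```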

Running the example loads the dataset and graphs it as a line plot.

Figure: Line plot of the Daily Female Births dataset.

Forecast Confidence Interval

In this section, we will train an ARIMA model, use it to make a prediction, and inspect the confidence interval.

First, we will split the dataset into training and test sets. Almost all observations will be used for training, and we will hold back the single last observation as a test set for which we will make a prediction.

An ARIMA(5,1,1) model is trained. This is not the optimal model for this problem, just a good model for demonstration purposes.

The trained model is then used to make a prediction by calling the forecast() function. The results of the forecast are then printed.

The complete example is listed below.

Running the example prints the expected value from the test set followed by the predicted value, standard error, and confidence interval for the forecast.

Interpreting the Confidence Interval

The forecast() function allows the confidence interval to be specified.

The alpha argument on the forecast() function specifies the significance level, where the confidence level is 1 - alpha. It is set by default to alpha=0.05, which gives a 95% confidence interval. This is a sensible and widely used confidence level.

An alpha of 0.05 means that the ARIMA model will estimate the upper and lower values around the forecast where there is only a 5% chance that the real value will not be in that range.

Put another way, the 95% confidence interval suggests that there is a high likelihood that the real observation will be within the range.

In the above example, the forecast was 45.878. The 95% confidence interval suggested that the real observation was highly likely to fall within the range of values between 32.167 and 59.590.

The real observation was 50.0 and was well within this range.
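As a quick arithmetic check on these numbers, the sketch below assumes the interval is a symmetric Gaussian interval around the forecast, and backs out the standard error it implies. The forecast, bounds, and observation are the values quoted above; implied_stderr is derived here, not reported by the model.

```python
from statistics import NormalDist

# Values quoted in the text above.
forecast = 45.878
lower, upper = 32.167, 59.590
observed = 50.0

# z-score for a two-sided 95% interval (alpha = 0.05).
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)

# Under the Gaussian assumption, the half-width equals z * standard error.
half_width = (upper - lower) / 2
implied_stderr = half_width / z
midpoint = (upper + lower) / 2

print("z-score: %.3f" % z)
print("Interval midpoint: %.3f" % midpoint)
print("Implied standard error: %.3f" % implied_stderr)
print("Observation inside interval:", lower <= observed <= upper)
```

The midpoint agrees with the forecast to within rounding, which is what a symmetric interval requires.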

We can tighten the range of likely values a few ways:

  • We can ask for a range that is narrower but increases the statistical likelihood of a real observation falling outside of the range.
  • We can develop a model that has more predictive power and in turn makes more accurate predictions.

Further, the confidence interval is also limited by the assumptions made by the model, such as the assumption that the errors made by the model follow a Gaussian distribution with a zero mean (e.g. white noise).

Extending the example above, we can report our forecast with a few different commonly used confidence intervals of 80%, 90%, 95% and 99%.

The complete example is listed below.

Running the example prints the forecasts and confidence intervals for each alpha value.

We can see that we get the same forecast value each time and an interval that expands as our desire for a ‘safer’ interval increases. We can see that an 80% interval captures our actual value just fine in this specific case.

Plotting the Confidence Interval

The confidence interval can be plotted directly.

The ARIMAResults object provides the plot_predict() function that can be used to make a forecast and plot the results showing recent observations, the forecast, and confidence interval.

As with the forecast() function, the confidence interval can be configured by specifying the alpha argument. The default is 0.05 (95% confidence), which is a sensible default.

The example below shows the same forecast from above plotted using this function.

The plot_predict() function will plot the observed y values if the prediction interval covers the training data.

In this case, we predict the previous 10 days and the next 1 day. This is useful to see the prediction carry on from in-sample to out-of-sample time indexes (blue). This is contrasted with the actual observations from the last 10 days (green).

Finally, we can see the confidence interval as a gray cone around the predicted value. This is useful to get a spatial feeling for the range of possible values that an observation in the next time step may take.

Figure: Plot of the forecast and its confidence interval.

Summary

In this tutorial, you discovered how to calculate and interpret the confidence interval for a time series forecast with Python.

Specifically, you learned:

  • How to report forecast diagnostic statistics when making a point forecast.
  • How to interpret and configure the confidence interval for a time series forecast.
  • How to plot a forecast and confidence interval in the context of recent observations.

Do you have any questions about forecast confidence intervals, or about this tutorial?
Ask your questions in the comments below and I will do my best to answer.

18 Responses to Understand Time Series Forecast Uncertainty Using Confidence Intervals with Python

  1. Luis June 1, 2017 at 5:29 pm #

    Hi, Jason.

    your contents are great! Only one thing:

    are you talking about “confidence intervals” or about “prediction intervals”? Some books mix the terms, but Time Series experts like Hyndman do an interesting differentiation:

    https://robjhyndman.com/hyndsight/intervals/

    Thanks

  2. Anupama Shroff June 16, 2017 at 5:07 am #

    Hi Jason.
    What if we get a negative lower Confidence Interval while forecasting prices?? Since prices cannot be negative, is there a way to correct for this?

    Thanks!

    • Jason Brownlee June 16, 2017 at 8:07 am #

      Great question. I’m not sure of good theory for this off the cuff. In practice, you can impose a hard limit on the interval to fit in your domain, for example:
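One way to impose such a limit is sketched below. The helper name clamp_interval and the example bounds are hypothetical illustrations, not part of the original reply.

```python
def clamp_interval(lower, upper, minimum=0.0):
    """Clip an interval so that neither bound falls below a domain minimum."""
    return max(lower, minimum), max(upper, minimum)


# Hypothetical price interval whose lower bound dipped below zero.
lower, upper = clamp_interval(-3.2, 18.5)
print("Interval: %.1f to %.1f" % (lower, upper))
```

Note that clipping changes only the reported bounds. The model behind them still assumes Gaussian errors.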

  3. TD September 27, 2017 at 10:30 pm #

    Do you have additional code somewhere showing how to make predictions and plot the confidence interval for the entire time series using this model?
    Thanks!
    T

    • Jason Brownlee September 28, 2017 at 5:26 am #

      The post above does make a prediction and show the confidence interval.

      You can use it as a template for your own dataset.

  4. neha January 5, 2018 at 7:06 am #

    how can i get the confidence intervals for SARIMAX where i performed the grid search to get optimum parameters also keeping into consideration that i have a exogenous variable like Holiday effect?

    • Jason Brownlee January 5, 2018 at 11:36 am #

      Perhaps re-fit and analyze the best performing model as a standalone exercise after you have chosen its configuration.

      I expect the approach in the above post would work for SARIMAX.

  5. Alireza Manashty January 10, 2018 at 1:35 am #

    Great article. Thank you for all your posts. I learn a lot from them.

    I understand that time series forecasting is for when we are forecasting the same variables in the future.
    X1, X2, X3->X4, X5, X6
    What if the nature of forecasting is not the same variables.
    X1, X2, X3->Y1, Y2, Y3.

    If we forecast one step, we will get something like:
    X1, X2, X3, Y1-?>Y2, Y3
    and we can not use the model further in this case for further predictions.

    Do you know what is the technique used for such forecasting called?

    Should we use a direct mapping between X->Y using LSTM or there is some other techniques available? I am looking for the definition of this problem and the techniques already developed. Thanks!

  6. Mat Jod April 25, 2018 at 11:30 pm #

    Hi Jason,
    Thank you very much for the article!
    I have one question regarding to log transformation vs prediction intervals.
    In case that we first log transformed our data to improve prediction power, when we’re getting back with our values to regular space we’re putting the log transformation on prediction intervals as well. But putting exp function on logged data for little differences in log space makes huge differences in regular space and making upper bounds gigantic in many cases.
    So here is my question- are there any common methodologies how to avoid this side effect?
    All best,
    Mateusz

    • Jason Brownlee April 26, 2018 at 6:33 am #

      The prediction and interval assume a Gaussian distribution that will balloon when made exponential, as you comment.

      Ouch. Good question.

      I don’t know any good strategies off hand, sorry. Let me know if you discover anything.

  7. franky October 29, 2018 at 5:20 pm #

    Dear Jason: Thanks for the article. This is a great one. One question about the plot_predict call. As I read the code in “Plotting the Confidence Interval,” the test data is actually not used for plotting. Do you think we should remove the test part from the code?

  8. Sanchit November 10, 2018 at 4:12 am #

    Hi Jason,

    Any idea, how to compute the confidence intervals using the LSTM model for timeseries forecasting ?

    Thanks
    Sanchit

  9. Soph L April 17, 2019 at 3:53 pm #

    Thanks for the great article Jason.

    I am interested in calculating the sum of all the predicted values from multiple time steps, which are output of an ARIMA or SARIMA model. If I can estimate the standard error for each predicted value using the method you described, do you know how I can calculate the standard error for the sum of these predicted values?

    Your help will be much appreciated.

    • Jason Brownlee April 18, 2019 at 8:20 am #

      This will require custom code, should be fun for you.

      What problem are you having exactly?
