
Time Series Forecasting Performance Measures With Python

Time series prediction performance measures provide a summary of the skill and capability of the forecast model that made the predictions.

There are many different performance measures to choose from. It can be confusing to know which measure to use and how to interpret the results.

In this tutorial, you will discover performance measures for evaluating time series forecasts with Python.

Time series forecasting generally focuses on the prediction of real values, called regression problems. Therefore, the performance measures in this tutorial focus on methods for evaluating real-valued predictions.

After completing this tutorial, you will know:

  • Basic measures of forecast performance, including residual forecast error and forecast bias.
  • Time series forecast error calculations that have the same units as the expected outcomes, such as mean absolute error.
  • Widely used error calculations that punish large errors, such as mean squared error and root mean squared error.

Kick-start your project with my new book Time Series Forecasting With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

  • Jun/2019: Fixed typo in forecast bias (thanks Francisco).
Photo by Tom Hall, some rights reserved.

Forecast Error (or Residual Forecast Error)

The forecast error is calculated as the expected value minus the predicted value.
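In pseudocode, following the definition above:

forecast_error = expected_value - predicted_value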

This is called the residual error of the prediction.

The forecast error can be calculated for each prediction, providing a time series of forecast errors.

The example below demonstrates how the forecast error can be calculated for a series of 5 predictions compared to 5 expected values. The example was contrived for demonstration purposes.
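The listing below is a minimal sketch of such an example; the expected and predicted values are the contrived ones quoted in the comments on this post.

# calculate the residual forecast error for each prediction
expected = [0.0, 0.5, 0.0, 0.5, 0.0]
predictions = [0.2, 0.4, 0.1, 0.6, 0.2]
# error = expected - predicted, one value per forecast
forecast_errors = [expected[i] - predictions[i] for i in range(len(expected))]
print('Forecast Errors: %s' % forecast_errors)
# prints approximately: Forecast Errors: [-0.2, 0.1, -0.1, -0.1, -0.2]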

Running the example calculates the forecast error for each of the 5 predictions. The list of forecast errors is then printed.

The units of the forecast error are the same as the units of the prediction. A forecast error of zero indicates no error, or perfect skill for that forecast.


Mean Forecast Error (or Forecast Bias)

Mean forecast error is calculated as the average of the forecast error values.
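In pseudocode:

mean_forecast_error = mean(forecast_error)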

Forecast errors can be positive or negative, so positive and negative errors can cancel each other out when averaged. This means that an ideal mean forecast error would be zero.

A mean forecast error value other than zero suggests a tendency of the model to over forecast (negative error) or under forecast (positive error). As such, the mean forecast error is also called the forecast bias.

The forecast bias can be calculated directly as the mean of the forecast errors. The example below demonstrates how the mean of the forecast errors can be calculated manually.
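A minimal sketch of that manual calculation, reusing the same contrived values:

# calculate the mean forecast error (forecast bias)
expected = [0.0, 0.5, 0.0, 0.5, 0.0]
predictions = [0.2, 0.4, 0.1, 0.6, 0.2]
forecast_errors = [expected[i] - predictions[i] for i in range(len(expected))]
bias = sum(forecast_errors) / len(forecast_errors)
print('Bias: %f' % bias)
# prints approximately: Bias: -0.100000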

Running the example prints the mean forecast error, also known as the forecast bias.

In this case the result is negative, meaning that we have over forecast.

The units of the forecast bias are the same as the units of the predictions. A forecast bias of zero, or a very small number near zero, shows an unbiased model.

Mean Absolute Error

The mean absolute error, or MAE, is calculated as the average of the forecast error values, where all of the forecast error values are forced to be positive.

Forcing values to be positive is called making them absolute. This is signified by the absolute function abs() or shown mathematically as two pipe characters around the value: |value|.
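The calculation can be written in pseudocode as:

mean_absolute_error = mean( abs(forecast_error) )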

Where abs() makes values positive, forecast_error is one or a sequence of forecast errors, and mean() calculates the average value.

We can use the mean_absolute_error() function from the scikit-learn library to calculate the mean absolute error for a list of predictions. The example below demonstrates this function.
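A minimal sketch of this, assuming scikit-learn is installed and reusing the same contrived values:

# calculate the mean absolute error with scikit-learn
from sklearn.metrics import mean_absolute_error
expected = [0.0, 0.5, 0.0, 0.5, 0.0]
predictions = [0.2, 0.4, 0.1, 0.6, 0.2]
mae = mean_absolute_error(expected, predictions)
print('MAE: %f' % mae)
# prints approximately: MAE: 0.140000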

Running the example calculates and prints the mean absolute error for a list of 5 expected and predicted values.

These error values are in the original units of the predicted values. A mean absolute error of zero indicates no error.

Mean Squared Error

The mean squared error, or MSE, is calculated as the average of the squared forecast error values. Squaring the forecast error values forces them to be positive; it also has the effect of putting more weight on large errors.
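In pseudocode:

mean_squared_error = mean(forecast_error^2)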

Very large or outlier forecast errors are squared, which in turn drags the mean of the squared forecast errors upward, resulting in a larger mean squared error score. In effect, the score penalizes models that make large wrong forecasts.

We can use the mean_squared_error() function from scikit-learn to calculate the mean squared error for a list of predictions. The example below demonstrates this function.
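A minimal sketch of this, again with the same contrived values:

# calculate the mean squared error with scikit-learn
from sklearn.metrics import mean_squared_error
expected = [0.0, 0.5, 0.0, 0.5, 0.0]
predictions = [0.2, 0.4, 0.1, 0.6, 0.2]
mse = mean_squared_error(expected, predictions)
print('MSE: %f' % mse)
# prints approximately: MSE: 0.022000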

Running the example calculates and prints the mean squared error for a list of expected and predicted values.

The error values are in squared units of the predicted values. A mean squared error of zero indicates perfect skill, or no error.

Root Mean Squared Error

The mean squared error described above is in the squared units of the predictions.

It can be transformed back into the original units of the predictions by taking the square root of the mean squared error score. This is called the root mean squared error, or RMSE.
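In pseudocode:

rmse = sqrt(mean_squared_error)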

This can be calculated by using the sqrt() math function on the mean squared error calculated using the mean_squared_error() scikit-learn function.
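A minimal sketch of that calculation:

# calculate the root mean squared error
from math import sqrt
from sklearn.metrics import mean_squared_error
expected = [0.0, 0.5, 0.0, 0.5, 0.0]
predictions = [0.2, 0.4, 0.1, 0.6, 0.2]
mse = mean_squared_error(expected, predictions)
rmse = sqrt(mse)
print('RMSE: %f' % rmse)
# prints approximately: RMSE: 0.148324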

Running the example calculates the root mean squared error.

The RMSE values are in the same units as the predictions. As with the mean squared error, an RMSE of zero indicates no error.

Further Reading

Below are some references for further reading on time series forecast error measures.

Summary

In this tutorial, you discovered a suite of 5 standard time series performance measures in Python.

Specifically, you learned:

  • How to calculate forecast residual error and how to estimate the bias in a list of forecasts.
  • How to calculate mean absolute forecast error to describe error in the same units as the predictions.
  • How to calculate the widely used mean squared error and root mean squared error for forecasts.

Do you have any questions about time series forecast performance measures, or about this tutorial?
Ask your questions in the comments below and I will do my best to answer.


67 Responses to Time Series Forecasting Performance Measures With Python

  1. Peter Marelas February 1, 2017 at 2:24 pm #

    I’ve seen MAPE used a few times to evaluate our forecasting models. Do you see this used often and when would you use one over the other?

    • Jason Brownlee February 2, 2017 at 1:55 pm #

      Hi Peter, MAPE is a good metric and I do see it used.

      I prefer RMSE myself.

      • Divya December 8, 2020 at 1:29 am #

        I have an RMSE value of 9.69 from an ARIMA model. How do I reduce it?

        • Jason Brownlee December 8, 2020 at 7:44 am #

          Try alternate model configurations?
          Try alternate models?
          Try alternate data preparations?

  2. Ian February 3, 2017 at 3:21 am #

    First line of code of Forecast Error should be forecast_error = expected_value “-” predicted_value.
    I believe this is a typo.

  3. Jasem February 6, 2017 at 9:39 pm #

    Dr.Jason,

    Can you provide us a simple way to split the data with 10-fold cross-validation into train and test sets for a large CSV file? Then apply different algorithms to train a model, and after that test the model to check how accurate it is. We also want to see a ROC curve to combine different algorithms.

    My second question: does a ROC curve show the precision of a model? Can you show me a mathematical formula for the ROC curve?

  4. Devakar Kumar Verma August 8, 2017 at 6:44 pm #

    What should the range of values be for all the different measures of performance for an acceptable model?

    • Jason Brownlee August 9, 2017 at 6:25 am #

      Good question, it really depends on your problem and the units of your variable.

      • Devakar Kumar Verma August 9, 2017 at 2:14 pm #

        Suppose, variable values range from 0-100, then what will be range?

        • Jason Brownlee August 10, 2017 at 6:48 am #

          If you have accuracy scores between 0 and 100, maybe 60% is good because the problem is hard, maybe 98% is good because the problem is easy.

          I cannot answer this question generically, sorry.

          A good way to figure out if a model is skillful is to compare it to a lot of other models or against a solid base line model (e.g. relative measure of good).

  5. Irati October 10, 2017 at 8:43 pm #

    Hi Jason,

    And what about if we perform multivariate time series forecasting?
    Imagine we forecast 3 time series with the same model, how would you provide the results? per time series? the mean of the errors ?

    Thanks for your time 🙂

  6. Carlos May 19, 2018 at 1:45 am #

    Good evening, one question: if I want to get the max error, how could I do it?

  7. Kate August 15, 2018 at 11:33 pm #

    Hi, thanks for the post. If I understand correctly, the method mentioned here is useful for correcting predictions if the ground truths of the test examples are readily available and are included in the correction process. I was wondering if there are similar approaches for situations where there is a noticeable trend for residuals in your training/testing data, and I’d like to create a model utilizing these trends in an environment where ground truths for new examples are not available?

    • Jason Brownlee August 16, 2018 at 6:06 am #

      ARIMA and ETS models can handle the trend in your data.

  8. Parth Gadoya October 31, 2018 at 6:32 pm #

    Hi Sir,

    I am forecasting sales for each product on each retail store. I want the accuracy of more than 70% on 85% of store-product combination. So, I am calculating Absolute Percentage Error for each forecast. But I have lots of zeros and I am unable to evaluate the model completely.

    According to my internet search, I found that Mean Absolute Scaled Error is a perfect measure for sales forecasting. But I didn't find any concrete explanations on how to use and calculate it. As I am working with multiple stores and multiple products, I have multiple time series in the dataset. I have all the predictions but don't know how to evaluate them.

    Please give some details on how to do this and calculate MASE for multiple time series.

    Thank you very much in advance.

    • Jason Brownlee November 1, 2018 at 6:04 am #

      Sorry, I don’t have material on MASE.

      Perhaps search on scholar.google.com for examples?

      • Parth Gadoya November 2, 2018 at 5:49 pm #

        Thanks for the suggestion.

    • Carla December 7, 2018 at 1:58 am #

      Hi,
      in “Statistical and Machine Learning Forecasting Methods: Concerns and ways forward” by Spyros Makridakis, they used this code for sMAPE. Add this as a def and use it in the same way as you use MSE. I assume it should work.

  9. bobby November 7, 2018 at 9:03 am #

    Hi,

    Do you know any error metrics that punish longer lasting errors in time series more than large magnitude errors?

    thanks,
    bobby

  10. Daniël Muysken December 4, 2018 at 12:06 am #

    Hey, I was wondering if you know of an error measure that is not so sensitive to outliers? I have some high peaks in my time series that are difficult to predict and I want this error to not carry too much weight when evaluating my prediction.

  11. Atharva February 4, 2019 at 5:50 pm #

    Hey, can you tell me how I can know the accuracy of my model from the RMSE value?

  12. Abid Mehmood February 20, 2019 at 8:53 pm #

    How do we know which error (RMSE, MSE, MAE) to use in our time series predictions?

    • Jason Brownlee February 21, 2019 at 7:55 am #

      You can talk to project stakeholders and discover what they would like to know about the performance of a model on the problem – then choose a metric accordingly.

      If unsure, use RMSE as the units will be in the scale of the target variable and it’s easy to understand.

  13. Dav June 8, 2019 at 2:59 am #

    Hi

    Once again, great articles and sorry, I just asked you a question on another topic as well.

    Tracking Error = Standard deviation of difference between Actual and Predicted values

    I am thinking about using Tracking Error to measure Time Series Forecasting Performance. Any reason I shouldn’t use it?

    Thanks
    Dav

  14. Francisco June 28, 2019 at 5:59 pm #

    Hi Jason,

    I’m confused with the Forecast bias: “A mean forecast error value other than zero suggests a tendency of the model to over forecast (positive error) or under forecast (negative error)”

    actual – prediction > 0 if the prediction is below, and I'd understand that's under forecast, but in your example the bias is negative and the prediction is above:

    expected = [0.0, 0.5, 0.0, 0.5, 0.0]
    predictions = [0.2, 0.4, 0.1, 0.6, 0.2]

    Is there a mistake somewhere or maybe I’m missing or not understanding something?

    Thanks a lot

    • Jason Brownlee June 29, 2019 at 6:45 am #

      Yes, I have it the wrong way around, thanks.

      Negative is over forecast, positive is under forecast.

      Fixed.

  15. duderino July 20, 2019 at 5:52 am #

    I really enjoyed reading your post, thank you for this. one question if I may:

    let's say we are working with a dataset where you are forecasting population growth (number of people) and your dataset's most recent value shows roughly 37mil population.

    Assuming we do all of the forecasting and calculations correctly, and I (we) are currently sitting at

    Mean Absolute Error: 52,386
    Mean Squared Error: 3,650,276,091
    Root Mean Squared Error: 60,417
    (and just for fun) Mean Absolute Percentage Error: 0.038

    How does one interpret these numbers when working with a dataset of this scale? I’ve read that “closer to zero is best” but I feel like the size of my dataset means that 60,417 is actually a pretty good number, but I’m not sure.

    (not sure if this is enough data to go off of or not)

  16. Wimarsha May 9, 2020 at 2:37 am #

    are that matrix can be used for the ARIMA model and LSTM? If yes, Is it the same as your example describes?

    • Jason Brownlee May 9, 2020 at 6:19 am #

      Sorry, I don’t understand.

      Perhaps you can elaborate or rephrase the question? What do you mean by “matrix for ARIMA”?

  17. Narayanan May 19, 2020 at 5:03 am #

    Hi Jason,

    Thanks for your great article. Can you please help me to this scenario.

    My actual and predicted values contain many 0's. Which metric is more suitable to measure the forecast accuracy percentage? My end users are looking at accuracy in a percentage format.

    Actual -> 0,1,1,4,1,1,0

    Predicted-> 1,0,0,2,1,1,0

    • Jason Brownlee May 19, 2020 at 6:11 am #

      You’re welcome!

      Perhaps explore MAE and RMSE and even others and pick one that best captures the goals of your project.

  18. Joel September 6, 2020 at 9:15 pm #

    Is there any difference between squared loss and mean squared error? For more reference – Page 6 of this research paper https://arxiv.org/pdf/1511.05942.pdf

    • Jason Brownlee September 7, 2020 at 8:30 am #

      Same thing I would expect. I have not checked your paper, sorry.

  19. Saumil Shah September 10, 2020 at 12:24 am #

    Hello Jason, Great fan of your work.

    Suggesting a correction, Under MAE, 2nd Line Should it be “forecast error values” in place of “forecast values”.

  20. MED September 16, 2020 at 8:35 am #

    Hi Jason, thank you for this wonderful article/tutorial,
    I am trying to make a forecast by using 4 years of daily data which is about grocery sales.
    I prepared several models to forecast. One of the interesting results I came across is that my SARIMA model beats RandomForest and other tree models in terms of MAPE, but in the case of RMSE, random forests and other tree machine learning models are more desirable. I am unsure how to make a clear judgement about this issue. Do you have any idea why this is the case?

    • Jason Brownlee September 16, 2020 at 12:15 pm #

      Choose one metric for model selection, then choose a model that does well on that one metric.

  21. VLADIMIR KIM September 22, 2020 at 10:02 pm #

    Hello, Jason! Your books and articles are the only solution to my problem, but I also have a question: how can we measure the performance of a multi-step model of, let's say, 3 days? For instance, if the RMSE = [2, 4, 5], can we take the average RMSE of these three? And second, can we measure the coefficient of determination on time series data? Is it a valid metric?

    • Jason Brownlee September 23, 2020 at 6:39 am #

      Thanks.

      Yes, you can calculate the error for each forecasted lead time separately if you like.

      I think I have examples of this in the power forecasting tutorials:
      https://machinelearningmastery.com/?s=power+forecasting&post_type=post&submit=Search

      • VLADIMIR KIM September 23, 2020 at 9:49 am #

        Dear Jason, thank you very much for your response. Another question: can I calculate an average RMSE or MAE of these three? Is it a valid metric? And what about the coefficient of determination (R-squared)? Is it a valid metric for time series data?

        • Jason Brownlee September 23, 2020 at 1:43 pm #

          Sure, although I recommend selecting one metric to optimize for your project – because sometimes they will disagree.

          • VLADIMIR KIM September 23, 2020 at 2:05 pm #

            Thank you very much for your response and time. Have a nice day!

          • Jason Brownlee September 24, 2020 at 6:08 am #

            You’re welcome.

  22. yeshwanth June 16, 2021 at 4:57 am #

    Hey, I don't know if you are still replying, but how can I find the standardized accuracy using MAR and MARp, that is, the MAR of a large number of random guesses?

  23. Suvi June 20, 2021 at 9:18 am #

    Hello Jason,
    Thanks for putting this together. What are your thoughts on using the weighted RMSE metric?

    Regards,
    S

    • Jason Brownlee June 21, 2021 at 5:35 am #

      No strong opinions. I recommend carefully selecting a metric that best captures the goals of your project.

  24. Joseph Manoj August 2, 2021 at 4:36 pm #

    Hi Jason,
    I forecast the next 15/30 days of session count. Is there any technique to find the accuracy of the forecasted values?
    FYI: I do not have actual values to compare.

  25. Kay98 December 26, 2021 at 12:37 am #

    Hi Jason.
    I'm working on a model where it is better to predict less than more and it is important that big errors are penalized. The problem is that RMSE penalizes big errors but does not care about predicting more or less, while MAE may tend towards a negative bias but does not penalize big errors.

    My question is would it be wise to separate the predictions as follows :
    – if an error e(t) is considered too big (the difference with the true value is bigger than a predetermined percentage, say 40%), then it is squared like the RMSE
    – if a prediction is over the demand, then the error is multiplied by a factor bigger than one (say 2)
    – for other “normal” predictions we go forward with MAE

    Then we have three average errors from which we make a final average

    I would appreciate your opinion.
    Thank you.

  26. Samin Payro April 28, 2022 at 8:12 pm #

    Dear Jason,

    If I use the whole time series data for training, is the training error (using any of the error metrics in your blog) a good indicator of model accuracy?
    The situation is that I've done experiments for time series forecasting using Auto ARIMA, and I evaluated the model by splitting the dataset into train and test. But now that my model is going to be used in practice, I input the whole data to the model for training so as not to lose any information. I still need to display an indicator of the accuracy of my model to show how reliable its forecasts could be. So, I'm wondering if the training error could be considered here as an accuracy metric of the model in case no test set is considered.

    Best,
    Samin

  27. Syamini June 1, 2022 at 12:05 am #

    Hi,

    In your example, rmse = 0.1483. So how can we interpret that?

  28. ankit October 18, 2022 at 1:07 pm #

    Is MSE calculated over the actual dataset or the normalized one?

    • James Carmichael October 19, 2022 at 6:51 am #

      Hi Ankit…Either is fine because it is used as a relative comparison metric.
