How to Develop a Skillful Machine Learning Time Series Forecasting Model

You are handed data and told to develop a forecast model.

What do you do?

This is a common situation; far more common than most people think.

  • Perhaps you are sent a CSV file.
  • Perhaps you are given access to a database.
  • Perhaps you are starting a competition.

The problem can be reasonably well defined:

  • You have or can access historical time series data.
  • You know or can find out what needs to be forecasted.
  • You know or can find out what is most important in evaluating a candidate model.

So how do you tackle this problem?

Unless you have been through this trial by fire, you may struggle.

  • You may struggle because you are new to the fields of machine learning and time series.
  • You may struggle even if you have machine learning experience because time series data is different.
  • You may struggle even if you have a background in time series forecasting because machine learning methods may outperform the classical approaches on your data.

In all of these cases, you will benefit from working through the problem carefully and systematically.

In this post, I want to give you a specific and actionable procedure that you can use to work through your time series forecasting problem.

Let’s get started.

How to Develop a Skillful Time Series Forecasting Model
Photo by Make it Kenya, some rights reserved.

Process Overview

The goal of this process is to get a “good enough” forecast model as fast as possible.

This process may or may not deliver the best possible model, but it will deliver a good model: a model that is better than a baseline prediction, if such a model exists.

Typically, this process will deliver a model that is 80% to 90% of what can be achieved on the problem.

The process is fast. As such, it focuses on automation. Hyperparameters are searched rather than specified based on careful analysis. You are encouraged to test suites of models in parallel, rapidly getting an idea of what works and what doesn’t.

Nevertheless, the process is flexible, allowing you to circle back or go as deep as you like on a given step if you have the time and resources.

This process is divided into four parts; they are:

  1. Define Problem
  2. Design Test Harness
  3. Test Models
  4. Finalize Model

You will notice that the process is different from a classical linear work-through of a predictive modeling problem. This is because it is designed to get a working forecast model fast and then slow down and see if you can get a better model.

What is your process for working through a new time series forecasting problem?
Share it below in the comments.

How to Use This Process

The biggest mistake is skipping steps.

For example, the mistake that almost all beginners make is going straight to modeling without a strong idea of what problem is being solved or how to robustly evaluate candidate solutions. This almost always results in a lot of wasted time.

Slow down, follow the process, and complete each step.

I recommend having separate code for each experiment that can be re-run at any time.

This is important so that you can circle back when you discover a bug, fix the code, and re-run an experiment. You are running experiments and iterating quickly, but if you are sloppy, then you cannot trust any of your results. This is especially important when it comes to the design of your test harness for evaluating candidate models.

Let’s take a closer look at each step of the process.

1. Define Problem

Define your time series problem.

Some topics to consider and motivating questions within each topic are as follows:

  1. Inputs vs. Outputs
    1. What are the inputs and outputs for a forecast?
  2. Endogenous vs. Exogenous
    1. What are the endogenous and exogenous variables?
  3. Unstructured vs. Structured
    1. Are the time series variables unstructured or structured?
  4. Regression vs. Classification
    1. Are you working on a regression or classification predictive modeling problem?
    2. What are some alternate ways to frame your time series forecasting problem?
  5. Univariate vs. Multivariate
    1. Are you working on a univariate or multivariate time series problem?
  6. Single-step vs. Multi-step
    1. Do you require a single-step or a multi-step forecast?
  7. Static vs. Dynamic
    1. Do you require a static or a dynamically updated model?

Answer each question even if you have to estimate or guess.
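
To make the inputs vs. outputs and single-step vs. multi-step questions concrete, here is a minimal sketch of one common framing: a sliding window that turns a univariate series into input/output pairs. The function name series_to_supervised, its parameters, and the toy data are assumptions made for illustration only; your own framing may differ.

```python
def series_to_supervised(series, n_in=3, n_out=1):
    """Frame a univariate series as (input window, output window) pairs.

    n_in  : number of lag observations used as inputs (assumed value)
    n_out : number of future steps to forecast (1 = single-step)
    """
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])                  # past observations
        y.append(series[i + n_in:i + n_in + n_out])   # values to forecast
    return X, y


# Toy data, purely illustrative.
series = [10, 20, 30, 40, 50, 60]

# Three inputs, two-step (multi-step) output.
X, y = series_to_supervised(series, n_in=3, n_out=2)
for inputs, outputs in zip(X, y):
    print(inputs, "->", outputs)
```

Changing n_out from 1 to a larger value is the difference between a single-step and a multi-step framing; changing n_in changes what counts as an input.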

Some useful tools to help get answers include:

  • Data visualizations (e.g. line plots).
  • Statistical analysis (e.g. ACF/PACF plots).
  • Domain experts.
  • Project stakeholders.
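
As a rough illustration of the statistical analysis mentioned above, the sketch below computes the sample autocorrelation at a few lags in plain Python. The function name and the toy series are assumptions for this example; in practice you would more likely use a plotting routine from a statistics library.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


# Toy series that repeats every four steps (illustrative only).
series = [1, 2, 3, 2] * 6

for lag in range(1, 6):
    print(lag, round(autocorrelation(series, lag), 3))
```

A strong positive value at a particular lag (here, lag 4) hints at seasonality with that period, which in turn helps answer questions such as which lag observations to use as inputs.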

Update your answers to these questions as you learn more.

2. Design Test Harness

Design a test harness that you can use to evaluate candidate models.

This includes both the method used to estimate model skill and the metric used to evaluate predictions.

Below is a common time series forecasting model evaluation scheme if you are looking for ideas:

  1. Split the dataset into a train and test set.
  2. Fit a candidate approach on the training dataset.
  3. Make predictions on the test set directly or using walk-forward validation.
  4. Calculate a metric that compares the predictions to the expected values.
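
The four steps above can be sketched as a small walk-forward validation harness. This is a minimal sketch, not a definitive implementation: the function names, the choice of RMSE as the metric, the placeholder persistence model, and the toy data are all assumptions.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def persistence_forecast(history):
    """Placeholder model (assumed): predict the last observed value."""
    return history[-1]


def walk_forward_validation(series, n_test, forecast_fn):
    """Step through the test set, forecasting one step at a time."""
    train, test = series[:-n_test], series[-n_test:]
    history = list(train)
    predictions = []
    for observed in test:
        predictions.append(forecast_fn(history))  # forecast the next step
        history.append(observed)                  # then reveal the true value
    return rmse(test, predictions)


# Toy data, purely illustrative.
series = [10, 12, 13, 12, 15, 16, 18, 17, 19, 21]
score = walk_forward_validation(series, n_test=4, forecast_fn=persistence_forecast)
print(f"RMSE: {score:.3f}")
```

Any candidate model can be swapped in by passing a different forecast_fn that takes the history and returns a one-step forecast.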

The test harness must be robust and you must have complete trust in the results it provides.

An important consideration is to ensure that any coefficients used for data preparation are estimated from the training dataset only and then applied on the test set. This might include mean and standard deviation in the case of data standardization.
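
For example, standardization without leakage might look like the following sketch. The helper names and toy data are assumptions, and the population standard deviation is used here only as one possible choice.

```python
from statistics import mean, pstdev


def fit_standardizer(train):
    """Estimate the coefficients from the training data only."""
    return mean(train), pstdev(train)


def standardize(values, mu, sigma):
    """Apply previously estimated coefficients to any split."""
    return [(v - mu) / sigma for v in values]


# Toy data, purely illustrative.
series = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
train, test = series[:4], series[4:]

mu, sigma = fit_standardizer(train)            # test data is never seen here
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)     # reuse the training coefficients

print(train_scaled)
print(test_scaled)
```

Note that the scaled test values need not have zero mean or unit variance; that is expected, because the coefficients describe the training data, not the test data.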

3. Test Models

Test many models using your test harness.

I recommend carefully designing experiments to test a suite of configurations for standard models and letting them run. Each experiment can record results to a file, to allow you to quickly discover the top three to five most skillful configurations from each run.

Some common classes of methods that you can design experiments around include the following:

  • Baseline.
    • Persistence (grid search the lag observation that is persisted)
    • Rolling moving average.
  • Autoregression.
    • ARMA for stationary data.
    • ARIMA for data with a trend.
    • SARIMA for data with seasonality.
  • Exponential Smoothing.
    • Simple Smoothing
    • Holt Winters Smoothing
  • Linear Machine Learning.
    • Linear Regression
    • Ridge Regression
    • Lasso Regression
    • Elastic Net Regression
    • ….
  • Nonlinear Machine Learning.
    • k-Nearest Neighbors
    • Classification and Regression Trees
    • Support Vector Regression
  • Ensemble Machine Learning.
    • Bagging
    • Boosting
    • Random Forest
    • Gradient Boosting
  • Deep Learning.
    • MLP
    • CNN
    • LSTM
    • Hybrids

This list is based on a univariate time series forecasting problem, but you can adapt it for the specifics of your problem, e.g. use VAR/VARMA/etc. in the case of multivariate time series forecasting.

Slot in more of your favorite classical time series forecasting methods and machine learning methods as you see fit.

Order here is important and is structured in increasing complexity from classical to modern methods. Early approaches are simple and give good results fast; later approaches are slower and more complex, but also have a higher bar to clear to be skillful.

The resulting model skill can be used in a ratchet. For example, the skill of the best persistence configuration provides a baseline skill that all other models must outperform. If an autoregression model does better than persistence, it becomes the new level to outperform in order for a method to be considered skillful.

Ideally, you want to exhaust each level before moving on to the next. E.g. get the most out of Autoregression methods and use the results as a new baseline to define “skillful” before moving on to Exponential Smoothing methods.
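
A minimal sketch of the ratchet idea follows: grid search the lag for a persistence forecast, use the best score as the bar, and only raise the bar when another candidate beats it. The function names, the RMSE metric, the moving-average candidate, and the toy seasonal series are all assumptions for illustration.

```python
from math import sqrt


def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def persistence_rmse(series, n_test, lag):
    """Walk-forward RMSE when the value from `lag` steps earlier is persisted."""
    start = len(series) - n_test
    predictions = [series[i - lag] for i in range(start, len(series))]
    return rmse(series[start:], predictions)


def moving_average_rmse(series, n_test, window):
    """Walk-forward RMSE when the mean of the last `window` values is forecast."""
    start = len(series) - n_test
    predictions = [sum(series[i - window:i]) / window for i in range(start, len(series))]
    return rmse(series[start:], predictions)


# Toy series with a period of four and a slight upward drift (illustrative only).
series = [10, 20, 30, 20, 11, 21, 31, 21, 12, 22, 32, 22]
n_test = 4

# Level 1: grid search the persisted lag and set the initial bar.
lag_scores = {lag: persistence_rmse(series, n_test, lag) for lag in range(1, 5)}
best_lag = min(lag_scores, key=lag_scores.get)
bar, best_name = lag_scores[best_lag], f"persistence(lag={best_lag})"

# Later candidates only replace the bar if they beat it.
candidates = {"moving_average(window=4)": moving_average_rmse(series, n_test, 4)}
for name, score in candidates.items():
    if score < bar:
        bar, best_name = score, name

print(best_name, round(bar, 3))
```

On this toy series the lag-4 persistence forecast wins because the data repeats every four steps, and the moving average does not clear the bar, so the bar stays where it is.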

I put deep learning at the end as generally neural networks are poor at time series forecasting, but there is still a lot of room for improvement and experimentation in this area.

The more time and resources that you have, the more configurations that you can evaluate.

For example, with more time and resources, you could:

  • Search model configurations at a finer resolution around a configuration known to already perform well.
  • Search more model hyperparameter configurations.
  • Use analysis to set better bounds on model hyperparameters to be searched.
  • Use domain knowledge to better prepare data or engineer input features.
  • Explore different potentially more complex methods.
  • Explore ensembles of well performing base models.

I also encourage you to include data preparation schemes as hyperparameters for model runs.

Some methods will perform some basic data preparation, such as differencing in ARIMA. Nevertheless, it is often unclear exactly what data preparation schemes or combinations of schemes are required to best present a dataset to a modeling algorithm. Rather than guess, grid search and decide based on real results.

Some data preparation schemes to consider include:

  • Differencing to remove a trend.
  • Seasonal differencing to remove seasonality.
  • Standardization to center (zero mean, unit variance).
  • Normalization to rescale (e.g. to the range 0 to 1).
  • Power transform to make the distribution more Gaussian.
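
As one example from the list above, differencing and its inverse can be sketched in a few lines. The function names and the interval parameter are assumptions; an interval of 1 targets a trend, while an interval equal to the seasonal period gives seasonal differencing.

```python
def difference(series, interval=1):
    """Subtract the value `interval` steps earlier from each observation."""
    return [series[i] - series[i - interval] for i in range(interval, len(series))]


def invert_difference(history, diff_value, interval=1):
    """Add back the value `interval` steps earlier to undo differencing."""
    return diff_value + history[-interval]


# Toy trending series (illustrative only).
series = [10, 12, 15, 19, 24]
diffed = difference(series, interval=1)
print(diffed)

# Round trip: rebuild the original series from its first value and the differences.
restored = series[:1]
for d in diffed:
    restored.append(invert_difference(restored, d, interval=1))
print(restored)
```

Treating interval (including the option of no differencing at all) as one more hyperparameter lets the grid search decide which preparation suits each model. Forecasts made on the differenced scale must be inverted before they are compared to the expected values.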

So much searching can be slow.

Some ideas to speed up the evaluation of models include:

  • Use multiple machines in parallel via cloud hardware (such as Amazon EC2).
  • Reduce the size of the train or test dataset to make the evaluation process faster.
  • Use a more coarse grid of hyperparameters and circle back if you have time later.
  • Perhaps do not refit a model for each step in walk-forward validation.
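
The coarse-then-fine idea from the list above can be sketched as follows, using the smoothing coefficient of simple exponential smoothing as the hyperparameter. This is a simplified illustration: the function names, the grids, and the toy data are assumptions, and the score is a one-step-ahead error over the whole series rather than a full train/test harness.

```python
def ses_rmse(series, alpha):
    """One-step-ahead RMSE of simple exponential smoothing with coefficient alpha."""
    level = series[0]
    squared_errors = []
    for observed in series[1:]:
        squared_errors.append((observed - level) ** 2)  # forecast is the current level
        level = alpha * observed + (1 - alpha) * level  # update after seeing the value
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


def grid_search(series, alphas):
    """Return the best alpha and its score from a list of candidates."""
    scores = {alpha: ses_rmse(series, alpha) for alpha in alphas}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy data, purely illustrative.
series = [10, 12, 13, 12, 15, 16, 18, 17, 19, 21]

# Pass 1: a coarse grid is cheap to evaluate.
coarse = [0.1, 0.3, 0.5, 0.7, 0.9]
best_coarse, score_coarse = grid_search(series, coarse)

# Pass 2: if time allows, search a finer grid around the coarse winner.
fine = [round(best_coarse + step / 100, 2) for step in range(-10, 11, 2)]
fine = [alpha for alpha in fine if 0 < alpha <= 1]
best_fine, score_fine = grid_search(series, fine)

print(best_coarse, round(score_coarse, 3))
print(best_fine, round(score_fine, 3))
```

Because the fine grid contains the coarse winner, the second pass can only match or improve on the first, so it is safe to stop after the coarse pass when time is short.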

4. Finalize Model

At the end of the previous step, you know whether your time series is predictable.

If it is predictable, you will have a list of the top 5 to 10 candidate models that are skillful on the problem.

You can pick one or multiple models and finalize them. This involves training a new final model on all available historical data (train and test).

The model is ready for use; for example:

  • Make a prediction for the future.
  • Save the model to file for later use in making predictions.
  • Incorporate the model into software for making predictions.
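
A minimal sketch of finalizing a model is shown below: fit on all available data, save to file, load it back, and forecast the next step. The simple least-squares AR(1) model, the function names, the file name final_model.pkl, and the toy data are all assumptions standing in for whichever model you selected.

```python
import os
import pickle
import tempfile


def fit_ar1(series):
    """Least-squares fit of y[t] = intercept + slope * y[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    return {"intercept": mean_y - slope * mean_x, "slope": slope}


def predict_next(model, last_value):
    """One-step forecast from the most recent observation."""
    return model["intercept"] + model["slope"] * last_value


# All available history (train and test combined); toy data for illustration.
series = [10, 12, 13, 12, 15, 16, 18, 17, 19, 21]
model = fit_ar1(series)

# Save the finalized model to file (hypothetical file name).
path = os.path.join(tempfile.gettempdir(), "final_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later, possibly in another program: load the model and make a prediction.
with open(path, "rb") as f:
    loaded = pickle.load(f)
forecast = predict_next(loaded, series[-1])
print(round(forecast, 3))
```

Whatever data preparation was selected during testing must also be saved and reapplied at prediction time, with its coefficients re-estimated on the same full history.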

If you have time, you can always circle back to the previous step and see if you can further improve upon the final model.

This may be required periodically if the data changes significantly over time.

Summary

In this post, you discovered a simple four-step process that you can use to quickly discover a skillful predictive model for your time series forecasting problem.

Did you find this process useful?
Let me know below.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

42 Responses to How to Develop a Skillful Machine Learning Time Series Forecasting Model

  1. Pandya Rajnikant August 10, 2018 at 12:41 pm

    What is the cost/price to develop forecasting regression model?

  2. Kevin August 13, 2018 at 12:01 pm

    Hi Jason,

    As always, love your articles. Is there a de-facto model that has been showing the most promise rather than conventional time series analysis (smoothing, differencing, etc…), or better yet is there some known combination of time series analysis (like differencing as a pre-processing technique) prior to a standard machine learning model?
    Recurrent Nets have been showing promise within sequential data, like NLP problems, but are there any rules you’ve found as to when you should more readily consider Recurrent Nets/LSTM’s versus ARIMA’s? Your article suggests this remains an art (as ML really is) but I wasn’t sure if you personally had insight.
    I appreciate your time.

  3. Rishabh August 14, 2018 at 3:13 am

    In one of your blog posts you have advised against using deep learning for time series. Do you just want to use it for time series because its a fad?

    • Jason Brownlee August 14, 2018 at 6:23 am

      Recently, I’ve had great success with CNNs, CNN-LSTMs and ConvLSTMs for time series forecasting.

      I have a suite of posts scheduled on the topic and a book on the topic to share these findings.

      • Darkwind August 23, 2018 at 6:53 pm

        Really look forward to this book!! Thanks Jason for your great work.

        • Jason Brownlee August 24, 2018 at 6:07 am

          Thanks.

          • Benti August 25, 2018 at 2:37 am

            Hey Jason,

            I was wondering when these posts / the book will be available?
            I am currently working on a time series analysis project where I am trying to assess (compare) the performance of different DL algorithms against “traditional” benchmark models for time series analysis. And these ensemble models are really a crucial part of the overall comparison.
            Many of your posts have already been a great help to me, and I am looking forward to the upcoming post on DL and time series. (I really hope it is gonna be soon)
            Thanks a lot for your effort and great work!

          • Jason Brownlee August 25, 2018 at 5:49 am

            In a week or two, going through final edits.

  4. Hamza August 20, 2018 at 3:06 am

    I know how to use LSTM networks very well (that’s what I think, at least), but I have no idea about how to use most of the simpler methods mentioned in this article (I have never heard of them!). I have some experience with deep learning on ordinary datasets, and I am trying to apply my knowledge to time series forecasting. Is that OK?

    • Jason Brownlee August 20, 2018 at 6:34 am

      Do what works best for you, your project and your stakeholders.

  5. Bishwajit Roy October 1, 2018 at 3:22 pm

    HI Jason B. I am new in deep learning. I want to do time series forecasting in environmental modelling. Please suggest me which toolkit do I prefer to implement my deep learning model.

  6. Ravindra December 16, 2018 at 5:14 am

    Hello Jason,

    Wish you are doing well.
    Many thanks for such a informative post.
    I request you to please share knowledge on how to detect outliers in a time series. Outliers meaning here is sudden rise or fall of series.

    Thanks,
    Ravindra

  7. Richard December 17, 2018 at 5:51 am

    Thanks Jason!

    About your new book, is it available in print format?

  8. Richard December 17, 2018 at 5:54 am

    Sorry about the printed copy of the book.
    I have just seen in the FAQ that you do not provide print versions.
    Apologies again and please delete my comments.

    Thanks.

  9. Julien April 25, 2019 at 1:07 am

    Many thanks Jason, as usual very useful and to be kept for future references.

    One question, would you try to apply LTSM/LTSM with attention (or any other NN) models to a multivariate time series regression problem with discontiguous non-uniformly sampled output data ? It seems the case of discontiguous data is much less tractable than the regular case usually assumed by most methods and the literature is a bit scarce on the matter. I have seen papers recommending modifications of the basic LSTM unit architectures.

    Any suggestion other than resampling/imputing data welcome!

    Best regards,

    • Jason Brownlee April 25, 2019 at 8:19 am

      Yes, test that the discontinuities impact model performance before trying to correct them.

      Also, LSTMs are robust to missing or noisy input data, like most neural nets in general.

  10. Chris June 13, 2019 at 2:34 am

    Hi Jason,

    I am in the middle of implementing an LSTM to forecast a single label (stock close price) based on multiple features and visualise this prediction from today (t).

    My question is, how can you use an LSTM to forecast the close price for say t + 10 days, t+20 days, t + a month into the future? Most of the examples I see are of plots of predicted price based on test data, but to take action on a stock I need to see the predictions for future dates.

    Hopefully this makes sense.

    Best regards

  11. Diwang Ruan July 22, 2019 at 10:03 pm

    Hi Jason, at present, I am also trying to apply LSTM to predict vibration signal based on history data acquired from test bench, but it does not work so well, especially for long time prediction. Could you share me some suggestions to improve the performance of LSTM while predicting nonstationary signal in a long period? Thank you so much.

  12. Diwang Ruan July 22, 2019 at 10:08 pm

    I also have tried to decompose the vibration signal into 6 components by EMD, and then each component was predicted by LSTM. The final prediction of vibration was obtained by sum of predictions of 6 components. However, the performance is still not so good. Do you have some other good suggestions for me? Because the vibration is a nonstationary signal.

    • Jason Brownlee July 23, 2019 at 8:03 am

      Yes, try an MLP, CNN, CNN-LSTM, and more.

      More ideas here:
      https://machinelearningmastery.com/start-here/#better

      • Diwang Ruan July 24, 2019 at 6:22 pm

        Could you tell me what is the longest period you can predict for non-stationary signals? how about 1000 points?

        • Jason Brownlee July 25, 2019 at 7:43 am

          You can predict any future interval you want, but the amount of error you will have depends on the complexity of the series that you’re modeling.

          Try it and see is the best that I can say.

  13. Akshayaa October 22, 2019 at 8:39 pm

    Hi Jason,
    I have a dataset containing the energy consumption for 365 days which is taken at a time interval of 1 minute of 100 apartments. I want to predict the energy consumption of the 100 apartments for the next year. Which model should i use for this purpose?

  14. Thulasi February 23, 2021 at 7:05 am

    In time series if dates are missing which model can i choose( LSTM) for forecasting model for multiple products. Need help in this

  15. mistr March 11, 2022 at 4:07 pm

    Hi It would be really convenient if you could link the taxonomy of a Time series problem article in there :

    https://machinelearningmastery.com/time-series-forecast-study-python-monthly-sales-french-champagne/

    I only discovered it after searching all the terms myself on google! Might be handy to others in the future.

    Thanks for the awesome article sir!!

    • James Carmichael March 12, 2022 at 2:41 pm

      Thank you for the feedback mistr!

  16. Migom Libang July 26, 2022 at 4:19 pm

    Hi Jason, I am doing a forecasting of electrical load consumption taking meteorological datas as independent variables. I took the datas of last 3 years. My query is that, How to treat the dates and Months of the 03 years in my data? Do I need to delete them or normalize them as they repeat after 365 days. please help

    • James Carmichael July 27, 2022 at 10:21 am

      Hi Migom…I would recommend establishing a consistent time frame for each record in your dataset. That is, each line would represent a measurement taken each hour or each day as example.

  17. aldo April 25, 2023 at 7:46 pm

    Hi, hopefully you answer my question to. I have a timeseries problem on hotel reservation. I ve done with ARIMA but the result is strange, it predict flat value (i.e 280 constatnt every day). I also try prophet but it gives me negative value. The data I have is one year daily data of hotel reservation.
    my question is, what is the pre processing step you advice ? should I do the minmax scalling on my data first ?
