4 Strategies for Multi-Step Time Series Forecasting

Time series forecasting is typically discussed in terms of problems where only a one-step prediction is required.

What about when you need to predict multiple time steps into the future?

Predicting multiple time steps into the future is called multi-step time series forecasting. There are four main strategies that you can use for multi-step forecasting.

In this post, you will discover the four main strategies for multi-step time series forecasting.

After reading this post, you will know:

  • The difference between one-step and multiple-step time series forecasts.
  • The traditional direct and recursive strategies for multi-step forecasting.
  • The newer direct-recursive hybrid and multiple output strategies for multi-step forecasting.

Discover how to prepare and visualize time series data and develop autoregressive forecasting models in my new book, with 28 step-by-step tutorials and full Python code.

Let’s get started.

  • Update May/2018: Fixed typo in direct strategy example.
Strategies for Multi-Step Time Series Forecasting

Photo by debs-eye, some rights reserved.

Multi-Step Forecasting

Generally, time series forecasting describes predicting the observation at the next time step.

This is called a one-step forecast, as only one time step is to be predicted.

There are some time series problems where multiple time steps must be predicted. In contrast to the one-step forecast, these are called multiple-step or multi-step time series forecasting problems.

For example, given the observed temperature over the last 7 days:

A one-step forecast would require a forecast at time step 8 only.

A multi-step forecast may require predictions for the next two days, i.e. time steps 8 and 9.
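As a minimal sketch (the temperature values are made up for illustration), the two framings differ only in how many future time steps the model must output:

```python
# Toy illustration of the two framings (made-up temperatures for days 1..7).
observed = [20.1, 20.5, 21.0, 20.8, 21.3, 21.7, 22.0]  # time steps 1..7

# One-step framing: inputs are steps 1..7, the single target is step 8.
one_step = {"inputs": observed, "target_steps": [8]}

# Multi-step framing: same inputs, but targets are steps 8 and 9.
multi_step = {"inputs": observed, "target_steps": [8, 9]}

print(one_step["target_steps"], multi_step["target_steps"])
```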

There are at least four commonly used strategies for making multi-step forecasts.

They are:

  1. Direct Multi-step Forecast Strategy.
  2. Recursive Multi-step Forecast Strategy.
  3. Direct-Recursive Hybrid Multi-step Forecast Strategies.
  4. Multiple Output Forecast Strategy.

Let’s take a closer look at each method in turn.


1. Direct Multi-step Forecast Strategy

The direct method involves developing a separate model for each forecast time step.

In the case of predicting the temperature for the next two days, we would develop a model for predicting the temperature on day 1 and a separate model for predicting the temperature on day 2.

For example:

prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
prediction(t+2) = model2(obs(t-2), obs(t-3), …, obs(t-n))

Having one model for each time step is an added computational and maintenance burden, especially as the number of time steps to be forecasted increases beyond the trivial.

Because separate models are used, there is no opportunity to model dependencies between the predictions, such as the prediction on day 2 being dependent on the prediction on day 1, as is often the case in time series.
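The direct strategy can be sketched as runnable code. This is a toy, not the post's implementation: the helper names are mine, and the "model" is deliberately trivial (predict the last lag value plus the average observed offset), standing in for any real learner.

```python
# Toy direct strategy: one separately trained model per forecast horizon.
series = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

def frame(series, n_lags, horizon):
    """Pair each window of n_lags inputs with the value `horizon` steps ahead."""
    X, y = [], []
    for i in range(len(series) - n_lags - horizon + 1):
        X.append(series[i:i + n_lags])
        y.append(series[i + n_lags + horizon - 1])
    return X, y

def fit_mean_offset(X, y):
    """Trivial 'model': last input value plus the average observed offset."""
    offset = sum(target - x[-1] for x, target in zip(X, y)) / len(y)
    def predict(lags):
        return lags[-1] + offset
    return predict

# model1 is trained to predict t+1; model2 is a separate model for t+2.
model1 = fit_mean_offset(*frame(series, n_lags=3, horizon=1))
model2 = fit_mean_offset(*frame(series, n_lags=3, horizon=2))

last_obs = series[-3:]
forecast = [model1(last_obs), model2(last_obs)]
print(forecast)  # [20.0, 21.0] on this linear toy series
```

Note that each model sees only observed values as input; neither sees the other's prediction.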

2. Recursive Multi-step Forecast Strategy

The recursive strategy involves using a one-step model multiple times where the prediction for the prior time step is used as an input for making a prediction on the following time step.

In the case of predicting the temperature for the next two days, we would develop a one-step forecasting model. This model would then be used to predict day 1, then this prediction would be used as an observation input in order to predict day 2.

For example:

prediction(t+1) = model(obs(t-1), obs(t-2), …, obs(t-n))
prediction(t+2) = model(prediction(t+1), obs(t-1), …, obs(t-n))

Because predictions are used in place of observations, the recursive strategy allows prediction errors to accumulate such that performance can quickly degrade as the prediction time horizon increases.
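A runnable toy sketch of the recursive strategy (illustrative names only; the one-step "model" is again a trivial average-step predictor): one model is reused, and each prediction is pushed back into the input window as if it were an observation.

```python
# Toy recursive strategy: one one-step model applied repeatedly.
series = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
N_LAGS = 3

# Trivial one-step "model": last value plus the series' average step size.
steps = [series[i + 1] - series[i] for i in range(len(series) - 1)]
avg_step = sum(steps) / len(steps)

def model(lags):
    return lags[-1] + avg_step

history = list(series[-N_LAGS:])
forecast = []
for _ in range(2):                  # two steps ahead
    yhat = model(history)
    forecast.append(yhat)
    history = history[1:] + [yhat]  # prediction becomes an input; errors compound
print(forecast)  # [20.0, 21.0] on this linear toy series
```

The feedback line is where the strategy's weakness lives: any error in `yhat` is carried into the input of every subsequent prediction.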

3. Direct-Recursive Hybrid Strategies

The direct and recursive strategies can be combined to offer the benefits of both methods.

For example, a separate model can be constructed for each time step to be predicted, but each model may use the predictions made by models at prior time steps as input values.

We can see how this might work for predicting the temperature for the next two days, where two models are used, but the output from the first model is used as an input for the second model.

For example:

prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
prediction(t+2) = model2(prediction(t+1), obs(t-1), …, obs(t-n))

Combining the recursive and direct strategies can help to overcome the limitations of each.
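A toy sketch of the hybrid: model2 is a direct model for t+2, but its inputs include model1's prediction for t+1 (both at training and prediction time). Helper names and the trivial "models" are mine, for illustration only.

```python
# Toy direct-recursive hybrid: model2's features include model1's output.
series = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
N_LAGS = 3

def frame(series, horizon):
    """Pair N_LAGS inputs with the value `horizon` steps ahead."""
    X, y = [], []
    for i in range(len(series) - N_LAGS - horizon + 1):
        X.append(series[i:i + N_LAGS])
        y.append(series[i + N_LAGS + horizon - 1])
    return X, y

def fit_mean_offset(X, y):
    """Trivial 'model': last input value plus the average observed offset."""
    offset = sum(t - x[-1] for x, t in zip(X, y)) / len(y)
    def predict(x):
        return x[-1] + offset
    return predict

# model1: an ordinary one-step model.
X1, y1 = frame(series, horizon=1)
model1 = fit_mean_offset(X1, y1)

# model2: a direct model for t+2 whose inputs include model1's prediction.
X2, y2 = frame(series, horizon=2)
X2 = [x + [model1(x)] for x in X2]  # append prediction(t+1) as a feature
model2 = fit_mean_offset(X2, y2)

last_obs = series[-N_LAGS:]
p1 = model1(last_obs)
p2 = model2(last_obs + [p1])  # separate model per step, recursive input
print([p1, p2])  # [20.0, 21.0] on this toy series
```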

4. Multiple Output Strategy

The multiple output strategy involves developing one model that is capable of predicting the entire forecast sequence in a one-shot manner.

In the case of predicting the temperature for the next two days, we would develop one model and use it to predict the next two days as one operation.

For example:

prediction(t+1), prediction(t+2) = model(obs(t-1), obs(t-2), …, obs(t-n))

Multiple output models are more complex as they can learn the dependence structure between inputs and outputs as well as between outputs.

Being more complex may mean that they are slower to train and require more data to avoid overfitting the problem.
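A toy sketch of the multiple output strategy (illustrative names; the "model" is again trivially simple): the training targets are vectors, and a single model emits the whole forecast in one call.

```python
# Toy multiple output strategy: one model, vector-valued target.
series = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
N_LAGS, HORIZON = 3, 2

# Frame the data with a vector target: each y holds the next HORIZON values.
X, Y = [], []
for i in range(len(series) - N_LAGS - HORIZON + 1):
    X.append(series[i:i + N_LAGS])
    Y.append(series[i + N_LAGS:i + N_LAGS + HORIZON])

def fit_vector_model(X, Y):
    """Trivial one-shot 'model': one average offset per output position."""
    n = len(Y)
    offsets = [sum(Y[r][k] - X[r][-1] for r in range(n)) / n
               for k in range(HORIZON)]
    def predict(lags):
        return [lags[-1] + d for d in offsets]
    return predict

model = fit_vector_model(X, Y)
forecast = model(series[-N_LAGS:])  # both steps from a single call
print(forecast)  # [20.0, 21.0] on this toy series
```

In a neural network this corresponds to an output layer with HORIZON units, so the output positions are fitted jointly rather than by independent models.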


Summary

In this post, you discovered strategies that you can use to make multiple-step time series forecasts.

Specifically, you learned:

  • How to train multiple parallel models in the direct strategy or reuse a one-step model in the recursive strategy.
  • How to combine the best parts of the direct and recursive strategies in the hybrid strategy.
  • How to predict the entire forecast sequence in a one-shot manner using the multiple output strategy.

Do you have any questions about multi-step time series forecasts, or about this post? Ask your questions in the comments below and I will do my best to answer.


127 Responses to 4 Strategies for Multi-Step Time Series Forecasting

  1. anthony March 8, 2017 at 10:05 pm #

    Thanks Jason for a wonderful post. Why does your model skip the value at “t”?

    • Jason Brownlee March 9, 2017 at 9:54 am #

      Just a choice of terminology, think of t+1 as t.

      I could have made it clearer, thanks for the note.

      • Pavlo Fesenko September 11, 2019 at 12:40 am #

        Hi Jason,

        Using t+1 instead of t is super confusing. 🙁 I don’t know about other users but it made it very difficult for me to grasp the idea with this terminology. If possible, please reconsider changing it to the traditional way in the future. Thanks a lot!

      • Katie November 17, 2019 at 1:15 pm #

        Hi Jason,

        which strategy would you recommend for recursive models like ARIMA? I originally thought recursive, but now I’m wondering if the hybrid would make more sense. I have the same question for moving average and exponential smoothing models.

        I was using the strictly recursive approach and repeating the entire training process for several models on several folds. This was really computationally expensive, though, and I don’t know if it was really necessary.

        Not sure if this matters, but models from a few different families (arima, ets, ..) were pre-tuned/configured on the smallest possible subset of the training set (I used hyperopt) and then walk-forward validation was applied for each of the candidate models. As mentioned, multi-step forecasts were estimated using a strictly recursive approach, with the RMSE being calculated over all time steps in the horizon (t+1, t+2, ..) for each iteration of walk-forward validation. To reiterate, models were pre-tuned, so the same exact models were applied to predict each value in the horizon for a given iteration, but it was recursive since each new time step in the horizon in a given iteration was predicted using a training set with previously predicted time steps appended.

        • Katie November 17, 2019 at 1:22 pm #

          Oh, I forgot to provide some important details for context:
          1. I’m working with small samples
          2. The frequency is monthly
          3. The data is volatile
          4. The context is inventory optimization (specifically, we’re predicting quantity of products issued by warehouses)
          5. Forecasting is done at the SKU level and separate forecasts need to be made for each product and for each warehouse at my company
          6. Some SKUs are sparse and most are extremely volatile
          7. (slightly) Negative quantities do occur (indicating returns or adjustments) but are rare
          8. Solution is being developed in R and Python in Azure ML

        • Jason Brownlee November 18, 2019 at 6:42 am #

          I would recommend testing a suite of methods on your dataset and use the approach that results in the lowest error.

    • Roise September 5, 2018 at 3:42 pm #

      Good question. Thanks.

  2. Dylan March 14, 2017 at 2:31 am #

    Hi Jason, it is always helpful to read your posts. I have some confusion related to time series forecasting.
    There is traffic data (1440 pieces in total, 288 pieces each day) I collected to predict traffic flow. The data was collected every 5 min over five consecutive working days. I am going to use the traffic data of the first four days to train the prediction model, while the traffic data of the fifth day is used to test the model.
    Here is my question: if I want to predict the traffic flow of the fifth day, do I only need to treat my prediction as a one-step forecast, or do I have to predict 288 steps?
    Look forward to your advice.
    Thanks for your post again.

    • Jason Brownlee March 14, 2017 at 8:25 am #

      Hi Dylan,

      If you want to predict an entire day in advance (288 observations), this sounds like a multi-step forecast.

      You could use a recursive one-step strategy or something like a neural net to predict the entire sequence in a one-shot manner.

      Predicting so many steps in advance is hard work (a hard problem) and results may be poor. You will do better if you can use data as it comes in to continually refine your forecast.

      Does that help?

      • Dylan March 15, 2017 at 1:54 am #

        Yes, your response is very helpful. Thank you very much. Now I realize my prediction is a multi-step forecast.
        Could you recommend me some more detailed materials related to the multi-step forecast, like the recursive one-step strategy or the neural net?
        Now I am reading your post, it is great.
        Thank you for your advice.
        Best regards

        • Jason Brownlee March 15, 2017 at 8:10 am #

          I am working on this type of material at the moment, it should be on the blog in coming weeks/months.

          You can use an ARIMA recursively by taking forecasts as inputs to make the next prediction.

          You can use a neural network to make a multi-step forecast by setting the output sequence length as the number of neurons in the output layer.

          I hope that helps as a start.

      • Patricia July 11, 2017 at 8:18 am #

        Hi Jason,
        Thank you for great posts! they’re awesome!

        I have the same problem as Dylan and decided to use statsmodel’s SARIMAX. It takes some time to do the prediction for the entire next day (288 steps), and I have been wondering if I’m doing this wrong or whether I should use a different approach.
        Currently, I’m looking into LSTM RNN as a possible approach, but not sure.
        The thing is, with my data, I have to predict the entire 288 steps in one shot and detect an anomaly if there’s any, then predict the type of anomaly that occurred….

        My question is, am I going in the right direction by looking into LSTM RNN?

        I’m really looking forward into reading your posts on this topic!

        Thanks Jason 🙂

  3. Kunpeng Zhang March 17, 2017 at 12:45 am #

    Yes, I will. Discussing with you is always helpful. Look forward to reading your new post on Time Series Forecast.

  4. Abhishek March 18, 2017 at 4:18 pm #

    Hi Jason, just another brilliant post. Can you show a working example for the first or second method, like you have always done in other tutorials? It would be an immense help to a novice like me. Thanks…

    • Jason Brownlee March 19, 2017 at 6:09 am #

      I do hope to have many examples on the blog in the coming weeks.

  5. mary March 31, 2017 at 4:57 am #

    Thank you Jason for your wonderful articles ! you are a life saver!
    But I suppose you made a mistake in the examples for numbers 2 and 3. Both have the same value:
    prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model2(prediction(t-1), obs(t-2), …, obs(t-n))

    I believe that one of them should be
    prediction(t+2) = model2(prediction(t+1), obs(t-1), …, obs(t-n-1))

    • Jason Brownlee March 31, 2017 at 5:59 am #

      Hmmm, I guess you’re right. I was thinking from the frame of the second prediction rather than the frame of both predictions.

      Fixed.

  6. Fatima Abu Salem April 7, 2017 at 3:29 pm #

    Hello Jason,

    What kind of Multiple output models would you recommend if we are opting for the fourth strategy?

    • Jason Brownlee April 9, 2017 at 2:52 pm #

      Neural networks, such as MLPs.

      • Vipul September 21, 2017 at 7:38 pm #

        Sir, how about LSTM?

        • Jason Brownlee September 22, 2017 at 5:38 am #

          Sure, try them, but contrast results to an MLP.

  7. Masum May 13, 2017 at 8:20 pm #

    Sir,

    Would you please come up with a blog post where all these strategies are applied to an example (dataset), with a comparison of their results?

    Would you please?

    • Jason Brownlee May 14, 2017 at 7:26 am #

      Perhaps in the future, thanks for the suggestion.

      • masum May 22, 2017 at 10:12 pm #

        sir,
        would you be kind enough to post soon?

        I am stuck with my theoretical knowledge; I need to apply it to my data to see the results and their comparative analysis.

  8. Hans June 12, 2017 at 12:40 am #

    What is a decent one-step prediction of unseen data? What would it look like?

    Let’s say I have 100 rows in a data set and do the following in R:

    I write ‘=’ instead of the arrows because of the forum parser:

    1. I split the 100 rows of raw data in 99 training rows and 1 testing row:

    inTrain=createDataPartition(y=dataset$n12,p=1,list = FALSE)

    training=dataset[inTrain-1,];
    testing=dataset[-inTrain+1,]

    2. I train the model:

    modFit=train(n12~., data=training, method = 'xxx')

    3. I get the final model of Caret

    finMod<-modFit$finalModel

    4. I predict one step with the final model of the training and the one row of testing.

    newx=testing[,1:11]
    unseenPredict=predict(finMod, newx)

    Now, do I have a decent prediction of one step unseen data in point 4 ???

    And why are there libraries like forecast for R, if everything could have been coded as a one-step forecast by default?

    https://github.com/robjhyndman/forecast/

    • Jason Brownlee June 12, 2017 at 7:09 am #

      Sorry, I don’t have examples of time series forecasting in R, I cannot offer good advice.

  9. Hans June 12, 2017 at 6:18 pm #

    I know there is also the option to use the time series object(s) in R.
    But could you answer my question in general?

  10. Hans June 14, 2017 at 1:57 am #

    I don’t understand the difference between regression forecast and time series forecast.
    Or what the benefits of each are over the other.

    • Jason Brownlee June 14, 2017 at 8:47 am #

      A time series forecast can predict a real value (regression) or a class value (classification).

  11. Leonildo June 16, 2017 at 10:35 am #

    Hello Jason,

    How do I prepare a dataset to train models with the Direct Multi-step Forecast Strategy?

    For the series: 1,2,3,4,5,6,7,8,9

    Model 1 will forecast t+1 using a window of size 3, so the dataset would be:
    1,2,3->4
    2,3,4->5
    3,4,5->6
    4,5,6->7
    5,6,7->8
    6,7,8->9

    Model 2 will forecast t+2 using a window of size 3, so the dataset would be:
    1,2,3->5
    2,3,4->6
    3,4,5->7
    4,5,6->8
    5,6,7->9

    Model 3 will forecast t+3 using a window of size 3, so the dataset would be:
    1,2,3->6
    2,3,4->7
    3,4,5->8
    4,5,6->9

    and so on. Is that right? Thanks

    • Jason Brownlee June 17, 2017 at 7:18 am #

      Great question.

      For the direct approach, the input will be the available lag vars and the output will be a vector of the prediction.

      I can see that you want to use different models for each step in the prediction.

      You could structure it as follows:

      Try many approaches and see what works best on your problem.

      I hope that helps.

      • Leonildo June 17, 2017 at 10:17 am #

        thank you so much! This answers my question.

      • Davood Raoofsheibani July 29, 2018 at 11:05 pm #

        Dear Jason,

        In Direct Multi-step Forecast Strategy, for model 2
        why haven’t you used obs(t-1) as well?

        i.e.

        instead of:
        prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
        prediction(t+2) = model2(obs(t-2), obs(t-3), …, obs(t-n))

        =>
        prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
        prediction(t+2) = model2(obs(t-1), obs(t-2), …, obs(t-n))

        The latter seems more compatible with the example you provided in this comment.

        thanks!

        • Jason Brownlee July 30, 2018 at 5:49 am #

          t-1 for model2 would be the predicted value of model1. It could use them too if you wish.

          • Shellder September 6, 2019 at 2:24 pm #

            Hi, Jason,

            Can you explain more about why t-1 for model2 would be the predicted value of model1?
            I think there is no information leakage since the model 1 is “older” than the model 2. The model 2 can use all the data used by the model 1.

            What is your concern?

          • Jason Brownlee September 7, 2019 at 5:16 am #

            Sorry, I don’t follow, what’s the context exactly?

          • Pavlo Fesenko September 11, 2019 at 12:37 am #

            Hi Shellder,

            I think that’s indeed a mistake in Jason’s formulation of the direct multi-step approach. The component obs(t-1) should also be included in the training set for the model2. 😉 It made me very confused the first time I read it but then after checking other sources I realized that there is no reason not to include obs(t-1) in the model2.

            Jason, could you please have a look at it and correct it for the future blog visitors? =) Thanks!

  12. Jisun August 30, 2017 at 7:00 am #

    Hello Jason, thank you for your wonderful post. (Actually I already bought your book.)
    I have a question.. I am building a forecasting model with my timeseries dataset, which is a daily number of some cases, I have 3 years past data (so will be records of 3*365 days.) I’d like to forecast 2 months future data (60 days.)

    I already built a multi-step LSTM model for this, however, it doesn’t seem to work well… For example, 3 years past data clearly has a pattern like Nov/Dec high peak seasonality and increasing trend, but 60 steps of LSTM gave me poor forecasting like decreasing trend with no seasonality… and even the base is decreasing too.. which is so not understandable.
    My question is:

    1. Do you think my parameter tuning could be wrong? I mean, LSTM multi-step forecasting cannot be this poor, can it?

    2. Is there any recommendation for one model approach for my problem..? I used ARIMA, but I wanted to use algorithmic model rather than a statistical model, so that’s why I’m trying to build LSTM… Do you think I need to go back to ARIMA..?
    (After building one model, I will use ensemble method to improve current model though. For now, I need a decent model giving me the understandable result.)

    Thank you so much, your any opinion on this will be really appreciated.

    • Jason Brownlee August 30, 2017 at 4:17 pm #

      Thanks Jisun,

      I generally would recommend using an MLP, LSTMs do not seem to perform well on straight autoregression problems.

  13. dan November 23, 2017 at 4:09 pm #

    Hi Jason,

    There is a question above asking you “What kind of Multiple output models would you recommend if we are opting for the fourth strategy”?
    And you answer MLPs

    Then I tried to use an MLP to get a one-shot sequence, but I keep getting an error…

    Below is my code and scenario,

    x_train.shape:
    (4, 5, 29)
    y_train.shape:
    (4, 28)
    I wish to use prior 5 timesteps and 29 features to get the 28 timesteps ahead forecast sequence.
    only 4 training data for illustrative purpose.

    model = Sequential()

    model.add(Dense(units=100, input_shape=(5, 29)))

    model.add(Dense(90, kernel_initializer='normal', activation='relu'))
    model.add(Dense(90, kernel_initializer='normal', activation='relu'))
    model.add(Dense(30, kernel_initializer='normal', activation='relu'))

    model.add(Dense(28, kernel_initializer='normal', activation='relu'))
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

    Error when checking target: expected dense_168 to have 3 dimensions, but got array with shape (4, 28)

    How can I rectify it? Thank you very much

    • Jason Brownlee November 24, 2017 at 9:33 am #

      With an MLP, the prior observations will be features.

      If you have 29 features and 5 time steps, this will in fact be 5 x 29 (145) input features to the MLP.

  14. Amalka January 4, 2018 at 5:25 pm #

    Hi Jason

    I am trying to fit a LSTM model which is a multivariate (input and output) and multi step.

    So I need to predict multiple steps and multiple features in one model.

    Temp : [1,2,3,4],Rain[1,2,3,4] = predict(Temp : [5,6], Rain[5,6])

    What is your recommended architecture to do this in one model ?.

    I have daily selling values for 5 years with 167000(per item per store) features to predict 15 days for 167000 features

  15. Charles Lang February 13, 2018 at 1:33 pm #

    Hi Jason,

    Thank you very much for sharing a great article again. I have read many your posts these days, and learned a lot from them.

    My project is a one-step forecast on time series data. Which model do you think is best for it?

    Charles

    • Jason Brownlee February 14, 2018 at 8:13 am #

      I would recommend testing a suite of models to see what works best for your data.

  16. pezhman February 20, 2018 at 10:15 am #

    Hi Jason,

    Is there any way to reduce the propagated error during multi-step-ahead prediction with a recurrent neural network?

    Thanks

  17. Roshan March 26, 2018 at 6:37 pm #

    Hi Jason,

    Thanks for the great post. Your post uses the Direct Strategy. I would like to apply the Direct-Recursive Hybrid Strategy using an RNN-LSTM on time series data that has trend and seasonality. What I need is a multi-step forecast where the prediction for the prior time step is used as an input for making a prediction on the following time step. How do I go about this? Recursion for multi-step would be highly computationally expensive. What changes do I need in the existing code for multi-step forecasting using the hybrid method?

    Thanks

  18. Kaushal Shetty March 30, 2018 at 8:20 pm #

    Hi Jason,
    I am going with the 4th strategy you mentioned, that is, one model predicting forecasts in one shot.
    I have two models in my mind for this.
    1) Multi-Output Network: The output layer of this architecture has ‘forecast’ number of dense layers. In this model there are ‘forecast’ number of weight matrices, each trained on predicting an individual forecast. Each output dense layer is optimized (adam) for MSE.

    2) Single-Output Network: This architecture has one dense layer as output with ‘forecast’ number of neurons. So in this case there is only one weight matrix trained for all forecasts and one cost function across all forecasts, as opposed to the first approach.

    Are both the architecture valid? Which architecture works best?

    Also one more question, Jason. What is the best way to add regularization to a time series model? Dropout or a kernel regularizer? Would both do?

    Thanks,
    Kaushal

  19. MLT May 17, 2018 at 10:03 pm #

    Hi Jason, it is a great article and very helpful summary. I have one trivial question.

    1. Direct Multi-step Forecast Strategy

    prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model1(obs(t-2), obs(t-3), …, obs(t-n))

    ==>
    do you mean model2, not model1 for t+2?
    prediction(t+2) = model2(obs(t-2), obs(t-3), …, obs(t-n))

    why does it start at t+1, not t?
    prediction(t) = model1(obs(t-1), obs(t-2), …, obs(t-n))

    • Jason Brownlee May 18, 2018 at 6:24 am #

      Yes, I meant model2, thanks – fixed.

      • MLT May 19, 2018 at 12:31 am #

        sorry to disturb you again.

        Why is it t+1, not t? Thanks.

        prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
        =>
        prediction(t) = model1(obs(t-1), obs(t-2), …, obs(t-n))

        • Jason Brownlee May 19, 2018 at 7:43 am #

          It could be and probably should be t. Just a chosen notation of t-1, t+1.

  20. Klaus May 21, 2018 at 2:22 pm #

    Hi jason,
    Thanks a lot for the great post. I am going with the 2nd strategy you mentioned, the recursive multi-step forecast, but I am having difficulty implementing the recursive forecasting part.
    How do I use the prediction for the prior time step as an input for making a prediction on the following time step, for example with SVR or an MLP? It would be an immense help to a novice like me. Thanks…

    • Jason Brownlee May 21, 2018 at 2:31 pm #

      Thanks Klaus.

      You will have to store it in a variable and then create an array that includes the value and use it as input for the next prediction.

  21. Thada June 4, 2018 at 1:57 am #

    Hi Jason,

    Thank you for your great post.

    If I use a Recursive Multi-step Forecast with an ARMA model, will the effect of the MA predictions reduce over the steps of the prediction or not?

    Due to the nature of multi-step forecasting, the error terms of the previously unobserved samples become zero when they are used as inputs for further prediction. However, MA estimates are based on the previous errors, if I do not misunderstand. Thus, if the previous error terms become 0, will only the AR terms affect the prediction results?

    Sorry if I misunderstand the ARMA model. I’m quite new to this topic.

    • Jason Brownlee June 4, 2018 at 6:30 am #

      Recursive prediction will result in compounding error. Why would MA go to zero?

      • Thada June 5, 2018 at 12:14 am #

        Sorry, I didn’t mean the real error but the unobserved residuals. For example, on the following website, the residuals ε106, ε107, which have no observed values, are considered to be zero in the equation ŷ107 = f(ε106, ε107, y106).

        http://www.real-statistics.com/time-series-analysis/arma-processes/forecasting-arma/

        “This time, there are no observed values for ε106, ε107, or y106. As before, we estimate ε106 and ε107 by zero, but we estimate y106 by the forecasted value ŷ106.”

  22. Sam June 13, 2018 at 10:54 am #

    Hello Jason,

    I am trying to build an LSTM model. My training set has 580 timesteps and 12000 features.
    I want to use 10 timesteps to predict the next 5 timesteps. In this case my train_x.shape will be (87,10,12000). However, I am confused about train_y.shape. Should it be (87,10)?

  23. MLT June 27, 2018 at 5:50 am #

    Hi Jason,

    In my understanding, Direct-Recursive Hybrid Strategies can be implemented in below three steps. Could you help me to check if it is correct please? Thanks in advance.

    prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model2(prediction(t+1), obs(t-1), …, obs(t-n))

    1. Use train data to train model1
    2. Predict t+1 for all train data
    3. Use predicted t+1 plus train data to train model2

  24. Davood Raoofsheibani July 29, 2018 at 11:12 pm #

    Dear Jason,

    In Recursive Multi-step Forecast:

    I guess to predict the value at (t+2), the observed value of (t-n) is not needed but the one at (t-n+1).

    therefore:
    instead of:
    prediction(t+1) = model(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model(prediction(t+1), obs(t-1), …, obs(t-n))
    should be :
    prediction(t+1) = model(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model(prediction(t+1), obs(t-1), …, obs(t-n+1))

    Am I right?

  25. Ling July 31, 2018 at 6:09 am #

    Dear Jason ,
    I have some experience in Python and machine learning, and now I am trying to learn to predict a value (regression) based on a timestamp (every 5 mins). Please suggest an appropriate model for this. Also, let me know if your book on forecasting would help me with this.

  26. Kumar Avinash August 21, 2018 at 4:27 am #

    Hi Jason,

    Very informative and a nice one. I have a problem related to an LSTM forecasting model. I am making a model for call volume forecasting. I want to forecast 3 months ahead of today. Let’s say I have built a model and am scoring today (in Aug’18), so the forecasted month should be Dec’18, 3 months ahead of the scoring month. How should I proceed with building the model (training dataset), validation (test dataset), and scoring unseen data (as discussed above)? Do I have to build the model on the training dataset in a similar way, i.e. forecasting 3 months in advance? If yes, how should I proceed?

    Thanks in Advance

    • Jason Brownlee August 21, 2018 at 6:21 am #

      Yes, frame the historical data in the way that you intend to use the model.

      E.g. if you need a model to predict n months ahead, frame all historical data this way and fit a model, then score the fit model.

  27. Ahmed R. Elshami September 29, 2018 at 5:45 am #

    Hello Dr. Jason,

    Thank you for your amazing post.

    Kindly, I am confused about the error calculation for multi-step-ahead prediction.

    Suppose I have my training data D_pred = [1,2,3,4,5,6]
    and my corresponding target data, D_trgt = [7,8,9,10,11,12,13]

    I am using lag = 4 and I want to predict 5 step-ahead

    My D_pred would be like this

    1 2 3 4
    2 3 4 5
    3 4 5 6

    I used my D_pred to get my prediction result D_out in one shot like this

    x7 x8 x9 x10 x11
    x8 x9 x10 x11 x12
    x9 x10 x11 x12 x13

    and my D_trgt would be like this

    7 8 9 10 11
    8 9 10 11 12
    9 10 11 12 13

    Now, how do I calculate the SMAPE error between D_out and D_trgt for horizon 5?

    – get SMAPE error between [x7, x8, x9, x10, x11, x12, x13] and [7, 8, 9, 10, 11, 12, 13]

    or

    – get SMAPE error between [x11, x12, x13] and [11, 12, 13]

    which way is the right way?

    Thank you so much

  28. Mike October 20, 2018 at 12:48 am #

    Hi, master. What are lag timesteps used for? I still can’t understand, doctor.

    • Jason Brownlee October 20, 2018 at 5:55 am #

      Lag time steps are observations at prior times. They are used as inputs to the model.

      • Mike October 20, 2018 at 6:43 pm #

        Can you explain more exactly?

        • Jason Brownlee October 21, 2018 at 6:11 am #

          Sure, which part are you having difficulty with exactly?

  29. mk December 26, 2018 at 12:46 pm #

    Are there any criteria for choosing among the four main strategies?
    For example, how many steps? How much data?

    • Jason Brownlee December 27, 2018 at 5:36 am #

      Project requirements may impose constraints, alternately, choose the approach that results in the best skill.

  30. Kristen January 30, 2019 at 4:31 pm #

    Hi Jason,

    Thanks for your post. I’m implementing the Multiple Output Strategy using a Neural Network approach. I have about 5 years worth of historical data at weekly granularity, and I want to predict 6, 12, 18 and 24 months into the future.

    I’ve read your “How To Backtest” post, but am still not sure about how to do the train/test split in this case, ie. predicting 2 years into the future with only 5 years historical data. Even if I only wanted to forecast one year ahead (at say 3, 6, 9 and 12 months), how would you backtest?

    Thanks.

    • Jason Brownlee January 31, 2019 at 5:27 am #

      You would use walk-forward validation as described in the back test tutorial.
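      In outline, walk-forward validation looks like this (a minimal sketch with a persistence forecast standing in for whatever model is being evaluated):

```python
# expand the training window one step at a time and forecast the next
# observation; a persistence forecast stands in for any real model
data = [10, 20, 30, 40, 50, 60]
n_test = 3
errors = []
for i in range(len(data) - n_test, len(data)):
    train, expected = data[:i], data[i]
    yhat = train[-1]  # persistence: repeat the last known value
    errors.append(abs(expected - yhat))
print(errors)  # [10, 10, 10]
```

      For a multiple-output model, each iteration would forecast a whole vector of horizons instead of a single value, with the error scored per horizon.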

  31. Larry February 25, 2019 at 8:33 pm #

    I do not understand the direct approach; I have only found vague examples explaining it. Let us consider a problem in which I want to do a 1-step prediction.

    history = [1,2,3,4,5,6,7,8,9,10]

    Yt = 10

    problem: predict Yt+1

    My current understanding of how to formulate the training data is to use a sliding window of size 1.

    X_train = [1,2,3,4,5,6,7,8,9]

    Y_train = [2,3,4,5,6,7,8,9,10]

    Then I train the first model using X_train and Y_train

    In a single step prediction scenario, to predict what comes after 10 I would call

    Predict(model1, 10), and the output should be 11 + some error, depending on the model

    Now, for the 2-step recursive method I would call

    Predict(model1,Predict(model1,10)) to get 12 + some error

    The direct method for the 2 step prediction will be

    a = Predict(model1, 10) to get 11 + some error

    b = Predict(model2, K) to get 12 + some error

    predictions = [ ]

    predictions.append(a)

    predictions.append(b)

    Finally, a 2-step prediction is accomplished:

    predictions == [11 + error, 12 + error]

    My questions:

    1) What is K? Which value do I need to send to the predict method of the second model?

    2) What are the values of the X_train and Y_train used to train the second model?

    Thanks.

  32. xuxing February 27, 2019 at 2:51 pm #

    Hi Jason,

    Thanks for your post.
    I have two products, both of which have their own historical data. How do I model this so that both products can predict future sequences from their historical data? Should I construct two models or just one?

  33. bara April 15, 2019 at 4:00 pm #

    Dear sir, I followed all your tutorials and combined them. I have more than a thousand sequences. I chose a vanilla LSTM according to this:

    https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

    I also added a train and test split, 80%/20%. I got RMSE, MAPE, and the graph.

    I want to forecast the next value, and I followed this:

    https://machinelearningmastery.com/multi-step-time-series-forecasting/

    According to https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
    you forecast the next value from the last 3 values.

    So I want to forecast the next 2 and 3 values using the Recursive Multi-step Forecast, and my code is like this:

    #one
    x_input = array([Y_test[-1], Y_test[-2], Y_test[-3]])
    x_input = x_input.reshape((1,1,3))
    yhat = model.predict(x_input, verbose=0)

    #two
    x_input1 = array([yhat,Y_test[-1], Y_test[-2]])
    x_input1 = x_input1.reshape((1,1,3))
    yhat1 = model.predict(x_input1, verbose=0)

    #three
    x_input2 = array([yhat1,yhat, Y_test[-1]])
    x_input2 = x_input2.reshape((1,1,3))
    yhat2 = model.predict(x_input2, verbose=0)

    My question is: is what I am doing right? Do I follow the principles of LSTM & Recursive Multi-step?

    That’s all, thank you sir.
    I await your response.

    • Jason Brownlee April 16, 2019 at 6:44 am #

      Nice work.

      I’m eager to help but I don’t have the capacity to review/debug your code. Sorry.

  34. Yessica Chen April 28, 2019 at 12:16 pm #

    Thank you Jason for sharing it. I want to know which methods are more helpful for time series problems. Did you do any comparison experiments with them?

  35. Joachim May 2, 2019 at 3:17 am #

    Hi Jason! Writing a thesis on this right now and your examples are very much appreciated.

    I have a question about the Direct-Recursive Hybrid, as we have been able to test out all the other methods to a certain degree.

    How would you go about writing the programming logic for this? Especially when using time-series cross validation.

    What exactly do i fit model2 on at time t?

    • Jason Brownlee May 2, 2019 at 8:08 am #

      Perhaps review the example in the tutorial and try to map your data onto it, also perhaps check the paper for an elaboration on the approach.

      You will have to write custom code to prepare data and fit models.

  36. Yeqi Liu May 17, 2019 at 5:14 pm #

    Thank you for your tutorial.
    I have a question: is it possible for the same model to have different dimensions of input variables in the Recursive Multi-step Forecast (i.e., from obs(t-n) to (t-1) with dimension R_n, and from t-n to prediction(t+1) with dimension R_n+1)?
    Also, is it better for the input dimension to be the same in the Direct-Recursive Hybrid Strategies?

    • Jason Brownlee May 18, 2019 at 7:32 am #

      Yes, you could have a multi-input model, like a neural net.

      Perhaps experiment with your dataset/model and see what works well?

  37. Leo June 21, 2019 at 1:34 pm #

    Hi Jason,

    I have a question regarding error propagation in different multi-step forecast models that I posted on StackExchange before reading this post (so the terminology I used is a bit non-standard). I would like to understand the theory behind error propagation. Could you shed some light, please?

    https://datascience.stackexchange.com/questions/54130/error-propagation-in-time-series-forecast-with-many-to-many-multi-steps-rnn-lstm

    Many thanks!

    • Jason Brownlee June 21, 2019 at 2:03 pm #

      Perhaps you can summarize the gist of your question?

  38. Lopa July 3, 2019 at 8:53 pm #

    Hi Jason,

    I have been able to implement the recursive strategy. However, for the Direct-Recursive Hybrid Strategy, if I understood correctly, we train on & predict the entire training data & append those predictions to the initial train data & retrain the model.

    Having said that, if my initial train data has 100 observations & I predict all 100, then I append these predictions to my initial train data, making 200 observations? Is my understanding correct, or am I missing something?

    • Jason Brownlee July 4, 2019 at 7:45 am #

      Not quite, the predictions become inputs for subsequent predictions.
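        The feedback Jason describes, each prediction joining the inputs for the next, can be sketched like this (the one-line model is a hypothetical stand-in, not a real fitted model):

```python
# a fitted one-step model is called repeatedly; each prediction is
# appended to the input window for the next call
def one_step_model(window):
    # stand-in for a real fitted model: mean of the window plus one
    return sum(window) / len(window) + 1.0

window = [4.0, 5.0, 6.0]
forecasts = []
for _ in range(3):  # forecast t+1, t+2, t+3
    yhat = one_step_model(window[-3:])
    forecasts.append(yhat)
    window.append(yhat)  # the prediction becomes an input
print(forecasts)
```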

      • Lopa July 5, 2019 at 9:59 pm #

        Sorry to ask again Jason but can you please explain because I tried finding it in your books but couldn’t find & also tried understanding it based on the example & the example of the household electricity but could not really grasp it entirely.

        • Jason Brownlee July 6, 2019 at 8:36 am #

          Sure, what are you having trouble understanding exactly?

  39. Lopa July 10, 2019 at 12:05 am #

    Thanks for your reply Jason.

    Suppose I have 100 observations in total. After training & validating my model, I train it on the entire data (all 100 observations) & predict one step ahead, i.e. the 101st observation.

    Do I use this predicted value to replace the 1st observation in my original data so I still have 100 observations, then predict the next step & repeat the process?

    Thanks again.

    • Jason Brownlee July 10, 2019 at 8:14 am #

      It really depends on how your model is defined.

      You have defined your model to map some number of inputs to an output, you must provide data in that format to make a prediction.

  40. Chao August 6, 2019 at 8:57 am #

    Hi Jason, thanks for making this great tutorial.

    I am not sure I fully understand the difference between recursive multi-step and direct-recursive hybrid. These two look exactly the same in your example code.


    prediction(t+1) = model1(obs(t-1), obs(t-2), ..., obs(t-n))
    prediction(t+2) = model2(prediction(t+1), obs(t-1), ..., obs(t-n))

    If I understood correctly, the main difference is that in the hybrid each model may or may not use the models at prior time steps, while in recursive multi-step each model will use the prior model?

    Thanks!

    • Chao August 6, 2019 at 9:00 am #

      If the above description is correct, how do you decide when to use it or not? I assume that requires training?

      • Jason Brownlee August 6, 2019 at 2:01 pm #

        Test a few different approaches and see what works best for your choice of model and the dataset.

    • Jason Brownlee August 6, 2019 at 2:01 pm #

      Recursive uses predictions as inputs.

      Direct recursive hybrid uses the same idea, but separate models for each time step to be forecasted.

      Does that help?
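      A minimal sketch of that contrast (the stand-in models are hypothetical; only the wiring matters): recursive reuses one model, while the hybrid trains a separate model per step and still feeds earlier predictions forward:

```python
# separate stand-in "models" per forecast step; the t+2 model also
# receives the t+1 model's prediction as an input
def model1(obs):          # forecasts t+1 from observations only
    return obs[-1] + 1

def model2(obs, pred1):   # forecasts t+2; sees obs AND pred1
    return pred1 + 1

obs = [1, 2, 3]
pred1 = model1(obs)
pred2 = model2(obs, pred1)
print(pred1, pred2)  # 4 5
```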

  41. Matteo P. August 6, 2019 at 7:47 pm #

    Hi Jason, your articles are great and they helped me a lot!
    I’m working on predictive maintenance and given a long time series of data, each of them with 15 features, I should predict the next X time steps.
    Basically I thought of using 400 time steps as input and predicting 20 steps as output. As a result I’m using your 4.Multiple Output Forecast Strategy.

    My Net is this:

    n_steps_in = 400
    n_steps_out = 20
    n_epochs = 20
    batch_size = 128
    model = Sequential()
    model.add(LSTM(100,
                   activation='relu',
                   input_shape=(n_steps_in, n_features),
                   return_sequences=True))
    model.add(LSTM(20))
    model.add(Dense(n_steps_out * n_features))
    model.add(Reshape((n_steps_out, n_features)))
    model.compile(loss='mse',
                  optimizer=Adam(lr=0.001),
                  metrics=['accuracy'])
    model.fit(X,
              y,
              epochs=n_epochs,
              validation_split=0.25,
              batch_size=batch_size)

    But sometimes I get NaN in the loss while training, and I don’t know why. Do you have any explanation?
    Thanks

    • Jason Brownlee August 7, 2019 at 7:51 am #

      Nice work.

      Perhaps vanishing or exploding gradients.

      Try scaling the data prior to fitting?
      Try relu?
      Try gradient clipping?
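      The scaling and clipping suggestions can be illustrated in plain NumPy (a sketch of what data scaling and an optimizer's clipnorm-style argument do, not the Keras calls themselves):

```python
import numpy as np

# min-max scale a series to [0, 1] before fitting
series = np.array([10.0, 20.0, 15.0, 30.0])
lo, hi = series.min(), series.max()
scaled = (series - lo) / (hi - lo)
print(scaled)  # [0.   0.5  0.25 1.  ]

# gradient-norm clipping: rescale a gradient whose L2 norm exceeds a
# threshold, roughly what an optimizer clipnorm argument does
def clip_by_norm(grad, max_norm=1.0):
    norm = np.linalg.norm(grad)
    return grad if norm <= max_norm else grad * (max_norm / norm)

g = clip_by_norm(np.array([3.0, 4.0]))  # norm 5 -> rescaled to norm 1
print(np.linalg.norm(g))  # 1.0
```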

    • Amit Krishna Baral August 21, 2019 at 12:40 pm #

      Did you think about applying the concept of Survival model here?

  42. Sneha Mitta September 19, 2019 at 6:46 am #

    Hi Jason,

    I wanted to know if you have any posts that have implemented each of the strategies you discussed above?

    I also wanted to know if one strategy is better than the other by any chance? I’d like to get a deeper insight on what kind of strategy would work for a particular kind of data.

  43. Sam October 18, 2019 at 1:06 pm #

    Hi Jason, first of all, thanks for all your nice posts. People have asked this question before, but I was wondering if there might be an update! Do you have example code for “Direct-Recursive Hybrid Strategies” similar to what you have for “Time Series Prediction With Deep Learning in Keras”? Or does your E-Book have it?
    Thanks again, Sam

  44. SOUALHi November 25, 2019 at 6:57 am #

    Hi Mr. Jason,

    I’m working on forecasting time series, and I use an LSTM as the model to forecast. These are the main steps I used to structure my data in order to predict one step:

    1) The model takes 1 day of data as “training X”
    2) The model takes the VALUE of 1 day + 18 hours later as “training Y”
    3) I build a sliding window where the sequences are shifted by one value, for example:

    XTrain{1} = data(1:24) –> YTrain{1} = data(42)
    XTrain{2} = data(2:25) –> YTrain{2} = data(43)
    XTrain{3} = data(3:26) –> YTrain{3} = data(44)
    XTrain{4} = data(4:27) –> YTrain{4} = data(45)
    .
    .
    .

    4) The test data are constructed the same way as the training data, for example:

    XTest{1} = data_test(1:24) –> YTest{1} = data_test(42)
    XTest{2} = data_test(2:25) –> YTest{2} = data_test(43)
    .
    .
    .

    First, to summarize, my objective is to predict 18h ahead each time. Is the structure cited above correct?

    If yes, I have a problem: when I try to predict XTest{1}, the obtained predicted value corresponds to data_test(25) instead of data_test(42). Why is the predicted value shifted? Where is the problem, and how do I remedy it?

    Thank you in advance for your help.

    • Jason Brownlee November 25, 2019 at 2:04 pm #

      There are many ways to frame your problem, your approach is one possible way.

      Perhaps the model is not skillful?

      Perhaps try alternate architectures or training configurations?
      Perhaps try alternate framings of the problem?
      Perhaps try alternate models?

      • SOUALHi² November 26, 2019 at 1:01 am #

        Thank you very much for your reactivity. I have tried several architectures of the model, and the LSTM gives the best results; it is suitable for learning many sequences with different lengths.

        For my part, I use this architecture to train the model. However, could you please inform me of or show me other architectures for preparing the training data that could help me solve this problem of shifted results? I use 3 test sets, and when I try to predict, the results are shifted by the number of difference steps used for the target prediction, i.e.:

        model_1: data(1:24) –>data(30) the difference is 6 points. Therefore the predicted curve is shifted by 6 steps earlier

        model_2: data(1:24) –>data(36) the difference is 12 points. Therefore the predicted curve is shifted by 12 steps earlier

        I don’t think this is a logical response.

        Best regards

  45. Yass T November 27, 2019 at 9:45 pm #

    Hello, could there be a typo in the explanation of Direct Multi-step Forecast Strategy ?

    I was expecting the second row in the code to be :
    prediction(t+2) = model2(obs(t-2), obs(t-4), obs(t-6), obs(t-8), obs(t-n)) – so basically no (t-3), and two-time-step increments?
    Otherwise, I still do not understand that correctly 🙂
    Thank you

    • Jason Brownlee November 28, 2019 at 6:38 am #

      No, I believe it is correct.

      There are 2 models and both models only use available historic data.

      model1 predicts t+1 and model2 predicts t+2.

      Does that help?
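      To make the distinction concrete, the direct strategy can be sketched like this (the stand-in models are purely illustrative): both models see only the observed history, and neither consumes the other's output:

```python
# both stand-in models are fed the same observed history; neither
# consumes the other's prediction (unlike the recursive strategies)
def model1(obs):  # trained to forecast horizon t+1
    return obs[-1] + 1

def model2(obs):  # trained separately to forecast horizon t+2
    return obs[-1] + 2

obs = [7, 8, 9, 10]
print(model1(obs), model2(obs))  # 11 12
```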
