4 Strategies for Multi-Step Time Series Forecasting

Last Updated on August 21, 2019

Time series forecasting is typically discussed in settings where only a one-step prediction is required.

What about when you need to predict multiple time steps into the future?

Predicting multiple time steps into the future is called multi-step time series forecasting. There are four main strategies that you can use for multi-step forecasting.

In this post, you will discover the four main strategies for multi-step time series forecasting.

After reading this post, you will know:

  • The difference between one-step and multiple-step time series forecasts.
  • The traditional direct and recursive strategies for multi-step forecasting.
  • The newer direct-recursive hybrid and multiple output strategies for multi-step forecasting.

Let’s get started.

  • Update May/2018: Fixed typo in direct strategy example.
Strategies for Multi-Step Time Series Forecasting
Photo by debs-eye, some rights reserved.

Multi-Step Forecasting

Generally, time series forecasting describes predicting the observation at the next time step.

This is called a one-step forecast, as only one time step is to be predicted.

There are some time series problems where multiple time steps must be predicted. In contrast to the one-step forecast, these are called multiple-step or multi-step time series forecasting problems.

For example, given the observed temperature over the last 7 days, a single-step forecast would require a forecast at time step 8 only.

A multi-step forecast may require a forecast for the next two days, that is, for time steps 8 and 9.

There are at least four commonly used strategies for making multi-step forecasts.

They are:

  1. Direct Multi-step Forecast Strategy.
  2. Recursive Multi-step Forecast Strategy.
  3. Direct-Recursive Hybrid Multi-step Forecast Strategies.
  4. Multiple Output Forecast Strategy.

Let’s take a closer look at each method in turn.

1. Direct Multi-step Forecast Strategy

The direct method involves developing a separate model for each forecast time step.

In the case of predicting the temperature for the next two days, we would develop a model for predicting the temperature on day 1 and a separate model for predicting the temperature on day 2.

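For example:

prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
prediction(t+2) = model2(obs(t-2), obs(t-3), …, obs(t-n))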

Having one model for each time step is an added computational and maintenance burden, especially as the number of time steps to be forecasted increases beyond the trivial.

Because separate models are used, there is no opportunity to model the dependencies between the predictions, such as the prediction on day 2 depending on the prediction on day 1, as is often the case in time series.
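As a rough illustration of the direct strategy, the sketch below fits one small model per forecast step. Everything in it is an assumption made for the example: the helper names `fit_line`, `fit_direct` and `forecast_direct` are hypothetical, each model is a least-squares line on the single most recent observation rather than on n lags, and the series is a toy trend.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_direct(series, horizon):
    """Fit one model per forecast step: model h maps an observation to the value h steps later."""
    models = []
    for h in range(1, horizon + 1):
        inputs = series[:-h]   # observation at time t
        targets = series[h:]   # observation at time t + h
        models.append(fit_line(inputs, targets))
    return models


def forecast_direct(models, last_obs):
    """Each model forecasts its own step from the same last observation."""
    return [intercept + slope * last_obs for intercept, slope in models]


# Toy series with a steady upward trend (assumed data, for illustration only).
series = [float(v) for v in range(1, 11)]
models = fit_direct(series, horizon=2)
forecast = forecast_direct(models, series[-1])
print(forecast)
```

Note that the two models never see each other's output, which is exactly the limitation described above.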

2. Recursive Multi-step Forecast Strategy

The recursive strategy involves using a one-step model multiple times where the prediction for the prior time step is used as an input for making a prediction on the following time step.

In the case of predicting the temperature for the next two days, we would develop a one-step forecasting model. This model would then be used to predict day 1, then this prediction would be used as an observation input in order to predict day 2.

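For example:

prediction(t+1) = model(obs(t-1), obs(t-2), …, obs(t-n))
prediction(t+2) = model(prediction(t+1), obs(t-1), …, obs(t-n))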

Because predictions are used in place of observations, the recursive strategy allows prediction errors to accumulate such that performance can quickly degrade as the prediction time horizon increases.
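The recursion itself is only a few lines of code. The sketch below is a minimal illustration, not a recommended model: `recursive_forecast` and `drift_model` are hypothetical names, the one-step "model" is a hand-written rule (last value plus the average recent change) standing in for any fitted one-step model, and the temperatures are made up.

```python
def recursive_forecast(one_step_model, history, n_lags, horizon):
    """Apply a one-step model repeatedly, feeding each prediction back in as an input."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        yhat = one_step_model(window)
        forecasts.append(yhat)
        # Slide the window: drop the oldest value and append the prediction.
        window = window[1:] + [yhat]
    return forecasts


def drift_model(window):
    """Stand-in one-step model: last value plus the average change across the window."""
    changes = [b - a for a, b in zip(window, window[1:])]
    return window[-1] + sum(changes) / len(changes)


# Made-up daily temperatures, for illustration only.
temps = [56.0, 50.0, 59.0, 63.0, 52.0, 60.0, 55.0]
forecast = recursive_forecast(drift_model, temps, n_lags=3, horizon=2)
print(forecast)
```

The second forecast is computed from a window that already contains the first forecast, so any error in the first step is carried into the second.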

3. Direct-Recursive Hybrid Strategies

The direct and recursive strategies can be combined to offer the benefits of both methods.

For example, a separate model can be constructed for each time step to be predicted, but each model may use the predictions made by models at prior time steps as input values.

We can see how this might work for predicting the temperature for the next two days, where two models are used, but the output from the first model is used as an input for the second model.

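For example:

prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
prediction(t+2) = model2(prediction(t+1), obs(t-1), …, obs(t-n))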

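Combining the recursive and direct strategies can help to overcome the limitations of each: every forecast step still gets its own model, as in the direct strategy, while later models can condition on earlier predictions, as in the recursive strategy.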
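One way the hybrid could be wired together is sketched below, under several assumptions. The helpers `solve`, `fit_linear` and `predict` are hypothetical names for a tiny least-squares routine written out by hand so the example has no dependencies. Both models are linear, and the toy series follows x[t] = x[t-1] + x[t-2] so that the result can be checked by hand. Model 2 is trained on model 1's in-sample predictions.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(rows, targets):
    """Least squares with an intercept via the normal equations (adequate for a toy example)."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Toy series (assumed): each value is the sum of the previous two.
series = [1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]

# Model 1: two consecutive observations -> the next value.
rows1 = [series[i:i + 2] for i in range(len(series) - 2)]
model1 = fit_linear(rows1, series[2:])

# Model 1's in-sample predictions become an input feature for model 2.
preds1 = [predict(model1, r) for r in rows1]

# Model 2: (latest observation, model 1's prediction) -> the value two steps ahead.
rows2 = [[rows1[i][1], preds1[i]] for i in range(len(series) - 3)]
model2 = fit_linear(rows2, series[3:])

last = series[-2:]
step1 = predict(model1, last)
step2 = predict(model2, [last[1], step1])
print(step1, step2)
```

On real, noisy data model 2 would be trained on imperfect predictions from model 1, which is the point of the hybrid: it learns how much to trust them.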

4. Multiple Output Strategy

The multiple output strategy involves developing one model that is capable of predicting the entire forecast sequence in a one-shot manner.

In the case of predicting the temperature for the next two days, we would develop one model and use it to predict the next two days as one operation.

For example:

prediction(t+1), prediction(t+2) = model(obs(t-1), obs(t-2), …, obs(t-n))

Multiple output models are more complex as they can learn the dependence structure between inputs and outputs as well as between outputs.

Being more complex may mean that they are slower to train and require more data to avoid overfitting the problem.
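To show the shape of a multiple output model without pulling in a neural network library, the sketch below swaps in a 1-nearest-neighbour forecaster: it stores every input window together with the whole sequence that followed it, and returns the stored sequence for the most similar window. This is a stand-in technique chosen for brevity, not the method the post has in mind; `fit_multi_output` and `forecast_multi_output` are hypothetical names and the series is made up.

```python
def fit_multi_output(series, n_lags, horizon):
    """Store every (input window, following output sequence) pair found in the series."""
    examples = []
    for i in range(len(series) - n_lags - horizon + 1):
        window = series[i:i + n_lags]
        following = series[i + n_lags:i + n_lags + horizon]
        examples.append((window, following))
    return examples


def forecast_multi_output(examples, window):
    """Return the entire output sequence of the most similar stored window in one shot."""
    def distance(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], window))
    return min(examples, key=distance)[1]


# Made-up repeating series, for illustration only.
series = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3]
examples = fit_multi_output(series, n_lags=3, horizon=2)
forecast = forecast_multi_output(examples, series[-3:])
print(forecast)
```

Both forecast values come out of a single call, so they are consistent with each other by construction.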

In this post, you discovered strategies that you can use to make multiple-step time series forecasts.

Specifically, you learned:

  • How to train multiple parallel models in the direct strategy or reuse a one-step model in the recursive strategy.
  • How to combine the best parts of the direct and recursive strategies in the hybrid strategy.
  • How to predict the entire forecast sequence in a one-shot manner using the multiple output strategy.

Do you have any questions about multi-step time series forecasts, or about this post? Ask your questions in the comments below and I will do my best to answer.

195 Responses to 4 Strategies for Multi-Step Time Series Forecasting

  1. anthony March 8, 2017 at 10:05 pm #

    Thanks Jason for a wonderful post. Why does your model skip the value at “t”?

    • Jason Brownlee March 9, 2017 at 9:54 am #

      Just a choice of terminology, think of t+1 as t.

      I could have made it clearer, thanks for the note.

      • Pavlo Fesenko September 11, 2019 at 12:40 am #

        Hi Jason,

        Using t+1 instead of t is super confusing. 🙁 I don’t know about other users but it made it very difficult for me to grasp the idea with this terminology. If possible, please reconsider changing it to the traditional way in the future. Thanks a lot!

      • Katie November 17, 2019 at 1:15 pm #

        Hi Jason,

        Which strategy would you recommend for recursive models like ARIMA? I originally thought recursive but now I’m wondering if the hybrid would make more sense. I have the same question for moving averages and exponential smoothing models. I was using the strictly recursive approach and repeating the entire training process for several models on several folds. This was really computationally expensive, though, and I don’t know if it was really necessary. Not sure if this matters, but models from a few different families (arima, ets, ..) were pre-tuned/configured on the smallest possible subset of the training set (I used hyperopt) and then walk forward validation was applied for each of the candidate models. As mentioned, multistep forecasts were estimated using a strictly recursive approach with the RMSE being calculated over all time steps in the horizon (t+1,t+2,..) for each iteration of walk forward validation. To reiterate, models were pre-tuned so the same exact models were applied to predict each value in the horizon for a given iteration, but it was recursive since each new time step in the horizon in a given iteration was predicted using a training set with previous predicted time steps appended.

        • Katie November 17, 2019 at 1:22 pm #

          Oh, I forgot to provide some important details for context:
          1. I’m working with small samples
          2. The frequency is monthly
          3. The data is volatile
          4. The context is inventory optimization (specifically, we’re predicting quantity of products issued by warehouses)
          5. Forecasting is done at the SKU level and separate forecasts need to be made for each product and for each warehouse at my company
          6. Some SKUs are sparse and most are extremely volatile
          7. (slightly) Negative quantities do occur (indicating returns or adjustments) but are rare
          8. Solution is being developed in R and Python in Azure ML

          • Shiv March 4, 2020 at 6:59 am #

            Hi Katie, Wondering if you were able to build and test any models. Please share your findings if you are able to.

        • Jason Brownlee November 18, 2019 at 6:42 am #

          I would recommend testing a suite of methods on your dataset and use the approach that results in the lowest error.

    • Roise September 5, 2018 at 3:42 pm #

      Good question. Thanks.

  2. Dylan March 14, 2017 at 2:31 am #

    Hi Jason, it is always helpful to read your post. I have some confusion related to Time Series Forecasting.
    There is traffic data (1440 pieces in total, and 288 pieces each day) I collected to predict traffic flow. The data is collected every 5 min over five consecutive working days. I am going to use the traffic data of the first four days to train the prediction model, while the traffic data of the fifth day is used to test the model.
    Here is my question, if I want to predict the traffic flow of the fifth day, do I only need to treat my prediction as one-step forecast or do I have to predict 288-step?
    Look forward to your advice.
    Thanks for your post again.

    • Jason Brownlee March 14, 2017 at 8:25 am #

      Hi Dylan,

      If you want to predict an entire day in advance (288 observations), this sounds like a multi-step forecast.

      You could use a recursive one-step strategy or something like a neural net to predict the entire sequence in a one-shot manner.

      Predicting so many steps in advance is hard work (a hard problem) and results may be poor. You will do better if you can use data as it comes in to continually refine your forecast.

      Does that help?

      • Dylan March 15, 2017 at 1:54 am #

        Yes, your response is very helpful. Thank you very much. Now I realize my prediction is a multi-step forecast.
        Could you recommend me some more detailed materials related to the multi-step forecast, like the recursive one-step strategy or the neural net?
        Now I am reading your post, it is great.
        Thank you for your advice.
        Best regards

        • Jason Brownlee March 15, 2017 at 8:10 am #

          I am working on this type of material at the moment, it should be on the blog in coming weeks/months.

          You can use an ARIMA recursively by taking forecasts as inputs to make the next prediction.

          You can use a neural network to make a multi-step forecast by setting the output sequence length as the number of neurons in the output layer.

          I hope that helps as a start.

      • Patricia July 11, 2017 at 8:18 am #

        Hi Jason,
        Thank you for great posts! they’re awesome!

        I have the same problem as Dylan and decided to use statsmodels’ SARIMAX. It takes some time to do the prediction for the entire next day (288 steps), and I have been wondering if I’m doing this wrong or if I should use a different approach.
        Currently, I’m looking into LSTM RNN as a possible approach, but I’m not sure.
        The thing is, with my data, I have to predict the entire 288 steps in one shot and detect an anomaly if there is any, then predict the type of anomaly that occurred….

        My question is, am I going in the right direction by looking into LSTM RNN?

        I’m really looking forward into reading your posts on this topic!

        Thanks Jason 🙂

  3. Kunpeng Zhang March 17, 2017 at 12:45 am #

    Yes, I will. Discussing with you is always helpful. Look forward to reading your new post on Time Series Forecast.

  4. Abhishek March 18, 2017 at 4:18 pm #

    Hi Jason, just another brilliant post. Can you show a working example for the first or second method, like you have always shown in other tutorials? It would be an immense help to a novice like me. Thanks…

    • Jason Brownlee March 19, 2017 at 6:09 am #

      I do hope to have many examples on the blog in the coming weeks.

  5. mary March 31, 2017 at 4:57 am #

    Thank you Jason for your wonderful articles ! you are a life saver!
    But I think you made a mistake in the examples for numbers 2 and 3. Both have the same value:
    prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model2(prediction(t-1), obs(t-2), …, obs(t-n))

    I believe that one of them should be
    prediction(t+2) = model2(prediction(t+1), obs(t-1), …, obs(t-n-1))

    • Jason Brownlee March 31, 2017 at 5:59 am #

      Hmmm, I guess you’re right. I was thinking from the frame of the second prediction rather than the frame of both predictions.


    • Jim January 17, 2021 at 11:12 pm #

      Thanks Mary. So in effect what should both step be?

      many thanks

    • Jen April 28, 2022 at 1:00 am #

      No, you are not aware of the difference between an expanding and a rolling window recursive forecast model.

  6. Fatima Abu Salem April 7, 2017 at 3:29 pm #

    Hello Jason,

    What kind of Multiple output models would you recommend if we are opting for the fourth strategy?

    • Jason Brownlee April 9, 2017 at 2:52 pm #

      Neural networks, such as MLPs.

      • Vipul September 21, 2017 at 7:38 pm #

        Sir, how about LSTM?

        • Jason Brownlee September 22, 2017 at 5:38 am #

          Sure, try them, but contrast results to an MLP.

      • Taha February 9, 2020 at 9:39 am #

        Sir, does an MLP perform better than an RNN for multi-output or multivariate forecasting?

        • Jason Brownlee February 9, 2020 at 1:04 pm #

          It really depends on the specific dataset. I recommend testing a range of algorithms and discover what works best on your data.

  7. Masum May 13, 2017 at 8:20 pm #


    Would you please write a blog post in which all these strategies are applied to an example dataset and their results are compared?

    Would you please?

    • Jason Brownlee May 14, 2017 at 7:26 am #

      Perhaps in the future, thanks for the suggestion.

      • masum May 22, 2017 at 10:12 pm #

        would you be kind enough to post soon?

        I am stuck with only theoretical knowledge and need to apply it to my data to see the results and compare them.

      • Akhil March 28, 2020 at 7:40 am #

        Thanks for the great explanations.
        I have one little question.
        Is the accuracy the same for all 4 of these strategies?
        If not, which one gives the best accuracy?

        • Jason Brownlee March 29, 2020 at 5:47 am #

          No, you must test a suite of models and strategies and discover what works best for your specific dataset.

  8. Hans June 12, 2017 at 12:40 am #

    What is a decent one-step prediction of unseen data? What would it look like?

    Let’s say I have 100 rows in a data set and do the following in R:

    I write ‘=’ instead of the arrows because of the forum parser:

    1. I split the 100 rows of raw data in 99 training rows and 1 testing row:

    inTrain=createDataPartition(y=dataset$n12,p=1,list = FALSE)


    2. I train the model:

    modFit=train(n12~., data=training, method = 'xxx')

    3. I get the final model of Caret


    4. I predict one step with the final model of the training and the one row of testing.

    unseenPredict=predict(finMod, newx)

    Now, do I have a decent prediction of one step unseen data in point 4 ???

    And why are there libraries like forecast for R, if everything can be coded as a one-step forecast by default?


    • Jason Brownlee June 12, 2017 at 7:09 am #

      Sorry, I don’t have examples of time series forecasting in R, I cannot offer good advice.

  9. Hans June 12, 2017 at 6:18 pm #

    I know there is also the option to use the time series object(s) in R.
    But could you answer my question in general?

  10. Hans June 14, 2017 at 1:57 am #

    I don’t understand the difference between regression forecast and time series forecast.
    Or what are the benefits from each over the other.

    • Jason Brownlee June 14, 2017 at 8:47 am #

      A time series forecast can predict a real value (regression) or a class value (classification).

  11. Leonildo June 16, 2017 at 10:35 am #

    Hello Jason,

    How do I prepare the dataset to train models with the Direct Multi-step Forecast Strategy?

    For the series: 1,2,3,4,5,6,7,8,9

    Model 1 will forecast t+1 using window of size 3 , then the dataset would be:

    Model 2 will forecast t+2 using window of size 3 , then the dataset would be:

    Model 3 will forecast t+3 using window of size 3 , then the dataset would be:

    and so on. Is it right ? Thanks

    • Jason Brownlee June 17, 2017 at 7:18 am #

      Great question.

      For the direct approach, the input will be the available lag vars and the output will be a vector of the prediction.

      I can see that you want to use different models for each step in the prediction.

      You could structure it as follows:

      Try many approaches and see what works best on your problem.

      I hope that helps.
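The datasets asked about in this thread can be sketched as follows. This is one plausible reading of the question rather than the structure originally posted: `direct_dataset` is a hypothetical helper, and it assumes that model k is trained to map a window of 3 consecutive values to the single value k steps after the end of that window.

```python
def direct_dataset(series, window, step):
    """Build (inputs, target) pairs for the model that forecasts `step` steps ahead."""
    pairs = []
    for i in range(len(series) - window - step + 1):
        inputs = series[i:i + window]
        target = series[i + window + step - 1]
        pairs.append((inputs, target))
    return pairs


series = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for step in (1, 2, 3):
    print(f"model {step}:")
    for inputs, target in direct_dataset(series, window=3, step=step):
        print("   ", inputs, "->", target)
```

Each further step ahead loses one training row, because the target has to lie inside the observed series.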

      • Leonildo June 17, 2017 at 10:17 am #

        thank you so much! This answers my question.

      • Davood Raoofsheibani July 29, 2018 at 11:05 pm #

        Dear Jason,

        In the Direct Multi-step Forecast Strategy, why haven’t you used obs(t-1) as well for model 2?

        That is, instead of:
        prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
        prediction(t+2) = model2(obs(t-2), obs(t-3), …, obs(t-n))

        why not:
        prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
        prediction(t+2) = model2(obs(t-1), obs(t-2), …, obs(t-n))

        The latter seems more compatible with the example you provided in this comment.


        • Jason Brownlee July 30, 2018 at 5:49 am #

          t-1 for model2 would be the predicted value of model1. It could use them too if you wish.

          • Shellder September 6, 2019 at 2:24 pm #

            Hi, Jason,

            Can you explain more about why t-1 for model2 would be the predicted value of model1?
            I think there is no information leakage since the model 1 is “older” than the model 2. The model 2 can use all the data used by the model 1.

            What is your concern?

          • Jason Brownlee September 7, 2019 at 5:16 am #

            Sorry, I don’t follow, what’s the context exactly?

          • Pavlo Fesenko September 11, 2019 at 12:37 am #

            Hi Shellder,

            I think that’s indeed a mistake in Jason’s formulation of the direct multi-step approach. The component obs(t-1) should also be included in the training set for the model2. 😉 It made me very confused the first time I read it but then after checking other sources I realized that there is no reason not to include obs(t-1) in the model2.

            Jason, could you please have a look at it and correct it for the future blog visitors? =) Thanks!

  12. Jisun August 30, 2017 at 7:00 am #

    Hello Jason, thank you for your wonderful post. (Actually I already bought your book.)
    I have a question.. I am building a forecasting model with my timeseries dataset, which is a daily number of some cases, I have 3 years past data (so will be records of 3*365 days.) I’d like to forecast 2 months future data (60 days.)

    I already built a multi-step LSTM model for this, however, it doesn’t seem to work well… For example, 3 years past data clearly has a pattern like Nov/Dec high peak seasonality and increasing trend, but 60 steps of LSTM gave me poor forecasting like decreasing trend with no seasonality… and even the base is decreasing too.. which is so not understandable.
    My question is:

    1. Do you think my parameter tuning could be wrong? I mean, LSTM multi-step forecasting cannot be this poor..?

    2. Is there any recommendation for one model approach for my problem..? I used ARIMA, but I wanted to use algorithmic model rather than a statistical model, so that’s why I’m trying to build LSTM… Do you think I need to go back to ARIMA..?
    (After building one model, I will use ensemble method to improve current model though. For now, I need a decent model giving me the understandable result.)

    Thank you so much, any opinion of yours on this will be really appreciated.

    • Jason Brownlee August 30, 2017 at 4:17 pm #

      Thanks Jisun,

      I would generally recommend using an MLP; LSTMs do not seem to perform well on straight autoregression problems.

  13. dan November 23, 2017 at 4:09 pm #

    Hi Jason,

    There is a question above asking you “What kind of Multiple output models would you recommend if we are opting for the fourth strategy”?
    And you answered MLPs.

    Then I tried to use an MLP to get a one-shot sequence, but I keep getting an error…

    Below is my code and scenario,

    (4, 5, 29)
    (4, 28)
    I wish to use prior 5 timesteps and 29 features to get the 28 timesteps ahead forecast sequence.
    only 4 training data for illustrative purpose.

    model = Sequential()

    model.add(Dense(units = 100, input_shape = (5, 29)))

    model.add(Dense(90, kernel_initializer='normal', activation='relu'))
    model.add(Dense(90, kernel_initializer='normal', activation='relu'))
    model.add(Dense(30, kernel_initializer='normal', activation='relu'))

    model.add(Dense( 28,kernel_initializer='normal', activation='relu'))
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

    Error when checking target: expected dense_168 to have 3 dimensions, but got array with shape (4, 28)

    How can I rectify it? Thank you very much

    • Jason Brownlee November 24, 2017 at 9:33 am #

      With an MLP, the prior observations will be features.

      If you have 29 features and 5 time steps, this will in fact be 5 x 29 (145) input features to the MLP.
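The flattening described in this reply can be sketched in plain Python. The names `X` and `X_flat` and the zero-filled placeholder data are assumptions for illustration; with a NumPy array the same result would come from `X.reshape(4, -1)`.

```python
samples, timesteps, features = 4, 5, 29

# Placeholder input with shape (4, 5, 29); real data would go here.
X = [[[0.0] * features for _ in range(timesteps)] for _ in range(samples)]

# Flatten each sample's 5 x 29 grid into one row of 145 input features.
X_flat = [[value for step in sample for value in step] for sample in X]

print(len(X_flat), len(X_flat[0]))
```

With the input flattened this way, the first layer of the MLP would expect 145 inputs per sample rather than a (5, 29) grid, and the (4, 28) target would line up with a 28-unit output layer.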

  14. Amalka January 4, 2018 at 5:25 pm #

    Hi Jason

    I am trying to fit a LSTM model which is a multivariate (input and output) and multi step.

    So I need to predict multiple steps and multiple features in one model.

    Temp : [1,2,3,4],Rain[1,2,3,4] = predict(Temp : [5,6], Rain[5,6])

    What is your recommended architecture to do this in one model ?.

    I have daily selling values for 5 years with 167000(per item per store) features to predict 15 days for 167000 features

  15. Charles Lang February 13, 2018 at 1:33 pm #

    Hi Jason,

    Thank you very much for sharing a great article again. I have read many your posts these days, and learned a lot from them.

    My project is a one-step forecast on time series data. Which model do you think is best for it?


    • Jason Brownlee February 14, 2018 at 8:13 am #

      I would recommend testing a suite of models to see what works best for your data.

  16. pezhman February 20, 2018 at 10:15 am #

    Hi Jason,

    Is there any way to reduce the propagated error during multi-step-ahead prediction with a recurrent neural network?


  17. Roshan March 26, 2018 at 6:37 pm #

    Hi Jason,

    Thanks for the great post. Your post uses Direct Strategy. I would like to apply Direct-Recursive Hybrid Strategy using RNN-LSTM on a time series data that has trend and seasonality. What I need is a multistep forecast where the prediction for the prior time step is used as an input for making a prediction on the following time step. How to go about this ? Because recursion for multi-step would be highly computationally expensive. What changes do I need in the existing code for multi-step using hybrid method?


  18. Kaushal Shetty March 30, 2018 at 8:20 pm #

    Hi Jason,
    I am going with the 4th strategy you mentioned that is one model predicting forecasts in one shot.
    I have two models in my mind for this.
    1) Multi-Output Network: The output layer of this architecture has ‘forecast’ number of dense layers. In this model there are ‘forecast’ number of weight matrices, each trained on predicting an individual forecast? Each output dense layer is optimized (adam) for MSE.

    2) Single-Output Network: This architecture has one dense layer as output with ‘forecast’ number of neurons. So in this case there is only one weight matrix trained for all forecasts and one cost function across all forecasts, as opposed to the first approach.

    Are both architectures valid? Which architecture works best?

    Also, one more question Jason. What is the best way to add regularization to a time series model: dropout or a kernel regularizer? Would both do?


  19. MLT May 17, 2018 at 10:03 pm #

    Hi Jason, it is a great article and very helpful summary. I have one trivial question.

    1. Direct Multi-step Forecast Strategy

    prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model1(obs(t-2), obs(t-3), …, obs(t-n))

    do you mean model2, not model1 for t+2?
    prediction(t+2) = model2(obs(t-2), obs(t-3), …, obs(t-n))

    why it starts t+1 not t
    prediction(t) = model1(obs(t-1), obs(t-2), …, obs(t-n))

    • Jason Brownlee May 18, 2018 at 6:24 am #

      Yes, I meant model2, thanks – fixed.

      • MLT May 19, 2018 at 12:31 am #

        sorry to disturb you again.

        Why it is t+1, not t? Thanks.

        prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
        prediction(t) = model1(obs(t-1), obs(t-2), …, obs(t-n))

        • Jason Brownlee May 19, 2018 at 7:43 am #

          It could be and probably should be t. Just a chosen notation of t-1, t+1.

  20. Klaus May 21, 2018 at 2:22 pm #

    Hi jason,
    Thanks a lot for the great post. I am going with the 2nd strategy you mentioned, that is, the recursive multi-step forecast, but I am having difficulty implementing the recursive forecasting part.
    How do I feed the prediction for the prior time step in as an input for making a prediction on the following time step, for example with SVR or MLP? It would be an immense help to a novice like me. Thanks…

    • Jason Brownlee May 21, 2018 at 2:31 pm #

      Thanks Klaus.

      You will have to store it in a variable and then create an array that includes the value and use it as input for the next prediction.

  21. Thada June 4, 2018 at 1:57 am #

    Hi Jason,

    Thank you for your great post.

    If I use the Recursive Multi-step Forecast with an ARMA model, will the effect of the MA terms shrink over the steps of prediction or not?

    Due to the nature of the multi-step forecast, the error terms of the previous unobserved samples become zero when they are used as inputs for further predictions. However, MA estimates are based on the previous errors, if I do not misunderstand. Thus, if the previous error terms become 0, will only the AR terms affect the prediction results?

    Sorry if I misunderstand the ARMA model. I’m quite new to this topic.

    • Jason Brownlee June 4, 2018 at 6:30 am #

      Recursive prediction will result in compounding error. Why would MA go to zero?

      • Thada June 5, 2018 at 12:14 am #

        Sorry, I didn’t mean the real error but the unobserved residuals. For example in the following website, no observed values for residuals ε106, ε107 are considered as zero in the equation ŷ107 = f(ε106, ε107, y106).


        “This time, there are no observed values for ε106, ε107, or y106. As before, we estimate ε106 and ε107 by zero, but we estimate y106 by the forecasted value ŷ106.”

  22. Sam June 13, 2018 at 10:54 am #

    Hello Jason,

    I am trying to build an LSTM model. My training set has 580 timesteps and 12000 features.
    I wanted to use 10 timesteps to predict next 5 timesteps. In this case my train_x.shape will be (87,10,12000). However I am confused about train_y.shape. Should it be (87,10)?

  23. MLT June 27, 2018 at 5:50 am #

    Hi Jason,

    In my understanding, Direct-Recursive Hybrid Strategies can be implemented in below three steps. Could you help me to check if it is correct please? Thanks in advance.

    prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model2(prediction(t+1), obs(t-1), …, obs(t-n))

    1. Use train data to train model1
    2. Predict t+1 for all train data
    3. Use predicted t+1 plus train data to train model2

  24. Davood Raoofsheibani July 29, 2018 at 11:12 pm #

    Dear Jason,

    In Recursive Multi-step Forecast:

    I guess to predict the value at (t+2), the observed value of (t-n) is not needed but the one at (t-n+1).

    instead of:
    prediction(t+1) = model(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model(prediction(t+1), obs(t-1), …, obs(t-n))
    should be :
    prediction(t+1) = model(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model(prediction(t+1), obs(t-1), …, obs(t-n-1))

    Am I right?

  25. Ling July 31, 2018 at 6:09 am #

    Dear Jason ,
    I have some experience in python and machine learning and now I am trying to learn predicting a value ( Regression ) based on Timestamp ( Every 5 mins ). So please suggest me the appropriate model for this. Also, let me know if your book on forecasting helps me in this.

  26. Kumar Avinash August 21, 2018 at 4:27 am #


    Very informative and nice one. I have one problem related to an LSTM forecasting model. I am making a model for call volume forecasting. I want to forecast 3 months ahead of today. Let’s say I have built a model and am scoring today (in Aug’18), so the forecasted month should be Dec’18, 3 months ahead of the scoring month. How should I proceed while building the model (training dataset), validating it (test dataset) and scoring unseen data (as discussed above)? Do I have to build the model on the training dataset in a similar way, like forecasting 3 months in advance? If yes, how do I proceed?

    Thanks in Advance

    • Jason Brownlee August 21, 2018 at 6:21 am #

      Yes, frame the historical data in the way that you intend to use the model.

      E.g. if you need a model to predict n months ahead, frame all historical data this way and fit a model, then score the fit model.

  27. Ahmed R. Elshami September 29, 2018 at 5:45 am #

    Hello Dr. Jason,

    Thank you for your amazing post.

    I am confused about how to calculate the error for multi-step-ahead prediction.

    Suppose I have my training data D_pred = [1,2,3,4,5,6]
    and my corresponding target data, D_trgt = [7,8,9,10,11,12,13]

    I am using lag = 4 and I want to predict 5 step-ahead

    My D_pred would be like this

    1 2 3 4
    2 3 4 5
    3 4 5 6

    I used my D_pred to get my prediction result D_out in one shot like this

    x7 x8 x9 x10 x11
    x8 x9 x10 x11 x12
    x9 x10 x11 x12 x13

    and my D_trgt would be like this

    7 8 9 10 11
    8 9 10 11 12
    9 10 11 12 13

    now, how to calculate the SMAPE error between D_out and D_trgt for horizon 5?

    – get SMAPE error between [x7, x8, x9, x10, x11, x12, x13] and [7, 8, 9, 10, 11, 12, 13]


    – get SMAPE error between [x11, x12, x13] and [11, 12, 13]

    which way is the right way?

    Thank you so much
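The thread does not settle which of the two options is right, so the sketch below only shows one common convention: scoring each lead time separately by computing the error down each column of the forecast matrix. SMAPE has several variants; this assumes the one bounded between 0 and 200 percent. The helper names and the values in `d_out` are invented for illustration.

```python
def smape(actual, predicted):
    """Symmetric MAPE in percent (variant bounded between 0 and 200).

    Assumes no pair has actual and predicted both equal to zero.
    """
    terms = [2.0 * abs(p - a) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)


def smape_per_step(targets, outputs):
    """One SMAPE value per lead time, computed down each column."""
    return [smape(t_col, o_col) for t_col, o_col in zip(zip(*targets), zip(*outputs))]


# Target matrix from the comment: each row is one 5-step-ahead target sequence.
d_trgt = [[7, 8, 9, 10, 11], [8, 9, 10, 11, 12], [9, 10, 11, 12, 13]]
# Hypothetical model outputs, invented for illustration.
d_out = [[7.0, 8.5, 9.0, 10.0, 12.0],
         [8.0, 9.5, 10.0, 11.0, 13.0],
         [9.0, 10.5, 11.0, 12.0, 14.0]]

errors = smape_per_step(d_trgt, d_out)
print(errors)
```

Reporting one number per lead time makes it visible how skill changes across the horizon. Averaging the five values gives a single overall score if one is needed.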

  28. Mike October 20, 2018 at 12:48 am #

    Hi, master. What are the lag timesteps used for? I still can’t understand, my doctor.

    • Jason Brownlee October 20, 2018 at 5:55 am #

      Lag time steps are observations at prior times. They are used as inputs to the model.

      • Mike October 20, 2018 at 6:43 pm #

        Can you explain more exactly?

        • Jason Brownlee October 21, 2018 at 6:11 am #

          Sure, which part are you having difficulty with exactly?

  29. mk December 26, 2018 at 12:46 pm #

    Is there any criteria for choosing the four main strategies?
    For example, how many steps? How much data?

    • Jason Brownlee December 27, 2018 at 5:36 am #

      Project requirements may impose constraints. Alternatively, choose the approach that results in the best skill.

  30. Kristen January 30, 2019 at 4:31 pm #

    Hi Jason,

    Thanks for your post. I’m implementing the Multiple Output Strategy using a Neural Network approach. I have about 5 years worth of historical data at weekly granularity, and I want to predict 6, 12, 18 and 24 months into the future.

    I’ve read your “How To Backtest” post, but am still not sure about how to do the train/test split in this case, ie. predicting 2 years into the future with only 5 years historical data. Even if I only wanted to forecast one year ahead (at say 3, 6, 9 and 12 months), how would you backtest?


    • Jason Brownlee January 31, 2019 at 5:27 am #

      You would use walk-forward validation as described in the back test tutorial.
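A rough sketch of walk-forward validation for a multi-step horizon, assuming an expanding training window. The function names are illustrative, and the persistence forecast (repeat the last known value) is only a stand-in for a real model:

```python
def walk_forward_splits(n_obs, n_train_min, horizon):
    """Yield (train_end, test_indices): train on [0, train_end),
    then test on the next `horizon` observations."""
    for train_end in range(n_train_min, n_obs - horizon + 1):
        yield train_end, list(range(train_end, train_end + horizon))


def walk_forward_mae(series, n_train_min, horizon):
    """Mean absolute error over all walk-forward folds, using a
    persistence forecast as a placeholder for a fitted model."""
    abs_errors = []
    for train_end, test_idx in walk_forward_splits(len(series), n_train_min, horizon):
        history = series[:train_end]
        forecast = [history[-1]] * horizon   # a real model would be refit on `history` here
        abs_errors.extend(abs(series[i] - f) for i, f in zip(test_idx, forecast))
    return sum(abs_errors) / len(abs_errors)


series = list(range(1, 11))   # toy data: 1, 2, ..., 10
splits = list(walk_forward_splits(len(series), n_train_min=6, horizon=2))
mae = walk_forward_mae(series, n_train_min=6, horizon=2)
```

With short histories and long horizons the number of folds is small, so the error estimate will be noisy.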

  31. Larry February 25, 2019 at 8:33 pm #

    I do not understand the direct approach. I have only found vague examples to explain. Let us consider a problem in which I want to do a 1 step prediction.

    history = [1,2,3,4,5,6,7,8,9,10]

    Yt = 10

    problem: predict Yt+1

    My current understanding of how to formulate the training data is to use a sliding window of size 1.

    X_train = [1,2,3,4,5,6,7,8,9]

    Y_train = [2,3,4,5,6,7,8,9,10]

    Then I train the first model using X_train and Y_train

    In a single step prediction scenario, to predict what comes after 10 I would call

    Predict(model1, 10), and the output should be 11 + some error, depending on the model

    Now, for the 2-step recursive method I would call

    Predict(model1,Predict(model1,10)) to get 12 + some error

    The direct method for the 2 step prediction will be

    a = Predict(model1, 10) to get 11 + some error

    b = Predict(model2, K) to get 12 + some error

    predictions = [ ]



    Finally, a 2-step prediction is accomplished:

    predictions == [11 + error, 12 + error]

    My questions:

    1)What is K? Which value do I need to send to the predict method for the first model

    2) What are the values of the X_train and Y_train used to train the second model.


  32. xuxing February 27, 2019 at 2:51 pm #

    Hi Jason,

    Thanks for your post.
    I have two products, both of which have their historical data, and how do I model so that both products can predict future sequences from their historical data.Construct two models or just one model?

  33. bara April 15, 2019 at 4:00 pm #

    Dear sir, I followed all your tutorials and combined them. I have more than a thousand sequences. I chose a vanilla LSTM according to this


    I also added a train/test split of 80%/20%. I got the RMSE, the MAPE, and the graph.

    I want to forecast the next value, and I followed this.


    According to https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
    you forecast the next value from the last 3 values.

    So I want to forecast the next 2 and 3 values using the Recursive Multi-step Forecast, and my code looks like this

    x_input = array([Y_test[-1], Y_test[-2], Y_test[-3]])
    x_input = x_input.reshape((1,1,3))
    yhat = model.predict(x_input, verbose=0)

    x_input1 = array([yhat,Y_test[-1], Y_test[-2]])
    x_input1 = x_input1.reshape((1,1,3))
    yhat1 = model.predict(x_input1, verbose=0)

    x_input2 = array([yhat1,yhat, Y_test[-1]])
    x_input2 = x_input2.reshape((1,1,3))
    yhat2 = model.predict(x_input2, verbose=0)

    My question is: is what I am doing right? Am I following the principles of LSTM and Recursive Multi-step forecasting?

    That is all, thank you sir.
    I await your response.

    • Jason Brownlee April 16, 2019 at 6:44 am #

      Nice work.

      I’m eager to help but I don’t have the capacity to review/debug your code. Sorry.

  34. Yessica Chen April 28, 2019 at 12:16 pm #

    Thank you Jason for sharing it. I want to know which methods are more helpful for time series problems. Did you run any experiments comparing them?

  35. Joachim May 2, 2019 at 3:17 am #

    Hi Jason! Writing a thesis on this right now and your examples are very much appreciated.

    I have a question about the Direct-Recursive Hybrid, as we have been able to test out all the other methods to a certain degree.

    How would you go about writing the programming logic for this? Especially when using time-series cross validation.

    What exactly do i fit model2 on at time t?

    • Jason Brownlee May 2, 2019 at 8:08 am #

      Perhaps review the example in the tutorial and try to map your data onto it, also perhaps check the paper for an elaboration on the approach.

      You will have to write custom code to prepare data and fit models.

  36. Yeqi Liu May 17, 2019 at 5:14 pm #

    Thank you for your tutorial.
    I have a question: is it possible for the same model to have different dimensions of input variables in Recursive Multi-step Forecast (i.e., from obs(t-n) to (t-1) with R_n dimension, and from t-n to prediction(t+1) with R_n+1 dimension)?
    Also, is it better whether the input dimension should be the same in Direct-Recursive Hybrid Strategies?

    • Jason Brownlee May 18, 2019 at 7:32 am #

      Yes, you could have a multi-input model, like a neural net.

      Perhaps experiment with your dataset/model and see what works well?

  37. Leo June 21, 2019 at 1:34 pm #

    Hi Jason,

    I have a question regarding error propagation in different multi-step forecast models that I posted on StackExchange before reading this post (so the terminology I used is a bit non-standard). I would like to understand the theory behind error propagation. Could you shed some light, please?


    Many thanks!

    • Jason Brownlee June 21, 2019 at 2:03 pm #

      Perhaps you can summarize the gist of your question?

  38. Lopa July 3, 2019 at 8:53 pm #

    Hi Jason,

    I have been able to implement the recursive strategy. However for Direct-Recursive Hybrid Strategy if I understood correctly we train & predict the entire training data & append those predictions to the initial train data & retrain the model.

    Having said that if my initial train data has 100 observations & I predict all 100 then I append these predictions to my initial train data making 200 observations ? Is my understanding correct or am I missing out something?

    • Jason Brownlee July 4, 2019 at 7:45 am #

      Not quite, the predictions become inputs for subsequent predictions.

      • Lopa July 5, 2019 at 9:59 pm #

        Sorry to ask again Jason but can you please explain because I tried finding it in your books but couldn’t find & also tried understanding it based on the example & the example of the household electricity but could not really grasp it entirely.

        • Jason Brownlee July 6, 2019 at 8:36 am #

          Sure, what are you having trouble understanding exactly?

  39. Lopa July 10, 2019 at 12:05 am #

    Thanks for your reply Jason.

    Suppose I have 100 observations in total after training & validating my model I train my model on the entire data (all 100 observations) & predict one step ahead i.e; 101st observation.

    Do I use this predicted value to replace the 1st observation in my original data to have 100 observations & predict the next step & repeat the process ?

    Thanks again.

    • Jason Brownlee July 10, 2019 at 8:14 am #

      It really depends on how your model is defined.

      You have defined your model to map some number of inputs to an output, you must provide data in that format to make a prediction.

  40. Chao August 6, 2019 at 8:57 am #

    Hi Jason, thanks for making this great tutorial.

    I am not sure I fully understand the difference between recursive multi-step and direct-recursive hybrid. These two look exactly the same in your example code.

    prediction(t+1) = model1(obs(t-1), obs(t-2), ..., obs(t-n))
    prediction(t+2) = model2(prediction(t+1), obs(t-1), ..., obs(t-n))

    If I understood correctly, the main difference is that in the hybrid each model may or may not use the models from prior time steps, whereas in recursive multi-step each model always uses the prior model?


    • Chao August 6, 2019 at 9:00 am #

      If the above description is correct, how do you decide when to use them or not? I assume that has to be learned during training?

      • Jason Brownlee August 6, 2019 at 2:01 pm #

        Test a few different approaches and see what works best for your choice of model and the dataset.

    • Jason Brownlee August 6, 2019 at 2:01 pm #

      Recursive uses predictions as inputs.

      Direct recursive hybrid uses the same idea, but separate models for each time step to be forecasted.

      Does that help?
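The distinction can be sketched in a few lines, assuming each "model" is simply a callable that maps a list of inputs to one predicted value. The function names and the `drift` stand-in model are hypothetical and are used only to make the sketch runnable:

```python
def recursive_forecast(model, window, n_steps):
    """One model, reused at every step: each prediction is fed back
    into a fixed-length input window."""
    window = list(window)
    preds = []
    for _ in range(n_steps):
        yhat = model(window)
        preds.append(yhat)
        window = window[1:] + [yhat]   # drop the oldest value, append the prediction
    return preds


def hybrid_forecast(models, window):
    """One model per step: model k receives the observations plus the
    k predictions made by the earlier models."""
    inputs = list(window)
    preds = []
    for model in models:
        yhat = model(inputs)
        preds.append(yhat)
        inputs = inputs + [yhat]       # the input grows; the next model must be fit for this length
    return preds


def drift(window):
    """Hypothetical stand-in model: extend the average slope of the inputs."""
    return window[-1] + (window[-1] - window[0]) / (len(window) - 1)


obs = [1, 2, 3]
recursive_preds = recursive_forecast(drift, obs, n_steps=2)
hybrid_preds = hybrid_forecast([drift, drift], obs)
```

In practice the hybrid would fit a separate model for each step on training rows that include the earlier predictions, which is the custom data preparation mentioned in the replies above.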

  41. Matteo P. August 6, 2019 at 7:47 pm #

    Hi Jason, your articles are great and they helped me a lot!
    I’m working on predictive maintenance and given a long time series of data, each of them with 15 features, I should predict the next X time steps.
    Basically I thought of using 400 time steps as input and predicting 20 steps as output. As a result I’m using your 4.Multiple Output Forecast Strategy.

    My Net is this:
    n_steps_in = 400
    n_steps_out = 20
    n_epochs = 20
    batch_size = 128
    model = Sequential()
    input_shape=(n_steps_in, n_features),
    model.add(Dense(n_steps_out * n_features))
    model.add(Reshape((n_steps_out, n_features)))

    But sometimes I get NaN in the loss while training, and I don’t know why. Do you have any explanation?

    • Jason Brownlee August 7, 2019 at 7:51 am #

      Nice work.

      Perhaps vanishing or exploding gradients.

      Try scaling the data prior to fitting?
      Try relu?
      Try gradient clipping?

    • Amit Krishna Baral August 21, 2019 at 12:40 pm #

      Did you think about applying the concept of Survival model here?

  42. Sneha Mitta September 19, 2019 at 6:46 am #

    Hi Jason,

    I wanted to know if you have any posts that have implemented each of the strategies you discussed above?

    I also wanted to know if one strategy is better than the other by any chance? I’d like to get a deeper insight on what kind of strategy would work for a particular kind of data.

  43. Sam October 18, 2019 at 1:06 pm #

    Hi Jason, first of all, thanks for all your nice posts. People have asked this question before, but I was wondering if there might be an update! Do you have example code for “Direct-Recursive Hybrid Strategies” similar to what you have for “Time Series Prediction With Deep Learning in Keras”? Or does your e-book have it?
    Thanks again, Sam

  44. SOUALHi November 25, 2019 at 6:57 am #

    Hi Mr. Jason,

    I’m working on time series forecasting and I use an LSTM as the forecasting model. These are the main steps I used to structure my data in order to predict one step:

    1) The model takes 1 day of data as “training X”
    2) The model takes the value at 1 day + 18 hours as “training Y”
    3) I build a sliding window so that the sequences are shifted by one value, for example:

    XTrain{1} = data(1:24) –> YTrain{1} = data(42)
    XTrain{2} = data(2:25) –> YTrain{2} = data(43)
    XTrain{3} = data(3:26) –> YTrain{3} = data(44)
    XTrain{4} = data(4:27) –> YTrain{4} = data(45)

    4) The test data are constructed in the same way as the training data, for example:

    XTest{1} = data_test(1:24) –> YTest{1} = data_test(42)
    XTest{2} = data_test(2:25) –> YTest{2} = data_test(43)

    First, to summarize, my objective is to predict 18 hours ahead each time. Is the structure described above correct?

    If yes, I have a problem: when I try to predict from XTest{1}, the predicted value corresponds to data_test(25) instead of data_test(42). Why is the predicted value shifted? Where is the problem, and how can I remedy it?

    Thank you in advance for your help.

    • Jason Brownlee November 25, 2019 at 2:04 pm #

      There are many ways to frame your problem, your approach is one possible way.

      Perhaps the model is not skillful?

      Perhaps try alternate architectures or training configurations?
      Perhaps try alternate framings of the problem?
      Perhaps try alternate models?

      • SOUALHi² November 26, 2019 at 1:01 am #

        Thank you very much for your responsiveness. I have tried several model architectures and LSTM gives the best results; it is suitable for learning many sequences with different lengths.

        For my part, I use this architecture to train the model. However, could you please show me other ways to prepare the training data that could help me solve this problem of shifted results? I use 3 test sets, and when I try to predict, the results are shifted by the number of steps between the input window and the prediction target, i.e.

        model_1: data(1:24) –>data(30) the difference is 6 points. Therefore the predicted curve is shifted by 6 steps earlier

        model_2: data(1:24) –>data(36) the difference is 12 points. Therefore the predicted curve is shifted by 12 steps earlier

        I don’t think that this is a logical result?

        Best regards

  45. Yass T November 27, 2019 at 9:45 pm #

    Hello, could there be a typo in the explanation of Direct Multi-step Forecast Strategy ?

    I was expecting the second row in the code to be :
    prediction(t+2) = model2(obs(t-2), obs(t-4), obs(t-6), obs(t-8), obs(t-n)) – so basically no (t-3), and two time steps increments ?
    Otherwise, I still do not understand that correctly 🙂
    Thank you

    • Jason Brownlee November 28, 2019 at 6:38 am #

      No, I believe it is correct.

      There are 2 models and both models only use available historic data.

      model1 predicts t+1 and model2 predicts t+2.

      Does that help?
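As a small illustration of the direct strategy, the sketch below fits one model per horizon, each using only observed history as input. It assumes a single lag and a simple least-squares line purely to stay short; `fit_line`, `fit_direct_models`, and `direct_forecast` are illustrative names, not code from the post:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_direct_models(series, horizons):
    """Fit a separate model for each horizon h, pairing obs(t) with obs(t+h)."""
    models = {}
    for h in horizons:
        inputs = series[:-h]    # observations that have a value h steps later
        targets = series[h:]    # the values h steps later
        models[h] = fit_line(inputs, targets)
    return models


def direct_forecast(models, last_obs):
    """Every horizon is predicted from the same last observation;
    no prediction is fed into another model."""
    return {h: slope * last_obs + intercept for h, (slope, intercept) in models.items()}


history = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
models = fit_direct_models(history, horizons=[1, 2])
forecast = direct_forecast(models, last_obs=history[-1])
```

Both models see the same input at prediction time; what differs is the target each one was trained on.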

  46. Morteza March 4, 2020 at 3:58 am #

    Hi Jason,

    Thanks for your efforts to put up all these helpful articles.

    I have a question that what is the difference between these two cases when we want to have multi-step time series forecast:

    1. There is one neuron in output layer (T+1). After extracting weights, we iteratively use weights and (T+1) forecast to get (T+2) forecast and so on.

    2. We have multiple neurons for each horizon

    My question is mostly related to LSTM, however, a general reply is also appreciated. I need a mathematical explanation if there is any.

    Thanks a lot,

    • Jason Brownlee March 4, 2020 at 6:00 am #

      Yes, in the first case you are reusing the same model recursively; in the second you are using a single direct model.

      Yes, you can describe each approach using math or pseudocode.

      Perhaps I don’t understand the problem you’re having?

  47. Ben March 11, 2020 at 5:18 am #

    Hi Jason, can CNN LSTM do multi-step?

  48. Xin April 13, 2020 at 12:02 am #

    hello, very nice doc but there are several tiny typos:

    in almost all the code, you wrote:

    prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model2(obs(t-2), obs(t-3), …, obs(t-n))

    the prediction(t+2) is wrong, which should be

    prediction(t+2) = model2(obs(t-2), obs(t-3), …, obs(t-n+1))

  49. Samrat May 30, 2020 at 11:17 pm #

    Thanks for the awesome article.
    Do you have any post on multivariate multi-step time series forecasting ??


  50. Dr S Balasubramanian June 30, 2020 at 9:53 pm #

    good very nice concept description

  51. Jose Q July 11, 2020 at 8:18 am #

    Hi Jason,
    Great post! You are always clear in your concepts
    Let me tell you this concern.
    I have seen notebooks trying to analyze the spread of COVID-19 using regression analysis. They treat X as the consecutive number of days since the first case [1,2,3,..,n-1,n], and y as the number of daily cases since the first case [1,3,7,…, 3821,4213].
    Then they predict future values for X=[n+1, n+2, …n+20] trying to forecast daily cases for the next 20 days.
    I think this is not correct, because these models do not consider the effect of time, and also, they are doing extrapolation in a regression analysis model.
    I think I saw one of your posts saying that supervised learning can interpolate, but time series can extrapolate.
    Is this correct so far?

    Now, if you arrange time series data as a supervised learning problem with input sequence = [t-lags, …, t-1] and forecast sequence = [t+1, …,t+steps], now you can use supervised learning algorithms or MLP to make predictions, right?
    We can now evaluate performance using backtesting.
    Is this comparable to ARIMA/SES forecasting?

    Thank you

  52. tuttoaposto July 20, 2020 at 7:46 am #

    If I understand this Autoreg model_fit.predict example correctly, it is an example of multi-step forecast strategy: https://machinelearningmastery.com/autoregression-models-time-series-forecasting-python/

    But I got nan’s following this method. I guess that happened because the model has missing lag values as the input. Referring to this example code, it doesn’t explicitly update the historic lags for subsequent forecasts. I wonder if .predict() already handles that and something in my code caused the nan’s, or the example code is missing the recursively updating steps?

    • tuttoaposto July 20, 2020 at 9:32 am #

      I figured why: use series.values instead of series solved the problem.

    • Jason Brownlee July 20, 2020 at 1:51 pm #

      Sorry to hear that you’re having trouble.

      You can make a multistep prediction directly by first fitting the model on all available data and calling predict() and specifying the interval to forecast, or calling forecast() and specifying the number of steps to predict.

      This will help:

  53. saranraj k July 27, 2020 at 12:01 am #

    Hi Jason, thanks for the great article,
    For the direct multi-step forecast, we build separate model for each forecast time step,
    Let’s say for two days, model m1 for day 1 and model m2 for day2, and you have given the
    following example
    prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model2(obs(t-2), obs(t-3), …, obs(t-n))

    In the above example, for predicting day t+1 we used data from the previous days t-1 to t-n.
    For predicting the second day, t+2, we used t-2 to t-n.

    If so,

    It’s seems like for forecasting tomorrow’s data use the historical data from last day to
    last day -n.

    for forecasting day after tomorrow’s use the data from two days before to day -n, how it make

    • Jason Brownlee July 27, 2020 at 5:47 am #

      Yes, the other model can use historical data and output of another model to make a prediction.

      Perhaps I don’t understand your question?

  54. Zahra October 9, 2020 at 12:59 pm #

    Hello and thanks for the tutorial,
    I wonder for the second and third approaches for forecasting you mentioned above, which you said:
    prediction(t+1) = model(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model(prediction(t+1), obs(t-1), …, obs(t-n))
    if the prediction(t+1) uses some kind of supervised time series approach where the model has actually seen part of the data at (t+1), is that going to create information leakage into the second model?

    • Jason Brownlee October 9, 2020 at 1:49 pm #

      Not leakage as it does not violate the design. It is by design.

  55. Priya October 10, 2020 at 3:53 pm #

    First of all thanks for the tutorial.
    For the direct multi-step forecast, you have given the
    following example
    prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model2(obs(t-2), obs(t-3), …, obs(t-n))

    My questions are
    1. For prediction(t+2) why you are not taking obs(t-1) as input like for prediction(t+1).

    2. I am using the direct multi-step forecast for my project, and with machine learning it is expected that the error should increase as the forecast horizon (time step) increases. Am I right? If yes, then in my project the error increases continuously up to 7 time steps, but after that it fluctuates. Can you suggest how I can improve this error?

    • Jason Brownlee October 11, 2020 at 6:44 am #

      You’re welcome.

      We do not use the observation at t-1 because it is not available in that framing of the problem.

      Yes, the further into the future you want to predict, the more error.

      You can reduce the error with different/better models or by simplifying the problem.

  56. Sandy October 18, 2020 at 4:42 am #

    Hi Jason,

    Loving your work!
    I recently bought the math bundle and considering the forecasting books.

    I was wondering If you cover a full example of a Direct-Recursive Hybrid model? And in which e-book?


    A fan

  57. Adrien November 1, 2020 at 7:59 am #

    Hi Jason,

    great article, there is very little content on the internet on prediction strategies for using the model once trained.

    I have defined a fairly simple recursive prediction method.

    My model uses a 160 hour rollback window to predict the next hour on 10 features outputs.

    Model inputs: 10 features
    Model outputs: 10 features
    Timesteps: 160h
    Output timesteps: 1h

    My recursive prediction method therefore consists of taking the last 160 hour window of my data to predict the next hour and re-integrating the prediction into the last window with an offset of 1 to then predict the second hour. Etc …

    I understand that a recursive prediction method increases the error over time but I still get very bad results while on the evaluation of the model under test the results are very good (green series and red)

    Here is an image to illustrate my point:


    The recursive prediction in blue seems smooth and totally wrong compared to the reality in orange.

    Do you have any idea where to dig? I went through all your articles on this subject.

    Thanks to you Jason.

    • Jason Brownlee November 1, 2020 at 8:25 am #

      Well done!

      Perhaps you can vary the amount of input data?
      Perhaps you can try alternate models? alternate configs? alternate data preparations?
      Perhaps you can try direct prediction methods?
      Perhaps you can try algorithms that support input sequences, like LSTMs?
      Perhaps you can benchmark with a naive forecasting method (persistence) to see if any methods have skill?
      Perhaps you can use a linear methods like SARIMA or ETS?

      • Adrien November 2, 2020 at 3:40 am #

        Hello Jason, thanks for your advice!

        I’m actually already using a simple LSTM model with 3 layers .

        I get very good results in MAE, MAPE and R2.

        This is confirmed when I plot the test series with the predicted series.

        The problem is with future predictions now, with my recursive method I got bad results, I tried to increase and decrease the rollback window but it doesn’t change much. The same for the model parameters (number of layers or neurons), the evaluation is good but the prediction on future data remains bad.

        However, I managed to improve the prediction by adding an additional Dense layer with Relu activation in addition to the output Dense layer:

        model = Sequential()
        model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
        model.add(Dropout(0.4))
        model.add(LSTM(units=50, return_sequences=True))
        model.add(Dropout(0.4))
        model.add(LSTM(units=50))
        model.add(Dropout(0.4))

        model.add(Dense(units=50, activation='relu'))  # New Dense layer !!!

        model.add(Dense(units=X_train.shape[2], activation='linear'))
        model.compile(optimizer='adam', loss='mean_squared_error')

        On the other hand, I can’t quite understand why this is better with a second Dense layer in Relu activation.

        Do you have an idea ?

        • Jason Brownlee November 2, 2020 at 6:42 am #

          Nice work!

          We can’t answer “why” questions in deep learning (e.g. why does this config work and another one not work), the best we can do is run experiments and discover what works well for a given dataset.

          • Adrien November 3, 2020 at 2:22 am #

            Hi Jason,

            I tried with the direct forecast and the results are much better!

            I was using the wrong window to make my prediction that’s why.

            Thank you for your advice !

          • Jason Brownlee November 3, 2020 at 6:55 am #

            Well done!

  58. Bhaskar Tripathi January 11, 2021 at 5:30 am #

    Hi Jason,
    Is this approach applicable only to linear models, or can it be used with non-linear models like SVR, RandomForestRegressor, etc. as well?

  59. shafi April 20, 2021 at 10:34 am #

    Hi Jason,

    I have a basic question. I have data from 1969Q1 to 2020Q4, split into train (1969Q1-2018Q4) and test (2019Q1-2020Q4). Is the forecast on the test data one-step (single-step) or multi-step? It is static; no rolling window is used.

    • shafi April 20, 2021 at 9:02 pm #

      I think it is a multi-step where we apply the one-step strategy as you mentioned in one of your comments. Actually, somebody said to me, it is one step ahead but it is not one step ahead rather multiple steps using a one-step strategy.

    • Jason Brownlee April 21, 2021 at 5:50 am #

      You can choose how to frame your problem and what you want to use to evaluate the model.

  60. N.G May 11, 2021 at 8:32 pm #

    Hi Jason,
    thanks for the wonderful posts you have published.
    I’m new at machine learning, just completed some courses. I have a question: Is there any function that can automatically calculate different strategies for a given model?

  61. Aka May 21, 2021 at 3:05 am #

    Hi Jason! Thanks a lot for all this admirable endeavour of yours – we deeply appreciate it!

    Regarding the recursive Multi-step Forecast approach, would it be applicable in the case of datasets with multiple ‘predictor’ variables as well?
    Say for example, one is interested in forecasting the temperature for the next two days but based both on current temperature and humidity:

    temp(t+1) = model(temp(t-1), hum(t-1), temp(t-2), hum(t-2))
    temp(t+2) = model(temp(t+1), hum(t+1), temp(t-1), hum(t-1))

    Which would then be the humidity value to be used in the 2nd prediction step i.e. hum(t+1)? A humidity predicted value would not be available as the model of interest only predicts temperature.

    I seem to be missing something…

    Thanks in advance!

    • Jason Brownlee May 21, 2021 at 6:03 am #

      Sure if you want.

      Perhaps try it and see what works well or best for your data and model.

      • Jeff July 8, 2021 at 7:40 am #

        Hi Jason, thanks for putting this together.

        As a follow-up to this, what would you advise when you have a lot of features and want to try out a recursive style model? At some point it would become impractical to build predictive models for each feature to use in the t+1 and beyond steps.

        Thank you!!

        • Jason Brownlee July 9, 2021 at 5:01 am #

          I try to not be prescriptive – intuitions are often less effective.

          Perhaps you can prototype a few approaches and discover what seems like a good fit for your specific dataset.

  62. Konstantinos July 10, 2021 at 1:57 am #

    Hi jason

    congratulations for your work!!

    i have a confusion about Recursive Multi-step Forecast.

    Considering that we have 10 days and we use the first 6 for training our model to predict the temperature for day 7 and 8 with Recursive Multi-step Forecast.
    To predict day 9 and 10 a new model is trained with the real temperatures of days 1-8?

    Thank you!!

    • Jason Brownlee July 10, 2021 at 6:12 am #

      You can define the input and output any way that you want based on the data that is available.

  63. Miranda August 25, 2021 at 4:44 am #

    Hi Jason,

    Thanks for the tutorial! I have a problem of predicting the temperature flow in a specific process. I have multiple measurements from sensors and controllers. The measurements are recorded every 5 minutes. The system has to be stopped every now and then for a specific maintenance. The goal is to predict when the system needs to be shut down for this specific maintenance by predicting the value of the temperature flow two weeks ahead. Is this considered to be a multi-step or one-step time series prediction?
    Another question: I want to use the given sensor measurements for predicting the Temperature flow using a machine learning approach. Is there any specific approach that you would suggest for such a problem?

    Thank you!

    • Adrian Tam August 25, 2021 at 6:18 am #

      It can be one-step to predict whether the system will be down for maintenance. It can also be multi-step to predict whether the system will be down for maintenance in the next N steps.

      Hope this helps explain that.

  64. Haitao September 10, 2021 at 11:09 am #

    Hi Jason,

    I am wondering if there is a model that takes a varying number of time steps as one sample that can be labelled ‘1’ or ‘0’. Suppose I have thousands of such samples to train the model and let it predict on new data (also of varying number of time steps). For example,

    [ [1, 1.5, 1.45, 1.60, 1.60, 0.1, 1000],
    [2, 1.55, 1.5, 1.82, 1.63, 0.06, 1200],
    [3, 1.6, 1.61, 1.86, 1.72,0.06, 1150],
    its label is ‘1’

    [ [1, 1.50, 1.45, 1.60, 1.60, 0.10, 1000],
    [2, 1.60, 1.50, 1.82, 1.63, 0.06, 1200],
    [3, 1.63, 1.61, 1.56, 1.47, -0.06, 1150],
    [4, 1.47, 1.50, 1.65, 1.55, 0.055, 1320]
    its label is ‘0’

    after training, we want the model to predict for
    [ [1, 1.50, 1.45, 1.60, 1.60, 0.10, 1000],
    [2, 1.60, 1.50, 1.82, 1.63, 0.06, 1200],
    [3, 1.63, 1.61, 1.56, 1.47, -0.06, 1150],
    [4, 1.47, 1.50, 1.65, 1.55, 0.055, 1320],
    [5, 1.67, 1.56, 1.98, 1.77, 0.061, 1350]

    Is this possible? Thank you.

    • Adrian Tam September 11, 2021 at 6:29 am #

      Depending on the model, you may be able to do it. If you use an LSTM, you can train it to read a varying number of steps. Otherwise, people usually use padding to fill up the time steps to a fixed width.
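A small sketch of the padding idea, assuming each sample is a list of time steps and each time step is a list of feature values. The function `pad_sequences_left` and the choice of zero as the pad value are illustrative assumptions:

```python
def pad_sequences_left(sequences, pad_value=0.0):
    """Pad every sequence at the front with rows of `pad_value` so that
    all sequences share the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    n_features = len(sequences[0][0])
    padded = []
    for seq in sequences:
        n_missing = max_len - len(seq)
        filler = [[pad_value] * n_features for _ in range(n_missing)]
        padded.append(filler + [list(row) for row in seq])
    return padded


# Two toy samples with 3 and 4 time steps, 2 features each.
samples = [
    [[1.5, 1.45], [1.55, 1.5], [1.6, 1.61]],
    [[1.5, 1.45], [1.6, 1.5], [1.63, 1.61], [1.47, 1.5]],
]
padded = pad_sequences_left(samples)
```

The model would also need some way to ignore the padded rows, otherwise the filler values are treated as real observations.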

  65. Maakaan December 12, 2021 at 7:52 pm #

    Hi Jason
    I want to predict the bitcoin price in the next month (30 steps ahead). Which approach do you suggest?
    Also, another question, do you have any guidance about conv2dLSTM layer in your book?
    Best regards

    • Adrian Tam December 15, 2021 at 5:54 am #

      Probably a direct multi-step would be easier.

  66. kostas February 14, 2022 at 9:40 am #

    In the Recursive Multi-step Forecast with multivariate data (let’s say 8 variables), in step 2 we take into consideration the previous predicted value and the real values before the prediction (depending on the lag). What happens with the other 7 variables?
    Do we also predict these 7 variables and take them into consideration for step 2?

    • James Carmichael February 14, 2022 at 12:10 pm #

      The short answer to your question is yes. Your understanding is correct.
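One way to picture this, assuming the model outputs a full row (one value per variable) so that the whole row can be fed back as input for the next step. The function `recursive_multivariate` and the `last_change` stand-in model are hypothetical:

```python
def recursive_multivariate(model, window, n_steps):
    """Recursive forecast where every variable is predicted at each step,
    so the next input window can be completed from predictions."""
    window = [list(row) for row in window]
    preds = []
    for _ in range(n_steps):
        next_row = model(window)           # must return one value per variable
        preds.append(next_row)
        window = window[1:] + [next_row]   # slide the window over the predicted row
    return preds


def last_change(window):
    """Hypothetical stand-in model: repeat the most recent change of each variable."""
    return [cur + (cur - prev) for prev, cur in zip(window[-2], window[-1])]


# Rows are time steps; columns are [temperature, humidity] (toy values).
recent = [[20, 50], [21, 48]]
preds = recursive_multivariate(last_change, recent, n_steps=2)
```

If the model predicted only one variable, the remaining columns of `next_row` would have to come from somewhere else, such as separate models or known future values.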

  67. Zhi February 20, 2022 at 10:23 am #

    Hi Jason

    Do you recommend using XGBoost for multi-step forecast? And what strategy would be appropriate?

    Thank you!!!

    • James Carmichael February 20, 2022 at 12:20 pm #

      Hi Zhi…I do not recommend any strategy in general. I recommend investigating several for a given application and comparing results. You may be surprised at the performance of a given strategy depending upon the data available for training.

  68. Ali Tariq June 21, 2022 at 7:50 pm #

    I have a question regarding time series forecasting. Can we predict future values with DeepAR algorithm which are some time ahead of the values for which the model was trained without providing the intermediate values?
    E.g., the model was trained up to December 2021 and I want to forecast values for June 2022. Do I have to provide all the values for the months in between (Jan, Feb, March, April and May)? Or do I just have to provide the values for the context length, which is, e.g., one month (May 2022)?

    If it does provide the predictions, then kindly let me know how it works.

    Thanks a lot

  69. Hayat December 9, 2022 at 1:51 am #

    Thank you for the effort you put in writing this beneficial post.
    Is it possible to make some manipulation to one of the series in the multivariate time series before making forecast with the aim of expecting the modification to be propagated in the result of the forecast?
    Once again, thank you!

Leave a Reply