Time series forecasting is typically discussed in settings where only a one-step prediction is required.

What about when you need to predict multiple time steps into the future?

Predicting multiple time steps into the future is called multi-step time series forecasting. There are four main strategies that you can use for multi-step forecasting.

In this post, you will discover the four main strategies for multi-step time series forecasting.

After reading this post, you will know:

- The difference between one-step and multiple-step time series forecasts.
- The traditional direct and recursive strategies for multi-step forecasting.
- The newer direct-recursive hybrid and multiple output strategies for multi-step forecasting.

Let’s get started.

**Update May/2018**: Fixed typo in direct strategy example.

## Multi-Step Forecasting

Generally, time series forecasting describes predicting the observation at the next time step.

This is called a one-step forecast, as only one time step is to be predicted.

There are some time series problems where multiple time steps must be predicted. In contrast to one-step forecasts, these are called multiple-step or multi-step time series forecasting problems.

For example, given the observed temperature over the last 7 days:

    Time, Temperature
    1, 56
    2, 50
    3, 59
    4, 63
    5, 52
    6, 60
    7, 55

A single-step forecast would require a forecast at time step 8 only.

A multi-step forecast may require a forecast for the next two days, as follows:

    Time, Temperature
    8, ?
    9, ?

There are at least four commonly used strategies for making multi-step forecasts.

They are:

- Direct Multi-step Forecast Strategy.
- Recursive Multi-step Forecast Strategy.
- Direct-Recursive Hybrid Multi-step Forecast Strategies.
- Multiple Output Forecast Strategy.

Let’s take a closer look at each method in turn.

## 1. Direct Multi-step Forecast Strategy

The direct method involves developing a separate model for each forecast time step.

In the case of predicting the temperature for the next two days, we would develop a model for predicting the temperature on day 1 and a separate model for predicting the temperature on day 2.

For example:

    prediction(t+1) = model1(obs(t-1), obs(t-2), ..., obs(t-n))
    prediction(t+2) = model2(obs(t-2), obs(t-3), ..., obs(t-n))

Having one model for each time step is an added computational and maintenance burden, especially as the number of time steps to be forecasted increases beyond the trivial.

Because separate models are used, there is no opportunity to model the dependencies between the predictions, such as the prediction on day 2 depending on the prediction for day 1, as is often the case in time series.
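As a rough illustration of the direct strategy, the sketch below fits one independent model per forecast step on the temperature series. The names `MeanChangeModel`, `make_windows`, `fit_direct` and `forecast_direct` are hypothetical and invented for this example, and the toy model (last value plus the average change seen in training) is only a stand-in for whatever real model you would use. Windows are stored oldest-first, the reverse of the order used in the notation above, and both models are given the same observed window.

```python
from statistics import mean


class MeanChangeModel:
    """Toy stand-in model: last input value plus the average change seen in training."""

    def fit(self, windows, targets):
        self.delta = mean(t - w[-1] for w, t in zip(windows, targets))
        return self

    def predict(self, window):
        return window[-1] + self.delta


def make_windows(series, n_lags, step):
    """Pair each window of n_lags values with the value `step` steps after it."""
    windows, targets = [], []
    for i in range(len(series) - n_lags - step + 1):
        windows.append(series[i:i + n_lags])
        targets.append(series[i + n_lags + step - 1])
    return windows, targets


def fit_direct(series, n_lags, horizon):
    """Fit one separate model for each forecast step 1..horizon."""
    models = []
    for step in range(1, horizon + 1):
        windows, targets = make_windows(series, n_lags, step)
        models.append(MeanChangeModel().fit(windows, targets))
    return models


def forecast_direct(models, recent):
    """Each model sees only observations; no prediction is fed back in."""
    return [model.predict(recent) for model in models]


temps = [56, 50, 59, 63, 52, 60, 55]
models = fit_direct(temps, n_lags=3, horizon=2)
print(forecast_direct(models, temps[-3:]))  # forecasts for days 8 and 9
```

Note how the cost grows with the horizon: `fit_direct` trains `horizon` models, and each later step has one fewer training row available.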

## 2. Recursive Multi-step Forecast

The recursive strategy involves using a one-step model multiple times where the prediction for the prior time step is used as an input for making a prediction on the following time step.

In the case of predicting the temperature for the next two days, we would develop a one-step forecasting model. This model would then be used to predict day 1, then this prediction would be used as an observation input in order to predict day 2.

For example:

    prediction(t+1) = model(obs(t-1), obs(t-2), ..., obs(t-n))
    prediction(t+2) = model(prediction(t+1), obs(t-1), ..., obs(t-n))

Because predictions are used in place of observations, the recursive strategy allows prediction errors to accumulate such that performance can quickly degrade as the prediction time horizon increases.
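A minimal sketch of the recursive loop is shown below, assuming a toy one-step model. `MeanChangeModel`, `make_one_step_windows` and `forecast_recursive` are hypothetical names made up for this example. One design choice to note: the window here slides forward, dropping the oldest value each time a prediction is appended, so the model always receives the same number of inputs.

```python
from statistics import mean


class MeanChangeModel:
    """Toy stand-in one-step model: last input value plus the average change seen in training."""

    def fit(self, windows, targets):
        self.delta = mean(t - w[-1] for w, t in zip(windows, targets))
        return self

    def predict(self, window):
        return window[-1] + self.delta


def make_one_step_windows(series, n_lags):
    """Pair each window of n_lags values with the value that follows it."""
    windows = [series[i:i + n_lags] for i in range(len(series) - n_lags)]
    targets = [series[i + n_lags] for i in range(len(series) - n_lags)]
    return windows, targets


def forecast_recursive(model, recent, horizon):
    """Reuse one one-step model, feeding each prediction back in as an input."""
    window = list(recent)
    forecasts = []
    for _ in range(horizon):
        yhat = model.predict(window)
        forecasts.append(yhat)
        # Drop the oldest value and treat the prediction as if it were observed.
        window = window[1:] + [yhat]
    return forecasts


temps = [56, 50, 59, 63, 52, 60, 55]
windows, targets = make_one_step_windows(temps, n_lags=3)
model = MeanChangeModel().fit(windows, targets)
print(forecast_recursive(model, temps[-3:], horizon=2))  # forecasts for days 8 and 9
```

Any error in the first forecast is carried into the inputs of the second, which is the accumulation effect described above.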

## 3. Direct-Recursive Hybrid Strategies

The direct and recursive strategies can be combined to offer the benefits of both methods.

For example, a separate model can be constructed for each time step to be predicted, but each model may use the predictions made by models at prior time steps as input values.

We can see how this might work for predicting the temperature for the next two days, where two models are used, but the output from the first model is used as an input for the second model.

For example:

    prediction(t+1) = model1(obs(t-1), obs(t-2), ..., obs(t-n))
    prediction(t+2) = model2(prediction(t+1), obs(t-1), ..., obs(t-n))

Combining the recursive and direct strategies can help to overcome the limitations of each.
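One possible way to sketch the hybrid idea for a two-step forecast is given below: a first model is trained on observed windows, and a second, separate model is trained on the same windows extended with the first model's prediction. The names `MeanChangeModel`, `make_windows`, `fit_hybrid` and `forecast_hybrid` are hypothetical, and the toy model is only a placeholder for a real one.

```python
from statistics import mean


class MeanChangeModel:
    """Toy stand-in model: last input value plus the average change seen in training."""

    def fit(self, inputs, targets):
        self.delta = mean(t - x[-1] for x, t in zip(inputs, targets))
        return self

    def predict(self, inputs):
        return inputs[-1] + self.delta


def make_windows(series, n_lags, step):
    """Pair each window of n_lags values with the value `step` steps after it."""
    windows, targets = [], []
    for i in range(len(series) - n_lags - step + 1):
        windows.append(series[i:i + n_lags])
        targets.append(series[i + n_lags + step - 1])
    return windows, targets


def fit_hybrid(series, n_lags):
    # Model 1: observed window -> value one step ahead.
    w1, y1 = make_windows(series, n_lags, step=1)
    model1 = MeanChangeModel().fit(w1, y1)
    # Model 2: observed window plus model 1's prediction -> value two steps ahead.
    w2, y2 = make_windows(series, n_lags, step=2)
    augmented = [w + [model1.predict(w)] for w in w2]
    model2 = MeanChangeModel().fit(augmented, y2)
    return model1, model2


def forecast_hybrid(model1, model2, recent):
    recent = list(recent)
    first = model1.predict(recent)
    second = model2.predict(recent + [first])
    return [first, second]


temps = [56, 50, 59, 63, 52, 60, 55]
model1, model2 = fit_hybrid(temps, n_lags=3)
print(forecast_hybrid(model1, model2, temps[-3:]))  # forecasts for days 8 and 9
```

Because `model2` is trained on `model1`'s predictions rather than on true values, it has a chance to learn how to compensate for the first model's typical errors.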

## 4. Multiple Output Strategy

The multiple output strategy involves developing one model that is capable of predicting the entire forecast sequence in a one-shot manner.

In the case of predicting the temperature for the next two days, we would develop one model and use it to predict the next two days as one operation.

For example:

    prediction(t+1), prediction(t+2) = model(obs(t-1), obs(t-2), ..., obs(t-n))

Multiple output models are more complex as they can learn the dependence structure between inputs and outputs as well as between outputs.

Being more complex may mean that they are slower to train and require more data to avoid overfitting the problem.
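To make the one-shot idea concrete, here is a small sketch in which a single model maps a window of observations to a whole vector of future values. The model is a 1-nearest-neighbour lookup, chosen only because it fits in a few lines; it is an assumption of this example and not a recommended model. `NearestWindowModel` and `make_sequence_windows` are hypothetical names.

```python
def make_sequence_windows(series, n_lags, horizon):
    """Pair each window of n_lags values with the next `horizon` values as one target vector."""
    windows, targets = [], []
    for i in range(len(series) - n_lags - horizon + 1):
        windows.append(series[i:i + n_lags])
        targets.append(series[i + n_lags:i + n_lags + horizon])
    return windows, targets


class NearestWindowModel:
    """Toy multi-output model: returns the target vector of the closest training window."""

    def fit(self, windows, targets):
        self.windows = windows
        self.targets = targets
        return self

    def predict(self, window):
        def squared_distance(other):
            return sum((a - b) ** 2 for a, b in zip(window, other))

        best = min(range(len(self.windows)),
                   key=lambda i: squared_distance(self.windows[i]))
        # The whole forecast sequence is produced in one operation.
        return list(self.targets[best])


temps = [56, 50, 59, 63, 52, 60, 55]
windows, targets = make_sequence_windows(temps, n_lags=3, horizon=2)
model = NearestWindowModel().fit(windows, targets)
forecast = model.predict(temps[-3:])
print(forecast)  # two values: days 8 and 9
```

With only seven observations there are just three training rows here, which hints at why multiple output models tend to need more data.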

## Further Reading

See the resources below for further reading on multi-step forecasts.

- Machine Learning Strategies for Time Series Forecasting, 2013
- Recursive and direct multi-step forecasting: the best of both worlds, 2012 [PDF]

## Summary

In this post, you discovered strategies that you can use to make multiple-step time series forecasts.

Specifically, you learned:

- How to train multiple parallel models in the direct strategy or reuse a one-step model in the recursive strategy.
- How to combine the best parts of the direct and recursive strategies in the hybrid strategy.
- How to predict the entire forecast sequence in a one-shot manner using the multiple output strategy.

Do you have any questions about multi-step time series forecasts, or about this post? Ask your questions in the comments below and I will do my best to answer.

Thanks Jason for a wonderful post. Why does your model skip the value at “t”?

Just a choice of terminology, think of t+1 as t.

I could have made it clearer, thanks for the note.

Good question. Thanks.

Hi Jason, it is always helpful to read your post. I have some confusion related to Time Series Forecasting.

There is traffic data (1440 pieces in total, and 288 pieces each day) that I collected to predict traffic flow. The data was collected every 5 min over five consecutive working days. I am going to use the traffic data of the first four days to train the prediction model, while the traffic data of the fifth day is used to test the model.

Here is my question: if I want to predict the traffic flow of the fifth day, do I only need to treat my prediction as a one-step forecast, or do I have to predict 288 steps?

Look forward to your advice.

Thanks for your post again.

Hi Dylan,

If you want to predict an entire day in advance (288 observations), this sounds like a multi-step forecast.

You could use a recursive one-step strategy or something like a neural net to predict the entire sequence in a one-shot manner.

Predicting so many steps in advance is hard work (a hard problem) and results may be poor. You will do better if you can use data as it comes in to continually refine your forecast.

Does that help?

Yes, your response is very helpful. Thank you very much. Now I realize my prediction is a multi-step forecast.

Could you recommend me some more detailed materials related to the multi-step forecast, like the recursive one-step strategy or the neural net?

Now I am reading your post, it is great.

Thank you for your advice.

Best regards

I am working on this type of material at the moment, it should be on the blog in coming weeks/months.

You can use an ARIMA recursively by taking forecasts as inputs to make the next prediction.

You can use a neural network to make a multi-step forecast by setting the output sequence length as the number of neurons in the output layer.

I hope that helps as a start.

Thank you for your advice. I will keep digging into this puzzle. Hope to discuss with you again. It is very helpful. Thank you for your time.

Best regards.

Let me know how you go.

Hi Jason,

How far are you from publishing the material you're speaking about above? Thanks for the tutorials.

Here is an example for multi-step forecasting with LSTMs:

http://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/

Hi Jason,

Thank you for great posts! they’re awesome!

I have the same problem as Dylan and decided to use statsmodels’ SARIMAX. It takes some time to do the prediction for the entire next day (288 steps), and I have been wondering if I’m doing this wrong or should use a different approach.

Currently, I’m looking into LSTM RNN as a possible approach, but not sure.

The thing is, with my data, I have to predict the entire 288 steps in one shot and detect an anomaly if there’s any, then predict the type of anomaly that occurred….

My question is, am I going in the right direction by looking into LSTM RNN?

I’m really looking forward into reading your posts on this topic!

Thanks Jason 🙂

I am not high on LSTMs for autoregression models:

http://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/

Yes, I will. Discussing with you is always helpful. Look forward to reading your new post on Time Series Forecast.

Hi Jason, just another brilliant post. Can you show a working example for the first or second method, like you have always shown in other tutorials? It would be an immense help to a novice like me. Thanks…

I do hope to have many examples on the blog in the coming weeks.

Thank you Jason for your wonderful articles ! you are a life saver!

But I suppose you made a mistake in the examples for number 2 and 3. Both have the same value:

prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))

prediction(t+2) = model2(prediction(t-1), obs(t-2), …, obs(t-n))

I believe that one of them should be

prediction(t+2) = model2(prediction(t+1), obs(t-1), …, obs(t-n-1))

Hmmm, I guess you’re right. I was thinking from the frame of the second prediction rather than the frame of both predictions.

Fixed.

Hello Jason,

What kind of Multiple output models would you recommend if we are opting for the fourth strategy?

Neural networks, such as MLPs.

Sir, how about LSTM?

Sure, try them, but contrast results to an MLP.

Sir,

Would you please come up with a blog where we would love to see all these strategies have been applied to an example (dataset) and their result comparisons.

Would you please?

Perhaps in the future, thanks for the suggestion.

sir,

would you be kind enough to post soon?

I am stuck with my theoretical knowledge need to apply on my data to see the result and their comparative analysis.

What is a decent one-step prediction of unseen data? What would it look like?

Let’s say I have 100 rows in a data set and do the following in R:

I write ‘=’ instead of the arrows because of the forum parser:

1. I split the 100 rows of raw data in 99 training rows and 1 testing row:

inTrain=createDataPartition(y=dataset$n12,p=1,list = FALSE)

training=dataset[inTrain-1,];

testing=dataset[-inTrain+1,]

2. I train the model:

modFit=train(n12~., data=training, method = 'xxx')

3. I get the final model of Caret

finMod<-modFit$finalModel

4. I predict one step with the final model of the training and the one row of testing.

newx=testing[,1:11]

unseenPredict=predict(finMod, newx)

Now, do I have a decent prediction of one step unseen data in point 4 ???

And why are there libraries like forecast for R, if everything can be coded as a one-step forecast by default?

https://github.com/robjhyndman/forecast/

Sorry, I don’t have examples of time series forecasting in R, I cannot offer good advice.

I know there is also the option to use the time series object(s) in R.

But could you answer my question in general?

I have many posts showing how to make predictions in Python, including many that show repeated one-step predictions in a walk-forward validation test harness:

http://machinelearningmastery.com/category/time-series/

I don’t understand the difference between regression forecast and time series forecast.

Or what are the benefits from each over the other.

A time series forecast can predict a real value (regression) or a class value (classification).

Hello Jason,

How do I prepare the dataset to train models using the Direct Multi-step Forecast Strategy?

For the series: 1,2,3,4,5,6,7,8,9

Model 1 will forecast t+1 using window of size 3 , then the dataset would be:

1,2,3->4

2,3,4->5

3,4,5->6

4,5,6->7

5,6,7->8

6,7,8->9

Model 2 will forecast t+2 using window of size 3 , then the dataset would be:

1,2,3->5

2,3,4->6

3,4,5->7

4,5,6->8

5,6,7->9

Model 3 will forecast t+3 using window of size 3 , then the dataset would be:

1,2,3->6

2,3,4->7

3,4,5->8

4,5,6->9

and so on. Is it right ? Thanks

Great question.

For the direct approach, the input will be the available lag vars and the output will be a vector of the prediction.

I can see that you want to use different models for each step in the prediction.

You could structure it as follows:

Try many approaches and see what works best on your problem.

I hope that helps.
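As one possible illustration of the framing discussed in this exchange (a sketch only, not the structure originally posted in the reply), the series 1 to 9 can be laid out with a window of 3 inputs and a vector of 3 outputs per row. Column k of the output vector then serves as the training target for model k in the direct strategy. `frame_multi_step` is a hypothetical helper name.

```python
def frame_multi_step(series, n_lags, horizon):
    """Return (window, target_vector) rows for a multi-step forecasting problem."""
    rows = []
    for i in range(len(series) - n_lags - horizon + 1):
        window = series[i:i + n_lags]
        targets = series[i + n_lags:i + n_lags + horizon]
        rows.append((window, targets))
    return rows


series = [1, 2, 3, 4, 5, 6, 7, 8, 9]
rows = frame_multi_step(series, n_lags=3, horizon=3)
for window, targets in rows:
    print(window, "->", targets)

# Slice one output column per model: model 1 learns t+1, model 2 learns t+2, and so on.
for k in range(3):
    per_model = [(window, targets[k]) for window, targets in rows]
    print("model", k + 1, per_model)
```

This layout keeps only rows where all three targets exist, so every model trains on the same 4 rows. Framing each model separately, as in the question, leaves 6, 5 and 4 rows for models 1, 2 and 3.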

thank you so much! This answers my question.

I’m happy to help (if I can).

Dear Jason,

In Direct Multi-step Forecast Strategy, for model 2

why haven’t you used obs(t-1) as well?

i.e.

instead of:

prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))

prediction(t+2) = model2(obs(t-2), obs(t-2), obs(t-3), …, obs(t-n))

=>

prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))

prediction(t+2) = model2(obs(t-1), obs(t-2), …, obs(t-n))

The latter seems more compatible with the example you provided in this comment.

thanks!

t-1 for model2 would be the predicted value of model1. It could use them too if you wish.

Hello Jason, thank you for your wonderful post. (Actually I already bought your book.)

I have a question.. I am building a forecasting model with my timeseries dataset, which is a daily number of some cases, I have 3 years past data (so will be records of 3*365 days.) I’d like to forecast 2 months future data (60 days.)

I already built a multi-step LSTM model for this, however, it doesn’t seem to work well… For example, 3 years past data clearly has a pattern like Nov/Dec high peak seasonality and increasing trend, but 60 steps of LSTM gave me poor forecasting like decreasing trend with no seasonality… and even the base is decreasing too.. which is so not understandable.

My question is:

1. Do you think my parameter tuning could be wrong? I mean, LSTM multi step forecasting cannot be this much poor..?

2. Is there any recommendation for one model approach for my problem..? I used ARIMA, but I wanted to use algorithmic model rather than a statistical model, so that’s why I’m trying to build LSTM… Do you think I need to go back to ARIMA..?

(After building one model, I will use ensemble method to improve current model though. For now, I need a decent model giving me the understandable result.)

Thank you so much, your any opinion on this will be really appreciated.

Thanks Jisun,

I generally would recommend using an MLP, LSTMs do not seem to perform well on straight autoregression problems.

Hi Jason,

There is a question above asking you “What kind of Multiple output models would you recommend if we are opting for the fourth strategy”?

And you answer MLPs

Then I try to use mlp to get a one-shot sequence, but I keep getting error…

Below is my code and scenario,

x_train.shape:

(4, 5, 29)

y_train.shape:

(4, 28)

I wish to use prior 5 timesteps and 29 features to get the 28 timesteps ahead forecast sequence.

only 4 training data for illustrative purpose.

    model = Sequential()
    model.add(Dense(units=100, input_shape=(5, 29)))
    model.add(Dense(90, kernel_initializer='normal', activation='relu'))
    model.add(Dense(90, kernel_initializer='normal', activation='relu'))
    model.add(Dense(30, kernel_initializer='normal', activation='relu'))
    model.add(Dense(28, kernel_initializer='normal', activation='relu'))
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

Error when checking target: expected dense_168 to have 3 dimensions, but got array with shape (4, 28)

How can I rectify it? Thank you very much

With an MLP, the prior observations will be features.

If you have 29 features and 5 time steps, this will in fact be 5 x 29 (145) input features to the MLP.
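The flattening described here can be sketched in plain Python as follows. The data is a made-up stand-in with the same shape as in the question (4 samples, 5 time steps, 29 features), and `flatten_samples` is a hypothetical helper name. With a NumPy array, `x_train.reshape((4, 145))` should give the same layout.

```python
n_samples, n_steps, n_features = 4, 5, 29

# Hypothetical stand-in for x_train with shape (4, 5, 29).
x_train = [[[float(s * 1000 + t * 100 + f) for f in range(n_features)]
            for t in range(n_steps)]
           for s in range(n_samples)]


def flatten_samples(samples):
    """Concatenate the time steps of each sample into one flat feature vector."""
    return [[value for step in sample for value in step] for sample in samples]


x_flat = flatten_samples(x_train)
print(len(x_flat), len(x_flat[0]))  # 4 145
```

Each sample is now a single row of 145 features, so the input is two-dimensional and lines up with a two-dimensional target of shape (4, 28).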

Hi Jason

I am trying to fit a LSTM model which is a multivariate (input and output) and multi step.

So I need to predict multiple steps and multiple features in one model.

Temp : [1,2,3,4],Rain[1,2,3,4] = predict(Temp : [5,6], Rain[5,6])

What is your recommended architecture to do this in one model ?.

I have daily selling values for 5 years with 167000(per item per store) features to predict 15 days for 167000 features

To output two series, perhaps a multiple output model:

https://machinelearningmastery.com/keras-functional-api-deep-learning/

Hi Jason,

Thank you very much for sharing a great article again. I have read many of your posts these days, and learned a lot from them.

My project is a one-step forecast on time series data. Which model do you think is best for it?

Charles

I would recommend testing a suite of models to see what works best for your data.

Hi Jason,

Is there any way to reduce the propagated error during multi-step-ahead prediction with a recurrent neural network?

Thanks

It’s a hard problem. The general methods help:

http://machinelearningmastery.com/machine-learning-performance-improvement-cheat-sheet/

Hi Jason,

Thanks for the great post. Your post uses Direct Strategy. I would like to apply Direct-Recursive Hybrid Strategy using RNN-LSTM on a time series data that has trend and seasonality. What I need is a multistep forecast where the prediction for the prior time step is used as an input for making a prediction on the following time step. How to go about this ? Because recursion for multi-step would be highly computationally expensive. What changes do I need in the existing code for multi-step using hybrid method?

Thanks

Perhaps this post can help as a starting point:

https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/

Hi Jason,

I am going with the 4th strategy you mentioned that is one model predicting forecasts in one shot.

I have two models in my mind for this.

1) Multi-Output Network : Output layer of this architecture has ‘forecast’ number of dense layer. In this model there are ‘forecast’ number of weight matrices each trained on predicting individual forecast? Each output dense layer is optimized(adam) for MSE.

2)Single-Output network: This architecture has one dense layer as output with ‘forecast’ number of neurons. So in this case there is only one weight matrix trained for all forecasts and one cost function across all forecasts as opposed to first approach.

Are both the architecture valid? Which architecture works best?

Also, one more question, Jason. What is the best way to add regularization to a time series model: dropout or a kernel regularizer? Would both do?

Thanks,

Kaushal

Try them both (and more!) and see what works best for your specific dataset.

Try many regularization methods and see what works best for your specific data.

Perhaps this post will help you think about the challenge ahead:

https://machinelearningmastery.com/applied-machine-learning-as-a-search-problem/

Hi Jason, it is a great article and very helpful summary. I have one trivial question.

1. Direct Multi-step Forecast Strategy

prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))

prediction(t+2) = model1(obs(t-2), obs(t-3), …, obs(t-n))

==>

do you mean model2, not model1 for t+2?

prediction(t+2) = model2(obs(t-2), obs(t-3), …, obs(t-n))

why it starts t+1 not t

prediction(t) = model1(obs(t-1), obs(t-2), …, obs(t-n))

Yes, I meant model2, thanks – fixed.

sorry to disturb you again.

Why it is t+1, not t? Thanks.

prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))

=>

prediction(t) = model1(obs(t-1), obs(t-2), …, obs(t-n))

It could be and probably should be t. Just a chosen notation of t-1, t+1.

Hi jason,

Thanks a lot for the great post. I am going with the 2nd strategy you mentioned, the recursive multi-step forecast, but I am having difficulty implementing the recursive forecasting part.

How to put the prior time step to be used as an input for making a prediction on the following time step, for example with SVR or MLP. It would be immense help to a novice like me. Thanks…

Thanks Klaus.

You will have to store it in a variable and then create an array that includes the value and use it as input for the next prediction.

Hi Jason,

Thank you for your great post.

If I use a Recursive Multi-step Forecast with an ARMA model, will the effect of the MA terms diminish over the prediction steps?

Due to the nature of multi-step forecasting, the error terms of the previous unobserved samples become zero when they are used as inputs for further predictions. However, the MA part estimates based on the previous errors, if I do not misunderstand. Thus, if the previous error terms become 0, will only the AR terms affect the prediction results?

Sorry if I misunderstand about the ARMA model. I’m quite new for this topic.

Recursive prediction will result in compounding error. Why would MA go to zero?

Sorry, I didn’t mean the real error but the unobserved residuals. For example in the following website, no observed values for residuals ε106, ε107 are considered as zero in the equation ŷ107 = f(ε106, ε107, y106).

http://www.real-statistics.com/time-series-analysis/arma-processes/forecasting-arma/

“This time, there are no observed values for ε106, ε107, or y106. As before, we estimate ε106 and ε107 by zero, but we estimate y106 by the forecasted value ŷ106.”

Hello Jason,

I am trying to build an LSTM model. My training set has 580 timesteps and 12000 features.

I wanted to use 10 timesteps to predict next 5 timesteps. In this case my train_x.shape will be (87,10,12000). However I am confused about train_y.shape. Should it be (87,10)?

I have information on how to prepare data for LSTMs here:

https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm

I also have an example here that might help:

https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/

Hi Jason,

In my understanding, Direct-Recursive Hybrid Strategies can be implemented in below three steps. Could you help me to check if it is correct please? Thanks in advance.

prediction(t+1) = model1(obs(t-1), obs(t-2), …, obs(t-n))

prediction(t+2) = model2(prediction(t+1), obs(t-1), …, obs(t-n))

1. Use train data to train model1

2. Predict t+1 for all train data

3. Use predicted t+1 plus train data to train model2

Seems good.

Dear Jason,

In Recursive Multi-step Forecast:

I guess to predict the value at (t+2), the observed value of (t-n) is not needed but the one at (t-n+1).

therefore:

instead of:

prediction(t+1) = model(obs(t-1), obs(t-2), …, obs(t-n))

prediction(t+2) = model(prediction(t+1), obs(t-1), …, obs(t-n))

should be :

prediction(t+1) = model(obs(t-1), obs(t-2), …, obs(t-n))

prediction(t+2) = model(prediction(t+1), obs(t-1), …, obs(t-n-1))

Am I right?

Dear Jason ,

I have some experience in python and machine learning and now I am trying to learn predicting a value ( Regression ) based on Timestamp ( Every 5 mins ). So please suggest me the appropriate model for this. Also, let me know if your book on forecasting helps me in this.

You can get started here:

https://machinelearningmastery.com/start-here/#timeseries

HI JASON,

Very informative and nice post. I have a problem related to an LSTM forecasting model. I am building a model for call volume forecasting. I want to forecast 3 months ahead of today. Let’s say I have built a model and am scoring today (in Aug’18), so the forecasted month should be Dec’18, 3 months ahead of the scoring month. How should I proceed when building the model (training dataset), validating it (test dataset), and scoring unseen data (as discussed above)? Do I have to build the model on the training dataset in a similar way, forecasting 3 months in advance? If yes, how should I proceed?

Thanks in Advance

Yes, frame the historical data in the way that you intend to use the model.

E.g. if you need a model to predict n months ahead, frame all historical data this way and fit a model, then score the fit model.

Hello Dr. Jason,

Thank you for your amazing post.

I am confused about calculating the error for multi-step-ahead prediction.

Suppose I have my training data D_pred = [1,2,3,4,5,6]

and my corresponding target data, D_trgt = [7,8,9,10,11,12,13]

I am using lag = 4 and I want to predict 5 step-ahead

My D_pred would be like this

1 2 3 4

2 3 4 5

3 4 5 6

I used my D_pred to get my prediction result D_out in one shot like this

x7 x8 x9 x10 x11

x8 x9 x10 x11 x12

x9 x10 x11 x12 x13

and my D_trgt would be like this

7 8 9 10 11

8 9 10 11 12

9 10 11 12 13

now, how to calculate the SMAPE error between D_out and D_trgt for horizon 5?

– get SMAPE error between [x7, x8, x9, x10, x11, x12, x13] and [7, 8, 9, 10, 11, 12, 13]

or

– get SMAPE error between [x11, x12, x13] and [11, 12, 13]

which way is the right way?

Thank you so much

I show how with RMSE in this tutorial:

https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
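One way to approach the question is to score each forecast step separately, so that every lead time from 1 to 5 gets its own error. The sketch below does this with SMAPE under a few assumptions: the forecast values are invented purely for illustration, `smape` uses one common definition of the metric (others exist), and the function is undefined when an actual value and its forecast are both zero.

```python
def smape(actual, predicted):
    """Symmetric MAPE in percent: mean of 2*|F - A| / (|A| + |F|)."""
    terms = [2 * abs(p - a) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)]
    return 100 * sum(terms) / len(terms)


# Targets from the question: one row per input window, one column per step ahead.
d_trgt = [[7, 8, 9, 10, 11],
          [8, 9, 10, 11, 12],
          [9, 10, 11, 12, 13]]

# Hypothetical forecasts with the same layout (made-up numbers).
d_out = [[7.2, 8.1, 9.5, 10.8, 12.0],
         [8.1, 9.3, 10.4, 11.9, 13.1],
         [9.0, 10.2, 11.6, 12.7, 14.4]]

n_steps = len(d_trgt[0])
per_step = []
for k in range(n_steps):
    actual = [row[k] for row in d_trgt]
    predicted = [row[k] for row in d_out]
    per_step.append(smape(actual, predicted))

for k, score in enumerate(per_step, start=1):
    print("t+%d SMAPE: %.2f%%" % (k, score))
```

Reporting the score for step 5 alone, or averaging across all steps, are both possible from `per_step`; which is appropriate depends on what the forecast will be used for.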

Thank you so much, Dr. Jason

Hi, master. What are the lag timesteps used for? I still can’t understand, my doctor.

Lag time steps are observations at prior times. They are used as inputs to the model.

Can you explain more exactly?

Sure, which part are you having difficulty with exactly?