# How to Convert a Time Series to a Supervised Learning Problem in Python

Last Updated on August 21, 2019

Machine learning methods like deep learning can be used for time series forecasting.

Before machine learning can be used, time series forecasting problems must be re-framed as supervised learning problems: from a single sequence to pairs of input and output sequences.

In this tutorial, you will discover how to transform univariate and multivariate time series forecasting problems into supervised learning problems for use with machine learning algorithms.

After completing this tutorial, you will know:

• How to develop a function to transform a time series dataset into a supervised learning dataset.
• How to transform univariate time series data for machine learning.
• How to transform multivariate time series data for machine learning.

Kick-start your project with my new book Time Series Forecasting With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Photo by Quim Gil, some rights reserved.

## Time Series vs Supervised Learning

Before we get started, let’s take a moment to better understand the form of time series and supervised learning data.

A time series is a sequence of numbers that are ordered by a time index. This can be thought of as a list or column of ordered values.

For example:
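A minimal sketch of such a series, with invented values, is just an ordered column of observations:

```python
from pandas import Series

# a univariate time series: observations ordered by a time index
# (the values here are invented for illustration)
series = Series([100, 110, 108, 115, 120])
print(series)
```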

A supervised learning problem is comprised of input patterns (X) and output patterns (y), such that an algorithm can learn how to predict the output patterns from the input patterns.

For example:
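A minimal sketch, again with invented values, pairing each input pattern with its output pattern:

```python
# input patterns (X) paired with output patterns (y); values are invented
X = [[1], [2], [3], [4]]
y = [2, 3, 4, 5]

# a supervised learning algorithm learns the mapping X -> y
for xi, yi in zip(X, y):
    print(xi, '->', yi)
```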

For more on this topic, see the post:

## Pandas shift() Function

A key function to help transform time series data into a supervised learning problem is the Pandas shift() function.

Given a DataFrame, the shift() function can be used to create copies of columns that are pushed forward (rows of NaN values added to the front) or pulled back (rows of NaN values added to the end).

This is the behavior required to create columns of lag observations as well as columns of forecast observations for a time series dataset in a supervised learning format.

Let’s look at some examples of the shift function in action.

We can define a mock time series dataset as a sequence of 10 numbers, in this case a single column in a DataFrame as follows:
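A sketch of such a mock dataset, assuming Pandas is installed:

```python
from pandas import DataFrame

# mock time series: 10 ordered observations in a single DataFrame column
df = DataFrame()
df['t'] = [x for x in range(10)]
print(df)
```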

Running the example prints the time series data with the row indices for each observation.

We can shift all the observations down by one time step by inserting one new row at the top. Because the new row has no data, we can use NaN to represent “no data”.

The shift function can do this for us and we can insert this shifted column next to our original series.
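A sketch of the positive shift, building on the mock dataset above:

```python
from pandas import DataFrame

# define the mock time series
df = DataFrame()
df['t'] = [x for x in range(10)]
# shift(1) pushes the series down one step, inserting a NaN at the top
df['t-1'] = df['t'].shift(1)
print(df)
```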

Running the example gives us two columns in the dataset. The first with the original observations and a new shifted column.

We can see that shifting the series forward one time step gives us a primitive supervised learning problem, although with X and y in the wrong order. Ignore the column of row labels. The first row would have to be discarded because of the NaN value. The second row shows the input value of 0.0 in the second column (input or X) and the value of 1 in the first column (output or y).

We can see that by repeating this process with shifts of 2, 3, and more, we could create long input sequences (X) that can be used to forecast an output value (y).

The shift operator can also accept a negative integer value. This has the effect of pulling the observations up by inserting new rows at the end. Below is an example:
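A sketch of the negative shift on the same mock dataset:

```python
from pandas import DataFrame

# define the mock time series
df = DataFrame()
df['t'] = [x for x in range(10)]
# shift(-1) pulls the series up one step, inserting a NaN at the end
df['t+1'] = df['t'].shift(-1)
print(df)
```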

Running the example shows a new column with a NaN value as the last value.

We can see that the original column can be taken as an input (X) and the new forecast column as an output (y). That is, the input value of 0 can be used to forecast the output value of 1.

Technically, in time series forecasting terminology the current time (t) and future times (t+1, t+n) are forecast times and past observations (t-1, t-n) are used to make forecasts.

We can see how positive and negative shifts can be used to create a new DataFrame from a time series with sequences of input and output patterns for a supervised learning problem.

This permits not only classical X -> y prediction, but also X -> Y where both input and output can be sequences.

Further, the shift function also works on so-called multivariate time series problems. That is where instead of having one set of observations for a time series, we have multiple (e.g. temperature and pressure). All variates in the time series can be shifted forward or backward to create multivariate input and output sequences. We will explore this more later in the tutorial.

## The series_to_supervised() Function

We can use the shift() function in Pandas to automatically create new framings of time series problems given the desired length of input and output sequences.

This would be a useful tool as it would allow us to explore different framings of a time series problem with machine learning algorithms to see which might result in better performing models.

In this section, we will define a new Python function named series_to_supervised() that takes a univariate or multivariate time series and frames it as a supervised learning dataset.

The function takes four arguments:

• data: Sequence of observations as a list or 2D NumPy array. Required.
• n_in: Number of lag observations as input (X). Values may be between [1..len(data)]. Optional. Defaults to 1.
• n_out: Number of observations as output (y). Values may be between [0..len(data)-1]. Optional. Defaults to 1.
• dropnan: Boolean whether or not to drop rows with NaN values. Optional. Defaults to True.

The function returns a single value:

• return: Pandas DataFrame of series framed for supervised learning.

The new dataset is constructed as a DataFrame, with each column suitably named by both variable number and time step. This allows you to design a variety of different sequence forecasting problems from a given univariate or multivariate time series.

Once the DataFrame is returned, you can decide how to split the rows of the returned DataFrame into X and y components for supervised learning any way you wish.

The function is defined with default parameters so that if you call it with just your data, it will construct a DataFrame with t-1 as X and t as y.

The function is confirmed to be compatible with Python 2 and Python 3.

The complete function is listed below, including function comments.
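A listing consistent with the arguments, behavior, and column naming (e.g. var1(t-1)) described above:

```python
from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or 2D NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg
```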

Can you see obvious ways to make the function more robust or more readable?

Now that we have the whole function, we can explore how it may be used.

## One-Step Univariate Forecasting

It is standard practice in time series forecasting to use lagged observations (e.g. t-1) as input variables to forecast the current time step (t).

This is called one-step forecasting.

The example below demonstrates a one lag time step (t-1) to predict the current time step (t).
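A sketch of this one-step framing; the series_to_supervised() definition is repeated so the listing is self-contained:

```python
from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    # frame a time series as a supervised learning dataset
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# one lag time step (t-1) to predict the current time step (t)
values = [x for x in range(10)]
data = series_to_supervised(values)
print(data)
```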

Running the example prints the output of the reframed time series.

We can see that the observations are named “var1” and that the input observation is suitably named (t-1) and the output time step is named (t).

We can also see that rows with NaN values have been automatically removed from the DataFrame.

We can repeat this example with an arbitrary-length input sequence, such as 3. This can be done by specifying the length of the input sequence as an argument; for example:

The complete example is listed below.
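A sketch of the three-lag framing, again with the function repeated so the listing stands alone:

```python
from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    # frame a time series as a supervised learning dataset
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# three lag time steps (t-3, t-2, t-1) as input
values = [x for x in range(10)]
data = series_to_supervised(values, 3)
print(data)
```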

Again, running the example prints the reframed series. We can see that the input sequence is in the correct left-to-right order with the output variable to be predicted on the far right.

## Multi-Step or Sequence Forecasting

A different type of forecasting problem is using past observations to forecast a sequence of future observations.

This may be called sequence forecasting or multi-step forecasting.

We can frame a time series for sequence forecasting by specifying another argument. For example, we could frame a forecast problem with an input sequence of 2 past observations to forecast 2 future observations as follows:

The complete example is listed below:
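A sketch of this multi-step framing, with the function repeated so the listing stands alone:

```python
from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    # frame a time series as a supervised learning dataset
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# 2 past observations as input, 2 future observations as output
values = [x for x in range(10)]
data = series_to_supervised(values, 2, 2)
print(data)
```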

Running the example shows the differentiation of input (t-n) and output (t+n) variables with the current observation (t) considered an output.

## Multivariate Forecasting

Another important type of time series is called multivariate time series.

This is where we may have observations of multiple different measures and an interest in forecasting one or more of them.

For example, we may have two sets of time series observations obs1 and obs2 and we wish to forecast one or both of these.

We can call series_to_supervised() in exactly the same way.

For example:
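A sketch with two mock series, ob1 and ob2, with the function repeated so the listing stands alone:

```python
from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    # frame a time series as a supervised learning dataset
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# two parallel observed series as a multivariate time series
raw = DataFrame()
raw['ob1'] = [x for x in range(10)]
raw['ob2'] = [x for x in range(50, 60)]
values = raw.values
data = series_to_supervised(values)
print(data)
```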

Running the example prints the new framing of the data, showing an input pattern with one time step for both variables and an output pattern of one time step for both variables.

Again, depending on the specifics of the problem, the division of columns into X and Y components can be chosen arbitrarily, such as if the current observation of var1 was also provided as input and only var2 was to be predicted.

You can see how this may be easily used for sequence forecasting with multivariate time series by specifying the length of the input and output sequences as above.

For example, below is an example of a reframing with 1 time step as input and 2 time steps as forecast sequence.
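A sketch of the multivariate multi-step framing, with the function repeated so the listing stands alone:

```python
from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    # frame a time series as a supervised learning dataset
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# 1 time step as input, 2 time steps as forecast sequence
raw = DataFrame()
raw['ob1'] = [x for x in range(10)]
raw['ob2'] = [x for x in range(50, 60)]
values = raw.values
data = series_to_supervised(values, 1, 2)
print(data)
```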

Running the example shows the large reframed DataFrame.

Experiment with your own dataset and try multiple different framings to see what works best.

## Summary

In this tutorial, you discovered how to reframe time series datasets as supervised learning problems with Python.

Specifically, you learned:

• About the Pandas shift() function and how it can be used to automatically define supervised learning datasets from time series data.
• How to reframe a univariate time series into one-step and multi-step supervised learning problems.
• How to reframe multivariate time series into one-step and multi-step supervised learning problems.

Do you have any questions?

## Want to Develop Time Series Forecasts with Python?

#### Develop Your Own Forecasts in Minutes

...with just a few lines of Python code

Discover how in my new Ebook:
Introduction to Time Series Forecasting With Python

It covers self-study tutorials and end-to-end projects on topics like: Loading data, visualization, modeling, algorithm tuning, and much more...

### 368 Responses to How to Convert a Time Series to a Supervised Learning Problem in Python

1. Mikkel May 8, 2017 at 7:07 pm #

Hi Jason, thanks for your highly relevant article 🙂

I am having a hard time following the structure of the dataset. I understand the basics of t-n, t-1, t, t+1, t+n and so forth. Although, what exactly are we describing here in the t and t-1 column? Is it the change over time for a specific explanatory variable? In that case, wouldn’t it make more sense to transpose the data, so that the time were described in the rows rather than columns?

Also, how would you then characterise following data:

Customer_ID Month Balance
1 01 1,500
1 02 1,600
1 03 1,700
1 04 1,900
2 01 1,000
2 02 900
2 03 700
2 04 500
3 01 3,500
3 02 1,500
3 03 2,500
3 04 4,500

Let’s say, that we wanna forcast their balance using supervised learning, or classify the customers as “savers” or “spenders”

• Jason Brownlee May 9, 2017 at 7:40 am #

Yes, it is transposing each variable, but allowing control over the length of each row back into time.

• Mostafa March 2, 2018 at 2:43 am #

Hi Jason, thanks for very helpful tutorials, I have the same question as Mikkel.

how would you then characterise following data?

let’s suppose we have a dataset same as the following.
and we want to predict the Balance of each Customer at the fourth month, how should I deal with this problem?

Customer_ID Month Balance
1 01 1,500
1 02 1,600
1 03 1,700
1 04 1,900
2 01 1,000
2 02 900
2 03 700
2 04 500
3 01 3,500
3 02 1,500
3 03 2,500
3 04 4,500

• Jason Brownlee March 2, 2018 at 5:35 am #

Test different framing of the problem.

Try modeling all customers together as a first step.

• Raha August 13, 2020 at 12:35 am #

Hi Jason I also have a similar dataset where we are looking at deal activity over a number of weeks and noting whether they paid early or not in a particular time period. I am trying to predict who is likely to pay early (0 for No and 1 for Yes). Can you explain a bit more what you mean by modeling all customers together as a first step. Please see sample data below:

Deal Date Portfolio Prepaid
1 1/1/18 A 0
1 1/8/18 A 0
1 1/15/18 A 0
1 1/22/18 A 1
2 1/1/18 B 0
2 1/8/18 B 0
2 1/15/18 B 0
2 1/22/18 B 0
3 1/1/18 A 0
3 1/8/18 A 0
3 1/15/18 A 0
3 1/22/18 A 1
4 1/1/18 B 0
4 1/8/18 B 0
4 1/15/18 B 0
4 1/22/18 B 1

• Jason Brownlee August 13, 2020 at 6:18 am #

The idea is whether it makes sense to model across subjects/sites/companies/etc. Or to model each standalone. Perhaps modeling across subjects does not make sense for your project.

• Yavuz June 21, 2018 at 11:08 pm #

Hi Mostafa,

I am dealing with a similar kind of problem right now. Have you found any simple and coherent answer to your question? Any article, code example or video lecture?

I appreciate if you found something and let me know.

Thanks, regards.

• Sandipan Banerjee March 19, 2019 at 12:20 am #

This is similar to my Fluid Mechanics problem too, where in the customer id is replaced by the location of unique point in the 2-d domain (x,y coordinates of the point), and the balance can be replaced by velocities. I, too could not find any help online regarding handling these type of data.

• WangGang June 25, 2018 at 10:16 pm #

I would like to ask if I have the data for the first 5 hours, how to get the data for the sixth hour, Thanks

• Abhimanyu September 20, 2019 at 1:39 am #

How can I detect patterns in time series data? Suppose I have a time series InfluxDB where I am storing the total number of online players every minute, and I want to know when the number of players shows flat-line behavior. The flat line could be at 1 million or at 100 or at 1000..

• Jason Brownlee September 20, 2019 at 5:48 am #

Perhaps you can model absolute change in an interval?

2. Daniel May 9, 2017 at 4:56 pm #

Hey Jason,

this is an awesome article! I was looking for that the whole time.

The only thing is I generally program in R, so I only found something similar to your code, but I am not sure if it is the same. I got this from https://www.r-bloggers.com/generating-a-laglead-variables/ and it deals with lagged and leaded values. Also the output includes NA values.

shift <- function(x, shift_by){
  stopifnot(is.numeric(shift_by))
  stopifnot(is.numeric(x))

  if (length(shift_by) > 1)
    return(sapply(shift_by, shift, x = x))

  out <- NULL
  abs_shift_by <- abs(shift_by)
  if (shift_by > 0)
    out <- c(tail(x, -abs_shift_by), rep(NA, abs_shift_by))
  else if (shift_by < 0)
    out <- c(rep(NA, abs_shift_by), head(x, -abs_shift_by))
  else
    out <- x
  out
}

Output:
1 1 3 NA
2 2 4 NA
3 3 5 1
4 4 6 2
5 5 7 3
6 6 8 4
7 7 9 5
8 8 10 6
9 9 NA 7
10 10 NA 8

I also tried to recompile your code in R, but it failed.

• Jason Brownlee May 10, 2017 at 8:44 am #

I would recommend contacting the authors of the R code you reference.

• chris May 12, 2017 at 3:28 am #

Can you answer this in Go, Java, C# and COBOL as well????? Thanks, I really don’t want to do anything

• Jason Brownlee May 12, 2017 at 7:45 am #

I do my best to help, some need more help than others.

• José Luis Sydor September 19, 2019 at 3:51 am #

lol

• Jason Brownlee September 19, 2019 at 6:06 am #

I know. You should see some of the “can you do my assignment/project/job” emails I get 🙂

3. Lee May 9, 2017 at 11:40 pm #

Hi Jason, good article, but could be much better if you illustrated everything with some actual time series data. Also, no need to repeat the function code 5 times 😉 Gripes aside, this was very timely as I’m just about to get into some time series forecasting, so thanks for this article!!!

4. Christopher May 12, 2017 at 9:16 pm #

Hi Jason,
thank you for the good article! I really like the shifting approach for reframing the training data!
But my question about this topic is: What do you think is the next step for a one-step univariate forecasting? Which machine learning method is the most suitable for that?
Obviously a regressor is the best choice but how can I determine the size of the sliding window for the training?

Thanks a lot for your help and work
~ Christopher

5. tom June 8, 2017 at 4:06 pm #

hi Jason:
In this post, you create new framings of time series, such as t-1, t, t+1. But what is the use of these time series? Do you mean these time series can have a good effect on the model? Maybe my question is too simple, because I am new to this, please understand! Thank you!

• Jason Brownlee June 9, 2017 at 6:19 am #

I am providing a technique to help you convert a series into a supervised learning problem.

This is valuable because you can then transform your time series problems into supervised learning problems and apply a suite of standard classification and regression techniques in order to make forecasts.

• tom June 9, 2017 at 11:40 am #

• Jason Brownlee June 10, 2017 at 8:12 am #

You’re welcome.

• Josh August 26, 2021 at 6:03 am #

Hi Jason. Fantastic article & useful code. I have a question. Once we have added the additional features, so we now have t, t-1, t-2 etc, can we split our data in to train/test sets in the usual way? (Ie with a shuffle). My thinking is yes, as the temporal information is now included in the features (t-1, t-2, etc).
Would be great to hear your thoughts.

• Adrian Tam August 27, 2021 at 5:44 am #

That’s correct. The whole point of the conversion is to create intervals from the time series, which the model is to consider only the interval but not anything more (and no memory from data outside of the interval). In this case, shuffling the intervals are fine. But shuffling within an interval is not.

6. Brad Suzon June 23, 2017 at 11:32 pm #

If there are multiple variables varXi to train and only one variable varY to predict will the same technique be used in the below way:
varX1(t-1) varX2(t-1) varX1(t) varX2(t) … varY(t-1) varY(t)
.. .. .. .. .. ..
and then use linear regression and as Response= varY(t) ?

• Jason Brownlee June 24, 2017 at 8:03 am #

• Brad June 25, 2017 at 4:47 pm #

In case there are multiple measures and then make the transformation in order to forecast only varXn:

var1(t-1) var2(t-1) var1(t) var2(t) … varN(t-1) varN(t)

linear regression should use as the response variable the varN(t) ?

7. Geoff June 24, 2017 at 8:10 am #

Hi Jason,
I’ve found your articles very useful during my capstone at a bootcamp I’m attending. I have two questions that I hope you could advise where to find better info about.
First, I’ve run into an issue with running PCA on the newly supervised version of the data. Does PCA recognize that the lagged series are actually the same data? If one wants to do PCA, do they need to perform it before supervising the data?
Secondly, what do you propose as the best learning algorithms and proper ways to perform train test splits on the data?
Thanks again,

8. Kushal July 1, 2017 at 1:31 pm #

Hi Jason

Great post.

Just one question. If the some of the input variables are continuous and some are categorical with one binary, predicting two output variables.

How does the shift work then?

Thanks
Kushal

• Jason Brownlee July 2, 2017 at 6:26 am #

The same, but consider encoding your categorical variables first (e.g. number encoding or one hot encoding).

• Kushal July 15, 2017 at 5:22 pm #

Thanks

Should I then use the lagged versions of the predictors?

Kushal

• Jason Brownlee July 16, 2017 at 7:57 am #

9. Viorel Emilian Teodorescu July 8, 2017 at 9:45 am #

great article, Jason!

10. Chinesh August 10, 2017 at 5:15 pm #

I am working on developing an algorithm which will predict the future traffic for a restaurant. The features I am using are: day, whether there was a festival, temperature, climatic condition, current rating, whether there was a holiday, service rating, number of reviews, etc. Can I solve this problem using time series analysis along with these features? If yes, how?

11. Hossein August 23, 2017 at 1:16 am #

Great article Jason. Just a naive question: How does this method different from moving average smoothing? I’m a bit confused!
Thanks

• Jason Brownlee August 23, 2017 at 6:56 am #

This post is just about the framing of the problem.

Moving average is something to do to the data once it is framed.

12. pkl520 August 26, 2017 at 10:29 pm #

Hi , Jason! Good article as always~

I have a question.

“Running the example shows the differentiation of input (t-n) and output (t+n) variables with the current observation (t) considered an output.”

values = [x for x in range(10)]
data = series_to_supervised(values, 2, 2)
print(data)

var1(t-2) var1(t-1) var1(t) var1(t+1)
2 0.0 1.0 2 3.0
3 1.0 2.0 3 4.0
4 2.0 3.0 4 5.0
5 3.0 4.0 5 6.0
6 4.0 5.0 6 7.0
7 5.0 6.0 7 8.0
8 6.0 7.0 8 9.0

So above example, var1(t-2) var1(t-1) are input , var1(t) var1(t+1) are output, am I right?

Then,below example.

raw = DataFrame()
raw[‘ob1’] = [x for x in range(10)]
raw[‘ob2’] = [x for x in range(50, 60)]
values = raw.values
data = series_to_supervised(values, 1, 2)
print(data)
Running the example shows the large reframed DataFrame.

var1(t-1) var2(t-1) var1(t) var2(t) var1(t+1) var2(t+1)
1 0.0 50.0 1 51 2.0 52.0
2 1.0 51.0 2 52 3.0 53.0
3 2.0 52.0 3 53 4.0 54.0
4 3.0 53.0 4 54 5.0 55.0
5 4.0 54.0 5 55 6.0 56.0
6 5.0 55.0 6 56 7.0 57.0
7 6.0 56.0 7 57 8.0 58.0
8 7.0 57.0 8 58 9.0 59.0

var1(t-1) var2(t-1) are input, var1(t) var2(t) var1(t+1) var2(t+1) are output.

can u answer my question? I will be very appreciate!

• Jason Brownlee August 27, 2017 at 5:48 am #

Yes, or you can interpret and use the columns any way you wish.

13. Thabet August 30, 2017 at 7:36 am #

Thank you Jason!!
You are the best teacher ever

14. Charles September 29, 2017 at 12:24 am #

Jason,

I love your articles! Keep it up! I have a generalization question. In this data set:

var1(t-1) var2(t-1) var1(t) var2(t)
1 0.0 50.0 1 51
2 1.0 51.0 2 52
3 2.0 52.0 3 53
4 3.0 53.0 4 54
5 4.0 54.0 5 55
6 5.0 55.0 6 56
7 6.0 56.0 7 57
8 7.0 57.0 8 58
9 8.0 58.0 9 59

If I was trying to predict var2(t) from the other 3 data, would the input data X shape would be (9,1,3) and the target data Y would be (9,1)? To generalize, what if this was just one instance of multiple time series that I wanted to use. Say I have 1000 instances of time series. Would my data X have the shape (1000,9,3)? And the input target set Y would have shape (1000,9)?

Is my reasoning off? Am I framing my problem the wrong way?

Thanks!
Charles

15. Sean Maloney October 1, 2017 at 5:24 pm #

Hi Jason!

I’m really struggling to make a new prediction once the model has been build. Could you give an example? I’ve been trying to write a method that takes the past time data and returns the yhat for the next time.

Thanks you.

16. Sean Maloney October 1, 2017 at 5:28 pm #

P.S. I’m the most stuck at how to scale the new input values.

• Jason Brownlee October 2, 2017 at 9:38 am #

Any data transforms performed on training data must be performed on new data for which you want to make a prediction.

• Vikram August 1, 2019 at 7:20 pm #

But what if we don’t have that target variable in the dataset? Take the air pollution problem as an example: I want to predict future values based on some expected values of other variables, just like we do in regression, where we train our model on the training dataset, then test it, then make predictions for new data where we don’t know anything about the target variable. But in LSTM with Keras, when we make predictions on new data that has one variable less than the training dataset (like air pollution), we get a shape mismatch…

I am struggling with this from last week and haven’t found a solution yet….

• Jason Brownlee August 2, 2019 at 6:47 am #

You can frame the problem anyway you wish.

Think about it in terms of one sample, e.g. what are the inputs and what is the output.

Once you have that straight, shape the training data to represent that and fit the model.

17. Nish October 23, 2017 at 11:42 am #

Hi Jason,
This is great, but what if I have around ten features (say 4 categorical and 6 continuous), a couple of thousand data points per day, around 200 days worth of data in my training set? The shift function could work in theory but you’d be adding hundreds of thousands of columns, which would be computationally horrendous.
In such situations, what is the recommended approach?

• Jason Brownlee October 23, 2017 at 4:11 pm #

Yes, you will get a lot of columns.

18. Shud November 1, 2017 at 5:37 pm #

Hey Jason,

I converted my time series problem into regression problem and i used GradientBoostingRegressor to model the data. I see my adjusted R-squared keep changing everytime i run the model. I believe this is because of the correlation that exists between the independent variable (lag variables). How to handle this scenario? Though the range of fluctuation is small, i am concerned that this might be a bad model

19. Nitin Gupta November 13, 2017 at 10:10 pm #

Hey Jason,

I applied the concept that you have explained to my data and used linear regression. Can I expand this concept to polynomial regression also, by squaring the t-1 terms?

• Jason Brownlee November 14, 2017 at 10:12 am #

Sure, let me know how you go.

20. Samuel November 15, 2017 at 9:43 pm #

Hey Jason,

thanks a lot for your article! I already read a lot of your articles. These articles are great, they really helped me a lot.

But I still have a rather general question, that I can’t seem to wrap my head around.

The question is basically:
In which case do I treat a supervised learning problem as a time series problem, or vice versa?

For further insight, this is my problem I am currently struggling with:
I have data out of a factory (hundreds of features), which I can use as my input.
Additionally I have the energy demand of the factory as my output.
So I already have a lot of input-output-pairs.
The energy demand of the factory is also the quantity I want to predict.
Each data point has its own timestamp.
I can transform the timestamp into several features to take trends and seasonality into account.
Subsequently I can use different regression models to predict the energy demand of the factory.
This would then be a classical supervised regression problem.

But as I unterstood it from your time series articles, I could as well treat the same problem as a time series problem.
I could use the timestamp to extract time values which I can use in multivariate time series forecasting.

In most examples you gave in your time series articles, you had the output over time.
https://machinelearningmastery.com/reframe-time-series-forecasting-problem/
And in this article you shifted the time series to get an input, in order to treat the problem as a supervised learning problem.

So let’s suppose you have the same number of features in both cases.
Is it a promising solution to change the supervised learning problem to a time series problem?
What would be the benefits and drawbacks of doing this?

As most regression outputs are over time.
Is there a general rule, when to use which framing (supervised or time series) of the problem?

I hope, that I could phrase my confusion in an ordered fashion.

Thanks a lot for your time and help, I really appreciate it!

Cheers Samuel

• Jason Brownlee November 16, 2017 at 10:29 am #

To use supervised learning algorithms you must represent your time series as a supervised learning problem.

Not sure I follow what you mean by tuning a supervised learning problem into a series?

• Samuel November 29, 2017 at 10:19 pm #

Dear Jason,

I’m sorry that I couldn’t frame my question comprehensibly, I’m still new to ML.
I’ll try to explain what I mean with an example.

Let’s suppose you have the following data, I adapted it from your article:
https://machinelearningmastery.com/time-series-forecasting-supervised-learning/

input1(time), input2, output
1, 0.2, 88
2, 0.5, 89
3, 0.7, 87
4, 0.4, 88
5, 1.0, 90

This data is, what you would consider a time series. But as you already have 2 inputs and 1 output you could already use the data for supervised machine learning.
In order to predict future outputs of the data you would have to know input 1 and 2 at timestep 6. Let’s assume you know from your production plan in a factory that the input2 will have a value of 0.8 at timestep 6 (input1). With this data you could gain y_pred from your model. You would have treated the data purely as a supervised machine learning problem.

input1(time), input2, output
1, 0.2, 88
2, 0.5, 89
3, 0.7, 87
4, 0.4, 88
5, 1.0, 90
6, 0.8, y_pred

But you could do time-series forecasting with the same data as well, if I understood your articles correctly.

input1(time), input2, output
nan, nan, 88
1, 0.2, 89
2, 0.5, 87
3, 0.7, 88
4, 0.4, 90
5, 1.0, y_pred

In which case do I treat the data as a supervised learning problem and in which case as a time series problem?
Is it a promising solution to change the supervised learning problem to a time series problem?
What would be the benefits and drawbacks of doing this?
As my regression outputs are over time.
Is there a general rule, when to use which framing (supervised or time series) of the problem?

I hope, that I stated my questions more clearly.

Best regards Samuel

• Jason Brownlee November 30, 2017 at 8:16 am #

I follow your first case mostly, but time would not be an input, it would be removed and assumed. I do not follow your second case.

I believe it would be:

What is best for your specific data, I have no idea. Try a suite of different framings (including more or less lag obs) and see which models give the best skill on your problem. That is the only trade-off to consider.

21. MJ November 18, 2017 at 12:46 am #

22. Michael November 30, 2017 at 6:47 am #

Jason:
Thank you for all the time and effort you have expended to share your knowledge of Deep Learning, Neural Networks, etc. Nice work.

I have altered your series_to_supervised function in several ways which might be helpful to other novices:
(1) the returned column names are based on the original data
(2) the current period data is always included so that leading and lagging period counts can be 0.
(3) the selLag and selFut arguments can limit the subset of columns that are shifted.

There is a simple set of test code at the bottom of this listing:

• Jason Brownlee November 30, 2017 at 8:30 am #

Very cool Michael, thanks for sharing!

• MonkeeYe June 25, 2019 at 4:54 pm #

23. Maciej December 1, 2017 at 7:11 am #

When I do forecasting, let’s say only one step ahead, as the first input value I should use any value that belongs i.e. to validation data (in order to set up initial state of forecasting). In second, third and so on prediction step I should use previous output of forecasting as input of NN. Do I understand correctly ?

• Jason Brownlee December 1, 2017 at 7:46 am #

I think so.

• Maciej December 2, 2017 at 4:15 am #

Ok, so another question. In the blog post here: https://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/, as an input for NN you use test values. The predictions are only saved to a list and they are not used to predict further values of timeseries.

My question is. Is it possible to predict a series of values knowing only the first value ?
For example. I train a network to predict values of sine wave. Is it possible to predict next N values of sine wave starting from value zero and feeding NN with result of prediction to predict t + 1, t + 2 etc ?

• Maciej December 2, 2017 at 4:18 am #

If my above understanding is incorrect then it means that if your test values are completely different than those which were used to train network, we will get even worse predictions.

• Jason Brownlee December 2, 2017 at 9:05 am #

Yes. Bad predictions in a recursive model will give even worse subsequent predictions.

Ideally, you want to get ground truth values as inputs.

• Jason Brownlee December 2, 2017 at 9:04 am #

Yes, this is called multi-step forecasting. Here is an example:
https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/

• Maciej December 3, 2017 at 5:34 am #

Does it mean that using multi-step forecast (let’s say I will predict 4 values) I can predict a timeseries which contains 100 samples providing only initial step (for example providing only first two values of the timeseries) ?

• Jason Brownlee December 4, 2017 at 7:40 am #

Yes, but I would expect the skill to be poor – it’s a very hard problem to predict so many time steps from so little information.
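The recursive strategy discussed in this thread (feeding each prediction back in as input for the next step) can be sketched as follows; `NaiveModel` is a hypothetical stand-in for any one-step forecaster, not code from the tutorial:

```python
# Recursive multi-step forecasting sketch: feed each prediction back
# as input for the next step.

def recursive_forecast(model, history, n_steps):
    """Predict n_steps ahead, one step at a time."""
    preds = []
    window = list(history)
    for _ in range(n_steps):
        yhat = model.predict(window)   # one-step-ahead prediction
        preds.append(yhat)
        window = window[1:] + [yhat]   # slide the window forward
    return preds

class NaiveModel:
    """Stand-in model: predicts the last observed value (persistence)."""
    def predict(self, window):
        return window[-1]

print(recursive_forecast(NaiveModel(), [1, 2, 3], n_steps=3))  # [3, 3, 3]
```

With a real model, errors compound at each step, which is why skill degrades quickly over long horizons.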

24. Liz January 12, 2018 at 7:02 am #

Hello Mr. Brownlee,

thank you for all of your nice tutorials. They really help!
I have two questions about the input data for an LSTM for multi-step predictions.
1. If I have multiple features that I use as input for the prediction and at a point (t) I have no new values for any of them, do I have to predict all my input features in order to make a multi-step forecast?
2. If some of my input data is binary data and not continuous can I still predict it with the same LSTM? Or do I need a separate Classification?

Sorry if its very basic, I am quite new to LSTM.
Best regards Liz

• Jason Brownlee January 12, 2018 at 11:49 am #

No, you can use whatever inputs you choose.

Sure you can have binary inputs.

• Liz January 13, 2018 at 1:18 am #

Unfortunately I still have some trouble with the implementation.
If I use feature_1 and feature_2 as input for my LSTM but only predict feature_1 at time (t+1), how do I make the next step to know feature_1 at time (t+2)?
Somehow I seem to be missing feature_2 at time (t+1) for this approach.
Could you tell me where I am off?
Best regards Liz

• Jason Brownlee January 13, 2018 at 5:34 am #

Perhaps double check your input data is complete?

25. strawberry lv January 31, 2018 at 6:41 pm #

Hello,thank you for the article and i have learned a lot from it.
Now i have a question about it.
The method can be understood as using previous values to forecast the next value. If I need to forecast the values at t+1, …, t+N, do I need to use the model to first forecast the value at t+1, then use that value to forecast t+2, and so on until t+N?
Or do you have any other method?

• Arslan Ahmed March 17, 2018 at 9:01 am #

Hi,
I am working on energy consumption data and I have the same question. Did you get to know any efficient method to forecast the value at t+1, t+2, t+3 + …… t+N?

26. Sameer January 31, 2018 at 11:05 pm #

Hello Dr.Brownlee,

I’m planning to purchase your Introduction to Time Series Forecasting book. I just want to know whether you’ve covered multivariate and multi-step LSTMs.

27. Victor February 21, 2018 at 5:10 am #

Hi Jason,

Thanks for the article. I have a question about going back n periods in terms of choosing the features. If I have a feature and make for example 5 new features based off of some lag time, my new features are all very highly correlated (between 0.7 and 0.95). My model is resulting in training score of 1 and test score of 0.99. I’m concerned that there is an issue with multicollinearity between all the lag features that is causing my model to overfit. Is this a legitimate concern and how could I go about fixing it if so? Thanks!

• Jason Brownlee February 21, 2018 at 6:42 am #

Try removing correlated features, train a new model and compare model skill.
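One sketch of the suggestion above, using synthetic data and an illustrative 0.95 threshold (both are assumptions, not a prescription):

```python
import numpy as np
from pandas import DataFrame

# Build 5 lag features from a synthetic random walk, then drop any
# feature whose absolute correlation with an earlier one exceeds 0.95.
rng = np.random.default_rng(1)
base = rng.normal(size=100).cumsum()
df = DataFrame({f'lag{k}': np.roll(base, k) for k in range(1, 6)})[5:]

corr = df.corr().abs()
# keep only the strictly upper triangle so each pair is checked once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
reduced = df.drop(columns=to_drop)
```

After dropping, refit and compare test-set skill against the full feature set, as suggested.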

28. Ram Seshadri February 21, 2018 at 12:06 pm #

Dear Jason:

My sincere thanks for all you do. Your blogs were very helpful when I started on the ML journey.

I read this blog post for a Time Series problem I was working on. While I liked the “series_to_supervised” function, I typically use data frames to store and retrieve data while working in ML. Hence, I thought I would modify the code to send in a dataframe and get back a dataframe with just the new columns added. Please take a look at my revised code.

Usage:

Please take a look and let me know. Hope this helps others,
Ram

• Jason Brownlee February 22, 2018 at 11:14 am #

Very cool Ram, thanks for sharing!

• Varun Gupta February 14, 2021 at 4:11 am #

Thanks a ton Ram! You’re a saviour

29. Marius Terblanche February 26, 2018 at 11:33 pm #

Dear Jason,
great article, as always!
Once the time series data (say for multi-step, univariate forecasting) has been prepared using the code described above, is it then ready (and in the 3D structure required) for feeding into the first hidden layer of an LSTM RNN?
May be dumb question!
Marius.

30. MikeF March 7, 2018 at 12:49 pm #

Hi Jason, thanks for this post. It’s simple enough to understand. However, after converting my time series data I found some feature values are from the future and won’t be available when trying to make predictions. How do you suggest I work around this?

31. Adarsh March 27, 2018 at 3:11 pm #

I have a dataset like this:

accno dateofvisit
12345 12/05/15 9:00:00
123345 13/06/15 13:00:00
12345 12/05/15 13:00:00

How will I forecast when that customer will visit again?

32. Fatima April 10, 2018 at 6:32 pm #

Hi,

I need to develop input vector which uses every 30 minutes prior to time t for example:

input vector is like (t-30,t-60,t-90,…,t-240) to predict t.

If I wanna use your function for my task, Is it correct to change the shift function to df.shift(3*i) ?

Thanks

• Jason Brownlee April 11, 2018 at 6:33 am #

One approach might be to selectively retrieve/remove columns after the transform.

• fatima April 12, 2018 at 7:14 pm #

Hi,

So I should take these steps:

1- transform for different lags
2- select the column related to the first lag (for example 30 min)
3- transform for other lags
4- concatenate along axis=1

When I perform these steps, it seems the result is equivalent to shifting by 3.
I have some questions:
Which one is better to use (shift by 3, or the steps above)?
Should I remove time t after each transform and just keep time t for the last lag?
Thanks

• Jason Brownlee April 13, 2018 at 6:37 am #

Use an approach that you feel makes the most sense for your problem.
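As a sketch of the column-selection idea discussed above, one can also build only the lag columns needed; the offsets [3, 6, 9] below are illustrative (e.g. every 30 minutes at 10-minute sampling):

```python
from pandas import DataFrame, concat

# Build lag features at custom offsets by shifting only the wanted lags.
series = DataFrame({'obs': range(12)})
lags = [3, 6, 9]
cols = {f't-{k}': series['obs'].shift(k) for k in lags}
cols['t'] = series['obs']
frame = concat(cols, axis=1).dropna()  # dict keys become column names
print(frame.head())
```

This avoids creating and then discarding the intermediate lag columns.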

33. vishwas April 16, 2018 at 3:15 pm #

Hi Jason,

Amazing article for creating supervised series. But I have a doubt,
Suppose I wanted to predict sales for the next 14 days using daily sales historical data. Would that require me to take 14 lags to predict the next 14 days?
Ex: (t-14, t-13, …, t-1) to predict (t, t+1, t+2, …, t+14)

• Jason Brownlee April 17, 2018 at 5:53 am #

No, the in/out obs are separate. You could have 1 input and 14 outputs if you really wanted.

• Vishwas April 17, 2018 at 3:31 pm #

Thanks for the quick response Jason!!

• Shaun November 19, 2019 at 8:39 am #

Still confused.
For example, now we are at time t and want to predict t+5.
Do we need data at t+1, t+2, t+3, t+4 first?

Thanks, Jason

34. Sanketh Nagarajan April 17, 2018 at 8:37 am #

Hi Jason,

I want to predict if the next value will be higher or lower than the previous value. Can I use the same method to frame it as a classification problem?
For example:

V(t) class

0.2 0
0.3 1
0.1 0
0.5 0
2.0 1
1.5 0

where class zero represents a decrease and class 1 represents an increase?

Thanks,
Sanketh
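One way to frame the above as supervised classification (a sketch, with an illustrative labelling where class 1 means the next value is higher than the current one):

```python
from pandas import DataFrame

# Label each row 1 if the next value is higher than the current one.
df = DataFrame({'v': [0.2, 0.3, 0.1, 0.5, 2.0, 1.5]})
df['next'] = df['v'].shift(-1)
df['cls'] = (df['next'] > df['v']).astype(int)
df = df.dropna()  # the final row has no "next" value
```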

35. brandon May 7, 2018 at 11:17 pm #

Hi Jason, really nice explanations in your blog. When I have a shifted dataframe of shape e.g. (180, 20), how can I get back to my original data with shape (200, 1)?

• Jason Brownlee May 8, 2018 at 6:14 am #

You will have to write custom code to reverse the transform.
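A minimal sketch of such custom code, assuming a plain sliding window with no dropped rows (a toy example, not from the post):

```python
from pandas import DataFrame

# Rebuild the original series from a windowed frame: the first column
# holds all but the last (window - 1) values; the tail of the last row
# holds the rest.
series = list(range(10))
window = 3
rows = [series[i:i + window] for i in range(len(series) - window + 1)]
frame = DataFrame(rows)
recovered = list(frame.iloc[:, 0]) + list(frame.iloc[-1, 1:])
```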

36. Farooq Arshad May 8, 2018 at 8:23 pm #

Hi Jason,

Amazing article.
I have a question: suppose I want to move the window by 24 steps instead of just one step; what modifications do I have to make in this case?
E.g. I have energy data with a one-hour interval and I want to predict the next 24 hours (1 day) looking at the last 21 days (504 hours); then for the next prediction I want to move the window by 24 hours (1 day).

• Jason Brownlee May 9, 2018 at 6:23 am #

Perhaps re-read the description of the function to understand what arguments to provide to the function.

37. Alex May 19, 2018 at 5:43 am #

Models blindly fit on data specified like this are guaranteed to overfit.

Suppose you estimate model performance with a cross-validation procedure and you have folds:

Fold1 (January)
Fold2 (February)
Fold3 (March)
Fold4 (April)

Consider a model fit on folds 1, 2 and 4. Now you are predicting some feature for March based on the value of that feature in April!

If you choose to use a lagged regressor matrix like this, please please please look into appropriate model validation.

One good resource is Hyndman’s textbook, available freely online: https://otexts.org/fpp2/accuracy.html

38. marc May 23, 2018 at 10:28 am #

Hi Jason, really nice blog, I’ve learned a lot from you. I implemented an LSTM encoder-decoder with sliding windows. The prediction was nearly the same as the input; is it usual that this happens with sliding windows? I am a bit surprised, because the model saw only a little part of the data in training, yet later predicted almost the same as the input. That makes me think I might be doing something wrong. I don’t want to post the code, it is just standard LSTM encoder-decoder code, but the fact that the model saw only a little part of the data in training is confusing me.

39. james May 23, 2018 at 1:15 pm #

Hi Jason, your code is great, but I have a question. When I change the window size (from reframed = series_to_supervised(scaled, 1, 1) to reframed = series_to_supervised(scaled, 2, 1)), I get bad predictions. How can I solve this, or what causes it?
Please take a look at my revised code.

from math import sqrt
from numpy import concatenate
from matplotlib import pyplot
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# load dataset (this line was missing from the posted code; it follows
# the original tutorial)
dataset = read_csv('pollution.csv', header=0, index_col=0)
values = dataset.values
# integer encode direction
encoder = LabelEncoder()
values[:,4] = encoder.fit_transform(values[:,4])
# ensure all data is float
values = values.astype('float32')
# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
# frame as supervised learning
reframed = series_to_supervised(scaled, 2, 1)
# drop the time-t columns we don't want to predict; with n_in=2 and 8
# features these are columns 17-23, keeping var1(t) as the target
# (the posted code dropped 9-15, which removed the t-1 inputs instead)
reframed.drop(reframed.columns[[17,18,19,20,21,22,23]], axis=1, inplace=True)

# split into train and test sets
values = reframed.values
n_train_hours = 365*24
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]

# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

# design network (layer and compile lines were missing from the posted
# code; these follow the original tutorial)
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=50, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)
# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
# make a prediction
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, 1:8]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:8]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)
# plot prediction and actual
pyplot.plot(inv_yhat[:100], label='prediction')
pyplot.plot(inv_y[:100], label='actual')
pyplot.legend()
pyplot.show()

• Jason Brownlee May 23, 2018 at 2:40 pm #

The model may require further tuning for the change in problem.

• james May 23, 2018 at 4:51 pm #

I noticed that your code takes into account the effect of the last time point on the current time point. But this is not applicable in many cases. What are the optimization ideas?

• Jason Brownlee May 24, 2018 at 8:08 am #

Most approaches assume that the observation at t is a function of prior time steps (t-1, t-2, …). Why do you think this is not the case?

• james May 24, 2018 at 11:45 am #

Oh, maybe I didn’t describe my question clearly. My question is: why consider just t-1? When considering (t-1, t-2, t-3), the example you gave has poor performance.

• Jason Brownlee May 24, 2018 at 1:51 pm #

No good reason, just demonstration. You may change the model to include any set of features you wish.

40. Ishrat Sarwar May 24, 2018 at 2:21 pm #

Dear Sir:
I have 70 input time series. I only need to predict 1, 2 or 3 time series out of the input (70 features) time series. Here are my questions:

-> Should I use an LSTM for such a problem?
-> Should I predict all 70 time series?
-> If not an LSTM, then what approach should I use?

• Jason Brownlee May 25, 2018 at 9:17 am #

Great questions!

– Try a suite of methods to see what works.
– Try different amounts of history and different numbers of forward time steps, find a sweet spot suitable for your project goals.
– Try classical ts methods, ml methods and dl methods.

41. marc June 1, 2018 at 6:50 pm #

Hi Jason, I have a huge dataset with small steps between the time series values; they barely change until the last cycles. I thought maybe I should not shift only by 1. How can I shift more, e.g. t-1 and t by 20 steps? Does this make sense?

• Jason Brownlee June 2, 2018 at 6:27 am #

Not sure I follow, sorry. Perhaps give a small example?

• marc June 2, 2018 at 9:55 pm #

lets say I have this data:

5
6
7
8
9
10
11
12
and usually if you make sliding windows, shifting them by 1 from t-2 to t
5 6 7
6 7 8
7 8 9
8 9 10
9 10 11
10 11 12
11 12
12

how can I do shifting not by 1 but maybe 3 looking at the first row (in this case) or more from t-2 to t:
5 8 11
6 9 12
7 10
8 11
9 12
10
11
12

I ask that because my data range is so small that shifting by 1 is not having much effect and thought maybe something like this could help. How do I have to adjust your codes for supervised learning to do that. And do you think this is a good idea ?

• Jason Brownlee June 3, 2018 at 6:24 am #

Specify the lag (n_in) as 3 to the function.
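Note that n_in=3 gives consecutive lags (t-3, t-2, t-1). If instead a stride between the lag values themselves is wanted, as in the example above, a small custom sketch (not part of the original function) could look like this:

```python
# Build windows whose lag values are `stride` steps apart instead of
# consecutive, e.g. [5, 8, 11] from the series below.
def strided_windows(series, n_lags, stride):
    span = n_lags * stride
    return [[series[i + k * stride] for k in range(n_lags + 1)]
            for i in range(len(series) - span)]

print(strided_windows([5, 6, 7, 8, 9, 10, 11, 12], n_lags=2, stride=3))
# [[5, 8, 11], [6, 9, 12]]
```

Only complete windows are returned; the trailing partial windows from the hand-worked example are dropped, since they cannot be used for training anyway.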

42. Bootstrap June 17, 2018 at 10:44 am #

Hi Jason!

Once I apply this function to my data, what’s the best way to split the data between train and test set?

Normally I would use sklearn’s train_test_split, which can shuffle the data and apply a split based on a user-set proportion. However, intuitively, something tells me this is incorrect; rather, I would need to split the data based on the degree of shift(). Could you please clarify?

43. brad June 27, 2018 at 5:15 am #

When I give the function a sliding window of 20, series_to_supervised(values, 20), my new data shape is (None, 21), where None is variable. Why do I get 21? Do I need to remove the last column, or how do I move on? Thanks a lot for your posts.

• Jason Brownlee June 27, 2018 at 8:22 am #

I would guess 20 for the input 1 for the output.

Confirm this by printing the head() of the returned data frame.

44. lara June 28, 2018 at 7:54 am #

why we must convert it into a supervised learning for lstm problem ?

• Jason Brownlee June 28, 2018 at 2:04 pm #

Because the LSTM is a supervised learning algorithm.

45. vinsondo July 11, 2018 at 9:55 am #

Hi Jason, I love the articles. Thank you very much.

I have seen you have the multiple time series inputs to predict time series output.
I have a different input feature setup and try to figure it out how to implement them and use RNN to predict the time series output.

Let’s say I have 7 input features, feature1 to feature7 in which feature1 is a time series.
feature2 to feature5 is a scalar value and feature6 and feature7 are the scalar vectors.

Another way to describe the problem, for a given single value from feature2 to feature5, (ex, 2,500, 7Mhz, 10000, respectively), and a given range of values in Feature6 and Feature7, (ex, feature6 is array [2,6,40,12,….,100] and feature7 is array [200,250,700,900,800,….,12]. Then, I need to predict the times series output from the time series input feature1.

How do I design all these 7 feature inputs to the RNN?
If you have a book that cover this, please let me know. Thank you.

• Jason Brownlee July 11, 2018 at 2:55 pm #

If you have a series and a pattern as input (is that correct?), you can have a model with an RNN for the series and another input for the pattern, e.g. a multi-headed model.

Or you can provide the pattern as an input with each step of the series along with the series data.

Try both approaches, and perhaps other approaches, and see what works best for your problem.

46. James Adams July 26, 2018 at 11:04 pm #

Thank you for this helpful article, Jason.

In case it’s helpful to others, I’ve modified the function to be used for converting time series data over an entire DataFrame, for use with multivariate data when a DataFrame contains multiple columns of time series data, [available here](https://gist.github.com/monocongo/6e0df19c9dd845f3f465a9a6ccfcef37).

47. James August 1, 2018 at 7:05 am #

Hi Jason,

This article was really helpful as a starting point in my adventure into LSTM forecasting. Along with a couple of your other articles, I was able to create a multivariate, multiple-time-step LSTM model. Just a thought on the article itself: you used a really complicated data structure (I think I ended up with an array of arrays and individual values very quickly) when something simpler would do and be more easily adaptable. Overall, though, this was a very good tutorial and helped me understand the basics for my own project.

• Jason Brownlee August 1, 2018 at 7:51 am #

Thanks James.

Do you have a suggestion of something simpler?

48. Martin Šomodi August 15, 2018 at 8:27 pm #

Love and appreciate the article – helped me a lot with my master’s work in the beginning. I still have lot of work and studying to do, but this tutorial along with “Multivariate Time Series Forecasting with LSTMs in Keras” helped me to understand basics of working with keras and data preparation. Keep up the good work 🙂

• Jason Brownlee August 16, 2018 at 6:03 am #

Thanks, I’m happy to hear that.

49. Juan Carlos Vargas Sosa August 16, 2018 at 5:52 am #

Hi Jason,

Thanks for the effort you put in all the blogs that you have shared with all of us.
I want to share a small contribution of simpler series_to_supervised function. I think it only works in Python 3.

50. Xu August 17, 2018 at 1:57 pm #

Hi Jason,

Thanks for your posts. My question is: for the classification problem, is OK using the same way to reframe the data?
Best
Xu

51. Carlos B August 21, 2018 at 2:07 am #

Hi Jason,

Your site is always so helpful! I’m slightly confused here though. If I have a time series dataset that already consists of some input variables (VarIn_1 to VarIn_3) and the corresponding output values (Out_1 and Out_2), do I still need to run the dataset through the series_to_supervised() function before fitting to my LSTM model?

Example dataset:
Time Win, VarIn_1, VarIn_2, VarIn_3, Out_1, Out_2
1, 5, 3, 7, 2, 3
2, 6, 2, 4, 3, 1
3, 4, 4, 6, 1, 4
…, …, …, …, …, …,

Best wishes,
Carl

52. Julien August 29, 2018 at 8:24 am #

Dear Jason,
Thank you so much for your great efforts.

I am trying to predict day ahead using the h2o package in r. below i.e glm model.

glm_model <- glm(RealPtot ~ ., data= c(input3, target), family=gaussian)

Then I calculate the MAPE for each day using :

mape_calc <- function(sub_df) {
  pred <- predict.glm(glm_model, sub_df)
  actual <- sub_df$Real_data
  mape <- 100 * mean(abs((actual - pred)/actual))
  new_df <- data.frame(date = sub_df$date[[1]], mape = mape)
  return(new_df)
}

# LIST OF ONE-ROW DATAFRAMES
df_list <- by(test_data, test_data$date, mape_calc)

# FINAL DATAFRAME
final_df <- do.call(rbind, df_list)

I am trying to implement the same above code using h2o, but I am facing difficulties in data conversion in the h2o environment. Any thoughts will be appreciated. Thanks in advance.

• Jason Brownlee August 29, 2018 at 9:18 am #

Sorry, I don’t have any experience with h2o, perhaps contact their support?

53. BenniEvolent September 10, 2018 at 5:46 pm #

Jason your articles are great. I do not mind code repetition, it does take care of issues newbies might face. The Responses section is also a big help. Thanks!

54. Aladji Diallo September 13, 2018 at 12:13 am #

I wonder how you get rid of the dates. I am trying to use your method to make predictions for time series, but I have the date as the index.

• Jason Brownlee September 13, 2018 at 8:05 am #

Remove the column that contains the dates. You can do this in code or in the data file directly (e.g. via excel).
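In code this can be as simple as the sketch below (the column name 'date' and the values are hypothetical):

```python
from pandas import DataFrame

# Set the date column aside before modelling; pop() removes it from
# the frame but keeps it available for later (e.g. for plotting).
df = DataFrame({'date': ['2018-01-01', '2018-01-02'], 'obs': [1, 2]})
dates = df.pop('date')
```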

55. Andy September 14, 2018 at 12:22 am #

Hello Jason,
nice post, I have a question regarding the train/test split in this case:
E.g. I now take the first 80 % of the rows as training data and the rest as test data.
Would it be considered data leakage since the last two samples in the training data contain the first values of the test set as targets (values for t, t+1)?

• Jason Brownlee September 14, 2018 at 6:37 am #

Nope.

• Andy September 18, 2018 at 4:46 am #

Hi Jason,

thanks for your response, but why is that?
Maybe I wasn’t clear, but I found what I wanted to say in a post on medium:
https://medium.com/apteo/avoid-time-loops-with-cross-validation-aa595318543e

See their second visualization, they call it “look ahead gap” which excludes the ground truth data of the last prediction step in the training set from the test set.

What do you think about that? Is that common practice?

• Jason Brownlee September 18, 2018 at 6:23 am #

I have seen many many many papers use CV to report results for time series and they are almost always invalid.

You can use CV, but be very careful. If results really matter, use walk-forward validation. You cannot mess it up.

It's like coding: you can use "goto", but don't.

• Andy September 18, 2018 at 9:25 am #

They also argue against classical CV, they actually do use walk-forward validation (I think their usage of the term “walk forward cross validation” is a little misleading).
So yes, I am definitely using walk forward validation!

Let me illustrate my question with a simplified example:

If we have this time series:
[1, 3, 4, 5, 6, 1]

I would split the data into training set
[1, 3, 4, 5]

… and test set
[ 6, 1]

I would do this before converting it into a supervised problem.
So if I do the conversion to a supervised problem now, I will end up with this for my training set:

t | t+1
1 | 3
3 | 4
4 | 5
5 | NaN

For the 4th row, I do not have a value for t+1, since it is not part of the training set. If I took the value 6 from my test set here, I would include information about the test set.
So here I would only train up to the 3rd row, since that is the last complete entry.

For the test I would then use this trained model to predict t+1 following the value 6.
This leads to a gap, since I will not receive a prediction for the fourth row in this iteration (the “look ahead gap”?).

If I were to convert the series into a supervised problem before the split, this issue (is it one?) doesn’t become as clear, but I would remove the last row of the training set in this case, since it contains the first value of my test set as a target.

So, can I convert first and then split or do I need to split first, then convert like in the example?
The underlying question is, if “seeing” or not “seeing” the value of following time step as a target, has an influence on the performance of the prediction in following time step?
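The split-first scheme worked through above can be sketched as follows; `to_pairs` is an illustrative stand-in for series_to_supervised:

```python
# Split first, then convert each part, so no test-set value appears
# as a training target.
series = [1, 3, 4, 5, 6, 1]
train, test = series[:4], series[4:]

def to_pairs(s):
    # (t, t+1) pairs; the last value has no target and is dropped
    return [(s[i], s[i + 1]) for i in range(len(s) - 1)]

print(to_pairs(train))  # [(1, 3), (3, 4), (4, 5)]
```

Note the incomplete 4th training row (5, NaN) simply never exists under this ordering, which is the "look ahead gap" being described.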

• Jason Brownlee September 18, 2018 at 2:23 pm #

Sounds like you’re getting caught up.

Focus on this: you want to test the model the way you intend to use it.

If data is available to the model prior to the need for a prediction, the model should/must make use of that data in order to make the best possible prediction. This is the premise of walk-forward validation.

Concerns of train/test data only make sense at the point of a single prediction and its evaluation. It is not leakage to “see” data that was part of the test set for the prior forecast, unless you do not expect to use the model in that way. In which case, change the configuration of walk-forward validation from one-step to two-step or whatever.

Does that help at all?

• Andy September 18, 2018 at 5:54 pm #

I was caught up and it helps to think about what will be available when making predictions.

My problem was that I am doing a direct 3-step ahead forecast, so there are three “dead“ rows before each further prediction step, since I need 3 future values for a complete training sample (they are not really dead since I use the entries as t+1, t+2, and t+3 at t).

• Jason Brownlee September 19, 2018 at 6:16 am #

They will have real prior obs at the time a prediction is being made. So train and eval your model under that assumption.

56. SRIKANTH October 31, 2018 at 9:49 pm #

I sincerely thank you for this information. Great job!!!!! I hope for more articles from you in future.

I understand the concepts from these two articles:
Convert-time-series-supervised-learning-problem-python and Time-series-forecasting-supervised-learning.

Now I want to predict and set a boolean value, either TRUE or FALSE, based on either the latitude and longitude or a geohash value. For this, how can I use multivariate forecasting?
I am completely new to this area; please suggest directions and I will follow them.

Thanks in advance. I am doing this in Python 3 on my Mac mini.

• Jason Brownlee November 1, 2018 at 6:20 am #

Sounds like a time series classification task.

57. FP November 4, 2018 at 3:02 pm #

Hi Jason,

How can I apply the lag only to variable 1 in multivariate time series? In other words, I have 5 variables, but would only to lag variable 1?

• Jason Brownlee November 5, 2018 at 6:09 am #

One approach is to use the lag obs from one variable as features to a ML model.

Another approach is to have a multi-headed model, one for the time series and one for the static variables.
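The first approach can be sketched as lagging only the chosen column; the names and values below are illustrative:

```python
from pandas import DataFrame

# Add lag features for var1 only, leaving the other variables as-is.
df = DataFrame({'var1': range(6), 'var2': [10] * 6})
for k in (1, 2):
    df[f'var1(t-{k})'] = df['var1'].shift(k)
df = df.dropna()  # drop the rows with incomplete lags
```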

58. Babak December 18, 2018 at 5:27 am #

Hi

I guess the following line in the code samples above:

n_vars = 1 if type(data) is list else data.shape[1]

should be rewritten as:

n_vars = 1 if type(data) is list else data.shape[0]

• Babak December 18, 2018 at 6:00 am #

OK I see, actually it’s correct the way it is, i.e. data.shape[1], but if you pass a numpy array, then its rank should be 2, not 1.

So this doesn’t work (the program will crash):

values = array([x for x in range(10)])

But this one does:

values = array([x for x in range(10)]).reshape([10, 1])

Sorry for confusion.

• Jason Brownlee December 18, 2018 at 6:05 am #

No, shape[1] refers to columns in a 2d array.

59. mk December 21, 2018 at 3:58 pm #

One-step univariate forecasting uses the previous time step (t-1) as the input variable to forecast the current time step (t).
If we don’t know t-1, we cannot forecast the current time step (t).
E.g. 1: 1, 2, 3, 4, 5, 6; there is no 7, so how to forecast the 9?

Randomly missing values:
E.g. 2: 1, 3, 4, 6; there are no 2 and 5, so how to forecast the 7?

THANKS

• Jason Brownlee December 22, 2018 at 6:02 am #

There are many ways to handle missing values, perhaps this will help:
https://machinelearningmastery.com/handle-missing-timesteps-sequence-prediction-problems-python/

• Jon November 19, 2019 at 5:24 pm #

Jason,
In his example, 1, 3, 4, 6, assume these are daily demand data. For example, date 1/1 has 1 unit sold, on date 1/2 the shop is closed, date 1/3 has 3 units sold, etc. Do we need to regard date 1/2 as having a missing value? Or just ignore it?

Date 1/2, the shop is closed for public holiday.

Jon

• Jason Brownlee November 20, 2019 at 6:09 am #

Try ignoring it and try imputing the missing value and compare the skill of resulting models.

60. Dazhi December 25, 2018 at 12:40 am #

Hi, Jason.
I have one question about multivariate multi-step forecasting. For example, another air pollution forecasting problem (not the one your tutorial showed), with 9 features in total. I want the output to be just the air pollution. I use 3 time steps of history to predict the next 3 time steps, so:
train_X and test_X are: 'var1(t-3)', 'var2(t-3)', 'var3(t-3)', 'var4(t-3)', 'var5(t-3)',
'var6(t-3)', 'var7(t-3)', 'var8(t-3)', 'var9(t-3)', 'var1(t-2)',
'var2(t-2)', 'var3(t-2)', 'var4(t-2)', 'var5(t-2)', 'var6(t-2)',
'var7(t-2)', 'var8(t-2)', 'var9(t-2)', 'var1(t-1)', 'var2(t-1)',
'var3(t-1)', 'var4(t-1)', 'var5(t-1)', 'var6(t-1)', 'var7(t-1)',
'var8(t-1)', 'var9(t-1)',
train_y and test_y are: 'var1(t)', 'var1(t+1)', 'var1(t+2)' (I dropped the columns I did not want).
I used MinMaxScaler for normalization. If the output were a single step, I could easily invert the value. However, I have three output values, and since I used MinMaxScaler I don’t know how to invert the scaling with 3 outputs. Could you please give me some advice? Thank you very much!

• Jason Brownlee December 25, 2018 at 7:24 am #

Perhaps use the function to get the closest match, then modify the list of columns to match your requirements.

61. Prajwal Shrestha December 28, 2018 at 12:02 am #

Hi! I’m a novice at best at this, and am trying to create a forecasting model. I have no idea what to do with the “date” variable in my dataset. Should I just remove it and add a row index variable instead for the purpose of modeling?

• Jason Brownlee December 28, 2018 at 5:58 am #

Discard the date and model the data, assuming a consistent interval between observations.

62. Prajwal Shrestha December 28, 2018 at 12:24 am #

One more question, how do I export the new dataframe with t+1 and t-1 variables to a csv file?
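A minimal sketch, assuming the frame returned by series_to_supervised is named `reframed` (the frame contents below are stand-ins; a StringIO buffer stands in for a filename like 'supervised.csv'):

```python
from io import StringIO
from pandas import DataFrame

# Export the reframed dataframe, lag columns and all, with to_csv().
reframed = DataFrame({'var1(t-1)': [1, 2], 'var1(t)': [2, 3]})
buf = StringIO()
reframed.to_csv(buf, index=False)
```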

63. Rajesh January 5, 2019 at 7:50 am #

Hello Jason,

what you’re doing for machine learning should earn you a Nobel peace prize. I constantly refer to multiple entries on your website and slowly expand my understanding, getting more knowledgeable and confident day-by-day. I’m learning a ton, but there is still a lot to learn. My goal is to get good at this within the next 5 months, and then unleash machine learning on a million projects I have in mind. You’re enabling this in clear and concise ways. Thank you.

• Jason Brownlee January 6, 2019 at 10:13 am #

Thanks, I’m very happy that the tutorials are helpful!

64. Jayashree January 11, 2019 at 4:13 pm #

Hi Jason,
Thanks for the nice tutorial 🙂 I am working on a prototype of a student evaluation system where I have the scores of students for term 1 and term 2 for the past 5 years, along with 3 dependent features. I need to predict the score of a student from the second year onward till one year in the future. I need your guidance on how to create a model that takes whatever data is available from the past to predict the current score.

Thanks.

65. Leen January 22, 2019 at 4:46 am #

Hi Jason,

I have my data in a time series format (t-1, t, t+1), where the days (the time component in my data) are chronological (one following the other). However, in my project I’m required to subset this one data frame into 12 sub data frames (according to certain filtering criteria – I am filtering by some specific column values), and after I do the filtering and come up with these 12 data frames, I am required to do forecasting on each one separately.

My question is: the time component in each of these 12 data frames is not chronological anymore (days are not following each other; for example, the first row’s date is 10-10-2015 and the second row’s date is 20-10-2015 or so). Is that okay? And will it create problems in forecasting later on? If it will, what shall I do in this case?

• Jason Brownlee January 22, 2019 at 6:28 am #

I’m not sure I follow, sorry.

As long as the model is fit and makes predictions with input-output pairs that are contiguous, it should be okay.

66. daniele January 25, 2019 at 12:51 pm #

Hi Jason, what is the best way to split the dataset into training and testing?

• Jason Brownlee January 26, 2019 at 6:07 am #

It depends on your data, specifically how much you have and what quality. Perhaps test different sized splits and evaluate models to see what looks stable.

67. Sk January 28, 2019 at 10:32 am #

Hi Jason,

Say I have a classification problem. I have 100 users and I have their sensing data for 60 days. The sensing data is aggregated over each day. For each user, I have say 2 features. What I am trying to do is perform binary classification — I ask them to choose a number at the start, either 0 or 1, and I am trying to classify each user into one of those classes, based on their 60 days of sensing data.

So I got the idea that I could convert it to a supervised problem like you suggested in following way:

day 60 feat 1, day 60 feat 2, day 59 feat 1, day 59 feat 2.. day 1 feat 1, day 1 feat 2, LABEL

Is this how I should be converting my dataset? But that would mean that I’ll only have 100 unique rows to train on, right?

So far, I was thinking I could structure the problem like this, but I wonder if I’m violating the independent assumption of supervised learning. For each user, I have their record for each day as a row and the same label they selected at the start as a label column.

Example: for User 1:

date, feat 1, feat 2, label
day 1, 1, 2, 1
day 2, 2, 1, 1
…………………
day 60, 1, 2, 1

This way I’d have 100×60 records to train on.

My question is: Is the first way I framed the data correct and the second way incorrect? If that is the case, then I’d have only 100 records to train on (one for each user) and that’d mean that I cannot use deep learning models for that. In such a case, what traditional ml approach can you recommend that I can try looking into? Any help is appreciated.

Thank you so much!

• Jason Brownlee January 28, 2019 at 11:49 am #

The latter way looks sensible or more traditional, but there are no rules. You have control over what a “sample” represents – e.g. how to frame the problem.

I would strongly encourage you to explore multiple different framings of the problem to see what makes the most sense or works best for your specific dataset. Maybe model users separately, maybe not. Maybe group days or not. etc.

Maybe this framework will help with brainstorming:
http://machinelearningmastery.com/how-to-define-your-machine-learning-problem/

68. Daniel January 31, 2019 at 4:49 am #

Hi Jason, thanks a lot for the article!

I have two questions:

1) In “n_vars = 1 if type(data) is list else data.shape[1]”, shouldn’t n_vars be the length of the data collections in the list, like “n_vars = len(transitions[0]-1) if type(transitions) is list else transitions.shape[1]”?

2) In “for i in range(0, n_out): cols.append(df.shift(-i))”, shouldn’t it be “df.shift(-i+1)”?

Thanks!

• Jason Brownlee January 31, 2019 at 5:37 am #

Why do you suggest these changes, what issues do they fix exactly?

69. Mike February 11, 2019 at 9:12 am #

Hi Jason
Your posts are awesome. They have saved me a ton of time on understanding time series forecasting. Thanks a lot.

I have tested all types of time series forecasting using your codes (multi-step, multivariate, etc.) and it works fine, but I have a problem getting the actual predictions back from the model.

For instance on the pollution data, and trying the stuff in the wonderful post at:
https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/
I am looking to predict three time steps ahead (t, t+1 and t+2) of not one but two features (the ones that are labeled var1(t), var2(t), var1(t+1), var2(t+1), var1(t+2), var2(t+2)). I arrange everything (the dimensions and so on) to fit the structure of what I am looking for; for instance I use:
reframed = series_to_supervised(scaled, 3, 3)
and I drop the columns which are related to features other than the two that I want to predict (I do this for all three time steps ahead).

But after the model is fit (which also looks really fine), when I try the command:
yhat = model.predict(test_X)
I figure out that the yhat number of columns is always 1, which is weird since it is expected to be 6 (the predicted values for var1(t), var2(t), var1(t+1), var2(t+1), var1(t+2), var2(t+2)).
Am I missing something here?

• Jason Brownlee February 11, 2019 at 2:09 pm #

The shape of the input for one sample when calling predict() must match the expected shape of the input when training the model.

This means, the same number of timesteps and features.

Perhaps this will help:
https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/

• Mike February 11, 2019 at 9:11 pm #

It matches that expected shape, it is in fact the same test_X used for validation when fitting the model.
The reframed data looks like this:
var1(t-3) var2(t-3) var3(t-3) var4(t-3) var5(t-3) var6(t-3) \
3 0.129779 0.352941 0.245902 0.527273 0.666667 0.002290
4 0.148893 0.367647 0.245902 0.527273 0.666667 0.003811
5 0.159960 0.426471 0.229508 0.545454 0.666667 0.005332
6 0.182093 0.485294 0.229508 0.563637 0.666667 0.008391
7 0.138833 0.485294 0.229508 0.563637 0.666667 0.009912

var7(t-3) var8(t-3) var1(t-2) var2(t-2) … var5(t-1) \
3 0.000000 0.0 0.148893 0.367647 … 0.666667
4 0.000000 0.0 0.159960 0.426471 … 0.666667
5 0.000000 0.0 0.182093 0.485294 … 0.666667
6 0.037037 0.0 0.138833 0.485294 … 0.666667
7 0.074074 0.0 0.109658 0.485294 … 0.666667

var6(t-1) var7(t-1) var8(t-1) var1(t) var2(t) var1(t+1) var2(t+1) \
3 0.005332 0.000000 0.0 0.182093 0.485294 0.138833 0.485294
4 0.008391 0.037037 0.0 0.138833 0.485294 0.109658 0.485294
5 0.009912 0.074074 0.0 0.109658 0.485294 0.105634 0.485294
6 0.011433 0.111111 0.0 0.105634 0.485294 0.124748 0.485294
7 0.014492 0.148148 0.0 0.124748 0.485294 0.120724 0.470588

var1(t+2) var2(t+2)
3 0.109658 0.485294
4 0.105634 0.485294
5 0.124748 0.485294
6 0.120724 0.470588
7 0.132797 0.485294

[5 rows x 30 columns]

So, the shape of the train and test data prior to fitting the model are like this:
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
(8760, 1, 24) (8760,) (35035, 1, 24) (35035,)

And after fitting the model I am calling test_X with shape (35035, 1, 24) to be predicted but still it gives me yhat with shape (35035, 1).

What’s wrong?

• Mike February 11, 2019 at 9:34 pm #

I just realized the second dimension (8760, 1, 24) and (35035, 1, 24) should be set to 3. But doing this and fitting the model again does not change the dimension of yhat.

• Jason Brownlee February 12, 2019 at 8:01 am #

No, exactly. Input shape and output shape are unrelated.

• Mike February 11, 2019 at 10:51 pm #

Is it because of the dense layer? Since maybe Dense(1) at the end of the sequential model returns dimension 1?
However, if I change this 1 in the dense layer to, for example, 3, I get an error about mismatched dimensions. So confused right now, and a lot of searching did not help.

• Jason Brownlee February 12, 2019 at 8:03 am #

Correct.

If you change the model to predict a vector with 3 values per sample, you must change the training data to have 3 values per sample in y.

• Jason Brownlee February 12, 2019 at 7:59 am #

That suggests one output value for each input sample, exactly how you have designed the model.

Perhaps I don’t understand your intent?

70. Mike February 12, 2019 at 2:25 am #

I figured it out Jason. It was because of the dense layer; I had to set it to Dense(6), along with some modifications in the shape of the train and test data.
Thanks again
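Mike’s fix can be sketched in terms of array shapes alone (a hypothetical illustration with random data, not the original pollution code): with 6 target columns kept after the reframe, y must carry 6 values per sample to match a final Dense(6) layer.

```python
import numpy as np

# hypothetical reframed data: 30 columns, where the last 6 are the targets
# var1(t), var2(t), var1(t+1), var2(t+1), var1(t+2), var2(t+2)
values = np.random.rand(100, 30)

X = values[:, :24]  # 24 lag inputs (8 features x 3 time steps)
y = values[:, 24:]  # 6 target values per sample

# reshape inputs to [samples, timesteps, features] for an LSTM
X = X.reshape((X.shape[0], 3, 8))

print(X.shape, y.shape)  # (100, 3, 8) (100, 6)
```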

71. Areej February 14, 2019 at 7:05 am #

Hello,

How can I introduce a sequence of images into the forecasting problem?

Thanks

• Jason Brownlee February 14, 2019 at 8:51 am #

Perhaps use a CNN-LSTM type model?

72. Alex Torex February 23, 2019 at 4:38 am #

Hi , my problem is how to classify time series.

I have a series of user events which happen at various distances in time and I want to classify the type of user by the events he is producing.

How do I pass the data to the LSTM?

73. Henry Lawson March 19, 2019 at 6:06 am #

Hi Jason,

What approach would you recommend for a modelling problem where I have many time series (in this case each for a different patient), but the measurements are not taken at regular intervals. In fact, there are many concurrent time series, all with different, varying, sample times. The data is sometimes very sparse (no measurements for days) and sometimes very dense (many measurements in one hour), so I don’t want to lose information by interpolating.

I want to train a model on a subset of the patients and use it to predict for other patients. How would you recommend formatting the data?

• Jason Brownlee March 19, 2019 at 9:04 am #

I would recommend testing a range of different framings of the problem.

For example, try normalizing the intervals, try zero padding, try persistence padding, try ignoring the intervals, try limiting history to a fixed window, try only modeling a fixed window, etc. See what is learnable/useful.

74. Gauranga Das March 24, 2019 at 4:53 pm #

I am trying to get a line plot of the final results but, I get a bar graph instead.

CODE:

# plot history
plt.plot(inv_yhat, label='forecast')
plt.plot(inv_y, label='actual')
plt.legend()
plt.show()

• Jason Brownlee March 25, 2019 at 6:42 am #

Perhaps confirm that the data is an array of floats?

75. josh malina March 30, 2019 at 5:22 am #

For your function “series_to_supervised”, I like the dropna feature, but I could imagine that a user would not want to drop rows in the middle of her dataset that just happened to be NaNs. Instead, she might just like to chop the first few and the last few.

• Jason Brownlee March 30, 2019 at 6:33 am #

Yes, it is better to take control over the data preparation process and specialize it for your dataset.

76. josh malina April 2, 2019 at 5:44 am #

What’s an easy way to convert this to input required by a Keras LSTM? I would assume we would use multi-indices

• Jason Brownlee April 2, 2019 at 8:18 am #
• josh malina April 3, 2019 at 1:33 am #

Thanks Jason, I understand what it should look like, but it’s really non-trivial going from your series_to_supervised function to these three dimensional tensors, requiring more pandas multiindex, pivot-tabling know-how than I have 🙂

• Jason Brownlee April 3, 2019 at 6:45 am #

Once you have the 2D matrix, you can directly reshape it with a call to reshape().
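For example, a sketch of that reshape with NumPy (the values are made up; three lag columns as input and one column as target):

```python
import numpy as np

# hypothetical 2D output of series_to_supervised(): 50 rows,
# 3 lag columns as input plus 1 target column
data = np.arange(200.0).reshape((50, 4))
X, y = data[:, :-1], data[:, -1]

# LSTM input expects the 3D shape [samples, timesteps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape, y.shape)  # (50, 3, 1) (50,)
```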

• josh malina April 3, 2019 at 6:18 am #

I have a solution, it’s a bit ugly, but it works!

77. Juan_A April 5, 2019 at 1:34 am #

Hi Jason,

I have a doubt about your function “series_to_supervised”…in my case, I have a time series of speed data with an extra column of “datetime” information in which the traffic measures were taken.

I want to keep the “datetime” also as an input feature within your function, but without adding lagged variables for it. Any idea about how to proceed?

Regards

• Jason Brownlee April 5, 2019 at 6:18 am #

You may have to write some custom code, which will require a little design/trial and error/ and unit tests.

78. Emin April 8, 2019 at 2:58 am #

Hello Jason,

Thank you for the post. I have a question regarding classification task. Let’s say we take this series_to_supervised approach. But in that case, our goal is to predict our original values at time ‘t’, correct? What if the target is binary, let’s say? Thank you.

• Emin April 8, 2019 at 3:04 am #

I would also like to add that if this approach is taken, then our original target column that contains 0/1 classes will have more samples than the transformed data frame (due to the dropna command).

• Jason Brownlee April 8, 2019 at 5:57 am #

It sounds like you are describing time series classification.

I have an example here that may help as a starting point:
https://machinelearningmastery.com/start-here/#deep_learning_time_series

• Emin April 8, 2019 at 8:02 am #

Well, while I agree with you that this is a classification problem (see my first post), if there is a need to predict a class (0/1) in advance, this becomes a prediction problem, correct?

I went through the linked URL before, and if I remember correctly, you have a couple of time-series classification examples but none of the “let’s try to predict Class 0 one day in advance” kind.

• Jason Brownlee April 8, 2019 at 1:56 pm #

Yes, whether the classification is for the current or future time step, is just a matter of framing – e.g. little difference.

What problem are you having exactly?

• Emin April 8, 2019 at 11:46 pm #

My problem is the following . We have a set of targets associated with every time step. For example:

X y
0.2 0
0.5 1
0.7 1
0.8 0

We perform shift once and we get:

X (t-1) X(t) y
NAN (or 0) 0.2 0
0.2 0.5 1
0.5 0.7 1
0.7 0.8 0

Correct?

Now, my problem is the following: If I use X(t-1) as my input, my target sample will be larger than X(t-1). So in this case, how can I relate/connect my lag timesteps (X(t-1), X(t-2) and so on) to my classes?

• Jason Brownlee April 9, 2019 at 6:26 am #

Each row connects the inputs and outputs.

• Emin April 9, 2019 at 7:29 am #

So, I guess the correct thing would be to apply series_to_supervised to my target class as well, and use y (t) as the target variable, while y (t-1), y (t-2),…,y(t-k) will be used as my inputs, along with X(t-1), X(t-2), …,X(t-k).

Does my approach sound like a correct one? Thank you.

• Jason Brownlee April 9, 2019 at 2:37 pm #

Perhaps try it and see if it is viable.

• aimendezl August 28, 2019 at 9:51 pm #

Hi Emin, I’m working on a similar problem. I have a daily series and for each day I have a label (0,1), and I would like to use an LSTM to predict the label for the next day. Did you manage to solve the issue of how to transform the input variables? Could you shed some light on the matter if you did?

79. Emin April 8, 2019 at 11:54 pm #

Btw, it will be larger, because in several cases (at least in mine), adding 0 is not a correct thing to do, as 0 represents something else related to the domain. So, if we have NaN values and we drop them, our input will be the size of (3,) and our target will be the size of (4,).

• Jason Brownlee April 9, 2019 at 6:27 am #

Correct, we lose observations when we shift.

80. Emin April 13, 2019 at 3:12 am #

Jason, I have one more question. In the case of framing the problem as lagging, our time series has to be in descending order, correct? So, 11:00 AM (row 1), 0:00 AM (row 2). In that case, when we shift by 1 down, we essentially try to predict the original value at time t.

To demo with an example:
12:00 PM 0.5
11:00 AM 1.2
10:00 AM 0.3

Once shift is applied with lag_step=1 =>

12:00 PM NaN
11:00 AM 0.5
10:00 AM 1.2
9:00 AM 0.3

By doing so, we essentially shift all the values to the past and try to minimize the error between the real observed values at the original time (t) and the modeled values at the same original time (t).

Unfortunately, all the examples I have found so far, model the time in ascending order:
http://www.spiderfinancial.com/support/documentation/numxl/reference-manual/transform/lag

It will be great if you can clarify.

Thank you.

• Jason Brownlee April 13, 2019 at 6:40 am #

Yes, data must be temporally ordered in a time series.

Typically ascending order, oldest to newest in time.
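A quick sketch with the Pandas shift() function shows how this works with ascending order (the values here are hypothetical): shifting down by one gives each row the previous observation as input, with nothing from the future.

```python
import pandas as pd

# a tiny series in ascending time order (oldest to newest), hypothetical values
df = pd.DataFrame({"t": [0.3, 1.2, 0.5]})

# shift(1) pushes values down one row, so each row sees the prior observation
df["t-1"] = df["t"].shift(1)
print(df)
#      t  t-1
# 0  0.3  NaN
# 1  1.2  0.3
# 2  0.5  1.2
```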

• Emin April 13, 2019 at 9:23 am #

Well, in that case, if we take (t-1) as our input, don’t we face a problem with data leakage? We are pushing the value one step down, which essentially means that we push our data into the future.

• Jason Brownlee April 13, 2019 at 1:48 pm #

No. It comes down to how you choose to define the problem – what you are testing and how.

81. Ahmed May 4, 2019 at 1:45 am #

Jason,

Do we still have to worry about removing trend and seasonality if we use this approach?

• Jason Brownlee May 4, 2019 at 7:11 am #

It depends on the model used.

Typically, removing trend and seasonality prior to the transform makes the problem simpler to model.

82. Alla May 27, 2019 at 5:21 am #

Hi,
I tried this simple code to do the example in your book. I just started reading it today.

import numpy as np

x = []
y = []
d = np.arange(12)
for i in range(len(d)):
    if i + 3 <= 11:
        x.append(d[i:i+3])
        y.append(d[i+3])
print(x)
print(y)

Results:
x = [array([0, 1, 2]), array([1, 2, 3]), array([2, 3, 4]), array([3, 4, 5]), array([4, 5, 6]), array([5, 6, 7]), array([6, 7, 8]), array([7, 8, 9]), array([ 8, 9, 10])]

y= [3, 4, 5, 6, 7, 8, 9, 10, 11]

83. Alla Abdella May 27, 2019 at 6:38 am #

Thank you Jason. I really enjoyed this tutorial.

• Jason Brownlee May 27, 2019 at 6:52 am #

You’re welcome, I’m glad it helped.

84. Carlos June 6, 2019 at 12:20 am #

Hi Jason,
Thanks for the interesting article.
I’m having trouble with univariate forecasts that have multiple observations. I don’t see that case here and don’t see how to translate it to TimeSeriesGenerator.
Say, I have series of 100 observations and I want to train a RNN to predict the next observation after 50. This is easily done with TimeSeriesGenerator.
But now, what if I have 1000 of these series of 100 observations? Concatenating them is no good, as observation 101 has no relation with observation 99. Can I still use TimeSeriesGenerator in this scenario?
Thanks!

85. Ala June 22, 2019 at 7:25 am #

Hi Jason. Your tutorials are amazing.
1- Do you have any book about time series prediction using LSTMs? (I know you have a book about time series prediction with classical methods.)

2- Do you know how to predict multiple values for a multivariate time series? (Do you have any tutorial, or can you tell me what setting I should change in the Keras LSTM?) Here is the detailed description:

Assume my multivariate series after converting it to the supervised learning problem is (the first 4 numbers are input and the last 4 are output):

1 2 3 4 5 6 7 8

5 6 7 8 9 10 11 12

9 10 11 12 13 14 15 16

I would like to learn all 4 of the output values. Do you know if it is possible with your methods, and what do I need to change so it is multivariate with more than one output and my target or output is 4-dimensional? I would appreciate it if you can help.

86. Mat June 22, 2019 at 1:48 pm #

Hi Jason. One question that I could not find any tutorial about that in your website.

Assume I have multiple very similar time series generated for the same event, say for a dataset you use a lot like the Shampoo dataset. Assume instead of one dataset I have 100 separate datasets (consisting of time series from 1993-2000). How can I use all these data for training? I know how to train an LSTM for one, but then how do you keep training for all of them? I cannot concatenate the time series or sort them according to values since they might be multivariate.

I will be grateful if you can help me solve this problem

87. George July 29, 2019 at 4:38 pm #

How can I use AR model coefficients to generate feature inputs for a machine learning algorithm such as SVM?

Now I can find the AR coefficients with this code:

model = AR(train)
model_fit = model.fit()
print('Lag: %s' % model_fit.k_ar)
print('Coefficients: %s' % model_fit.params)

but I don’t know how to use the coefficients to extract feature inputs for SVM.
Thank you.

• Jason Brownlee July 30, 2019 at 6:02 am #

Sorry, I’m not familiar with the approach that you’re describing.

88. George July 30, 2019 at 1:54 am #

How can I use AR model coefficients to generate feature inputs for machine learning such as SVM?

Now I can find the AR coefficients with this code:

model = AR(train)
model_fit = model.fit()
print('Lag: %s' % model_fit.k_ar)
print('Coefficients: %s' % model_fit.params)

How do I create feature inputs (model_fit.params) for SVM from the AR model coefficients? Thank you.

• Jason Brownlee July 30, 2019 at 6:18 am #

Sorry, I don’t have a tutorial on this topic, so I cannot give you good off-the-cuff advice.

89. Soumya Sourav August 3, 2019 at 5:13 am #

Hello Jason. Excellent article to get started with. I just have two questions. Let’s say I have three variables: time, current and voltage, and voltage is my target variable to be predicted. So if I transform my dataset according to the mentioned techniques and train the model, and then I get new data (validation) with just time and current as input and I need to predict the output voltage over the period of time given in the validation set, how would I transform it so that my model is able to predict the voltage?
Also, how would you suggest moving ahead if I have panel data in my dataset?

• Jason Brownlee August 3, 2019 at 8:16 am #

Design the model based on one sample, e.g. what data you want to provide as input and what you need as output.

Then prepare data to match, fit and model and evaluate.

90. aimendezl August 28, 2019 at 10:11 pm #

Hi Jason, thank you for all your tutorials. These have been a tremendous help so far. I am now working on a classification problem with time series and I would like to know if you could help me with some advice.
I have a daily time series, and for each day there’s a label (0,1), and I would like to reframe the problem as binary classification (predict the label of the next day), but I am having trouble with the format of the input variables and the target variable.

Do you think the function series_to_supervised() can be applied here? And if so, how could I do it?

I am thinking the following as a very naive experiment to check if this works. Label the days as (0) if the value of the variable dropped or (1) if it goes up (This can of course be done by a simple if statement but it’s for this example’s sake only)

My data then would look something like this:

——————————-

date var1 label
——————————-
1 10 nan
2 20 1
3 9 0
4 8 0

if I apply series_to_supervised(data,1,1) the data set would look something like

——————————————

date var1(t-1) var1(t) label(t)
——————————————
1 10 20 1
2 20 9 0
3 9 8 0

then to define my input/output pairs:

X = supervised.drop('label(t)', axis=1)
y = supervised['label(t)']

is this approach correct? It seems to me the label should be very easy to learn for any ANN, but I’m not sure this format of the input/output is correct.

I would appreciate any advice on this topic. And thanks again for the amazing books and tutorials.

• Jason Brownlee August 29, 2019 at 6:10 am #

It may be, only you will know for sure – it’s your data.

91. sara September 8, 2019 at 4:44 am #

Hi Jason ,
Thanks for the tutorial.
I have a financial time series, and I turned the series into a supervised learning problem. I want to predict the t+1 value using the previous 60 days. My question is about the prediction method that I have to use after XGBoost and SVR. Is what I am trying to do OK?

• Jason Brownlee September 8, 2019 at 5:21 am #

I recommend testing a suite of methods in order to discover what works best for your specific dataset.

92. Jaydeep September 12, 2019 at 6:24 pm #

Hi Jason,

As always a great article, thanks a ton. I just want to ask how the approach of converting a Time Series problem to a Supervised learning problem compares against treating it as a Time Series problem? I think with advanced techniques such as LSTMs we should no longer need to convert a Time Series problem to a Supervised learning problem.

93. buttonpol September 30, 2019 at 2:01 am #

Great article, it helped me a lot.

I’ve added a little modification (I had a similar but very simpler function).

I wanted to use the data at time t-n1, t-n2, …, t-nx to forecast values at (let’s say) time t+n7, and I wanted to avoid a new step for deleting the intermediate t+n1, t+n2, …, t+n6. So I added a window parameter to do that.

It is not so great improvement, but it helped me.

For future versions, it would be nice to handle actual column names in the input data, for automated post processing (i.e. if original column name is “date”, output would be “date t-1”).
Later I would give it a try.

• Jason Brownlee September 30, 2019 at 6:15 am #

Thanks for sharing!

Great suggestion.

94. Eli September 30, 2019 at 3:24 pm #

Hey Jason,

I’ve been going through all your LSTM examples and I’ve been using your ‘series_to_supervised’ function for preprocessing. For the life of me, though, I can’t wrap my head around what I’m supposed to do when I need to reshape data for n_in values and n_out values greater than 1.

For instance I used the function with 50 timesteps (n_in = 50) to predict another 50 values in the future (n_out = 50). I have over 300,000 individual samples with 19 observations each, so the output of the function, ‘agg’, is understandably quite large.

Onto reshaping. My intuition tells me to input the tuple (train_X.shape[0], 50, train_X.shape[1]) for reshaping my training X data. This throws an error. What am I doing wrong here? Is your ‘series_to_supervised’ function the right way to approach this in the first place? I believe it is, but I’m at a loss for a workaround.

I’m particularly interested in how to frame this for both predicting 50 single observations, or even 50 sequences of all observations, assuming my terminology is correct. And for reference I’ve looked through just about every LSTM post you have, but perhaps I overlooked something. Regardless, your work has been incredibly helpful thus far – thank you for all the hard work you’ve put into your site!

95. Karan Sehgal October 5, 2019 at 2:19 am #

Hi Jason,

Up to how many lags should we create the new variables? And do we only need to create the lags using the target variable?

• Jason Brownlee October 6, 2019 at 8:12 am #

Perhaps test a few different values and see what works best for your model and dataset.

96. Karan Sehgal October 7, 2019 at 5:24 am #

Hi Jason,

Thank you so much Jason for your inputs on the above query. I have a few more queries please.

1) Should we make the data stationary before using supervised machine learning for time series analysis?

2) Does introducing lag (t-1) or (t-2) in the dataset make the dataset stationary or not?

3) Machine learning models cannot simply ‘understand’ temporal data, so we must explicitly create time-based features. Here we create the temporal features day of week, calendar date, month and year from the date field using the substr function. These new features can be helpful in identifying cyclical sales patterns. Is this true?

4) There are a wide variety of statistical features we could create. Here we will create three new features using the unit_sales column: lag_1 (1-day lag), avg_3 (3-day rolling mean) and avg_7 (7-day rolling mean). Can we create these kinds of features also?

• Jason Brownlee October 7, 2019 at 8:32 am #

Yes, try making the series stationary prior to the transform, and compare results to ensure it adds skill to the model.

Adding lag variables does not make it stationary, you must use differencing, seasonal adjustment, and perhaps a power transform.

New features can be helpful, try it and see for your dataset.
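As a toy sketch (not from the post), first-order differencing with pandas removes a linear trend:

```python
import pandas as pd

# a hypothetical series with a constant upward trend
s = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0])

# first-order differencing: value at t minus value at t-1
diffed = s.diff().dropna()
print(diffed.tolist())  # [2.0, 2.0, 2.0, 2.0]
```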

97. Karan Sehgal October 7, 2019 at 5:39 am #

Hi Jason,

I forgot to post some more queries on the above post.

1) Do lags always need to be created for the Y variable, i.e. the dependent variable, only?

2) Apart from introducing lag in the dataset, do we also need to add a column of differencing for the Y variable to make the data stationary?

Thanks,
Karan Sehgal

• Jason Brownlee October 7, 2019 at 8:33 am #

No, but lag obs from the target are often very useful.

Adding variables does not make a series stationary.

• Karan Sehgal October 7, 2019 at 9:59 pm #

Hello Jason,

1) So, I need to create variables for both lag and differencing, because differencing will help make the dataset stationary and lag can be very useful for the model, and then we need to consider both of these variables together in the algorithm?

2) Above you also mentioned seasonal adjustment, and perhaps a power transform. How can we get the seasonal adjustment or power transform? Do we need to create separate columns for the seasonally adjusted and transformed data, or can we create both in one column only?

Thanks.

• Jason Brownlee October 8, 2019 at 8:01 am #

Difference, then lag. The lagged vars will be differenced.

98. Radhouane Baba October 23, 2019 at 10:29 pm #

Hi Jason,

I am trying to forecast one day in the future (1 day = 144 output values) based on data from the last week (144*7 = 1008 timesteps).
In each timestep I have 10 variables, such as temperature etc.

That means I have a very big input vector (144*7*10 = 10,080).

Isn’t that too much data at once? (10,080 input values -> 144 outputs)

99. Andrei February 20, 2020 at 11:30 pm #

This is different from what you do in the LSTM tutorial:
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

Shouldn’t the output from here be usable in the LSTM model?

I’m a bit confused, can you explain the difference?

100. Manju February 25, 2020 at 10:17 pm #

Can we use this approach for weather forecasting?

• Jason Brownlee February 26, 2020 at 8:19 am #

Yes, but physics models perform better than ml models for weather forecasting.

101. manjunath February 26, 2020 at 5:55 am #

Are sliding window and rolling window the same?
Are sliding window and expanding window the same?
Please share some information, it will help a lot.

Thanks

• Jason Brownlee February 26, 2020 at 8:28 am #

Not really. They are siblings, but not the same.

Sliding window is a way of preparing time series data to be a supervised learning problem.

Rolling window and expanding window are data preparation methods.
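The difference between rolling and expanding windows can be sketched with pandas (hypothetical values): a rolling window computes a statistic over a fixed-size trailing window, while an expanding window uses all observations seen so far.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# rolling: mean over a fixed trailing window of 2 observations
roll = s.rolling(window=2).mean()

# expanding: mean over all observations so far
expand = s.expanding().mean()

print(roll.tolist())    # [nan, 1.5, 2.5, 3.5]
print(expand.tolist())  # [1.0, 1.5, 2.0, 2.5]
```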

102. David March 24, 2020 at 1:56 am #

Hi, if we use this method of converting a time series to a supervised learning problem, do we still need to use the timestamp as a feature? Or could we simply delete that column?
Since we will only establish a window, we do not need the timestamp, right? If I am wrong, is there a correct way to prepare this feature?

Another question that I want to ask is: is there a problem if the data does not have the same interval of time (samples at completely random intervals, however sequential)?

103. uthman April 3, 2020 at 6:20 pm #

One Hell of a blog you have Jason.
Much appreciated

104. Matt May 11, 2020 at 3:36 am #

Hi Jason,

Great and informative post. I’m in the midst of trying to tackle my first time series forecasting problem and had a couple of questions for you. So once the series_to_supervised() function has been called and the data has been appropriately transformed, can we use this new dataframe with traditional ML algorithms (i.e. RandomForestRegressor)? Or do we then set up an ARIMA model or some other time-series-specific forecasting model? And then how do we make the transition to forecasting multiple variables or steps ahead in time? Sorry if these are bad questions, I’m a newbie here.

Thanks,
Matt

105. Srinivas Gummadi May 11, 2020 at 12:43 pm #

Great blog Jason. Learnt a lot and code works as advertised by you!!!

Four questions: and one suggestion.

1) If I have daily forecast data, and my target is to forecast for the upcoming 6 weeks, I thought this is the way to do it:
a) resample the original data on a weekly basis and sum the sales – so my sample size becomes 1/7
b) convert to supervised data as suggested here
c) then model it with my data – train/test and save the model
Question: how do I use this model to forecast the future 6 weeks, as the whole model is dependent on previous events? Will the learning be good?

2) If I have potentially other influencing items like sales promotions, rebates, marketing events etc., how does this model comprehend them? The ARIMA model takes only a sequence of diffs. Can you please provide guidance on how I can incorporate additional features into the model?

3) If there are extraordinary items like outliers – way off the pattern – can I prune the data and remove them?

4) Do we ever scale the data?

I wish you used DataFrames more often than Series, as more and more users use that data structure and code can be reused verbatim – just a suggestion

Srinivas

• Jason Brownlee May 11, 2020 at 1:38 pm #

Thanks!

Perhaps try it and see?

• Srinivas Gummadi May 11, 2020 at 2:34 pm #

Thanks for the reply – I take it that your answer pertains to 1 and 3. After thinking some more, I do not need to do item 4, because the diff is already doing the transformation – so that answers 4.

However, on item 2, how do I add more features to ARIMA?

Srinivas

106. Giselle May 18, 2020 at 3:29 am #

Hi Jason,

As far as I understand, if I would like to predict the next 24 hours based on the previous observations of 5 variables (which is a multivariate, multi-step case), I would use series_to_supervised(dataframe, 1, 24) and then drop the unwanted columns – am I mistaken?

Otherwise, if I would like to use the previous month’s values, should I set n_in=720 instead of 1, or use something else?

Thank you

• Jason Brownlee May 18, 2020 at 6:21 am #

Looks like you are predicting 24 steps based on one step. That is aggressive. You might want to provide more context to the model when making a prediction – e.g. give more history/inputs.

• Giselle May 18, 2020 at 9:58 am #

Exactly, it won’t be accurate. For that, I would like to include more previous values, but I don’t know how to do that using the same function.

• Jason Brownlee May 18, 2020 at 1:25 pm #

Change the “n_in” argument to a larger number to have more lag observations as input.
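For example, the tutorial’s series_to_supervised() function can be called with a larger n_in (shown here with a small toy window rather than 720, for readability):

```python
from pandas import DataFrame, concat

# the tutorial's series_to_supervised() function
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    agg = concat(cols, axis=1)
    agg.columns = names
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# two toy variables; n_in=3 stands in for a longer history
# (e.g. n_in=720 for a month of hourly data), n_out=2 is the horizon
raw = DataFrame({'var1': range(10), 'var2': range(50, 60)})
supervised = series_to_supervised(raw.values, n_in=3, n_out=2)
print(supervised.shape)  # 6 usable rows, 2 vars * (3 + 2) = 10 columns
```

Increasing n_in adds more lag columns per variable and consumes more leading rows, so the series must be long enough to afford the chosen window.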

107. md faiz June 4, 2020 at 1:35 am #

First of all, I thank you for writing such detailed articles.

My question is:

I have 3 independent variables and 1 dependent variable (hourly call volume data).
I have created 24 lags for the dependent variable and trained the model.

But now I have to forecast 1 week into the future (24 rows per day * 7 days = 168 rows).
My problem is that I do not see how to create lags for the forecast period, as I have trained the model up to, for example, today. Now I have to forecast until 10th June, 2020.

How do I create lags for the dependent variable when I have no future data for call volume – that is exactly what I have to forecast?

In the forecast data, I have created future values for the 3 independent variables because they were all derived from the date (such as DayOfWeek), but how do I create the lags?

108. Morgan June 12, 2020 at 3:03 am #

Is series_to_supervised() synonymous with “windowizing” data in preparation for an LSTM?

109. Eric June 27, 2020 at 8:30 am #

What will happen when we make predictions? Does the model expect a sequence of data as well since that is what it was trained on?

• Jason Brownlee June 27, 2020 at 2:08 pm #

New examples are provided in the same form, e.g. past observations to predict future observations.

110. Rajiv July 7, 2020 at 5:59 pm #

Hi Mr.Jason,

My question is regarding the “Multi-Step or Sequence Forecasting” section.
Suppose we have to forecast the next “m” time steps with some particular lag, say “n”. In my sequence, I will have my dataset like:

v1(t-n) … v1(t-3), v1(t-2), v1(t-1) –> v1(t), v1(t+1), v1(t+2), v1(t+3), v1(t+4), v1(t+5) … v1(t+m)

Now. Let me know which case is relevant to my problem:

Case-1:
Will I have ‘m’ separate models, one for each time step? i.e.:
v1(t+1) = f(v1(t), v1(t-1), v1(t-2), ….. v1(t-n))
v1(t+2) = f(v1(t), v1(t-1), v1(t-2), ….. v1(t-n))
v1(t+3) = f(v1(t), v1(t-1), v1(t-2), ….. v1(t-n))
…..
v1(t+m) = f(v1(t), v1(t-1), v1(t-2), ….. v1(t-n))

Case-2:
Should I feed my previously predicted value back in to predict the next value, as a sequence?
v1(t+1) = f(v1(t), v1(t-1), v1(t-2), ….. v1(t-n))
v1(t+2) = f(v1(t+1)hat,v1(t), v1(t-1), v1(t-2), ….. v1(t-n-1))
v1(t+3) = f(v1(t+2)hat,v1(t+1)hat,v1(t), v1(t-1), v1(t-2), ….. v1(t-n-2))
….
v1(t+m) = f(v1(t+m-1)hat,v1(t+m-2)hat,v1(t+m-3)hat,….v1(t+m-n)hat)

I am confused in choosing between the approaches:

* In Case 1, the number of models will be large, and I feel the model maintenance part might be problematic if “m” is a big number…!!!
* In Case 2, I will have one single model, but as I go down the timeline my predictions will depend more and more on previously predicted values, which will make my inputs more fragile…

Thanks In advance and Thank you for the wonderful post.

• Jason Brownlee July 8, 2020 at 6:27 am #

You can do either, the choice is yours, or whichever results in the best performance/best meets project requirements.
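For reference, the recursive strategy (Case 2) can be sketched with a single one-step model whose predictions are fed back in as inputs. This is a toy sketch using a plain least-squares fit; the series and window size are illustrative:

```python
import numpy as np

# toy series 1..20 and a window of n = 3 lag inputs
series = np.arange(1.0, 21.0)
n = 3

# build (X, y) pairs for one-step-ahead prediction
X = np.array([series[i:i + n] for i in range(len(series) - n)])
y = series[n:]

# fit a single linear model y = X.w + b via least squares
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Case 2: recursive multi-step forecast, feeding predictions back in
m = 5
window = list(series[-n:])
forecast = []
for _ in range(m):
    x = np.array(window[-n:] + [1.0])
    forecast.append(float(x @ coef))
    window.append(forecast[-1])
print([round(f, 2) for f in forecast])
```

Case 1 (the direct strategy) would instead fit m separate models of this form, one per horizon, and never feed predictions back in.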

111. Anon August 1, 2020 at 1:11 pm #

Thanks for the article Jason, pleasure to read. I have a question: how is this different from making a window the “normal” way, over rows? Is there any benefit to doing it this way, or can I just as easily have M timesteps for my X, and 1-N timesteps for my Y, both having 1 timestep per row?

• Jason Brownlee August 1, 2020 at 1:29 pm #

You’re welcome.

Sorry, I don’t follow what you’re comparing this approach to. Perhaps you can elaborate.

This is a sliding window approach generally used when converting a time series to a supervised learning problem.

• Anon August 3, 2020 at 1:12 am #

As I understand it, the sliding window approach in this article has the window progress across columns:

Where X(t+1) is the target output, for as many features as there are in the original dataset.

On the other hand, what if the window progresses across the rows, like so:

So that if you have a window of size 3, you’d have N-3 sliding windows composed of (X(t), X(t+1), X(t+2)) to predict X(t+3), all the way up to (X(t+N-3), X(t+N-2), X(t+N-1)) to predict X(t+N)?

Is there any difference in the two strategies? The reason I ask this is because I was training an LSTM using the first method (sliding windows across columns) and kept encountering out-of-memory errors when using pandas’ shift() for a large window, but it was a relatively trivial matter when sliding across rows without using shift() as no preprocessing was necessary.

Thanks,

112. kourosh August 10, 2020 at 5:46 pm #

Hi, Mr Brownlee

I have 276 files (from 1993-2015), each with dimensions of 213*276. Each file belongs to one month. I want to predict the last year (the last 12 months).

How can I split the data, and what are the time steps?

• kourosh August 10, 2020 at 5:57 pm #

I mean, should I reshape each file to a column and concatenate all the years into one long column? The data are like pixel values (like a heat map), and this is confusing to me.

• Jason Brownlee August 11, 2020 at 6:30 am #

Perhaps. Try it and see if it makes sense for your dataset.

• Jason Brownlee August 11, 2020 at 6:28 am #

Perhaps load all data into memory at once or all into one file, then perhaps fit a model on all but the last year, then use walk-forward validation on the data in the last year month by month.
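That walk-forward procedure can be sketched as follows (a toy sketch using a naive persistence forecast in place of a real model; the series and split sizes are illustrative):

```python
import numpy as np

# toy monthly series: fit on all but the last 12 values,
# then walk forward one step at a time
data = np.arange(100.0)
n_test = 12
train, test = data[:-n_test], data[-n_test:]

history = list(train)
predictions = []
for t in range(len(test)):
    # a naive persistence "model" stands in for a real one
    yhat = history[-1]
    predictions.append(yhat)
    # after each forecast, the true observation joins the history
    history.append(test[t])

mae = sum(abs(p - a) for p, a in zip(predictions, test)) / len(test)
print(mae)
```

The key point is that each forecast only uses data available up to that step, and the true value is added to the history before the next forecast is made.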

113. Mohammad August 21, 2020 at 3:16 am #

Hey Jason,
Thanks for the wonderful materials on your website.

The function “series_to_supervised” is great and very straightforward to use. However, I see that when we use shift and transform the data, the data type changes from integer to float. Wouldn’t it be better to adjust the dtype of the columns in the function as well?

Thanks again.

• Jason Brownlee August 21, 2020 at 6:35 am #

You’re welcome!

We should be working with floats – almost all ML algorithms expect floats, except maybe trees.
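For the curious, the upcast happens because shift() introduces NaN values, which have no integer representation; if integers are really needed, the columns can be cast back after dropping the NaN rows (a minimal sketch):

```python
import pandas as pd

df = pd.DataFrame({"t": [1, 2, 3, 4]})

# shifting introduces NaN, so the column is upcast from int64 to float64
shifted = df["t"].shift(1)
print(shifted.dtype)  # float64

# once the NaN rows are dropped, the lags can be cast back if integers matter
supervised = pd.concat([df["t"], shifted.rename("t-1")], axis=1).dropna()
supervised = supervised.astype("int64")
print(supervised.dtypes.tolist())
```

In practice the float representation is usually left as-is, for the reason given above.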

114. Darrell K August 28, 2020 at 12:34 pm #

Hi Jason, thank you for the awesome article. I’ve been looking for something like this for quite some time.

In my case, I’m trying to build an AR model with exogenous inputs, so I need to train the net with all the training data (X and y) and then make forecasts based on X only. My data is highly nonlinear and I want to make forecasts for many steps ahead.

Reading other sites, I understood that I would need to refer to the last N values in the X and y_hat vectors (not the whole lagged series), slightly different from what you taught here. I would appreciate it very much if you could offer any hint on how to achieve this.

115. James September 7, 2020 at 8:35 am #

I’m wondering why we need to do all this in the first place. Why can’t we just treat the Date column as any old feature, X1, and then predict y?

For example: what’s wrong with just plugging in features X1, X2, X3 (‘Date’, ‘Temp’, ‘Region’) and training a random forest to predict y (‘Rainfall’)? If recent data is more important, shouldn’t the model be able to figure this out?

• Jason Brownlee September 7, 2020 at 8:38 am #

Great question!

You can if you like. Try it and compare.

We do this because recent lag observations are typically highly predictive of future observations. This representation is called an autoregressive model/representation.

116. Senthilkumar Radhakrishnan September 7, 2020 at 2:14 pm #

Hi Jason,

I have training data with 143 instances and test data with 30 instances, with additional features like temperature, and my target in training.
So if I create lag values, they should be beyond the 30th lag, right? Because we will not have, say, lag 2 for all those 30 test instances, as we have to forecast all 30.
In this case, what is the best solution, and how can I add lag components to get a result?

• Jason Brownlee September 8, 2020 at 6:44 am #

Generally, it is a good idea to either use an ACF/PACF plot to choose an appropriate lag or grid search different values and discover what results in the best performance.
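The idea behind an ACF-based choice can be sketched without plotting: compute the correlation of the series with itself at each candidate lag and keep the strong ones (in practice, statsmodels’ plot_acf/plot_pacf do this; the toy periodic series below is illustrative):

```python
import numpy as np

def autocorr(x, lag):
    # Pearson correlation between the series and itself shifted by `lag`
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

# toy series with a strong period of 4
series = [1, 2, 3, 4] * 10
for lag in range(1, 6):
    print(lag, round(autocorr(series, lag), 2))
```

On this series the correlation peaks at lag 4, suggesting lag-4 observations as strong input candidates; a grid search over lag values is the empirical alternative.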

117. Jeff Hernandez September 16, 2020 at 8:09 am #

Great tutorial! This open source tool is helpful for labeling time series data.

https://github.com/FeatureLabs/compose

118. Carlos September 24, 2020 at 10:18 am #

Hi Jason,

Any idea how to use this series_to_supervised function with a PySpark DataFrame, or how I can handle it?

Thanks a lot!

Regards,

119. Joao Silva September 26, 2020 at 1:12 am #

Hi Jason,

I have a question related to dividing the time series into X (input) and y (output).

I’ve noticed that most people shift the series one step regardless of the number of steps they want to forecast (Option 1).

Option 1
----X--------Y
1 2 3 4 5 –> 6 7 8
2 3 4 5 6 –> 7 8 9
3 4 5 6 7 –> 8 9 10

Wouldn’t that make the model forecast largely the same values (7, 8, and 9)?

And wouldn’t that make Option 2 more feasible, since the model would have to predict new values every time?

Option 2
----X--------Y
1 2 3 4 5 –> 6 7 8
9 10 11 12 13 –> 14 15 16
17 18 19 20 21 –> 22 23 24

Thank you for helping the ML community!

• Jason Brownlee September 26, 2020 at 6:21 am #

You are training a function to map examples of inputs to examples of outputs. We have to create these examples from the sequence.

You can design these mapping examples any way you like. Perhaps one-step prediction is not appropriate for your specific dataset. That’s fine – change it.
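The two framings in the question can be sketched as the same window function with different strides (a toy sketch; the helper name is illustrative):

```python
# build (input, output) window pairs with a configurable stride:
# stride=1 gives Option 1 (overlapping windows),
# stride=n_in+n_out gives Option 2 (non-overlapping windows)
def windows(seq, n_in, n_out, stride):
    pairs = []
    for i in range(0, len(seq) - n_in - n_out + 1, stride):
        pairs.append((seq[i:i + n_in], seq[i + n_in:i + n_in + n_out]))
    return pairs

seq = list(range(1, 25))  # 1..24
overlapping = windows(seq, 5, 3, stride=1)      # Option 1
non_overlapping = windows(seq, 5, 3, stride=8)  # Option 2
print(overlapping[:2])
print(non_overlapping)
```

Option 1 extracts many more training pairs from the same series, at the cost of overlapping targets; Option 2 wastes no target on more than one pair but yields far fewer examples.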

120. May October 7, 2020 at 8:07 pm #

Hi Jason, thanks for this tutorial. I noticed in the comments that many are using LSTMs for time series prediction. Do you know if it is also possible to use other models, such as logistic regression, for this problem? Say we would like to predict whether the energy compliance of a home appliance will be low or high for the next hour, based on the energy consumption in the last 2 to 3 hours. Thank you.

• Jason Brownlee October 8, 2020 at 8:30 am #

Yes, you can use any algorithm you like once the data is prepared.

121. Yannick October 8, 2020 at 1:08 am #

Hello Jason,
I have some short series of 5 values in the range 0 to 2, for example [1,2,1,0,0], that represent the last five results of a given soccer team – 1 being a win, 2 a draw, and 0 a loss. I want to predict the probability that the next value is a 1; once I predict the next one, the first value (here 1) is dropped to form a new series.
I have a good intuition about what the next one should be, but I would like to create a model.
Please keep in mind I’m a beginner, so I just guess I should use time series.
I would like to weight each value, since I know each one depends on many parameters, like the ranking of the opponent, the number of shots the team made in the last game, and so on.
I’m trying to wrap my mind around it, but it’s hard. Thanks for what you do!

122. Andreas October 13, 2020 at 1:00 am #

“Again, depending on the specifics of the problem, the division of columns into X and Y components can be chosen arbitrarily, such as if the current observation of var1 was also provided as input and only var2 was to be predicted.”

In the case where we want to predict var2(t), and var1(t) is also available:

var1(t-2), var2(t-2), var1(t-1), var2(t-1), var1(t), var2(t)

LSTM networks want a 3D input. What shape should we give train_X?

Do I have to use shape [X, 1, 5]?

If we had an even number of features for train_X (when we don’t have var1(t)), we could reshape like this:

[X, 2, 2]

But now it is not an even number, and I cannot reshape like that, because we have 5 features in train_X.

Is the only solution shape [X, 1, 5]?

*X is the length of the dataset
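Both shapes mentioned in the question can be checked numerically (a sketch with dummy data; the array contents are placeholders):

```python
import numpy as np

# supervised rows with 5 input columns:
# var1(t-2), var2(t-2), var1(t-1), var2(t-1), var1(t)
X = 100  # length of the dataset, as in the question
train_X = np.zeros((X, 5))

# with an odd number of columns, treat them as 1 time step of 5 features
option_a = train_X.reshape((X, 1, 5))
print(option_a.shape)  # (100, 1, 5)

# if var1(t) is instead left out of the inputs, the remaining 4 columns
# reshape naturally into 2 time steps of 2 features
option_b = train_X[:, :4].reshape((X, 2, 2))
print(option_b.shape)  # (100, 2, 2)
```

Either [samples, 1, 5] or a [samples, time steps, features] layout can work; the choice is a framing decision, not a hard constraint.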