It can be difficult to understand how to prepare your sequence data for input to an LSTM model.

Often there is confusion around how to define the input layer for the LSTM model.

There is also confusion about how to convert your sequence data that may be a 1D or 2D matrix of numbers to the required 3D format of the LSTM input layer.

In this tutorial, you will discover how to define the input layer to LSTM models and how to reshape your loaded input data for LSTM models.

After completing this tutorial, you will know:

- How to define an LSTM input layer.
- How to reshape one-dimensional sequence data for an LSTM model and define the input layer.
- How to reshape multiple parallel series data for an LSTM model and define the input layer.

Let’s get started.

## Tutorial Overview

This tutorial is divided into 4 parts; they are:

- LSTM Input Layer
- Example of LSTM with Single Input Sample
- Example of LSTM with Multiple Input Features
- Tips for LSTM Input

## LSTM Input Layer

The LSTM input layer is specified by the “*input_shape*” argument on the first hidden layer of the network.

This can make things confusing for beginners.

For example, below is a network with one hidden LSTM layer and one Dense output layer.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32))
model.add(Dense(1))
```

In this example, the LSTM() layer must specify the shape of the input.

The input to every LSTM layer must be three-dimensional.

The three dimensions of this input are:

- **Samples**. One sequence is one sample. A batch is comprised of one or more samples.
- **Time Steps**. One time step is one point of observation in the sample.
- **Features**. One feature is one observation at a time step.

This means that the input layer expects a 3D array of data when fitting the model and when making predictions, even if specific dimensions of the array contain a single value, e.g. one sample or one feature.

When defining the input layer of your LSTM network, the network assumes you have 1 or more samples and requires that you specify the number of time steps and the number of features. You can do this by specifying a tuple to the “*input_shape*” argument.

For example, the model below defines an input layer that expects 1 or more samples, 50 time steps, and 2 features.

```python
model = Sequential()
model.add(LSTM(32, input_shape=(50, 2)))
model.add(Dense(1))
```
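The relationship between a 3D data array and the *input_shape* tuple can be sketched with NumPy alone. This is only an illustration: the array `X` below is placeholder data invented for the example, with an arbitrary 4 samples.

```python
import numpy as np

# Hypothetical placeholder data: 4 samples, 50 time steps, 2 features.
X = np.zeros((4, 50, 2))

n_samples, n_timesteps, n_features = X.shape

# input_shape omits the samples axis, so it is the last two dimensions.
input_shape = X.shape[1:]

print(n_samples)    # 4
print(input_shape)  # (50, 2)
```

Deriving the tuple from the array in this way keeps the model definition and the data in agreement if the number of time steps or features changes later.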

Now that we know how to define an LSTM input layer and the expectations of 3D inputs, let’s look at some examples of how we can prepare our data for the LSTM.

## Example of LSTM With Single Input Sample

Consider the case where you have one sequence of multiple time steps and one feature.

For example, this could be a sequence of 10 values:

```
0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
```

We can define this sequence of numbers as a NumPy array.

```python
from numpy import array
data = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
```

We can then use the *reshape()* function on the NumPy array to reshape this one-dimensional array into a three-dimensional array with 1 sample, 10 time steps, and 1 feature at each time step.

The *reshape()* function, when called on an array, takes the new shape of the array, typically as a single tuple. We cannot pass in any tuple of numbers; the reshape must evenly reorganize the data in the array, so the product of the new dimensions must equal the number of elements in the array.

```python
data = data.reshape((1, 10, 1))
```

Once reshaped, we can print the new shape of the array.

```python
print(data.shape)
```

Putting all of this together, the complete example is listed below.

```python
from numpy import array
data = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
data = data.reshape((1, 10, 1))
print(data.shape)
```

Running the example prints the new 3D shape of the single sample.

```
(1, 10, 1)
```

This data is now ready to be used as input (*X*) to the LSTM with an input_shape of (10, 1).

```python
model = Sequential()
model.add(LSTM(32, input_shape=(10, 1)))
model.add(Dense(1))
```
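The requirement that a reshape evenly reorganize the data can be checked directly. The sketch below reuses the same 10 values; the variable names `ok`, `failed`, and `inferred` are made up for the illustration.

```python
import numpy as np

data = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])

# 1 * 10 * 1 == 10 elements, so this reshape is valid.
ok = data.reshape((1, 10, 1))

# 1 * 4 * 1 == 4 elements, which does not match 10, so NumPy raises ValueError.
try:
    data.reshape((1, 4, 1))
    failed = False
except ValueError:
    failed = True

# -1 asks NumPy to infer that axis from the remaining elements.
inferred = data.reshape((1, -1, 1))

print(ok.shape, failed, inferred.shape)
```

Using -1 for the time steps axis can be convenient when the sequence length is not known in advance, although writing the length explicitly makes a mismatch fail loudly.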

## Example of LSTM with Multiple Input Features

Consider the case where you have multiple parallel series as input for your model.

For example, this could be two parallel series of 10 values:

```
series 1: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
series 2: 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1
```

We can define these data as a matrix of 2 columns with 10 rows:

```python
from numpy import array
data = array([
    [0.1, 1.0],
    [0.2, 0.9],
    [0.3, 0.8],
    [0.4, 0.7],
    [0.5, 0.6],
    [0.6, 0.5],
    [0.7, 0.4],
    [0.8, 0.3],
    [0.9, 0.2],
    [1.0, 0.1]])
```

This data can be framed as 1 sample with 10 time steps and 2 features.

It can be reshaped as a 3D array as follows:

```python
data = data.reshape(1, 10, 2)
```

Putting all of this together, the complete example is listed below.

```python
from numpy import array
data = array([
    [0.1, 1.0],
    [0.2, 0.9],
    [0.3, 0.8],
    [0.4, 0.7],
    [0.5, 0.6],
    [0.6, 0.5],
    [0.7, 0.4],
    [0.8, 0.3],
    [0.9, 0.2],
    [1.0, 0.1]])
data = data.reshape(1, 10, 2)
print(data.shape)
```

Running the example prints the new 3D shape of the single sample.

```
(1, 10, 2)
```

This data is now ready to be used as input (*X*) to the LSTM with an input_shape of (10, 2).

```python
model = Sequential()
model.add(LSTM(32, input_shape=(10, 2)))
model.add(Dense(1))
```
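In practice, parallel series are often loaded as separate 1D arrays rather than as a ready-made matrix. A minimal sketch, assuming the two series above are held in the made-up variables `series1` and `series2`, builds the 10 x 2 matrix with NumPy's *column_stack()* before reshaping.

```python
import numpy as np

series1 = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
series2 = np.array([1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])

# Each series becomes one column (feature); each row is one time step.
data = np.column_stack((series1, series2))

# 1 sample, 10 time steps, 2 features.
X = data.reshape((1, data.shape[0], data.shape[1]))

print(X.shape)             # (1, 10, 2)
print(X[0, 0].tolist())    # [0.1, 1.0]
```

The first time step holds the first value of each series, which matches the hand-written matrix above.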

## Longer Worked Example

For a complete end-to-end worked example of preparing data, see this post:

## Tips for LSTM Input

This section lists some tips to help you when preparing your input data for LSTMs.

- The LSTM input layer must be 3D.
- The three input dimensions are: samples, time steps, and features.
- The LSTM input layer is defined by the *input_shape* argument on the first hidden layer.
- The *input_shape* argument takes a tuple of two values that define the number of time steps and features.
- The number of samples is assumed to be 1 or more.
- The *reshape()* function on NumPy arrays can be used to reshape your 1D or 2D data to be 3D.
- The *reshape()* function takes a tuple as an argument that defines the new shape.
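The tips above can be gathered into one small function. The helper `to_lstm_input` below is hypothetical (it is not part of Keras or NumPy) and assumes that 1D or 2D data represents a single sample.

```python
import numpy as np

def to_lstm_input(data):
    """Hypothetical helper: treat 1D or 2D data as a single sample."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        # 1D: (time steps,) -> (1 sample, time steps, 1 feature)
        return data.reshape((1, data.shape[0], 1))
    if data.ndim == 2:
        # 2D: (time steps, features) -> (1 sample, time steps, features)
        return data.reshape((1, data.shape[0], data.shape[1]))
    if data.ndim == 3:
        # Already (samples, time steps, features).
        return data
    raise ValueError("expected 1D, 2D, or 3D data, got %dD" % data.ndim)

one_d = to_lstm_input([0.1, 0.2, 0.3])
two_d = to_lstm_input([[0.1, 1.0], [0.2, 0.9]])
print(one_d.shape)  # (1, 3, 1)
print(two_d.shape)  # (1, 2, 2)
```

The last two values of the returned shape are what would be passed as *input_shape*.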

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

- Recurrent Layers Keras API
- Numpy reshape() function API
- How to Convert a Time Series to a Supervised Learning Problem in Python
- Time Series Forecasting as Supervised Learning

## Summary

In this tutorial, you discovered how to define the input layer for LSTMs and how to reshape your sequence data for input to LSTMs.

Specifically, you learned:

- How to define an LSTM input layer.
- How to reshape one-dimensional sequence data for an LSTM model and define the input layer.
- How to reshape multiple parallel series data for an LSTM model and define the input layer.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

Great explanation of the dimensions! Just wanted to say this explanation also works for LSTM models in TensorFlow.

Thanks Steven.

Hi Jason,

Thanks a lot for your explanations.

I am confused about the following:

Assume that we have multiple parallel series as input for our model. The first step is to define these data as a matrix of M columns with N rows. To make it 3D (samples, time steps, and features), does this mean that samples is 1, time steps is the number of rows in the matrix, and features is the number of columns in the matrix? Must it be like this? Looking forward to your reply. Thank you.

Sorry, I’m not sure I follow your question.

If you have parallel time series, then each series would need the same number of time steps and be represented as a separate feature (e.g. observation at a time).

Does that help?

Hi Jason,

thanks a lot for all the explanations you gave!

I tried to understand the effect of the reshape parameters and the effect in the spyder/variable explorer. But I do not understand the result shown in the data window.

I used the code from a different tutorial:

```python
data = array([
    [0.1, 1.0],
    [0.2, 0.9],
    [0.3, 0.8],
    [0.4, 0.7],
    [0.5, 0.6],
    [0.6, 0.5],
    [0.7, 0.4],
    [0.8, 0.3],
    [0.9, 0.2],
    [1.0, 0.1]])

data_re = data.reshape(1, 10, 2)
```

When checking the result in the variable explorer of Spyder I see 3 dimensions of the array but cannot connect them to the parameters sample, time step, and feature.

On axis 0 of data_re I see the complete dataset

On axis 1 of the data_re I get 0.1 and 1.0 in column 1

On axis 2 of the data_re I see the column 1 of axis 0 transposed to row 1

Would you give me a hint how to interpret it?

Regards,

Oliver.

There are no named parameters; I am referring to the dimensions by those names because that is how the LSTM model uses the data.

Sorry for the confusion.

Hi Jason,

Thanks so much for the article (and the whole series in fact!). The documentation in Keras is not very clear on many things on its own.

I have been trying to implement a model that receives multiple samples of multivariate timeseries as input. The twist is that the length of the series, i.e. the “time steps” dimension is different for different samples. I have tried to train a model on each sample individually and then merge, (but then each LSTM is going to be extremely prone to overfitting). Another idea was to scale the samples to have the same time steps but this comes with a scaling factor of time steps for each sample which is not ideal either.

Is there a way to provide the LSTM with samples of dynamic time steps? maybe using a lower-level API?

Regards,

Saga

A way I use often is to pad all sequences to the same length and use a masking layer on the front end to ignore masked time steps.
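A minimal NumPy sketch of the padding step follows. The helper `pad_to_length` and the example sequences are hypothetical and exist only for this illustration; Keras also ships its own padding utility and a Masking layer, which would normally be used in a real model.

```python
import numpy as np

def pad_to_length(sequences, maxlen, value=0.0, padding="pre"):
    """Hypothetical helper: pad 1D sequences to maxlen with a fill value."""
    padded = np.full((len(sequences), maxlen), value, dtype=float)
    for i, seq in enumerate(sequences):
        seq = np.asarray(seq, dtype=float)[:maxlen]  # truncate if too long
        if len(seq) == 0:
            continue
        if padding == "pre":
            padded[i, -len(seq):] = seq   # fill values go at the front
        else:
            padded[i, :len(seq)] = seq    # fill values go at the end
    return padded

sequences = [[0.1, 0.2, 0.3], [0.4, 0.5], [0.6]]
padded = pad_to_length(sequences, maxlen=3, padding="post")

# 3 samples, 3 time steps, 1 feature.
X = padded.reshape((3, 3, 1))
print(X.shape)               # (3, 3, 1)
print(X[1, :, 0].tolist())   # [0.4, 0.5, 0.0]
```

If 0.0 is a legitimate observation in the data, a different fill value is probably needed so that real time steps are not masked by mistake.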

Hi Jason,

Thanks very much for your tutorials on LSTM. I am trying to predict one time series from 10 different parallel time series. All of them are different 1D series. So, the shape of my X_train is (50000,10) and Y_train is (50000,1). I couldn’t figure out how to reshape my dataset and the input shape of LSTM if I want to use let’s say 100 time steps or look back as 100.

Thanks.

This post will help you formulate your series as a supervised learning problem:

https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
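As a rough sketch of one way to frame the question above, the 2D array can be cut into overlapping windows of 100 time steps. The helper `make_windows` is hypothetical, the random arrays are small stand-ins for the (50000, 10) and (50000, 1) data, and aligning each target with the last row of its window is an assumption that may not suit every forecasting setup.

```python
import numpy as np

def make_windows(X, y, n_steps):
    """Hypothetical helper: frame a 2D series as overlapping windows.

    X has shape (n_obs, n_features) and y has shape (n_obs,) or (n_obs, 1).
    Each sample holds n_steps consecutive rows of X; its target is the y
    value at the last row of that window (an assumption of this sketch).
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n_samples = X.shape[0] - n_steps + 1
    windows = np.stack([X[i:i + n_steps] for i in range(n_samples)])
    targets = y[n_steps - 1:]
    return windows, targets

X_train = np.random.rand(500, 10)   # stand-in for the (50000, 10) inputs
Y_train = np.random.rand(500, 1)    # stand-in for the (50000, 1) targets

X3d, y2d = make_windows(X_train, Y_train, n_steps=100)
print(X3d.shape)  # (401, 100, 10)
print(y2d.shape)  # (401, 1)
```

With this framing the LSTM would be defined with an *input_shape* of (100, 10).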

Respected Sir

I want to use an LSTM/RNN/GRU to detect changes in the facial expression of a person who is watching a movie. I want to check his mental state: whether he is bored or interested in continuing the movie, and at what time he becomes bored. Can you please help me with how to start working on this?

That sounds like a great problem. I would recommend starting by collecting a ton of training data.

Then think of using a CNN on the front end of your LSTM.

Hi,

I have around 12,000 tweets for sentiment classification totally. Do you think 16GB CPU RAM will be enough?

Sure.

Hi Jason

Thanks for the simple explanation.

However, I have a question. What if you don’t know the number of time steps? How do you proceed then?

Is that why we use the embedding layer?

I intend to use it for sentiment analysis of imDb movie review dataset.

You can force all input sequences to be the same length by padding/truncating.

You could also use a model that does not specify the input length, for example:

https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/

Hi Jason,

I finally understood the input shape requirements.

Just a quick question: batch_size would be a certain number of samples inside a group e.g if we have 100 samples we can divide it into batches of 10. Batching helps with a faster training time right?

Correct, and weight updates occur and state is reset at the end of each batch.

Hi Jason,

About sample (the first argument in reshape): if I have two sequences with different number of values (let’s suppose one with 10 values and another with 8) and want them to be considered as two distinct samples (not 2 features), a zero-padding is necessary?

series 1: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0

series 2: 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.0, 0.0

If I do:

data = data.reshape(2, 10, 1)

Is it going to understand them as 2 different samples?

Yes, padding to 10 time steps.

Yes, your reshape looks good.

Explore pre and post padding to see if it makes a difference for your model.

With this input, is the model going to understand two different series?

Why not use a (1, 10, 2) shape?

You could treat them as two features as you suggest; I thought they were separate samples.
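The difference between the two framings can be sketched in NumPy using the two series from the question. The variable names `as_samples` and `as_features` are made up for the illustration.

```python
import numpy as np

series1 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
series2 = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.0, 0.0]
data = np.array([series1, series2])   # shape (2, 10): one row per series

# Two samples, each with 10 time steps of 1 feature.
as_samples = data.reshape((2, 10, 1))

# One sample with 2 features per time step. The series must become columns
# first; reshaping (2, 10) straight to (1, 10, 2) would interleave values.
as_features = data.T.reshape((1, 10, 2))

print(as_samples[1, :3, 0].tolist())   # [1.0, 0.9, 0.8]
print(as_features[0, 0].tolist())      # [0.1, 1.0]
```

Which framing is appropriate depends on whether the two series are independent sequences or two measurements taken at the same time steps.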

Hi Jason,

Thanks a lot for the tutorial!

I am trying to understand the input shape for LSTM data (No. of timesteps & no. of features). Could I ask what each will be in the context of the iris dataset, please?

Am I correct to say that in the iris dataset, the timesteps can be 2, 3, 5, 6 – as long as it neatly divides the dataset into equal number of rows (iris has 150 rows).

And the number of features will be the number of columns (apart from the target column/class)?

Thanks ever so much!

The iris dataset is not a sequence classification problem. It does not have time steps, only samples and features.

Hello!

First of all, thank you very much for your posts, I have learned a lot.

My question is about how to approach the following type of problem: multiple sequences of multiple features.

For example, predicting the amount that a user could spend given previous purchases (here I can consider different features such as the previous amounts, products, day of week, etc.). If I have a dataset with data for 1000 users and I want to predict the amount for each user, how should this be addressed?

Can I use one LSTM for all users, or will each user have their own model/LSTM?

I understand that one LSTM for all users could learn more interesting things, but I don’t know how to organize the input of different users, because I don’t know how to apply the example of two sequences (1, 10, 2). I want to include more features for each sequence.

I’m very lost..

Thank you in advance

Perhaps start off by modeling individual users?

Thanks!

By modelling individual users do you mean a lstm per user?

I have users with 200 purchases but others with only 10. Would that be enough?

I will try!

Thanks!

Or a user per sample.

Hi Jason,

Thanks for your tutorial and for your book!

I am not sure how to design the input shape of the following table or dataframe:

date, product, store, hasPromotion, attrib1, attrib2, quantity (t)

The first three columns are the key. We have 50000 products in 20 stores and I would like to predict the quantity (per product per store) at least 14 days ahead with LSTM.

What is the good start for the 3D input?

I am wondering if creating new features from date (as there are repetition), like day of week, day of month, month of year, etc. + the existing features + quantity (t), quantity (t+1) would do…

Thank you for your help in advance!

Br,

Drop date and you have 6 features, does that help?

OK, thanks. If there is seasonality and trend in sales, should I remove them before training the LSTM, too?

Yes, I would recommend that.

Howdy!

Thank you so much for the great amount of tutorials on LSTMs

I’m trying to build an LSTM in Keras using your examples and keep running into shape issues.

I have time series data set with prices for different things, and am trying to predict the price of item4 for time t+1

Item4 is a lagged value so that you can use previous set of prices to predict the next.

The data set has 400 sequential observations.

variables: datetime price1 price2 price3 item4_price

since the data variable has uniform interval of observations and none are missing, i am dropping the datetime variable.

So now i have 4 variables and 400 observations.

```python
trainX = train[:, 0:-1]  # use first 3 variables
trainY = train[:, -1]  # use the last variable
```

So now the trainX data set has the price1, price2 and price3 variables (it’s my understanding that this means there are 3 “features” in Keras).

trainY is the predictor data set and only contains item4_price.

```python
trainX = numpy.reshape(trainX, (1, 400, 3))  # reshape, this means there is 1 sample, 400 timestamps, and 3 features

model = Sequential()
model.add(LSTM(5, input_shape=(1, 400, 3), return_sequences=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
```

I keep getting various shape errors all the time, no matter what I do. I tried switching it around, and even omitting the first dimension.

I was wondering if you could point me in the right direction regarding what I keep missing in my understanding of Keras/LSTM shapes.

I also don’t know if the trainY set needs shaping. I tried to shape it too, but Python was also not happy with that.

Let me know what you think!

Thanks,

Vic

Perhaps start with one series and really nail down what is required.

Did you try this tutorial:

https://machinelearningmastery.com/prepare-univariate-time-series-data-long-short-term-memory-networks/
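Two shape details in the snippet from the question are worth checking, and they can be sketched with NumPy alone. This is only one possible reading of the problem; the random array `train` is a stand-in for the 400 x 4 price table. The *input_shape* tuple lists time steps and features only, and with `return_sequences=True` followed by a Dense(1) layer the model emits one value per time step, so the target needs a matching 3D shape.

```python
import numpy as np

train = np.random.rand(400, 4)   # stand-in for the 400 x 4 price table

# 1 sample, 400 time steps, 3 features.
trainX = train[:, :-1].reshape((1, 400, 3))

# One target value per time step for that single sample.
trainY = train[:, -1].reshape((1, 400, 1))

# input_shape excludes the samples axis: (400, 3), not (1, 400, 3).
input_shape = trainX.shape[1:]

print(input_shape)                          # (400, 3)
print(trainX.shape[0] == trainY.shape[0])   # True
```

A single sample gives the network very little to learn from, so splitting the 400 observations into shorter windows may be worth exploring as well.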

Hi Dr. Brownlee,

I have previously read that tutorial and feel as though i understand it fine.

But when applying what I learned to the problem in a way as described previously, find that Im running into some trouble.

So i was hoping I was just overlooking something, but at this point im not really sure what. Is what Im doing seem reasonable?

Thanks!

Perhaps, but I don’t know your problem as well as you and there is no set way to solve any ml problem.

I would encourage you to brainstorm and try a suite of approaches to see what works best.

Hi Jason,

I have gone through this tutorial but i have a input size of 1762 X 4 and output size 1762X 1.

I did as follows but the shape of y train is giving as (1394, 4) , which should be 1394,1

Can you help me on this?

Sorry, I cannot debug your code for you. I simply do not have the capacity, I’m sure you can understand.

Perhaps post your error to stackoverflow or cross validated?

I got an exception “ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 21 target samples”.

```
=>print X_train
[[ 0.15699646 -1.59383227]
 [-0.31399291 -0.03680409]
 [ 0.15699646 -1.59383227]
 [-0.31399291  0.78456757]
 [ 0.15699646 -1.59383227]
 [ 4.39590078 -1.59383227]
 [-0.31399291  1.38764971]
 [-0.31399291 -0.03680409]
 [-0.31399291 -0.32252408]
 [-0.31399291  0.6081381 ]
 [-0.31399291 -0.32252408]
 [-0.31399291  1.38764971]
 [-0.31399291  0.78456757]
 [-0.31399291 -0.03680409]
 [-0.31399291  0.78456757]
 [ 0.15699646  1.24889926]
 [-0.31399291 -0.32252408]
 [-0.31399291  1.24889926]
 [-0.31399291 -0.69488163]
 [-0.31399291 -0.69488163]
 [-0.31399291  0.6081381 ]]

=>print y_train
0     1
1     1
2     1
3     1
4     1
5     1
6     1
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    1
15    1
16    1
17    1
18    0
19    0
20    0
Name: out, dtype: int64

=>print(y_train.shape)
(21,)

=>print X_train.shape
(21, 2)

=>print X_test.shape
(8, 2)
```

I have reshaped the inputs to a 3-dimensional input. I have followed your steps.

```
=>X_train = X_train.reshape(1,21, 2)
print(X_train.shape)
(1, 21, 2)
```

```python
model = Sequential()
model.add(LSTM(32, input_shape=(21, 2)))
model.add(Dense(1))
model.compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=['categorical_accuracy'])
history = model.fit(X_train,y_train,batch_size =13, epochs = 14)
```

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 history = model.fit(X_train,y_train,batch_size =13, epochs = 14)

/home/siji/anaconda2/lib/python2.7/site-packages/keras/models.pyc in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, **kwargs)
    891 class_weight=class_weight,
    892 sample_weight=sample_weight,
--> 893 initial_epoch=initial_epoch)
    894
    895 def evaluate(self, x, y, batch_size=32, verbose=1,

/home/siji/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1553 class_weight=class_weight,
   1554 check_batch_axis=False,
-> 1555 batch_size=batch_size)
   1556 # Prepare validation data.
   1557 do_validation = False

/home/siji/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in _standardize_user_data(self, x, y, sample_weight, class_weight, check_batch_axis, batch_size)
   1419 for (ref, sw, cw, mode)
   1420 in zip(y, sample_weights, class_weights, self._feed_sample_weight_modes)]
-> 1421 _check_array_lengths(x, y, sample_weights)
   1422 _check_loss_and_target_compatibility(y,
   1423 self._feed_loss_fns,

/home/siji/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in _check_array_lengths(inputs, targets, weights)
    249 'the same number of samples as target arrays. '
    250 'Found ' + str(list(set_x)[0]) + ' input samples '
--> 251 'and ' + str(list(set_y)[0]) + ' target samples.')
    252 if len(set_w) > 1:
    253 raise ValueError('All sample_weight arrays should have '

ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 21 target samples.
```

Please solve my problem. I am new in this area. What is the mistake?

Perhaps cut the example back to a few lines to help expose the fault?
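The error message itself points at the first axis: the reshaped input has 1 sample while the target has 21. A minimal NumPy sketch of the check follows, using random stand-ins for the data in the question. Treating each row as its own sample with a single time step is only one possible framing and is an assumption here.

```python
import numpy as np

X_train = np.random.randn(21, 2)             # stand-in for the (21, 2) inputs
y_train = np.random.randint(0, 2, size=21)   # stand-in for the 21 labels

# Framing used in the question: 1 sample of 21 time steps.
as_one_sample = X_train.reshape((1, 21, 2))
print(as_one_sample.shape[0] == y_train.shape[0])   # False

# Alternative framing: 21 samples, each with 1 time step and 2 features.
per_row = X_train.reshape((21, 1, 2))
print(per_row.shape[0] == y_train.shape[0])         # True
```

With the second framing the *input_shape* would be (1, 2). Separately from the shape error, categorical_crossentropy with a single output unit is unusual for 0/1 labels; binary_crossentropy with a sigmoid output is the more common pairing.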

Hi Jason,

I’m trying to understand the input_shape but I think I’m totally confused about the time step variable. I have a multivariate time series with 18,000 samples and 720 features. I created a 10 lagged observation dataset to forecast the next 5 time steps so my dataset goes from t-10 to t+5, being the feature dataset from t-10 to t and the label dataset from t+1 to t+5.

Assuming that I take 15,000 samples for training, what will be the values of the reshape function? I think it should be [15000, 1, 7200 (720 features * 10)]. Regarding the time step, is the value “1” correct or it should be the number of lagged observations, that is, 10?

Thank you in advance.

Generally, I would recommend about 200-400 time steps.

Here’s some more advice on how to handle a very long time series:

https://machinelearningmastery.com/handle-long-sequences-long-short-term-memory-recurrent-neural-networks/

And here:

https://machinelearningmastery.com/prepare-univariate-time-series-data-long-short-term-memory-networks/
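If the lagged observations are to be treated as time steps, a flat table of lag columns can be folded back into 3D. The sketch below uses small stand-in sizes instead of 15000, 10, and 720, and the result depends on how the lag columns are ordered, which is an assumption that has to be checked against the real dataset.

```python
import numpy as np

n_samples, n_lags, n_features = 6, 10, 3   # small stand-ins for 15000, 10, 720

# Reference 3D data so the round trip can be verified.
cube = np.arange(n_samples * n_lags * n_features).reshape(
    (n_samples, n_lags, n_features))

# Case 1 (assumption): columns grouped lag by lag, i.e. all features at the
# first lag, then all features at the second lag, and so on.
flat_by_lag = cube.reshape((n_samples, n_lags * n_features))
X1 = flat_by_lag.reshape((n_samples, n_lags, n_features))

# Case 2 (assumption): columns grouped feature by feature, i.e. all lags of
# feature 1, then all lags of feature 2. Reshape first, then swap the axes.
flat_by_feature = cube.transpose(0, 2, 1).reshape(
    (n_samples, n_features * n_lags))
X2 = flat_by_feature.reshape(
    (n_samples, n_features, n_lags)).transpose(0, 2, 1)

print(X1.shape, X2.shape)   # (6, 10, 3) (6, 10, 3)
print(np.array_equal(X1, cube), np.array_equal(X2, cube))   # True True
```

Under this framing the *input_shape* would be (10, 720) for the dataset in the question, rather than (1, 7200).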

Hi Jason!

Thank you so much for all the tutorials on LSTMs, I’ve learned a lot.

I’m trying to implement the LSTM Architecture from the paper “Dropout improves Recurrent Neural Networks for Handwriting Recognition” for resolving the handwritten recognition problem.

Basically I have to train the network on variable-sized input images (different W and H but always 3 channels) and predict the word written in the image. What I can’t understand is how to deal with variable-sized images. Can I consider images as sequences (for example, a 50×30 image considered as 50 sequences with 30 features)? The authors say they give as input a block of the image of size 2×2, scanning in 4 different directions (multidirectional multidimensional LSTM).

What do I have to specify here: input_size(Samples, Time Steps, Features)? Does Samples refer to the number of all images I have in the training set or the number of 2×2 miniblocks? What about time steps and features? I don’t get it and it’s very confusing. Can you please help with any ideas? I am new to this area and I’m stuck on this problem.

Thanks a lot 🙂

I would recommend padding the inputs to a fixed size.

Hi Jason!

Thanks so much for your tutorials on LSTM!

I’m trying to predict trajectory with LSTM and ARIMA now. After reading this tutorial, I’ve got some questions.

(1) Must we transform the time series into lag observations if we want to forecast with an LSTM?

(2) After transforming the time series into a supervised learning problem, is the forecast only related to “order” or “lag” rather than “time” (as in ARIMA)? Why is the input not the time/date? And must the time interval of the data be even?

Thanks a lot in advance!

No, LSTMs can work with the time steps directly.

The order of the observations is sufficient for the model; if the time steps are consistently spaced, it does not need the absolute date/time information.

Hello Jason! Congratulations on the LSTM input tutorial!

Could you please answer three questions?

I’m working with 500 samples that have varying sizes. My doubts are related to the organization of these 500 samples within this 3-dimensional input, mainly in relation to the Samples dimension.

The “Features” dimension is already defined and will have size 26, and the “Time Steps” dimension will have size 100, but I still do not know what the size of the “Samples” dimension will be.

Doubt 1: When samples have different sizes, do I have to base the dimension on the largest sample and fill the additional spaces of the other samples with the value 0 (zero)?

Doubt 2: Can I have more than one line in the “Samples” dimension representing the same sample?

Doubt 3: Since the samples have varying sizes, is it possible to work with 4 dimensions, for example: “Samples” x “Part of Samples” x “Time Steps” x “Features”?

Thank you for your attention!

One sample is one sequence.

Each sample must have the same length; you can use zero-padding to achieve this and use Masking to ignore the padded observations.

This tutorial will make it clearer I think:

https://machinelearningmastery.com/prepare-univariate-time-series-data-long-short-term-memory-networks/