How to Prepare Univariate Time Series Data for Long Short-Term Memory Networks

By Jason Brownlee on August 5, 2019 in Deep Learning for Time Series

It can be hard to prepare data when you’re just getting started with deep learning.

Long Short-Term Memory, or LSTM, recurrent neural networks expect three-dimensional input in the Keras Python deep learning library.

If you have a long sequence of thousands of observations in your time series data, you must split your time series into samples and then reshape it for your LSTM model.

In this tutorial, you will discover exactly how to prepare your univariate time series data for an LSTM model in Python with Keras.

Kick-start your project with my new book Deep Learning for Time Series Forecasting, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.


How to Prepare Time Series Data

Perhaps the most common question I get is how to prepare time series data for supervised learning.

I have written a few posts on the topic, such as:

But, these posts don’t help everyone.

I recently got this email:

I have two columns in my data file with 5000 rows, column 1 is time (with 1 hour interval) and column 2 is bits/sec and I am trying to forecast bits/sec. In that case can you please help me to set sample, time step and feature [for LSTMs]?

There are a few problems here:

LSTMs expect 3D input, and it can be challenging to get your head around this the first time.
LSTMs don’t like sequences of more than 200-400 time steps, so the data will need to be split into samples.

In this tutorial, we will use this question as the basis for showing one way to specifically prepare data for the LSTM network in Keras.


1. Load the Data

I assume you know how to load the data as a Pandas Series or DataFrame.

If not, see these posts:

Here, we will mock loading by defining a new dataset in memory with 5,000 time steps.

from numpy import array

# load...
data = list()
n = 5000
for i in range(n):
	data.append([i+1, (i+1)*10])
data = array(data)
print(data[:5, :])
print(data.shape)

Running this example prints the first 5 rows of the data and the shape of the loaded data.

We can see we have 5,000 rows and 2 columns: a standard univariate time series dataset.

[[ 1 10]
 [ 2 20]
 [ 3 30]
 [ 4 40]
 [ 5 50]]
(5000, 2)

2. Drop Time

If your time series data is uniform over time and there are no missing values, you can drop the time column.

If not, you may want to look at imputing the missing values, resampling the data to a new time scale, or developing a model that can handle missing values. See posts like:

Here, we just drop the first column:

# drop time
data = data[:, 1]
print(data.shape)

Now we have an array of 5,000 values.

(5000,)
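
Dropping the time column is only safe if the observations really are evenly spaced. One way to check this before dropping it is sketched below. This is a minimal sketch, assuming the time column holds numeric timestamps as in the mock data; the helper name is_uniform is hypothetical and not part of NumPy or Keras.

```python
from numpy import array, diff

def is_uniform(times):
    # differences between consecutive timestamps
    steps = diff(times)
    # uniform if every step equals the first step
    # (exact comparison; assumes integer-like timestamps)
    return bool((steps == steps[0]).all())

# mock time column: 1, 2, ..., 5000
times = array([i + 1 for i in range(5000)])
print(is_uniform(times))

# a series with a gap (time step 4 is missing)
gappy = array([1, 2, 3, 5, 6])
print(is_uniform(gappy))
```

If the check fails, the time column carries information, and imputing or resampling is probably needed before it is dropped.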

3. Split Into Samples

LSTMs need to process samples where each sample is a single time series.

In this case, 5,000 time steps is too long; LSTMs work better with 200-to-400 time steps based on some papers I’ve read. Therefore, we need to split the 5,000 time steps into multiple shorter sub-sequences.

I write more about splitting up long sequences here:

There are many ways to do this, and you may want to explore some depending on your problem.

For example, perhaps you need overlapping sequences, or perhaps non-overlapping sequences are fine but your model needs to maintain state across the sub-sequences, and so on.

Here, we will split the 5,000 time steps into 25 sub-sequences of 200 time steps each. Rather than using NumPy or Python tricks, we will do this the old-fashioned way so you can see what is going on.

# split into samples (e.g. 5000/200 = 25)
samples = list()
length = 200
# step over the 5,000 in jumps of 200
for i in range(0,n,length):
	# grab from i to i + 200
	sample = data[i:i+length]
	samples.append(sample)
print(len(samples))

We now have 25 sub-sequences of 200 time steps each.

25

If you’d prefer to do this in a one-liner, go for it. I’d love to see what you can come up with.
Post your approach in the comments below.
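
One possible one-liner uses NumPy's reshape() directly. This is a sketch rather than the tutorial's own method: it assumes the series length is an exact multiple of the sub-sequence length, and it rebuilds the mock series with arange() so that it runs on its own.

```python
from numpy import arange

# mock series: 10, 20, ..., 50000 (same values as the tutorial data)
data = arange(1, 5001) * 10
length = 200

# each row becomes one sub-sequence of `length` time steps;
# this raises a ValueError if len(data) is not a multiple of `length`
samples = data.reshape(-1, length)
print(samples.shape)
```

Here the result is already a 2D NumPy array, so no conversion from a list of arrays is needed afterwards.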

4. Reshape Subsequences

The LSTM needs data in the format [samples, time steps, features].

Here, we have 25 samples, 200 time steps per sample, and 1 feature.

First, we need to convert our list of arrays into a 2D NumPy array of 25 x 200.

# convert list of arrays into 2d array
data = array(samples)
print(data.shape)

Running this piece, you should see:

(25, 200)

Next, we can use the reshape() function to add one additional dimension for our single feature.

# reshape into [samples, timesteps, features]
# expect [25, 200, 1]
data = data.reshape((len(samples), length, 1))
print(data.shape)

And that is it.

The data can now be used as an input (X) to an LSTM model.

(25, 200, 1)

Summary

In this tutorial, you discovered how to convert your long univariate time series data into a form that you can use to train an LSTM model in Python.

Did this post help? Do you have any questions?
Let me know in the comments below.

131 Responses to How to Prepare Univariate Time Series Data for Long Short-Term Memory Networks

Steven November 17, 2017 at 9:44 am #

Great article! I wish I had this a couple months ago when I was struggling with doing the same thing for Tensorflow. Glad to see the solution I had mostly aligns with yours.

You mention some papers that discuss optimal sample size. Would you be able to share a link to those? I’m interested to see how the authors arrive at that number.

Reply
- Jason Brownlee November 18, 2017 at 10:10 am #
  
  Thanks.
  
  Perhaps check this post:
  https://machinelearningmastery.com/much-training-data-required-machine-learning/
  
  Reply
- Edival September 4, 2018 at 12:38 pm #
  
  This publication helped me a lot! I really want to thank you for the post. Very simple and straightforward.
  
  Reply
  - Jason Brownlee September 4, 2018 at 1:53 pm #
    
    I’m happy to hear that!
    
    Reply
went November 17, 2017 at 12:32 pm #

Hi Jason, thx for sharing.

Let’s say I have a time series dataset [1,2,3,4,5,6,7,8] and need to split it with time steps of 4. In your article, the result will be [1,2,3,4], [5,6,7,8]. But in some other articles I’ve read, the result will sometimes look like this: [1,2,3,4], [2,3,4,5], [3,4,5,6], [4,5,6,7], [5,6,7,8].

So what is the best way to split the samples? Thx.

Reply
- Jason Brownlee November 18, 2017 at 10:11 am #
  
  All 3 approaches you have listed are valid, try each and see what works best for your problem.
  
  Reply
  - Martin November 21, 2017 at 12:31 am #
    
    Is there literature on the subject? The 3 solutions seem to have very different training times for large datasets. I assume that for the second solution we should keep the memory for the cell, but not for the third, right?
    
    Also, is there a risk that the training overexposes certain time steps (time step 5 in the example) in early learning, giving a bigger weight to this data?
    
    BTW great blog and your book on LSTM is the best I found on the subject. thx.
    
    Reply
    - Jason Brownlee November 22, 2017 at 10:41 am #
      
      Not really.
      
      I would suggest framing the problem each of the 3 ways and compare them to see what works best for your specific data.
      
      Perhaps this post will help you with reframing the problem:
      https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
      
      Reply
    - Phil Chen October 25, 2018 at 11:33 am #
      
      When the original univariate time series is split into a list of subsequences of length m, with a delay of d between successive subsequences, this forms a new set of samples with m-dimensional input vectors. This is called Takens embedding. When d = m = 4, this is the first case. When d = 1, m = 4, this is the 2nd case. As a matter of fact, any d > 1 is valid and the same goes for m. There are multiple methods available for determining “optimal” values of d and m. Here are some publications on the subject:
      https://arxiv.org/pdf/1605.01571.pdf
      https://file.scirp.org/pdf/JMP_2017083015084865.pdf
      
      Reply
      - Jason Brownlee October 25, 2018 at 2:02 pm #
        
        Interesting, thanks for the refs.
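
The two splits discussed in this thread can be produced by one small function with a window length m and a delay d between window starts. This is a minimal sketch; the name split_windows and its parameters are hypothetical, chosen to match the m and d used in the comment above.

```python
from numpy import array

def split_windows(series, m, d):
    # start a new window every d steps; keep only full windows of length m
    return array([series[i:i + m] for i in range(0, len(series) - m + 1, d)])

series = array([1, 2, 3, 4, 5, 6, 7, 8])

# d = m = 4: non-overlapping windows
print(split_windows(series, m=4, d=4))

# d = 1, m = 4: overlapping windows that slide by one step
print(split_windows(series, m=4, d=1))
```

The first call gives 2 samples and the second gives 5, so the overlapping framing produces more training samples from the same series at the cost of heavily correlated samples.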
a November 17, 2017 at 7:51 pm #

Nice article. One thing I love about Python is list comprehension. One possible one-liner could be

samples = [data[i:i+length] for i in range(0,n, length)]

Reply
- Jason Brownlee November 18, 2017 at 10:14 am #
  
  Nice, thanks.
  
  Reply
Pedro Cadahía November 17, 2017 at 10:38 pm #

Went, what you want is called a “sliding window”; you can get it with the following code:

from itertools import islice

def window(seq, n=2):
    """Returns a sliding window (of width n) over data from the iterable.

    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    it = iter(seq)
    # the first window is the first n items
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    # slide the window forward by one item at a time
    for elem in it:
        result = result[1:] + (elem,)
        yield result

Reply
Daniel Salvador November 20, 2017 at 7:43 pm #

Hi Jason! First, I have to say that I really like your posts, they are very helpful.

I’m facing a time series classification problem (two classes) where I have series of around 120-200 time steps and 7 variables each. The problem is that I have only 3000 samples to train. What do you think, Is it feasible a priori to feed a LSTM network or I need more samples?

You mention that LSTM doesn’t work well with more than 200-400 timesteps. What about the number of features? Would you do dimensionality reduction?

Thank you very much in advance!

Reply
- Jason Brownlee November 22, 2017 at 10:40 am #
  
  LSTMs can support multiple features.
  
  It does not sound like enough data.
  
  You could try splitting the sequence up into multiple subsequences to see if that helps?
  
  Reply
Staffan November 21, 2017 at 6:40 am #

Hi Jason,

Thank you for this excellent summary, your work is really impressive…I’m especially impressed by how many blog posts you have taken the time to write.

I was wondering why an LSTM network prefers a sequence of 200 – 400 samples, is this due to a memory allocation issue? Or can a longer sequence affect accuracy (I wouldn’t guess this but perhaps it’s possible)?

What role does the batch size play here? Couldn’t this restriction in sequence length be mitigated by selecting a correct batch size?

BR
Staffan

Reply
- Jason Brownlee November 22, 2017 at 10:45 am #
  
  It seems to be a limitation on the training algorithm. I have seen this issue discussed in the literature, but have not pushed hard to better understand it myself.
  
  I’d encourage you to test different configurations on your problem.
  
  Reply
soloyuyang December 18, 2017 at 1:26 pm #

Hi jason,
Nice post! I’m a little confused about the “time-steps” parameter. Does “time-steps” mean the span of steps in the input data? For example, for a univariate, one-step forecasting problem, I constructed the data with a “sliding window”. For each sample, the structure is “t-6, t-5, t-4, t-3, t-2, t-1, t” for the input (train_x) and “t+1” for the output (train_y), i.e. using 7 values to forecast the 8th. I reshaped the input (train_x) as [samples, 7, 1]. Is that right?

Reply
- Jason Brownlee December 18, 2017 at 3:30 pm #
  
  Learn more about time steps in this post:
  https://machinelearningmastery.com/gentle-introduction-backpropagation-time/
  
  Reply
- huo November 29, 2018 at 10:14 pm #
  
  I think so.
  
  Reply
- Rajrudra April 28, 2020 at 6:17 pm #
  
  I think time steps argument is for the number of rows, like features is for number of arguments as explained in one of Jason’s posts
  
  Reply
  - Rajrudra April 28, 2020 at 6:23 pm #
    
    Sorry, features is for number of columns
    
    Reply
  - Jason Brownlee April 29, 2020 at 6:24 am #
    
    See this:
    https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
    
    Reply
- Achintya Mengi July 14, 2021 at 9:37 pm #
  
  Hi, can I use the same methodology to prepare data for 1D-CNN ? If not is there another section where it is described for 1D-CNN ?
  
  Reply
  - Jason Brownlee July 15, 2021 at 5:28 am #
    
    There are many tutorials on 1d CNNs, start here:
    https://machinelearningmastery.com/how-to-develop-convolutional-neural-network-models-for-time-series-forecasting/
    
    Reply
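
The sliding-window framing asked about at the top of this thread (7 input time steps, 1 output value) can be sketched as follows. This is an illustration under assumptions: the function name to_supervised and the mock series are hypothetical, and a one-step-ahead forecast is assumed.

```python
from numpy import array

def to_supervised(series, n_steps):
    X, y = list(), list()
    for i in range(len(series) - n_steps):
        # n_steps consecutive observations as input
        X.append(series[i:i + n_steps])
        # the observation that follows them as output
        y.append(series[i + n_steps])
    X, y = array(X), array(y)
    # reshape input into [samples, time steps, features]
    return X.reshape((X.shape[0], n_steps, 1)), y

# mock series: 10, 20, ..., 200
series = array([i * 10 for i in range(1, 21)])
X, y = to_supervised(series, n_steps=7)
print(X.shape, y.shape)
```

With this framing, the time steps dimension is the window width (7 here) and the single feature is the observed value.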
Davide February 17, 2018 at 6:45 am #

Hello Jason, sorry for my English. I’m new to neural networks and I am trying to develop a neural network to generate music.
I have many .txt files with a sequence of notes like these:
[int(note number), int(time), int(length)]
68 2357 159,
64 2357 260,
…
…
What kind of neural network should I choose for this purpose?
How can I preprocess this kind of data?
Congratulations for this website and thank you.

Reply
- Jason Brownlee February 17, 2018 at 8:51 am #
  
  For sequence prediction, perhaps RNNs like the LSTM would be a good place to start.
  
  Reply
falah March 1, 2018 at 6:43 pm #

hi
I want to classify classes; each class consists of 8 time steps, with 16 features in each time step. Is this reshape correct?
reshape(124,8,1)

Reply
- Jason Brownlee March 2, 2018 at 5:31 am #
  
  I think it would be (?, 8, 16) where “?” is the number of samples, perhaps 124 if I understand your case.
  
  Reply
C.Junior April 7, 2018 at 2:05 am #

Hello, Jason, thanks for the great work.
I’ve read your articles about organizing the data for LSTM in 3D, but I can not do this with my data, it always presents an error like this:

“Error when checking target: expected dense_363 to have 2 dimensions, but got array with shape (3455, 1, 1)”

My data is organized as follows:

Input:
11,000 rows with 48 columns, each row represents one day and each column represents 0.5h,

The output Y (0, 1) is binary, it represents the occurrence of an event 1 = yes, 0 = no.

So I have X = [0.1, 0.2, 0.3, …, 0.48] Y = [0] or Y = [1]

for more details see my code:

# load data
dataframe = pd.read_csv('Parque_A_Classificado_V2.csv', header=None)
dataset = dataframe.values

# split data to variables train and test
train_size = int(len(dataset) * 0.7)
test_size = len(dataset) - train_size
trainX, trainY = dataset[0:train_size,:48], dataset[train_size:len(dataset),48]
testX, testY = dataset[0:test_size, :48], dataset[test_size:len(dataset), 48]

# reshape input to be [samples, time steps, features]
trainX = trainX.reshape(trainX.shape[0],trainX.shape[1], 1)
testX = testX.reshape(testX.shape[0], testX.shape[1], 1)
trainY = trainY.reshape(trainY.shape[0], 1, 1)
testY = testY.reshape(testY.shape[0], 1, 1)

# creating model
model = Sequential()
model.add(LSTM(100, input_shape=(48, 1)))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
model.fit(trainX, trainY, validation_data=(testX, testY), epochs=1, batch_size=1)

I can not find the error, can you help me?

Reply
- Jason Brownlee April 7, 2018 at 6:35 am #
  
  Maybe this post will make it clearer:
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Reply
  - C.Junior April 7, 2018 at 10:09 am #
    
    Many thanks, Jason, your attitude is commendable.
    This time I managed to run my model.
    
    Reply
    - Jason Brownlee April 8, 2018 at 6:10 am #
      
      Glad to hear you worked out your problem.
      
      Reply
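
The error message in this thread says the target was expected to have 2 dimensions, which suggests the 3D reshape of the labels is the likely culprit. The slices in the comment's code also appear to pair inputs and labels from different rows. Below is a minimal NumPy-only sketch of a consistent split, assuming 48 input columns followed by one label column; the zero-filled array is a stand-in for the real CSV data.

```python
from numpy import zeros

# stand-in for the real data: 48 half-hourly inputs + 1 binary label per row
dataset = zeros((11000, 49))

# chronological 70/30 split of the rows
train_size = int(len(dataset) * 0.7)
train, test = dataset[:train_size], dataset[train_size:]

# inputs and labels are taken from the same rows
trainX, trainY = train[:, :48], train[:, 48]
testX, testY = test[:, :48], test[:, 48]

# inputs: [samples, time steps, features]
trainX = trainX.reshape((trainX.shape[0], 48, 1))
testX = testX.reshape((testX.shape[0], 48, 1))

# targets: 2D [samples, 1], matching the output of a Dense(1) layer
trainY = trainY.reshape((-1, 1))
testY = testY.reshape((-1, 1))

print(trainX.shape, trainY.shape, testX.shape, testY.shape)
```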
Alon May 17, 2018 at 3:07 am #

Hi Jason,

I’m struggling with a problem similar to those described here with a slight difference.

I’m solving a disaggregation problem, so the dimensions of my output are higher than those of my input. To simplify, let’s say my original data looks something like this:

X.shape == [1000,1]
Y.shape == [1000,10]

I do some reshaping of the input to make things work:

# leaving this parameter-dependent in case I want to use more features later
X = X.reshape(1, X.shape[0], X.shape[1])

My net looks like this:

model = Sequential()
model.add(LSTM(50, batch_input_shape=X.shape, stateful=True))
# my output values aren't between +/-1 so I chose relu
model.add(Dense(Y.shape[1], activation='relu'))

I went with a stateful model because I will most likely have to do batch separation when running my actual training, as I have close to 10^6 samples.

I’ve then tried both doing the same thing to the Y vector and not touching it; either way I get an error (when I reshaped Y, I changed Y.shape[1] to Y.shape[2]).

Any thoughts?

Reply
- Jason Brownlee May 17, 2018 at 6:37 am #
  
  Output will be 2D not 3D.
  
  Reply
Srivatsa June 4, 2018 at 4:10 pm #

How can I split the 5000-row dataset into train and test portions when I am dividing into samples and reshaping it?

Reply
- Jason Brownlee June 5, 2018 at 6:34 am #
  
  You could split before or after reshaping.
  
  This post will teach you more about how to work with arrays:
  https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
  
  Reply
Jean June 15, 2018 at 7:11 pm #

Thanks to this article and the one about reshaping input data for LSTM, I understood how to split/reshape the inputs of the LSTM network but I can’t see how to handle labels…

My dataset is 3000 time steps and 9 features. As explained in the article, I split it to get 15 samples of 200 time-steps so my input shape is (15, 200, 9).

My labels are a binary matrix (3000, 6) i.e. I want to solve a 6-class classification problem.

If I feed the labels as is, I’ll get an error “Found 15 input samples and 3000 target samples”.
How to correctly feed the labels to the network? What confuses me is that the targets should be 2D (unlike inputs) so I don’t see how I could split them in the same way as inputs, for example to get a (15, 200, 6) shape…

Reply
- Jason Brownlee June 16, 2018 at 7:26 am #
  
  You will need one label per input sample.
  
  Reply
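
One way to get one label per input sample in the case described above is to keep only the label at the last time step of each window. This is just one possible convention and an assumption on my part (a majority vote over the window would be another); the mock labels below are hypothetical stand-ins for the real data.

```python
from numpy import zeros, arange

length = 200

# mock inputs: 3,000 time steps, 9 features
X = zeros((3000, 9))

# mock one-hot labels: 3,000 time steps, 6 classes (class changes every 500 steps)
labels = zeros((3000, 6))
labels[arange(3000), arange(3000) // 500] = 1

# inputs: [samples, time steps, features]
X = X.reshape((-1, length, 9))

# targets: split the same way, then keep the label of the last step in each window
y = labels.reshape((-1, length, 6))[:, -1, :]

print(X.shape, y.shape)
```

The inputs and targets now have the same number of samples (15), and the targets are 2D.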
NN June 19, 2018 at 9:41 am #

Great blog thank you!
From what I understand you showed how to handle one long time series, but I couldn’t understand what to do with multiple inputs.
For example my input is x1 with dimensions (25, 200, 1)
but I have multiple inputs for my training X = [x1,x2…xn]

How should I shape for model.fit and for the LSTM layers? a 4D tensor?

Reply
- Jason Brownlee June 19, 2018 at 2:45 pm #
  
  I explain more here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm
  
  Reply
NL June 19, 2018 at 9:51 am #

Thank you for the wonderful blog.
Where does the total number of samples to train go in the reshape?
As I understood: (num of subsamples, time stamps, features per timestamp)

Reply
- Jason Brownlee June 19, 2018 at 2:45 pm #
  
  Correct: [samples, timesteps, features].
  
  Reply
Mr. T June 22, 2018 at 2:19 am #

I love all your posts!
I’m a bit confused:
I would guess that the number of time steps limits the number of recurrent layers, since the number of time steps equals the number of steps for which you run your recurrent neural network. Is this true? If yes, how can the memory of the LSTM be larger than the number of recursions?

And if it isn’t larger, why would anybody choose time steps = 1 like you did in some posts?
Thanks.

Reply
- Jason Brownlee June 22, 2018 at 6:14 am #
  
  The time steps and the nodes/layers are unrelated.
  
  Reply
  - Mr.T June 22, 2018 at 10:03 am #
    
    Sorry, I formulated my question badly.
    
    I meant: if I have a sample sequence of lets say 100 time steps, can the memory of the LSTM be greater than these 100 time steps?
    Is the memory limited by the amount of time steps given in a sequence?
    
    Thanks for your time. T
    
    Reply
    - Jason Brownlee June 22, 2018 at 2:55 pm #
      
      The limit for the LSTM seems to be about 200-400 time steps, from what I have read.
      
      Reply
K June 27, 2018 at 6:35 am #

Hi Jason,

Can you please explain what you mean by LSTM does not work well for 200-400 time steps, while you replied to Daniel Salvador that 3000 training samples are not enough?

Does 200-400 mean 200-400 steps ahead prediction?

How many training samples do you think is enough?

Reply
- Jason Brownlee June 27, 2018 at 8:25 am #
  
  The input data has a number of time steps. The LSTM performance appears to degrade if the number of input time steps is more than about 200-400 steps, according to the literature.
  
  I have not tested this in experiments myself though.
  
  Reply
  - Ala Agrebi February 3, 2024 at 2:41 am #
    
    Question please in this situation. how can i choose the size of batch when i want to train my model with Keras.
    
    NB: the shape of my data is like this : [3700, 200, 150] when :
    
    – 3700 : number of samples (740 samples * 5 classes)
    – 200: time steps
    – 150: features
    
    Reply
    - James Carmichael February 3, 2024 at 9:39 am #
      
      Hi Ala…The following resource may be of interest to you:
      
      https://machinelearningmastery.com/how-to-control-the-speed-and-stability-of-training-neural-networks-with-gradient-descent-batch-size/
      
      Reply
Eric Zheng August 23, 2018 at 12:32 pm #

Dear Jason,

Could you help me with this:
I have many phrases, and each phrases contains many words (I have padded so that they are of the same length), and I have trained word embedding for each word. So, in this case, if I want to use LSTM in keras to do some classification task (e.g. each phrase is labeled as 1 or 0, it’s related to the order of words), what will be my input shape for the LSTM layer in this case? Is it like shape (#of phrases, #of words in phrase, # of dimension of word embedding) ? I am a little confused here. Thanks for your help.

Reply
- Jason Brownlee August 23, 2018 at 1:55 pm #
  
  Probably: [total phrases, phrase length, 1]
  
  Reply
  - Eric Zheng August 24, 2018 at 1:41 am #
    
    Thanks for your reply. But I’m still confused here.
    1. Why it is “1” at last?
    2. I think the shape of my input NumPy array (which will be fed into a Keras sequential model whose first layer is an LSTM layer) is (# of phrases, # of words in phrase, # of dimensions of word embedding). Does this mean that the input shape is (# of words in phrase, # of dimensions of word embedding)? I ask because I want to learn something based on the sequence order between words.
    My task is very similar to the task in one of your post. https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
    In that post, the original input is a vector of words. Then, it will be put into a keras sequential model, however, the first layer is a Word Embedding layer, then followed by the LSTM layer. The output shape of word embedding layer should be a (2D) array, right?. Does that means the input shape of LSTM in this case is 2D rather than 3D? If it’s not, what will be the input shape in that case.
    Thanks for your help.
    
    Reply
    - Jason Brownlee August 24, 2018 at 6:14 am #
      
      Because you have a 1d sequence of integers to feed into your model, e.g. as input to the embedding.
      
      The word embedding will handle its own dimensionality, don’t reshape data for it.
      
      Reply
Pier September 3, 2018 at 9:22 pm #

Hi Jason, could you help me on this?
My dataset has not been collected continuously, but it’s the result of many experiments, each one representing a specific class that I want my LSTM model to learn and predict.
Which is the best strategy to prepare the sequences for the training phase?
Should I concatenate all timeseries available and then use a sliding window to generate the sequences? in this case I may risk to have data of different classes in the same sequence…
Or would it be better to create the sequences separately for each individual class?
Thanks in advance

Reply
- Jason Brownlee September 4, 2018 at 6:06 am #
  
  Perhaps brainstorm 3-5 different ways to frame the prediction problem, then prototype a few. This will help you clarify what the right approach might be.
  
  Reply
Christopher September 15, 2018 at 4:50 pm #

Hi Jason,
Super post!
You did not do things like the following in your multivariate time series PM2.5 example at https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/ .
# split into samples (e.g. 5000/200 = 25)
samples = list()
length = 200
# step over the 5,000 in jumps of 200
for i in range(0,n,length):
	# grab from i to i + 200
	sample = data[i:i+length]
	samples.append(sample)
print(len(samples))

Is it because that PM2.5 example assumes overlapping subsequences? Or do you have any other reasons?

For your convenience, you have the following snippets in that PM2.5 example:
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
(8760, 1, 8) (8760,) (35039, 1, 8) (35039,)

After I change the n_hours=3, i.e., the timesteps, I have the following output in my Spyder:
(8760, 3, 8) (8760,) (35037, 3, 8) (35037,)
Train on 8760 samples, validate on 35037 samples

This means they are overlapping subsequences.

Please let me know if I get it right or not.

Many thanks.

Reply
- Jason Brownlee September 16, 2018 at 5:57 am #
  
  Because I try to keep tutorials simple and focused.
  
  Reply
  - Christopher September 16, 2018 at 11:58 am #
    
    Thanks for your reply. You did not elaborate in this tutorial on when one needs overlapping subsequences, when not. Would you have a tutorial about that, or any tips?
    
    Reply
    - Jason Brownlee September 17, 2018 at 6:28 am #
      
      It really depends on the problem, e.g. the composition of the input samples is defined by what you want the model to learn or map to the output sample.
      
      Reply
Rashmi Ravishankar September 18, 2018 at 11:06 pm #

This is awesome (as is your entire series)! I consistently find your articles concise, clear and lucid, so thank you.

A small suggestion about the LSTM series however- you could add a couple of lines about the shaping of Y and the return sequence option. I struggled with it earlier, despite reading all your LSTM articles so it would probably help others!

Reply
- Jason Brownlee September 19, 2018 at 6:20 am #
  
  With return_sequences=True, the layer outputs one value per input time step, per node in the layer.
  
  E.g. if the layer gets 10 time steps of 1 variable and the layer has 100 nodes, the output has the shape [samples, 10, 100].
  
  Reply
  - Akhi March 7, 2020 at 12:07 am #
    
    Hi Jason, I have a related question:
    
    I have the aggregated power consumption of a house and the individual power consumption of an appliance (example: dishwasher). Here, the aggregated power is my training set and the appliance power consumption is the target set.
    
    Do I need to reshape both for training my model, the same way you did in this article?
    
    Reply
    - Jason Brownlee March 7, 2020 at 7:18 am #
      
      Perhaps. LSTMs have a specific expectation when it comes to the shape of input, see this:
      https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
      
      Reply
Nancyz20 September 22, 2018 at 9:39 am #

Would you provide example of the shape of label data? what should be the dimension? can we train on 24 samples and predict the 25th sample?

Reply
- Jason Brownlee September 23, 2018 at 6:35 am #
  
  Here’s an example of making a prediction:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Dominik Zulovec Sajovic November 6, 2018 at 5:39 am #

Hi,

I just wanna say you have awesome articles!

Here is my question:

Let’s say we split the data into a shape

(100, 60, 5)

Meaning 100 samples, each of them looking 60-time steps back and 5 features.

Would I be correct to assume that after we split the data as described, we could now shuffle the 100 samples as we wished and the result would be the same.

So we could apply normal cross-validation which is otherwise not possible with RNNs?

Reply
- Jason Brownlee November 6, 2018 at 6:35 am #
  
  Thanks.
  
  No, you cannot shuffle the samples. You must use walk-forward validation:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
  
  Reply
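
The walk-forward idea mentioned in the reply above can be sketched without any library: train on all samples up to a point, test on the next one, then move forward. This is a minimal illustration, not the linked post's code; the function name walk_forward_splits and its parameters are hypothetical.

```python
def walk_forward_splits(n_samples, n_train_min):
    # yield (train indices, test index) pairs in temporal order;
    # the training window grows by one sample at each step
    for end in range(n_train_min, n_samples):
        yield list(range(end)), end

# example: 6 samples, at least 3 used for training
for train_idx, test_idx in walk_forward_splits(6, 3):
    print(train_idx, test_idx)
```

Each test sample comes strictly after all of its training samples, so the model is never evaluated on data that precedes what it was trained on.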
Guicheng Tian November 9, 2018 at 6:38 pm #

Hi Jason! First, I have to say that I really like your posts, they are very helpful.
I have some questions about TimeSeries, would you give me some suggestion ?

1. Suppose the data is t1, t2, … t10, and I prepare it with a rolling window of size 3, such as [t1,t2,t3] -> t4. Then I train an LSTM model. How do I predict a time step further in the future? For example, to predict the value at time t20, the history features [t17, t18, t19] are missing.

2. Do I need to prepare my data with a rolling window if every time step has a label? For example, a binary classification problem:

t1, f11, f12, f13, 1
t2, f21, f22, f23, 0
….
tn, fn1, fn2, fn3, 1

When I train the LSTM, I reshape N time samples [N, 3] to [-1, timesteps, 3], where N is the number of time samples, so the shape of the training data fed to the LSTM is [-1, timesteps, 3]. But this requires that N equals k * timesteps. For example, [60, 3] -> [-1, 12, 3] will be OK, but [50, 3] -> [-1, 12, 3] will fail. I want to know how to process the last 2 time samples. Should I pad with zero vectors to get a sequence of size 12?

Thank you very very much.

Reply
- Jason Brownlee November 10, 2018 at 6:00 am #
  
  I have many examples, you can get started here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
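
For the second question above (50 rows that do not divide evenly into windows of 12), zero padding is one option. Below is a minimal sketch that pads at the start of the series so the most recent observations stay intact. Padding at the start is my assumption, not a recommendation from the post, and the helper name pad_to_multiple is hypothetical.

```python
from numpy import zeros, ones, concatenate

def pad_to_multiple(data, timesteps):
    # number of rows left over after filling whole windows
    remainder = data.shape[0] % timesteps
    if remainder == 0:
        return data
    # rows of zeros needed to complete the partial window
    n_pad = timesteps - remainder
    padding = zeros((n_pad, data.shape[1]))
    # pad at the start so the latest observations are untouched
    return concatenate((padding, data), axis=0)

# mock data: 50 time samples, 3 features
data = ones((50, 3))
padded = pad_to_multiple(data, 12)
samples = padded.reshape((-1, 12, 3))
print(samples.shape)
```

An alternative is simply to drop the incomplete window; which works better is worth testing on your own data.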
weiliu November 21, 2018 at 2:43 pm #

Hi, Jason. Thanks for your post.

I have a question. Suppose I have to forecast the number of people in a region. We divide the region into a 3×3 grid, and each cell holds the current number of people. So for every one-hour interval there is a 3×3 matrix, for example one at 8:00 and one at 9:00. Our goal is to use the previous two intervals (i.e. 8:00 and 9:00) to forecast the number of people in the next two intervals (i.e. 10:00 and 11:00). How should I approach this task? Thanks!

Reply
- Jason Brownlee November 22, 2018 at 6:20 am #
  
  You can use a CNN-LSTM or ConvLSTM to read in a matrix time series, then use an encoder-decoder model to output multiple steps.
  
  Reply
Giulia February 8, 2019 at 7:53 pm #

Everything is clear, but I have a problem.
The line of code:
data = array(samples)
doesn’t return (number_samples, time_steps), but only (number_samples,), consequently the reshape instruction doesn’t work.
How can I solve this problem?

Reply
- Jason Brownlee February 9, 2019 at 5:56 am #
  
  Sorry that you’re having trouble, perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Sandipan Banerjee March 17, 2019 at 10:31 am #

Thanks a lot for this blog. I am a complete newbie in this field (I am a PhD student in Mechanical Engineering), and I want to ask for your opinion on how to handle spatio-temporal data. Let me explain: I have some simulation results of fluid flow in a 2D domain. Each point of the domain has specific x and y coordinates and specific velocities, pressure, vorticities, etc. For each time step I have a separate data file, and each data file contains multiple rows and columns. Each row corresponds to a specific point in the domain, and the columns hold the x-coordinate and y-coordinate of the point, the velocities and vorticities at the point, etc. What type of neural network would you suggest for this problem? It would also be very helpful if you could point me to a post or blog that deals with similar situations, if you are aware of one.

Once again, please accept my thanks for the immense help I have received from your blog posts.

Reply
- Sandipan Banerjee March 17, 2019 at 10:34 am #
  
  I forgot to add:
  
  With the data for multiple time-steps I would like to train a model that predicts the velocities and vorticity at each point for future time-steps. Again, each time-step has a separate file containing multiple data points. The location of each specific point remains unchanged with time.
  
  Reply
- Jason Brownlee March 18, 2019 at 6:03 am #
  
  Great question!
  
  If the data is spatially related, e.g. like a time series of 2d images, then I would recommend looking into CNN-LSTMs and ConvLSTMs.
  
  Reply
  - Sandipan Banerjee March 19, 2019 at 12:11 am #
    
    Thanks for your response, I will look into CNN-LSTMs. One more question: how can I feed data for multiple time-steps when each time-step has a separate file containing multiple rows? The examples I find all have a single row per time-step in one data file, but my data has a single file per time-step.
    
    Reply
    - Jason Brownlee March 19, 2019 at 8:58 am #
      
      Perhaps load all the data into memory or use a data generator to load the data progressively.
      
      Reply
David May 5, 2019 at 4:29 am #

Very nice tutorial. I have a question about y_train.
My X_train has a shape of (958, 75, 10) after applying this tutorial. However, my y_train is just (71850, 9) which is a long array containing a one_hot_encoder vector of 9 possible classes.
I don’t understand how to reshape y_train to be ready for the LSTM model in Keras.

Reply
- Jason Brownlee May 5, 2019 at 6:35 am #
  
  The input and output elements of samples must match up.
  
  E.g. if you have 958 input samples, you need 958 output elements for those same samples.
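
To make the shapes from the question concrete, here is a sketch under stated assumptions: the labels are random stand-ins, and taking the label of the last time step as the per-sample label is just one possible choice, not something the tutorial prescribes.

```python
import numpy as np

n_samples, timesteps, n_classes = 958, 75, 9

# hypothetical one-hot labels, one per original time step
labels = np.random.randint(0, n_classes, size=n_samples * timesteps)
y_flat = np.eye(n_classes)[labels]  # shape (71850, 9)

# group the labels the same way the inputs were grouped
y_seq = y_flat.reshape(n_samples, timesteps, n_classes)  # shape (958, 75, 9)

# one label per sample: here, the label of the last time step
y_last = y_seq[:, -1, :]  # shape (958, 9)
print(y_flat.shape, y_seq.shape, y_last.shape)
```

Which of the two you need depends on the model output: `(958, 9)` suits a model that emits one prediction per input sequence, while `(958, 75, 9)` suits a model that emits a prediction at every time step.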
  
  Reply
  - David May 6, 2019 at 4:55 pm #
    
    Thanks for the answer. To clarify, does that mean that for each batch I only need one one_hot_encoder vector? I have 71850 samples in both, but then I reshape that using batches of 75 samples. Would I then need to take only one vector [1,0,0,0,0,0,0,0,0,0] every 75 samples?
    
    Reply
    - Jason Brownlee May 7, 2019 at 6:13 am #
      
      I don’t follow, sorry. I don’t know where one hot encoding or batches came into the picture.
      
      Reply
      - David May 17, 2019 at 9:42 pm #
        
        Hi Jason,
        I just asked the question on Stack Overflow; maybe it is clearer there:
        
        https://stackoverflow.com/questions/55983867/what-is-the-correct-format-of-y-train-when-the-output-is-one-of-8-classes-using
      - Jason Brownlee May 18, 2019 at 7:37 am #
        
        Perhaps you can summarize it in a sentence or two?
David May 6, 2019 at 9:31 pm #

Hello, I have a new question that updates the last one.
If I’m correct, the shape of y_train depends on the model: if I train a stateful model, it has to be (958, 9); however, if I train a non-stateful model, it has to be (958, 75, 9)?

Reply
Ryan Gan June 19, 2019 at 2:20 am #

Hello Dr. Brownlee,

Is it possible to send vectors of vectors into an LSTM? I have the word tokens (seq 1) “hello” “how” “are” “you” etc., and each word is represented as a vector in word2vec, so I get a vector of vectors. This is for seq 1; I have up to seq n. How can I use these as input to an LSTM?

Reply
- Jason Brownlee June 19, 2019 at 8:16 am #
  
  Yes, typically we use an embedding to store the vectors. You can provide the vectors directly to the LSTM; each element in the vector is a feature.
  
  Therefore you will have a sequence (words or timesteps) of feature vectors (features or embedding).
  
  Reply
mahag July 9, 2019 at 8:51 pm #

Great tutorial, I have a question.
If I have single-dimensional data, do I have to make it two-dimensional by adding the time steps?
So that it becomes, for example:
[1 20]
[2 25]
[3 30]
.
.
.
[10 65]
thank you
best regards

Reply
- Jason Brownlee July 10, 2019 at 8:07 am #
  
  Good question, this will help you understand:
  https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
  
  Reply
Hugh July 23, 2019 at 11:53 pm #

Great resource, but I am still confused about something. I have a classification problem with 5 features, but I want to train the model to recognize the sequence of time steps for each label. That is to say, sequence_feature1 + sequence_feature2 + sequence_feature3 + sequence_feature4 + sequence_feature5 = label. Is there a way to amend the LSTM input to account for this? Otherwise it seems like I am training the model that feature1 + feature2 + feature3 + feature4 + feature5 = label at each time step (which is not correct). Thanks

Reply
- Jason Brownlee July 24, 2019 at 8:01 am #
  
  I don’t follow the structure of your problem, sorry.
  
  If you want memory of prior predictions, the model will have some internal state – at least until that state is reset.
  
  Reply
Kent August 8, 2019 at 4:31 pm #

The finalized data you prepared as a 3D NumPy array is non-overlapping, right?

So, non-overlapping means that there are some points we cannot predict.

In the example ( [samples, timesteps, feature] = [25, 200, 1] ),
・Predict value at 201 by using values at 1-200.
・Predict value at 401 by using values at 201-400, and so on.

So, we cannot predict, say, 202 or 207, right? Because there is no overlapping.

Is my understanding correct?

Reply
- Jason Brownlee August 9, 2019 at 8:05 am #
  
  You can design the problem framing any way you wish.
  
  Overlapping is generally preferred; it really depends on how you want to use the model in practice.
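
For comparison with the non-overlapping split in the tutorial, here is a minimal sketch of an overlapping (sliding window) framing. The function name `sliding_windows` and the stand-in series are assumptions for illustration. Each window of 200 values is paired with the value that follows it, so values 201, 202, 203 and so on all become prediction targets.

```python
import numpy as np

def sliding_windows(series, n_steps):
    """Return overlapping windows X and the value that follows each one, y."""
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i:i + n_steps])
        y.append(series[i + n_steps])
    # reshape inputs to [samples, timesteps, features]
    X = np.array(X).reshape(-1, n_steps, 1)
    return X, np.array(y)

# stand-in for 5,000 observations, numbered 1 to 5000
series = np.arange(1, 5001)
X, y = sliding_windows(series, 200)
print(X.shape, y.shape)  # (4800, 200, 1) (4800,)
print(y[:3])             # [201 202 203]
```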
  
  Reply
kent August 9, 2019 at 3:40 pm #

Thanks very much for the answer.

Reply
- Jason Brownlee August 10, 2019 at 7:11 am #
  
  You’re welcome.
  
  Reply
Ruben Chevez January 29, 2020 at 6:34 am #

Thank you, Jason, for the great post. Following your post we end up with the following data shape
(25, 200, 1). Now, let’s imagine our model is trained. Our model now expects an input of (variable, 200, 1). What happens if I have a single data point and I want to predict the binary output (positive/negative)?

Even reshaping it, it will end up as (1, 1, 1) and the second dimension will make the model crash because it expects 200 lines per batch. Should I just adjust to the second dimension and fill the rest with null values? (1, 200, 1) The first line will be my data point but the rest will be null rows?

Reply
- Jason Brownlee January 29, 2020 at 6:49 am #
  
  You’re welcome.
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
Jung Song February 16, 2020 at 5:47 am #

Hello Jason,

I appreciate your article, and I have a question on LSTM:

When you feed (25, 200, 1) into an LSTM layer, will there be 200 LSTM processors (where the input and forget gates exist) in this LSTM layer to process inputs starting from 10, 20, then all the way to 2,000 and memorize?

For your multivariate example with the input shape of (8760, 1, 8), will there be only 1 LSTM processor because the timestep is 1? Would LSTM be effective since there is only one timestep to remember?

Thanks!

Reply
- Jason Brownlee February 16, 2020 at 6:16 am #
  
  No.
  
  Number of time steps and number of units in the layer are different.
  
  See this:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
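
One way to see that time steps and units are separate is to count the weights in a standard LSTM layer: the same weights are reused at every time step, so the number of time steps does not appear in the count. The sketch below uses the usual formula (four gates, each with input weights, recurrent weights, and a bias), which should agree with what Keras reports in `model.summary()`. The choice of 50 units is a hypothetical example.

```python
def lstm_param_count(n_features, n_units):
    """Trainable weights in one standard LSTM layer.

    Four gates, each with input weights, recurrent weights and a bias.
    """
    return 4 * (n_units * (n_features + n_units) + n_units)

# input shape (25, 200, 1): 1 feature, 200 time steps
print(lstm_param_count(n_features=1, n_units=50))  # 10400
# input shape (8760, 1, 8): 8 features, 1 time step
print(lstm_param_count(n_features=8, n_units=50))  # 11800
```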
  
  Reply
Viktor April 24, 2020 at 4:32 am #

Hello Jason,

Thank you for the nice learning materials. How would you proceed if you had multiple series of 5000 observations each (for example, from multiple different observations)? For example, say you had 1000 measurements of 5000 steps each, and you did not care about the individual ones because you wanted to learn some general pattern (so there is no point in distinguishing between them in the features).

Thanks!

Reply
- Jason Brownlee April 24, 2020 at 5:54 am #
  
  Good question, this will give you ideas:
  https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
  
  Reply
  - Viktor April 24, 2020 at 6:12 am #
    
    Thank you for the reply, but the answers in the link still assume some individuality. Take as an example car speed prediction from sensor readings (for example, an accelerometer). You can have thousands of sessions where you don’t care which car or sensor it was. All that interests you is the session (which has a beginning and an end), the accelerometer readings, and the speed. And you have thousands of really long sessions.

    In this case, does it make sense, for example, to train a stateful LSTM per session (the session could be a time series chopped into smaller segments, which serve as inputs for prediction, as in the article) and run training in a loop through all sessions, resetting states in between?

    I am trying to wrap my head around recurrent networks, and most of the articles are based on perhaps slightly oversimplified cases.
    
    Thanks
    
    Reply
    - Jason Brownlee April 24, 2020 at 8:01 am #
      
      Probably not, you must start with a really strong definition of what you want to predict and what might be useful as input to making the prediction:
      https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
      
      Reply
      - Viktor April 24, 2020 at 4:55 pm #
        
        OK, so taking my example: I have source data in the format [2000, 30, 250, 3], that is, 2000 sequences, each split into 30 time series of length 250 with 3 features (readings from the accelerometer). For each time series I want to predict speed.
        
        How to train on these data?
        
        for i in range(2000):
            model.fit(X[i], y[i], epochs=1, verbose=0, batch_size=1)
        
        with X[0] in shape 30x250x3 and y[0] 30×1 (if this is many to one)?
        
        Sorry for bothering you, but as I said, I have never seen example using data like this, so I’m not sure how to approach it.
        
        Thanks again for your help.
      - Jason Brownlee April 25, 2020 at 6:39 am #
        
        It depends how the 30 time series for each sample are related.
        
        If they are unrelated, you have 2000*30 samples.
        If they are related, perhaps the above, or perhaps fit a separate model on each.
        Or perhaps a convlstm or cnn lstm.
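
For the "unrelated" case, a minimal sketch of flattening the session axis so that every segment becomes its own sample. It uses 20 sessions instead of 2000 only to keep the stand-in arrays small. The zero-filled data and the variable names are assumptions for illustration.

```python
import numpy as np

# small stand-in for the [2000, 30, 250, 3] data described above
n_sessions, n_segments, n_steps, n_features = 20, 30, 250, 3
X = np.zeros((n_sessions, n_segments, n_steps, n_features))
y = np.zeros((n_sessions, n_segments, 1))

# treat every segment as an independent sample
X_flat = X.reshape(-1, n_steps, n_features)
y_flat = y.reshape(-1, 1)
print(X_flat.shape, y_flat.shape)  # (600, 250, 3) (600, 1)
```

With the full data this gives 2000 * 30 = 60,000 samples of shape (250, 3), which can be passed to a single `fit()` call rather than looping over sessions.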
Viktor April 25, 2020 at 7:04 am #

OK, thank you for the answers. I just wanted to be sure I was not missing something.

Reply
- Jason Brownlee April 25, 2020 at 7:04 am #
  
  You’re welcome.
  
  Reply
Rajrudra April 28, 2020 at 6:27 pm #

Sorry, features is for number of columns

Reply
Rajrudra April 28, 2020 at 6:49 pm #

Hey Jason, I love your way of explaining, but I have a question.
I want to predict using an LSTM, but I am facing problems. Here is my code:

def predict(self) -> list:
    """
    A method to predict using the test data used in creating the class
    """
    yhat = []

    if self.train_test_split > 0:

        # Getting the last n time series
        _, X_test, _, _ = self.create_data_for_NN()

        # Making the prediction list
        yhat = [y[0] for y in self.model.predict(X_test)]

    return yhat

def predict_n_ahead(self, n_ahead: int):
    """
    A method to predict n time steps ahead
    """
    X, _, _, _ = self.create_data_for_NN(use_last_n=self.lag)

    # Making the prediction list
    yhat = []

    for _ in range(n_ahead):
        # Making the prediction
        fc = self.model.predict(X)
        yhat.append(fc)

        # Creating a new input matrix for forecasting
        X = np.append(X, fc)

        # Omitting the first variable
        X = np.delete(X, 0)

        # Reshaping for the next iteration
        X = np.reshape(X, (1, len(X), 1))

    return yhat

Can you suggest a better way to predict?
In the predict() function there is recursion, which is a little difficult for me to wrap my mind around.

Reply
- Jason Brownlee April 29, 2020 at 6:24 am #
  
  Sorry, I don’t have the capacity to review/debug your code.
  
  Perhaps you can summarize your problem as a question in a sentence or two?
  
  Reply
  - Rajrudra April 29, 2020 at 3:40 pm #
    
    So if you could make any sense of my code, I actually want a shorter and easier method to make predictions
    
    Reply
Rajrudra April 30, 2020 at 3:05 pm #

Please Jason, I need your help

Reply
- Jason Brownlee May 1, 2020 at 6:28 am #
  
  I’m happy to answer specific questions about machine learning or the tutorials.
  
  Reply
Rajrudra May 1, 2020 at 3:42 pm #

So can you tell me, perhaps with example code, how to use Keras evaluate()?

Reply
- Jason Brownlee May 2, 2020 at 5:39 am #
  
  Yes, you can see 100s of examples on the blog, perhaps start here:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
Espen Holst Wiik May 13, 2020 at 7:18 am #

I tried to reshape my own dataset, which has 38881 rows and 7 columns (8 if you count the target variable, as my problem is a classification problem).
I’m having such a hard time with this step.
Your example did not work for me as my dataset is already a 2D array (I think).
How do I convert my dataset to a 3D dataset?
Thank you for your help.

Reply
- Jason Brownlee May 13, 2020 at 7:44 am #
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
  - Espen Holst Wiik May 13, 2020 at 8:50 pm #
    
    Hi. Thank you for the reply. It did help somewhat, but when I attempt data.reshape(x, x, x), my array cannot be reshaped into my desired 3D array because the numbers do not divide evenly. Is there any way around this? E.g., can the last 2D array be left incomplete?
    
    Reply
    - Jason Brownlee May 14, 2020 at 5:49 am #
      
      Perhaps reshape using numbers that are factors for your data.
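
To illustrate with the 38881 rows mentioned above, the sketch below lists the exact factors of the row count and, as an alternative, trims a few rows so the count divides evenly by a chosen sequence length. The `divisors` helper and the choice of 50 time steps are hypothetical.

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    small = [d for d in range(1, int(n ** 0.5) + 1) if n % d == 0]
    large = [n // d for d in small if d != n // d]
    return small + large[::-1]

n_rows = 38881
print(divisors(n_rows))  # [1, 59, 659, 38881]

# alternatively, trim rows so the count divides evenly by a chosen length
timesteps = 50  # hypothetical choice
usable = n_rows - (n_rows % timesteps)
print(usable, usable // timesteps)  # 38850 777
```

With 7 feature columns, keeping the first 38850 rows would allow a reshape to (777, 50, 7), while an exact factor such as 59 would give (659, 59, 7) with no rows dropped.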
      
      Reply
Monica May 16, 2020 at 1:29 am #

Hi Jason. Thanks for the guide, it is the only clear one I have found so far.
However, I am not able to run my case.

The main problem seems to come at prediction time.
I am trying to predict the response of a non-linear system using time sequences (it is a non-linear mass-spring system forced by f(t)=cost*sin(omega*t)).

From what you said in this article, I have:
– 2 samples (input force at 2 different omega values).
– Each sample contains 2000 time steps.
– 1 feature (the input force is the only input I give to my model)

=> (2,2000,1) is the input tensor.
I gave input_shape=(2000,1) to the LSTM layer.

This dataset trains the model in a quite satisfactory way. But :
1- I am not able to use validation_data of shape (1,2000,1). It corresponds to an input force at a different omega than training set.
2- I am not able to predict the response of the system at a different omega input force. Same shape as for validate but used a different omega for the input force

Could you help me understand where I am going wrong?

I get the following error on prediction code line:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Specified a list with shape [2,1] from a tensor with shape [1,1]

Hope you can help! Thanks in advance.

Reply
- Jason Brownlee May 16, 2020 at 6:16 am #
  
  2k time steps is too many. Try to limit it to 200-400.
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
  - Monica May 18, 2020 at 4:57 am #
    
    Hi Jason, I have already read it and I have set the data dimensions according to it. I am asking what shape the validation data and the input data must have if I train using more than 1 sample of data. I always get the error about expecting [2,1] dimensions for the data I want to predict on, too. I would expect that I can train using several samples and then predict on another set of data given in the form of 1 sample (same number of features, same number of time steps), so:
    
    training data : (2,2000,1)
    validation data : (1,2000,1)
    prediction data : (1,2000,1)
    
    I see 2000 time steps may be too long, but even if I reduce time steps, the data shape for prediction is not accepted by the predict() function.
    Thanks
    
    Reply
    - Jason Brownlee May 18, 2020 at 6:22 am #
      
      Not sure about why you’re getting an error – but the shape of your data does not look right. Too many time steps and far too few samples.
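
As a sketch of the suggestion to use fewer time steps and more samples, the (2, 2000, 1) tensor can be cut into consecutive 200-step subsequences. The random data is a stand-in, and this does not by itself explain the error reported above.

```python
import numpy as np

# stand-in for the (2, 2000, 1) training tensor from the question
X = np.random.rand(2, 2000, 1)

n_steps = 200
n_series, total_steps, n_features = X.shape
# each 2000-step series becomes 10 consecutive 200-step samples
X_short = X.reshape(n_series * (total_steps // n_steps), n_steps, n_features)
print(X_short.shape)  # (20, 200, 1)
```

The model would then be defined with `input_shape=(200, 1)`, and validation and prediction data would need to be split the same way.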
      
      Reply
Shubham Agrawal March 5, 2021 at 6:11 am #

Great article. I wonder how labelling is done for each sample? Suppose we have time series heart rate data to predict whether a person is stationary (0) or moving (1). Heart rate: [1,2,3,4,5,6,7,8,9,10,11,12]. Target variable: [1,0,1,0,1,1,0,1,0,1,0,0]. Suppose we split this into samples of 4 time steps. This will give 3 samples: [1,2,3,4],[5,6,7,8],[9,10,11,12]. What will the target variable for each sample look like now?

Reply
- Jason Brownlee March 5, 2021 at 8:15 am #
  
  You would have one output value or output sequence per input sequence (sample).
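
Using the numbers from the question, the two options look like this. Taking the label at the last time step as the single output value is one possible convention, assumed here for illustration.

```python
import numpy as np

heart_rate = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
target = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0])

timesteps = 4
# inputs as [samples, timesteps, features]
X = heart_rate.reshape(-1, timesteps, 1)
print(X.shape)  # (3, 4, 1)

# option 1: an output sequence per sample (one label per time step)
y_seq = target.reshape(-1, timesteps)
print(y_seq)
# [[1 0 1 0]
#  [1 1 0 1]
#  [0 1 0 0]]

# option 2: one output value per sample, e.g. the label at the last time step
y_last = y_seq[:, -1]
print(y_last)  # [0 1 0]
```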
  
  Reply
Josh September 1, 2021 at 8:02 pm #

I have read the comments and I still don’t get how to change the labelling for each sample. I know I will have one output sequence per input sample, but how do we choose this value?

Before we had for example 5000 time steps and therefore 5000 labels for each. Now we have 25 samples of 200 time steps, but how do we split the labels to have 25 labels according to the 25 samples?

Thank you in advance.

Reply
- Jason Brownlee September 2, 2021 at 5:08 am #
  
  This will help:
  https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
  
  And this:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  And this:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
David August 11, 2023 at 7:05 pm #

Hello Jason, thank you for your article.
I am facing a problem, I have a dataset of approximately 9000 different products, each having a monthly demand over 5 years (60 periods demand). So one row represents a product, and each of the 60 columns represents its 5 years demand.
I want to build an LSTM network to make a forecast for 1 year for each product. What input format do I need for the LSTM? I want my LSTM to learn from all the products.

Reply
- James Carmichael August 12, 2023 at 9:56 am #
  
  Hi David…You are very welcome! You may wish to approach your problem as a “multivariate LSTM”.
  
  https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/
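
A different, simpler framing, offered as an assumption rather than as what the reply above prescribes, is to treat each product as one sample: the first 48 months are the input sequence and the final 12 months are the values to forecast. The random data and the 48/12 split are hypothetical.

```python
import numpy as np

# stand-in for 9,000 products x 60 monthly demand values
demand = np.random.rand(9000, 60)

n_in, n_out = 48, 12
# first 4 years as input, reshaped to [samples, timesteps, features]
X = demand[:, :n_in].reshape(-1, n_in, 1)
# final year as the 12 values to forecast for each product
y = demand[:, n_in:]
print(X.shape, y.shape)  # (9000, 48, 1) (9000, 12)
```

Because every product contributes a sample, a single model fit on `X` and `y` learns from all products at once.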
  
  Reply
