How to Prepare Univariate Time Series Data for Long Short-Term Memory Networks

It can be hard to prepare data when you’re just getting started with deep learning.

Long Short-Term Memory, or LSTM, recurrent neural networks expect three-dimensional input in the Keras Python deep learning library.

If you have a long sequence of thousands of observations in your time series data, you must split your time series into samples and then reshape it for your LSTM model.

In this tutorial, you will discover exactly how to prepare your univariate time series data for an LSTM model in Python with Keras.

Let’s get started.

Photo by Miguel Mendez, some rights reserved.

How to Prepare Time Series Data

Perhaps the most common question I get is how to prepare time series data for supervised learning.

I have written a few posts on the topic, but these posts don't help everyone.

I recently got this email:

I have two columns in my data file with 5000 rows, column 1 is time (with 1 hour interval) and column 2 is bits/sec and I am trying to forecast bits/sec. In that case can you please help me to set sample, time step and feature [for LSTMs]?

There are a few problems here:

  • LSTMs expect 3D input, and it can be challenging to get your head around this the first time.
  • LSTMs don’t like sequences of more than 200-400 time steps, so the data will need to be split into samples.

In this tutorial, we will use this question as the basis for showing one way to specifically prepare data for the LSTM network in Keras.

1. Load the Data

I assume you know how to load the data as a Pandas Series or DataFrame.

If not, see my earlier posts on loading time series data.

Here, we will mock loading by defining a new dataset in memory with 5,000 time steps.
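
A minimal sketch of that mock dataset might look like the following; the column names and the generated values here are illustrative stand-ins for real data:

from pandas import DataFrame
# mock a dataset of 5,000 time steps: a time column and an observation column
n = 5000
data = [[i + 1, (i + 1) * 10] for i in range(n)]
data = DataFrame(data, columns=['time', 'bits/sec'])
print(data.head())
print(data.shape)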

Running this piece prints both the first 5 rows of data and the shape of the loaded data.

We can see we have 5,000 rows and 2 columns: a standard univariate time series dataset.

2. Drop Time

If your time series data is uniform over time and there are no missing values, we can drop the time column.

If not, you may want to look at imputing the missing values, resampling the data to a new time scale, or developing a model that can handle missing values; I cover each of these topics in other posts.

Here, we just drop the first column:
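
One way to do this, continuing from the mock dataset above, is to keep only the column of observations:

# drop the time column, keeping only the observations as a NumPy array
data = data.values[:, 1]
print(data.shape)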

Now we have an array of 5,000 values.

3. Split Into Samples

LSTMs need to process samples where each sample is a single time series.

In this case, 5,000 time steps is too long; based on papers I have read, LSTMs work better with 200-to-400 time steps per sample. Therefore, we need to split the 5,000 time steps into multiple shorter sub-sequences.

I write more about splitting up long sequences in a separate post.

There are many ways to do this, and you may want to explore some depending on your problem.

For example, perhaps you need overlapping sequences, or perhaps non-overlapping sequences are fine but your model needs to maintain state across the sub-sequences, and so on.

Here, we will split the 5,000 time steps into 25 sub-sequences of 200 time steps each. Rather than using NumPy or Python tricks, we will do this the old-fashioned way so you can see what is going on.
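
A sketch of that loop, continuing from the array above:

# split the 5,000 time steps into sub-sequences of 200 time steps each
samples = list()
length = 200
# step over the observations in jumps of 200 time steps
for i in range(0, n, length):
	# grab the observations from i to i + 200
	sample = data[i:i + length]
	samples.append(sample)
print(len(samples))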

We now have 25 sub-sequences of 200 time steps each.

If you’d prefer to do this in a one-liner, go for it. I’d love to see what you can come up with.
Post your approach in the comments below.

4. Reshape Subsequences

The LSTM needs data with the format of [samples, time steps, features].

Here, we have 25 samples, 200 time steps per sample, and 1 feature.

First, we need to convert our list of arrays into a 2D NumPy array of 25 x 200.
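
For example:

from numpy import array
# convert the list of 25 arrays into a single 2D array of 25 x 200
data = array(samples)
print(data.shape)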

Running this piece, you should see that the array of samples now has the shape (25, 200).

Next, we can use the reshape() function to add one additional dimension for our single feature.
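
For example:

# reshape the 2D array into [samples, time steps, features]
data = data.reshape((len(samples), length, 1))
print(data.shape)

This prints the final three-dimensional shape of (25, 200, 1).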

And that is it.

The data can now be used as an input (X) to an LSTM model.
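
For example, the first hidden layer of an LSTM model can declare a matching input shape. A minimal sketch, where the 32 units and the mean squared error loss are arbitrary placeholders rather than recommendations:

from keras.models import Sequential
from keras.layers import LSTM, Dense
# input_shape matches the (time steps, features) of the prepared samples
model = Sequential()
model.add(LSTM(32, input_shape=(200, 1)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')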

Summary

In this tutorial, you discovered how to convert your long univariate time series data into a form that you can use to train an LSTM model in Python.

Did this post help? Do you have any questions?
Let me know in the comments below.



39 Responses to How to Prepare Univariate Time Series Data for Long Short-Term Memory Networks

  1. Steven November 17, 2017 at 9:44 am #

    Great article! I wish I had this a couple months ago when I was struggling with doing the same thing for Tensorflow. Glad to see the solution I had mostly aligns with yours.

    You mention some papers that discuss optimal sample size. Would you be able to share a link to those? I’m interested to see how the authors arrive at that number.

  2. went November 17, 2017 at 12:32 pm #

    Hi Jason, thx for sharing.

Let's say I have a time series dataset [1,2,3,4,5,6,7,8] and need to split it with time steps of 4. In your article, the result will be [1,2,3,4], [5,6,7,8]. But in some other articles I've read, the result will sometimes be this way: [1,2,3,4], [2,3,4,5], [3,4,5,6], [4,5,6,7], [5,6,7,8].

    so what will be the best way to split the samples? thx.

    • Jason Brownlee November 18, 2017 at 10:11 am #

      All 3 approaches you have listed are valid, try each and see what works best for your problem.

      • Martin November 21, 2017 at 12:31 am #

Is there literature on the subject? The 3 solutions seem to have very distinct training times for large datasets. I assume that for the second solution we should keep the memory for the cell, but not for the third, right?

Also, is there a risk that the training overexposes certain time steps (time step 5 in the example) in early learning, giving a bigger weight to this data?

        BTW great blog and your book on LSTM is the best I found on the subject. thx.

  3. a November 17, 2017 at 7:51 pm #

Nice article. One thing I love about Python is list comprehension. One possible one-liner could be:

samples = [data[i:i+length] for i in range(0, n, length)]

  4. Pedro Cadahía November 17, 2017 at 10:38 pm #

Went, what you want is called a "sliding window"; you can get it with the following code:

from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

  5. Daniel Salvador November 20, 2017 at 7:43 pm #

    Hi Jason! First, I have to say that I really like your posts, they are very helpful.

I'm facing a time series classification problem (two classes) where I have series of around 120-200 time steps with 7 variables each. The problem is that I have only 3000 samples to train. What do you think, is it feasible a priori to feed an LSTM network, or do I need more samples?

    You mention that LSTM doesn’t work well with more than 200-400 timesteps. What about the number of features? Would you do dimensionality reduction?

    Thank you very much in advance!

    • Jason Brownlee November 22, 2017 at 10:40 am #

      LSTMs can support multiple features.

      It does not sound like enough data.

      You could try splitting the sequence up into multiple subsequences to see if that helps?

  6. Staffan November 21, 2017 at 6:40 am #

    Hi Jason,

    Thank you for this excellent summary, your work is really impressive…I’m especially impressed by how many blog posts you have taken the time to write.

I was wondering why an LSTM network prefers a sequence of 200-400 samples; is this due to a memory allocation issue? Or can a longer sequence affect accuracy (I wouldn't guess this but perhaps it's possible)?

    What role does the batch size play here? Couldn’t this restriction in sequence length be mitigated by selecting a correct batch size?

    BR
    Staffan

    • Jason Brownlee November 22, 2017 at 10:45 am #

      It seems to be a limitation on the training algorithm. I have seen this issue discussed in the literature, but have not pushed hard to better understand it myself.

      I’d encourage you to test different configurations on your problem.

  7. soloyuyang December 18, 2017 at 1:26 pm #

Hi Jason,
Nice post! A little confused about the "time steps" parameter. Does "time steps" mean the span of the input data? For example, for a univariate problem and one-step forecasting, I constructed the data with a sliding window. For each sample, the structure is "t-6, t-5, t-4, t-3, t-2, t-1, t for input (train_x), and t+1 for output (train_y)", i.e. using 7 observations to forecast the 8th. I reshaped the input (train_x) as [samples, 7, 1]. Is that right?

  8. Davide February 17, 2018 at 6:45 am #

Hello Jason, sorry for my English. I'm new to neural networks and I am trying to develop a neural network to generate music.
I have many .txt files with a sequence of notes like these:
    [int(note number), int(time), int(length)]
    68 2357 159,
    64 2357 260,


What kind of neural network do I have to choose for this purpose?
How can I preprocess this kind of data?
    Congratulations for this website and thank you.

    • Jason Brownlee February 17, 2018 at 8:51 am #

      For sequence prediction, perhaps RNNs like the LSTM would be a good place to start.

  9. falah March 1, 2018 at 6:43 pm #

    hi
I want to classify classes where each class consists of 8 time steps, with 16 features at each time step. Is this reshape correct?
reshape(124, 8, 1)

    • Jason Brownlee March 2, 2018 at 5:31 am #

I think it would be (?, 8, 16) where "?" is the number of samples, perhaps 124 if I understand your case.

  10. C.Junior April 7, 2018 at 2:05 am #

    Hello, Jason, thanks for the great work.
I've read your articles about organizing the data for LSTM in 3D, but I cannot do this with my data; it always presents an error like this:

"Error when checking target: expected dense_363 to have 2 dimensions, but got array with shape (3455, 1, 1)"

    My data is organized as follows:

Input:
11,000 rows with 48 columns; each row represents one day and each column represents 0.5h.

The output Y (0, 1) is binary; it represents the occurrence of an event: 1 = yes, 0 = no.

    So I have X = [0.1, 0.2, 0.3, …, 0.48] Y = [0] or Y = [1]

    for more details see my code:

# load data
dataframe = pd.read_csv('Parque_A_Classificado_V2.csv', header=None)
dataset = dataframe.values

# split data into train and test variables
train_size = int(len(dataset) * 0.7)
test_size = len(dataset) - train_size
trainX, trainY = dataset[0:train_size, :48], dataset[train_size:len(dataset), 48]
testX, testY = dataset[0:test_size, :48], dataset[test_size:len(dataset), 48]

# reshape input to be [samples, time steps, features]
trainX = trainX.reshape(trainX.shape[0], trainX.shape[1], 1)
testX = testX.reshape(testX.shape[0], testX.shape[1], 1)
trainY = trainY.reshape(trainY.shape[0], 1, 1)
testY = testY.reshape(testY.shape[0], 1, 1)

# create the model
model = Sequential()
model.add(LSTM(100, input_shape=(48, 1)))
model.add(Dense(1, activation='sigmoid'))
# compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
model.fit(trainX, trainY, validation_data=(testX, testY), epochs=1, batch_size=1)

    I can not find the error, can you help me?

  11. Alon May 17, 2018 at 3:07 am #

    Hi Jason,

    I’m struggling with a problem similar to those described here with a slight difference.

I'm solving a disaggregation problem, so the dimensions of my output are higher than my input. To simplify, let's say my original data looks something like this:

    X.shape == [1000,1]
    Y.shape == [1000,10]

I do some reshaping of the input to make things work:

X = X.reshape(1, X.shape[0], X.shape[1]) # leaving this parameter dependent in case I want to later use more features

    My net looks like this:

model = Sequential()
model.add(LSTM(50, batch_input_shape=X.shape, stateful=True))
model.add(Dense(Y.shape[1], activation='relu')) # my output values aren't between +/-1 so I chose relu

I went with a stateful model because I will most likely have to do batch separation when running my actual training, as I have close to 10^6 samples.

And then I've tried both doing the same thing to the Y vector and not touching it; either way I get an error (when I reshaped Y, I then changed Y.shape[1] to Y.shape[2]).

    Any thoughts?

  12. Srivatsa June 4, 2018 at 4:10 pm #

    How can I split the 5000-row dataset into train and test portions when I am dividing into samples and reshaping it?

  13. Jean June 15, 2018 at 7:11 pm #

    Thanks to this article and the one about reshaping input data for LSTM, I understood how to split/reshape the inputs of the LSTM network but I can’t see how to handle labels…

    My dataset is 3000 time steps and 9 features. As explained in the article, I split it to get 15 samples of 200 time-steps so my input shape is (15, 200, 9).

    My labels are a binary matrix (3000, 6) i.e. I want to solve a 6-class classification problem.

    If I feed the labels as is, I’ll get an error “Found 15 input samples and 3000 target samples”.
    How to correctly feed the labels to the network? What confuses me is that the targets should be 2D (unlike inputs) so I don’t see how I could split them in the same way as inputs, for example to get a (15, 200, 6) shape…

  14. NN June 19, 2018 at 9:41 am #

    Great blog thank you!
    From what I understand you showed how to handle one long time series, but I couldn’t understand what to do with multiple inputs.
    For example my input is x1 with dimensions (25, 200, 1)
    but I have multiple inputs for my training X = [x1,x2…xn]

    How should I shape for model.fit and for the LSTM layers? a 4D tensor?

  15. NL June 19, 2018 at 9:51 am #

    Thank you for the wonderful blog.
    Where does the total number of samples to train go in the reshape?
As I understood: (number of sub-samples, time steps, features per time step)

  16. Mr. T June 22, 2018 at 2:19 am #

    I love all your posts!
I'm a bit confused:
I would guess that the number of time steps limits the number of recurrent layers, since the number of time steps is the number of steps you run your recurrent neural network for. Is this true? If yes, how can the memory of the LSTM be larger than the number of recursions?

And if it isn't larger, why would anybody choose time steps = 1 like you did in some posts?
    Thanks.

    • Jason Brownlee June 22, 2018 at 6:14 am #

      The time steps and the nodes/layers are unrelated.

      • Mr.T June 22, 2018 at 10:03 am #

Sorry, I formulated my question badly.

I meant: if I have a sample sequence of, let's say, 100 time steps, can the memory of the LSTM be greater than these 100 time steps?
        Is the memory limited by the amount of time steps given in a sequence?

        Thanks for your time. T

        • Jason Brownlee June 22, 2018 at 2:55 pm #

The limit for the LSTM seems to be about 200-400 time steps, from what I have read.

  17. K June 27, 2018 at 6:35 am #

    Hi Jason,

Can you please explain what you mean by LSTMs not working well with more than 200-400 time steps, while you replied to Daniel Salvador that 3000 training samples are not enough?

    Does 200-400 mean 200-400 steps ahead prediction?

How many training samples do you think are enough?

    • Jason Brownlee June 27, 2018 at 8:25 am #

      The input data has a number of time steps. The LSTM performance appears to degrade if the number of input time steps is more than about 200-400 steps, according to the literature.

      I have not tested this in experiments myself though.
