How to Reshape Input Data for Long Short-Term Memory Networks in Keras

It can be difficult to understand how to prepare your sequence data for input to an LSTM model.

Often there is confusion around how to define the input layer for the LSTM model.

There is also confusion about how to convert your sequence data, which may be a 1D array or a 2D matrix of numbers, into the required 3D format of the LSTM input layer.

In this tutorial, you will discover how to define the input layer to LSTM models and how to reshape your loaded input data for LSTM models.

After completing this tutorial, you will know:

  • How to define an LSTM input layer.
  • How to reshape one-dimensional sequence data for an LSTM model and define the input layer.
  • How to reshape multiple parallel series data for an LSTM model and define the input layer.

Let’s get started.

How to Reshape Input for Long Short-Term Memory Networks in Keras
Photo by Global Landscapes Forum, some rights reserved.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

  1. LSTM Input Layer
  2. Example of LSTM with Single Input Sample
  3. Example of LSTM with Multiple Input Features
  4. Tips for LSTM Input

LSTM Input Layer

The LSTM input layer is specified by the “input_shape” argument on the first hidden layer of the network.

This can make things confusing for beginners.

For example, below is a network with one hidden LSTM layer and one Dense output layer.
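A minimal sketch of such a network is below; the 32 memory units and the single output neuron are illustrative choices, not requirements:

from keras.models import Sequential
from keras.layers import LSTM, Dense

# one hidden LSTM layer and one Dense output layer
model = Sequential()
model.add(LSTM(32))
model.add(Dense(1))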

In this example, the LSTM() layer must specify the shape of the input.

The input to every LSTM layer must be three-dimensional.

The three dimensions of this input are:

  • Samples. One sequence is one sample. A batch is comprised of one or more samples.
  • Time Steps. One time step is one point of observation in the sample.
  • Features. One feature is one observation at a time step.

This means that the input layer expects a 3D array of data when fitting the model and when making predictions, even if specific dimensions of the array contain a single value, e.g. one sample or one feature.

When defining the input layer of your LSTM network, the network assumes you have 1 or more samples and requires that you specify the number of time steps and the number of features. You can do this by specifying a tuple to the “input_shape” argument.

For example, the model below defines an input layer that expects 1 or more samples, 50 time steps, and 2 features.
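A sketch of that definition, again with an illustrative 32 memory units:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# expects input with the shape [samples, 50 time steps, 2 features]
model.add(LSTM(32, input_shape=(50, 2)))
model.add(Dense(1))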

Now that we know how to define an LSTM input layer and the expectations of 3D inputs, let’s look at some examples of how we can prepare our data for the LSTM.

Example of LSTM with Single Input Sample

Consider the case where you have one sequence of multiple time steps and one feature.

For example, this could be a sequence of 10 values:
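0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0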

We can define this sequence of numbers as a NumPy array.
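For example:

from numpy import array
data = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])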

We can then use the reshape() function on the NumPy array to reshape this one-dimensional array into a three-dimensional array with 1 sample, 10 time steps, and 1 feature at each time step.
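For this sequence, the call looks as follows:

data = data.reshape((1, 10, 1))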

The reshape() function, when called on an array, takes one argument: a tuple defining the new shape of the array. We cannot pass in just any tuple of numbers; the reshape must evenly reorganize the data in the array.

Once reshaped, we can print the new shape of the array.
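For example:

print(data.shape)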

Putting all of this together, the complete example is listed below.
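from numpy import array

# define the sequence as a one-dimensional NumPy array
data = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
# reshape into [samples, time steps, features]
data = data.reshape((1, 10, 1))
# confirm the new shape
print(data.shape)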

Running the example prints the new 3D shape of the single sample.
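(1, 10, 1)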

This data is now ready to be used as input (X) to the LSTM with an input_shape of (10, 1).
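A matching model definition might begin like this (reusing the Sequential, LSTM, and Dense imports from above; the 32 units remain an arbitrary choice):

model = Sequential()
model.add(LSTM(32, input_shape=(10, 1)))
model.add(Dense(1))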

Example of LSTM with Multiple Input Features

Consider the case where you have multiple parallel series as input for your model.

For example, this could be two parallel series of 10 values:
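series 1: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
series 2: 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1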

We can define this data as a matrix of 2 columns with 10 rows:
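from numpy import array
data = array([
[0.1, 1.0],
[0.2, 0.9],
[0.3, 0.8],
[0.4, 0.7],
[0.5, 0.6],
[0.6, 0.5],
[0.7, 0.4],
[0.8, 0.3],
[0.9, 0.2],
[1.0, 0.1]])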

This data can be framed as 1 sample with 10 time steps and 2 features.

It can be reshaped as a 3D array as follows:
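data = data.reshape(1, 10, 2)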

Putting all of this together, the complete example is listed below.
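from numpy import array

# define the two parallel series as a matrix of 10 rows and 2 columns
data = array([
[0.1, 1.0],
[0.2, 0.9],
[0.3, 0.8],
[0.4, 0.7],
[0.5, 0.6],
[0.6, 0.5],
[0.7, 0.4],
[0.8, 0.3],
[0.9, 0.2],
[1.0, 0.1]])
# reshape into [samples, time steps, features]
data = data.reshape(1, 10, 2)
# confirm the new shape
print(data.shape)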

Running the example prints the new 3D shape of the single sample.
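(1, 10, 2)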

This data is now ready to be used as input (X) to the LSTM with an input_shape of (10, 2).
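As before, a matching model definition might begin like this (the 32 units are illustrative):

model = Sequential()
model.add(LSTM(32, input_shape=(10, 2)))
model.add(Dense(1))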

Longer Worked Example

For a complete end-to-end worked example of preparing data, see this post:

Tips for LSTM Input

This section lists some tips to help you when preparing your input data for LSTMs.

  • The LSTM input layer must be 3D.
  • The meaning of the 3 input dimensions are: samples, time steps, and features.
  • The LSTM input layer is defined by the input_shape argument on the first hidden layer.
  • The input_shape argument takes a tuple of two values that define the number of time steps and features.
  • The number of samples is assumed to be 1 or more.
  • The reshape() function on NumPy arrays can be used to reshape your 1D or 2D data to be 3D.
  • The reshape() function takes a tuple as an argument that defines the new shape.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered how to define the input layer for LSTMs and how to reshape your sequence data for input to LSTMs.

Specifically, you learned:

  • How to define an LSTM input layer.
  • How to reshape one-dimensional sequence data for an LSTM model and define the input layer.
  • How to reshape multiple parallel series data for an LSTM model and define the input layer.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.



126 Responses to How to Reshape Input Data for Long Short-Term Memory Networks in Keras

  1. Steven August 31, 2017 at 2:14 am #

    Great explanation of the dimensions! Just wanted to say this explanation also works for LSTM models in TensorFlow.

    • Jason Brownlee August 31, 2017 at 6:20 am #

      Thanks Steven.

      • yuan September 1, 2017 at 6:42 pm #

        Hi Jason,

        Thanks a lot for your explanations.
        I have a confusion below:
        Assuming that we have multiple parallel series as input for our model, the first step is to define these data as a matrix of M columns with N rows. To make it 3D (samples, time steps, and features), does this mean that samples: 1 sample, time steps: the number of rows of the matrix, and features: the number of columns of the matrix? Must it be like this? Looking forward to your reply. Thank you.

        • Jason Brownlee September 2, 2017 at 6:06 am #

          Sorry, I’m not sure I follow your question.

          If you have parallel time series, then each series would need the same number of time steps and be represented as a separate feature (e.g. observation at a time).

          Does that help?

      • Md. Abul Kalam Azad August 16, 2018 at 12:21 pm #

        Hello Sir,
        I have used your multivariates code for testing lstm model.Its working fine but I did not understand why lstm one row is hidden from train as well as test dataset. If I test two rows in test then got one prediction as a output. And datetime output this one prediction result?

        Please help me.

        Thanks in advance.

        Azad

        • Jason Brownlee August 16, 2018 at 1:57 pm #

          Sorry, I don’t follow.

          Perhaps you can elaborate on your question?

  2. Oliver August 31, 2017 at 9:23 pm #

    Hi Jason,

    thanks a lot for all the explanations you gave!
    I tried to understand the effect of the reshape parameters by checking the result in the Spyder variable explorer, but I do not understand the result shown in the data window.
    I used the code from a different tutorial:

    data = array([
    [0.1, 1.0],
    [0.2, 0.9],
    [0.3, 0.8],
    [0.4, 0.7],
    [0.5, 0.6],
    [0.6, 0.5],
    [0.7, 0.4],
    [0.8, 0.3],
    [0.9, 0.2],
    [1.0, 0.1]])
    data_re = data.reshape(1, 10, 2)

    When checking the result in the variable explorer of Spyder, I see 3 dimensions of the array but cannot connect them to the parameters sample, timestep, feature.

    On axis 0 of data_re I see the complete dataset
    On axis 1 of the data_re I get 0.1 and 1.0 in column 1
    On axis 2 of the data_re I see the column 1 of axis 0 transposed to row 1

    Would you give me a hint how to interpret it?

    Regards,
    Oliver.

    • Jason Brownlee September 1, 2017 at 6:46 am #

      There are no named parameters, I am referring to the dimensions by those names because that is how the LSTM model uses the data.

      Sorry for the confusion.

  3. Saga September 1, 2017 at 6:46 pm #

    Hi Jason,

    Thanks so much for the article (and the whole series in fact!). The documentation in Keras is not very clear on many things on its own.

    I have been trying to implement a model that receives multiple samples of multivariate timeseries as input. The twist is that the length of the series, i.e. the “time steps” dimension is different for different samples. I have tried to train a model on each sample individually and then merge, (but then each LSTM is going to be extremely prone to overfitting). Another idea was to scale the samples to have the same time steps but this comes with a scaling factor of time steps for each sample which is not ideal either.

    Is there a way to provide the LSTM with samples of dynamic time steps? maybe using a lower-level API?

    Regards,
    Saga

    • Jason Brownlee September 2, 2017 at 6:07 am #

      A way I use often is to pad all sequences to the same length and use a masking layer on the front end to ignore masked time steps.

      • Mohamad May 13, 2018 at 10:43 pm #

        HI Jason,
        Thank you for this amazing article.
        I have the same problem here. which is the samples have many different lengths.

        I did not get the idea you said.
        “A way I use often is to pad all sequences to the same length and use a masking layer on the front end to ignore masked time steps.”
        can you please provide more details about that?
        or maybe provide articles explain how to solve this problem.

        Thank you in advance.

  4. Shrimanti September 14, 2017 at 2:42 am #

    Hi Jason,

    Thanks very much for your tutorials on LSTM. I am trying to predict one time series from 10 different parallel time series. All of them are different 1D series. So, the shape of my X_train is (50000,10) and Y_train is (50000,1). I couldn’t figure out how to reshape my dataset and the input shape of LSTM if I want to use let’s say 100 time steps or look back as 100.

    Thanks.

  5. Anil Pise October 14, 2017 at 6:38 am #

    Respected Sir
    I want to use an LSTM/RNN/GRU to check changes in the facial expression of a person who is watching a movie. I want to check his mental state: whether he is bored or interested in continuing the movie, or at what time he becomes bored. Can you please help me with how I can start to work on this?

    • Jason Brownlee October 15, 2017 at 5:16 am #

      That sounds like a great problem. I would recommend starting by collecting a ton of training data.

      Then think of using a CNN on the front end of your LSTM.

  6. Rafiya October 23, 2017 at 9:28 pm #

    Hi,
    I have around 12,000 tweets in total for sentiment classification. Do you think 16GB of CPU RAM will be enough?

  7. Ravil November 19, 2017 at 10:34 pm #

    Hi Jason

    Thanks for the simple explanation.

    However, I have a doubt. What if you don’t know the no of time steps? How do you proceed then?
    Is that why we use the embedding layer?

    I intend to use it for sentiment analysis of imDb movie review dataset.

  8. Mohammad November 23, 2017 at 5:21 pm #

    Hi Jason,
    I finally understood the input shape requirements.
    Just a quick question: batch_size would be a certain number of samples inside a group, e.g. if we have 100 samples we can divide them into batches of 10. Batching helps with a faster training time, right?

    • Jason Brownlee November 24, 2017 at 9:35 am #

      Correct, and weight updates occur and state is reset at the end of each batch.

  9. Eduardo Andrade November 30, 2017 at 4:24 am #

    Hi Jason,

    About sample (the first argument in reshape): if I have two sequences with different number of values (let’s suppose one with 10 values and another with 8) and want them to be considered as two distinct samples (not 2 features), a zero-padding is necessary?

    series 1: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
    series 2: 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.0, 0.0

    If I do:

    data = data.reshape(2, 10, 1)

    It is going to understand them as 2 different samples?

    • Jason Brownlee November 30, 2017 at 8:23 am #

      Yes, padding to 10 time steps.

      Yes, your reshape looks good.

      Explore pre and post padding to see if it makes a difference for your model.

      • Tseo December 8, 2017 at 12:49 am #

        With this input, the model is going to understand two different series?

        Why not use a (1, 10, 2) shape?

        • Jason Brownlee December 8, 2017 at 5:42 am #

          You could treat them as two features as you suggest; I thought they were separate samples.

  10. Ali December 2, 2017 at 10:55 pm #

    Hi Jason,

    Thanks a lot for the tutorial!
    I am trying to understand the input shape for LSTM data (No. of timesteps & no. of features). Could I ask what each will be in the context of the iris dataset, please?

    Am I correct to say that in the iris dataset, the timesteps can be 2, 3, 5, 6 – as long as it neatly divides the dataset into equal number of rows (iris has 150 rows).
    And the number of features will be the number of columns (apart from the target column/class)?

    Thanks ever so much!

    • Jason Brownlee December 3, 2017 at 5:25 am #

      The iris dataset is not a sequence classification problem. It does not have time steps, only samples and features.

  11. Tsep December 7, 2017 at 11:58 am #

    Hello!

    First of all, thank you very much for your posts, I have learned a lot.

    My question is about how to approach the following type of problem: multiple sequences with multiple features.

    For example, predict the amount that a user could spend given their previous purchases (here I can consider different features such as the previous amounts, products, day of week, etc.). If I have a dataset with data for 1000 users and I want to predict the amount for each user, how should this be addressed?
    Can I use one LSTM for all users, or will each user have their own model/LSTM?

    I understand that an LSTM for all users could see more interesting things, but I don't know how to organize the input of different users, because I don't know how to apply the example of two sequences (1, 10, 2). I want to include more features for each sequence.

    I’m very lost..
    Thank you in advance

    • Jason Brownlee December 7, 2017 at 3:04 pm #

      Perhaps start off by modeling individual users?

      • Tseo December 7, 2017 at 9:28 pm #

        Thanks!
        By modelling individual users do you mean an LSTM per user?

        I have users with 200 purchases but others with only 10.. would that be enough?

        I will try!

        Thanks!

  12. Peter December 17, 2017 at 7:50 am #

    Hi Jason,

    Thanks for your tutorial and for your book!
    I am not sure how to design the input shape of the following table or dataframe:

    date, product, store, hasPromotion, attrib1, attrib2, quantity (t)

    The first three columns are the key. We have 50000 products in 20 stores and I would like to predict the quantity (per product per store) at least 14 days ahead with LSTM.

    What is the good start for the 3D input?
    I am wondering if creating new features from date (as there are repetition), like day of week, day of month, month of year, etc. + the existing features + quantity (t), quantity (t+1) would do…

    Thank you for your help in advance!
    Br,

    • Jason Brownlee December 17, 2017 at 8:56 am #

      Drop date and you have 6 features, does that help?

      • Peter December 18, 2017 at 8:02 am #

        OK, thanks. If there are seasonality and trend in sales, should I remove them before training the LSTM, too?

  13. Vic January 8, 2018 at 11:31 am #

    Howdy!

    Thank you so much for the great amount of tutorials on LSTMs
    I'm trying to build an LSTM in Keras using your examples and keep running into shape issues.

    I have time series data set with prices for different things, and am trying to predict the price of item4 for time t+1
    Item4 is a lagged value so that you can use previous set of prices to predict the next.
    The data set has 400 sequential observations.

    variables: datetime price1 price2 price3 item4_price

    Since the data has a uniform interval of observations and none are missing, I am dropping the datetime variable.
    So now I have 4 variables and 400 observations.

    trainX = train[:, 0:-1] #use first 3 variables
    trainY = train[:,-1] #use the last variable

    so now the trainX data set has the price1, price2 and price3 variables (it's my understanding that this means there are 3 "features" in Keras)
    trainY is the target data set and only contains item4_price

    trainX = numpy.reshape(trainX, (1, 400, 3)) #reshape, this means there is 1 sample, 400 timestamps, and 3 features

    model = Sequential()
    model.add(LSTM(5, input_shape=(1, 400, 3), return_sequences=True))
    model.add(Dense(1))
    model.compile(loss=’mean_squared_error’, optimizer=’adam’)
    model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)

    I keep getting various shape errors, no matter what I do. I tried switching it around, and even omitting the first dimension.
    I was wondering if you could point me in the right direction as to what I keep missing in my understanding of Keras/LSTM shapes.

    I also don't know if the trainY set needs reshaping? I tried to reshape it too but Python was also not happy with that.

    Let me know what you think!

    Thanks,
    Vic

    • Jason Brownlee January 8, 2018 at 3:53 pm #

      Perhaps start with one series and really nail down what is required.

      Did you try this tutorial:
      https://machinelearningmastery.com/prepare-univariate-time-series-data-long-short-term-memory-networks/

      • Vic January 9, 2018 at 1:33 am #

        Hi Dr. Brownlee,

        I have previously read that tutorial and feel as though I understand it fine.
        But when applying what I learned to the problem as described previously, I find that I'm running into some trouble.

        So I was hoping I was just overlooking something, but at this point I'm not really sure what. Does what I'm doing seem reasonable?

        Thanks!

        • Jason Brownlee January 9, 2018 at 5:33 am #

          Perhaps, but I don’t know your problem as well as you and there is no set way to solve any ml problem.

          I would encourage you to brainstorm and try a suite of approaches to see what works best.

  14. sujan Ghimire January 10, 2018 at 12:31 pm #

    Hi Jason,

    I have gone through this tutorial, but I have an input size of 1762 x 4 and an output size of 1762 x 1.

    I did as follows, but the shape of y train is given as (1394, 4), which should be (1394, 1)

    Can you help me on this?

    • Jason Brownlee January 10, 2018 at 3:43 pm #

      Sorry, I cannot debug your code for you. I simply do not have the capacity, I’m sure you can understand.

      Perhaps post your error to stackoverflow or cross validated?

  15. Siji January 17, 2018 at 6:28 pm #

    I got an exception “ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 21 target samples”.

    =>print X_train

    [[ 0.15699646 -1.59383227]
    [-0.31399291 -0.03680409]
    [ 0.15699646 -1.59383227]
    [-0.31399291 0.78456757]
    [ 0.15699646 -1.59383227]
    [ 4.39590078 -1.59383227]
    [-0.31399291 1.38764971]
    [-0.31399291 -0.03680409]
    [-0.31399291 -0.32252408]
    [-0.31399291 0.6081381 ]
    [-0.31399291 -0.32252408]
    [-0.31399291 1.38764971]
    [-0.31399291 0.78456757]
    [-0.31399291 -0.03680409]
    [-0.31399291 0.78456757]
    [ 0.15699646 1.24889926]
    [-0.31399291 -0.32252408]
    [-0.31399291 1.24889926]
    [-0.31399291 -0.69488163]
    [-0.31399291 -0.69488163]
    [-0.31399291 0.6081381 ]]

    =>print y_train

    0 1
    1 1
    2 1
    3 1
    4 1
    5 1
    6 1
    7 0
    8 0
    9 0
    10 0
    11 0
    12 0
    13 0
    14 1
    15 1
    16 1
    17 1
    18 0
    19 0
    20 0
    Name: out, dtype: int64

    =>print(y_train.shape)

    (21,)

    =>print X_train.shape

    (21, 2)

    =>print X_test.shape

    (8, 2)

    I have reshaped the inputs to a 3-dimensional input. I have followed your steps.
    =>X_train = X_train.reshape(1,21, 2)
    print(X_train.shape)

    (1, 21, 2)
    =>
    model = Sequential()
    model.add(LSTM(32, input_shape=(21, 2)))
    model.add(Dense(1))

    model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['categorical_accuracy'])

    history = model.fit(X_train,y_train,batch_size =13, epochs = 14)

    ---------------------------------------------------------------------------
    ValueError Traceback (most recent call last)
    in ()
    ----> 1 history = model.fit(X_train,y_train,batch_size =13, epochs = 14)

    /home/siji/anaconda2/lib/python2.7/site-packages/keras/models.pyc in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, **kwargs)
    891 class_weight=class_weight,
    892 sample_weight=sample_weight,
    --> 893 initial_epoch=initial_epoch)
    894
    895 def evaluate(self, x, y, batch_size=32, verbose=1,

    /home/siji/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
    1553 class_weight=class_weight,
    1554 check_batch_axis=False,
    --> 1555 batch_size=batch_size)
    1556 # Prepare validation data.
    1557 do_validation = False

    /home/siji/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in _standardize_user_data(self, x, y, sample_weight, class_weight, check_batch_axis, batch_size)
    1419 for (ref, sw, cw, mode)
    1420 in zip(y, sample_weights, class_weights, self._feed_sample_weight_modes)]
    --> 1421 _check_array_lengths(x, y, sample_weights)
    1422 _check_loss_and_target_compatibility(y,
    1423 self._feed_loss_fns,

    /home/siji/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in _check_array_lengths(inputs, targets, weights)
    249 'the same number of samples as target arrays. '
    250 'Found ' + str(list(set_x)[0]) + ' input samples '
    --> 251 'and ' + str(list(set_y)[0]) + ' target samples.')
    252 if len(set_w) > 1:
    253 raise ValueError('All sample_weight arrays should have '

    ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 21 target samples.

    Please solve my problem. I am new in this area. What is the mistake?

    • Jason Brownlee January 18, 2018 at 10:05 am #

      Perhaps cut the example back to a few lines to help expose the fault?

  16. Mikel February 15, 2018 at 2:34 am #

    Hi Jason,

    I’m trying to understand the input_shape but I think I’m totally confused about the time step variable. I have a multivariate time series with 18,000 samples and 720 features. I created a 10 lagged observation dataset to forecast the next 5 time steps so my dataset goes from t-10 to t+5, being the feature dataset from t-10 to t and the label dataset from t+1 to t+5.

    Assuming that I take 15,000 samples for training, what will be the values of the reshape function? I think it should be [15000, 1, 7200 (720 features * 10)]. Regarding the time step, is the value "1" correct, or should it be the number of lagged observations, that is, 10?

    Thank you in advance.

  17. Amim April 11, 2018 at 7:15 pm #

    Hi Jason!
    Thank you so much for all the tutorials on LSTMs, I’ve learned a lot.

    I’m trying to implement the LSTM Architecture from the paper “Dropout improves Recurrent Neural Networks for Handwriting Recognition” for resolving the handwritten recognition problem.
    Basically I have to train the network giving as input variable-sized images (different W and H but always 3 channels) and predict the word written in the image. What I can't understand is how to deal with variable-sized images. Can I consider images as sequences (e.g. a 50x30 image considered as 50 sequences with 30 features)? The authors say they give as input a block of the image of size 2x2, scanning in 4 different directions (multidirectional multidimensional LSTM).
    What do I have to specify here: input_size (Samples, Time Steps, Features)? Does Samples refer to the number of all images I have in the training set, or the number of 2x2 mini-blocks? What about time steps and features? I don't get it and it's very confusing. Can you please help with any idea? I am new in this area and I'm stuck on this problem.

    Thanks a lot 🙂

    • Jason Brownlee April 12, 2018 at 8:36 am #

      I would recommend padding the inputs to a fixed size.

  18. Jeyel April 19, 2018 at 6:07 pm #

    Hi Jason!
    Thanks so much for your tutorials on LSTM!
    I’m trying to predict trajectory with LSTM and ARIMA now. After reading this tutorial, I’ve got some questions.
    (1) Must we transform the time series into lag observations if we want to do forecast work with an LSTM?
    (2) After transforming a time series into a supervised learning problem, is the forecast only related to "order" or "lag" rather than "time" (like ARIMA does)? Why is the input not time/date? And must the time interval of the data be even?
    Thanks a lot in advance!

    • Jason Brownlee April 20, 2018 at 5:47 am #

      No, LSTMs can work with the time steps directly.

      The order of the observations is sufficient for the model, if the time steps are consistently spaced it does not need the absolute date/time information.

  19. Leonardo April 20, 2018 at 4:34 pm #

    Hello Jason! Congratulations on the LSTM input tutorial!
    Could you please answer three questions?

    I’m working with 500 samples that have varying sizes. My doubts are related to the organization of these 500 samples within this 3-dimensional input, mainly in relation to the Samples dimension.

    The "Features" dimension has already been defined to have size 26, and the "Time Steps" dimension will have size 100, but for the "Samples" dimension I still do not know what its value will be.

    Doubt 1: In these cases of samples with different sizes, to determine the "Samples" dimension do I have to base it on the largest sample, and for the other samples fill the additional spaces with the value 0 (zero)?

    Doubt 2: Can I have more than one line in the "Samples" dimension representing the same sample?

    Doubt 3: As the samples have varying sizes, is it possible to work with 4 dimensions, for example: "Samples" x "Part of Samples" x "Time Steps" x "Features"?

    Thank you for your attention!

  20. Kate April 23, 2018 at 9:17 am #

    Hi Jason,

    Thank you for such a good tutorial! This really helps!

    I am not sure if I understand the model correctly:
    The sequence of samples does matter in an LSTM because the state of the current one is affected by the last one in the sequence.

    If this is the case, can you let me know how I should deal with the following scenarios?

    I have non-equally spaced trajectory data. The interval varies from seconds to days. Solutions I come up with are interpolation or adding time feature. What do you think is a good way to prepare the data?

    Assuming last problem is solved, how can I organize the input if my data contains trajectories of different people? For example, trajectory of one person is (100, 5, 2) and trajectory of another one is (200, 5, 2). How to train both sequence in one model?

    Thank you very much!

  21. erik May 2, 2018 at 12:13 am #

    Hello Jason, you are doing a good job, Dr.! I am a bit confused about my data shape for the network: I have 300 different samples, where the next one is always measured, let's say, in 1 min steps, so I have in total 300 timesteps, and each file contains 1 column with 2,000 rows. When I say I want to reduce the 'features' (rows), I am thinking that my input shape must therefore be (1, 300, 2000) and then I can reduce to something e.g. 200 with the LSTM decoder?

    • Jason Brownlee May 2, 2018 at 5:45 am #

      How can 1 sample have 300 time steps, 1 column and 2k rows? I do not understand sorry.

      • erik May 2, 2018 at 7:21 am #

        Sorry for the obscurity: 1 sample has 2000 rows all in one column, so only one type of value (temperature) is measured. In total I have 300 samples, and the time distance between their recordings is 1 min.

        • Jason Brownlee May 3, 2018 at 6:27 am #

          I still don’t follow.

          Are the 2000 rows related in time for one feature? Or are the 2000 rows separate features at one time?

          • erik May 4, 2018 at 9:37 am #

            I think, if I understood you right, the 2000 rows are related to one measurement (so in 1 second I measure the temperature 2000 times). But regarding it with an LSTM autoencoder, I try to reduce the "features" to learn from them and then make the prediction. I do not know the shape for the LSTM encoder-decoder: either it should be (1, 300, 2000) or (2000, 300, 1), but for the last one I got strange results; the first one is closer to the real data. Which one is right?

          • Jason Brownlee May 4, 2018 at 1:32 pm #

            The input to the encoder will be [300, ?, 2000] where ? represents the number of time steps you wish to model.

            The encoder decoder is not appropriate for all sequence prediction problems, it is suited to sequence output that differs in length to the input. If you are doing straight sequence classification/regression it might not be appropriate.

  22. paul May 14, 2018 at 3:34 am #

    Hi Jason, the example is really good. Besides this I have a question about my data. I have temperature values measured over a period of 1 second with a sampling frequency of 10,000. So in 1 second I measure 10,000 different values of the same unit (let's say force). This I repeat at certain time intervals. Do I then have 10,000 different features or only one feature as the input dimension?

    • Jason Brownlee May 14, 2018 at 6:40 am #

      Sounds like 10K features at each time step.

      • catherine May 30, 2018 at 10:04 pm #

        this means that one time step can have 10k features?

  23. Hadeer El-Zayat May 18, 2018 at 7:21 am #

    Great tutorial Jason.. but I have a problem in reshaping my RNN model.
    This is my code:

    import numpy as np
    from keras.datasets import imdb
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.layers import LSTM
    from keras.layers import Bidirectional
    from keras.preprocessing import sequence
    # fix random seed for reproducibility
    np.random.seed(7)

    train = np.loadtxt("featwithsignalsTRAIN.txt", delimiter=",")
    test = np.loadtxt("featwithsignalsTEST.txt", delimiter=",")

    x_train = train[:,[2,3,4,5,6,7]]
    x_test = test[:,[2,3,4,5,6,7]]
    y_train = train[:,8]
    y_test = test[:,8]

    # create the model
    model = Sequential()
    model.add(LSTM(20, input_shape=(10,6)))
    model.add(Dense(1415684, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs = 2)

    • Jason Brownlee May 18, 2018 at 8:09 am #

      What problem?

      • Hadeer El-Zayat May 18, 2018 at 11:21 am #

        a problem of reshaping the dataset..

        • Hadeer El-Zayat May 18, 2018 at 11:22 am #

          this is a sample of my dataset

          a sample of my dataset (patient number, time in mill/sec., normalization of X Y and Z, kurtosis, skewness, pitch, roll and yaw, label) respectively.

          1,15,-0.248010047716,0.00378335508419,-0.0152548459993,-86.3738760481,0.872322164158,-3.51314800063,0

          1,31,-0.248010047716,0.00378335508419,-0.0152548459993,-86.3738760481,0.872322164158,-3.51314800063,0

          1,46,-0.267422664673,0.0051143782875,-0.0191247001961,-85.7662354031,1.0928406847,-4.08015176908,0

          1,62,-0.267422664673,0.0051143782875,-0.0191247001961,-85.7662354031,1.0928406847,-4.08015176908,0

  24. Hadeer El-Zayat May 20, 2018 at 4:03 am #

    I didn't know how to do it!

    • Jason Brownlee May 20, 2018 at 6:40 am #

      Take it slow, one step at a time.

      • Hadeer El-Zayat May 20, 2018 at 7:24 am #

        this is what i have accomplished

        train = np.loadtxt("featwithsignalsTRAIN.txt", delimiter=",")
        test = np.loadtxt("featwithsignalsTEST.txt", delimiter=",")

        x_train = train[:,[2,3,4,5,6,7]]
        x_test = test[:,[2,3,4,5,6,7]]
        y_train = train[:,8]
        y_test = test[:,8]

        model = Sequential()
        model.add(LSTM(64, activation='relu', batch_input_shape=(100, 10, 1),
        stateful=True,
        return_sequences=False))
        model.add(Dense(1, activation='linear'))
        model.compile(loss='mean_squared_error', optimizer='adam')

        is that true ??

        • Jason Brownlee May 21, 2018 at 6:21 am #

          Nice work.

          What do you mean by true?

          Our job is to find a model that gives “good enough” results when making predictions. This requires careful experimentation.

      • Hadeer El-Zayat May 21, 2018 at 2:25 am #

        Thank you. I have tried the following code:

        np.random.seed(7)

        train = np.loadtxt("featwithsignalsTRAIN.txt", delimiter=",")
        test = np.loadtxt("featwithsignalsTEST.txt", delimiter=",")

        x_train = train[:,[2,3,4,5,6,7]]
        x_test = test[:,[2,3,4,5,6,7]]
        y_train = train[:,8]
        y_test = test[:,8]

        x_train = x_train.reshape((-1,1,6))

        model = Sequential()
        model.add(LSTM(64, activation='relu', input_shape=(1, 6)))
        model.add(Dense(1, activation='softmax'))

        model.compile(loss='binary_crossentropy',
        optimizer='adam',
        metrics=['accuracy'])

        model.fit(x_train, y_train, batch_size = 128, epochs = 10, verbose = 2)

        but it gets a very low accuracy with very high loss

        Epoch 1/20 - 63s - loss: 15.0343 - acc: 0.0570
        Epoch 2/20 - 60s - loss: 15.0343 - acc: 0.0570
        Epoch 3/20 - 60s - loss: 15.0343 - acc: 0.0570
        Epoch 4/20 - 60s - loss: 15.0343 - acc: 0.0570

  25. Ryan May 20, 2018 at 10:15 am #

    What does 32 in model.add(LSTM(32)) mean?

  26. Suzi May 28, 2018 at 9:33 pm #

    Hi, Jason,

    Thank you for the great tutorial. It helps me to predict time series data sequences with the lstm model.

    However, I have a question about how to determine the length of look_back time steps.

    For example, there is a time series sequence X1, X2, X3, …, Xn. When I apply ARIMA to predict Xn+1, I can use the ACF and PACF to determine the parameters p and q. The parameter p indicates the look_back time steps. Then, the ARIMA equation can be used to predict Xn+1.

    But for lstm, I do not know how to determine the look_back time steps, in other words, the reshape size for a time series sequence. Is there any way to get an appropriate look_back time steps in reshaping the time series sequence data for lstm? Could you pls give me some suggestions about it?

    Thanks a lot.

    Suzi

    • Jason Brownlee May 29, 2018 at 6:26 am #

      Looking at ACF/PACF plots might be a good start to get an idea of the number of lag obs that are significant.

      • Suzi May 29, 2018 at 5:25 pm #

        Thanks for your quick reply.

        I am still confused about your suggestion. Do you mean that I need to plot the ACF/PACF to find the number of time lags for applying the LSTM?

        I do not think the ACF/PACF can be used for determining the look_back time steps for an LSTM. These two criteria explain the linear correlation of a time series.
        For time series sequences with nonlinear correlation, the ACF/PACF is not truncating or tailing off, and ARIMA cannot be used to model them.

        Then I use lstm to model the nonlinear correlation time series sequences and lstm is good at it. Unfortunately, the ACF/PACF is not able to find the time lag in applying lstm.

        Before applying an LSTM for a time series prediction, I must decide the reshape size. However, I cannot find any information on the internet about how to determine it. Is there any book or tutorial that can help me to solve this problem?

        Thank you very much.

  27. Eriz May 29, 2018 at 6:39 pm #

    Hi Jason,

    Thanks for the article and clarifying the dimensions which some of us have trouble with them.

    However, my question goes to something I didn’t find anybody asked in the Q&A:

    Why do you put 32 units in the input LSTM layer?

    I mean, if you have 2 features in each of the 10 time steps and one sample example, why would we want to have more than 10 neurons in the first input layer?

    As I understand LSTMs, each neuron gets fed the features of one specific time step (in the cell images of colah's blog it is stated as Xt, as you will surely know).

    If you feed the first one with "t" and continue like "t+1, t+2, t+3, ..., t+10", what time step would we use in the case of, for example, t+17, which would be the 17th neuron?

    In a fully connected ANN, the first input layer has the same number of neurons as features. Is there anything I'm missing, or is there any rule to select the number of neurons if we choose an LSTM as our input layer?

    Thanks for the attention and for correcting any error that I may be not understanding.

    • Jason Brownlee May 30, 2018 at 6:38 am #

      The number of units in the hidden layer is unrelated to the number of input or output time steps.

      We configure neural nets by trial and error, more here:
      https://machinelearningmastery.com/faq/single-faq/how-many-layers-and-nodes-do-i-need-in-my-neural-network

      • Eriz May 30, 2018 at 9:08 am #

        Hi again Jason,

        Thanks for the quick reply.

        Let me please introduce some numbers:

        Input_shape = (300, 10, 2)
        Batch_size = 1
        Num_units in input/first LSTM layer = 32

        So, as you say, if the units in the input LSTM layer (I am supposing that it is the first layer we use) are not related to the time steps, each time we feed a batch of data into that layer through "Xt" we will feed one row (one sample) of those 300 with 10 columns, and we will do it two times: once for the first feature and once for the second feature; and, the important point, this feeding will be to every unit of those 32 that compose the LSTM layer. Am I getting the point?

        I get confused because in normal feedforward ANN, the first layer (the input layer) has as many nodes as features we have, so that we can feed each feature in one node.

        If you could clarify this for me, you would be doing me a big favour because there is not much insight about this details elsewhere.

        Thanks in advance,

        • Jason Brownlee May 30, 2018 at 3:08 pm #

          If your batch size is 1, then each batch will contain one sample (sequence).

          Yes, the sequence will be exposed to each unit in the first hidden layer.

          • Eriz May 30, 2018 at 5:43 pm #

            Hi Jason,

            Okey, perfect. Now I get almost all the points.

            Thank you for your kindness,

  28. Manuel Gonçalves June 14, 2018 at 8:16 am #

    Hi Jason, thank you for the articles and books… I just have some open questions about shape. I have 2D multivariate data, e.g. (samples = 1024, features = 6), and after making a supervised learning dataset with ten (10) lags, the shape will be (samples = 1024, features = 60).
    The question is: the shape for the LSTM is (samples, timesteps, features), so will it be data.reshape(1024, 10, 60)? I don't understand why some tutorials use something like (1, 10, 1), or how to reshape/split train/test with the new shape. The steps are:

    1 – convert to supervised problem.
    2 – reshape the entire 2D dataset or split here and reshape after?
    3 – how about shape of Y to make predictions?

    I just need a step with these key points… Thanks for the excelent posts.

    • Jason Brownlee June 14, 2018 at 4:06 pm #

      From your description, there is no need to worry about the lag, the time steps take care of that.

      The shape would be (1024, 10, 6)

      • Manuel Gonçalves June 22, 2018 at 12:02 am #

        Hi again Jason, thanks for the reply. On your example for one and multiple features, you say:
        – consider a matrix of 2 columns with 10 rows; this data can be framed as 1 sample with 10 timesteps and 2 features.

        So, this "1 sample" is driving me crazy. When I have 2D data like rows vs columns (samples, features), I thought the number of samples would be the number of rows of the 2D matrix; so it would always be (samples, features) -> (samples, timesteps, features). In your example, the rows turned into timesteps, and I can't understand this sample = 1 in this post. Why one sample? Why do rows become timesteps here and become samples in other examples?
        Another question is: after reshaping the input data, how do I reshape X_train, y_train and new data for predictions?

        • Jason Brownlee June 22, 2018 at 6:11 am #

          It is challenging in the beginning.

          Think about it like this: you are taking a 2D dataset and projecting it into a 3D space.

  29. Hajar June 21, 2018 at 6:04 pm #

    Hello,

    Why do we need to reshape data into 3 dimensions for the LSTM?

    Thank you

    • Jason Brownlee June 22, 2018 at 6:03 am #

      Because LSTMs expect data as input in 3 dimensions.

  30. akbar June 26, 2018 at 7:51 pm #

    Hi, thank you so much for this article, it helped me understand Keras and the overall input thing.
    I am really confused about how to prepare the output data.

    at one point we use


    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)


    I have a (25000, 15) input shape. How can I prepare the overall output?

  31. koho July 3, 2018 at 5:55 pm #

    Thanks for sharing. I am confused about the padding and "sliding window" method. Suppose the dataset contains two sequences s1, s2 and time_step is set to 3; then s1=(1,2,3,4,5) should have 2 subsequences: [(1,2,3), (2,3,4)], and s2=(6,7,8,9,10,11,12) should have 4 subsequences: [(6,7,8), (7,8,9), (8,9,10), (9,10,11)]. These 6 subsequences have the same length, equal to time_step, so they can be reshaped to a 3D tensor (6, 3, 1) without padding s1 and s2 to the same length. If all the sequence lengths are greater than the time_step, then we don't need to pad the sequences to the same length. Am I right?

    • Jason Brownlee July 4, 2018 at 8:20 am #

      Padding is only required if the number of time steps differ and/or if obs for a time step are missing.

  32. Anna July 10, 2018 at 4:13 am #

    Hi, thanks for the great article!

    Say I have a normalized 2D array data with a shape of (10,2)

    but when I want to reshape the data to a 3D array of (10,3,2), I got an error saying:

    “ValueError: cannot reshape array of size 20 into shape (10,3,2)”

    It seems that the previous 2D array only has the 10 samples of data multiplied by the 2 input dimensions before reshaping it to a 3D array, and perhaps that caused the error?

    Thanks in advance,

    • Jason Brownlee July 10, 2018 at 6:52 am #

      You need more data to go from (10,2) to (10,3,2); think about it, maybe even draw a picture of it. You are inventing dimensions that don't exist in your original data.

      You would need (10,6) to go to (10,3,2)

      • Anna July 10, 2018 at 3:17 pm #

        Ok I think I got it, I should actually divide the samples by the timesteps, because doing this finally solved my problem! Thanks Jason!

  33. Anna July 10, 2018 at 3:07 pm #

    Ok I see. So, does reframing the data first with lagged t-n solve the issue?

  34. Rana July 13, 2018 at 12:45 am #

    If I may, I have a question: I have 20 topics (classes), each topic has 700 files, and each file represents a document in word embedding representation (size of each file: number of words x 300 features). I want to train an LSTM network. Is it possible, and how?

    • Jason Brownlee July 13, 2018 at 7:42 am #

      You can get started with LSTMs and text data here:
      https://machinelearningmastery.com/start-here/#nlp

      • Rana July 17, 2018 at 12:00 am #

        Thank you so much I will look it up …

        • Rana July 18, 2018 at 8:55 pm #

          I have another question please: for my problem, does your book "Deep Learning for Natural Language Processing" have LSTMs in it? Because I don't want to take only the word embedding, I also want to take word order into consideration.
          Or will I need your book "Long Short-Term Memory Networks With Python" as well?

          Sorry for bothering you with my questions but I’m really stuck and I don’t have anyone who can help me in this matter.

          Thanks in advance…

          Best Regards,

          • Jason Brownlee July 19, 2018 at 7:50 am #

            I give examples of addressing NLP problems with LSTMs as well as other networks like MLPs and CNNs in “deep learning for nlp”.

  35. Neda July 14, 2018 at 5:24 am #

    Thanks, Jason, for your wonderful blog posts!

    I have a question regarding the input shape which I cannot find a solid answer to. I don’t know how much this question is related to this blog post, but would appreciate to hear your answer to my question:

    I have a training set which contains sequences of images (say n is the number of the images in the sequence and c, h, w are channel , height, and width). I have trained a CNN-LSTM on that with the input shape of (n,c,h,w).

    Now, for predicting through this network, it seems I have to feed sequences of data to it at each time (not a single frame). That is, with each new frame I need to update the sequence and feed it to the network to get the results.

    However, I was under the impression that when dealing with RNN or LSTM, we can feed one frame at a time (because of recurrency), rather than feeding the whole sequence. Was this impression wrong?

    So, briefly, when having an LSTM network for real time prediction, do we need to feed sequences to the network, or are there cases that we may feed a single signal/frame/datapoint?

    Thanks a lot in advance!

    • Jason Brownlee July 14, 2018 at 6:23 am #

      Yes, you can feed one frame of video at a time and have the CNN interpret the frames, then the LSTM put the sequence together and interpret them all.

      I have an example of this in my LSTM book. I have a summary of how to do this in Keras here:
      https://machinelearningmastery.com/cnn-long-short-term-memory-networks/

      • Neda July 14, 2018 at 7:08 am #

        Thanks! I went through your other blog post before (and now again). But still I don’t see how I can feed one frame at a time. How about the input size?

        Do you mean I can have a CNN and a separate LSTM? Feed frames one at a time to the CNN and then in a sequence to the LSTM? This means, again, I have to create the sequence myself to feed to the network?

        What I don’t understand is that the input-shape of the trained network is defined to be (n, c, h, w), how can I feed an input of shape (1,c,h,w) when n is not 1?

        • Neda July 14, 2018 at 7:12 am #

          By the way, I have already wrapped my CNN in a TimeDistributed layer. My code is as below:

          model.add(TimeDistributed(Conv2D(24, (5, 5), padding='same', activation='relu', kernel_constraint=maxnorm(3), kernel_initializer='he_normal'), input_shape=(5, 1, 125, 150)))
          model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2), strides=(2, 2))))
          model.add(TimeDistributed(Dropout(0.4)))

          model.add(TimeDistributed(Conv2D(36, (5, 5), activation='relu', padding='same', kernel_constraint=maxnorm(3))))
          model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2), strides=(2, 2))))
          model.add(TimeDistributed(Dropout(0.6)))

          model.add(TimeDistributed(Conv2D(50, (5, 5), padding='same', activation='relu', kernel_constraint=maxnorm(3))))
          model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2), strides=(2, 2))))
          model.add(TimeDistributed(Dropout(0.4)))

          model.add(TimeDistributed(Conv2D(70, (5, 5), padding='same', activation='relu', kernel_constraint=maxnorm(3))))
          model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2), strides=(2, 2))))
          model.add(TimeDistributed(Dropout(0.5)))

          model.add(TimeDistributed(Flatten()))

          # define LSTM model
          model.add(LSTM(128, return_sequences=True))
          model.add(LSTM(32))
          model.add(Dense(2, activation='sigmoid'))

        • Jason Brownlee July 15, 2018 at 6:02 am #

          No, it is one model. The input size to the CNN is the size of one image. Images are exposed to the model in a sequence, something like:
          [samples, frames, width, height, channels]

  36. WESIN RIBEIRO ALVES July 24, 2018 at 3:12 am #

    Thanks, Jason, for your wonderful post!

    I created a model before reading your post, and I see that I made a mistake: I swapped time steps with features. How much can this impact model performance?

    Best regards!

    • Jason Brownlee July 24, 2018 at 6:22 am #

      The model learns over time. Time is key to the models understanding of the sequence.

  37. Guy Ben Mayor July 26, 2018 at 5:42 pm #

    Hi, thanks for the great article 🙂

    One thing I didn't understand about the LSTM network:
    if the output of each time step is supposed to predict the next input,
    how come the input vector dimension (which relates to the number of features) is not equal to the output vector dimension (which relates to the number of units in the layer)?

  38. IM July 31, 2018 at 1:05 am #

    Hi Jason,

    Many thanks for the post – it was really useful!

    I would just like to run my problem through with you just to verify you feel the approach outlined in this tutorial is suitable for me:

    I have two columns of data: one is resonance energies and the other is the corresponding neutron widths. I want to feed 70% of this data to the network, i.e. values of resonance energies and neutron widths, then ask the network to predict the neutron width values given the remaining 30% of unseen resonance energies.

    I wanted an LSTM layer as it may help use previous computations in its current prediction.

    So I believe I have 2 inputs and one output.

    If I have 300 values of [resonance energies, neutron widths] (i.e. 300 rows of data), would my reshape be
    reshape(1, 300, 2) or reshape(1, 300, 1)? I'm not sure if the second column is technically a feature, as it's meant to be the output.

    Also, would I need any explicit pairing, given each resonance energy is related to the neutron width on the same row? Or should I use some key-value pair?

    (This is also the first experiment, I also hope to then use resonance energy and neutron width to predict another variable but essentially in exactly the same way as described in this problem just the new experiment contains one more feature)

  39. IM August 1, 2018 at 9:23 am #

    Thank you very much for the swift reply.

    Ah my data is here
    https://pastebin.com/index/9qwJU3AQ

    I have read the link you provided however I am unclear as to whether my data allows me to drop the time variable as mentioned in your article. If so, I could perhaps have the sample of [1,300,1] as opposed to [1,300,2]

    One query I have is I’m getting a score of ‘Test Score: 0.00 MSE (0.01 RMSE)” for my test set (which is 30% of my samples) would not having enough samples really be shown by such low RMSE scores? If anything doesn’t that show the predictions are almost too good (or overfitting)?

    Sorry, one final thing from reading this tutorial: if one uses an LSTM layer, is it still possible to use look_back (an argument used quite frequently in your other tutorials when creating a dataset)? If I am correct, an LSTM layer essentially allows for previous calculations to be examined when determining the current calculation, but look_back determines how many previous timesteps can be considered at each timestep calculation?

    • Jason Brownlee August 1, 2018 at 2:22 pm #

      I do not have the capacity to shape the data for you. I believe you have everything you need to shape your data for an LSTM model.

      A look-back refers to the number of prior time steps of data to feed to the model in one sample. E.g. the “timesteps” in [samples, timesteps, features].

      • Isaac August 3, 2018 at 5:11 am #

        Hi Jason,

        Sure that sounds good I will have a go at that.

        One quick question I had: when I plot my results, in many of your tutorials you tend to use the lines:

        testPredictPlot = numpy.empty_like(dataset)
        testPredictPlot[:, :] = numpy.nan
        testPredictPlot[len(trainPredict)+(look_back)+1:len(dataset)-1, :] = testPredict

        so the first line simply creates the numpy matrix like dataset,
        but does the second line fill it with nan values? (if so why, or is it just a check?)
        The third line then shift the test predict plot?

        Many thanks

  40. Ray August 18, 2018 at 11:26 pm #

    Hi Jason,
    Thank you for this post. I’ve learned a lot from it.
    I have a question about my LSTM model for classification.
    My input data is 4842 samples, 34 time steps, 254 features. In other words, it’s (4842,34,254).
    I have trained it with proper parameters. Although I got a decent result with 98% accuracy on validation data, I got pretty low accuracy at around 20% on test data (from separate data).
    My first thought is overfitting but I also tried callback function such as earlystopping or reducelronplateau. It does not give me a better result.
    Could you give me any suggestions on this issue?

    Many thanks!
