How to Diagnose Overfitting and Underfitting of LSTM Models

It can be difficult to determine whether your Long Short-Term Memory model is performing well on your sequence prediction problem.

You may be getting a good model skill score, but it is important to know whether your model is a good fit for your data or if it is underfit or overfit and could do better with a different configuration.

In this tutorial, you will discover how you can diagnose the fit of your LSTM model on your sequence prediction problem.

After completing this tutorial, you will know:

  • How to gather and plot training history of LSTM models.
  • How to diagnose an underfit, good fit, and overfit model.
  • How to develop more robust diagnostics by averaging multiple model runs.

Discover how to develop LSTMs such as stacked, bidirectional, CNN-LSTM, Encoder-Decoder seq2seq and more in my new book, with 14 step-by-step tutorials and full code.

Let’s get started.

Tutorial Overview

This tutorial is divided into 6 parts; they are:

  1. Training History in Keras
  2. Diagnostic Plots
  3. Underfit Example
  4. Good Fit Example
  5. Overfit Example
  6. Multiple Runs Example

1. Training History in Keras

You can learn a lot about the behavior of your model by reviewing its performance over time.

LSTM models are trained by calling the fit() function. This function returns a History object whose history attribute contains a trace of the loss and of any other metrics specified when the model was compiled. These scores are recorded at the end of each epoch.

For example, if your model was compiled to optimize the log loss (binary_crossentropy) and measure accuracy each epoch, then the log loss and accuracy will be calculated and recorded in the history trace for each training epoch.

Each score is accessed by a key in the history dictionary returned from calling fit(). By default, the loss optimized when fitting the model is keyed "loss" and accuracy is keyed "acc".

Keras also allows you to specify a separate validation dataset while fitting your model that can also be evaluated using the same loss and metrics.

This can be done by setting the validation_split argument on fit() to use a portion of the training data as a validation dataset.

This can also be done by setting the validation_data argument and passing a tuple of X and y datasets.

The metrics evaluated on the validation dataset are keyed using the same names, with a “val_” prefix.

2. Diagnostic Plots

The training history of your LSTM models can be used to diagnose the behavior of your model.

You can plot the performance of your model using the Matplotlib library. For example, you can plot training loss vs. validation loss as follows:
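A minimal sketch of such a plot; the history_trace dict here is an invented stand-in for the history attribute of the object returned by fit():

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script also runs headless
from matplotlib import pyplot

# stand-in for history.history, the per-epoch metric trace recorded by fit();
# the numbers below are invented for illustration
history_trace = {
    'loss':     [0.90, 0.52, 0.31, 0.20, 0.15],
    'val_loss': [0.95, 0.66, 0.48, 0.45, 0.44],
}

pyplot.plot(history_trace['loss'])
pyplot.plot(history_trace['val_loss'])
pyplot.title('model train vs validation loss')
pyplot.ylabel('loss')
pyplot.xlabel('epoch')
pyplot.legend(['train', 'validation'], loc='upper right')
pyplot.savefig('loss_plot.png')  # or on a desktop
```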

Creating and reviewing these plots can help to inform you about possible new configurations to try in order to get better performance from your model.

Next, we will look at some examples. We will consider model skill on the train and validation sets in terms of loss that is minimized. You can use any metric that is meaningful on your problem.

3. Underfit Example

An underfit model is one that has not adequately learned the training problem and could still be improved, either with further training or with a model of greater capacity.

This can be diagnosed from a plot where the training loss is lower than the validation loss, and the validation loss has a trend that suggests further improvements are possible.

A small contrived example of an underfit LSTM model is provided below.

Running this example produces a plot of train and validation loss showing the characteristic of an underfit model. In this case, performance may be improved by increasing the number of training epochs.

Diagnostic Line Plot Showing an Underfit Model

Alternately, a model may be underfit if performance on the training set is better than the validation set and performance has leveled off. Below is an example of an underfit model with insufficient memory cells.

Running this example shows the characteristic of an underfit model that appears under-provisioned.

In this case, performance may be improved by increasing the capacity of the model, such as the number of memory cells in a hidden layer or number of hidden layers.

Diagnostic Line Plot Showing an Underfit Model via Status

4. Good Fit Example

A good fit is a case where the performance of the model is good on both the train and validation sets.

This can be diagnosed from a plot where the train and validation loss decrease and stabilize around the same point.

The small example below demonstrates an LSTM model with a good fit.

Running the example creates a line plot showing the train and validation loss meeting.

Ideally, we would like to see model performance like this if possible, although this may not be possible on challenging problems with a lot of data.

Diagnostic Line Plot Showing a Good Fit for a Model

5. Overfit Example

An overfit model is one where performance on the train set is good and continues to improve, whereas performance on the validation set improves to a point and then begins to degrade.

This can be diagnosed from a plot where the train loss slopes down and the validation loss slopes down, hits an inflection point, and starts to slope up again.

The example below demonstrates an overfit LSTM model.

Running this example creates a plot showing the characteristic inflection point in validation loss of an overfit model.

This may be a sign of too many training epochs.

In this case, the model training could be stopped at the inflection point. Alternately, the number of training examples could be increased.

Diagnostic Line Plot Showing an Overfit Model

6. Multiple Runs Example

LSTMs are stochastic, meaning that you will get a different diagnostic plot each run.

It can be useful to repeat the diagnostic run multiple times (e.g. 5, 10, or 30). The train and validation traces from each run can then be plotted to give a more robust idea of the behavior of the model over time.

The example below runs the same experiment a number of times before plotting the trace of train and validation loss for each run.

In the resulting plot, we can see that the general trend of underfitting holds across 5 runs and is a stronger case for perhaps increasing the number of training epochs.

Diagnostic Line Plot Showing Multiple Runs for a Model

Further Reading

This section provides more resources on the topic if you are looking to go deeper.


Summary

In this tutorial, you discovered how to diagnose the fit of your LSTM model on your sequence prediction problem.

Specifically, you learned:

  • How to gather and plot training history of LSTM models.
  • How to diagnose an underfit, good fit, and overfit model.
  • How to develop more robust diagnostics by averaging multiple model runs.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


103 Responses to How to Diagnose Overfitting and Underfitting of LSTM Models

  1. Shiraz Hazrat September 1, 2017 at 6:26 am #

    Very useful insight. Thanks for sharing!

  2. Jon September 1, 2017 at 8:36 am #

    Some good info thanks. Are you sure your x axis units are correct though?

    • Jason Brownlee September 1, 2017 at 8:42 am #

      Yes, in one case (Overfit Example) I truncated the data for readability.

  3. Irati September 1, 2017 at 5:26 pm #

    Hi Jason!

    And what if the loss fluctuates? I get up and down peaks after the second epoch and even with 25 epochs the loss in the validation set is greater than 0.5

    Could you give me some clue of what is happening?


    • Jason Brownlee September 2, 2017 at 6:04 am #

      Good question.

      I would recommend looking at the trends in loss over 10s to 100s of epochs, not over very short periods.

  4. Eshika Roy September 1, 2017 at 10:31 pm #

    Thank you for posting this informative blog on how to diagnose LSTM models. I will definitely try this nifty trick. Please keep on sharing more helpful tips and suggestions in the upcoming posts.

    • Jason Brownlee September 2, 2017 at 6:08 am #

      Thanks. I hope it helps.

      What other types of tips would you like me to write about?

  5. Louis September 2, 2017 at 12:57 am #

    Hi, James! Any tips how to detect overfitting without validation set (when I have Dropout layers) ?
    (I am beginner at deep learning)

    • Jason Brownlee September 2, 2017 at 6:14 am #

      The idea of overfitting the training set only has meaning in the context of another dataset, such as a test or validation set.

      Also, my name is Jason, not James.

      • Long October 17, 2017 at 9:41 am #

        Hi Jason,

        Could a test dataset be used to detect overfitting or underfitting to a training dataset without validation dataset? How to do? Is it different to the method using validation dataset?

        Thanks a lot. BTW, your lessons are quite beneficial and helpful for studying machine learning.

        • Jason Brownlee October 17, 2017 at 4:04 pm #

          Perhaps, but one data point (as opposed to an evaluation each epoch) might not be sufficient to make claims/diagnose model behavior.

  6. Andrei September 2, 2017 at 11:28 pm #

    Hi Jason,

    Hyper-parameter tuning for LSTMS is something really useful – especially in the context of time-series. Looking forward for a blog-post on this topic..


    • Jason Brownlee September 3, 2017 at 5:44 am #

      What would you like to see exactly? What parameters?

      I have a few posts on tuning LSTMs.

  7. Amin September 2, 2017 at 11:42 pm #

    Thank you for your post Jason.
    There is also another case, when val loss goes below training loss! This case indicates a highly non-stationary time series with growing mean (or var), wherein the network focuses on the meaty part of the signal which happens to fall in the val set.

  8. Andrey Sharypov September 3, 2017 at 4:47 am #

    Hi Jason!
    Thank you for this very useful way of diagnosing LSTMs. I’m working on human activity recognition. Now my plot looks like this. What can you advise?
    I’m trying to classify fitness exercises.

    • Jason Brownlee September 3, 2017 at 5:49 am #

      Great work!

      Maybe try early stopping around epoch 10?

      • Andrey Sharypov September 3, 2017 at 4:30 pm #

        In this case my accuracy will be:
        Train on 21608 samples, validate on 5403 samples

        21608/21608 [==============================] – 802s – loss: 0.2115 – acc: 0.9304 – val_loss: 0.1949 – val_acc: 0.9337
        Epoch 7/50
        21608/21608 [==============================] – 849s – loss: 0.1803 – acc: 0.9424 – val_loss: 0.2132 – val_acc: 0.9249
        Epoch 8/50
        21608/21608 [==============================] – 786s – loss: 0.1632 – acc: 0.9473 – val_loss: 0.2222 – val_acc: 0.9297
        Epoch 9/50
        21608/21608 [==============================] – 852s – loss: 0.1405 – acc: 0.9558 – val_loss: 0.1563 – val_acc: 0.9460
        Epoch 10/50
        21608/21608 [==============================] – 799s – loss: 0.1267 – acc: 0.9590 – val_loss: 0.1453 – val_acc: 0.9606
        Epoch 11/50
        21608/21608 [==============================] – 805s – loss: 0.1147 – acc: 0.9632 – val_loss: 0.1490 – val_acc: 0.9567
        Epoch 12/50
        21608/21608 [==============================] – 788s – loss: 0.1069 – acc: 0.9645 – val_loss: 0.1176 – val_acc: 0.9626
        Epoch 13/50
        21608/21608 [==============================] – 838s – loss: 0.1028 – acc: 0.9667 – val_loss: 0.1279 – val_acc: 0.9578
        Epoch 14/50
        21608/21608 [==============================] – 808s – loss: 0.0889 – acc: 0.9707 – val_loss: 0.1183 – val_acc: 0.9648
        Epoch 15/50
        21608/21608 [==============================] – 785s – loss: 0.0843 – acc: 0.9729 – val_loss: 0.1000 – val_acc: 0.9706

        After 50 epochs accuracy:
        Epoch 50/50
        21608/21608 [==============================] – 793s – loss: 0.0177 – acc: 0.9950 – val_loss: 0.0772 – val_acc: 0.9832

        Also I didn’t use dropout and regularization.
        One of my class (rest) have much more samples than other (exercises)
        Confusion matrix –

        I use the following model:
        model = Sequential()
        model.add(Bidirectional(LSTM(128), input_shape=(None, 3)))
        model.add(Dense(9, activation='softmax'))
        model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

        mcp = ModelCheckpoint('best_model_50_epochs.hd5', monitor="val_acc",
        save_best_only=True, save_weights_only=False)
        history =,

        And I tried dropout:
        model.add(Bidirectional(LSTM(128, dropout=0.5, recurrent_dropout=0.5), input_shape=(None, 3)))
        train_loss vs val_loss –
        accuracy on 50 epochs:
        loss: 0.2269 – acc: 0.9244 – val_loss: 0.1574 – val_acc: 0.9558

      • Andrey Sharypov September 7, 2017 at 6:34 pm #

        Hi Jason! What can you advise to increase accuracy, when I have multi classes and one class takes 50% of samples?
        (I showed my model above)
        Thank you!

  9. Andrei September 3, 2017 at 6:35 pm #

    Activation, batch_size (I noticed it correlates with the test size but not always), loss function, number of hidden layers, number of memory cells, optimizer type, input series history length (

    For all of these parameters I do trial/error testing. Would be nice to know if there is a more systematic way to find the best parameters.

  10. Andrei September 4, 2017 at 4:38 pm #

    Thank you, Jason! Great work with your blog!

  11. Leonard September 4, 2017 at 6:16 pm #

    Hi Jason,

    Great post for us to know how to diagnose our program! I have been playing with RNNs for a while now, and I started by reading your previous posts. They were great for beginners to start off with!

    Now as i want to go into more advanced level, i hope i could get your insights/summary on more interesting recent/state-of-the-art models like WaveNet, Deep Speech, using Attention with keras, etc.


  12. Mazen September 9, 2017 at 12:15 am #

    Hi Jason,
    thank you for this very useful and clear post.
    I have a question.
    When we set validation_data in fit(), is it just to see the behaviour of the model when fit() is done (this is what I guess!), or does Keras use this validation_data somehow while optimizing from one epoch to another to best fit such validation_data? (This is what I hope :-))
    I usually prepare three disjoint data sets: training data to train, validation data to optimize the hyper-parameters, and at the end testing data, as out-of-sample data, to test the model. Thus, if Keras optimizes the model based on validation_data, then I don’t have to optimize by myself!

    • Jason Brownlee September 9, 2017 at 11:58 am #

      Validation data is only used to give insight into the skill of the model on unseen data during training.

  13. Long September 11, 2017 at 11:16 pm #

    Hi Jason,

    Why accuracy value always are 0s when I regress? Loss values decreased. What is the reason for this? Thank you.

    model.compile(optimizer='adam', loss='mse', metrics=['mean_squared_error', 'accuracy'])

    • Jason Brownlee September 13, 2017 at 12:23 pm #

      We cannot measure accuracy for regression. It is a measure of correct label predictions and there are no labels in regression.

  14. Shabnam October 23, 2017 at 8:15 pm #


    Thank you for your helpful posts.
    Is overfitting always solvable?

    In other words, what are the necessary and sufficient conditions for a data to be trainable?
    Maybe we have enough data, but because data is not trainable, we have overfit.

  15. Farzad November 7, 2017 at 6:41 am #

    Hi Jason,

    Thanks for your great and useful post.

    I have a kind of weird problem in my LSTM train/validation loss plot. As you can see here [1], the validation loss starts increasing right after the first (or few) epoch(s) while the training loss decreases constantly and finally becomes zero. I used drop-out to deal with this severe overfitting problem; however, at best, the validation error remains the same value as it was in the first epoch. I’ve also tried changing various parameters of the model, but in all configurations I have such an increasing trend in validation loss. Is this really an overfitting problem or is something wrong with my implementation or problem phrasing?


    • Jason Brownlee November 7, 2017 at 9:55 am #

      You could explore models with more representational capacity, perhaps more neurons or more layers?

  16. Cliff November 16, 2017 at 1:57 pm #

    Hi, Jason. I’m quite shocked by your posts. I’m facing a hard situation where validation loss >> training loss when using LSTM, so I googled it. Here and here they suggested that this can be overfitting. But in your posts, this should be underfitting. I’m confused; can you explain it?

  17. Ahmad B December 4, 2017 at 3:22 pm #

    Thanks for the good article i want to compare it with other tip and trick to make sure there is no conflict.
    – Here you say:
    Underfit: Training perform well, testing dataset perform poor.
    – In other tip and trick it say:
    Overfit: Training loss is much lower than validation loss

    Are both tips the same with different diagnostics? Since performing well should mean the loss is low.

  18. Ishaan December 6, 2017 at 2:25 am #

    Hi Jason,
    There is something that I did not understand from the ‘3. underfit’ example. The graph of the first example in this section shows the validation loss decreasing, and you also vouch for the loss to decrease even further if the network is trained with more epochs. How is it possible that the network will give correct predictions on data sequences that it has never seen before? Specifically, how can learning on sequences from 0.0 to 0.5 (data from the get_train() function) improve prediction on 0.5 to 0.9 (data from the get_val() function)?

    And another thing I want to mention: I have been reading posts on your blog for a month now, though commenting for the first time, and I would like to thank you for the persistent content you’ve put here and for making machine learning more accessible.

    • Jason Brownlee December 6, 2017 at 9:08 am #

      Good question, in the ideal case the model will generalize from examples.

      Thanks for your support!

      • Ishaan December 6, 2017 at 3:51 pm #

        I did not get that Jason, could you please elaborate a bit. Thanks!

  19. Abdur Rehman Nadeem December 18, 2017 at 8:34 am #

    Hi Jason,

    I tested the underfit example for 500 epochs, and I mentioned the ” metrics = [‘accuracy’] ” in the model.compile functions but when I fit the model , the accuracy metrics was 0 for all 500 epochs. Why accuracy is 0% in this case ?

  20. Abdur Rehman Nadeem December 19, 2017 at 5:50 am #

    What do you mean by poor skill ? I tested this blog example (underfit first example for 500 epochs , rest code is the same as in underfit first example ) and checked the accuracy which gives me 0% accuracy but I was expecting a very good accuracy because on 500 epochs Training Loss and Validation loss meets and that is an example of fit model as mentioned in this blog also.

    • Jason Brownlee December 19, 2017 at 3:54 pm #

      Sorry for the confusion, model skill refers to the model’s ability to make high-quality predictions.

  21. Abdur Rehman Nadeem December 20, 2017 at 3:12 am #

    I tested the good fit example of this blog and its also giving me 0 % accuracy. If a good fit model gives 0% accuracy, we have nothing to tune its accuracy because it is already good fit. However, good fit model should give very high quality predictions. I think you should check yourself accuracy of good fit example of this blog and please give some comments on it to clear the confusion.

  22. Malte February 1, 2018 at 5:41 am #

    great article, thank you!

    Two short question:
    1) The way you designed the code, the states will not be reset after every epoch, which is a problem. Is there a way to include model.reset_states() without training for only one epoch at a time in a for loop? So basically:,y, epochs = 100, validation_data = (valx, valy), shuffle = FALSE, stateful = TRUE, state_reset= TRUE) or something like that?

    2) If that is not possible, how can I imitate the same behavior? Would it be something like?

    for i in range(epos):,y, epos=1, stateful=true)
    yhat_t = model.predict(train, …)
    yhat_val = model.predict(val, …)
    train_error = msqrt_error(y, yhat_t)
    val_error = msqrt_error(y, yhat_val)

    I just want to plot the error and accuracy development over the epochs


    • Jason Brownlee February 1, 2018 at 7:27 am #

      To manage state you must execute the epochs manually one at a time in a loop.

  23. erwin March 9, 2018 at 7:26 pm #

    Is it available also for DecisionTreeClassifier?

  24. Michael April 10, 2018 at 10:53 pm #

    Hi Dr Brownlee,

    I’m working on a classification problem using CNN layers, however I’m getting strange loss curves for training and validation. Training loss decreases very rapidly to convergence at a low level with high accuracy. My validation curve eventually converges as well but at a far slower pace and after a lot more epochs. Thus there is a huge gap between the training and validation loss that suddenly closes in after a number of epochs.

    I am using Keras/TensorFlow. The architecture is Conv + Batch Normalization + Conv + Batch Normalization + MaxPooling2D repeated 4 times; my CNN is meant to classify an image as one out of around 30 categories. I have more details posted on the following stack exchange

    Do you have any insights into what the problem may be for this situation?

    • Jason Brownlee April 11, 2018 at 6:39 am #

      Perhaps the model is underspecified? Explore larger models to see if it makes any difference.

  25. Ahmad Aldali April 27, 2018 at 5:39 pm #

    Hello Jason.
    Thank you for this. I’m working on machine translation using LSTM. Now my plot looks like this . What can you advise?
    I’m trying to improve acc.
    I’m using 4 Layer in encoder and decoder.
    Here , the basic implementation.

    encoder1 = LSTM(lstm_units, return_state=True, return_sequences=True, name='LSTM1')
    encoder2 = LSTM(lstm_units, return_sequences=True, return_state=True, name='LSTM2')
    encoder3 = LSTM(lstm_units, return_sequences=True, return_state=True, name='LSTM3')
    encoder4 = LSTM(lstm_units, return_state=True, name='LSTM4')
    encoder_outputs4, state_h4, state_c4 = encoder4(encoder3(encoder2(encoder1(embedded_output))))

    decoder_lstm1 = LSTM(lstm_units, return_sequences=True, return_state=True, name='Dec_LSTM1')
    decoder_lstm2 = LSTM(lstm_units, return_sequences=True, return_state=True, name='Dec_LSTM2')
    decoder_lstm3 = LSTM(lstm_units, return_sequences=True, return_state=True, name='Dec_LSTM3')
    decoder_lstm4 = LSTM(lstm_units, return_sequences=True, return_state=True, name='Dec_LSTM4')
    decoder_outputs4, _, _ = decoder_lstm4(decoder_lstm3(decoder_lstm2(decoder_lstm1(embedded_Ar_output,
    decoder_dense = Dense(output_vector_length, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs4)

    *lstm_units : 500
    *epochs : 200

    fit model part:
    #Compile Model
    model.compile(loss='mse', optimizer='adam', metrics=['acc'])

    #Train and fit model # Learning!
    history =[dataset_encoder_input, dataset_decoder_input], target ,

    *batch_size = 512

  26. JorgepeñAraya May 27, 2018 at 4:40 pm #

    Hi, Jason, thanks for your website, it’s amazing …

    A question: does there exist in Python (I am new to Python) a “package” like caret in R, where I can, with a few steps, train an LSTM model (or other deep learning algorithms), have it find parameters or hyperparameters (I give it a vector), scale (or preprocess) the data, and do validation or cross-validation simultaneously?

    the objective is not to make so many lines of command and to combine the search for optimal parameters or hyperparameters.

  27. Ching Li July 25, 2018 at 12:42 pm #

    Hi Jason!

    Awesome post with great plots. Do these methods of evaluating overfitting vs. underfitting generalize to models other than LSTM, e.g. feedforward neural network or CNN?

  28. Mike Hawkins August 9, 2018 at 10:38 pm #

    Hey Jason, first of all thanks for the post, all your posts are extremely useful. Anyway I am slightly confused by the second plot in section 3: underfit model. Is this not over fitting here? The model has low loss on the training error but high loss on the validation set i.e. not generalizing well? The recommendation you have in this case is to increase the capacity of the model, but would this not just improve the fit on the training set and hence widen this gap between val and training loss? Cheers, Mike

    • Jason Brownlee August 10, 2018 at 6:18 am #

      Note the lines start and remain far apart. This is the signal we are looking for when underfitting.

  29. Emma September 18, 2018 at 4:20 am #

    Dear Jason,

    I am using a LSTM to predict some timeseries data.

    Here is my model with early stopping.

    And here it is running for 100 epochs.

    Is this looking okay, am I overfitting? I am wondering why the validation loss is smaller than training loss in the beginning?

    Many thanks for your help, and giving us this all this great work.

  30. MLT September 28, 2018 at 6:52 pm #

    Hi Jason,

    It is a very useful article. In the example, only one value is forecast. For time series prediction, one may need to forecast several values: t, t+1 … t+n. Do you know how history['loss'] and history['val_loss'] are calculated? Or are there other metrics we can refer to?

    Thanks in advance.

      • MLT October 1, 2018 at 6:16 pm #

        Thanks a lot for reply.

        I have read this tutorial you mentioned before. Do you mean to create own function such as evaluate_forecasts(…) to calculate t+i rmse for each epoch, and then plot the rmse vs epoch?

        • Jason Brownlee October 2, 2018 at 6:23 am #

          Did the tutorial not help?

          • MLT October 2, 2018 at 7:39 pm #

            Hi Jason,

            Yes, the tutorial you mentioned helps, but I want to make sure if I understand you correctly.

            My question was how to plot train loss and validation loss for time series prediction t+1 … t+n. I just wonder if history[‘loss’] and history[‘val_loss’] are only for t+1, or they are the mean of t+1 … t+n. Is there other metric for this purpose?

            If there is no metric in history to measure train loss and validation loss for t+1 … t+n. Do I need to create an own function such as evaluate_forecasts(…) to calculate t+i rmse for each epoch?

            Thanks in advance.

          • Jason Brownlee October 3, 2018 at 6:16 am #

            Sorry, now I understand.

            There is no concept of loss at different time steps during training, the model calculates loss based on the output vector given an input.

            If you want to know error of each future time step during training, you can step through training epochs manually one by one, calculate the error and plot all errors at the end of the run.

  31. Ansh September 29, 2018 at 3:48 am #

    Hi Jason,

    I’m trying to make LSTM model.

    Below is my model code and the result.

    The model is over-fitting, but I’m not sure how to make this model converge. I changed my variable scaling with the help of this post; still I am getting the same result:


    model = Sequential()
    model.add(LSTM(66, input_shape=(train_X.shape[1], train_X.shape[2])))
    model.compile(loss='mae', optimizer='Adam')
    # fit network
    history =, train_y, epochs=30, batch_size=66, validation_data=(test_X, test_y), verbose=2, shuffle=False)


    Could you please help me understand what I am missing.

    • Jason Brownlee September 29, 2018 at 6:37 am #

      I’m happy to answer technical questions, but I cannot review your code for you, sorry.

  32. giuseppe October 22, 2018 at 6:56 pm #

    Hi Jason

    What should I think if I have a good training loss and accuracy line and a divergent line in test?
    As for loss, think of an X:
    from up left to down right for training,
    from down left to up right for test.


    • Jason Brownlee October 23, 2018 at 6:24 am #

      I’m not sure I follow, perhaps link to a picture of your graph?

  33. Nora November 1, 2018 at 10:23 pm #

    Hello Jason,
    Thank you for your great tutorials and books (my friend and I have purchased the NLP bundle and it has been a great help).

    I understand that a Good fit means that Validation error is low, but could be equal or slightly higher than the training error.
    My confusion is: how much difference would be considered okay and not mean overfitting?
    e.g if in the last epoch train loss = .08 and val_loss= .11 is this an indication of overfitting or not?

    • Jason Brownlee November 2, 2018 at 5:50 am #

      It really depends on the problem. There is no general rule.

      Overfitting is clear if performance continues to get better on the training set and starts to get worse on the test or validation sets.

  34. Anne January 3, 2019 at 2:22 pm #

    Hello! First of all, thank you for explaining this overfitting and underfitting concept. This is very helpful, considering how thoroughly explained the steps are. But, if I may ask, do you have any concrete sources (online journals will do) regarding how you decided which model is deemed the underfitted/overfitted one? If you may, will you please include them in the post? Thank you so much, I’ll be looking forward to your response!

    • Jason Brownlee January 4, 2019 at 6:26 am #

      Not off hand, what are you looking for exactly?

      • Anne January 5, 2019 at 1:37 am #

        Oh! I’m sorry if it sounds confusing. But I’m looking for any journals/books/papers I can use to decide if a model is considered overfitted, underfitted, or fitted. If you know any, can you please let me know? Thank you.

  35. Chris February 5, 2019 at 6:12 pm #

    Hi Jason,

    I am trying to build a model with my own time series data using your tutorial

    I can’t even get the train curve to go below 0.05, so it’s not an overfitting or underfitting problem. What should I look for?

  36. Shri March 2, 2019 at 6:05 am #

    Hello Jason!
    I am working on EEG dataset for classification purposes. The dataset consists of raw EEG signals of 65000x14x320 (sample X eegchannels X timesteps) as input data and labeled with 11 classes. The timesteps are not fixed but vary from 130-320. So I chose to zeropad them all to 320.

    Since data is sequential I opt to train them with LSTM in keras. Tried to play with various hyperparameters but all in vain. The model is not learning at all. The accuracy always hovers along 1/11, i.e no better than random.

    Any advice for handling EEG data in deep learning?

  37. Azaf March 3, 2019 at 2:40 am #

    Hello, Jason!

    First off, your tutorials have always been a great help!
    Secondly, I’m currently working with an ANN for which my validation loss is better than my training loss (my metric is MAE). I’ve concluded that it’s an under-fit. However, no amount of hyperparameter tuning is fixing this. I’ve tweaked everything from epochs to hidden layers and the number of neurons per layer. Any advice on how to possibly handle this?


    • Jason Brownlee March 3, 2019 at 8:03 am #

      It may also be the case that the validation dataset is too small or not representative?

      Try changing it, e.g. using a larger validation set?
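      For example, one quick check of whether a validation split is representative is to compare its label mix to the training split’s; the labels and split below are invented for this sketch:

```python
from collections import Counter

# Hypothetical representativeness check: compare the label proportions of a
# training split and a validation split. A very different class mix in the
# validation set can make its scores misleading. Labels below are invented.
train_y = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
val_y = [1, 1, 1, 0]

def proportions(labels):
    # fraction of each class label in the list
    counts = Counter(labels)
    return {k: counts[k] / len(labels) for k in sorted(counts)}

print(proportions(train_y))  # {0: 0.7, 1: 0.3}
print(proportions(val_y))    # {0: 0.25, 1: 0.75} -- a very different mix
```

      A mismatch like this suggests re-drawing the split, e.g. with stratified sampling.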

  38. Dan S March 5, 2019 at 8:07 pm #

    Hi Jason

    Thanks for the article.

    We have a seemingly unusual problem with a prototype model we’re developing for comment analysis. I’m using a Keras (TensorFlow layers) LSTM model.

    We’re using the tensorflow.js framework in production. And have developed a deployment and training platform with a UI.

    The UI includes plots of the training summary coming back from the model.

    We recently trained a model which is supposed to infer on quite a simple (but specific) semantic rubric which may explain what’s coming next.

    We trained a model which is showing 99% accuracy (!) on unseen test data. This percentage is based on us programmatically counting the binary classifier’s predictions in response to new data. We are not reading back the validation accuracy or accuracy from the model.

    For all intents and purposes, the model IS 99% accurate compared to our human labels on this new test data. We are then measuring precision for both 0 and 1 scores and are getting close to this aggregate average on each (99% precision on 0, 98% precision on 1).

    A) This seems weird, even though, as I said, we are testing the results independently.
    B) The plot of the summary shows the validation error rising significantly above the training error at about epoch 30; we are training up to epoch 100 with 81 LSTM units.

    When we train to training and validation error convergence, we end up with a markedly less accurate model.

    Wondering about your thoughts on this?



    • Jason Brownlee March 6, 2019 at 7:51 am #

      Wow, tensorflow.js in production? Why not the Python version?

      Perhaps the validation dataset is small or not representative of the training dataset and in turn the results are misleading?

      Also, loss is the best measure of overfitting; accuracy often remains flat with an overfit model, in my experience.
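      As a rough sketch of reading the loss curves numerically, the invented history values below locate the epoch where validation loss bottoms out while training loss keeps falling:

```python
# Invented Keras-style history values: find the epoch where validation loss
# bottoms out while training loss keeps falling, the classic numerical
# signature of overfitting. All numbers here are made up for illustration.
loss = [0.90, 0.60, 0.40, 0.30, 0.22, 0.17, 0.13, 0.10]
val_loss = [0.95, 0.70, 0.50, 0.45, 0.44, 0.48, 0.55, 0.63]

best_epoch = val_loss.index(min(val_loss))  # epoch (0-based) of lowest validation loss
overfitting = val_loss[-1] > min(val_loss) and loss[-1] < loss[best_epoch]
print(best_epoch, overfitting)  # training past best_epoch looks like overfitting
```

      With real Keras output you would read these lists from history.history instead of typing them in.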

  39. Johan Ericson March 7, 2019 at 7:27 pm #

    Hi Jason!

    I’m trying to apply this to code similar to the one in your electric power prediction tutorial.
    But when I try to use validation_split, I get nothing in the history; history.keys is empty.
    The shape of my training data is (1, 10, 3); can this be the reason for my problem? That the first dimension is 1, so it cannot be split?

    Thank you for awesome tutorials!

    • Jason Brownlee March 8, 2019 at 7:46 am #

      It is challenging to use a validation dataset with time series because we often use walk-forward validation.

      You might need to run the validation evaluation manually and build your own graph.
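      As a rough illustration of building your own graph, a manual walk-forward loop can record an error score at each step; the series and the naive persistence forecast below are made up, standing in for a real LSTM:

```python
# Hypothetical walk-forward validation: at each step, forecast the next value,
# score the forecast, then reveal the true value and move on. A naive
# persistence forecast (repeat the last observation) stands in for a real
# LSTM so the sketch stays self-contained; the series is invented.
series = [10.0, 12.0, 13.0, 12.0, 15.0, 16.0, 18.0, 17.0]
n_train = 4

history = list(series[:n_train])
errors = []
for t in range(n_train, len(series)):
    yhat = history[-1]                    # persistence forecast
    errors.append(abs(series[t] - yhat))  # absolute error at this step
    history.append(series[t])             # walk forward

print(errors)  # per-step errors, ready to plot as a hand-built curve
```

      In practice you would refit or update the model inside the loop and plot the collected errors with matplotlib.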

  40. Kazim Serte May 3, 2019 at 4:21 am #

    Hello, thanks for sharing. Can I ask you how to detect overfitting with logistic regression and KNN? There is much information about how to avoid overfitting, but I couldn’t find any proper explanation of how to detect it.

    • Jason Brownlee May 3, 2019 at 6:23 am #

      Yes, you could calculate model performance on a validation dataset, e.g. a learning point instead of a learning curve.

      Perhaps average results over multiple folds.
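      As an illustration of such a “learning point”, the toy 1-nearest-neighbour classifier below compares training and validation accuracy once; a large gap between the two is one simple signal of overfitting (the data and the 0.2 gap threshold are invented):

```python
# Toy 1-nearest-neighbour classifier on a 1-D feature, used to illustrate a
# "learning point": compare training accuracy to validation accuracy once.
# The data and the 0.2 gap threshold are made up for this sketch.
train_X, train_y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0, 0, 0, 1, 1, 1]
val_X, val_y = [2.5, 6.0, 9.5], [0, 1, 1]

def predict_1nn(x):
    # label of the closest training point by absolute distance
    i = min(range(len(train_X)), key=lambda j: abs(train_X[j] - x))
    return train_y[i]

def accuracy(X, y):
    return sum(predict_1nn(x) == t for x, t in zip(X, y)) / len(y)

train_acc = accuracy(train_X, train_y)  # 1-NN always scores 1.0 on its own training set
val_acc = accuracy(val_X, val_y)
print(train_acc, val_acc, train_acc - val_acc > 0.2)  # a large gap suggests overfitting
```

      The same train-vs-validation comparison applies to logistic regression or any other model, ideally averaged over multiple folds.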

  41. Hariom July 13, 2019 at 12:09 am #

    Is there any numerical metric to do the same rather than looking at the plots ?

  42. Jose July 18, 2019 at 11:27 am #

    Hi Jason,

    I was just wondering if the graphs described on the website (to diagnose underfitting and overfitting) also apply to other ML models such as multilayer perceptrons … Thanks!

  43. Ponraj July 23, 2019 at 1:14 am #

    Hi Jason,

    I have trained my LSTM model (a single time) and observed some good results during prediction.
    Please see the link below related to loss & accuracy.

    Link 1 :

    After that, based on your post, I tried my LSTM model with multiple runs, up to 5 iterations (LSTMs are stochastic).

    Please see the link for the multiple runs.

    Questions:
    What can I understand from the multiple-runs image?
    Is my model not good?
    If so, how can I improve it? Kindly share some blog posts on improving the model in such cases.

    • Jason Brownlee July 23, 2019 at 8:06 am #

      Looks a little overfit, perhaps try regularization.

      See here for ideas:

      • Ponraj July 23, 2019 at 10:38 pm #

        Hello Jason,
        I have used 2 dropout layers of 10%, just to regularize my model. Since I have only 2200 parameters in my model, the use of dropout means only a limited amount of the data can be learned.

        So, after removing the dropout, my loss & val_loss follow a pattern.

        Please see the image in the link below…

        Kindly comment on whether the process I have followed is correct.

        • Jason Brownlee July 24, 2019 at 7:54 am #

          I recommend testing a suite of interventions to see what works best for your specific problem.

          Also your link does not work.

  44. lotfi toumi September 2, 2019 at 1:54 am #

    Hi, can you contact me? When I try to train my model, I get a plot that doesn’t look like this. Can you help me?

Leave a Reply