How to Update LSTM Networks During Training for Time Series Forecasting

A benefit of using neural network models for time series forecasting is that the weights can be updated as new data becomes available.

In this tutorial, you will discover how you can update a Long Short-Term Memory (LSTM) recurrent neural network with new data for time series forecasting.

After completing this tutorial, you will know:

  • How to update an LSTM neural network with new data.
  • How to develop a test harness to evaluate different update schemes.
  • How to interpret the results from updating LSTM networks with new data.

Let’s get started.

  • Updated Apr/2017: Added the missing update_model() function.
  • Updated Apr/2019: Updated the link to dataset.

Tutorial Overview

This tutorial is divided into 9 parts. They are:

  1. Shampoo Sales Dataset
  2. Experimental Test Harness
  3. Experiment: No Updates
  4. Experiment: 2 Update Epochs
  5. Experiment: 5 Update Epochs
  6. Experiment: 10 Update Epochs
  7. Experiment: 20 Update Epochs
  8. Experiment: 50 Update Epochs
  9. Comparison of Results


This tutorial assumes you have a Python SciPy environment installed. You can use either Python 2 or 3 with this example.

This tutorial assumes you have Keras v2.0 or higher installed with either the TensorFlow or Theano backend.

This tutorial also assumes you have scikit-learn, Pandas, NumPy, and Matplotlib installed.

If you need help setting up your Python environment, see this post:

Shampoo Sales Dataset

This dataset describes the monthly number of sales of shampoo over a 3-year period.

The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).

The example below loads and creates a plot of the loaded dataset.

Running the example loads the dataset as a Pandas Series and prints the first 5 rows.

A line plot of the series is then created showing a clear increasing trend.

Line Plot of Shampoo Sales Dataset

Next, we will take a look at the LSTM configuration and test harness used in the experiment.

Experimental Test Harness

This section describes the test harness used in this tutorial.

Data Split

We will split the Shampoo Sales dataset into two parts: a training and a test set.

The first two years of data will be taken for the training dataset and the remaining one year of data will be used for the test set.

Models will be developed using the training dataset and will make predictions on the test dataset.

The persistence forecast (naive forecast) on the test dataset achieves an error of 136.761 monthly shampoo sales. This provides a baseline: a model must achieve a lower error than this on the test set to be considered skillful.
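As a rough illustration of how a persistence baseline can be scored, here is a minimal sketch in plain Python. The function name and the toy series are assumptions made for this example; they are not the tutorial's data, and the sketch will not reproduce the 136.761 figure unless it is run on the real dataset with the last 12 months held out.

```python
from math import sqrt

def persistence_rmse(series, n_test):
    """RMSE of a naive forecast that predicts each test value
    with the observation from the previous time step."""
    train, test = series[:-n_test], series[-n_test:]
    history = list(train)
    squared_errors = []
    for actual in test:
        prediction = history[-1]   # forecast = last observed value
        squared_errors.append((actual - prediction) ** 2)
        history.append(actual)     # the true value becomes available afterwards
    return sqrt(sum(squared_errors) / len(squared_errors))

# Toy values for illustration only (hypothetical, not the shampoo data).
toy_series = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
print(persistence_rmse(toy_series, n_test=3))
```

Any model evaluated in the same way should achieve a lower RMSE than this value to be worth using.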

Model Evaluation

A rolling-forecast scenario will be used, also called walk-forward model validation.

Each time step of the test dataset will be walked one at a time. A model will be used to make a forecast for the time step, then the actual expected value from the test set will be taken and made available to the model for the forecast on the next time step.

This mimics a real-world scenario where new Shampoo Sales observations would be available each month and used in the forecasting of the following month.

This will be simulated by the structure of the train and test datasets.

All forecasts on the test dataset will be collected and an error score calculated to summarize the skill of the model. The root mean squared error (RMSE) will be used as it punishes large errors and results in a score that is in the same units as the forecast data, namely monthly shampoo sales.
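The walk-forward procedure described above can be sketched without any reference to the LSTM itself. In the minimal example below, `walk_forward`, `forecast_fn`, and `update_fn` are hypothetical names chosen for illustration; `forecast_fn` stands in for whatever model makes the one-step forecast.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def walk_forward(train, test, forecast_fn, update_fn=None):
    """Step through the test set one observation at a time.

    forecast_fn(history) returns a one-step forecast.
    update_fn(history), if given, is called after each true value
    has been added to the history (for example, to refit a model).
    """
    history = list(train)
    predictions = []
    for observed in test:
        predictions.append(forecast_fn(history))
        history.append(observed)   # the real observation is now available
        if update_fn is not None:
            update_fn(history)
    return predictions

def mean_of_last_two(history):
    # A stand-in forecaster used only to exercise the harness.
    return (history[-1] + history[-2]) / 2.0

train, test = [1.0, 3.0, 5.0], [7.0, 9.0]
preds = walk_forward(train, test, mean_of_last_two)
print(preds, rmse(test, preds))
```

The optional `update_fn` hook is where a model update would go in the later experiments; leaving it out corresponds to the fixed model.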

Data Preparation

Before we can fit an LSTM model to the dataset, we must transform the data.

The following three data transforms are performed on the dataset prior to fitting a model and making a forecast.

  1. Transform the time series data so that it is stationary. Specifically, a lag=1 differencing to remove the increasing trend in the data.
  2. Transform the time series into a supervised learning problem. Specifically, the organization of data into input and output patterns where the observation at the previous time step is used as an input to forecast the observation at the current time step.
  3. Transform the observations to have a specific scale. Specifically, rescale the data to values between -1 and 1 to match the output range of the default hyperbolic tangent activation function of the LSTM model.

These transforms are inverted on forecasts to return them into their original scale before calculating an error score.
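The three transforms and their inverses can be sketched in plain Python as follows. This is a simplified illustration: the helper names are hypothetical, the supervised framing here simply drops the first row rather than padding it, and the scaling range is estimated from the values passed in (in practice, from the training data only).

```python
def difference(series, lag=1):
    """Lag differencing to remove a trend."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def invert_difference(last_observation, diff_value):
    """Add the prior observation back to undo lag=1 differencing."""
    return last_observation + diff_value

def to_supervised(values):
    """Pair each value (input) with the following value (output)."""
    return [(values[i - 1], values[i]) for i in range(1, len(values))]

def fit_scaler(values):
    """Return the (min, max) used for scaling."""
    return min(values), max(values)

def scale(value, lo, hi):
    """Rescale a value to the range [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0

def invert_scale(scaled, lo, hi):
    """Map a value in [-1, 1] back to the original range."""
    return (scaled + 1.0) / 2.0 * (hi - lo) + lo

# Toy values for illustration only (hypothetical, not the shampoo data).
raw = [10.0, 12.0, 11.0, 15.0]
diffs = difference(raw)                       # [2.0, -1.0, 4.0]
lo, hi = fit_scaler(diffs)
scaled = [scale(d, lo, hi) for d in diffs]
pairs = to_supervised(scaled)                 # (input, output) patterns

# Undo the transforms for the last value: unscale, then undifference.
restored = invert_difference(raw[-2], invert_scale(scaled[-1], lo, hi))
print(pairs, restored)
```

Note the order of the inverse operations: scaling was applied last, so it is undone first.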

LSTM Model

We will use an LSTM model with 1 neuron fit for 500 epochs.

A batch size of 1 is required as we will be using walk-forward validation and making one-step forecasts for each of the final 12 months of data.

A batch size of 1 means that the model will be fit using online training (as opposed to batch training or mini-batch training). As a result, it is expected that the model fit will have some variance.

Ideally, more training epochs would be used (such as 1000 or 1500), but this was truncated to 500 to keep run times reasonable.

The model will be fit using the efficient Adam optimization algorithm and the mean squared error loss function.

Experimental Runs

Each experimental scenario will be run 10 times.

The reason for this is that the random initial conditions for an LSTM network can result in very different performance each time a given configuration is trained.
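A minimal sketch of the repeat-and-summarize pattern is shown below. The names `repeat_experiment`, `summarize`, and `fake_run` are assumptions for illustration; `fake_run` is a deterministic placeholder for fitting and evaluating one model and does not produce real RMSE scores.

```python
from statistics import mean, stdev

def repeat_experiment(run_once, repeats=10):
    """Run one experimental scenario several times and collect the scores."""
    return [run_once(seed) for seed in range(repeats)]

def summarize(scores):
    """Descriptive statistics for a list of RMSE scores."""
    return {
        "count": len(scores),
        "mean": mean(scores),
        "std": stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }

def fake_run(seed):
    # Placeholder for "fit a model and return its test RMSE".
    return 100.0 + seed

scores = repeat_experiment(fake_run, repeats=10)
summary = summarize(scores)
print(summary)
```

Comparing distributions of scores, rather than single runs, is what makes the later comparison between update schemes meaningful.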

Let’s dive into the experiments.

Experiment: No Updates

In this first experiment, we will evaluate an LSTM trained once and reused to make a forecast for each time step.

We will call this the ‘no updates model’ or the ‘fixed model’ as no updates will be made once the model is first fit on the training data. This provides a baseline of performance that we would expect experiments that make modest updates to the model to outperform.

The complete code listing is provided below.

Running the example stores the RMSE scores calculated on the test dataset using walk-forward validation. These are stored in a file called experiment_fixed.csv for later analysis. A summary of the scores is printed, shown below.

The results suggest that, on average, the fixed model outperforms the persistence model, with a test RMSE of 109.565465 compared to 136.761 monthly shampoo sales for persistence.

Next, we will start looking at configurations that make updates to the model during the walk-forward validation.

Experiment: 2 Update Epochs

In this experiment, we fit the model on all of the training data, then update the model after each forecast during the walk-forward validation.

Each test pattern used to elicit a forecast in the test dataset is then added to the training dataset and the model is updated.

In this case, the model is fit for an additional 2 training epochs before making the next forecast.

The same code listing as was used in the first experiment is used. The changes to the code listing are shown below.
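A minimal sketch of the update step is given below, under some stated assumptions: the model is a Keras 2-style stateful model that exposes `fit()` (with an `epochs` argument) and `reset_states()`, and `train` is a 2D NumPy array whose last column is the output. The helper `grow_training_set` is a hypothetical name for the step that appends the latest test pattern.

```python
import numpy as np

def update_model(model, train, batch_size, updates):
    """Continue training an already-fit model for `updates` more epochs."""
    X, y = train[:, 0:-1], train[:, -1]
    # Reshape to [samples, time steps, features], as LSTM layers expect.
    X = X.reshape(X.shape[0], 1, X.shape[1])
    for _ in range(updates):
        # One epoch at a time so the state can be cleared between epochs.
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
        model.reset_states()
    return model

def grow_training_set(train, new_row):
    """Append the most recent test pattern to the training data."""
    return np.concatenate((train, np.asarray(new_row).reshape(1, -1)))
```

Within the walk-forward loop, the idea is to make the forecast for step i, append the scaled test row for step i with `grow_training_set`, and then call `update_model` with `updates=2` before forecasting the next step. Because the existing weights are the starting point, this refines the model rather than refitting it from scratch.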

Running the experiment saves the final test RMSE scores in “experiment_update_2.csv” and prints summary statistics of the results, listed below.

Experiment: 5 Update Epochs

This experiment repeats the above update experiment and trains the model for an additional 5 epochs after each test pattern is added to the training dataset.

Running the experiment saves the final test RMSE scores in “experiment_update_5.csv” and prints summary statistics of the results, listed below.

Experiment: 10 Update Epochs

This experiment repeats the above update experiment and trains the model for an additional 10 epochs after each test pattern is added to the training dataset.

Running the experiment saves the final test RMSE scores in “experiment_update_10.csv” and prints summary statistics of the results, listed below.

Experiment: 20 Update Epochs

This experiment repeats the above update experiment and trains the model for an additional 20 epochs after each test pattern is added to the training dataset.

Running the experiment saves the final test RMSE scores in “experiment_update_20.csv” and prints summary statistics of the results, listed below.

Experiment: 50 Update Epochs

This experiment repeats the above update experiment and trains the model for an additional 50 epochs after each test pattern is added to the training dataset.

Running the experiment saves the final test RMSE scores in “experiment_update_50.csv” and prints summary statistics of the results, listed below.

Comparison of Results

In this section, we compare the results saved from the previous experiments.

We load each of the saved results, summarize the results with descriptive statistics, and compare the results using box and whisker plots.

The complete code listing is provided below.
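As a rough sketch of the loading and summary step (not the original listing), the example below uses only the standard library. It assumes each results file holds a single column of RMSE scores under a one-line header; the function names are hypothetical, and the box and whisker plot is left out.

```python
import csv
from statistics import mean, median, stdev

def load_scores(path):
    """Read one column of RMSE scores, skipping the header row."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    return [float(row[0]) for row in rows[1:] if row]

def describe(scores):
    """Descriptive statistics for one experiment."""
    return {
        "count": len(scores),
        "mean": mean(scores),
        "std": stdev(scores),
        "min": min(scores),
        "median": median(scores),
        "max": max(scores),
    }

def compare(paths):
    """Summarize each results file, keyed by its path."""
    return {path: describe(load_scores(path)) for path in paths}
```

With the result files from the earlier experiments in the working directory, `compare(["experiment_fixed.csv", "experiment_update_2.csv", "experiment_update_5.csv"])` would return one summary per experiment, which can then be printed side by side.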

Running the example first calculates and prints descriptive statistics for each of the experimental results.

If we look at the average performance, we can see that the fixed model provides a good baseline, but that a larger number of update epochs (20 and 50) produces a worse test set RMSE on average.

We see that a small number of update epochs results in better overall test set performance, specifically 2 epochs followed by 5 epochs. This is encouraging.

A box and whisker plot is also created that compares the distribution of test RMSE results from each experiment.

The plot highlights the median (green line) as well as the middle 50% of the data (box) for each experiment. The plot tells the same story as the average performance, suggesting that a small number of update epochs (2 or 5 epochs) results in the best overall test RMSE.

The plot shows a rise in test RMSE as the number of update epochs increases to 20, then a drop again at 50 epochs. This might be a sign of significant further training improving the model (11 * 50 epochs) or an artifact of the small number of repeats (10).

Box and Whisker Plots Comparing the Number of Update Epochs

It is important to point out that these results are specific to the model configuration and this dataset.

It is hard to generalize these results beyond this specific example, although these experiments do provide a framework for performing similar experiments on your own predictive modeling problem.


This section lists ideas for extensions to the experiments in this tutorial.

  • Statistical Significance Tests. We could calculate pairwise statistical significance tests, such as the Student t-test, to see if the differences between the means in the populations of results are statistically significant or not.
  • More Repeats. We could increase the number of repeats from 10 to 30, 100, or more in an attempt to make the findings more robust.
  • More Epochs. The base LSTM model was only fit for 500 epochs with online training and it is believed that additional training epochs will result in a more accurate baseline model. The number of epochs was cut to decrease experiment run time.
  • Compare to more Epochs. The results for experiments that update the model should be compared directly to experiments of a fixed model that uses the same number of overall epochs to see if adding the additional test patterns to the training dataset makes any noticeable difference. For example, 2 update epochs for each test pattern could be compared to a fixed model trained for 500 + ((12-1) * 2), or 522 epochs; an update model with 5 epochs could be compared to a fixed model fit for 500 + ((12-1) * 5), or 555 epochs; and so on.
  • Completely New Model. Add an experiment where a new model is fit after each test pattern is added to the training dataset. This was attempted, but the extended run time prevented the results being collected prior to finalizing this tutorial. This would be expected to provide an interesting point of comparison to the update and fixed models.
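The epoch budgets suggested in the "Compare to more Epochs" idea follow directly from the numbers in this tutorial: 500 base epochs, 12 test time steps, and an update after every forecast except the last. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def total_epochs(base_epochs, n_test, update_epochs):
    """Epochs seen by an updated model: the initial fit plus one
    round of updates after every forecast except the final one."""
    return base_epochs + (n_test - 1) * update_epochs

for k in (2, 5, 10, 20, 50):
    print(k, total_epochs(500, 12, k))
```

This gives 522, 555, 610, 720, and 1050 epochs for the 2, 5, 10, 20, and 50 update-epoch experiments respectively, which are the budgets a fixed model would need for a like-for-like comparison.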

Did you explore any of these extensions?
Report your results in the comments; I’d love to hear what you discovered.


In this tutorial, you discovered how to update an LSTM network as new data becomes available for time series forecasting in Python.

Specifically, you learned:

  • How to design a systematic set of experiments to explore the effect of updating LSTM models.
  • How to update an LSTM model as new data becomes available.
  • That updates to an LSTM model can result in a more effective predictive model, but careful calibration is required on your forecast problem.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

121 Responses to How to Update LSTM Networks During Training for Time Series Forecasting

  1. Rohit April 14, 2017 at 8:39 am #

    Hey Jason,

    Your tutorials and experiments are to the point and well explained. Thanks for sharing and Keep up the good work.

  2. Leo April 14, 2017 at 12:10 pm #

    A bit off-topic. Does LSTM perform better than the usual time series methods or machine learning methods?

    • Jason Brownlee April 15, 2017 at 9:32 am #

      Ouch. It depends, like any algorithm.

      See the no free lunch theorem:

      The real question is under what circumstances should we use LSTMs over classical methods like ARIMA.

      I hope to answer that in my new book. Generally, if you want to easily handle multivariate input, multi-step output, and non-linear relationships in the data.

  3. Kunpeng Zhang April 14, 2017 at 12:22 pm #

    Hi Jason,
    Nice post. I am working on LSTM. I’d like to add two layers for each variable with different weights. Is it possible?
    Could you give me any pointers?

    • Jason Brownlee April 15, 2017 at 9:33 am #

      Sorry, I’m not sure what you mean by two layers for each variable with different weights.

      Perhaps you could elaborate?

      • Kunpeng Zhang April 15, 2017 at 12:41 pm #

        Sorry, I did not make my point clear.
        Currently, I have two inputs and I know that one of them is more important than the other. I’d like to give it more importance when training my model. That is, I would like to give each of the two inputs a different weight in order to make my forecasting more precise.
        Is it possible?
        Thank you for your time.

        • Jason Brownlee April 16, 2017 at 9:23 am #

          Generally, no. We prefer to let the algorithm learn the importance of the inputs.

          • Kunpeng Zhang April 18, 2017 at 12:46 pm #

            Thank you for your advice. Got it.

    • Kamal May 11, 2019 at 5:20 am #

      Hi. Very interesting subject. Thank you.

      What do you mean by update 2, 20, 50 epochs?

  4. dr s kotrappa April 14, 2017 at 5:22 pm #

    Very nice post Thank you Jason

  5. Cristian April 15, 2017 at 12:09 am #

    Hey Jason, that’s really an interesting post.

    By the way, during experiment 2, row 24, you use an update_model never defined before.


    • Jason Brownlee April 15, 2017 at 9:38 am #

      Oops, sorry. I’ve added the missing function. Thanks.

  6. Lukasz Jastrzebski April 15, 2017 at 1:57 am #

    Hi Jason,

    I noticed that update_model function is not defined, do you mind adding it to the listings?

  7. Shovon Sengupta April 15, 2017 at 8:28 am #

    It is indeed a nice post. But how do you generate forward forecast using LSTM here?

    • Jason Brownlee April 15, 2017 at 9:41 am #

      What do you mean exactly?

      Finalize your model and call model.predict().

  8. Sam Lai April 17, 2017 at 12:35 pm #

    heartfelt gratitude to this excellent post!

  9. Tom June 3, 2017 at 6:04 am #

    How is the learning rate factored in with each .fit ? My understanding was that it restarts with each “update”, making online learning difficult in this case – is that correct?

    (not saying anything is correct or incorrect, just trying to understand)

    • Jason Brownlee June 3, 2017 at 7:27 am #

      In this case we use Adam where each “parameter” (weight) has its own learning rate. See here:

      • Tom June 7, 2017 at 2:19 am #

        And it preserves the decay / rate for each weight from the previous “update”?

        • David May 21, 2020 at 5:16 pm #

          Hi Jason, very interesting! Thanks!

          Is it possible to perform online training keeping the last state of the previous fitting (learning rate, momentum, etc)? I believe that could bring different results.


          • Jason Brownlee May 22, 2020 at 6:04 am #

            You’re welcome!

            You can keep the last state. Not sure about the state of the SGD. It may not be worth it. Perhaps experiment and see if it helps.

  10. Fernando June 9, 2017 at 10:46 pm #

    Thank you very much Jason. These LSTM posts are amazing!

    I have one question about the update. Would it make a difference to make the update not using the entire training + last test_scaled[i, 0:-1], test_scaled[i, -1], but just a smaller part on the train and the slice? For example, 1/3 last training samples + test. What do you think?

  11. Kim Miller June 30, 2017 at 4:17 am #

    Trying to understand your batch size of 1, i.e. “A batch size of 1 is required as we will be using walk-forward validation and making one-step forecasts … the model will be fit using online training (as opposed to batch training or mini-batch training).”

    First, we’re doing two fits: An initial fit and then an update fit. Seems our batch size could be larger for the initial fit?

    But second, even for that update fit, we’re not fitting just one observation alone, but rather refitting the entire training set + 1 more observation.

    My dataset is 17,000 observations, making training time quite long. Just looking for ideas to speed it up.

  12. Lucas August 6, 2017 at 9:10 pm #

    Hi Jason,

    Wonderful post.

    However, when I try to run the update for the 2 epochs it seems to get stuck in an infinite loop. It doesn’t finish. Maybe there is a mistake in the update function? What do you think?

    • Jason Brownlee August 7, 2017 at 8:45 am #

      Perhaps reduce the amount of data and see if it makes a difference – it could be a machine speed or RAM thing?

  13. Saurabh Agrawal August 10, 2017 at 2:01 am #

    Hi Jason,
    Thanks for the wonderful post.

    I am a newbie to Recurrent Neural Networks and have a naive question regarding the cell state updates in experiment 0:

    You said that the model would not be updated once trained on the training set in expt. 0. To me, this means that none of the model parameters learnt after training lstm on training data (first 24 months) would be changed while making predictions for next 12 months. However, I am wondering whether the cell state of LSTM be also fixed or is allowed to be continuously updated as we encounter the data points in testing set? That is, while making prediction at the 30th month, would the cell state of LSTM be updated based on the inputs it received for 25th-29th months? Or would it be fixed to what was obtained after training LSTM on first 24 months?

    My intuition says that it should be continuously updated as we receive the data points from testing set. However, I could not see it happening explicitly in the code, unless it happens internally in lstm model?

    • Jason Brownlee August 10, 2017 at 7:00 am #

      The cell state is updated as we move through the data, but the cell state is also reset at the end of each batch, or explicitly when we call reset_states() on the model.

  14. Rahul October 9, 2017 at 9:07 pm #

    How do I retrain my model with a new dataset and append the result to the previous result?

  15. Devakar Kumar Verma November 28, 2017 at 8:44 pm #

    Hi Jason,
    When updating the model, why is it required to reset the cell state? As per my understanding and intuition, it should hold the previous cell state for better efficiency in time series modelling.

    • Jason Brownlee November 29, 2017 at 8:22 am #

      Generally, the end of a sample or end of batch is a good time to clear state as it is often not relevant to the next sample or state.

      If it is, don’t reset the state.

      Experiment and see what works best for your data.

  16. Francisco December 12, 2017 at 8:38 am #

    Hi Jason,

    In your statement:

    Transform the observations to have a specific scale. Specifically, to rescale the data to values between -1 and 1 to meet the default hyperbolic tangent activation function of the LSTM model.

    Is this always the case? If we opt to use Relu as the activation function, is it best practice to always keep the range of the series 0< ?


    • Jason Brownlee December 12, 2017 at 4:06 pm #

      No, today I would recommend rescaling data to 0-1 for LSTMs.

      I also do not recommend changing the transfer functions within LSTM units.

      • Neel June 4, 2019 at 11:25 pm #

        Hi Jason,

        In some of your earlier posts (2017) with LSTM, you have scaled using MinMaxScaler between -1 and 1 and not given any activation in the first .fit line, allowing the default tanh activation.

        However, in another set of newer posts (early 2018 onwards), you have given activation=relu in the first line of the .fit command and not scaled the data.

        Based on your comment above, do you recommend using MinMaxScaler (0,1) and ReLU together while fitting the model for an LSTM?

  17. Francisco January 3, 2018 at 6:16 pm #

    Hi Jason,

    What is the reason behind multiple updates?

    “In this case, the model is fit for an additional 2 training epochs before making the next forecast.”

    It doesn’t make sense to me. Shouldn’t it only be 1 training epoch, for the “next-time step”?


    • Jason Brownlee January 4, 2018 at 8:08 am #

      Learning in neural networks is an iterative process of reducing error. We cannot know the right number of iterations other than by testing.

      • Francisco Mendoza January 11, 2018 at 7:09 am #

        That makes sense, thank you! But just throwing it out there, shouldn’t it be the same number of iterations as the initial model? To keep things uniform?

        • Francisco Mendoza January 11, 2018 at 7:19 am #

          where updates is actually nb_epochs in your fit_lstm().

        • Jason Brownlee January 12, 2018 at 5:47 am #

          Not if we are using the trained weights as a starting point. Something has already been learned and we are refining it.

          That being said, always test updated models vs new models.

          • Francisco Mendoza January 24, 2018 at 7:33 am #

            Thank you for your continued reply. I am learning a lot.

            One more question, if you don’t mind.

            Am I wrong when I say that “the model learns the sequential dependencies and patterns of the test data set as it goes through the test data to predict when the model is stateful”?

          • Jason Brownlee January 24, 2018 at 9:59 am #

            It does this for stateful and stateless models.

            The difference is that stateful models give you control over when internal state is reset.

          • Francisco Mendoza January 25, 2018 at 6:28 am #

            So, in that sense, there is no need to update the model by refitting, since as it goes through each test step, it learns new patterns. Is that correct?

          • Jason Brownlee January 25, 2018 at 9:08 am #

            No, updating refers to updating the model weights.

            Think of the internal state as variables that the model can use as it is making predictions.

  18. Ashima January 18, 2018 at 2:22 am #

    Hi Jason.
    Is it possible that rather than training the lstm network all over again with all the previous data points, we can just fit the model to the new data point as well since for the above approach it is taking a very long time to retrain itself making it extremely slow for any real time work. It is taking hours for a dataset of around 1000 values even on a fairly powerful system.

    • Jason Brownlee January 18, 2018 at 10:12 am #

      Sure, try many approaches and see what works best for your specific data and project requirements.

      • Ashima January 18, 2018 at 5:19 pm #

        But herein the entire LSTM is being trained all over again. But is there any way to simply fit the existing weights and model to the new data point only since that may be faster than to process al the datapoints again.

        • Jason Brownlee January 19, 2018 at 6:28 am #

          We are only updating the existing weights each time.

          • Ashima February 2, 2018 at 10:40 pm #

            ok.glad to hear that.
            Thanks a lot once again Jason.

  19. MLT March 15, 2018 at 8:54 am #

    Hi Jason,

    Thanks a lot for providing the link of this page.

    In the real world, we cannot apply differencing, supervised conversion, and scaling to the new data the way we do for the test set in the example. Does it mean that we have to keep the raw data, merge it with the new data, and then repeat the differencing, supervised conversion, and scaling procedure at every time step?

    If so, it costs a lot of system resources. Is there a better way to update the model with the new data without retraining everything?

    • Jason Brownlee March 15, 2018 at 2:45 pm #

      You do not need to rebuild the whole model, you can refit only on new/recent data.

      • MLT March 16, 2018 at 8:23 am #

        Thanks for replying, Jason.

        Continuing with the Shampoo Sales example: after receiving the latest month of data, could it be used to fit the model directly? It is a single value, so differencing and scaling are not feasible:

        model.fit(X, y, nb_epoch=1, batch_size=batch_size, verbose=0, shuffle=False)

        Does it also mean model.reset_states() should not execute for the whole online training procedure, after the offline training?

        • Jason Brownlee March 16, 2018 at 2:24 pm #

          Why? The prior ob is required to difference, and the min/max can be estimated from training data.

          Resetting state may or may not impact the model. It is worth testing whether warming up internal state matters or not for a given problem.

  20. Drew March 29, 2018 at 11:16 pm #

    Hi Jason — correct me if I’m wrong here (and also my apologies if you’ve answered this elsewhere), but are you resetting the state after every single batch (of size 1)? Doesn’t that mean we are more or less losing the effectiveness of the LSTM cell? We can’t learn anything about dynamics of anything earlier in time than time t-1 if we reset the state after only seeing one step…

    Thanks for the post!

    • Jason Brownlee March 30, 2018 at 6:39 am #

      Yes, mostly, we lose BPTT, but the LSTM still has memory across the samples.

  21. Mick Andul April 4, 2018 at 6:30 am #

    Hello Jason, first of all Thanks for sharing your knowledge with all of us.

    I already got trained model and want to re-train it, with new data, (data_original+data_new). Problem is, if I load the model and want to continue in training, it seems to start from scratch. This happens even when I use exactly the same setup and data, which were used for training original model. Could you please give me a hint what I am doing wrong?

    def update_model(model, train, batch_size, updates):
        X, y = train[:, :-n_seq], train[:, -n_seq:]
        X = X.reshape(X.shape[0], n_lag, n_features)
        for i in range(updates):
            model.fit(X, y, nb_epoch=1, batch_size=batch_size, verbose=0, shuffle=False)
        return model

    model = load_model("multivariete_model.h5")
    update_model(model, train, 1, updates)

    • Jason Brownlee April 5, 2018 at 5:41 am #

      You must save the weights to file, then later you can reload them and continue training.

  22. Suraj Kumar April 13, 2018 at 7:24 pm #

    Can we apply Grid search method for hyperparameter optimization in LSTM model?

  23. Austin October 11, 2018 at 6:35 pm #

    Hi, Jason, I have some questions about online learning of LSTM network for time series forecasting.
    Considering the time cost due to LSTM network training, we hope to retrain the previous model when processing the next time point data.
    So, we need to do two fits: an initial fit and then an update fit.
    For the initial fit, we choose a larger batch-size for the stateless LSTM network. And for the update fit, we choose batch-size=1 for retraining the model based on the initial fit.
    Is there any problem with this idea?

    • Jason Brownlee October 12, 2018 at 6:36 am #

      I recommend testing and use model performance to guide whether it is a good idea or not.

      • Leo Jingbo April 14, 2020 at 8:53 pm #

        I would also very much like to ask you about this. I am a rookie in machine learning. I want to update the model using only the new data (online learning). Is there any way to program it?

  24. Kiko November 19, 2018 at 1:43 am #

    Hi Jason,

    I am not sure why I keep getting the following error when executing the Experiment: No update.

    127 results = DataFrame()
    128 # run experiment
    –> 129 results[‘results’] = experiment(repeats, series)
    130 # summarize results
    131 print(results.describe())

    AttributeError: ‘NoneType’ object has no attribute ‘values’.

    This code is the same as the one in which just worked perfectly when I tried to follow the post in its entirety.

    I restarted the kernel for Update LSTM experiment no update many times… still not helping…

    Thanks in advance!

  25. StatTester January 15, 2019 at 7:18 pm #

    Hi Jason,

    just on a quick note: you write “Statistical Significance Tests. We could calculate pairwise statistical significance tests, such as the Student t-test, to see if the differences between the means in the populations of results are statistically significant or not.”

    One would probably test the significance of differences between the means by performing an Anova, I guess?


  26. Savan Gowda January 18, 2019 at 12:31 am #

    Hi Jason,

    I think this is not relevant to this topic, sorry!
    But anyways I wanted to know if you have a post explaining about incremental sequence learning?


  27. wonde January 18, 2019 at 9:22 pm #

    Hi jason,
    is it possible to train lstm with swarm intelligence algorithms such as aco and pso ?

    • Jason Brownlee January 19, 2019 at 5:40 am #

      Perhaps, but I would expect it to be less efficient than SGD.

  28. Utkarsh February 13, 2019 at 6:57 pm #

    Hi Jason,
    How can i make an LSTM model work if I dont have a large training dataset? Moreover, if the time series data is updating as lets say – the data i am tracking gets updated once in a month and I get a single new data point in a month , in order to predict one step ahead of it , I need to train the model with the new data point again?

  29. studentt April 20, 2019 at 8:10 am #

    hi Jason ,
    How do we make predictions for future values?

  30. Neel June 4, 2019 at 11:28 pm #

    Hi Jason,

    “For example, 2 update epochs for each test pattern could be compared to a fixed model trained for 500 + (12-1) * 2) or 522 epochs, an update model 5 compared to a fixed model fit for 500 + (12-1) * 5) or 555 epochs, and so on.”

    In the above example, what is 12?

    Say I have 10 days of data. Each day has 375 rows.
    For the first model, I run 100 epochs, and for the 9 subsequent model fits I select 2 epochs each. Then for model 2, would the total epochs be

    100 + (375-1)*2?
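
The arithmetic in the quoted sentence can be checked with a short sketch. It assumes that 12 is the number of test time steps in the tutorial's train/test split, so the model is updated after every test step except the last, giving (12-1) updates. The function name `equivalent_fixed_epochs` and its parameters are hypothetical, introduced only for illustration.

```python
def equivalent_fixed_epochs(initial_epochs, n_test_steps, update_epochs):
    """Total epochs seen by a model that is fit once and then updated
    after every test step except the last one (an assumption here)."""
    n_updates = n_test_steps - 1
    return initial_epochs + n_updates * update_epochs


print(equivalent_fixed_epochs(500, 12, 2))  # 522
print(equivalent_fixed_epochs(500, 12, 5))  # 555
```

Under the same assumption, the multiplier counts update rounds, not rows per round. If the model in the question were updated once per day, an initial fit of 100 epochs followed by 9 updates of 2 epochs each would correspond to 100 + 9 * 2 = 118 epochs.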

  31. AI June 13, 2019 at 7:27 am #

    Hey Jason,
    This is great work! I really learned a lot from your post. Thank you very much.
    However, I am wondering if you have tried to forecast your test data by using the predicted values instead of the available test data?

    In other words, in the “Experiment: 2 Update Epochs / Forecast test dataset” section, on line 35, you use the test data to predict the next time step. That is okay for initializing the forecasting, but my question is why you do not use the predicted values generated by the previous prediction step to update the model and forecast the next time step? I think that is more realistic.

    • Jason Brownlee June 13, 2019 at 2:32 pm #

      This post may help you to make a forecast:

      • AI June 14, 2019 at 12:02 am #

        I think this post is different than what I asked about.
        My question is specifically about line 35 in the “Experiment: 2 Update Epochs / Forecast test dataset” section, where you used the available test data to update the model and make a new prediction.
        My question is why you did not use the predicted value to update (retrain) the model and later compare the forecast with the real test data? (Just assume you do not have the test data, make the prediction, and then compare it with the available test data.)
        I am asking this because I want to see if there is a way to make predictions 10 or 20 time steps ahead, which is what most investors would like to see before making any big decision!
        What do you think is the best way to deal with that?

        Again, the goal of my question is to learn NOT to criticize your work!

        I’m looking forward to your feedback
        Thank you

        • Jason Brownlee June 14, 2019 at 6:46 am #

          Perhaps because I was demonstrating how to update models; that is the focus of the post. It was years ago.
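
The approach raised in this thread, feeding predictions back in as inputs instead of observed test values, is commonly called recursive multi-step forecasting. The following is a minimal sketch of that loop, not the tutorial's code. The names `recursive_forecast` and `naive_drift_model` are hypothetical, and the drift model is only a stand-in for a fitted one-step model such as an LSTM.

```python
def recursive_forecast(predict_one_step, history, n_steps, n_lags):
    """Forecast n_steps ahead by feeding each prediction back in as input."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(n_steps):
        yhat = predict_one_step(window)
        forecasts.append(yhat)
        # Slide the window: drop the oldest value, append the prediction.
        window = window[1:] + [yhat]
    return forecasts


def naive_drift_model(window):
    """Placeholder one-step model: last value plus the last observed change."""
    return window[-1] + (window[-1] - window[-2])


history = [10.0, 12.0, 14.0, 16.0]
print(recursive_forecast(naive_drift_model, history, n_steps=3, n_lags=2))
# [18.0, 20.0, 22.0]
```

Because each forecast is built on earlier forecasts rather than observations, errors can compound as the horizon grows, so the results are best compared against held-back real data.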

  32. yoni June 19, 2019 at 5:38 am #

    I’m sure you get asked this question a lot, but I must know. Are you related to Marques Brownlee?

  33. Dimitris June 22, 2019 at 8:55 pm #

    Hi Jason,

    Thank you for the nice tutorial. I am new to RNNs, and here’s the (possibly dumb) question that I have:

    If the aim is to train an LSTM in an incremental fashion, why do we need training data? Can’t we just start from complete cold-start and incrementally train the model one observation at a time, using the prediction for the next time step for evaluation (I think this is called prequential evaluation)?

    • Jason Brownlee June 23, 2019 at 5:33 am #

      Not sure I follow, perhaps you can elaborate?

      Generally, we are learning a mapping from inputs to outputs, like all ML algorithms, except in this case the input is a sequence.

      We cannot make accurate predictions until we learn this mapping from historical observations.

      • Dimitris June 24, 2019 at 9:32 pm #

        What I meant was evaluating and training the model in a purely online, reinforcement learning-type setting. The training data would be revealed one data point at a time. The next data point is used as the ground truth for prediction, and then fed into the algorithm for an incremental update. So you basically start with no data, and there is no train-test split. Do you think an RNN can be trained in that way?

        • Jason Brownlee June 25, 2019 at 6:20 am #

          It can be done; it is called online learning. But it is not common with deep learning, which requires the opposite – lots of data and training prior to use.

        • Jason J January 21, 2020 at 11:02 pm #

          A few points about online learning from scratch:
          – If you have no historical data, at what point in time will you be able to trust the prediction? How will you know when you arrive at success?

          – In order to tune your model, you need to iteratively retrain with different parameters. This can’t be done with no data, so really you have to wait until some point in the future when you have accumulated enough data to re-run tests with different settings, and run statistics on the results to find the best tuned model. So really, online learning from scratch doesn’t exist!

          • Jason Brownlee January 22, 2020 at 6:24 am #

            You need a hold out test set to evaluate the skill of a model.

            Same for evaluating model hyperparameters.

            Once a model has been evaluated/configured, it can be fit on all available data and used to make predictions.
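
The test-then-train loop described in this thread (prequential evaluation) can be sketched without any deep learning library. The example below is an illustration under stated assumptions, not an LSTM: it uses a hypothetical one-lag linear model updated with a single SGD step per observation, and a synthetic series generated by a known linear rule. The function name `prequential_evaluation` is made up for this sketch.

```python
def prequential_evaluation(series, learning_rate=0.1):
    """Test-then-train: predict each value from the previous one, record
    the error, then update a linear model (w * x + b) with one SGD step."""
    w, b = 0.0, 0.0
    abs_errors = []
    for x, y in zip(series[:-1], series[1:]):
        prediction = w * x + b          # 1. predict before seeing y
        error = y - prediction
        abs_errors.append(abs(error))   # 2. score the prediction
        w += learning_rate * error * x  # 3. update on the revealed value
        b += learning_rate * error
    return abs_errors


# Synthetic series: each value is 0.8 * previous + 0.1, starting at 0.
series = [0.0]
for _ in range(200):
    series.append(0.8 * series[-1] + 0.1)

errors = prequential_evaluation(series)
print(sum(errors[:20]) / 20)   # mean absolute error over the first 20 steps
print(sum(errors[-20:]) / 20)  # mean absolute error over the last 20 steps
```

The early errors are the cost of the cold start: every prediction is scored before the model has seen its target, so there is no separate train/test split, and how soon the forecasts can be trusted still has to be judged from the running error.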

  34. Ethan July 19, 2019 at 7:45 pm #

    It would not be appropriate to scale the series after it has been transformed into a supervised learning problem as each column would be handled differently, which would be incorrect.

    But why do you scale data after it has been transformed into a supervised learning problem in the code of this post?

    • Jason Brownlee July 20, 2019 at 10:50 am #

      I agree.

      I would not do it that way if I was to write the post again.

  35. Radhouane Baba September 26, 2019 at 6:47 pm #

    Hi Jason,

    I have some simple questions.
    I am using an LSTM to forecast load for 1 day (24 values ahead).
    After trying to predict only 1 value, append it to the history, and predict the following 23 values the same way (the result is not good), I now want to use dense(24), predicting 24 points at once.
    (Input 24×10, for example; 24 outputs at once.)

    I tried, for example, splitting my input and output data this way: 10 days before as input, and the output is 1 day.
    The result is shaped like a middle line (it does not follow the real data), which is not good (maybe it needs more epochs). But now I have a new idea:

    Is it possible to split the inputs so that the next sample is not at t+1, but at t+24 (the next day)? Or am I doing something wrong here?
    That means I want to try: t-24, t-48, t-72, … so the model sees exactly 1-day shifts.
    Am I transgressing a rule?

    Second question:
    Another case:
    For my cross-validation, I did this:
    Using a function I created to do incremental learning, following this loop:
    (train from day 1 to 19 -> predict day 20 -> refit the model using the real data for day 20 -> predict day 21 -> train using real day 21 -> predict day 22 -> train using real day 22 -> predict day 23, etc.)

    The model somehow forgets everything after refitting it one more time and gives me an output nearly the same as the last training day:
    (predicted day 21 is the same as train day 20)
    (predicted day 22 is the same as train day 21)
    (predicted day 23 is the same as train day 22)

    Does fitting an LSTM model again make it forget everything it has learned from previous fits?
    If so, how can I do incremental learning for my cross-validation?

    Thank you for your Website!

  36. Radhouane Baba September 27, 2019 at 7:25 pm #

    I mean, can the sliding window slide/move along the time series 24 time steps at a time (not one time step at a time)?

    • Radhouane Baba September 27, 2019 at 11:20 pm #

      I tried it, and the results are better and faster than sliding 1 time step at a time!
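
Moving the window 24 steps at a time, as described in this thread, amounts to adding a stride to the usual sliding-window split. A minimal sketch follows; `make_windows` and its parameters are hypothetical names, and the 48-in / 24-out sizes are example values, not the commenter's actual configuration.

```python
def make_windows(series, n_in, n_out, stride):
    """Split a series into (input, output) pairs, moving the window
    start forward by `stride` steps between samples."""
    X, y = [], []
    last_start = len(series) - n_in - n_out
    for start in range(0, last_start + 1, stride):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return X, y


hourly = list(range(96))  # four days of placeholder hourly values
X, y = make_windows(hourly, n_in=48, n_out=24, stride=24)
print(len(X))            # 2
print(X[1][0], y[1][0])  # 24 72
```

A stride of 24 yields far fewer samples than a stride of 1, which is one reason training would be faster. Whether the forecasts also improve depends on the data.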

    • Jason Brownlee September 28, 2019 at 6:15 am #


  37. Fitus Maticus December 18, 2019 at 8:47 am #

    Hi, I am trying to teach an LSTM network a very basic substitution rule, let’s say replace:
    a => x
    b => y
    c => w
    I am using this character-level architecture, which uses an LSTM, but unfortunately even if I get a very low loss function value, i.e. less than 1e-6, I am getting very poor results on the predictions. For this toy experiment, the output length should always be the same as the input length, like:
    abc => xyw

    But the LSTM sometimes gives extra characters. Is this the nature of the LSTM?

  38. Nedisha February 13, 2020 at 7:00 am #

    Hello Sir, I have implemented the codes from I have already trained the model and I am doing the prediction part. In that implementation, the author used a sample text to predict the emotion, but I want to use a dataset containing a dataframe of comments. I want the system to predict the result for each comment and to create a new column in the dataframe to hold the predicted result in each row for the respective comment. Can you help me with how to do that? Please, sir.

    You can mail me if possible so that I can show you what my dataframe looks like and what output I want to get.

    Thank you so much. I would be grateful if you could help me. Thank you sir.

    • Jason Brownlee February 13, 2020 at 1:20 pm #

      Sorry, I am not familiar with that code. Perhaps contact the author directly?

  39. Parag May 23, 2020 at 4:12 am #


    I’m trying to execute the “Experiment: No Updates” code mentioned above. However, I am getting the error “ValueError: setting an array element with a sequence.” on the line “results[‘results’] = experiment(repeats, series)”.
    Please help me to solve this.
    Thanks in advance

  40. graboosky May 30, 2020 at 7:50 pm #

    Nice, self-explanatory. I wonder how you would create your neural network if you would like to train the model with the first six months of data, and then “update-train” the model with the data for months 12-18. Is it even possible?

    • Jason Brownlee May 31, 2020 at 6:21 am #

      I would encourage you to test a suite of different models and different data preparation methods and discover what works best for the prediction problem.

      This framework will help:

      • Lakshmi June 15, 2020 at 7:07 pm #

        Hi Jason,
        Nice post. My training takes more than an hour even for a small dataset. Can you provide any suggestions?

        • Jason Brownlee June 16, 2020 at 5:36 am #

          Try a smaller model?
          Try less data?
          Try a faster machine?

          • Lakshmi June 19, 2020 at 10:11 pm #

            Thanks for your reply. You mentioned a batch size of 1 in the initial fit, which makes the training time longer. I need a one-step prediction; can I increase the batch size for the initial fit?

          • Jason Brownlee June 20, 2020 at 6:14 am #

            You can adapt the model to your problem any way you like.

  41. Sinara July 4, 2020 at 7:28 pm #

    Hi, first of all I’d like to thank you for your tutorials.
    However, I have a question. I want to use an LSTM for anomaly detection on unlabelled time series data. What approach would you recommend? Currently I’m trying to use a similar approach to the one I used with other models (ARIMA and smoothing): create a rolling forecast of the data (using the method suggested in this tutorial), calculate the differences between actuals and predictions, calculate a moving average of the differences, and label as an anomaly every point whose difference is larger than the corresponding moving average value plus some multiple of the standard deviation. Is this a good way to go?
    Thanks in advance for your reply.

    • Jason Brownlee July 5, 2020 at 7:02 am #

      You’re welcome!

      I recommend comparing a suite of different methods in order to discover what works best for your specific dataset.
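
The thresholding rule described in the question above can be sketched as follows. This is one possible reading of it, with assumptions labeled: it uses absolute differences, computes the rolling statistics over the preceding `window` points only (excluding the current one), and the function name `flag_anomalies`, the window length, and the multiplier `k` are hypothetical choices. The predictions would come from a rolling forecast; here they are placeholder values.

```python
from statistics import mean, pstdev


def flag_anomalies(actuals, predictions, window=5, k=3.0):
    """Return indices whose absolute forecast error exceeds the rolling
    mean plus k rolling standard deviations of the preceding errors."""
    errors = [abs(a - p) for a, p in zip(actuals, predictions)]
    flagged = []
    for i in range(window, len(errors)):
        recent = errors[i - window:i]  # the current point is excluded
        threshold = mean(recent) + k * pstdev(recent)
        if errors[i] > threshold:
            flagged.append(i)
    return flagged


actuals = [10, 11, 10, 12, 11, 10, 11, 30, 11, 10, 12, 11]
predictions = [10, 10, 11, 11, 11, 11, 10, 11, 11, 11, 11, 11]
print(flag_anomalies(actuals, predictions))  # [7]
```

Excluding the current point keeps a large error from raising its own threshold. Note that once an anomaly enters the window it raises the threshold for the next few points, which may hide anomalies that follow closely.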

Leave a Reply