Multi-step Time Series Forecasting with Long Short-Term Memory Networks in Python

The Long Short-Term Memory network or LSTM is a recurrent neural network that can learn and forecast long sequences.

A benefit of LSTMs, in addition to learning long sequences, is that they can learn to make a one-shot multi-step forecast, which may be useful for time series forecasting.

A difficulty with LSTMs is that they can be tricky to configure and it can require a lot of preparation to get the data in the right format for learning.

In this tutorial, you will discover how you can develop an LSTM for multi-step time series forecasting in Python with Keras.

After completing this tutorial, you will know:

  • How to prepare data for multi-step time series forecasting.
  • How to develop an LSTM model for multi-step time series forecasting.
  • How to evaluate a multi-step time series forecast.

Let’s get started.


Tutorial Overview

This tutorial is broken down into 4 parts; they are:

  1. Shampoo Sales Dataset
  2. Data Preparation and Model Evaluation
  3. Persistence Model
  4. Multi-Step LSTM


This tutorial assumes you have a Python SciPy environment installed. You can use either Python 2 or 3 with this example.

This tutorial assumes you have Keras v2.0 or higher installed with either the TensorFlow or Theano backend.

This tutorial also assumes you have scikit-learn, Pandas, NumPy, and Matplotlib installed.

If you need help setting up your Python environment, see this post:

Next, let’s take a look at a standard time series forecasting problem that we can use as context for this experiment.

Shampoo Sales Dataset

This dataset describes the monthly number of sales of shampoo over a 3-year period.

The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).

You can download and learn more about the dataset here.

The example below loads and creates a plot of the loaded dataset.

Running the example loads the dataset as a Pandas Series and prints the first 5 rows.

A line plot of the series is then created showing a clear increasing trend.

Line Plot of Shampoo Sales Dataset

Next, we will take a look at the model configuration and test harness used in the experiment.

Data Preparation and Model Evaluation

This section describes the data preparation and model evaluation used in this tutorial.

Data Split

We will split the Shampoo Sales dataset into two parts: a training and a test set.

The first two years of data will be taken for the training dataset and the remaining one year of data will be used for the test set.

Models will be developed using the training dataset and will make predictions on the test dataset.

For reference, the last 12 months of observations are as follows:

Multi-Step Forecast

We will contrive a multi-step forecast.

For a given month in the final 12 months of the dataset, we will be required to make a 3-month forecast.

That is, given historical observations (t-1, t-2, …, t-n), forecast t, t+1, and t+2.

Specifically, from December in year 2, we must forecast January, February and March. From January, we must forecast February, March and April. This continues all the way to an October, November, December forecast made from September in year 3.

A total of 10 3-month forecasts are required, as follows:
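The ten forecast windows can be enumerated with a short sketch. It assumes the 36 monthly observations are indexed 0 to 35, with index 0 being January of year 1; the helper names label() and forecast_windows() are illustrative and not part of the original tutorial.

```python
# Sketch: enumerate the forecast origins and their 3-month targets.
# Assumes 36 monthly observations indexed 0..35 (index 0 = January, year 1).
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def label(index):
    """Return a 'year-month' label such as '2-Dec' for a 0-based month index."""
    return "%d-%s" % (index // 12 + 1, MONTHS[index % 12])

def forecast_windows(n_obs=36, n_test=12, n_seq=3):
    """List (origin, targets) index pairs for every n_seq-step forecast in the test year."""
    first_target = n_obs - n_test  # index of January in the final year
    windows = []
    # the last window must end on the final observation
    for start in range(first_target, n_obs - n_seq + 1):
        origin = start - 1  # last observed month before the forecast
        targets = list(range(start, start + n_seq))
        windows.append((origin, targets))
    return windows

windows = forecast_windows()
for origin, targets in windows:
    print(label(origin), "->", ", ".join(label(t) for t in targets))
```

The loop stops at the window whose last target is the final observation, which is why 12 test months yield only 10 complete 3-month forecasts.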

Model Evaluation

A rolling-forecast scenario will be used, also called walk-forward model validation.

Each time step of the test dataset will be walked one at a time. A model will be used to make a forecast for the time step, then the actual expected value for the next month from the test set will be taken and made available to the model for the forecast on the next time step.

This mimics a real-world scenario where new Shampoo Sales observations would be available each month and used in the forecasting of the following month.

This will be simulated by the structure of the train and test datasets.

All forecasts on the test dataset will be collected and an error score calculated to summarize the skill of the model for each of the forecast time steps. The root mean squared error (RMSE) will be used as it punishes large errors and results in a score that is in the same units as the forecast data, namely monthly shampoo sales.

Persistence Model

A good baseline for time series forecasting is the persistence model.

This is a forecasting model where the last observation is persisted forward. Because of its simplicity, it is often called the naive forecast.

You can learn more about the persistence model for time series forecasting in the post:

Prepare Data

The first step is to transform the data from a series into a supervised learning problem.

That is, to go from a list of numbers to a list of input and output patterns. We can achieve this using a pre-prepared function called series_to_supervised().

For more on this function, see the post:

The function is listed below.

The function can be called by passing in the loaded series values, an n_in value of 1, and an n_out value of 3; for example:
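A minimal pure-Python sketch of this kind of framing function follows. It is not the original listing; it assumes a univariate list of values and returns rows made of n_in lagged inputs followed by n_out outputs. The sample values are synthetic.

```python
def series_to_supervised(values, n_in=1, n_out=1):
    """Frame a univariate sequence as rows of [n_in inputs | n_out outputs]."""
    rows = []
    # each row needs n_in past values plus n_out values from time t onward
    for t in range(n_in, len(values) - n_out + 1):
        inputs = values[t - n_in:t]    # t-n_in ... t-1
        outputs = values[t:t + n_out]  # t ... t+n_out-1
        rows.append(list(inputs) + list(outputs))
    return rows

# synthetic stand-in for the loaded series values
raw_values = [10, 20, 30, 40, 50, 60]
supervised = series_to_supervised(raw_values, n_in=1, n_out=3)
for row in supervised:
    print(row)
```

Rows that would need values before the start or after the end of the series are simply not generated, so a series of length N yields N - n_in - n_out + 1 rows.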

Next, we can split the supervised learning dataset into training and test sets.

We know that in this form, the last 10 rows contain data for the final year. These rows comprise the test set and the rest of the data makes up the training dataset.

We can put all of this together in a new function that takes the loaded series and some parameters and returns a train and test set ready for modeling.

We can test this with the Shampoo dataset. The complete example is listed below.
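As a hedged illustration of the split, the sketch below uses a synthetic 36-value series (the values 0 to 35) in place of the real sales data, and a pure-Python framing helper rather than the original Pandas-based one. The name prepare_data() follows the text; its body here is an assumption.

```python
def series_to_supervised(values, n_in=1, n_out=1):
    """Frame a univariate sequence as rows of [n_in inputs | n_out outputs]."""
    return [list(values[t - n_in:t]) + list(values[t:t + n_out])
            for t in range(n_in, len(values) - n_out + 1)]

def prepare_data(values, n_test, n_lag, n_seq):
    """Frame the series as supervised rows, then hold back the last n_test rows."""
    supervised = series_to_supervised(values, n_lag, n_seq)
    train, test = supervised[:-n_test], supervised[-n_test:]
    return train, test

# synthetic series: the value at each position equals its index
values = [float(i) for i in range(36)]
n_lag, n_seq, n_test = 1, 3, 10
train, test = prepare_data(values, n_test, n_lag, n_seq)
print("train rows: %d, test rows: %d, columns: %d" % (len(train), len(test), len(test[0])))
print("first test row:", test[0])
```

Because each synthetic value equals its index, the first test row shows that its input is observation 23 (December of year 2 when counting from 0) and its outputs are observations 24 to 26.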

Running the example first prints the entire test dataset, which is the last 10 rows. The shapes of the train and test datasets are also printed.

We can see that the single input value (first column) on the first row of the test dataset matches the observation in the shampoo sales dataset for December in the 2nd year:

We can also see that each row contains 4 columns for the 1 input and 3 output values in each observation.

Make Forecasts

The next step is to make persistence forecasts.

We can implement the persistence forecast easily in a function named persistence() that takes the last observation and the number of forecast steps to persist. This function returns an array containing the forecast.

We can then call this function for each time step in the test dataset from December in year 2 to September in year 3.

Below is a function make_forecasts() that does this and takes the train, test, and configuration for the dataset as arguments and returns a list of forecasts.

We can call this function as follows:
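A small sketch of these two functions is given here. The function names come from the text, but the bodies are assumptions, and the two test rows are synthetic rows in the [1 input | 3 outputs] layout described above.

```python
def persistence(last_ob, n_seq):
    """Repeat the last observation for each of the n_seq forecast steps."""
    return [last_ob for _ in range(n_seq)]

def make_forecasts(train, test, n_lag, n_seq):
    """Make one persistence forecast per test row.

    train is unused by persistence; it is kept so the signature matches
    the description in the text.
    """
    forecasts = []
    for row in test:
        X = row[0:n_lag]  # input part of the row
        forecasts.append(persistence(X[-1], n_seq))
    return forecasts

# synthetic test rows: 1 input value followed by 3 output values
test = [[23.0, 24.0, 25.0, 26.0],
        [24.0, 25.0, 26.0, 27.0]]
forecasts = make_forecasts(None, test, n_lag=1, n_seq=3)
print(forecasts)
```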

Evaluate Forecasts

The final step is to evaluate the forecasts.

We can do that by calculating the RMSE for each time step of the multi-step forecast, in this case giving us 3 RMSE scores. The function below, evaluate_forecasts(), calculates and prints the RMSE for each forecasted time step.

We can call it as follows:
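The per-step RMSE calculation can be sketched in plain Python as below. The function name follows the text; the body, the return value, and the synthetic rows are assumptions made for illustration.

```python
from math import sqrt

def evaluate_forecasts(test, forecasts, n_lag, n_seq):
    """Print and return one RMSE per forecast step."""
    scores = []
    for i in range(n_seq):
        # column i of the output part of every test row
        actual = [row[n_lag + i] for row in test]
        predicted = [forecast[i] for forecast in forecasts]
        mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
        rmse = sqrt(mse)
        scores.append(rmse)
        print("t+%d RMSE: %f" % (i + 1, rmse))
    return scores

# synthetic rows ([1 input | 3 outputs]) and matching persistence forecasts
test = [[23.0, 24.0, 25.0, 26.0],
        [24.0, 25.0, 26.0, 27.0]]
forecasts = [[23.0, 23.0, 23.0],
             [24.0, 24.0, 24.0]]
scores = evaluate_forecasts(test, forecasts, n_lag=1, n_seq=3)
```

On this steadily increasing synthetic series, the persistence error grows by one unit per step, which is the pattern one would expect from a naive forecast on trending data.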

It is also helpful to plot the forecasts in the context of the original dataset to get an idea of how the RMSE scores relate to the problem in context.

We can first plot the entire Shampoo dataset, then plot each forecast as a red line. The function plot_forecasts() below will create and show this plot.

We can call the function as follows. Note that the number of observations held back on the test set is 12 for the 12 months, as opposed to 10 for the 10 supervised learning input/output patterns as was used above.

We can make the plot better by connecting the persisted forecast to the actual persisted value in the original dataset.

This will require adding the last observed value to the front of the forecast. Below is an updated version of the plot_forecasts() function with this improvement.
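The coordinate arithmetic behind this improvement can be sketched without any plotting library. The helper name forecast_line_coords() is hypothetical; it only computes the x and y values for each red line, using the offset len(series) - n_test + i - 1 to locate the last observed value before forecast i. The data is synthetic.

```python
def forecast_line_coords(values, forecasts, n_test):
    """Return (xaxis, yaxis) per forecast, starting at the last observed value.

    n_test is the number of observations held back (12 in this tutorial),
    not the number of supervised test rows.
    """
    lines = []
    for i, forecast in enumerate(forecasts):
        off_s = len(values) - n_test + i - 1  # index of the last observed value
        off_e = off_s + len(forecast) + 1
        xaxis = list(range(off_s, off_e))
        yaxis = [values[off_s]] + list(forecast)
        lines.append((xaxis, yaxis))
    return lines

# synthetic series and two persistence forecasts
values = [float(i) for i in range(36)]
forecasts = [[23.0, 23.0, 23.0], [24.0, 24.0, 24.0]]
lines = forecast_line_coords(values, forecasts, n_test=12)
for xaxis, yaxis in lines:
    print(xaxis, yaxis)
```

Each pair could then be drawn over the original series with a call such as pyplot.plot(xaxis, yaxis, color='red').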

Complete Example

We can put all of these pieces together.

The complete code example for the multi-step persistence forecast is listed below.

Running the example first prints the RMSE for each of the forecasted time steps.

This gives us a baseline of performance on each time step that we would expect the LSTM to outperform.

The plot of the original time series with the multi-step persistence forecasts is also created. The lines connect to the appropriate input value for each forecast.

This context shows how naive the persistence forecasts actually are.

Line Plot of Shampoo Sales Dataset with Multi-Step Persistence Forecasts

Multi-Step LSTM Network

In this section, we will use the persistence example as a starting point and look at the changes needed to fit an LSTM to the training data and make multi-step forecasts for the test dataset.

Prepare Data

The data must be prepared before we can use it to train an LSTM.

Specifically, two additional changes are required:

  1. Stationary. The data shows an increasing trend that must be removed by differencing.
  2. Scale. The scale of the data must be reduced to values between -1 and 1, the output range of the default hyperbolic tangent (tanh) activation function of the LSTM units.

We can introduce a function to make the data stationary called difference(). This will transform the series of values into a series of differences, a simpler representation to work with.
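A minimal sketch of such a differencing function is shown below. The name difference() comes from the text; the list-based body and the sample values are assumptions.

```python
def difference(values, interval=1):
    """Return the series of differences values[i] - values[i - interval]."""
    return [values[i] - values[i - interval]
            for i in range(interval, len(values))]

# synthetic series with an increasing trend
series_values = [10, 12, 15, 19, 24]
diff_values = difference(series_values, 1)
print(diff_values)
```

Note that the differenced series is shorter than the original by interval values, because the first observation has nothing to be subtracted from.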

We can use the MinMaxScaler from the sklearn library to scale the data.
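To show what the scaling step does, here is a pure-Python sketch of min-max scaling to the range [-1, 1]. It is a stand-in for illustration only: in scikit-learn the equivalent would be a MinMaxScaler created with feature_range=(-1, 1). The helper name fit_minmax() is hypothetical, and the sketch assumes the values are not all identical.

```python
def fit_minmax(values, low=-1.0, high=1.0):
    """Learn min and max from values; return (scale, invert) functions."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin  # assumed non-zero

    def scale(x):
        # map [vmin, vmax] linearly onto [low, high]
        return low + (x - vmin) * (high - low) / span

    def invert(y):
        # map [low, high] back onto [vmin, vmax]
        return vmin + (y - low) * span / (high - low)

    return scale, invert

# synthetic differenced values
diff_values = [2.0, 3.0, 4.0]
scale, invert = fit_minmax(diff_values)
scaled = [scale(v) for v in diff_values]
print(scaled)
print([invert(v) for v in scaled])
```

Keeping the inverse function (or the fitted scaler object) around matters, because the forecasts must later be returned to the original units.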

Putting this together, we can update the prepare_data() function to first difference the data and rescale it, then perform the transform into a supervised learning problem and train test sets as we did before with the persistence example.

The function now returns a scaler in addition to the train and test datasets.

We can call this function as follows:

Fit LSTM Network

Next, we need to fit an LSTM network model to the training data.

This first requires that the training dataset be transformed from a 2D array [samples, features] to a 3D array [samples, timesteps, features]. We will fix time steps at 1, so this change is straightforward.
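The reshape can be illustrated on a small synthetic array. This sketch assumes NumPy is available and that each row of train holds n_lag input columns followed by n_seq output columns.

```python
import numpy as np

n_lag, n_seq = 1, 3
# synthetic training rows: 5 samples, each [1 input | 3 outputs]
train = np.arange(20, dtype=float).reshape(5, n_lag + n_seq)

# split into inputs and outputs
X, y = train[:, 0:n_lag], train[:, n_lag:]
# reshape inputs from [samples, features] to [samples, timesteps, features]
X = X.reshape(X.shape[0], 1, X.shape[1])
print(X.shape, y.shape)
```

With the time steps fixed at 1, each lag observation is presented to the network as a feature of a single time step.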

Next, we need to design an LSTM network. We will use a simple structure with 1 hidden layer with 1 LSTM unit, then an output layer with linear activation and 3 output values. The network will use a mean squared error loss function and the efficient Adam optimization algorithm.

The LSTM is stateful; this means that we have to manually reset the state of the network at the end of each training epoch. The network will be fit for 1500 epochs.

The same batch size must be used for training and prediction, and we require predictions to be made at each time step of the test dataset. This means that a batch size of 1 must be used. A batch size of 1 is also called online learning as the network weights will be updated during training after each training pattern (as opposed to mini batch or batch updates).

We can put all of this together in a function called fit_lstm(). The function takes a number of key parameters that can be used to tune the network later and the function returns a fit LSTM model ready for forecasting.

The function can be called as follows:

The configuration of the network was not tuned; try different parameters if you like.

Report your findings in the comments below. I’d love to see what you can get.

Make LSTM Forecasts

The next step is to use the fit LSTM network to make forecasts.

A single forecast can be made with the fit LSTM network by calling model.predict(). Again, the data must be formatted into a 3D array with the format [samples, timesteps, features].

We can wrap this up into a function called forecast_lstm().

We can call this function from the make_forecasts() function and update it to accept the model as an argument. The updated version is listed below.

This updated version of the make_forecasts() function can be called as follows:

Invert Transforms

After the forecasts have been made, we need to invert the transforms to return the values back into the original scale.

This is needed so that we can calculate error scores and plots that are comparable with other models, like the persistence forecast above.

We can invert the scale of the forecasts directly using the MinMaxScaler object that offers an inverse_transform() function.

We can invert the differencing by adding the value of the last observation (prior months’ shampoo sales) to the first forecasted value, then propagating the value down the forecast.

This is a little fiddly; we can wrap up the behavior in a function named inverse_difference() that takes the last observed value prior to the forecast and the forecast as arguments and returns the inverted forecast.
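One way such a function might look is sketched below. The name inverse_difference() comes from the text; the body and the sample numbers are assumptions.

```python
def inverse_difference(last_ob, forecast):
    """Turn a forecast of differences back into absolute values."""
    inverted = []
    running = last_ob
    for diff in forecast:
        # each step builds on the previously restored value
        running = running + diff
        inverted.append(running)
    return inverted

# synthetic example: last observation 100, forecast differences +5, -2, +10
restored = inverse_difference(100.0, [5.0, -2.0, 10.0])
print(restored)
```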

Putting this together, we can create an inverse_transform() function that works through each forecast, first inverting the scale and then inverting the differences, returning forecasts to their original scale.

We can call this function with the forecasts as follows:
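The two inversions can be chained as in the following sketch. It is a simplified stand-in rather than the original listing: invert_scale is any function that undoes the scaling (here a hand-written inverse of a min-max scaling of the differences 2 to 6 onto [-1, 1]), n_held_back is a hypothetical parameter name for the number of observations held back, and the series is synthetic.

```python
def inverse_transform(values, forecasts, invert_scale, n_held_back):
    """Invert scaling, then differencing, for every forecast."""
    inverted = []
    for i, forecast in enumerate(forecasts):
        # 1. undo the scaling to recover forecast differences
        diffs = [invert_scale(v) for v in forecast]
        # 2. undo the differencing, starting from the last observed value
        index = len(values) - n_held_back + i - 1
        running = values[index]
        restored = []
        for diff in diffs:
            running += diff
            restored.append(running)
        inverted.append(restored)
    return inverted

# synthetic series; its differences are 2, 3, 4, 5, 6
values = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]

def invert_scale(y):
    """Inverse of scaling the range [2, 6] onto [-1, 1]."""
    return 2.0 + 2.0 * (y + 1.0)

# two perfect forecasts, expressed as scaled differences
forecasts = [[-0.5, 0.0, 0.5], [0.0, 0.5, 1.0]]
inverted = inverse_transform(values, forecasts, invert_scale, n_held_back=4)
print(inverted)
```

Because the forecasts here are perfect by construction, the inverted values reproduce the held-back observations, which is a convenient sanity check for the transform chain.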

We can also invert the transforms on the output part of the test dataset so that we can correctly calculate the RMSE scores, as follows:

We can also simplify the calculation of RMSE scores to expect the test data to only contain the output values, as follows:

Complete Example

We can tie all of these pieces together and fit an LSTM network to the multi-step time series forecasting problem.

The complete code listing is provided below.

Running the example first prints the RMSE for each of the forecasted time steps.

We can see that the scores at each forecasted time step are better, in some cases much better, than the persistence forecast.

This shows that the configured LSTM does have skill on the problem.

It is interesting to note that the RMSE does not become progressively worse with the length of the forecast horizon, as would be expected. This is marked by the fact that the t+2 appears easier to forecast than t+1. This may be because the downward tick is easier to predict than the upward tick noted in the series (this could be confirmed with more in-depth analysis of the results).

A line plot of the series (blue) with the forecasts (red) is also created.

The plot shows that although the skill of the model is better, some of the forecasts are not very good and that there is plenty of room for improvement.

Line Plot of Shampoo Sales Dataset with Multi-Step LSTM Forecasts


There are some extensions you may consider if you are looking to push beyond this tutorial.

  • Update LSTM. Change the example to refit or update the LSTM as new data is made available. Tens of training epochs should be sufficient to retrain with a new observation.
  • Tune the LSTM. Grid search some of the LSTM parameters used in the tutorial, such as number of epochs, number of neurons, and number of layers to see if you can further lift performance.
  • Seq2Seq. Use the encoder-decoder paradigm for LSTMs to forecast each sequence to see if this offers any benefit.
  • Time Horizon. Experiment with forecasting different time horizons and see how the behavior of the network varies at different lead times.

Did you try any of these extensions?
Share your results in the comments; I’d love to hear about it.


In this tutorial, you discovered how to develop LSTM networks for multi-step time series forecasting.

Specifically, you learned:

  • How to develop a persistence model for multi-step time series forecasting.
  • How to develop an LSTM network for multi-step time series forecasting.
  • How to evaluate and plot the results from multi-step time series forecasting.

Do you have any questions about multi-step time series forecasting with LSTMs?
Ask your questions in the comments below and I will do my best to answer.

47 Responses to Multi-step Time Series Forecasting with Long Short-Term Memory Networks in Python

  1. Masum May 10, 2017 at 6:48 am #


    you are the best

    I did not have to wait long. I asked for it on a different blog post a few days back.

    • Jason Brownlee May 10, 2017 at 8:53 am #

      I hope you find the post useful!

      • Masum May 10, 2017 at 9:59 am #

        I believe so. Things are getting deeper here.

        Will we get recursive LSTM MODEL for multi step forecasting soon?

        Will eagerly wait for that blog.


        • Jason Brownlee May 11, 2017 at 8:22 am #


          • Masum May 11, 2017 at 8:43 am #


            Hope to see that soon.

  2. jvr May 17, 2017 at 1:27 am #

    Thanks a lot for this post. I have been trying to do this for my thesis since September, with no good results. But I’m having trouble: I’m not able to compile. Maybe you or someone who reads this is able to tell me why this happens: I’m getting the following error when running the code:

    The TensorFlow library wasn’t compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.

    The TensorFlow library wasn’t compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.

    The TensorFlow library wasn’t compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
    The TensorFlow library wasn’t compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.

    The TensorFlow library wasn’t compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.

    The TensorFlow library wasn’t compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.

    Obviously it has something to do with TensorFlow (I have read about this problem and I think it’s because it was not installed from source, but I have no idea how to fix it).

    Thank you in advance.

  3. Shamsul May 17, 2017 at 9:17 pm #


    Can we say that multiple output strategy ( avoiding, 2. Recursive, recursive hybrid strategies) have been used here ?

    Am I right ?

    • Jason Brownlee May 18, 2017 at 8:36 am #

      I think the LSTM has implemented a direct strategy.

  4. jinhua zhang May 18, 2017 at 11:26 am #

    Your article is very useful! I have a question: if the data series is three-dimensional, where the 2nd line is the input data and the 3rd line is the forecasting data (both including the train and test data), can “difference” and “transform” be run on them?
    Thank you very much!

    • Jason Brownlee May 19, 2017 at 8:11 am #

      Great question.

      You may want to only make the prediction variable stationary. Consider performing three tests:

      – Model as-is
      – Model with output variable stationary
      – Model with all variables stationary (if others are non-stationary)

    • jvr May 21, 2017 at 10:21 pm #

      I have discovered how to do it by asking some people. The object series is actually a Pandas Series. It’s a vector of information, with a named index. Your dataset, however, contains two fields of information, in addition to the time series index, which makes it a DataFrame. This is the reason why the tutorial code breaks with your data.

      To pass your entire dataset to MinMaxScaler, just run difference() on both columns and pass in the transformed vectors for scaling. MinMaxScaler accepts an n-dimensional DataFrame object:

      ncol = 2
      diff_df = pd.concat([difference(df[i], 1) for i in range(1,ncol+1)], axis=1)
      scaler = MinMaxScaler(feature_range=(0, 1))
      scaled_values = scaler.fit_transform(diff_df)

      So, with this, we can use as many variables as we want. But now I have a big doubt.

      When we transform our dataset into a supervised learning problem, we have a distribution in columns as shown in

      I mean, for a 2-variable dataset such as yours, we can set, for example, these values:


      so we will have a supervised dataset like this:

      var1(t-1) var2(t-1) var1(t) var2 (t) var1(t+1) var2 (t+1)

      so, if we want to train the ANN to forecast var2 (which is the target we want to predict) with the var1 as input and the previous values of var2 also as input, we have to separate them and here is where my doubt begins.

      In the part of the code:

      def fit_lstm(train, n_lag, n_seq, n_batch, nb_epoch, n_neurons):
      # reshape training into [samples, timesteps, features]
      X, y = train[:, 0:n_lag], train[:, n_lag:]
      X = X.reshape(X.shape[0], 1, X.shape[1])

      I think that if we want to define X, we should use:


      this means we are selecting this as X from the previous example:

      var1(t-1) var2(t-1)

      (number of lags*number of variables), so: X=train[:,0:1*2]=train[:,0:2]


      Y=train[:,n_lag*n_vars:] is the vector of ¿targets?

      the problem is that, on this way, we are selecting this as targets:

      var1(t) var2(t) var1(t+1) var2(t+1)

      so we are including var1 (which we don’t have the aim to forecast, just use as input).

      I would like to know if there is any solution to solve this in order to use the variable 1,2…n-1 just as input but not forecasting it.

      Hope this is clear :/

  5. jvr May 19, 2017 at 3:16 am #

    Thanks for the previous clarification. I have a doubt in relation to the section “fit network” in the code. I’m having some trouble trying to plot the training graph (validation vs training) in order to see whether or not the network is overfitted, but due to the “model.reset_states()” statement, I can only save the last loss and val_loss from the history object. Is there any way to solve this?

    thank you in advance 🙂

    • jvr May 19, 2017 at 3:45 am #

      I reply to myself, if someone is also interested.

      Just create 2 lists (or 1, but I find it clearer this way) and return them from the function. Then, outside, just plot them. I’m sorry for the question, maybe the answer is obvious, but I’m starting out with Python and I’m not a programmer.

      # fit network
      for i in range(nb_epoch):
          model.fit(X, y, epochs=1, batch_size=n_batch, shuffle=True, validation_split=val_split)

      return model,loss,val_loss

      # fit model
      model,loss,val_loss=fit_lstm(train, n_lag, n_seq, n_batch, n_epochs, n_neurons)

      pyplot.title('cross validation')
      pyplot.legend(['training', 'test'], loc='upper left')

    • Jason Brownlee May 19, 2017 at 8:22 am #

      History is returned when calling model.fit().

      We are only fitting one epoch at a time, so you can retrieve and accumulate performance each epoch in the epoch loop then do something with the data (save/graph/return it) at the end of the loop.

      Does that help?

      • jvr May 19, 2017 at 9:17 pm #

        It does help, thank you.

        Now I’m trying to find a way to make the training process faster and reduce RMSE, but it’s pretty difficult (the idea is to make results better than in the NARX model implemented in the Matlab Neural Toolbox, but results and computational time are hard to overcome).

        • Jason Brownlee May 20, 2017 at 5:37 am #

          LSTMs often need to be trained longer than you think and can greatly benefit from regularization.

  6. DJ June 2, 2017 at 1:42 am #


    Thanks for the great tutorial, I’m wondering if you can help me clarify the reason you have


    (line 83)
    when fitting the model, I was able to achieve similar results without the line as well.


  7. DJ June 2, 2017 at 4:11 pm #

    Thanks for the quick reply Jason :-). I’ve seen other places where the reset is done by using the callbacks parameter in model.fit():

    class ResetStatesCallback(Callback):
        def __init__(self):
            self.counter = 0

        def on_batch_begin(self, batch, logs={}):
            if self.counter % max_len == 0:
                self.model.reset_states()
            self.counter += 1

    Then the callback is used as follows:

    model.fit(X, y, epochs=1, batch_size=1, verbose=2,
              shuffle=False, callbacks=[ResetStatesCallback()])

    The ResetStatesCallback snippet was obtained from:

    Please let me know what you think.


    • Jason Brownlee June 3, 2017 at 7:21 am #

      Yes, there are many ways to implement the reset. Use what works best for your application.

  8. QQ June 2, 2017 at 5:00 pm #

    Hi Jason, great post, and I have some questions:

    1. In your fit_lstm function, you reset the state after each epoch. Why?
    2. Why do you iterate over the epochs yourself, instead of using model.fit(X, y, epochs)?

    thx Jason

    # fit an LSTM network to training data
    def fit_lstm(train, n_lag, n_seq, n_batch, nb_epoch, n_neurons):
        # reshape training into [samples, timesteps, features]
        X, y = train[:, 0:n_lag], train[:, n_lag:]
        X = X.reshape(X.shape[0], 1, X.shape[1])
        # design network
        model = Sequential()
        model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
        model.add(Dense(y.shape[1]))
        model.compile(loss='mean_squared_error', optimizer='adam')
        # fit network one epoch at a time, resetting state after each epoch
        for i in range(nb_epoch):
            model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
            model.reset_states()
        return model

    • Jason Brownlee June 3, 2017 at 7:23 am #

      The end of the epoch is the end of the sequence and the internal state should not carry over to the start of the sequence on the next epoch.

      I run the epochs manually to give fine grained control over when resets occur (by default they occur at the end of each batch).

  9. J June 7, 2017 at 12:48 am #

    I’d like to clarify line 99 in the LSTM example:

    plot_forecasts(series, forecasts, n_test+2)

    Is the n_test + 2 == n_test + n_lag - n_seq?


    • jvr June 15, 2017 at 11:49 pm #

      I’d also like to know why using n_test + 2

  10. Kao June 10, 2017 at 5:46 pm #

    Hi jason,
    When I applied your code to a 22-year daily time series, I found that the LSTM forecast result is similar to the persistence one, i.e. the red line is just a horizontal bar. I’m sure I did not mix up those two methods. I wonder what causes this?

    My key configuration is as follows:
    n_lag = 1
    n_seq = 3
    n_test = 365*3

    and my series length is 8035.

    • Jason Brownlee June 11, 2017 at 8:21 am #

      You will need to tune the model to your problem.

      • Kao June 25, 2017 at 6:55 pm #

        Thanks to your tutorial, I’ve been tuning the parameters such as numbers of epochs and neurons these days. However, I noticed that you mentioned the grid search method to get appropriate parameters, could you please explain how to implement it into LSTM? I’m confused about your examples on some other tutorial which has a model class, seems unfamiliar to me.

  11. MM June 13, 2017 at 6:44 am #


    Thank you for these tutorials. These are the best tutorials on the web. One question: what is the best way to forecast the last two values?

    Thank you

    • Jason Brownlee June 13, 2017 at 8:31 am #

      Thanks MM.

      No one can tell you the “best” way to do anything in applied machine learning, you must discover it through trial and error on your specific problem.

      • MM June 13, 2017 at 9:29 am #


        Understood. Let me re-phrase the question. In a practical application, one would be interested in forecasting the last data point, i.e. in the shampoo dataset, “3-12”. How would you suggest doing that?

        • Jason Brownlee June 14, 2017 at 8:41 am #

          Fit your model to all of the data then call predict() passing whatever lag inputs your model requires.

      • MM June 13, 2017 at 10:24 am #


        Should the line that starts the offset point in plot_forecasts() be

        off_s = len(series) - n_test + i + 1

        instead of

        off_s = len(series) - n_test + i - 1

  12. Michael June 21, 2017 at 4:03 am #

    Hi Jason,

    Thanks for your excellent tutorials!

    I have followed a couple of your articles about LSTM and did learn a lot, but here is a question in my mind: can I introduce some interference elements in the model? For example for shampoo sale problem, there may be some data about holiday sales, or sales data after an incident happens. If I want to make prediction for sales after those incidents, what can I do?

    What’s more, I noticed that you will parse date/time with a parser, but you did not really introduce time feature into the model. For example I want to make prediction for next Monday or next January, how can I feed time feature?


    • Jason Brownlee June 21, 2017 at 8:18 am #

      Yes, see this post for ideas on adding additional features:

      • Michael June 22, 2017 at 5:53 pm #

        Thanks for clarification.

        I have two more specific questions:
        1) In inverse_transform, why index = len(series) - n_test + i - 1?

        2) In fit_lstm, you said “reshape training into [samples, timesteps, features]”, but I think the code in line 74 is a little different from your format:

        73 X, y = train[:, 0:n_lag], train[:, n_lag:]
        74 X = X.reshape(X.shape[0], 1, X.shape[1])

        In line 74, I think it should be X = X.reshape(X.shape[0], X.shape[1], 1)

        • Jason Brownlee June 23, 2017 at 6:52 am #

          Hi Michael,

          Yes, the offset finds one step prior to the forecast in the original time series. I use this motif throughout the tutorial.

          In the very next line I say: “We will fix time steps at 1, so this change is straightforward.”

  13. Michael June 22, 2017 at 6:01 pm #

    Hi Jason,

    I would like to know how to do short term and long term prediction with minimum number of models?

    For example, I have a 12-step input and 12-step output model A, and a 12-step input and 1-step output model B, would model A gives better prediction for next first time step than model B?

    What’s more, if we have a 1-step input and 1-step output model, it is more error prone for long-term prediction.
    If we have a multi-step input and 1-step output model, it is still more error prone in the long term. So how should we regard long-term and short-term prediction?

    • Jason Brownlee June 23, 2017 at 6:53 am #

      I would recommend developing and evaluating each model for the different use cases. LSTMs are quite resistant to assumptions and rules of thumb, I find in practice.

  14. jzx June 25, 2017 at 1:17 pm #

    Hello, thanks for your tutorial
    If my prediction model is three time series a, b, c, I would like to use a, b, c to predict the future a, how can I build my LSTM model.
    thank you very much!

    • Jason Brownlee June 26, 2017 at 6:05 am #

      Each of a, b, and c would be input features. Remember, the shape or dimensions of input data is [samples, timesteps, features].

  15. Kedar June 26, 2017 at 6:03 pm #

    Does stationarizing data really help the LSTM? If so, what is the intuition behind that? I mean, I can understand that for ARIMA-like methods, but why for LSTMs?

    • Jason Brownlee June 27, 2017 at 8:27 am #

      Yes in my experience, namely because it is a simpler prediction problem.

      I would suggest trying a few different “views” of your sequence and see what is easiest to model / gets the best model skill.
