How to Develop LSTM Models for Time Series Forecasting

Long Short-Term Memory networks, or LSTMs for short, can be applied to time series forecasting.

There are many types of LSTM models that can be used for each specific type of time series forecasting problem.

In this tutorial, you will discover how to develop a suite of LSTM models for a range of standard time series forecasting problems.

The objective of this tutorial is to provide standalone examples of each model on each type of time series problem as a template that you can copy and adapt for your specific time series forecasting problem.

After completing this tutorial, you will know:

  • How to develop LSTM models for univariate time series forecasting.
  • How to develop LSTM models for multivariate time series forecasting.
  • How to develop LSTM models for multi-step time series forecasting.

This is a large and important post; you may want to bookmark it for future reference.


Let’s get started.

Tutorial Overview

In this tutorial, we will explore how to develop a suite of different types of LSTM models for time series forecasting.

The models are demonstrated on small contrived time series problems intended to give the flavor of the type of time series problem being addressed. The chosen configuration of the models is arbitrary and not optimized for each problem; that was not the goal.

This tutorial is divided into four parts; they are:

  1. Univariate LSTM Models
    1. Data Preparation
    2. Vanilla LSTM
    3. Stacked LSTM
    4. Bidirectional LSTM
    5. CNN LSTM
    6. ConvLSTM
  2. Multivariate LSTM Models
    1. Multiple Input Series
    2. Multiple Parallel Series
  3. Multi-Step LSTM Models
    1. Data Preparation
    2. Vector Output Model
    3. Encoder-Decoder Model
  4. Multivariate Multi-Step LSTM Models
    1. Multiple Input Multi-Step Output
    2. Multiple Parallel Input and Multi-Step Output

Univariate LSTM Models

LSTMs can be used to model univariate time series forecasting problems.

These are problems composed of a single series of observations, where a model must learn from the series of past observations to predict the next value in the sequence.

We will demonstrate a number of variations of the LSTM model for univariate time series forecasting.

This section is divided into six parts; they are:

  1. Data Preparation
  2. Vanilla LSTM
  3. Stacked LSTM
  4. Bidirectional LSTM
  5. CNN LSTM
  6. ConvLSTM

Each of these models is demonstrated for one-step univariate time series forecasting, but can easily be adapted and used as the input part of a model for other types of time series forecasting problems.

Data Preparation

Before a univariate series can be modeled, it must be prepared.

The LSTM model will learn a function that maps a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple examples from which the LSTM can learn.

Consider a given univariate sequence:

We can divide the sequence into multiple input/output patterns called samples, where three time steps are used as input and one time step is used as output for the one-step prediction that is being learned.

The split_sequence() function below implements this behavior and will split a given univariate sequence into multiple samples where each sample has a specified number of time steps and the output is a single time step.

We can demonstrate this function on our small contrived dataset above.

The complete example is listed below.

Running the example splits the univariate series into six samples where each sample has three input time steps and one output time step.
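
A minimal sketch of such a function and its use follows. The series values 10 through 90 are an assumption: they are chosen because they yield six samples with three input steps and agree with the expected next value of 100 mentioned later in the tutorial.

```python
from numpy import array

def split_sequence(sequence, n_steps):
    """Split a univariate sequence into (input window, next value) samples."""
    X, y = [], []
    for i in range(len(sequence)):
        end_ix = i + n_steps  # index just past the end of this input window
        if end_ix > len(sequence) - 1:
            break  # no observation left to use as the output
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix])
    return array(X), array(y)

# assumed contrived series (illustrative values)
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_steps = 3

X, y = split_sequence(raw_seq, n_steps)
for inputs, output in zip(X, y):
    print(inputs, output)
```

Each printed row pairs one three-step input window with the single value that follows it.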

Now that we know how to prepare a univariate series for modeling, let’s look at developing LSTM models that can learn the mapping of inputs to outputs, starting with a Vanilla LSTM.

Vanilla LSTM

A Vanilla LSTM is an LSTM model that has a single hidden layer of LSTM units, and an output layer used to make a prediction.

We can define a Vanilla LSTM for univariate time series forecasting as follows.

Key in the definition is the shape of the input; that is what the model expects as input for each sample in terms of the number of time steps and the number of features.

We are working with a univariate series, so the number of features is one, for one variable.

The number of time steps as input is the number we chose when preparing our dataset as an argument to the split_sequence() function.

The shape of the input for each sample is specified in the input_shape argument on the definition of the first hidden layer.

We almost always have multiple samples; therefore, the model will expect the input component of the training data to have the shape [samples, timesteps, features].

Our split_sequence() function in the previous section outputs X with the shape [samples, timesteps], so we can easily reshape it to have an additional dimension for the one feature.
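
The reshape can be sketched as follows. The sample values are illustrative assumptions; only the change of shape from [samples, timesteps] to [samples, timesteps, features] matters here.

```python
from numpy import array

# six samples of three time steps each: shape [samples, timesteps]
X = array([
    [10, 20, 30],
    [20, 30, 40],
    [30, 40, 50],
    [40, 50, 60],
    [50, 60, 70],
    [60, 70, 80],
])

# add a trailing dimension for the single feature
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
print(X.shape)
```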

In this case, we define a model with 50 LSTM units in the hidden layer and an output layer that predicts a single numerical value.

The model is fit using the efficient Adam version of stochastic gradient descent and optimized using the mean squared error, or 'mse', loss function.
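
One way such a definition might look in Keras is sketched below. It assumes the TensorFlow-bundled Keras API (tensorflow.keras) and leaves the LSTM activation at its default, since the text does not name one; only the 50 units, the single output, the Adam optimizer, and the 'mse' loss come from the description above.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_steps, n_features = 3, 1  # three input time steps, one variable

model = Sequential()
# single hidden LSTM layer; input_shape is (timesteps, features) per sample
model.add(LSTM(50, input_shape=(n_steps, n_features)))
# output layer predicting one numerical value
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.summary()
```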

Once the model is defined, we can fit it on the training dataset.

After the model is fit, we can use it to make a prediction.

We can predict the next value in the sequence by providing the input:

And expecting the model to predict something like:

The model expects the input shape to be three-dimensional with [samples, timesteps, features], therefore, we must reshape the single input sample before making the prediction.
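
As a small sketch, a single input sample can be reshaped like this. The values 70, 80, 90 are an assumption, picked to be consistent with the expected prediction of 100 mentioned later.

```python
from numpy import array

n_steps, n_features = 3, 1

# one sample holding the last three observations (assumed values)
x_input = array([70, 80, 90])
# reshape to [samples, timesteps, features] = [1, 3, 1]
x_input = x_input.reshape((1, n_steps, n_features))
print(x_input.shape)
```

The reshaped array is what would be passed to the model's predict() function.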

We can tie all of this together and demonstrate how to develop a Vanilla LSTM for univariate time series forecasting and make a single prediction.

Running the example prepares the data, fits the model, and makes a prediction.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that the model predicts the next value in the sequence.

Stacked LSTM

Multiple hidden LSTM layers can be stacked one on top of another in what is referred to as a Stacked LSTM model.

An LSTM layer requires a three-dimensional input, but by default an LSTM produces a two-dimensional output: an interpretation from the end of the sequence. That output cannot be passed directly to a second LSTM layer.

We can address this by having the LSTM output a value for each time step in the input data by setting the return_sequences=True argument on the layer. This gives us 3D output from one hidden LSTM layer as input to the next.

We can therefore define a Stacked LSTM as follows.
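
A possible definition is sketched below, assuming the tensorflow.keras API. The choice of two layers with 50 units each is an illustrative assumption; the essential detail is return_sequences=True on every LSTM layer except the last.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_steps, n_features = 3, 1

model = Sequential()
# first LSTM returns one output per time step: 3D output for the next LSTM
model.add(LSTM(50, return_sequences=True, input_shape=(n_steps, n_features)))
# last LSTM returns only its final output: 2D output for the Dense layer
model.add(LSTM(50))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.summary()
```

The printed summary shows the first layer producing a three-dimensional output and the second a two-dimensional one.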

We can tie this together; the complete code example is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example predicts the next value in the sequence, which we expect would be 100.

Bidirectional LSTM

On some sequence prediction problems, it can be beneficial to allow the LSTM model to learn the input sequence both forward and backwards and concatenate both interpretations.

This is called a Bidirectional LSTM.

We can implement a Bidirectional LSTM for univariate time series forecasting by wrapping the first hidden layer in a wrapper layer called Bidirectional.

An example of defining a Bidirectional LSTM to read input both forward and backward is as follows.

The complete example of the Bidirectional LSTM for univariate time series forecasting is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example predicts the next value in the sequence, which we expect would be 100.

CNN LSTM

A convolutional neural network, or CNN for short, is a type of neural network developed for working with two-dimensional image data.

Although developed for images, the CNN can also be very effective at automatically extracting and learning features from one-dimensional sequence data such as univariate time series data.

A CNN model can be used in a hybrid model with an LSTM backend where the CNN is used to interpret subsequences of input that together are provided as a sequence to an LSTM model to interpret. This hybrid model is called a CNN-LSTM.

The first step is to split the input sequences into subsequences that can be processed by the CNN model. For example, we can first split our univariate time series data into input/output samples with four steps as input and one as output. Each sample can then be split into two sub-samples, each with two time steps. The CNN can interpret each subsequence of two time steps and provide a time series of interpretations of the subsequences to the LSTM model to process as input.

We can parameterize this and define the number of subsequences as n_seq and the number of time steps per subsequence as n_steps. The input data can then be reshaped to have the required structure:

For example:
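
The sketch below assumes the required structure is [samples, subsequences, timesteps, features], which is the layout implied by n_seq and n_steps as described above. The series values are also an assumption.

```python
from numpy import array

# assumed contrived series (illustrative values)
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# build input windows of four time steps; the value after each window is the output
n_steps_in = 4
X = array([raw_seq[i:i + n_steps_in] for i in range(len(raw_seq) - n_steps_in)])

# split each four-step sample into two subsequences of two steps each
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
print(X.shape)
```

After the reshape, the first sample holds the subsequences [10, 20] and [30, 40].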

We want to reuse the same CNN model when reading in each sub-sequence of data separately.

This can be achieved by wrapping the entire CNN model in a TimeDistributed wrapper that will apply the entire model once per input, in this case, once per input subsequence.

The CNN model first has a convolutional layer for reading across the subsequence that requires a number of filters and a kernel size to be specified. The number of filters is the number of reads or interpretations of the input sequence. The kernel size is the number of time steps included in each ‘read’ operation of the input sequence.

The convolution layer is followed by a max pooling layer that distills the filter maps down to half of their size, keeping the most salient features. These structures are then flattened down to a single one-dimensional vector to be used as a single input time step to the LSTM layer.

Next, we can define the LSTM part of the model that interprets the CNN model’s read of the input sequence and makes a prediction.

We can tie all of this together; the complete example of a CNN-LSTM model for univariate time series forecasting is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example predicts the next value in the sequence, which we expect would be 100.

ConvLSTM

A type of LSTM related to the CNN-LSTM is the ConvLSTM, where the convolutional reading of input is built directly into each LSTM unit.

The ConvLSTM was developed for reading two-dimensional spatial-temporal data, but can be adapted for use with univariate time series forecasting.

The layer expects input as a sequence of two-dimensional images; therefore, the shape of the input data must be [samples, timesteps, rows, columns, features].

For our purposes, we can split each sample into subsequences where timesteps will become the number of subsequences, or n_seq, and columns will be the number of time steps for each subsequence, or n_steps. The number of rows is fixed at 1 as we are working with one-dimensional data.

We can now reshape the prepared samples into the required structure.
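
A sketch of this reshape follows, assuming four-step input samples split into two subsequences of two steps and the layout [samples, timesteps, rows, columns, features]. The series values are illustrative assumptions.

```python
from numpy import array

# assumed contrived series (illustrative values)
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# input windows of four time steps each: shape [samples, 4]
n_steps_in = 4
X = array([raw_seq[i:i + n_steps_in] for i in range(len(raw_seq) - n_steps_in)])

n_features = 1
n_seq = 2    # becomes the "timesteps" dimension
n_steps = 2  # becomes the "columns" dimension
n_rows = 1   # fixed at 1 for one-dimensional data
X = X.reshape((X.shape[0], n_seq, n_rows, n_steps, n_features))
print(X.shape)
```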

We can define the ConvLSTM as a single layer in terms of the number of filters and a two-dimensional kernel size in terms of (rows, columns). As we are working with a one-dimensional series, the number of rows is always fixed to 1 in the kernel.

The output of the model must then be flattened before it can be interpreted and a prediction made.

The complete example of a ConvLSTM for one-step univariate time series forecasting is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example predicts the next value in the sequence, which we expect would be 100.

Now that we have looked at LSTM models for univariate data, let’s turn our attention to multivariate data.

Multivariate LSTM Models

Multivariate time series data means data where there is more than one observation for each time step.

There are two main models that we may require with multivariate time series data; they are:

  1. Multiple Input Series
  2. Multiple Parallel Series

Let’s take a look at each in turn.

Multiple Input Series

A problem may have two or more parallel input time series and an output time series that is dependent on the input time series.

The input time series are parallel because each series has an observation at the same time steps.

We can demonstrate this with a simple example of two parallel input time series where the output series is the simple addition of the input series.

We can reshape these three arrays of data as a single dataset where each row is a time step, and each column is a separate time series. This is a standard way of storing parallel time series in a CSV file.

The complete example is listed below.

Running the example prints the dataset with one row per time step and one column for each of the two input and one output parallel time series.
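
A sketch of building such a dataset with NumPy is shown below. The input values (10 to 90 and 15 to 95) are assumptions, chosen to agree with the output value 65 at the third time step and the expected prediction of 100 + 105 used later in this section.

```python
from numpy import array, hstack

# two assumed parallel input series and an output series that is their sum
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2

# convert each series to a column: shape [rows, 1]
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))

# stack the columns side by side: one row per time step
dataset = hstack((in_seq1, in_seq2, out_seq))
print(dataset)
```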

As with the univariate time series, we must structure these data into samples with input and output elements.

An LSTM model needs sufficient context to learn a mapping from an input sequence to an output value. LSTMs can support parallel input time series as separate variables or features. Therefore, we need to split the data into samples maintaining the order of observations across the two input sequences.

If we chose three input time steps, then the first sample would look as follows:

Input:

Output:

That is, the first three time steps of each parallel series are provided as input to the model and the model associates this with the value in the output series at the third time step, in this case, 65.

We can see that, in transforming the time series into input/output samples to train the model, we will have to discard some values from the output time series where we do not have values in the input time series at prior time steps. In turn, the choice of the number of input time steps will have an important effect on how much of the training data is used.

We can define a function named split_sequences() that will take a dataset as we have defined it with rows for time steps and columns for parallel series and return input/output samples.

We can test this function on our dataset using three time steps for each input time series as input.

The complete example is listed below.

Running the example first prints the shape of the X and y components.

We can see that the X component has a three-dimensional structure.

The first dimension is the number of samples, in this case 7. The second dimension is the number of time steps per sample, in this case 3, the value specified to the function. Finally, the last dimension specifies the number of parallel time series or the number of variables, in this case 2 for the two parallel series.

This is the exact three-dimensional structure expected by an LSTM as input. The data is ready to use without further reshaping.

We can then see that the input and output for each sample are printed, showing the three time steps for each of the two input series and the associated output for each sample.
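
A minimal sketch of a split_sequences() function with this behavior follows. The dataset values are assumptions consistent with the figures quoted above (seven samples, first output 65).

```python
from numpy import array

def split_sequences(sequences, n_steps):
    """Split rows=time steps, cols=series into samples; last column is the output."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps  # index just past the end of this window
        if end_ix > len(sequences):
            break
        # inputs: all columns but the last; output: last column at the final input step
        X.append(sequences[i:end_ix, :-1])
        y.append(sequences[end_ix - 1, -1])
    return array(X), array(y)

# assumed dataset: two input series and their sum as the output series
dataset = array([[a, a + 5, 2 * a + 5] for a in range(10, 100, 10)])

X, y = split_sequences(dataset, 3)
print(X.shape, y.shape)
for inputs, output in zip(X, y):
    print(inputs, output)
```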

We are now ready to fit an LSTM model on this data.

Any of the varieties of LSTMs in the previous section can be used, such as a Vanilla, Stacked, Bidirectional, CNN, or ConvLSTM model.

We will use a Vanilla LSTM where the number of time steps and parallel series (features) are specified for the input layer via the input_shape argument.

When making a prediction, the model expects three time steps for two input time series.

We can predict the next value in the output series providing the input values of:

The shape of the one sample with three time steps and two variables must be [1, 3, 2].

We would expect the next value in the sequence to be 100 + 105, or 205.

The complete example is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example prepares the data, fits the model, and makes a prediction.

Multiple Parallel Series

An alternate time series problem is the case where there are multiple parallel time series and a value must be predicted for each.

For example, given the data from the previous section:

We may want to predict the value for each of the three time series for the next time step.

This might be referred to as multivariate forecasting.

Again, the data must be split into input/output samples in order to train a model.

The first sample of this dataset would be:

Input:

Output:

The split_sequences() function below will split multiple parallel time series with rows for time steps and one series per column into the required input/output shape.

We can demonstrate this on the contrived problem; the complete example is listed below.

Running the example first prints the shape of the prepared X and y components.

The shape of X is three-dimensional, including the number of samples (6), the number of time steps chosen per sample (3), and the number of parallel time series or features (3).

The shape of y is two-dimensional, as we might expect, for the number of samples (6) and the number of variables per sample to be predicted (3).

The data is ready to use in an LSTM model that expects three-dimensional input and two-dimensional output shapes for the X and y components of each sample.

Then, each of the samples is printed showing the input and output components of each sample.
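
A sketch of this variant of split_sequences() is given below. The dataset values are assumptions carried over from the multiple input series example.

```python
from numpy import array

def split_sequences(sequences, n_steps):
    """Inputs are n_steps rows of all series; output is the next row of all series."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps
        if end_ix > len(sequences) - 1:
            break  # no following row left to use as the output
        X.append(sequences[i:end_ix, :])
        y.append(sequences[end_ix, :])
    return array(X), array(y)

# assumed dataset: three parallel series, one per column
dataset = array([[a, a + 5, 2 * a + 5] for a in range(10, 100, 10)])

X, y = split_sequences(dataset, 3)
print(X.shape, y.shape)
for inputs, output in zip(X, y):
    print(inputs, output)
```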

We are now ready to fit an LSTM model on this data.

Any of the varieties of LSTMs in the previous section can be used, such as a Vanilla, Stacked, Bidirectional, CNN, or ConvLSTM model.

We will use a Stacked LSTM where the number of time steps and parallel series (features) are specified for the input layer via the input_shape argument. The number of parallel series is also used in the specification of the number of values to predict by the model in the output layer; again, this is three.

We can predict the next value in each of the three parallel series by providing an input of three time steps for each series.

The shape of the input for making a single prediction must be 1 sample, 3 time steps, and 3 features, or [1, 3, 3].

We would expect the vector output to be:

We can tie all of this together and demonstrate a Stacked LSTM for multivariate output time series forecasting below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example prepares the data, fits the model, and makes a prediction.

Multi-Step LSTM Models

A time series forecasting problem that requires a prediction of multiple time steps into the future can be referred to as multi-step time series forecasting.

Specifically, these are problems where the forecast horizon or interval is more than one time step.

There are two main types of LSTM models that can be used for multi-step forecasting; they are:

  1. Vector Output Model
  2. Encoder-Decoder Model

Before we look at these models, let’s first look at the preparation of data for multi-step forecasting.

Data Preparation

As with one-step forecasting, a time series used for multi-step time series forecasting must be split into samples with input and output components.

Both the input and output components will be comprised of multiple time steps and may or may not have the same number of steps.

For example, given the univariate time series:

We could use the last three time steps as input and forecast the next two time steps.

The first sample would look as follows:

Input:

Output:

The split_sequence() function below implements this behavior and will split a given univariate time series into samples with a specified number of input and output time steps.

We can demonstrate this function on the small contrived dataset.

The complete example is listed below.

Running the example splits the univariate series into input and output time steps and prints the input and output components of each.
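
A minimal sketch of a multi-step split_sequence() follows, assuming the same contrived series of 10 through 90 with three input steps and two output steps.

```python
from numpy import array

def split_sequence(sequence, n_steps_in, n_steps_out):
    """Split a univariate sequence into multi-step input and output windows."""
    X, y = [], []
    for i in range(len(sequence)):
        end_ix = i + n_steps_in            # end of the input window
        out_end_ix = end_ix + n_steps_out  # end of the output window
        if out_end_ix > len(sequence):
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix:out_end_ix])
    return array(X), array(y)

# assumed contrived series (illustrative values)
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_steps_in, n_steps_out = 3, 2

X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
for inputs, outputs in zip(X, y):
    print(inputs, outputs)
```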

Now that we know how to prepare data for multi-step forecasting, let’s look at some LSTM models that can learn this mapping.

Vector Output Model

Like other types of neural network models, the LSTM can output a vector directly that can be interpreted as a multi-step forecast.

This approach was seen in the previous section, where one time step of each output time series was forecast as a vector.

As with the LSTMs for univariate data in a prior section, the prepared samples must first be reshaped. The LSTM expects data to have a three-dimensional structure of [samples, timesteps, features], and in this case, we only have one feature so the reshape is straightforward.

With the number of input and output steps specified in the n_steps_in and n_steps_out variables, we can define a multi-step time-series forecasting model.

Any of the presented LSTM model types could be used, such as Vanilla, Stacked, Bidirectional, CNN-LSTM, or ConvLSTM. Here we define a Stacked LSTM for multi-step forecasting.

The model can make a prediction for a single sample. We can predict the next two steps beyond the end of the dataset by providing the input:

We would expect the predicted output to be:

As expected by the model, the shape of the single sample of input data when making the prediction must be [1, 3, 1] for the 1 sample, 3 time steps of the input, and the single feature.

Tying all of this together, the Stacked LSTM for multi-step forecasting with a univariate time series is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example forecasts and prints the next two time steps in the sequence.

Encoder-Decoder Model

A model specifically developed for forecasting variable length output sequences is called the Encoder-Decoder LSTM.

The model was designed for prediction problems where there are both input and output sequences, so-called sequence-to-sequence, or seq2seq problems, such as translating text from one language to another.

This model can be used for multi-step time series forecasting.

As its name suggests, the model is comprised of two sub-models: the encoder and the decoder.

The encoder is a model responsible for reading and interpreting the input sequence. The output of the encoder is a fixed length vector that represents the model’s interpretation of the sequence. The encoder is traditionally a Vanilla LSTM model, although other encoder models can be used such as Stacked, Bidirectional, and CNN models.

The decoder uses the output of the encoder as an input.

First, the fixed-length output of the encoder is repeated, once for each required time step in the output sequence.

This sequence is then provided to an LSTM decoder model. The decoder must output a value for each time step in the output sequence, and each of these values can be interpreted by a single output model.

We can use the same output layer or layers to make each one-step prediction in the output sequence. This can be achieved by wrapping the output part of the model in a TimeDistributed wrapper.

The full definition for an Encoder-Decoder model for multi-step time series forecasting is listed below.
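
One way to express this architecture in Keras is sketched below, assuming the tensorflow.keras API. The layer sizes (100 units) are illustrative assumptions; the structure of encoder, RepeatVector, decoder, and TimeDistributed output follows the description above.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

n_steps_in, n_steps_out, n_features = 3, 2, 1

model = Sequential()
# encoder: reads the input sequence and outputs a fixed-length vector
model.add(LSTM(100, input_shape=(n_steps_in, n_features)))
# repeat the encoded vector once per required output time step
model.add(RepeatVector(n_steps_out))
# decoder: outputs a value for each time step of the output sequence
model.add(LSTM(100, return_sequences=True))
# the same Dense output layer is applied to every output time step
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
model.summary()
```

The model output has the shape [samples, n_steps_out, 1], one predicted value per output time step.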

As with other LSTM models, the input data must be reshaped into the expected three-dimensional shape of [samples, timesteps, features].

In the case of the Encoder-Decoder model, the output, or y part, of the training dataset must also have this shape. This is because the model will predict a given number of time steps with a given number of features for each input sample.
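
The reshaping of both X and y can be sketched as follows. The sample values are assumptions based on a contrived series of 10 through 90 with three input steps and two output steps.

```python
from numpy import array

# assumed prepared samples: shapes [samples, timesteps]
X = array([[10, 20, 30], [20, 30, 40], [30, 40, 50], [40, 50, 60], [50, 60, 70]])
y = array([[40, 50], [50, 60], [60, 70], [70, 80], [80, 90]])

n_features = 1
# both input and output become [samples, timesteps, features]
X = X.reshape((X.shape[0], X.shape[1], n_features))
y = y.reshape((y.shape[0], y.shape[1], n_features))
print(X.shape, y.shape)
```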

The complete example of an Encoder-Decoder LSTM for multi-step time series forecasting is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example forecasts and prints the next two time steps in the sequence.

Multivariate Multi-Step LSTM Models

In the previous sections, we have looked at univariate, multivariate, and multi-step time series forecasting.

It is possible to mix and match the different types of LSTM models presented so far for the different problems. This too applies to time series forecasting problems that involve multivariate and multi-step forecasting, but it may be a little more challenging.

In this section, we will provide short examples of data preparation and modeling for multivariate multi-step time series forecasting as a template to ease this challenge, specifically:

  1. Multiple Input Multi-Step Output
  2. Multiple Parallel Input and Multi-Step Output

Perhaps the biggest stumbling block is in the preparation of data, so this is where we will focus our attention.

Multiple Input Multi-Step Output

There are those multivariate time series forecasting problems where the output series is separate but dependent upon the input time series, and multiple time steps are required for the output series.

For example, consider our multivariate time series from a prior section:

We may use three prior time steps of each of the two input time series to predict two time steps of the output time series.

Input:

Output:

The split_sequences() function below implements this behavior.

We can demonstrate this on our contrived dataset.

The complete example is listed below.

Running the example first prints the shape of the prepared training data.

We can see that the shape of the input portion of the samples is three-dimensional, composed of six samples, with three time steps, and two variables for the two input time series.

The output portion of the samples is two-dimensional for the six samples and the two time steps for each sample to be predicted.

The prepared samples are then printed to confirm that the data was prepared as we specified.
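
A sketch of a split_sequences() function for this framing is given below. It assumes the output window starts at the same time step as the last input row, which is the arrangement that yields the six samples described above; the dataset values are also assumptions.

```python
from numpy import array

def split_sequences(sequences, n_steps_in, n_steps_out):
    """Inputs: n_steps_in rows of the input columns; outputs: n_steps_out values
    of the last column, starting at the final input time step."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :-1])
        y.append(sequences[end_ix - 1:out_end_ix, -1])
    return array(X), array(y)

# assumed dataset: two input series and their sum as the output series
dataset = array([[a, a + 5, 2 * a + 5] for a in range(10, 100, 10)])

X, y = split_sequences(dataset, 3, 2)
print(X.shape, y.shape)
for inputs, outputs in zip(X, y):
    print(inputs, outputs)
```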

We can now develop an LSTM model for multi-step predictions.

A vector output or an encoder-decoder model could be used. In this case, we will demonstrate a vector output with a Stacked LSTM.

The complete example is listed below.

Running the example fits the model and predicts the next two time steps of the output sequence beyond the dataset.

We would expect the next two steps to be: [185, 205]

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

It is a challenging framing of the problem with very little data, and the arbitrarily configured version of the model gets close.

Multiple Parallel Input and Multi-Step Output

A problem with parallel time series may require the prediction of multiple time steps of each time series.

For example, consider our multivariate time series from a prior section:

We may use the last three time steps from each of the three time series as input to the model and predict the next two time steps of each of the three time series as output.

The first sample in the training dataset would be the following.

Input:

Output:

The split_sequences() function below implements this behavior.

We can demonstrate this function on the small contrived dataset.

The complete example is listed below.

Running the example first prints the shape of the prepared training dataset.

We can see that both the input (X) and output (y) elements of the dataset are three-dimensional for the number of samples, time steps, and variables or parallel time series, respectively.

The input and output elements of each sample are then printed side by side so that we can confirm that the data was prepared as we expected.
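
A sketch of a split_sequences() function for this framing follows, assuming three input time steps, two output time steps, and the same assumed dataset values as in the earlier sections.

```python
from numpy import array

def split_sequences(sequences, n_steps_in, n_steps_out):
    """Inputs: n_steps_in rows of all series; outputs: the next n_steps_out rows."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :])
        y.append(sequences[end_ix:out_end_ix, :])
    return array(X), array(y)

# assumed dataset: three parallel series, one per column
dataset = array([[a, a + 5, 2 * a + 5] for a in range(10, 100, 10)])

X, y = split_sequences(dataset, 3, 2)
print(X.shape, y.shape)
for inputs, outputs in zip(X, y):
    print(inputs, outputs)
```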

We can use either the Vector Output or Encoder-Decoder LSTM to model this problem. In this case, we will use the Encoder-Decoder model.

The complete example is listed below.

Running the example fits the model and predicts the values for each of the three time series for the next two time steps beyond the end of the dataset.

We would expect the values for these series and time steps to be as follows:

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that the model forecast gets reasonably close to the expected values.

Summary

In this tutorial, you discovered how to develop a suite of LSTM models for a range of standard time series forecasting problems.

Specifically, you learned:

  • How to develop LSTM models for univariate time series forecasting.
  • How to develop LSTM models for multivariate time series forecasting.
  • How to develop LSTM models for multi-step time series forecasting.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

978 Responses to How to Develop LSTM Models for Time Series Forecasting

  1. Jenna Ma, November 16, 2018 at 12:09 am

    This tutorial is so helpful to me. Thank you very much!
    It will be more helpful in the real projects if the dataset is split into batches. Hope you will mention this in the future.

    • Jason Brownlee, November 16, 2018 at 6:16 am

      Keras will split the dataset into batches.

      • Jenna Ma, November 16, 2018 at 7:27 pm

        I think this blog post (https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/) may answer my question. I will do more research. Thanks a lot.

      • maria, October 8, 2019 at 8:52 pm

        Hi!

        I would like to cite your book “Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python.” Is there an appropriate format for doing this?

      • H, October 31, 2019 at 2:31 am

        Hi Jason,
        I would like an example of sliding window-based support vector regression for prediction, please.
        Do you have such an example?

        Thanks a lot

        • Jason Brownlee, October 31, 2019 at 5:35 am

          Thanks for the suggestion.

          • Sruthi, March 29, 2021 at 11:14 pm

            Hi Jason, It was a great tutorial
            I have a question :

            In Multiple Parallel inputs, the output of the LSTM Encoder-Decoder model will be 3D. How do we transform it back to 2D? I am asking this because I have performed scaling on the data using MinMaxScaler() and it expects the input to be a 2D array.

            In order to compare the predicted values with the original values, I need to perform inverse scaling, but I am stuck at how to reshape the 3d input and output back to 2d without losing any data.

          • Avatar
            Jason Brownlee March 30, 2021 at 6:05 am #

            You might need to write custom code to collect values for each variable before inverting the scale.
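
One way to collect the values for each variable is to flatten the sample and time-step axes into rows, invert the scale on the resulting 2D array, and reshape back to 3D. The sketch below is a minimal illustration with NumPy only: the data, the helper names (scale_2d, invert_2d, invert_3d) and the hand-written min-max scaling are all assumptions standing in for a fitted scaler that only accepts 2D input.

```python
import numpy as np

# Hypothetical data: 4 samples, 2 time steps, 3 variables (samples, steps, features).
rng = np.random.default_rng(1)
original = rng.uniform(10.0, 200.0, size=(4, 2, 3))

# Per-variable minimum and range, computed on a 2D view with one column per variable.
flat = original.reshape(-1, original.shape[2])
col_min = flat.min(axis=0)
col_range = flat.max(axis=0) - col_min

def scale_2d(values_2d):
    """Min-max scale a 2D array column by column (stand-in for a fitted scaler)."""
    return (values_2d - col_min) / col_range

def invert_2d(scaled_2d):
    """Undo scale_2d on a 2D array."""
    return scaled_2d * col_range + col_min

def invert_3d(scaled_3d):
    """Flatten (samples, steps, features) to 2D, invert the scale, restore 3D."""
    n_samples, n_steps, n_features = scaled_3d.shape
    as_2d = scaled_3d.reshape(n_samples * n_steps, n_features)
    return invert_2d(as_2d).reshape(n_samples, n_steps, n_features)

scaled = scale_2d(flat).reshape(original.shape)  # what a model would train on
restored = invert_3d(scaled)
print(np.allclose(restored, original))
```

The reshape keeps each variable in its own column, so no values are lost or mixed between variables; the same flatten-then-restore step should apply before calling any scaler that expects 2D input.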

    • Avatar
      chi August 27, 2019 at 7:07 am #

      Hello Jason,

      Thank you so much for your post, it was super helpful. For the multiple time step output LSTM model, I am wondering what the difference in performance would be between model-1 and model-2. Model-1 is your multiple time step output LSTM model: for example, we input the last 7 days of data features, and the output is the next 5 days of prices. Model-2 is the simple 1-time-step output LSTM model, where the input is the last 7 days of data features and the output is the next day's price. Then we use our predicted price as the new input to predict future prices until we have predicted all of the next 5 days of prices.
      I am wondering what are the key differences between those 2 strategies to predict the next 5 days prices? What are the advantages and disadvantages of those 2 LSTM models?

      Thank you,
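
The second strategy in the question (often called a recursive forecast) can be sketched as a loop that feeds each prediction back in as the newest input. This is a simplified univariate illustration, not code from the tutorial: recursive_forecast and toy_one_step_model are hypothetical names, and the toy model stands in for a trained one-step LSTM.

```python
import numpy as np

def recursive_forecast(predict_one_step, history, n_steps_in, n_steps_out):
    """Forecast n_steps_out values by feeding each prediction back in as input."""
    window = list(history[-n_steps_in:])
    forecasts = []
    for _ in range(n_steps_out):
        yhat = float(predict_one_step(np.array(window)))
        forecasts.append(yhat)
        # Slide the window: drop the oldest value, append the prediction.
        window = window[1:] + [yhat]
    return np.array(forecasts)

# Hypothetical stand-in for a trained one-step model:
# the last value plus the average step seen in the window.
def toy_one_step_model(window):
    return window[-1] + np.mean(np.diff(window))

prices = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0])
forecast = recursive_forecast(toy_one_step_model, prices, n_steps_in=7, n_steps_out=5)
print(forecast)
```

Because later steps are predicted from earlier predictions rather than from observations, any error made at step 1 is carried into steps 2 to 5. A multi-step output model predicts all 5 values from observed data in one pass; which strategy works better is something to test on your own data.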

    • Avatar
      Rick March 28, 2020 at 5:45 pm #

      Hey Jason,
      Thanks for the blogs. They are really helpful and I have learned a lot from machinelearningmastery.
      This blog about LSTM is very informative, but I have a question

      I have a set of amplitude scans, and I want to predict next scan (many to one problem). So my data is of (6,590) and the result should be (1,590). 590 are the amplitude values in the scan.

      A. Is it possible to address this problem with LSTM and
      B. Even if possible how much accurate do you think the system might perform given the number of time steps and features it is predicting.

      Thanks

  2. Avatar
    Amy November 16, 2018 at 7:17 am #

    Thanks Jason for this good tutorial. I have a question. When we have two different time series, 1 and 2. Time series 1 will influence time series 2 and our goal is to predict the future value of time series 2. How can we use LSTM for this case?

  3. Avatar
    Kwan November 22, 2018 at 8:03 pm #

    Thanks Jason for this good tutorial, I have read your tutorial for a long time , I have a question. How to use LSTM model forecasting Multi-Site Multivariate Time Series, such as EMC Data Science Global Hackathon dataset, thank you very much!

  4. Avatar
    Caiyuan November 29, 2018 at 1:33 pm #

    Thank you for sharing. I found that the results of time series prediction using LSTM are similar to the results of one step behind the original sequence. What do you think?

    • Avatar
      Jason Brownlee November 29, 2018 at 2:40 pm #

      Sounds like the model has learned a persistence model and may not be skillful.
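
A quick way to check this is to score a persistence forecast (predict each value as the previous observation) and compare it with the model's score. The sketch below uses made-up numbers; model_forecast is a hypothetical stand-in for real model predictions that happen to lag the series by one step.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

series = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0])

actual = series[1:]        # values to be predicted
persistence = series[:-1]  # persistence forecast: the previous observation

# Hypothetical model output that is almost the previous observation.
model_forecast = series[:-1] + 0.5

baseline_rmse = rmse(actual, persistence)
model_rmse = rmse(actual, model_forecast)
print(baseline_rmse, model_rmse)
```

If the model's error is close to the persistence error, as it is here, the model is adding little beyond repeating the last value.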

      • Avatar
        Sudrit Saisa-ing August 9, 2019 at 8:48 pm #

        I have a question.
        If I have a trained LSTM model, I want to know the percentage accuracy of a new prediction.

        How can I find the percentage accuracy of a new forecast?

        Thank you

  5. Avatar
    WLF December 5, 2018 at 6:04 pm #

    Thanks a lot! I have read your websites for a long time!
    I have a question, in “Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras” you said that:
    “LSTMs are sensitive to the scale of the input data, specifically when the sigmoid (default) or tanh activation functions are used. It can be a good practice to rescale the data to the range of 0-to-1, also called normalizing. ”
    So why don’t you normalize input here?
    Because you used relu? Because the data is increasing (so we can’t normalize the future input)? Or because you just give us an example?
    Do you suggest normalizing here?

    • Avatar
      Jason Brownlee December 6, 2018 at 5:51 am #

      It would be a good idea to prepare the data with normalization or similar here.

      I chose not to because it seems to confuse more readers than it helps. Also, choice of relu does make the model a lot more robust to unscaled data.

  6. Avatar
    rkk621 December 6, 2018 at 2:27 am #

    Thanks for a great article. Minor typo or confusion:

    For the Multiple input case in Multivariate series, if we use three time steps and

    10,15
    20,25
    30,35

    as our inputs, shouldn’t the output (predicted val used for training) be

    85

    instead of 65?

    • Avatar
      Jason Brownlee December 6, 2018 at 5:57 am #

      In the chosen framing of the problem, we want to predict the output at t not t+1, given inputs up to and including t.

      You can choose to frame the problem differently if you like. It is arbitrary.

    • Avatar
      WLF December 6, 2018 at 11:02 pm #

      You can also reference ‘Multiple Parallel …’

      So you can find the differences in function ‘split_sequences’

      if you want to predict 85, you can change the code to:

      if end_ix > len(sequences)-1:
          break
      seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix, -1]

      Notice ‘len(sequences)-1’, and ‘sequences[end_ix, -1]’
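
Put together, the change described in this comment might look like the following self-contained sketch. The function name split_sequences_next_step is hypothetical (chosen to avoid confusion with the tutorial's own split_sequences), and the data follows the contrived series used in the tutorial.

```python
import numpy as np

def split_sequences_next_step(sequences, n_steps):
    """Split rows into (X, y), where y is the last column one step after the window."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps
        # Stop when there is no row left after the input window.
        if end_ix > len(sequences) - 1:
            break
        X.append(sequences[i:end_ix, :-1])
        y.append(sequences[end_ix, -1])
    return np.array(X), np.array(y)

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2
dataset = np.column_stack((in_seq1, in_seq2, out_seq))

X, y = split_sequences_next_step(dataset, n_steps=3)
print(X.shape, y.shape)
print(X[0], y[0])
```

With this framing the first sample pairs the inputs 10,15 / 20,25 / 30,35 with the target 85, and there is one sample fewer than with the "predict at t" framing.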

  7. Avatar
    Ida December 10, 2018 at 7:55 pm #

    Thanks sooooo much Jason.
    It helped me a lot.

  8. Avatar
    John December 12, 2018 at 9:21 am #

    Hi Jason,

    Thanks for this nice blog! I am new to LSTM in time-series, and I need your help.

    Most info on internet is for a single time series and for next-step forecasting. I want to produce 6 months ahead forecast using previous 15 months for 100 different time series, each of length 54 months.

    So, there are 34 windows for each time series if we use sliding windows, and my initial X_train has a shape of (3400, 15). Then I am reshaping my X_train to [samples, timesteps, features] as follows: (3400, 15, 1). Is this reshaping correct? In general, how can we choose the “timesteps” and “features” arguments in this multi-input multi-step forecast?

    Also, how can I choose “batch_size” and “units”? Since I want 6 months ahead forecast, my output should be a matrix with dimensions (100,6). I chose units=6, and batch_size=1. Are these numbers correct?

    Thanks for your help!

    • Avatar
      Jason Brownlee December 12, 2018 at 2:14 pm #

      Looks good.

      Time steps is really problem specific – e.g. how much history do you need to make a prediction. Perhaps test with your data.

      Batch size and units – again, depends on your problem. Test. 6 units is too few. Start with 100, try 500, 1000, etc. Batch size of 1 seems small, perhaps also try 32, 64, etc.

      Let me know how you go.
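
The shapes discussed above can be checked with a small windowing sketch. The data here is a made-up placeholder (one row per series, one column per month); only the sizes come from the question: 100 series of 54 months, 15 months in and 6 months out.

```python
import numpy as np

n_series, series_len = 100, 54
n_steps_in, n_steps_out = 15, 6

# Hypothetical data: one row per series, one column per month.
data = np.arange(n_series * series_len, dtype=float).reshape(n_series, series_len)

X, y = [], []
for series in data:
    # 54 - 15 - 6 + 1 = 34 windows per series.
    for start in range(series_len - n_steps_in - n_steps_out + 1):
        end_in = start + n_steps_in
        X.append(series[start:end_in])
        y.append(series[end_in:end_in + n_steps_out])

# [samples, timesteps, features] with a single feature per time step.
X = np.array(X).reshape(-1, n_steps_in, 1)
y = np.array(y)
print(X.shape, y.shape)
```

Note that the targets have one row per window, (3400, 6), during training; a (100, 6) forecast would come from predicting once per series using its final 15 months.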

      • Avatar
        John December 13, 2018 at 2:06 am #

        Hi Jason,

        Thanks for your response.

        I don’t understand “6 units is too few”. In documentation of lstm functions in R, units is defined as “dimensionality of the output space”. Since I need an output with 6 columns (6 months forecast), I define units=6. Any other number does not produce the output I want. Is there anything wrong in my interpretation?

        • Avatar
          Jason Brownlee December 13, 2018 at 7:55 am #

          I recommend using a Dense layer as the output rather than the outputting from the LSTM directly.

          Then dramatically increase the capacity of the model by increasing the number of LSTM units.

    • Avatar
      Ravi Varma Injeti December 18, 2019 at 12:55 am #

      Hi Jason, that’s a great tutorial. I have time series data of size 2245 containing the timings of a bus from the starting station to the destination station. I want to find the pattern. Is that possible with an LSTM without the categorical responses?

  9. Avatar
    Shaifali December 16, 2018 at 1:21 am #

    Bidirectional LSTM works better than LSTM. Can you please explain the working of bidirectional LSTM. Since we do not know future values. How do we do prediction?

  10. Avatar
    Jenna Ma December 16, 2018 at 4:24 pm #

    In the last encoder-decoder model, if I have different features of input and output, is it correct that I change the code like this?
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_steps_in, n_features_in)))
    model.add(RepeatVector(n_steps_out))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(n_features_out)))
    model.compile(optimizer='adam', loss='mse')

    • Avatar
      Jason Brownlee December 17, 2018 at 6:19 am #

      I’m not sure I understand, what do you mean exactly?

      • Avatar
        Jenna Ma December 18, 2018 at 1:50 pm #

        I am sorry for not expressing my question clearly.
        In the last part of your tutorial, you gave an example like this:
        [[10 15 25]
        [20 25 45]
        [30 35 65]]
        [[ 40 45 85]
        [ 50 55 105]]
        Then, you introduced the Encoder-Decoder LSTM to model this problem.
        If I want to use the last three time steps from each of the three time series as input to the model and predict the next two time steps of the third time series as output. Namely, my input and output elements are like the following. The shapes of input and output are (5, 3, 3) and (5, 2, 1) respectively.
        [[10 15 25]
        [20 25 45]
        [30 35 65]]
        [[85]
        [105]]
        When I define the Encoder-Decoder LSTM model, the code will be like this:
        model = Sequential()
        model.add(LSTM(200, activation='relu', input_shape=(3,3)))
        model.add(RepeatVector(2))
        model.add(LSTM(200, activation='relu', return_sequences=True))
        model.add(TimeDistributed(Dense(1)))
        model.compile(optimizer='adam', loss='mse')
        Is it correct?
        Thank you very much!

        • Avatar
          Jason Brownlee December 18, 2018 at 2:36 pm #

          It looks correct, but I don’t have the capacity to test the code to be sure.

          • Avatar
            Jenna Ma December 18, 2018 at 6:05 pm #

            Thank you!
            I test the code, and I want to show you what I got.
            I assume the input sequence:
            in_seq1 = np.arange(10,1000,10)
            in_seq2 = np.arange(15,1005,10)
            Define the prediction input:
            x_input = np.array([[960, 965, 1925], [970, 975, 1945], [980, 985, 1965]])
            I expect the output values would be as follows:
            [ [1985] [2005] ]
            And the model forecasts: [ [1997.1425] [2026.6136] ]
            I think this means that the model can work.

          • Avatar
            Jason Brownlee December 19, 2018 at 6:31 am #

            Nice work! Now you can start tuning the model to lift skill.

  11. Avatar
    dani December 19, 2018 at 12:55 pm #

    How can we test these examples if we have a big Excel data set of time series data? Could you kindly refer me to a link?

  12. Avatar
    mk December 20, 2018 at 7:13 pm #

    Can Multivariate time series apply to cnn-lstm model?

  13. Avatar
    Lionel December 21, 2018 at 6:16 pm #

    I want to predict visibility on one airport for the next 120 hours.
    I have already built an LSTM to predict the visibility for the next hour, solely based on visibility observations. (Basically, the network learned that persistence is a good algorithm.)

    My next step is to include a weather model forecast of say humidity as input.

    I have then as input:
    visibility observation on the airport (past and present)
    prediction of humidity for the next 120 hours.

    I have trouble combining these two pieces of information.
    Do you have suggestions?

    • Avatar
      Jason Brownlee December 22, 2018 at 6:03 am #

      What trouble are you having exactly?

      • Avatar
        Lionel December 22, 2018 at 7:21 pm #

        let’s say:
        Input : last 120 h of measured visibility
        weather forecast for the next 120 h

        Output: visibility prediction for the next 120 h

        Implementation:
        make visibility prediction every hour for the next 120 h

        I have trouble seeing how the LSTM will update its state every hour, since the only new information it gets is the measured visibility for the last hour, and nothing about the full 120 h prediction.

        I must say that I’m a newbie in ML.

        • Avatar
          Jason Brownlee December 23, 2018 at 6:04 am #

          The model is only aware of the data that you provide it.

  14. Avatar
    Potofski December 22, 2018 at 3:48 am #

    Thanks a lot for your post. Your work is a great resource on forecasts with lstm!

    Assume, I have dependent time series (heating costs and temperature) and I want to predict the dependent (heating costs), how could I implement temperature predictions (from other weather forecasts) into my model for heating cost predictions?

    Do you know of any common approaches to this? Or any papers on how to handle external forecasts for independent variables?

  15. Avatar
    Jenna Ma January 4, 2019 at 9:35 pm #

    Hi Jason,
    I think I saw you mentioning the activation function ‘relu’ usually works better than ‘tanh’ in LSTM model. But, I forget I saw this in which post. I don’t find any post from your blog that focuses on how to choose the activation function. So, I submit this question under this post and hope you don’t mind.
    Is it true that ‘relu’ often works better than ‘tanh’ in your experience? If you have any post talking about activation function, please give me the title or URL.
    Thank you very much!

    • Avatar
      Jason Brownlee January 5, 2019 at 6:55 am #

      It really depends on the dataset, I have found LSTMs with relu more robust on some problems.

  16. Avatar
    Jenna Ma January 7, 2019 at 12:50 am #

    Thank you! So, the way I can make sure which activation function is the best for my dataset is to enumerate and see the results?

  17. Avatar
    Matt January 9, 2019 at 3:21 am #

    This is awesome for someone starting out with LSTM.

    All the content on your site is amazing, I really appreciate it. Thank you.

  18. Avatar
    Andrew Jabbitt January 10, 2019 at 4:23 am #

    Hi Jason,

    Still lovin’ your work!

    1 question: can you please explain the purpose of the out_seq series in the Multiple Parallel Series example?

    Many thanks,
    Andrew

    • Avatar
      Jason Brownlee January 10, 2019 at 7:57 am #

      It is the output sequence, dependent upon the input sequences.

      • Avatar
        Andrei February 20, 2020 at 2:59 am #

        Correct me if I’m wrong, but isn’t the prediction the output? I mean, besides the way you obtained the out_seq sequence in the first place, it’s no different than in_seq1 or in_seq2. It could even be considered an engineered feature that expands the data.

        • Avatar
          Jason Brownlee February 20, 2020 at 6:19 am #

          Prediction is the output of the model.

          Perhaps I don’t follow your question?

  19. Avatar
    sophia January 22, 2019 at 8:29 am #

    another great article, Jason! I’m trying to get started on a project that is similar to the LSTM model described in this article: https://medium.com/bcggamma/using-deep-learning-to-predict-not-just-what-but-when-fae6515acb1b

    I’d greatly appreciate your input on how to develop an LSTM model that can predict ‘what’ a consumer may buy and ‘when’ they will buy it;

    Based on your article, it looks like the right model to choose would be Multiple Parallel Input and Multi-Step Output. Would you agree or do you think i should choose a different model? Any pointers or links to relevant articles would help!

    Thanks,

    • Avatar
      Jason Brownlee January 22, 2019 at 11:42 am #

      I’d encourage you to prototype and explore a suite of different framings of the problem in order to discover what works best for your specific dataset.

  20. Avatar
    Raman January 22, 2019 at 3:17 pm #

    I have used your code to get started, at the last step I am getting a below error-
    NameError: name ‘to_list’ is not defined

    Could you please help, I am not sure what am i missing here.

    Thanks for your help

  21. Avatar
    Raman January 23, 2019 at 4:09 pm #

    Hi Jason,

    Thanks for taking time out. I have copied your code line by line and checked it a couple of times as well. The example is from the Vanilla LSTM section.

    Checks done-
    I was getting some error, then I followed stack overflow and downgraded my keras to Version: 2.1.5
    I searched stack overflow and related questions and even posted my questions there.

    Your help is appreciated.

    • Avatar
      Jason Brownlee January 24, 2019 at 6:38 am #

      I recommend using the latest version of Keras and TensorFlow.

  22. Avatar
    Sarra January 30, 2019 at 3:02 am #

    Please, have you an example of LSTM encoder-decoder with the train / test-evaluation partitions.

    I tried but it does not work like this:

    # split into samples

    trainX, trainy = split_sequence(train, n_steps_in, n_steps_out)
    testX, testy = split_sequence(test, n_steps_in, n_steps_out)

    # reshape

    trainX = trainX.reshape((trainX.shape[0], trainX.shape[1], n_features))
    testX = testX.reshape((testX.shape[0], testX.shape[1], n_features))
    ….

    # fit model
    model.fit(trainX, trainy, epochs=5, verbose=2)

    # make predictions
    trainPredict = model.predict(trainX)
    testPredict = model.predict(testX)

    # calculate root mean squared error
    trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
    print('Train Score: %.2f RMSE' % (trainScore))
    testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
    print('Test Score: %.2f RMSE' % (testScore))

    thank you very much

    • Avatar
      Jason Brownlee January 30, 2019 at 8:14 am #

      I may, you can use the search box to look at all tutorials that use the encoder-decoder pattern.
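
When scoring a multi-step model on a test set, it can help to compute the error per forecast step so that the shapes of the targets and predictions are compared like for like. The sketch below is an illustration with made-up numbers rather than code from the tutorial; rmse_per_step is a hypothetical helper, and test_predict imitates the (samples, steps, 1) output shape of an encoder-decoder model.

```python
import numpy as np

def rmse_per_step(actual, predicted):
    """RMSE for each forecast step; inputs hold one row per sample."""
    actual = np.asarray(actual, dtype=float)
    # Encoder-decoder output is (samples, steps, 1); align it with the targets.
    predicted = np.asarray(predicted, dtype=float).reshape(actual.shape)
    return np.sqrt(np.mean((actual - predicted) ** 2, axis=0))

# Hypothetical test targets: 3 samples, 2 output steps each.
testy = np.array([[40.0, 50.0], [50.0, 60.0], [60.0, 70.0]])

# Hypothetical predictions with shape (3, 2, 1).
test_predict = np.array([[[41.0], [52.0]], [[51.0], [62.0]], [[61.0], [72.0]]])

scores = rmse_per_step(testy, test_predict)
for step, score in enumerate(scores, start=1):
    print('t+%d RMSE: %.2f' % (step, score))
```

The same function can be applied to the training targets and predictions to get the matching train scores.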

  23. Avatar
    Gunay February 6, 2019 at 2:43 am #

    Hi Jason,

    Thanks for this tutorial. I am quite new to the time series forecasting with LSTM. I have a question about the part “Multiple Parallel Input and Multi-Step Output”. The output data shape is (5,2,3). I mean the each instance on the output is not just a sequence, It is a sequence of sequence. And you have show the example there with Encoder and Decoder. I just want to implement one of the methods of Stacked or Bidirectional LSTM. But I am not sure which number I should put the Dense layer. For example, in the previous examples, the output shape is like (6,2) and It is obvious we should put 2 for the Dense layer. But I can not figure out the right thing for the Stacked LSTM. Do you have any example tutorial for this?

    Kind Regards,
    Gunay

    • Avatar
      Jason Brownlee February 6, 2019 at 7:51 am #

      With multi-step output, the number of nodes in the output layer must match the number of output time steps.

      With multivariate multi-step, a vanilla or bidirectional LSTM is not suited. You could force it, but you will need n x m nodes in the output for n time steps for m time series. The time steps of each series would be flattened in this structure. You must interpret each of the outputs as a specific time step for a specific series consistently during training and prediction.

      I don’t have an example, it is not an ideal approach.
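
The flattened structure described above can be sketched with NumPy reshapes. The sizes follow the (5, 2, 3) output shape mentioned in the question; the values and the unflatten helper are hypothetical.

```python
import numpy as np

n_samples, n_steps_out, n_series = 5, 2, 3

# Hypothetical targets with shape (samples, time steps, series).
y = np.arange(n_samples * n_steps_out * n_series).reshape(n_samples, n_steps_out, n_series)

# Flatten to n x m = 6 values per sample, one per node of a Dense output layer.
y_flat = y.reshape(n_samples, n_steps_out * n_series)

def unflatten(flat_output, n_steps, n_vars):
    """Interpret flat outputs as (samples, time steps, series) again."""
    return np.asarray(flat_output).reshape(-1, n_steps, n_vars)

restored = unflatten(y_flat, n_steps_out, n_series)
print(y_flat.shape, restored.shape)
```

With this layout, flat column k always means time step k // 3 of series k % 3, which is the consistent interpretation that has to be kept during both training and prediction.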

      • Avatar
        Gunay February 6, 2019 at 7:27 pm #

        Thank you!

  24. Avatar
    Gunay February 6, 2019 at 7:29 pm #

    Is there any alternative structure for this kind of problems except Encoder-Decoder?

    • Avatar
      Jason Brownlee February 7, 2019 at 6:37 am #

      Yes, the one I described. There may be others, it is good to brainstorm and prototype approaches.

  25. Avatar
    Tian February 10, 2019 at 5:29 pm #

    Thanks for your great tutorial. I just wonder should we avoid using bidirectional LSTM for time series data? Does it mean we use future data to train the past model parameters?

    • Avatar
      Jason Brownlee February 11, 2019 at 7:56 am #

      No, it means the model will process the input sequence forwards and backwards at the same time.

  26. Avatar
    Gunay February 15, 2019 at 8:56 am #

    Hi Jason,

    I faced one problem and just interesting maybe you did it before. I have the forecasting problem as like Multiple Input Multi-Step Output but a little bit different. Let’s just assume, my input(which are features dataset) and output (target we want to forecast) datasets have historic data. And I should forecast one week ahead for the target. But I have also the one week ahead forecasted input dataset(which is forecasted by another system). I should use both the historic input and one week ahead forecasted input to forecast one week ahead output. But I do not know how I should use that one week ahead forecasted input data during the learning process. Can you give me any hint?

  27. Avatar
    Anirban February 15, 2019 at 4:08 pm #

    What if we want to predict anything for the next 20 upcoming days! Here sequentially we have to predict for 20 days. How can we apply LSTM here?

  28. Avatar
    Aaron March 6, 2019 at 2:47 pm #

    HI Jason, thanks for all the tutorials. They are really helpful. I am looking to try and implement an LSTM that returns a sequence, and had read this tutorial – https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/

    One thing I am having trouble understanding is how to really shape the input data and get a sequence output using Tensorflow / Keras. I am looking to predict the sequence T – T+12 hours using T-1 – T-48 hours. So predicting the next 12 hours from the last 48 hours in 1 hour increments. Each hour of data has a dozen or so features for that time step. From what I have read of yours so far it seems as if each of the 48 previous time steps should be considered features of the time step T to predict a sequence for the next 12 hours. And so basically, from what I gather, I would end up with the input for Timestep T having 576 columns (48 time steps, each with 12 features) – I mean does that seem right? I am also a bit unsure of what particular model I should use… is it going to be a multi-step, multi-input network… just a bit confused on the jargon as well and maybe thats why I’m having trouble figuring out what I need to do.

    Looking at some of your books too, but not sure what might be the right one to help guide me through a problem like this.

    Thanks,
    Aaron

      • Avatar
        Aaron March 6, 2019 at 11:59 pm #

        Thanks! That definitely makes sense now from the input shape standpoint. If I have 20 samples with 48 timesteps and 12 features the input shape would be [20, 48, 12]

        For the output however, looking through the Keras docs https://keras.io/layers/recurrent/, I am trying to get a return sequence. Would I be using a 3D tensor? (batch_size, timesteps, units) where it would look like (20, 12, 1)? Since I am trying to find 1 value at each of the 12 time steps for the sample size of 20

        Thanks again!
        Aaron

        • Avatar
          Jason Brownlee March 7, 2019 at 6:52 am #

          I don’t recommend returning a sequence from the LSTM itself, instead use an encoder-decoder model:
          https://machinelearningmastery.com/start-here/#lstm

          • Avatar
            Aaron March 7, 2019 at 10:05 am #

            Why don’t you recommend returning a sequence from the LSTM? If I was using the below encoder-decoder model from another one of your posts, what would the output of the first LSTM be?

            model = Sequential()
            model.add(LSTM(…, input_shape=(…)))
            model.add(RepeatVector(…))
            model.add(LSTM(…, return_sequences=True))
            model.add(TimeDistributed(Dense(…)))

          • Avatar
            Jason Brownlee March 7, 2019 at 2:32 pm #

            Generally the output sequence from an LSTM is the activation of the nodes from each step in the input sequence. It is unlikely to capture anything meaningful.

            It is better to interpret these activations or the final activations with more LSTM or Dense layers, and then output a sequence of the same or a different length using a separate model.

          • Avatar
            Gideon May 2, 2019 at 6:10 am #

            Hi there,
            I love this tutorial, all of your tutorials actually but this one I have found the most helpful. Questions about the MIMO LSTM output shape has come up a few times, and I am also having trouble with it.

            I am trying to use a Dense layer as my final layer as you suggest, passing it n_steps_out as an argument. I am predicting 3 variables and n_steps_out is 10.

            Keras complains that it is expecting the dense layer to have 2 dimensions, but I am passing it an array with shape (n_samples,n_steps_out,n_features)

            Can you help me make sense of this?

            Thank you

          • Avatar
            Jason Brownlee May 2, 2019 at 8:09 am #

            I would recommend a model with a time distributed wrapper or decoder for multivariate multi-step output, so you can output one vector for each time step.

  29. Avatar
    Abderrahim March 15, 2019 at 5:20 pm #

    Hi Jason,
    I have a question: are LSTM suitable for predicting based on a test set with the same nature of inputs as of train set ? Like in other cases of prediction where you will be having input signals in train set, that the model will work on. plus the memory based on the fact that entries are ordered.
    I trained an LSTM on a CNN model acting on ordered images, to predict a time series. On the test set I have the following set of images ordered by time. I guess there is no concept of horizon here, so how should I improve my model, and what is the starting point for predicting the test set in this case?

    Many thanks.

    • Avatar
      Jason Brownlee March 16, 2019 at 7:48 am #

      I would recommend modeling the raw time series directly, instead of images of the time series.

  30. Avatar
    Tayson March 26, 2019 at 12:06 am #

    Hello Jason,

    Many thanks for the helpful article..
    I copied the code for “Multiple Parallel Input and Multi-Step Output” and ran it without any changes, but I got different results from the ones you got.

    [ [
    [147.56306 167.8626 312.92883]
    [185.38152 205.36024 385.96536] ] ]

    Is there any reason for that?

    Best regards,
    Tayson

  31. Avatar
    Chris March 26, 2019 at 1:27 am #

    Hi Jason,
    How would you handle building the LSTM model for time series data with irregular time intervals (e.g. Jan 1, Jan 2, Jan 4, Jan 7, Jan 13, Jan 14, etc…)?

    It appears this model presupposes a regular time-interval spacing.

    You could fill the “missing” days with zeros or impute them with, say, the mean of the last 3 values, but I would like to know how to make the LSTM model without filling/imputing the time series data. How would you handle this?

    Thanks, and great lesson.

    • Avatar
      Jason Brownlee March 26, 2019 at 8:10 am #

      Yes, I would try many approaches and compare results, such as:

      – model as is
      – normalize interval with padding
      – upsample/downsample to new intervals
      – etc.
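
Two of the options listed above can be sketched with NumPy, assuming observations indexed by day number. The values are made up; the first option resamples onto a daily grid by linear interpolation, and the second pads missing days with zero while adding a flag column so the model can tell padding from real observations.

```python
import numpy as np

# Hypothetical irregular observations: day numbers and one value per day.
days = np.array([1, 2, 4, 7, 13, 14])
values = np.array([10.0, 20.0, 40.0, 70.0, 130.0, 140.0])

# Regular daily grid covering the observed span.
regular_days = np.arange(days.min(), days.max() + 1)

# Option 1: upsample to daily values with linear interpolation.
interpolated = np.interp(regular_days, days, values)

# Option 2: pad missing days with zero and add an "observed" flag feature.
observed = np.isin(regular_days, days)
padded = np.zeros(len(regular_days))
padded[observed] = values
with_flag = np.column_stack((padded, observed.astype(float)))

print(interpolated)
print(with_flag.shape)
```

Either array can then be split into samples in the usual way; comparing the results against the unmodified series is the experiment suggested in the reply.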

      • Avatar
        neb August 21, 2019 at 7:17 am #

        Follow-up to this question
        Holding number of features constant

        Are the various combination of models above able to cope when the number of time-steps per each Sample is variable?

        Or do the underlying model assumptions break in some way?

        • Avatar
          Jason Brownlee August 21, 2019 at 1:57 pm #

          Yes, you can either pad all samples to the same length or use a dynamic RNN. Assumptions of the model hold for both cases.
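
The padding option can be sketched as follows. pad_samples is a hypothetical helper written for this illustration; it left-pads each sample so the most recent time steps stay at the end of the sequence.

```python
import numpy as np

def pad_samples(samples, pad_value=0.0):
    """Left-pad samples of shape (steps, features) to a common number of steps."""
    max_steps = max(len(s) for s in samples)
    n_features = samples[0].shape[1]
    padded = np.full((len(samples), max_steps, n_features), pad_value, dtype=float)
    for i, sample in enumerate(samples):
        # Keep the real observations at the end of the sequence.
        padded[i, max_steps - len(sample):, :] = sample
    return padded

# Hypothetical samples with 2, 4 and 3 time steps, each with 3 features.
samples = [np.ones((2, 3)), np.ones((4, 3)), np.ones((3, 3))]
X = pad_samples(samples)
print(X.shape)
```

In Keras, a Masking layer placed before the LSTM can be configured to skip time steps that equal the pad value, so the padding does not have to be learned as data.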

  32. Avatar
    Ron March 27, 2019 at 1:06 am #

    If we are forecasting in monthly buckets and using 5 years of data, how do we know how many months of data to have on each row?

    • Avatar
      Jason Brownlee March 27, 2019 at 9:05 am #

      Perhaps perform a sensitivity analysis of the model to see how history impacts model performance.

      There will be a sweet spot for a given dataset.
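
A sensitivity analysis of this kind can be sketched as a loop over candidate history lengths, each scored on the same hold-out period. To keep the example small and fast, a least-squares linear model stands in for the LSTM, and the monthly series is synthetic; both are assumptions made for illustration, as are the helper names.

```python
import numpy as np

def make_windows(series, n_steps):
    """Turn a series into rows of n_steps inputs and the next value as target."""
    X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
    y = series[n_steps:]
    return X, y

def holdout_rmse(series, n_steps, n_test=12):
    """Fit a linear stand-in model and score it on the last n_test windows."""
    X, y = make_windows(series, n_steps)
    X = np.column_stack((X, np.ones(len(X))))  # add a bias term
    X_train, y_train = X[:-n_test], y[:-n_test]
    X_test, y_test = X[-n_test:], y[-n_test:]
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return float(np.sqrt(np.mean((X_test @ coef - y_test) ** 2)))

# Hypothetical 5 years of monthly data with a trend and a yearly cycle.
months = np.arange(60)
rng = np.random.default_rng(7)
series = 100 + 0.5 * months + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1, 60)

results = {n: holdout_rmse(series, n) for n in (1, 3, 6, 12)}
for n, score in results.items():
    print('%2d months of history: RMSE %.3f' % (n, score))
```

Replacing holdout_rmse with a function that fits and evaluates the actual LSTM gives the same experiment for the real model, ideally repeated several times per setting because neural network training is stochastic.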

      • Avatar
        Ron March 27, 2019 at 1:16 pm #

        Thanks Jason! If the history has distinct patterns for each quarter, should we have 3 months in each row? How would the results differ when we keep 12 months on each row versus 3 months on each row versus 1 month on each row?

        • Avatar
          Jason Brownlee March 27, 2019 at 2:07 pm #

          Depends on the dataset, I recommend testing to discover the specific answers with your data and model.

  33. Avatar
    Peter March 28, 2019 at 5:57 am #

    Hi Jason,

    I am trying to predict high and low value of a time series in next X days, my output layer in RNN is :

    model.add(Dense(2, activation='linear'))

    so basically output vector is [y_high, y_low], the model works pretty well however it sometimes outputs y_low > y_high, which of course doesn’t make any sense, is there a way to enforce model so that condition y_high >= y_low is always met.

    • Avatar
      Jason Brownlee March 28, 2019 at 8:24 am #

      Interesting, perhaps you can simplify the problem and predict a value in a discrete ordinal interval, e.g. each category is a block of values?

      • Avatar
        Peter March 29, 2019 at 3:00 am #

        I was trying to modify loss function but I am unable to access y_pred individual members, I don’t even know whether it’s ultimately possible.
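
One alternative that avoids a custom loss is to change what the two outputs mean: train on the low value and the spread (high minus low), then rebuild the high value after prediction. This is not suggested in the thread; it is a sketch under that assumption, with hypothetical helper names and made-up numbers.

```python
import numpy as np

def encode_targets(high, low):
    """Build training targets as (low, spread), where spread = high - low."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    return np.column_stack((low, high - low))

def decode_predictions(pred):
    """Map predicted (low, spread) back to (high, low), clipping spread at zero."""
    pred = np.asarray(pred, dtype=float)
    low = pred[:, 0]
    spread = np.maximum(pred[:, 1], 0.0)
    return np.column_stack((low + spread, low))

targets = encode_targets(high=[105.0, 98.0], low=[100.0, 95.0])

# Hypothetical raw model output, including one slightly negative spread.
raw_pred = np.array([[100.0, 4.0], [95.0, -1.0]])
decoded = decode_predictions(raw_pred)
print(targets)
print(decoded)
```

Because the spread is clipped at zero when decoding, the rebuilt y_high can never fall below y_low, whatever the network outputs.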

  34. Avatar
    Joe March 29, 2019 at 2:43 am #

    Hi Jason, a colleague and I are thinking of trying an LSTM model for time series forecasting. We are faced with over a thousand potential predictors, and would like to select only a smaller number for the final model. In particular, I have recently become fascinated by SHAP values; e.g., see this informal blog post by Scott Lundberg himself, in the context of XGBoost.
    https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27

    Tantalizingly, Scott L. demonstrates SHAP values in the context of an LSTM model here:
    https://slundberg.github.io/shap/notebooks/deep_explainer/Keras%20LSTM%20for%20IMDB%20Sentiment%20Classification.html
    But that is using text input (sentiment classification in the IMDB data set), which involves an Embedding layer just before the LSTM layer. For a non-text problem like time series forecasting, we would exclude the Embedding layer. But doing so breaks the code.

    Do you have any suggestions how SHAP values might be used in the context of LSTMs for time series forecasting (not text processing)? If not, do you have any suggestions for feature selection in that context?

    Thanks!

    • Avatar
      Jason Brownlee March 29, 2019 at 8:42 am #

      I don’t know what SHAP is, sorry.

    • Avatar
      Sam April 14, 2020 at 6:31 pm #

      Hi, Joe. I am running into the exact same topic. Have you found a way to implement SHAP to multivariate timeseries forecasting?

  35. Avatar
    Hsin March 29, 2019 at 5:26 pm #

    Hi Jason,
    Thanks for this useful tutorial.
    I am confused to inverse scaling of my data after splitting it into the form:
    x(data_length, n_step, feature)
    Because the scaler only can be used in 2D condition.

    What I want to do is evaluate rmse between prediction and true values, so I have to
    inverse transform data. Could you please tell me how to deal with this problem?

  36. Avatar
    Pratik March 30, 2019 at 12:12 am #

    Hi Jason,
    Firstly, I must say you have a fabulous chunk of articles on ML/DL. Thanks for helping out the community at large.

    Coming to LSTMs, I have been stuck on one problem for the last few days. Here is how it goes:
    I have 3 columns namely customer id and basket_index and timestamp. For every customer, each row represents one time stamp. Lets say there are 3 customers with variable time stamps. First one is having 30 time stamps, 2nd is having 25 and 3rd is having 50. So, the total number of rows are 105. Now for the column basket index, each row signifies a list of product keys bought by any customer on a particular timestamp. Here is the snapshot of the dataset –

    CustomerID basket_index timestamp predicted_basket
    111 [1,2,3] 1 [4,5]
    111 [4,5] 2 [9,7]
    111 [9,7] 3 [3,5,6,1]
    .
    .
    222 [6,2,3] 1 [1,0,2,5]
    222 [1,0,2,5] 2 [7,5]
    .
    .
    333
    .
    . and so on..
    Now, since every customer has a different time series,

    1) How to pass everything into one network?
    2) Do I have to build multiple LSTM models (one for each customer) in this case?

    3) Also, I am creating an embedding layer for both customer and product keys (taking mean for every basket). How to specify how many steps back does every time series look in such cases?
    4) How should I specify batch size in this case?

    Your help will be really appreciated. Thanks!

  37. Avatar
    Huiping March 30, 2019 at 1:13 am #

    Thanks Jason for nice post.

    One question I hope you can guide me on: with an LSTM, we can’t stop at saying the model is good; what matters most is how to use the good model’s output.

    For example flu or not for patients. Now I want to predict the flu for future half year (Jun-2019 to Dec-2019) but what I have is history data (I have past 4 years those people’s flu data and target on that model is half year from 6-1-2018 to 12-31-2018).

    How can I apply the LSTM trained on historical data to predict the future?

    Can I get a list of important features from the historical model with some value (like a weight) and apply this to my future data?

    Or can I get the list of important features from a well-fit LSTM model, showing which features are more important than others?

    Appreciate your guide!

  38. Avatar
    Jeyson Hernández April 3, 2019 at 11:40 am #

    Hi Jason,

    Amazing work! Thanks for sharing your knowledge with us; this tutorial was so helpful.

    I’m new to ML/DL, and I’m trying to predict a company’s sales for the next six months using an LSTM. But I have an issue: I’m not sure how to get more than one next step from your code using just one x vector as input. I’m using a monthly time step.

    Could you help me to understand a little bit better how to get it?

  39. Avatar
    Md. Abul Kalam Azad April 4, 2019 at 3:05 pm #

    Dear Sir,

    Thanks for sharing your example. I have collected traffic information (road properties, weather, datetime, adjacent road speed, target road speed and more) for predicting road speed. Currently, I have prepared my code using the Vanilla LSTM model for one-step as well as multi-step-ahead prediction. Can you suggest which of the models below will be best for road speed prediction with higher accuracy?

    Models are:
    Data Preparation
    Vanilla LSTM
    Stacked LSTM
    Bidirectional LSTM
    CNN LSTM
    ConvLSTM

    I am waiting for your response.

    Thanks,
    Azad

  40. Avatar
    Fazano April 8, 2019 at 8:58 pm #

    hi Jason, im using vanilla LSTM for forecasting,and i want to forecast 10 days ahead using this code

    # Forecast the real future

    # Number of desired forecasts in the future
    L=10
    # create empty input and output matrices for future forecasting
    Future_input=np.zeros((L,3))
    Future=np.zeros((L,1))

    # add the last 3 forecasts as input for forecasting the next day (tomorrow)
    Future_input[0,:]=[predict[-3],predict[-2],predict[-1]]
    # create 3-dimensional input for the LSTM
    Future_input= np.reshape(Future_input,(Future_input.shape[0],1,Future_input.shape[1]))
    # predict tomorrow's value
    Future[0,0]=model.predict(np.expand_dims(Future_input[0],axis=0))

    # Loop to predict the next 9 days' values
    for i in range (0,9):
        Future_input[i+1,0,:]=np.roll(Future_input[i,0,:], -1, axis=0)
        Future_input[i+1,0,2]=Future[i,0]
        Future[i+1,0]=model.predict(np.expand_dims(Future_input[i],axis=0))

    #print 10 day ahead values
    print(Future)

    can it be like that?
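The general idea of feeding each prediction back in as input (a recursive forecast) can be sketched independently of Keras. In the minimal sketch below, `recursive_forecast` and `toy_predict` are hypothetical names; `toy_predict` is only a stand-in for a trained model so that the loop can be run on its own. The point to check in any such loop is that each prediction uses the window that already contains the previous prediction.

```python
import numpy as np

def recursive_forecast(predict_fn, last_window, n_ahead):
    """Forecast n_ahead steps by feeding each prediction back into the input window."""
    window = np.asarray(last_window, dtype=float).copy()
    forecasts = []
    for _ in range(n_ahead):
        next_value = float(predict_fn(window))
        forecasts.append(next_value)
        # Drop the oldest value and append the newest prediction.
        window = np.append(window[1:], next_value)
    return np.array(forecasts)

def toy_predict(window):
    """Hypothetical stand-in for a trained model: continue the last linear step."""
    return window[-1] + (window[-1] - window[-2])

future = recursive_forecast(toy_predict, [70, 80, 90], n_ahead=10)
print(future)
```

With a fitted Keras model, `predict_fn` would presumably wrap `model.predict` and reshape the window to whatever input shape the model was trained on. Errors tend to accumulate with this approach, since later predictions are built on earlier predictions.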

  41. Avatar
    Sher April 13, 2019 at 2:41 am #

    Hi, do you have any tips for implementing univariate ConvLSTM for two-dimensional spatial-temporal data? I’m trying to input 10 time steps of 55 x 55 images for single-step time series forecasting.

    The following error code appears:
    “ValueError: Error when checking target: expected dense_10 to have 2 dimensions, but got array with shape (10, 55, 55)”

  42. Avatar
    Randy April 13, 2019 at 6:21 am #

    Dear Sir,
    I have a sequence of 1247 data points and I want to forecast the next 30, so the data would extend to 1277.
    I followed this tutorial, but it can only forecast 1 or 2 steps. I also followed this tutorial:

    https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/

    but I got a little confused. Do you have any advice for me?
    It is stock price data, actually.

  43. Avatar
    Lloyd April 19, 2019 at 6:23 am #

    Amazing Tutorial, thank you.

    I have a question, is there a model where the outputs can influence each other?

    I.e. you have multiple sequences all which move independently but can influence the others?

    Thank you

    • Avatar
      Jason Brownlee April 19, 2019 at 3:03 pm #

      Thanks.

      Yes, an encoder-decoder model that outputs a time step for each series in concert might be such an approach.

  44. Avatar
    Jules April 19, 2019 at 10:07 pm #

    Awesome. Great Explanation as always. I have always got rather frustrated and confused over the shape of data going into Keras models. So I relied upon your tutorials to make it clear.

    Anyway, using your examples I have been able to demonstrate the use of an LSTM for simple 2-D ballistics prediction calculations. I have used your code to help me here.

    https://github.com/JulesVerny/BallisticsRNNPredictions

    Pygame is required to animate the simulations

  45. Avatar
    Kishore April 19, 2019 at 11:48 pm #

    Dear Prof,

    Imagine I have raw text containing only words ‘N1,N2,N3,………….,N1000’ in a shuffled format , i.e, 1 million words, each of which can belong to any of these 1000 words.

    I want to select the number of time steps =5, and predict the next word.
    E.g.: an input of [N1,N6,N5,N88,N32] would be followed by ‘N73’.

    Now, assume that I have tokenized all the 1000 possible words into numbers.

    This is a scenario with 1000 possible output classes.
    So should I replace model.add(Dense(1)) with model.add(Dense(1000, activation='softmax'))?
    If not, what is the main change I need to make, as compared to your univariate stacked LSTM code ?

    • Avatar
      Jason Brownlee April 20, 2019 at 7:39 am #

      If the words are shuffled, then there would be no structure for a model to learn.

  46. Avatar
    HK April 23, 2019 at 8:06 pm #

    Dear Jason!

    I’m trying to use stacked lstm for this problem – Multiple Parallel Input and Multi-Step Output.
    However I’m not sure how the final Dense layer should look like. Could you give me some hints, please?

    • Avatar
      Jason Brownlee April 24, 2019 at 7:57 am #

      Perhaps start with the example in the above post and then add an additional LSTM layer?

      • Avatar
        HK April 29, 2019 at 6:11 am #

        Which example do you mean? I can’t find any example for Multiple parallel input and multi step output LSTM, which uses stacked LSTM layers instead of encoder decoder.

        • Avatar
          Jason Brownlee April 29, 2019 at 8:28 am #

          Yes, under the section “Multivariate Multi-Step LSTM Models”

          Specifically the subsection “Multiple Parallel Input and Multi-Step Output”

          The examples can be adapted to use any models you wish.

  47. Avatar
    Raman Singh April 25, 2019 at 8:27 am #

    Thanks Jason for detailed explanation.

    Could you please tell me how we can add hyperparameters for tuning the “Forget Gate”, “Input Gate” and “Output Gate” in the LSTM compile or fit methods? Or is it done internally, so that we can’t control these gates?

  48. Avatar
    will April 25, 2019 at 1:36 pm #

    How do I predict from multiple inputs like this: x_input = array([[70,75,145], [80,85,165], [90,95,185],…,[200,205,405]]), expecting the next output [210,215,425]?
    In the article I see this input, x_input = array([[70,75,145], [80,85,165], [90,95,185]]), which predicts [[101.76599 108.730484 206.63577 ]]. But it is not clear to me why you need to enter sequences such as in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90]) and in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95]).
    thanks

    • Avatar
      Jason Brownlee April 25, 2019 at 2:45 pm #

      I believe there are a few multi-step models listed above that will provide a good starting point.

  49. Avatar
    John April 25, 2019 at 4:33 pm #

    Hi Jason,

    Thanks for the article.
    I was working with your code and planning to implement in my work, but I have noticed a different behavior. If I compile and run the code different times, it gives different result each time although I didn’t change anything in your code. I have tried with your example data and run several times and each time I got different results. I tried with my own dataset and the result is the same.

    Now I am confused to implement LSTM in my work.
    Could you please clarify this behavior?

  50. Avatar
    Shiva April 26, 2019 at 10:59 pm #

    Hi jason,

    Say we have 3 variates(X).. and 1 dependent (Y)
    The relation of 2 variate in X is like for 3 lags and 1 variate is 30 lag.

    What is your advice when we have to model in such case?

  51. Avatar
    Raghu April 28, 2019 at 3:45 pm #

    Hi Jason,

    Thanks for the very informative tutorial. Can you please throw more light on how to come up with confidence intervals for the predicted value

  52. Avatar
    parsa April 28, 2019 at 10:42 pm #

    Hi Jason
    Thanks for your helpful tutorial
    Could you please tell me how we can predict future values for which we don’t have data available?
    For example, once I have finalized my LSTM model, how can I predict the values for 2050?

  53. Avatar
    will April 28, 2019 at 11:41 pm #

    Thanks for the article. However, I have a problem: the prediction results are different every time. For example, with Multiple Parallel Series, the first time I get [[101.25582 106.49429 207.8928 ]], and the second time it becomes [[101.82945 107.527626 209.8016 ]]. Why is this?
    thanks
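Neural networks are typically trained from random initial weights, so some run-to-run variation like this is expected. One way to work with it is to repeat the experiment several times and summarize the predictions rather than rely on a single run. The sketch below is only an illustration using the two predictions quoted in the comment above; in practice more repeats would give a more useful summary.

```python
import numpy as np

# The two predictions quoted in the comment above, one row per run.
runs = np.array([
    [101.25582, 106.49429, 207.8928],
    [101.82945, 107.527626, 209.8016],
])

# Average prediction and spread for each of the three output series.
mean_prediction = runs.mean(axis=0)
spread = runs.std(axis=0)

print(mean_prediction)
print(spread)
```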

  54. Avatar
    shiva April 29, 2019 at 4:39 am #

    I want to restate my question…
    Suppose we are trying to model a water bucket that has 1 open inlet at the top and 2 outlets on the side, one near the top and one near the bottom.
    This means that the outlet at the top can release only when the water level is really high.
    The outlet near the bottom has a release which is an exponential function of the water above it.

    Now say such systems are in parallel (one above another, say 2) and in series (say 2, with the final outlet from each parallel series joining at the final output), for a total of 4 buckets.

    Can this be modeled by an LSTM?
    I have done this analytically and the results are OK.
    I am trying to use an LSTM for this.

  55. Avatar
    Ali April 29, 2019 at 7:53 pm #

    Hello Jason,

    to the step:

    # define input sequence
    in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
    in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])

    I want to ask how I can load a full column out of a dataset.
    I don’t want to insert each value because I have more than 22 million rows. After that I want to split the data into sequences of 200-400 time steps.

    To the step:

    out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])

    I don’t have an exact mathematical equation. I want to predict the output without any knowledge about the relationship between the input signals.

    I hope you can help me.

    Kind regards

    Ali
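For a long series, one option is to load the column into a NumPy array (for example with something like `numpy.loadtxt` and its `usecols` argument, depending on the file format) and then build the windows with `sliding_window_view` instead of a Python loop. The sketch below uses a short made-up series as a stand-in for the loaded column and assumes NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Stand-in for one column loaded from a file; a real series would be much longer.
series = np.arange(10, 100, 10, dtype=float)  # [10, 20, ..., 90]
n_steps = 3

# Every run of n_steps + 1 consecutive values: n_steps inputs plus one target.
windows = sliding_window_view(series, n_steps + 1)
X = windows[:, :n_steps]
y = windows[:, n_steps]

# LSTM layers expect input shaped [samples, time steps, features].
X = X.reshape((X.shape[0], n_steps, 1))
print(X.shape, y.shape)
```

With millions of rows and hundreds of time steps, materializing every overlapping window may not fit in memory, so taking only every k-th window (for example `windows[::k]`) or feeding batches from a generator may be needed.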

  56. Avatar
    Sree April 29, 2019 at 8:59 pm #

    Hi Jason,

    Thanks for these explanations and sample codes!

    I was interested in the example you have provided for multi-variate version of LSTM. You have provided an example of a simple addition case. How can this be extended to instances where there are multiple inputs, but an exact relation between the inputs are not known even though it is known that the inputs are correlated? Thanks much for your guidance!

    • Avatar
      Jason Brownlee April 30, 2019 at 6:54 am #

      The model will learn the relationship, addition was just for demonstration.

  57. Avatar
    Sree May 1, 2019 at 9:34 am #

    Thanks Jason! That’s perfect.

    In that case, what should the statement “out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])” be replaced by, since we don’t know the exact relation between the variables? Thanks again!
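When the relationship is unknown, the output series would simply be the observed target values rather than something computed from the inputs. The sketch below illustrates this with a windowing function written in the style of the tutorial; the `out_seq` values are made-up measurements used only for illustration.

```python
import numpy as np

def split_sequences(sequences, n_steps):
    """Split rows of [inputs..., target] into input windows and one target each."""
    X, y = [], []
    for i in range(len(sequences) - n_steps + 1):
        end_ix = i + n_steps
        X.append(sequences[i:end_ix, :-1])   # all input columns for the window
        y.append(sequences[end_ix - 1, -1])  # target at the window's last step
    return np.array(X), np.array(y)

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
# Hypothetical observed target values (measured, not computed from the inputs).
out_seq = np.array([27, 44, 66, 83, 107, 124, 146, 163, 188])

# One row per time step, one column per series, target in the last column.
dataset = np.column_stack((in_seq1, in_seq2, out_seq))
X, y = split_sequences(dataset, n_steps=3)
print(X.shape, y.shape)
```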

  58. Avatar
    Sree May 2, 2019 at 8:17 pm #

    Thanks Jason. I shall read the content on that link.

    Cheers,
    Sree.

  59. Avatar
    Gideon May 3, 2019 at 5:37 am #

    Hello, thank you again. I think my previous question could be made more clear.
    I would like to use the vector output approach for a mimo lstm, making multi step predictions into the future similar to your encoder/decoder example.

    I have tried using the split_sequences method from the encoder/decoder example with the vector output example and the dimensions dont work out. I end up with a value error

    ValueError: Error when checking target: expected dense_2 to have 2 dimensions, but got array with shape (5, 2, 3)

    I greatly appreciate your help, I have been struggling with this for a while. I would imagine the output should be a matrix (number of features X prediction horizon) so I think there is something conceptually I am not understanding.

    Thank you, and thank you for all of the wonderful tutorials

    Gideon

    • Avatar
      Jason Brownlee May 3, 2019 at 6:25 am #

      Perhaps start with the code example you want to use and slowly change it for your needs.

      If the data size does not match the model’s expectations, you will need to change the data shape or change the model’s expectations.

      • Avatar
        Gideon May 3, 2019 at 7:30 am #

        I will toil away some more, but I just want to be sure it is possible to use a dense layer/vector output approach for Multiple Parallel Input and Multi-Step Output LSTM in Keras.
        Thanks again for your time.

        Gideon

        • Avatar
          Jason Brownlee May 3, 2019 at 2:40 pm #

          It is possible to use a Dense for multi-step multivariate output without a decoder or timedistributed wrapper layer, it is just ugly.

          E.g. the output would be a vector with n x m nodes, where n is number of variates and m is the number of steps.
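In terms of array shapes, the vector-output idea can be sketched as follows: the 3D target array is flattened so that each sample is a single vector of n x m values, and predictions are reshaped back afterwards. This is a minimal NumPy-only illustration using the (5, 2, 3) shape from the error message above; the values are placeholders.

```python
import numpy as np

n_samples, n_steps_out, n_features = 5, 2, 3

# Placeholder targets shaped (5, 2, 3), as in the error message.
y = np.arange(n_samples * n_steps_out * n_features).reshape(
    (n_samples, n_steps_out, n_features))

# Flatten each sample to one vector so it matches a Dense output layer
# with n_steps_out * n_features nodes.
y_flat = y.reshape((n_samples, n_steps_out * n_features))

# Predictions of shape (n_samples, 6) can be reshaped back to steps x features.
y_back = y_flat.reshape((n_samples, n_steps_out, n_features))
print(y_flat.shape, y_back.shape)
```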

          • Avatar
            Gideon Prior May 4, 2019 at 8:20 am #

            I’ve figured it out, and it’s not too ugly and exactly what I needed. I was unaware of the Reshape layer in Keras.

            from keras.layers import Reshape

            model.add(Dense(n_steps_out*n_features))
            model.add(Reshape((n_steps_out,n_features)))

            Thank you again for your help. I am buying your book right now.

            Cheers

            Gideon

          • Avatar
            Jason Brownlee May 5, 2019 at 6:18 am #

            Nice work.

        • Avatar
          George July 17, 2019 at 11:48 pm #

          Hi Gideon,

          I was struggling with something similar, and applying your solution solved everything! Do you have any more documentation on this?

  60. Avatar
    Alberto May 3, 2019 at 11:07 pm #

    Hello Jason,

    Great article, very useful. I want to use LSTM to predict sun irradiance 12 hours ahead using 8 features (including sun irradiance) of the last 24 hours as inputs. Thus, it would be a multivariate multi-step LSTM where the output is a sequence of 12 timesteps. I have 8 years of data and I want to use first 6 for training and last 2 for testing. I have some questions:

    1) Should I overlap the input sequences?

    2) Should I use a vector output model or an encoder-decoder model?

    • Avatar
      Jason Brownlee May 4, 2019 at 7:08 am #

      I recommend testing both approaches and use data to make the decision, e.g. choose the model that gives the best result.

  61. Avatar
    aravind May 5, 2019 at 3:54 am #

    Hi Jason,
    The article was very helpful.
    Can you tell me which approach I should take if I have two columns in my dataset?
    One is the time in ddmmyyyy format and the other is the stock price.
    I have data for the last 12 months.
    I want to predict the stock price for the 4 upcoming months.
    How can I do that?
    One more doubt: if the time column does not have the same interval between entries, is there anything more that I should do or consider when predicting the stock price for the 4 upcoming months?

  62. Avatar
    shiva May 7, 2019 at 7:33 am #

    In Multiple Input Series,
    (7, 3, 2) (7,)

    [[10 15]
    [20 25]
    [30 35]] 65
    [[20 25]
    [30 35]
    [40 45]] 85
    [[30 35]
    [40 45]
    [50 55]] 105
    [[40 45]
    [50 55]
    [60 65]] 125
    [[50 55]
    [60 65]
    [70 75]] 145
    [[60 65]
    [70 75]
    [80 85]] 165
    [[70 75]
    [80 85]
    [90 95]] 185

    1. How many LSTM blocks will there be in this example (x=7)?
    If batch size = 3, is the number of LSTM blocks equal to the number of x in the batch,
    or to the number of time steps?

    2. Are time steps, neurons and batch size all hyperparameters? How do we optimize them?

  63. Avatar
    shiva May 8, 2019 at 4:56 am #

    thanks..
    Then what is the total number of LSTM blocks?
    for every epoch, are the weights reinitialized and states are reset?

    • Avatar
      Jason Brownlee May 8, 2019 at 6:46 am #

      The number of LSTM units is specified in each hidden LSTM layer.

      LSTM states are reset at the end of every batch.

  64. Avatar
    shiva May 8, 2019 at 8:15 am #

    sorry but i dont get this?

    In model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
    input_shape here is equal to an input to each LSTM node right?

    and here 50 means,, h(hidden layer) is a vector of 50*1 right?

    My question is: is the number of individual LSTM nodes (blocks) equal to the number of samples in a batch?

    • Avatar
      Jason Brownlee May 8, 2019 at 2:08 pm #

      Yes, the shape defines the shape of each input sample (time steps and features).

      Yes, 50 refers to units in the first hidden layer.

      The number of units and sample shape are both unrelated to the batch size. Unless you are working with a stateful LSTM, in which case the input shape must also specify the batch size.

      Does that help?

    • Avatar
      shiva May 8, 2019 at 11:08 pm #

      yeah.. one followup question
      [10 15]
      [20 25]
      [30 35]] 65
      here is it like many to one ?

      this feeds as xt (single input) right?
      in this case what is the size of weight ?

      • Avatar
        Jason Brownlee May 9, 2019 at 6:43 am #

        Yes, multivariate multistep input to one output.

        • Avatar
          shiva May 9, 2019 at 11:05 am #

          how does this input concatenate with hidden layer … i cannot visualize this..
          i was thinking the input were a vector[n*1]

          • Avatar
            Jason Brownlee May 9, 2019 at 2:05 pm #

            Each node in the hidden layer gets the complete input sequence.

          • Avatar
            shiva May 12, 2019 at 9:33 am #

            Thank you so much..

            [10 15]
            [20 25]
            [30 35]] 65

            so in this case ,,, what is the size of xt and weight matrix?

          • Avatar
            Jason Brownlee May 13, 2019 at 6:42 am #

            You can calculate it based on the number of nodes in your network.
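As a rough illustration of that calculation: in a standard LSTM layer, each of the four gates has one weight per input feature, one weight per recurrent unit, and one bias, for every unit. The input at each time step is a vector with one value per feature (here 2), and the weights are shared across time steps, so the number of time steps and the batch size do not change the count. The helper name `lstm_param_count` is hypothetical.

```python
def lstm_param_count(n_units, n_features):
    """Trainable parameters in a standard LSTM layer with biases."""
    input_weights = n_features * n_units   # weights applied to x_t
    recurrent_weights = n_units * n_units  # weights applied to h_(t-1)
    biases = n_units
    # Four sets of weights: input gate, forget gate, cell candidate, output gate.
    return 4 * (input_weights + recurrent_weights + biases)

# 50 units and 2 features, as in the example discussed in this thread.
print(lstm_param_count(50, 2))  # 10600
```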

          • Avatar
            shiva May 15, 2019 at 10:32 pm #

            Thank you, Jason. You are so kind and helpful.

  65. Avatar
    shiva May 8, 2019 at 11:07 am #

    The number of cells is equal to the number of fixed time steps.
    The blog says so. I am very confused about the number of cells and what controls it.

    https://stackoverflow.com/questions/37901047/what-is-num-units-in-tensorflow-basiclstmcell#39440218

    Sorry for trouble

  66. Avatar
    Philipp May 13, 2019 at 2:26 am #

    Dear Jason,

    Thank you for writing all these awesome tutorials!

    My question:
    As I understood it, a LSTM network learns the information in a time-series by backpropagation through a specific length (in time) at which the LSTM cells are unrolled during training.
    So, while training it is necessary to define the number of timesteps provided in the training data. But shouldn’t it be possible to use the (trained) network with ANY number of input timesteps to make a prediction (because of the recurrent nature in which the LSTM cells work)?
    Am I getting something wrong here from the beginning?

    Thank you for hints on this
    Philipp

  67. Avatar
    sumitra May 13, 2019 at 3:41 pm #

    Dear Jason,

    I am currently working on a disease outbreak prediction model. I have 4 years of data with over 100 input variables, and each year has 365 data points. I would like to create an LSTM model that will be able to predict a future outbreak (whether there will be an outbreak, 1, or no outbreak, 0) based on the given input variables. For example, given 7 days of data points, I would like to predict the occurrence of an outbreak (0 or 1) on the 8th day.

    However, I am not sure which LSTM model will best fit my case. Will “multiple input multi-step output” be the best approach? Your guidance will be much appreciated.

    Thank you

  68. Avatar
    Nitin May 26, 2019 at 1:54 pm #

    Hi Jason,

    Can you please provide some pointers that will help us in minimizing the step-loss during model fitting….

    Thanks

  69. Avatar
    ICHaLiL May 29, 2019 at 12:26 am #

    Dear Jason,

    Thank you for your tutorials. They are really useful for us.

    I’ve one question about LSTM. I have different time series more than one (for example 100). I need to train network with 100 different time series. and test 10 different time series. Which method should I use?

    Thanks for your helps.

  70. Avatar
    QuantCub May 30, 2019 at 2:00 pm #

    Hi Jason,

    Thank you for sharing. I wonder if there is a way to set timestep > 1 without doing subsequence sampling as you did in data preparation, e.g. convert a 9-by-1 time series to a 6-by-3 data set. After the conversion, the 3-feature dataset is no more time dependent. You are able to use any kind of ML models (say OLS) to predict y. So why LSTM? Should LSTM be able to select (forget) previous information without this conversion?

    • Avatar
      Jason Brownlee May 30, 2019 at 2:55 pm #

      LSTM does have the benefit that it can remember across samples.

      This may or may not be useful, and is often not useful for simple autoregressions.

      • Avatar
        QuantCub May 31, 2019 at 1:19 am #

        Thank you for your quick reply. In your example, if I do a subsequence sampling and convert
        [10, 20, 30, 40, 50, 60, 70, 80, 90] to
        [[10, 20, 30],
        [40, 50, 60],
        [70, 80, 90]] (no replacement between each subsequence)
        and run LSTM(input_shape=(3,1)), is that the same as I run LSTM(batch_input_shape=(3,1,1), stateful=True) on the origin time series (9-by-1)?

  71. Avatar
    Neel June 11, 2019 at 9:08 pm #

    For a classification LSTM, using a Seed I get the same classification matrix each time I run it. However, when I vary the batch size in model.predict, I get the following:

    Prediction Batch Sizes:

    32 = Different Classification Matrix on each repeat

    Batch size in predictions is merely for RAM management, correct? If yes, Dr. Jason, what do you think would cause these irregularities?

      • Avatar
        Neel June 12, 2019 at 4:06 pm #

        Hi Jason,

        Sorry I didn’t explain my concern well. I was referring to the Batch Size parameter that we mention in “model.predict i.e. predicting” and not while training. I agree that batch size during training will have an impact. During prediction, the default size is 32 as defined by keras but when I change that to anything but 32 I get a different classification matrix even though I use a seed. When I leave the batch size as default, my seed is able to produce the same results.

        • Avatar
          Jason Brownlee June 13, 2019 at 6:11 am #

          Recall that with the LSTM, the state is reset at the end of each batch. This explains why you are getting different results for the same model with different inference batch sizes.

  72. Avatar
    Diego June 24, 2019 at 5:23 am #

    Hi Jason,

    Thanks for the tutorial.
    I’d like to apply this example to a real case.

    I have to forecast how much money will be withdrawn every day from a group of ATMs.
    Currently I am using a time series for every ATM. (100 ATMs = 100 time series).

    Which method from this tutorial do you think would be better?
    I need to use historical information and external information such as holidays, day of week, etc.
    Thanks in advance.

  73. Avatar
    Liang Zhao June 25, 2019 at 5:58 am #

    Hi Jason, I want to use some kind of machine learning method to demonstrate that there is a relationship between the score gap of two basketball teams and the demand for a taxi outside the stadium.

    I have time series of pick-ups near a stadium. I have the score gap time series between two basketball teams.

    What I want to achieve is that training a machine learning model that could tell me, based on the taxi pick-ups at time t, what is the taxi pick-ups at time t+1.
    I also want to see if I also have the score gap at time t, can I improve my prediction accuracy of pick-ups at time t+1.

    Which machine learning model should I use?

    thank you so much!

  74. Avatar
    James July 1, 2019 at 1:11 am #

    Hi Jason,

    Thanks for the tutorial.

    Suppose I have several time series showing cumulative bookings for different trains last year. I don’t want to forecast but just classify those time series to see if some of them have similar patterns. Can I include all those series into one LSTM model? Is there any risks when doing so?

    Thanks in advance.

    • Avatar
      Jason Brownlee July 1, 2019 at 6:35 am #

      Sure, it means you are learning/modeling across books. Sounds reasonable.

      • Avatar
        James July 1, 2019 at 11:44 pm #

        Thanks Jason!

        So is it the same as multivariate LSTM? Sorry I’m new to modelling so still find things confusing

        • Avatar
          Jason Brownlee July 2, 2019 at 7:32 am #

          Probably not, each example is a separate sample or input-output pair for the model to learn from.

  75. Avatar
    Irini July 1, 2019 at 9:35 am #

    Hi Jason,

    thanks for the nice tutorial!

    I have a dataset with 3000 univariate timeseries (i.e. 3000 samples) and each sample has 4000 timesteps. When i use [samples, time steps, features]=[3000, 4000, 1] the code is extremely slow and with bad performance.
    On the other hand, if instead [3000, 4000, 1] i write [3000, 1, 4000] the code is very fast and with great performance.
    But is the reshape [3000, 1, 4000] correct? I mean according to the rule [samples, time steps, features] and given the fact that each of my samples have 4000 timesteps and for each time step there is one feature the correct should be [3000, 4000, 1].

    So is [3000, 1, 4000] correct? And if it is not (logically it is not) why it works much better than [3000, 4000, 1] ?

    Thanks in advance

    • Avatar
      Jason Brownlee July 1, 2019 at 11:35 am #

      I would recommend not using more than 200 to 400 time steps per sample. Perhaps you can truncate your data?
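Truncating to a fixed number of time steps is a simple slice on the time axis. The sketch below shows that, plus one possible alternative (an assumption, not something suggested in the reply above): splitting each long series into shorter consecutive subsequences that each inherit the label of the original series. A smaller stand-in array and made-up labels are used so it runs quickly.

```python
import numpy as np

# Smaller stand-in for the [3000, 4000, 1] array described above.
X = np.zeros((30, 4000, 1))
labels = np.arange(30) % 2  # hypothetical class labels, one per series
max_steps = 400

# Option 1: keep only the most recent 400 time steps of each series.
X_truncated = X[:, -max_steps:, :]

# Option 2 (assumption): split each series into consecutive 400-step chunks,
# repeating each label once per chunk.
chunks_per_series = X.shape[1] // max_steps
X_split = X.reshape((-1, max_steps, 1))
labels_split = np.repeat(labels, chunks_per_series)

print(X_truncated.shape, X_split.shape, labels_split.shape)
```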

      • Avatar
        Irini July 1, 2019 at 8:15 pm #

        I did also an experiment and i truncated my data and used as input [samples, time steps, features]=[3000, 400, 1]. It was quicker but i got a mean accuracy 42% (in 10 random splits).
        As i told you in my previous post when i exchange timesteps with features namely when i use [3000, 1, 4000] i get an accuracy 90%.
        But doesn’t giving 1 time step mean that I don’t exploit the memory, which is the characteristic feature of the LSTM?

        I am confused as to whether i should use [3000, 1, 4000], which is very quick and gives very good results but maybe it is not very correct? Or it is correct as if i used [3000, 400, 1](if i truncated my data to 400)

        • Avatar
          Jason Brownlee July 2, 2019 at 7:30 am #

          The state of the LSTM is reset at the end of each batch by default, so you can get some across-sample memory.

          I recommend testing a suite of different configurations to see what works well or best for your specific dataset. I cannot know what will work well, you must discover the answer.

  76. Avatar
    Manish July 2, 2019 at 2:39 am #

    Hello Jason,

    I am quite new to ML and LSTMs. I have a scenario where I intend to train a model using my hourly sensor values. For eg

    12-1-2019 12:00:00 12
    12-1-2019 13:00:00 16

    12-5-2019 12:00:00 14

    Once I am done with my training I intend to predict values every hour and compare the values with live sensor values….I am planning to use LSTM and which approach do you recommend me ?

  77. Avatar
    Harish July 2, 2019 at 7:18 pm #

    Jason, this is very useful. I’m trying to do some prediction around IT incidents. Based on historical data, I want to predict what type of incident I can expect next month/week/day. If you have done anything similar, could you please share it?

  78. Avatar
    Lopa July 3, 2019 at 12:35 am #

    Hi Jason,

    Thanks for answering my question in your other tutorials. I have a minor doubt suppose my data has a continuous time series(non stationary) & other categorical variables (which are already encoded). Under that circumstance what is the best way to difference the data ? Because categorical data are not differenced but they have to be used while training the model.

    The function written above differences all the variables irrespective of whether they are continuous or categorical. It would be great if you can help.

    • Avatar
      Jason Brownlee July 3, 2019 at 8:36 am #

      Difference the real-values only, and only if they are non-stationary.

  79. Avatar
    myro July 3, 2019 at 5:16 pm #

    Hi Jason,
    I copied and pasted your first example from Multi-Step LSTM Models, the one with the vector output of two values and the input being one.

    You report as an output the values:

    input [[70 80 90]]
    output [[100.98096 113.28924]]

    but with those parameters I cannot get any closer than

    input [[70 80 90]]
    output [[122.678955 139.9465 ]]

    Did you use the parameters you report? Is this so dependent on architecture?

    • Avatar
      Jason Brownlee July 4, 2019 at 7:40 am #

      Results are dependent upon the model, the model configuration and the data, the performance is also stochastic, subject to random variance.

      • Avatar
        myro July 19, 2019 at 7:49 pm #

        Hi, thanks for your reply.
        I understand that, that’s why I am asking,
        I have the same model, same model config and same data, and the stochasticity should be symmetrically distributed (?). So I assume that the results you report are not from the parameters you have in the code examples.

  80. Avatar
    Lopa July 3, 2019 at 7:10 pm #

    My data is non-stationary and there is seasonality every 7 days (as evident from the ADF tests and ETS plots), and first-order differencing makes it stationary.

    I totally get that I have to difference only the real values, and that is what I have been aiming to do. I asked because the moment I difference the real values, they get shifted by one place: if the original data has 100 observations, the differenced data will have 99 observations (with first-order differencing). But the categorical data, which cannot be differenced, still has 100. How do I deal with this?

    • Avatar
      Jason Brownlee July 4, 2019 at 7:44 am #

      Discard the first observation of the categorical data. Each differenced value then corresponds to the categorical value at the same time step.
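
The alignment described in this reply can be sketched with NumPy. This is a minimal illustration under assumptions: the toy arrays `real` and `category` are hypothetical stand-ins for one non-stationary real-valued column and one already-encoded categorical column.

```python
import numpy as np

# Hypothetical toy data: 10 observations of a real-valued series and of an
# already-encoded categorical variable.
real = np.array([10.0, 12.0, 15.0, 19.0, 24.0, 30.0, 37.0, 45.0, 54.0, 64.0])
category = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])

# First-order difference of the real-valued column only: real[t] - real[t-1].
real_diff = np.diff(real)          # 9 values, for t = 1..9

# Discard the first categorical observation so both columns cover t = 1..9.
category_aligned = category[1:]    # 9 values

# Each row pairs the change ending at time t with the category at time t.
data = np.column_stack((real_diff, category_aligned))
print(data.shape)
```

Both columns end up with the same number of rows, so they can be passed to the usual windowing function together.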

  81. Avatar
    Lopa July 3, 2019 at 8:06 pm #

    I think I have been able to solve the issue thanks Jason for addressing my query

  82. Avatar
    Leon July 5, 2019 at 5:58 am #

    in the Vector Output Model section,
    I copied your code and tried it. The actual output does not match the expected [100, 110]; it is actually [110, 120].

    • Avatar
      Jason Brownlee July 5, 2019 at 8:11 am #

      Perhaps try running the example a few times? It can vary given the stochastic nature of the learning algorithm.

      • Avatar
        Leon July 11, 2019 at 1:30 am #

        I never get anywhere near [100, 110]. I ran it many times, and the output is always around [110, 120] with some variation.

        no kidding 🙂 you can try that part of codes. The output looks ridiculous.

        • Avatar
          Jason Brownlee July 11, 2019 at 9:50 am #

          Interesting.

          Is Keras/TensorFlow/Python up to date?

  83. Avatar
    Matthew July 5, 2019 at 5:56 pm #

    Hi Jason, I am doing an electrical demand forecast and am trying to build a model which predicts the demand for the following 24 hours given the last 90 hours. I have implemented two types: a 24 step prediction and a recursively defined prediction, which predicts the next hour and then uses the previous 89 true values and the new predicted value to predict the next value, and so on. I am wondering which method you believe to be the best(if either) and any tips for improving my model as depending on the time of year the forecast can vary massively with accuracy. I currently have an LSTM(50) connected to a Dense(20) connected to an output Dense(1) for both cases.
    Any help would be greatly appreciated. Thank you. Matthew

    • Avatar
      Jason Brownlee July 6, 2019 at 8:29 am #

      Well done, very cool!

      I recommend testing each method and use the one with the lowest error.

      Also, get creative and test a suite of other configurations. Ensure your test harness is robust and reliable so that you can trust the decisions you make.
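
The recursive strategy discussed in this thread (predict one step, append the prediction to the input window, repeat) can be sketched as follows. This is only an illustration: `naive_one_step` is a hypothetical stand-in for a fitted one-step model, and `recursive_forecast` is a helper name invented here. A direct strategy would instead use a model with `n_steps_out` outputs and a single call.

```python
import numpy as np

def naive_one_step(window):
    """Hypothetical stand-in for a one-step model: extrapolate the last change."""
    return window[-1] + (window[-1] - window[-2])

def recursive_forecast(history, n_lookback, n_steps_out, predict_one_step):
    """Forecast n_steps_out values by feeding each prediction back as input."""
    window = list(history[-n_lookback:])
    forecasts = []
    for _ in range(n_steps_out):
        yhat = predict_one_step(np.array(window))
        forecasts.append(yhat)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [yhat]
    return np.array(forecasts)

history = np.arange(10.0, 100.0, 10.0)   # 10, 20, ..., 90
forecast = recursive_forecast(history, n_lookback=3, n_steps_out=4,
                              predict_one_step=naive_one_step)
print(forecast)
```

Because each prediction becomes an input to the next step, errors can accumulate with the horizon, which is one reason to compare both strategies on a held-out test set.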

  84. Avatar
    skyrim4ever July 8, 2019 at 8:13 pm #

    Hello, this example was nice to follow and seemed a little simpler than other LSTM examples because there are no pre-processing transformations (normalization, standardization, making the data stationary, etc.). However, should I perform these pre-processing transformations in general for time series prediction? Should I do so for this kind of example too, even though the dataset is simple?

    • Avatar
      Jason Brownlee July 9, 2019 at 8:09 am #

      Yes, test to see if the data preparation improves model performance.

      I keep it out of examples for brevity.

  85. Avatar
    Nutakki July 11, 2019 at 8:59 pm #

    #In Multiple Parallel Series
    I have defined the input like this
    # define input sequence
    in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
    in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
    x_input = array([[70,75,1,4], [80,85,165,5], [90,95,185,6]])
    n_steps=4
    n_features=X.shape[2]

    How does the input loop to produce the following output: [[ 72.74373 106.51455 251.78499]]?
    Can you give a clear idea of what n_steps=4 and n_features=X.shape[2] really mean and how they function?

  86. Avatar
    Amelie July 12, 2019 at 2:10 am #

    Please, is there a method to find the correct parameter of an ANN model: LSTM, MLP (hidden layer number, activation function, loss function ..)

    what does it mean when my train and validation loss curves are parallel while the Train Score and Test Score are small?

    Is there a method to optimize all these results?

  87. Avatar
    Samil July 17, 2019 at 8:16 am #

    Thanks for the tutorial. I have applied the multistep, multivariate logic to my own dataset. Namely, I have 12 look-back, 12 look-ahead and 41 features (all having exact look-back as the main variable of interest). Trying the TimeDistributed code snippet gave me progressively increasing RMSE. Is this due to the nature of my time series or is it a sign of mistake done during construction of the model? It is hard to tell for you but maybe you can share your take on this issue. Thanks

    • Avatar
      Jason Brownlee July 17, 2019 at 8:33 am #

      It could be either.

      Perhaps try fewer features and evaluate impact?
      Perhaps try different models and evaluate impact?

      • Avatar
        Samil July 19, 2019 at 9:42 am #

        Thanks. I tried encoder-decoder and stacked LSTM. Both give me increasing RMSE for further look-aheads. It is understandable for the encoder-decoder, as it uses the output as an input (so the associated error also comes with the prediction and builds up over time), but I am not sure why I see the same thing with the stacked LSTM. Anyway, thanks again for the response and the post!

        • Avatar
          Samil July 19, 2019 at 9:48 am #

          Also, one quick related question. You use “-1” in the multi-step multivariate split_sequences models (such as n_steps_out-1 etc.). This reduces the number of resulting features by one when compared to other split_sequence snippets. I tested it with the other multi-step split_sequence code you shared above. I am not sure, but aren’t we supposed to have the same number of features? Thanks

        • Avatar
          Jason Brownlee July 19, 2019 at 2:20 pm #

          Well done on the improvement!

  88. Avatar
    Aziz Ahmad July 22, 2019 at 4:56 am #

    Sir, please suggest good learning sources for my project (carbon emission forecasting using LSTM).

  89. Avatar
    Mans Oshanov July 22, 2019 at 6:20 pm #

    Thank you for the great tutorial. Is it possible to get the probability of prediction(in percentage) or second best prediction out of these models? Thank you)

  90. Avatar
    Kennard July 26, 2019 at 11:57 am #

    Hi, Jason

    Your tutorial helps me a lot, thank you very much!

    And I have a question that how to adjust the learning rate of the LSTM network in the CNN-LSTM code you’ve mentioned above.

    I’m looking forward to your reply, thank you!

    (The reply I left in https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/?unapproved=494293&moderation-hash=2b6d045a4e1ff047d0720753b2b1e418#comment-494293 is in wrong place, sorry about that)

  91. Avatar
    Luis July 27, 2019 at 3:10 am #

    This is amazing. I love the blog

  92. Avatar
    Armande Kertanio July 28, 2019 at 8:31 am #

    Thanks for this nice explanation.

    I have a problem when reshaping the data for multiple output architecture.

    the architecture is:

    outputs = []

    main_input = Input(shape=(seq_length, feature_cnt), name='main_input')
    lstm = LSTM(32, return_sequences=True)(main_input)
    for _ in range(5):
        prediction = LSTM(8, return_sequences=False)(lstm)
        out = Dense(1)(prediction)
        outputs.append(out)

    model = Model(inputs=main_input, outputs=outputs)
    model.compile(optimizer='rmsprop', loss='mse')

    and when reshaping the y using:

    y=y.reshape((len(y),5,1))

    I got a reshaping error:

    ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 5 array(s), but instead got the following list of 1 arrays: [array([[0.35128802, 0.01439778, 0.60109704, 0.52722118, 0.25493708],

    would you please help?

    • Avatar
      Jason Brownlee July 29, 2019 at 5:58 am #

      Perhaps define what you want the output shape to be, e.g. n samples with m time steps, then confirm your data has that shape, or if not set that shape?
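
The error message quoted in the question says the model expected 5 target arrays, which matches a model built with 5 separate output layers. One way to satisfy that, sketched here with NumPy only, is to split a 2D target array into a list with one array per output. The array `y` below is hypothetical random data standing in for the real targets.

```python
import numpy as np

# Hypothetical targets: 8 samples, 5 values per sample (one per output head).
rng = np.random.default_rng(1)
y = rng.random((8, 5))

# A model with 5 Dense(1) outputs expects a list of 5 arrays,
# each with shape (n_samples, 1), rather than one (n_samples, 5, 1) array.
y_list = [y[:, i].reshape(-1, 1) for i in range(y.shape[1])]

print(len(y_list), y_list[0].shape)
```

The list `y_list` would then be passed as the target argument when fitting, in place of the single reshaped array.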

  93. Avatar
    Florian July 29, 2019 at 2:00 am #

    You use “model.add(TimeDistributed(MaxPooling1D(pool_size=2)))” and write “max pooling layer that distills the filter maps down to 1/4 of their size”. A typo or is there a different reason explaining the use of 2 vs. 4 here?

    • Avatar
      Jason Brownlee July 29, 2019 at 6:16 am #

      Sorry for the confusion.

      If the map is 8×8 and we apply a 2×2 pooling layer, then we get a 4×4 out, e.g. 1/4 the area (64 down to 16).

      For time series, if we have 1×8 and apply a 1×2 pooling, we get 1×4, you’re right. 1/2 the size, not 1/4 as in image data.

      Fixed. Thanks!
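
The arithmetic in this reply can be checked with a small NumPy sketch. The helper `max_pool_1d` is written here for illustration only and is not a Keras function; it assumes non-overlapping windows.

```python
import numpy as np

def max_pool_1d(x, pool_size=2):
    """Non-overlapping 1D max pooling; trailing values that do not fill a window are dropped."""
    n_out = len(x) // pool_size
    trimmed = x[:n_out * pool_size]
    return trimmed.reshape(n_out, pool_size).max(axis=1)

# Time series case: 1x8 feature map, pool size 2 -> 1x4 (half the size).
feature_map = np.array([1, 5, 2, 8, 3, 3, 9, 4])
pooled = max_pool_1d(feature_map, pool_size=2)
print(pooled, len(pooled) / len(feature_map))

# Image case: 8x8 map, 2x2 pooling -> 4x4 (a quarter of the area).
image = np.arange(64).reshape(8, 8)
pooled_2d = image.reshape(4, 2, 4, 2).max(axis=(1, 3))
print(pooled_2d.shape, pooled_2d.size / image.size)
```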

  94. Avatar
    Nic July 29, 2019 at 5:53 pm #

    Hi Jason,

    first of all, thanks for that awesome introduction into LSTM-Models.
    There is just one thing I don’t get.

    In the section “Multiple Input Series” you used the following example:
    [[ 10 15 25]
    [ 20 25 45]
    [ 30 35 65]
    [ 40 45 85]
    [ 50 55 105]
    [ 60 65 125]
    [ 70 75 145]
    [ 80 85 165]
    [ 90 95 185]]

    As you mentioned the first two entries in the arrays refer to the two time series and the last one to the corresponding target variable. To train the LSTM you split the data into input and output samples like:
    [[10 15]
    [20 25]
    [30 35]] 65

    Why do I drop the first two target entries (25 and 45)? Isn’t that information my network loses for training? Why don’t we use each (single) sample like x = [10 15], y = [25] to train the time series? Isn’t it easier to learn the series if I have the target for each step?

    • Avatar
      Jason Brownlee July 30, 2019 at 6:04 am #

      Good question.

      We must create samples of inputs and outputs.

      Some observations at the beginning of the dataset don’t have enough prior data to create a full input sample, and therefore must be removed.
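
To make the windowing concrete, here is a sketch that rebuilds the dataset from the question and splits it into samples of three time steps. The function is a reconstruction in the spirit of the post's `split_sequences` helper, not a verbatim copy, so treat the details as assumptions.

```python
import numpy as np

def split_sequences(sequences, n_steps):
    """Split rows of [inputs..., target] into windows of n_steps inputs and one target."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps
        if end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :-1])    # all columns except the last
        y.append(sequences[end_ix - 1, -1])   # target at the last step of the window
    return np.array(X), np.array(y)

in_seq1 = np.arange(10, 100, 10)   # 10, 20, ..., 90
in_seq2 = np.arange(15, 100, 10)   # 15, 25, ..., 95
out_seq = in_seq1 + in_seq2
dataset = np.column_stack((in_seq1, in_seq2, out_seq))

X, y = split_sequences(dataset, n_steps=3)
print(X.shape, y.shape)
print(X[0], y[0])
```

With 9 rows and 3 time steps there are 7 samples. The targets 25 and 45 never appear as outputs because no window of three input steps ends at those rows.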

  95. Avatar
    Joel August 1, 2019 at 12:04 am #

    Good work, However, you should provide the library imports, to make it easier for beginners.

    • Avatar
      Jason Brownlee August 1, 2019 at 6:53 am #

      All library imports are provided in the “complete example” listed in the post.

      Sorry for the confusion.

      • Avatar
        GODFREY JOSEPH SAQWARE October 8, 2021 at 4:45 pm #

        Hello Sir, I am so happy with your illustration, I have a problem with how to do forecasting based on your demonstration. I will be happy to get your email

  96. Avatar
    will August 11, 2019 at 12:27 am #

    Hi Jason, I need to make predictions for a hundred thousand sequences like this: [10, 20, 30, 40, 50, 60, 70, 80, 90]. How do I do it? Do I loop over them one by one? It feels like that is going to take a long time.

    • Avatar
      Jason Brownlee August 11, 2019 at 5:59 am #

      If the model is read only and you are not dependent upon state across samples, you can run the model in parallel on different machines and prepare batches of samples for each model to make predictions.
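
Before spreading work across machines, it is often enough to stack all sequences into one 3D array and make a single batched prediction call instead of looping. The sketch below shows only the data preparation. `predict_stub` is a hypothetical stand-in for a fitted model's predict method, and the sequence count is reduced to keep the example small.

```python
import numpy as np

def predict_stub(batch):
    """Hypothetical stand-in for model.predict: last input value plus 10 for each sample."""
    return batch[:, -1, :] + 10

# 1000 identical toy sequences; real data would differ per row.
sequences = np.tile(np.arange(10, 100, 10), (1000, 1))   # shape (1000, 9)

n_steps = 3
# Take the last n_steps values of every sequence and add a feature axis:
# [samples, timesteps, features].
x_input = sequences[:, -n_steps:].reshape(len(sequences), n_steps, 1)

yhat = predict_stub(x_input)   # one call for all samples
print(x_input.shape, yhat.shape)
```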

  97. Avatar
    PRADEEP CHAKRAVARTHI NUTAKKI August 11, 2019 at 3:42 am #

    Hi, I am very happy to have this LSTM example to have a practice.

    I have a problem as follows:

    I have 300 excel workbooks of which each excel sheet has 3 values…..

    the 3 values will be in this format [1.02,2.20,1.0]; [2.9,3.5,3.3];…….like this 300 sets.

    Now I want to train and test my model with the data from the 300 excel workbooks as input, and the model has to predict the 301st set, for example [5,3.3,2.4], depending on the sequence of previous values.
    Note: the output shouldn’t be the probability set from the 300 sets, the output should be a new set.

    Can you suggest me any solution to this problem?

    • Avatar
      Jason Brownlee August 11, 2019 at 6:04 am #

      Perhaps you can use some custom code to extract all of the data from the excel files into a csv file ready for modeling?

  98. Avatar
    y jing August 12, 2019 at 4:45 pm #

    How can I construct three parallel LSTMs and then add a DNN in series?

  99. Avatar
    Doron August 13, 2019 at 4:42 pm #

    Hi Jason,

    Thanks for this wonderful post. I have been trying to digest LSTM’s (metaphorically) and one particular aspect was not clear to me. I know the general structure of LSTM’s but I’m having hard time to understand:

    model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))

    When ReLU is set as an activation function, but not in the output layer, what exactly happens behind the scenes? To make myself clear, I am aware of the gates and their respective activation functions: sigmoid and tanh. But if we set ReLU like above, does that mean that each unit/LSTM cell outputs a hidden state –> pass it to a ReLu –> pass it to the next unit/LSTM cell?

    Thanks!

    • Avatar
      Jason Brownlee August 14, 2019 at 6:33 am #

      Yes, that is correct. The activation argument is applied to the candidate values and to the cell state when producing the hidden output, not to the internal gates, which are governed by a sigmoid.
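
For readers following the ReLU question: as far as I understand the Keras LSTM layer, the `activation` argument is applied to the candidate cell values and to the cell state when it is turned into the hidden output, while the gates use the separate `recurrent_activation` (sigmoid by default). The NumPy sketch below illustrates one time step under that assumption. It is not Keras source code, and the names `lstm_step`, `W`, `U`, and `b` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def lstm_step(x_t, h_prev, c_prev, W, U, b,
              activation=np.tanh, recurrent_activation=sigmoid):
    """One LSTM time step; weights hold the i, f, c, o blocks side by side."""
    z = x_t @ W + h_prev @ U + b
    z_i, z_f, z_c, z_o = np.split(z, 4)
    i = recurrent_activation(z_i)          # input gate
    f = recurrent_activation(z_f)          # forget gate
    o = recurrent_activation(z_o)          # output gate
    c = f * c_prev + i * activation(z_c)   # activation on the candidate values
    h = o * activation(c)                  # activation on the cell state
    return h, c

rng = np.random.default_rng(0)
n_features, units = 1, 4
W = rng.normal(size=(n_features, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x in [0.7, 0.8, 0.9]:
    # The hidden state h is what gets passed on to the next time step.
    h, c = lstm_step(np.array([x]), h, c, W, U, b, activation=relu)
print(h)
```

With ReLU in place of tanh, the hidden state is no longer bounded to (-1, 1), which is one reason unscaled inputs can lead to very large values.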

  100. Avatar
    Dawjidda August 17, 2019 at 1:46 am #

    Hello Mr Jason Brownlee, my dataset is in matrix form and I want to convert it to fit into a GRU or LSTM sequential model.

  101. Avatar
    Tommy August 18, 2019 at 9:22 pm #

    Hi Jason,

    A problem is involved in my mind, If it is possible, I want to know your opinion.

    What will happen if we use both lstm and gru layers simultaneously in the model? Does this make sense?
    For example this architecture:

    model=Sequential()
    model.add(GRU(256 , input_shape = (x.shape[1], x.shape[2]) , return_sequences=True))
    model.add(LSTM(256))
    model.add(Dense(64))
    model.add(Dense(1))

    Because I used this model and I got good results compared to using each one separately.

  102. Avatar
    Ali Altin August 21, 2019 at 6:14 pm #

    Hello Jason and community,

    I have a question. My dataset has 27 features. 26 of them I want to use as input and the last one as output (this feature is also the last column in my dataset). I use the multiple input multi-step output code from above. After using the function “def split_sequences(sequences, n_steps_in, n_steps_out)”, I split the dataset into train and test sets and choose a number of time steps for n_steps_in and n_steps out. After transforming from 2D to 3D with “split_sequences(train, n_steps_in, n_steps_out)” I printed the shape of train_X, train_y, test_X and test_y. The results are:

    (14476887, 25, 26) (14476887, 20) (7130386, 25, 26) (7130386, 20)

    My three questions are:

    1.) Does python count from 0 upwards, so that 0 is my first feature or does it count from 1 upwards?

    2.) Does python work from left to right, so that the left feature in the csv file is my first feature and so on?

    3.) Is the shape above (7130386, 20) equal to (7130386, 20, 1) or why is it 2D?

    I hope that I could explain my problem and the questions good enough.

    Many thanks in advance.

    Ali

    • Avatar
      Jason Brownlee August 22, 2019 at 6:23 am #

      Yes, array indexes start at 0.

      Yes, arrays run from left to right.

      Yes, you can transform (7130386, 20) to (7130386, 20, 1) directly. They are the same thing.
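
The reshape mentioned in this reply only adds a trailing axis of length 1, so no values move. A quick NumPy check on a small hypothetical array:

```python
import numpy as np

# Hypothetical targets: 2 samples, 20 output steps each.
y = np.arange(40).reshape(2, 20)

# Add a trailing feature axis: (samples, steps) -> (samples, steps, 1).
y3d = y.reshape((y.shape[0], y.shape[1], 1))

print(y.shape, y3d.shape)
print(np.array_equal(y3d[:, :, 0], y))
```

Whether the model needs the 2D or the 3D form depends on its output layer: a plain Dense output produces 2D, while a TimeDistributed output produces 3D.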

      • Avatar
        Ali Altin August 22, 2019 at 5:59 pm #

        Hello Jason,

        Thank you so much for the answer. I have two other questions:

        I take the ‘split a multivariate sequence into samples’ code from above:

        from numpy import array

        def split_sequences(sequences, n_steps_in, n_steps_out):
            X, y = list(), list()
            for i in range(len(sequences)):
                # find the end of this pattern
                end_ix = i + n_steps_in
                out_end_ix = end_ix + n_steps_out-1
                # check if we are beyond the dataset
                if out_end_ix > len(sequences):
                    break
                # gather input and output parts of the pattern
                seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
                X.append(seq_x)
                y.append(seq_y)
            return array(X), array(y)

        After that I split the dataset into train and test sets:

        train_size = int(len(values) * 0.67)
        test_size = len(values) - train_size
        train, test = values[0:train_size,:], values[train_size:len(values),:]
        print(len(train), len(test))

        The result is:

        14476930 7130429

        The next step is to define the number of time steps:

        n_steps_in, n_steps_out = 25, 20

        train_X, train_y = split_sequences(train, n_steps_in, n_steps_out)
        test_X, test_y = split_sequences(test, n_steps_in, n_steps_out)
        print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

        The result is:

        (14476887, 25, 26) (14476887, 20) (7130386, 25, 26) (7130386, 20)

        The last point is to create and fit the LSTM network:

        n_features = 26

        model = Sequential()
        model.add(LSTM(50, input_shape=(n_steps_in, n_features)))

        A lot of code, sorry for that. Now the short questions:

        I want to predict the last column (column 27) in my csv-file. The first 26 are the input features (columns).

        1.) Where in the codes above do I explicitly define my input features and my output feature?

        2.) Do I have to explicitly use n_features in the code ‘model.add(LSTM(50, input_shape=(n_steps_in, n_features)))’. My aim is to train the model with the input features and the output feature and test it only with the test data without the output feature. The output feature shall be predicted.

        Is the code with n_features = 26 in my case wrong?

        Sorry to bother you with these banal questions, but I do not have enough experience.

        Many thanks in advance.
        Ali
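
A small worked example may help with the two questions above. In the function quoted in this comment, the input features are every column except the last (`[:, :-1]`) and the output is the last column (`[:, -1]`), so the split between inputs and output is defined by column position. The sketch below reuses that logic on a toy dataset; the data and sizes are hypothetical.

```python
import numpy as np

def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :-1])                # input columns
        y.append(sequences[end_ix - 1:out_end_ix, -1])    # output column
    return np.array(X), np.array(y)

in_seq1 = np.arange(10, 100, 10)
in_seq2 = np.arange(15, 100, 10)
dataset = np.column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))   # 9 rows, 3 columns

n_steps_in, n_steps_out = 3, 2
X, y = split_sequences(dataset, n_steps_in, n_steps_out)

# The number of input features can be read from the data instead of hard-coded.
n_features = X.shape[2]
# Expected number of samples: rows - (n_steps_in + n_steps_out - 1) + 1.
n_samples = len(dataset) - (n_steps_in + n_steps_out - 1) + 1
print(X.shape, y.shape, n_features, n_samples)
```

The same formula reproduces the shapes reported above: 14476930 - (25 + 20 - 1) + 1 = 14476887 samples, each with 26 input features. The `-1` appears because the first output step shares its row with the last input step.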

  103. Avatar
    Joe August 21, 2019 at 6:42 pm #

    Hi Jason,

    Thanks for your post.

    I would like to use a network architecture like:

    cnn = Sequential([
        Conv1D(filters=16, kernel_size=4, strides=2, activation='relu', input_shape=(n_steps, n_features)),
        BatchNormalization(),
        MaxPooling1D(pool_size=2)
    ])

    model = Sequential()
    model.add(cnn)
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))

    The reason is that when the true process is path dependent, a longer look-back period should be used, but an LSTM is not very efficient at dealing with a large number of time steps, so I use a CNN to reduce the number of time steps and encode some predictive information.

    Does this make sense to you?
    Do you think pre-training would contribute anything in a stacked network structure?

    Joe

    • Avatar
      Jason Brownlee August 22, 2019 at 6:25 am #

      Don’t put stock into my speculations, perhaps try it and see?

  104. Avatar
    Ahmad August 23, 2019 at 7:28 pm #

    Dear Jason,

    Thank you for your great tutorial. I just have a question:

    As I understood from your explanations, for bidirectional neural networks we need both past and future input data to predict the current time step. So, in the case of a univariate LSTM, when we are going to predict the energy use at the current time, for example, do we need to know the future energy use? This is a bit confusing to me. Would you please explain it?

    Thank you

    • Avatar
      Jason Brownlee August 24, 2019 at 7:47 am #

      No, the future is predicted from the past.

      Or you can frame your prediction problem any way you wish.

      • Avatar
        Ahmad August 24, 2019 at 7:04 pm #

        Thank you for your answer. Can you explain a bit more to make it clear? Because as I just checked the mathematical formulation of Bidirectional RNNs, I see that there is a hidden state of the next time step as the input: ( x(t), h(t-1) and h(t+1) are used to calculate y(t) ).

        So, when there is a hidden state from the next time step as the input, how is it possible to use just the past data in a univariate bidirectional RNN?

        Thank you in advance for your guidance

  105. Avatar
    Amirreza August 24, 2019 at 10:23 pm #

    Actually I applied the bidirectional layer but I got much higher error than typical LSTM network. Is it possible or I am doing wrong?

    When I write 50 neurons it means that each single layer of bidirectional has 50 neurons or it would be the summation of two layers?

    • Avatar
      Jason Brownlee August 25, 2019 at 6:37 am #

      Bidirectional may require more training.

      Each direction has 50.

  106. Avatar
    helloworld August 27, 2019 at 4:31 pm #

    Hi, I have question regarding data normalization (scaling values between specific number such as [0,1]). Should I perform it before making the dataset supervised form as in this example? Or after the supervised form?

    I noticed that if I do after, the columns looks little different from each other because the scaling are done via columns only. Here is example output if done after:

    t-1 t t+1
    -1.000000 -1.000000 -0.870529
    -1.000000 -0.869976 -0.895359
    -0.869976 -0.894799 -0.897133
    -0.894799 -0.896572 -0.901271

    Is this problematic to forecast via LSTM?

  107. Avatar
    Samit August 28, 2019 at 7:26 pm #

    Hi Jason.

    Great Tutorial. I have electronic health record data which has multivariate time series inputs. Is it better to use normal LSTM or bidirectional LSTM for prediction?

    Thanks

  108. Avatar
    Radhouane Baba August 28, 2019 at 8:20 pm #

    Hi Jason,

    I am trying to train my model to forecast 144 data points (1 day at 10-minute intervals, a load forecast for a home) based on 5 days (144*5 values). I have more data, but so far I haven't found a good result, so I am training my model on less data because it takes so long.
    There is a daily seasonality, so I chose n_input to be 144.
    I am varying the batch size from 1 to 6 and the epochs from 25 to 150,
    but each time I get a result, I have one of these problems:
    1- The values converge to a constant (I thought maybe it is underfitting), so I try to reduce the batch size and increase the epochs.
    2- When I do so, I always get a loss value of NaN and then I get no predictions from the model.

    can you please recommend something?

    thank you so much!!!!
    i appreciate it!

  109. Avatar
    Radhouane Baba August 28, 2019 at 8:28 pm #

    Hello Jason,

    i still have another question:

    is it better to forecast 144 values through the dense(144) at once?
    or
    like what i am doing.. i am forecasting only 1 value and then append my history with it:

    history.append(yhat_sequence)

    model.add(Dense(1))

    Thank you so Much!!!

    • Avatar
      Jason Brownlee August 29, 2019 at 6:05 am #

      Perhaps compare a few approaches for your dataset and discover what works best.

  110. Avatar
    Ken August 30, 2019 at 7:15 pm #

    Hello Jason,
    Thanks for this great tutorial and dive into LSTMs.
    For Multiple Parallel Input and Multi-Step Output you also mention that it is possible to use the vector version of LSTMs. I cannot get my head around it how that model should look like.

    model = Sequential()
    model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(LSTM(100, activation='relu', return_sequences=True))
    # What is needed here??? dim to n_steps_out
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')

    The above architecture doesn’t do what I intend. In the end there should be an output of dim (batch_size, n_steps_out, n_features), but what I achieve is (batch_size, 100, n_features) or an error. So how can I make the above architecture work without the encoder-decoder version of your snippets?

    Thanks a lot for all of your hard work

    • Avatar
      Jason Brownlee August 31, 2019 at 6:03 am #

      Perhaps you can use the example in the post as a starting point?

  111. Avatar
    Vishal Saini August 30, 2019 at 8:37 pm #

    Hi Jason,

    Really interesting article!!
    Actually I have a doubt, I am currently trying to forecast sales of business based on the discounting burn. So, The future dependent variable values are usually fixed, is there any code which deals with such a problem.

    Thanks and Regards

  112. Avatar
    Sai Krishna Natesan September 2, 2019 at 6:23 am #

    Dear Jason,

    I have a question about Multivariate LSTM Models.

    In the Multiple Input Series, your input is

    80, 85
    90, 95
    100, 105

    And you’re trying to predict the output of 205.

    In the Multiple Parallel Series, your input is

    70, 75, 145
    80, 85, 165
    90, 95, 185

    And you are trying to predict the output of
    [100, 105, 205]

    My question is, in the first model, you know more information about the output in the past that you are not passing on to the model.

    So, the actual input should be
    80, 85, 165
    90, 95, 185
    100, 105, X
    Where we are trying to predict X

    Similarly, in the second model let us assume that you know the first two fields 100 and 105 and you only want to predict the 205.
    70, 75, 145
    80, 85, 165
    90, 95, 185
    100, 105, X
    Again we are unnecessarily trying to predict some known values.

    Is there a model where I can use all available information from the previous time series and try to predict X?

    I learnt a lot from this post and the above question is something I am trying to answer. Thanks a lot for sharing your knowledge. It is helping us a lot.

    • Avatar
      Jason Brownlee September 2, 2019 at 1:48 pm #

      Yes, you can frame the problem any way you wish.

      In your proposed framing, you could use a new token to indicate missing and then use a Masking input layer.

      Or use a multiple input model with a separate input for the dependent variables and the univariate series that you’re predicting.

      Perhaps experiment and see what model you prefer and what works best for your specific dataset.

  113. Avatar
    Jem September 3, 2019 at 2:13 am #

    Hi Jason,

    I was implementing the cross-validation method for the LSTM Encoder-Decoder model, I wanted to ask you if it is better that at each step I recreate the class or I can use the old one calling the fit method.

    Thanks and Regards

  114. Avatar
    sara j September 4, 2019 at 12:31 am #

    Hi Jason,

    If I have to train my model in such a manner that I have the data like :
    Input are two columns i.e temperature and pressure i.e. the first 25 perc data and output are also two column temperature and pressure i.e. the 75 perc data remaining one.
    My goal is to predict the temperature and pressure together by giving little input and receiving greater output to LSTM
    . If i train my model by giving input [x,y] can I predict [x,y] but I do not want to give time stamp. Which method should I follow?

    I have already prepared my data according to your blog and I am now confused about how to train the model without time steps.

  115. Avatar
    rita September 10, 2019 at 12:20 am #

    Hi Jason,

    Why have you not used the MinMaxScaler here when training on the input sequence in the LSTM model?

    • Avatar
      Jason Brownlee September 10, 2019 at 5:50 am #

      Good question, I skipped scaling to keep the example simpler – e.g. for brevity.

  116. Avatar
    rita September 10, 2019 at 6:08 pm #

    Thank you very much and I have one more question if I have 200000 data points and I have to make time steps for them maybe dividing the data into 5 time steps and giving 40,000 points in each of the timestep for LSTM will it be a good training? or you can suggest something for this? So, that I can prepare the data properly.

    I have a multivariate data of 2 variables and want to predict both of them. So, basically 2 inputs and 2 outputs but do I have to make them supervised first as they are temperature and viscosity and they are dependent on each other with respect to time.

    So, should I supervise them first or I can directly use multivariate time series for the prediction by dividing the data into 5 time steps and predicting 2 outputs.

    Do you provide any consultations also?

  117. Avatar
    Himanshi September 13, 2019 at 5:28 pm #

    Hi,

    can you please tell me how to visualize the results. As, when I am reshaping the array it is not able to get reshaped into 2 dimension from 3D.

    Thank you and have a nice day!

  118. Avatar
    christina September 16, 2019 at 7:52 pm #

    I think I did not reframe my question properly. My question is for example: I trained my LSTM model with 300 n_step_in and 300 n_steps_out. Now, after the training, yhat has a shape (20000, 300,2) . So, when I am reshaping it to 2D so as to see the results it is giving me an error and is not able to reshape it back.
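
For the reshaping problem described here, a 3D prediction array can be brought down to 2D in several ways, depending on what should be inspected. The sketch below uses a smaller hypothetical `yhat` with the same layout, [samples, output steps, features]. Which option is appropriate is an assumption about the intended plot.

```python
import numpy as np

# Hypothetical predictions: 20 samples, 300 output steps, 2 features.
yhat = np.zeros((20, 300, 2))

# Option 1: one feature across all samples -> (samples, steps).
first_feature = yhat[:, :, 0]

# Option 2: one sample with all its features -> (steps, features).
first_sample = yhat[0]

# Option 3: flatten steps and features into one axis -> (samples, steps * features).
flattened = yhat.reshape(yhat.shape[0], -1)

print(first_feature.shape, first_sample.shape, flattened.shape)
```

A reshape fails only when the requested shape does not contain the same total number of elements, so passing `-1` for one axis lets NumPy work out that dimension.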

  119. Avatar
    sx September 17, 2019 at 3:54 pm #

    Hi can i add an extra layer under this one and if yes how should i do that?
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))

    Thanks in advance.

  120. Avatar
    bhavna September 17, 2019 at 6:21 pm #

    Hi, can you please tell me is this type of prediction only suitable for sequential data?

  121. Avatar
    suraj September 17, 2019 at 8:09 pm #

    Hi Jason,

    If I have unsupervised data and I make it supervised for the training in the LSTM model.
    My question is that when we make the data supervised and we give input data points and we predict the output data points, but the output is just the n+1 point of input and at last we are only predicting 1 point from the whole data. Basically we are giving the model all the points in the training only. What is the model actually doing?

    • Avatar
      Jason Brownlee September 18, 2019 at 6:05 am #

      The model learns a function that takes input points and predicts the next point.

      • Avatar
        Suraj September 18, 2019 at 4:57 pm #

        but what if I want the model to get not all data as input points and just few input points to predict the remaining data? then what strategy is used?

        • Avatar
          Jason Brownlee September 19, 2019 at 5:52 am #

          You control what data goes in and out of the model.

          Prepare the data you want to feed in and make a prediction.

          The examples above will provide a template you can use to start with and adapt for your problem.

  122. Avatar
    sx September 18, 2019 at 12:24 am #

    Yes i want to apply it at time series

  123. Avatar
    Peter September 26, 2019 at 5:52 pm #

    Sorry you are talking about time series, what if there is a date with time (I didn’t see the feature of date and time in your created data)

    • Avatar
      Jason Brownlee September 27, 2019 at 7:47 am #

      Date and time are removed from the dataset and the series of observations is worked with directly.

  124. Avatar
    Luis September 29, 2019 at 9:32 am #

    Hi Jason,

    I have really enjoyed many of your articles over the last half year. Question on your output vector model using stacked LSTM model. Under the hood, what type of architecture is being used here for 3 input time-steps and 2 output time-steps. I’m sure it is a many-to-many problem, but can you help me with the exact visual connection? Is the first output time-step laid out directly over the second time-step of the input series?

  125. Avatar
    Al October 4, 2019 at 8:57 pm #

    Hi Jason, thanks for your great posts and prompt replies. On a Multi-Step LSTM Models when I loaded my dataset I first noticed that the number of steps should be a number divisible by the length of the dataset (i.e. if my data is 1239 rows, a step in number of 59 is suitable since 1239/59 = 21). In fact trying with non-divisible numbers assigned to n_steps_in would result in nan loss values when fitting the model. I was indeed able to run all the way 50 epochs using 59 over 1239, however something I cannot explain happened: after re-running the code without making any changes, the loss on the various epochs (after setting the verbose to 1) jumped back to nan. Running it again it would start populating some values and along the way end up in nan.. It is very erratic and unpredictable and to end up all epochs looks like a lucky test, Could you help me to understand what is wrong? Thanks!

    • Avatar
      Jason Brownlee October 6, 2019 at 8:09 am #

      Yes, it might help to scale your data prior to modeling.

      • Avatar
        Al October 7, 2019 at 10:52 pm #

        Yes, you are correct, as always. After scaling, the loss no longer returned nan, and each epoch also ran faster. Thanks Jason!
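A minimal sketch of the scaling step suggested in this thread, assuming a univariate series and min-max normalization. The helper names and sample values are made up for illustration and are not from the tutorial. The minimum and maximum are taken from the training data and kept, so predictions can be mapped back to the original units.

```python
import numpy as np

def minmax_fit(series):
    """Return the (min, max) of the training series."""
    series = np.asarray(series, dtype="float64")
    return series.min(), series.max()

def minmax_transform(series, lo, hi):
    """Map values into [0, 1] using the stored training min and max."""
    return (np.asarray(series, dtype="float64") - lo) / (hi - lo)

def minmax_inverse(scaled, lo, hi):
    """Map scaled values (e.g. model predictions) back to original units."""
    return np.asarray(scaled, dtype="float64") * (hi - lo) + lo

# hypothetical raw observations with a large range
raw = np.array([1200.0, 1350.0, 990.0, 1500.0, 1420.0])
lo, hi = minmax_fit(raw)
scaled = minmax_transform(raw, lo, hi)      # scale BEFORE splitting into samples
restored = minmax_inverse(scaled, lo, hi)   # undo the scaling on outputs
print(scaled.min(), scaled.max())
```

Large raw values combined with relu activations are a common source of nan losses, which is likely why scaling helped here.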

  126. Avatar
    Addi October 7, 2019 at 1:32 am #

    Thanks Jason. I apologize if this was addressed somewhere in the list of comments but in the case of predicting a continuous variable, how would you compare the performance of LSTM vs. another algorithm such as Random Forest?

    Other than comparing the actual value vs. predicted value from both models, is there a separate way to assess accuracy of both models?

  127. Avatar
    ABDULKARIM GIZZINI October 8, 2019 at 7:43 pm #

    Thanks Jason,
    all your work is clear! Thank you very much. I have some questions, if you please. What are the differences between all the LSTM models you applied above? Is there any performance trade-off between them? I ask because you repeat that any of them can be used for time series forecasting.
    On the other hand, I’m working in the domain of wireless channel prediction. It’s a complex-number problem. Can I split it into real and imaginary parts, apply your LSTM models to each part separately, and then concatenate the outputs?

    • Avatar
      Jason Brownlee October 9, 2019 at 8:10 am #

      Good question.

      Not so much a performance trade-off as different framings of the problem, or different problem types.

      The goal was to show you how flexible the method is and that you should adapt it to your problem, not your problem to the method.

      Not sure about imaginary numbers in neural nets or Keras, sorry.

  128. Avatar
    Mayank Prakash October 9, 2019 at 10:46 pm #

    I wanted to know how to approach this problem. Let’s say we have a time series with 2 features, indexed from 0 to n:
    [a0, b0], [a1, b1], [a2, b2] up to [an, bn]
    The input/output framing of the series would be:
    [a0, b0], [a1, b1], [a2, b2] -> [a3]

    The issue is that [b3] also plays an important role in determining [a3].

    My question is how to incorporate this, so that I can feed a0, a1, a2, b0, b1, b2, b3 into the model and predict [a3].

  129. Avatar
    Yawar Abbas October 10, 2019 at 4:30 am #

    Great tutorial.
    I have a question about using an LSTM model for a time series forecasting problem. I have a dataset with four input features, like 78, 153.23, 77.25, 4.33.
    The first input progresses like 78, 80, 87, 96… and so on.
    The other inputs progress steadily, like 77.25, 77.35, 77.40…
    I have used an LSTM model with one previous timestamp as input to predict the next timestamp. It predicts the last three inputs well but the first one poorly, i.e.
    Actual: 78, 153.23, 77.25, 4.33
    Predicted: 82, 153.01, 77.02, 4.12
    How can I tune this model to get a good result for the first input?

  130. Avatar
    ovi95 October 12, 2019 at 12:01 am #

    Hi Jason,
    I want to build a model to predict the inflow to a reservoir from past rainfall data, temperature data, and past inflow data.
    I want the model to predict the inflow for a week ahead (7 time steps) when given the past week’s rainfall and temperature data.
    What model should I use for this?

  131. Avatar
    Ulia October 17, 2019 at 6:34 pm #

    Hi Jason,

    Can I put time in the X axis to predict wind speed on Y axis?

    Best Regards

  132. Avatar
    James A October 21, 2019 at 6:43 am #

    Hi Jason,

    In the “Multiple Parallel Input and Multi-Step Output” example, you stated that it could be done with the vector output method, or the encoder/decoder, and proceeded to demonstrate the encoder/decoder.

    I’ve been wondering how the example would look in vector output form. Would the target, y, for each sample need to be merged into a single 1D array, or vector?

    For example,
    If y for one sample looks like:
    [a1,b1,c1],
    [a2,b2,c2],

    [an,bn,cn]

    Would we reshape it into something that looks like this?
    [a1,b1,c1,a2,b2,c2,…,an,bn,cn]

    • Avatar
      Jason Brownlee October 21, 2019 at 1:38 pm #

      Probably one long 1D vector with all time steps that you can then choose to interpret any way you wish (e.g. by the structure of the expected/target y).
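The flattening described in this reply can be sketched with NumPy. This assumes the targets are shaped [samples, output steps, features]; the sizes and values below are illustrative only.

```python
import numpy as np

n_samples, n_steps_out, n_features = 4, 2, 3
# hypothetical targets shaped [samples, output steps, features]
y = np.arange(n_samples * n_steps_out * n_features).reshape(
    (n_samples, n_steps_out, n_features))

# flatten each sample to [a1, b1, c1, a2, b2, c2], matching an output
# layer with n_steps_out * n_features nodes
y_flat = y.reshape((n_samples, n_steps_out * n_features))

# a prediction with the same flat layout can be folded back for interpretation
y_back = y_flat.reshape((n_samples, n_steps_out, n_features))
print(y_flat.shape, y_back.shape)
```

With this layout the output layer would need n_steps_out * n_features nodes, and each predicted vector is reshaped the same way before it is read.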

  133. Avatar
    Julien Loutre October 21, 2019 at 12:27 pm #

    I’ve set up all the examples in a Google Colab: https://colab.research.google.com/drive/16nsMXFDmzgdpsSY_p1ZljN5ZDzq9u6jY#scrollTo=xgSwSfpE3-GO&forceEdit=true&sandboxMode=true

  134. Avatar
    Kannu October 22, 2019 at 12:55 am #

    hey,

    How can we see the root mean squared error during the training of the model here?

    Best Regards,
    Kannu
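One way to get at this, offered as a sketch: the tutorial models are compiled with loss='mse', so the square root of the reported loss gives an approximate RMSE per epoch. The numbers below are made-up stand-ins for the per-epoch losses that Keras records in history.history['loss'], and the rmse helper is a hypothetical name.

```python
import numpy as np

# hypothetical per-epoch MSE values, standing in for history.history['loss']
mse_per_epoch = [2500.0, 400.0, 100.0, 25.0]
rmse_per_epoch = np.sqrt(mse_per_epoch)

def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of the same shape."""
    y_true = np.asarray(y_true, dtype="float64")
    y_pred = np.asarray(y_pred, dtype="float64")
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# RMSE of a hypothetical two-step forecast against the expected values
score = rmse([100.0, 110.0], [103.0, 106.0])
print(rmse_per_epoch, score)
```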

  135. Avatar
    Vishnu Suresh October 24, 2019 at 3:55 am #

    Hello Jason,

    I am trying to use the CNN-LSTM for forecasting

    The split sequences gives an output of

    (175196, 4, 4) (175196, 1)

    Where 175196 is the samples, 4 is number of steps and 4 is the features ( variables)

    Then i reshape the input vector as directed in the tutorial, but when i run the model

    I get this error:

    at: TypeError Traceback (most recent call last)
    in ()
    22 model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    23 model.add(TimeDistributed(Flatten()))
    ---> 24 model.add(LSTM(50, activation='relu'))
    25 model.add(Dense(1))
    26 model.compile(optimizer='adam', loss='mse')

    TypeError: while_loop() got an unexpected keyword argument 'maximum_iterations'

    I know it is hard to debug in this manner, but do you have any idea what could be wrong here?

    • Avatar
      Jason Brownlee October 24, 2019 at 5:46 am #

      Do other Keras examples work for you?

      Is it possible that there is a fault with your Keras/TF installation?

      • Avatar
        Vishnu Suresh October 24, 2019 at 6:04 am #

        Yes, other Keras examples work: CNN, Multi-Headed CNN, etc.

        • Avatar
          Vishnu Suresh October 24, 2019 at 8:54 am #

          You were right 🙂 I updated to a higher version of TensorFlow and Keras and it worked! Thanks!

        • Avatar
          Jason Brownlee October 24, 2019 at 1:59 pm #

          That is surprising, not sure I have good advice sorry.

          Perhaps try simplifying the example to see what the cause of the fault could be on your workstation?

  136. Avatar
    FRI October 24, 2019 at 9:31 pm #

    Hi Jason,
    Thank you for this interesting article. Can I create one model for all sites with an LSTM? That is, if we have, for example, a group of people and each person has their own time series data with different features, can an LSTM model learn from all of these time series at once?
    Best regards

  137. Avatar
    Sarveswara Rao October 30, 2019 at 10:22 pm #

    Hi Jason, how do we choose n_steps in split_sequence()? Should we treat n_steps as a hyperparameter, or can it be set by a statistical test? Thank you for your work, Jason. I have been following your site for the past 2 years. Your content is the best in the ML community.

    • Avatar
      Jason Brownlee October 31, 2019 at 5:29 am #

      A hyperparameter.

      Thanks, I deeply appreciate your support!

  138. Avatar
    jasper November 9, 2019 at 3:32 pm #

    Hi Jason,
    I have some questions about the LSTM model.
    First, the definition of the LSTM input x. In the time series forecasting case, we divide the input data into portions according to the batch size parameter. These 2D portions are then transformed into 3D tensors and fed to the model for training. After all portions have been fed to the model and forward/backward propagation is complete, one epoch is finished. My question is: at input time x[t], does the LSTM input x refer only to the first portion of the data or to all portions?
    Second, what is the definition of the LSTM_unit parameter? My understanding is that it is the number of elements in the LSTM input vector x. For example, with 10 inputs, LSTM_unit should be 10 to capture the whole input vector. But it does not always seem to require higher numbers such as 20, and so on.
    Third, is there any “feature importance” example for LSTMs? I have been looking for one and am quite frustrated at the moment. Could LSTM and XGBoost have a sample feature importance result?

    Many thanks

  139. Avatar
    mingkai November 11, 2019 at 7:31 pm #

    Hi Jason, I ran the first example, but it failed. It shows: TypeError: Input 'b' of 'MatMul' Op has type float32 that does not match type int32 of argument 'a'. Do you know what the problem is?

    • Avatar
      Jason Brownlee November 12, 2019 at 6:36 am #

      Sorry to hear that, I have some suggestions here:
      https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me

      • Avatar
        mingkai November 14, 2019 at 2:03 pm #

        Thank you, Jason. I have solved the problem. The reason is that I installed TensorFlow 2.0 + Keras 2.2.4, and these two do not match, so I used tensorflow.keras instead of keras. I added the line “x_input = x_input.astype('float32')” to the code, and it ran smoothly. Another way is to install TensorFlow version 1.15.0, with which no problem occurs.

        • Avatar
          Jason Brownlee November 15, 2019 at 7:41 am #

          Happy to hear that you solved the problem.

          You can use Keras 2.3 with TensorFlow 2.0, or Keras 2.2 with TensorFlow 1.15.
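The dtype fix reported in this thread can be illustrated without Keras. This is a small sketch, assuming the prediction input is built from integer literals as in the tutorial examples; the variable x_float is an illustrative name.

```python
import numpy as np

# built from integer literals, so NumPy infers an integer dtype
x_input = np.array([70, 80, 90])
print(x_input.dtype)

# cast to float32 before reshaping to [samples, timesteps, features]
x_float = x_input.astype("float32")
x_float = x_float.reshape((1, 3, 1))
print(x_float.dtype, x_float.shape)
```

Casting up front keeps the input type consistent with the float32 weights of the model, which is what the MatMul error above complains about.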

  140. Avatar
    Bonnard November 12, 2019 at 7:43 pm #

    Hi Jason,

    I have a problem to model, and I think LSTM networks are the best-suited models for it.

    I want to predict the true trajectory of an airplane before it takes off. I have two sets of data: the first is the trajectories announced before departure (the fake), and the second is the trajectories announced after landing (the true).
    I want to predict the true, given the fake.

    I have a list of arrays; each array represents a flight made by a plane. Each flight is described by several variables, and after interpolation I have 50 observation points per flight.
    At each point we observe a vector of our variables, like latitude, longitude, etc.
    Let’s assume I have N such variables.
    I have 2200 flights, so my input data is an array of shape (2200, 50, N).

    I already tried a small model, but oddly the model seems to follow the fake trajectory and not the true one.
    Do you have an idea of what architecture I can use?

    Thanks a lot

    • Avatar
      Jason Brownlee November 13, 2019 at 5:40 am #

      Perhaps test a suite of different approaches and discover what works best for your specific dataset?

      • Avatar
        Bonnard November 13, 2019 at 7:45 pm #

        Yes, this is what I am doing, but maybe you can help with the last layer; I think the error comes from there.
        As I said, I have an array of shape (50, N), which represents a flight with 50 points and N features, and I want to predict an array of shape (50, 2), which is 50 points of (latitude, longitude).

        I cannot use a Dense layer at the end of the model because it does not return the right shape.

        • Avatar
          Jason Brownlee November 14, 2019 at 8:01 am #

          Encoder-decoder with 2 nodes in the output layer and 50 in the repeat vector layer – this would achieve the desired output.
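The shapes implied by this reply can be traced with plain NumPy. This is only a shape walk-through under stated assumptions, not a trained model: random numbers stand in for the encoder output, an identity stands in for the decoder LSTM, and the sizes for batch and units are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_steps_out, units, n_outputs = 4, 50, 8, 2

# stand-in for the encoder LSTM output: one vector per flight
encoded = rng.normal(size=(batch, units))

# what a repeat-vector layer of length 50 does: copy the encoding once per output step
repeated = np.repeat(encoded[:, np.newaxis, :], n_steps_out, axis=1)

# a decoder LSTM that returns sequences keeps the time axis;
# an identity is used here purely to follow the shapes
decoded = repeated

# a 2-node dense layer applied at every time step
W = rng.normal(size=(units, n_outputs))
b = np.zeros(n_outputs)
output = decoded @ W + b
print(output.shape)
```

The final array has one (latitude, longitude) pair for each of the 50 points of each flight, which is the (50, 2) target shape asked about.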

  141. Avatar
    Marlon November 13, 2019 at 12:51 am #

    Hello,

    Thanks for your tutorials; they are amazing! I’ve run into the following pitfall while implementing your ideas: I use your “split_sequences” to prepare the network input, then train my network and save the model. When I feed the same input to the trained model and plot the result, I get a very weird plot, like lines plotted over each other many times. Do you know what my problem is?

  142. Avatar
    Michaela November 13, 2019 at 6:27 am #

    Hi Jason,

    I’m building a Multiple Parallel Input and Multi-Step Output model, and I’m curious why you repeat the same LSTM output in model.add(RepeatVector(n_steps_out))? The alternative that I was thinking is using the keras functional API, training n_steps_out LSTMs from the input, concatenating the output of these LSTMs, and feeding it into the next LSTM. so it would look something like this

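    from tensorflow.keras.layers import Input, LSTM, Dense, Reshape, Concatenate, TimeDistributed
    from tensorflow.keras.models import Model

    inputs = Input(shape=(n_steps_in, n_features))
    branches = []
    for _ in range(n_steps_out):
        # one LSTM per output step, reshaped to [1, 200] so the branches can be joined along time
        branches.append(Reshape((1, 200))(LSTM(200, activation='relu')(inputs)))
    x = Concatenate(axis=1)(branches)
    x = LSTM(200, activation='relu', return_sequences=True)(x)
    x = TimeDistributed(Dense(n_features))(x)
    model = Model(inputs, x)
    model.compile(optimizer='adam', loss='mse')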

    The biggest drawback that I can see is there will be a ton more parameters, but are there other issues that I’m missing? For instance, does this get rid of some relationship between the different timesteps that the previous model maintains better?

    Thanks!

    • Avatar
      Jason Brownlee November 13, 2019 at 1:41 pm #

      The reason is because it is an encoder-decoder model where the same encoding of the input is used in the generation of each output time step.

      Perhaps try it and see? It’s good to test a suite of different models in order to discover what works best for your specific dataset.

  143. Avatar
    fan November 19, 2019 at 9:01 pm #

    Dear Jason,

    Thanks for the tutorial, that is very helpful! However, I am having a hard time understanding the input shape given in the CNN LSTM example below:

    X, y = split_sequence(raw_seq, n_steps)
    # reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
    n_features = 1
    n_seq = 2
    n_steps = 2
    X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
    # define model
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))

    Here, X is first reshaped into 4 dimensions; however, the input_shape defined for the Conv1D layer has 3 dimensions. Does the None in “input_shape=(None, n_steps, n_features)” refer to the “n_seq” dimension of X or to the number of samples in X?
    And then, the data used to fit and predict is again 4-dimensional…
    Could you please explain a bit? I am really confused…

    thanks a lot!

    • Avatar
      Jason Brownlee November 20, 2019 at 6:13 am #

      Yes, the CNN must process sub sequences and then groups of processed subsequences are passed to the LSTM.

    • Avatar
      Fang He December 11, 2023 at 8:51 pm #

      Actually, each sample of X has 3 dimensions (n_seq, n_steps, n_features), and the model accepts one sample of X at a time in this CNN-LSTM case.

      I think the None refers to n_seq; the n_seq dimension is handled by TimeDistributed(), so None stands in for that first dimension.
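The reshape discussed in this thread can be checked with NumPy alone. A minimal sketch, assuming the tutorial's contrived series and a window of 4 time steps that is split into 2 subsequences of 2 steps each; X4 is an illustrative name.

```python
import numpy as np

def split_sequence(sequence, n_steps):
    """Split a univariate sequence into [samples, n_steps] inputs and next-value targets."""
    X, y = [], []
    for i in range(len(sequence)):
        end_ix = i + n_steps
        if end_ix > len(sequence) - 1:
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix])
    return np.array(X), np.array(y)

raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
X, y = split_sequence(raw_seq, 4)            # each sample has 4 time steps

# 4 time steps per sample become 2 subsequences of 2 steps with 1 feature
n_seq, n_steps, n_features = 2, 2, 1
X4 = X.reshape((X.shape[0], n_seq, n_steps, n_features))
print(X.shape, X4.shape)
print(X4[0, :, :, 0])
```

The samples axis is never listed in input_shape, so the three entries (None, n_steps, n_features) describe one sample, with None standing for the number of subsequences.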

  144. Avatar
    Arjun November 21, 2019 at 8:34 pm #

    Hi jason,
    What if we had a dataset with a year of daily sales data and wanted to predict, for example, 10 days of sales based on the previous 30 days? What form should the output take? And what is the code for getting the predicted values? Is it model.predict(X_test)?

  145. Avatar
    Arne November 22, 2019 at 12:58 am #

    Hey Jason, I am halfway through and reading this stuff is pure joy! Thank you for your tremendous efforts and for making this available! I’ve become an instant fan of your site.

  146. Avatar
    jasper November 25, 2019 at 1:15 am #

    Hi Jason,

    One practical question about LSTMs. If the input variables have different ranges, how should the LSTM forecast model deal with that? For example, if input vector one spans 0~100 and vector two spans 0~0.5, could we still put these two input vectors together to fit the model? I use the SHAP package to analyze the weights. In this case, vector one always comes out much stronger than vector two. From a mathematical point of view, this result is correct. What do you think about this case?

    jasper

  147. Avatar
    Saeed November 27, 2019 at 1:44 pm #

    Hi Jason,

    Thank you for such a detailed explanation. I am having an issue with scaling data for a multi-step multivariate LSTM problem. I am using the data from the last 14/21 days to predict the next 7 days. Can you please give me an idea of the proper way to scale data using MinMax for this type of problem? I am lost in the shapes of the matrices.

    • Avatar
      Jason Brownlee November 27, 2019 at 1:51 pm #

      Thanks, I’m happy it helped!

      Yes, see this:
      https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/

      • Avatar
        Saeed November 27, 2019 at 3:38 pm #

        Thank you. I know how scaling works and I have implemented it for single-step forecasting. However, when it comes to multi-step, we split the data and it becomes 3-dimensional after using the split_sequences function, which means we have 3D arrays for X and y.
        The scaler doesn’t work on 3D arrays.

        If I scale before splitting, I end up with array dimensions that I can’t recover after prediction, and so I am stuck, unable to apply inverse_transform. I would appreciate your help in this matter.

        • Avatar
          Jason Brownlee November 28, 2019 at 6:31 am #

          Yes, it is sticky. You may have to write some custom code as the libraries don’t accommodate it.

          Perhaps try using relu and no scaling, at least as a starting point.
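One form the custom code mentioned above could take, as a sketch only: collapse the 3D array to 2D so each column is one feature, scale per feature, and restore the original shape. The data and variable names are invented for illustration, and in practice the minimum and maximum would be computed from the training portion only.

```python
import numpy as np

# hypothetical windows shaped [samples, timesteps, features];
# feature 0 spans 0..100 and feature 1 spans 0..0.5
X = np.array([
    [[0.0, 0.00], [50.0, 0.25]],
    [[50.0, 0.25], [100.0, 0.50]],
])
n_features = X.shape[-1]

# collapse samples and time steps so that each column is one feature
flat = X.reshape(-1, n_features)
lo, hi = flat.min(axis=0), flat.max(axis=0)

# scale per feature in 2D, then restore the original 3D shape
X_scaled = ((flat - lo) / (hi - lo)).reshape(X.shape)

# scaled predictions shaped [samples, n_steps_out, features] invert the same way
yhat_scaled = np.array([[[0.5, 0.5], [1.0, 0.0]]])
yhat = (yhat_scaled.reshape(-1, n_features) * (hi - lo) + lo).reshape(yhat_scaled.shape)
print(X_scaled.shape, yhat)
```

The same reshape to 2D and back also works around a library scaler that only accepts 2D input, as long as the last axis holds the features.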

  148. Avatar
    juntao December 19, 2019 at 2:25 pm #

    Hi Jason,
    I want to introduce the attention mechanism to the Encoder-Decoder model for a regression problem (with multiple inputs). Is there any other article that can help me solve this problem?

  149. Avatar
    husfe December 25, 2019 at 8:27 pm #

    Hi Jason.
    Is there a simple method to add attention to the Encoder-Decoder model in this article?
    I’ve tried to use the AttentionWrapper class to achieve it, but I failed; it’s hard for me to do in a short time. Can you give me some guidance?
    Thank you!

    • Avatar
      Jason Brownlee December 26, 2019 at 7:38 am #

      The TensorFlow 2 API provides attention layers.

  150. Avatar
    San December 26, 2019 at 10:33 pm #

    Hi Jason,

    Thank you so much for this valuable tutorial. Really appreciate it.

    Jason, I’m a bit new to DL with RNNs. I have two small doubts to clear up. In my problem I want to predict how many steps (i.e. step counts) a participant will walk tomorrow based on their previous step counts. For this we have collected step counts from a large number of participants for n days.

    Is this a univariate problem, where each participant’s step counts are taken as a univariate sequence to train the model? And do you think an RNN is a good approach to this problem?

    Do I have to scale each participant’s step count sequence (using each participant’s current mean and SD), or can I use the raw counts?

    Thank you so much in advance again. All the best for your future work too!
    San

  151. Avatar
    Abhishek Singhal January 5, 2020 at 4:06 am #

    Thanks a lot Jason for sharing such a knowledgeable article,

    I have a doubt in my case,
    for the last example, Multiple Parallel Input and Multi-Step Output.

    I am trying to predict the next 6 or 12 hours of data. For now I am predicting the next 6 hours, training with n_steps_in = 72 and n_steps_out = 6 with 6 features,
    but I am getting nan as output.

    Please see if I am doing something wrong.

    def split_sequences(sequences, n_steps_in, n_steps_out):
        X, y = list(), list()
        # progress_timer is the commenter's own helper (not shown)
        pt = progress_timer(description='Split Sequences', n_iter=len(sequences))
        for i in range(len(sequences)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the dataset
            if out_end_ix > len(sequences):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
            X.append(seq_x)
            y.append(seq_y)
            pt.update()
        pt.finish()
        return array(X), array(y)

    dataset = df_104902.values
    # choose a number of time steps
    n_steps_in, n_steps_out = 72, 6
    # convert into input/output
    X, y = split_sequences(dataset, n_steps_in, n_steps_out)
    # the dataset knows the number of features, e.g. 2
    n_features = X.shape[2]
    # define model
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_steps_in, n_features)))
    model.add(RepeatVector(n_steps_out))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')
    # fit model
    model.fit(X, y, epochs=30, verbose=0)
    # demonstrate prediction
    x_input = array(df_104902[-72:])
    x_input = x_input.reshape((1, n_steps_in, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(yhat)

    Output is coming-

    [[[nan nan nan nan nan nan]
    [nan nan nan nan nan nan]
    [nan nan nan nan nan nan]
    [nan nan nan nan nan nan]
    [nan nan nan nan nan nan]
    [nan nan nan nan nan nan]]]

    • Avatar
      Jason Brownlee January 5, 2020 at 7:08 am #

      nan output is not good.

      Perhaps check the scale of your input data and normalize or standardize prior to fitting the model?

  152. Avatar
    Abhishek Singhal January 5, 2020 at 4:14 am #

    also my X and y shape is – (20875, 72, 6) and (20875, 6, 6) respectively.

    and x_input is (1, 72, 6)

  153. Avatar
    Willian Alcoser January 11, 2020 at 1:57 am #

    Greetings Jason, a question: how can I validate the prediction method? I have seen in other examples that the series is split into two parts, training and test, but that is not done here. Why is that?
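The train/test split this comment describes can be added to any of the examples. A minimal sketch, assuming the samples are kept in time order and the most recent ones are held out; the 80/20 ratio, the series, and the variable names are illustrative choices, not taken from the tutorial.

```python
import numpy as np

series = np.arange(10, 210, 10)   # hypothetical series: 10, 20, ..., 200
n_steps = 3

# windowed samples: 3 inputs -> the next value
X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
y = series[n_steps:]

# chronological split with no shuffling, so the test set is strictly later in time
n_train = int(len(X) * 0.8)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]
print(X_train.shape, X_test.shape)
```

The model would then be fit on X_train and y_train only, and its predictions for X_test compared against y_test.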

  154. Avatar
    Mehdi January 17, 2020 at 12:06 pm #

    Dear Jason,
    First of all, thank you so much for your time and great contents.
    Second, I have studied your website for a long time. I have a question: I have developed a model which predicts share prices, and my model can also predict the X_test data. Now how can I forecast future sequences that have not happened yet?

    • Avatar
      Jason Brownlee January 17, 2020 at 1:50 pm #

      You’re welcome.

      Call model.predict(newData) to make predictions on new data.

      • Avatar
        Mehdi January 19, 2020 at 4:18 am #

        newData is not available, i.e. the future days have not happened yet. How do I prepare them for the model?

        • Avatar
          Jason Brownlee January 19, 2020 at 7:20 am #

          You must design and train your model based on the data you will have available at the time a prediction is required.

          For example, if you have 7 days prior data at the time of prediction when predicting the next week, then design your model around that and train it on that type of data.

          Then when you start using your model on new data, you will have the data available.
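A sketch of what this looks like in practice, assuming a model trained on windows of 7 prior observations: the input for a forecast beyond the end of the data is simply the most recent 7 observed values. The series and names are invented, and the predict call is left as a comment because no model is defined here.

```python
import numpy as np

n_steps, n_features = 7, 1

# hypothetical observed history up to today
history = np.array([12.0, 15.0, 14.0, 13.0, 16.0, 18.0, 17.0, 19.0, 21.0, 20.0])

# the latest n_steps observations form the input for the next forecast
x_input = history[-n_steps:].reshape((1, n_steps, n_features))
print(x_input.shape)

# with a fitted Keras model (not defined in this sketch):
# yhat = model.predict(x_input, verbose=0)
```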

  155. Avatar
    Mehdi January 19, 2020 at 5:28 pm #

    Dear Jason,
    Thank you so much for your time and attention. I will try your approach.

  156. Avatar
    Pietro FUSCO January 21, 2020 at 3:07 am #

    Dear Jason,
    Thank you so much for your time and attention
    I was wondering if I can use time as a univariate sequence.

    Regards

  157. Avatar
    Adonis El Hajj January 23, 2020 at 1:49 am #

    Hello Jason,

    My model will learn from past forecast data and past actual AC power data.
    My input is the forecast for the next 7 days as a CSV file.
    My goal is to predict the AC power data based on the input.
    I don’t know how to apply what I want to your model here.
    Can you please help me?

  158. Avatar
    Anshu Shah January 25, 2020 at 6:30 pm #

    Thank you so much. I was struggling to understand LSTM.
    Your work helped me a lot.

    • Avatar
      Jason Brownlee January 26, 2020 at 5:15 am #

      You’re welcome, I’m happy to hear that.

      • Avatar
        Patrick February 7, 2020 at 11:13 pm #

        Dear Jason,

        Thank you for your contributions. You have helped me a lot in getting started with deep learning.
        I have a question. I am working on a model and, surprisingly, the predicted output shape is different from the target shape of the training data:
        Training: X (12000, 12, 8), Y (12000,)
        Test: X (3000, 12, 8); Y (3000,)
        pred = model.predict (X (3000, 12, 8))
        and the pred shape is (3000, 12, 1), but I was expecting (3000,).
        What am I doing wrong?
        Please help me.

        • Avatar
          Jason Brownlee February 8, 2020 at 7:13 am #

          Perhaps double check the structure of your model, e.g. the output layer/model.

  159. Avatar
    wang February 3, 2020 at 5:49 pm #

    Dear Jason,

    Thanks for the tutorial, that is very helpful! However, when I apply data normalization to the input data (10, 20, 30…) and run your Multi-Step LSTM model, an error occurs. I don’t know how to resolve it. Please see the program below. Thanks!

    from numpy import array
    from keras.models import Sequential
    from keras.layers import LSTM
    from keras.layers import Dense
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    import collections

    # split a univariate sequence into samples
    def split_sequence(sequence, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequence)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the sequence
            if out_end_ix > len(sequence):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence
    training_set = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])

    training_set = training_set.reshape(-1,1)
    from sklearn.preprocessing import MinMaxScaler
    sc = MinMaxScaler(feature_range = (0, 1))
    raw_seq = sc.fit_transform(training_set)

    print(raw_seq)

    # choose a number of time steps
    n_steps_in, n_steps_out = 3, 2
    # split into samples
    X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
    # reshape from [samples, timesteps] into [samples, timesteps, features]
    n_features = 1
    X = X.reshape((X.shape[0], X.shape[1], n_features))

    # define model
    model = Sequential()
    model.add(LSTM(40, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(LSTM(40, activation='relu'))
    model.add(Dense(n_steps_out))
    model.compile(optimizer='adam', loss='mse')
    # fit model
    print('X: \n',X)
    print('y: \n',y)
    model.fit(X, y, epochs=60, verbose=0)

    # demonstrate prediction
    #x_input = array([70, 80, 90])
    x_input = np.array([70, 80, 90])
    x_input= x_input.reshape(-1,1)
    x_input = sc.transform(x_input)
    x_input = x_input.reshape((1, n_steps_in, n_features))
    yhat = model.predict(x_input, verbose=0)
    yhat = sc.inverse_transform(yhat)
    print(100,110)
    print(yhat)

  160. Avatar
    wang February 4, 2020 at 3:09 pm #

    Thank you so much.
    I have resolved the problem.
    Thanks for your tutorial.
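The thread does not say how the error was resolved. One possible cause, offered here as an assumption rather than the confirmed fix: a scaler fitted on a single feature returns a column of shape (n, 1), so splitting it yields targets of shape [samples, n_steps_out, 1], while an output layer with n_steps_out nodes produces [samples, n_steps_out]. The sketch below reproduces only the shapes, with plain NumPy arithmetic standing in for the scaler.

```python
import numpy as np

def split_sequence(sequence, n_steps_in, n_steps_out):
    """Split a sequence into input windows and multi-step targets."""
    X, y = [], []
    for i in range(len(sequence)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequence):
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix:out_end_ix])
    return np.array(X), np.array(y)

# a column vector in [0, 1], the shape a scaler returns for one feature
scaled_2d = (np.arange(10.0, 100.0, 10.0).reshape(-1, 1) - 10.0) / 80.0

# splitting the 2D column keeps a trailing axis on the targets
X_bad, y_bad = split_sequence(scaled_2d, 3, 2)

# flattening first gives targets shaped [samples, n_steps_out]
X_ok, y_ok = split_sequence(scaled_2d.flatten(), 3, 2)
X_ok = X_ok.reshape((X_ok.shape[0], X_ok.shape[1], 1))
print(y_bad.shape, y_ok.shape)
```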

  161. Avatar
    Wandy February 10, 2020 at 7:36 pm #

    X, y = split_sequence(raw_seq, n_steps)
    # reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
    n_features = 1
    n_seq = 2
    n_steps = 2
    X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
    # define model
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))

    This is the code in your post that uses CNN+LSTM for the univariate models above.
    I am confused about the value of n_seq: why is it 2? And can I consider n_seq to be the time steps of the LSTM?

  162. Avatar
    Francis February 12, 2020 at 7:25 pm #

    Thank you for your great tutorial!
    BTW I found a more pythonic way to write the split_sequence() function.
    Regards,

  163. Avatar
    Amelie February 12, 2020 at 11:58 pm #

    Hello Mr. Jason,

    Please, I have a technical question about the LSTM model.
    The LSTM is defined with default activation functions:
    3 sigmoids for the input gate, the forget gate, and the output gate,
    and 2 tanh for updating the internal states of the recurrent layer.

    In your code:
    #########################
    # define model
    model = Sequential()
    model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    #########################

    Have you replaced the sigmoid with a relu, or the tanh?

  164. Avatar
    bunty sahoo February 13, 2020 at 4:28 pm #

    Thanks for the wonderful explanation. I have a query about which category my dataset and requirement fall into.

    I want to forecast the number of defects for each of the 3 parts.
    I have a dataset like this (parts a, b, c are components of the tool):

    date        Part    Tools shipped  num of defects (of parts)
    2019-01-01  part a  2              0
    2019-01-01  part b  1              2
    2019-01-01  part c  2              2
    2019-01-08  part a  2              0
    2019-01-08  part b  1              1
    2019-01-08  part c  2              1
    2019-01-15  part a  2              0
    2019-01-15  part b  1              1
    2019-01-15  part c  2              3

    I want to forecast the number of defects of all parts over, for example, the next 2 weeks.
    The Tools shipped column is related to the number of defects. I have future data for Tools shipped too. So the desired output is:

    2019-01-22  part a  2              ??
    2019-01-22  part b  2              ??
    2019-01-22  part c  2              ??

    Tools shipped for a particular week is constant.

  165. Avatar
    Ayush February 13, 2020 at 8:04 pm #

    Hi Jason,
    Thank you for such an informative tutorial. I am planning to implement an LSTM for multivariate time series data. The input dimension is (1000*7*24) and the output is (1000*30). I want to understand how to decide how many layers and units to use, and likewise what batch size would be appropriate in this case. It would be great if you could comment on some standard heuristics or point to a reliable resource on this.

  166. Avatar
    Andrei February 19, 2020 at 9:07 pm #

    Hi Jason,
    Great tutorial, it really helped me get on my feet and get started.

    I have a question on Multiple Parallel series. Does parallel mean the input features and the output are treated as independent across columns?

    To be more specific, using a 3 feature vector and 4 steps as input:
    [ [F1_t1, F2_t1, F3_t1],
    [F1_t2, F2_t2, F3_t2],
    [F1_t3, F2_t3, F3_t3],
    [F1_t4, F2_t4, F3_t4] ]

    to predict:
    [F1_t5, F2_t5, F3_t5]

    does F1 (t1 to t4) have no effect on the prediction of F2_t5 or F3_t5?

    Also, how would you go about combining Multiple Input and Multiple Parallel series in a case where the input is an N-feature vector and, using 3 time steps, you predict M features (many-to-one), where M < N (and the M features are included in the N features)?

    And on a separate note, any literature suggestions for using this with categorical data? I tried encoding to numerical values, but they are not treated as categories.

  167. Avatar
    zhouhua February 20, 2020 at 3:14 pm #

    Hi Jason,

    I am new to LSTMs. Could you add a picture visualizing the topology of each model type?

  168. Avatar
    Roberto February 22, 2020 at 5:07 am #

    Hi Jason,

    Thank you very much for your effort and for offering us your great tutorials. I enjoy them a lot!
    I do not have much experience with LSTMs, so I already have problems with definitions which are probably clear to most readers. For the Vanilla LSTM you say you use 50 LSTM units. Does that mean you have 1 LSTM whose input is 3-dimensional and whose output is 50-dimensional, or do you actually have 50 LSTMs, each accepting 3-dimensional vectors and producing a 1-dimensional output?

    • Avatar
      Jason Brownlee February 22, 2020 at 6:34 am #

      Yes, 50 units, each of which takes the full input and produces an output.
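One way to make "50 units" concrete is to count the weights of the layer. The sketch below is not from the tutorial: it uses the standard LSTM parameter formula (four gates, each with input weights, recurrent weights and a bias), and the helper name `lstm_param_count` is made up for illustration. As a check, it reproduces the parameter counts of the two LSTM layers in the model summary posted in comment 175.

```python
def lstm_param_count(units, n_features):
    """Number of trainable weights in a standard LSTM layer (hypothetical helper)."""
    # Each of the 4 gates has input weights, recurrent weights and a bias vector.
    per_gate = units * n_features + units * units + units
    return 4 * per_gate


# Vanilla LSTM from the tutorial: 50 units, 1 feature per time step.
print(lstm_param_count(50, 1))     # 10400
# lstm_3 in comment 175: 100 units reading 64 features per step.
print(lstm_param_count(100, 64))   # 66000
# lstm_4 in comment 175: 100 units reading 100 features per step.
print(lstm_param_count(100, 100))  # 80400
```

Read this way, the layer is a single LSTM with a 50-dimensional state: at each of the 3 time steps it reads the 1 input value together with its previous 50 outputs, and after the last step it passes a vector of 50 values to the Dense layer.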

  169. Avatar
    Ben February 27, 2020 at 3:15 am #

    Hi Jason, where you describe the Vanilla LSTM for univariate time series forecasting, it makes a single prediction. Is it possible to predict more than a single variable? How would I modify it to make 5 value predictions?
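For readers with the same question, predicting 5 values starts with the data preparation: each sample needs a target window of 5 values instead of 1. A minimal sketch in plain Python, assuming a multi-step variant of the tutorial's `split_sequence()` (the `n_steps_in` and `n_steps_out` argument names are assumptions here):

```python
def split_sequence(sequence, n_steps_in, n_steps_out):
    """Split a univariate sequence into (input window, output window) pairs."""
    X, y = [], []
    for i in range(len(sequence)):
        end_in = i + n_steps_in
        end_out = end_in + n_steps_out
        # Stop once the output window would run past the end of the sequence.
        if end_out > len(sequence):
            break
        X.append(sequence[i:end_in])
        y.append(sequence[end_in:end_out])
    return X, y


raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
X, y = split_sequence(raw_seq, n_steps_in=3, n_steps_out=5)
for x_row, y_row in zip(X, y):
    print(x_row, '->', y_row)
```

With targets of length 5, the output layer of the model would need 5 nodes rather than 1; the multi-step section of the tutorial covers the model side.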

  170. Avatar
    Alvaro Fierro Clavero March 4, 2020 at 12:59 am #

    Brilliant post. Very enlightening.

  171. Avatar
    manjeet kumar yadav March 6, 2020 at 6:18 pm #

    Hi Jason
    I need help. I am working on a HAR project with a video dataset. Could you help me build a model
    that uses a CNN-LSTM?

  172. Avatar
    David March 7, 2020 at 6:07 am #

    Hi, Jason.
    Great job. I have built a model that performed well, but when I close the program, then open and run it again, it does not perform the same. When I restart the PC it works properly again. I am running on CPU. What could be causing this problem, and how do I avoid it? Your answer will be most appreciated.

    • Avatar
      Jason Brownlee March 7, 2020 at 7:21 am #

      I have not heard of this kind of problem before, sorry.

      Perhaps try posting your experience on stackoverflow?

  173. Avatar
    Peter Yocote March 19, 2020 at 7:59 am #

    Hello Jason

    I was wondering how could we know the accuracy and have some sort of validation_data (the parameter used in model.fit).

    This to obtain the loss and accuracy curves for training and validation

    Could you please give me some guide on this
    Thanks a lot

  174. Avatar
    Mike March 31, 2020 at 2:22 am #

    Hello Jason,

    Thanks for the valuable efforts

    Do you think that TS Deep Learning has proved itself successful when applied to stock market forecasting?

  175. Avatar
    Marlon April 1, 2020 at 7:41 am #

    When I train my model it has a two-dimension output – it is (none, 1) – corresponding to the time series I’m trying to predict. But whenever I load the saved model in order to make predictions, it has a three-dimensional output – (none, 40, 1) – corresponding to the reshaping of the network input training dataset. What is wrong?

    Here is the code:

    df = np.load('Principal.npy')

    # Conv1D
    #model = load_model('ModeloConv1D.h5')
    model = autoencoder_conv1D((2, 20, 17), n_passos=40)

    model.load_weights('weights_35067.hdf5')

    # summarize model.
    model.summary()

    # load dataset
    df = df

    # split into input (X) and output (Y) variables
    X = f.separar_interface(df, n_steps=40)
    # THE X INPUT SHAPE (59891, 17) length and attributes, respectively ##

    # conv1D input format
    X = X.reshape(X.shape[0], 2, 20, X.shape[2])

    # Make predictions

    test_predictions = model.predict(X)
    ## test_predictions.shape = (59891, 40, 1)

    test_predictions = model.predict(X).flatten()
    ##test_predictions.shape = (2395640, 1)

    plt.figure(3)
    plt.plot(test_predictions)
    plt.legend(['Prediction'])
    plt.show()

      • Avatar
        Marlon April 1, 2020 at 10:34 pm #

        Hello,

        Thank you very much for your reply. Anyway, it didn’t help. I’ve changed the input size of my Conv1D from (2, 20, 17) to (40, 1, 17), but it didn’t accept – it tells me that it has negative dimension. I don’t understand why it doesn’t happen when training the network but does when I use the saved model to predict.

        • Avatar
          Marlon April 1, 2020 at 10:36 pm #

          Layer (type)                 Output Shape              Param #
          =================================================================
          time_distributed_14 (TimeDis (None, 4, 1, 24)          4104
          _________________________________________________________________
          time_distributed_15 (TimeDis (None, 4, 1, 24)          0
          _________________________________________________________________
          time_distributed_16 (TimeDis (None, 4, 1, 48)          9264
          _________________________________________________________________
          time_distributed_17 (TimeDis (None, 4, 1, 48)          0
          _________________________________________________________________
          time_distributed_18 (TimeDis (None, 4, 1, 64)          12352
          _________________________________________________________________
          time_distributed_19 (TimeDis (None, 4, 1, 64)          0
          _________________________________________________________________
          time_distributed_20 (TimeDis (None, 4, 64)             0
          _________________________________________________________________
          lstm_3 (LSTM)                (None, 100)               66000
          _________________________________________________________________
          repeat_vector_2 (RepeatVecto (None, 40, 100)           0
          _________________________________________________________________
          lstm_4 (LSTM)                (None, 40, 100)           80400
          _________________________________________________________________
          time_distributed_21 (TimeDis (None, 40, 1024)          103424
          _________________________________________________________________
          dropout_2 (Dropout)          (None, 40, 1024)          0
          _________________________________________________________________
          dense_4 (Dense)              (None, 40, 1)             1025
          =================================================================

        • Avatar
          Jason Brownlee April 2, 2020 at 5:54 am #

          Perhaps there is a bug in your code.

          I am happy to make some suggestions:

          – Consider aggressively cutting the code back to the minimum required. This will help you isolate the problem and focus on it.
          – Consider cutting the problem back to just one or a few simple examples.
          – Consider finding other similar code examples that do work and slowly modify them to meet your needs. This might expose your misstep.
          – Consider posting your question and code to StackOverflow.

  176. Avatar
    Marlon April 2, 2020 at 2:33 am #

    Let me explain how I have provisionally solved the problem:

    I used your split_sequences() for multivariate data with 40 steps. So the dataset takes steps i to i+40, then steps i+1 to i+1+40, and so on. Only the last item of each subsequence is new; all the rest repeats the previous subsequence.

    The output layer, for some reason that I still couldn’t figure out, is making a prediction for every subsequence. So I wrote a function that takes the first item of each subsequence.

    def separador_output(sequence):
        X = list()
        for i in range(len(sequence)):
            x = sequence[i][30]
            X.append(x)
        return np.array(X)

    As a result, I’ve got the 1-Dimension time-series I was trying to reproduce.

    I am sharing this because I still believe there should be a way of doing this without introducing a function like the one above.

    Best regards!

  177. Avatar
    Jim April 4, 2020 at 11:26 am #

    Thank you for the excellent article!

    I am trying to build an LSTM model of time series data following the strategy you outline in this article.

    I have one input (feature) at multiple timepoints in the past, and I use your code “split_sequence()” to split the univariate sequence into multiple samples, each with a specified number of time steps and a single output.

    I have to standardize my “train” dataset for which I had planned on using StandardScaler (per your other excellent articles including: https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/). I am performing the standardization prior to performing the SPLIT into multiple samples for the LSTM. This seems straightforward (although please comment if you think this plan is inappropriate.)

    The complication is that at any given timepoint, my single input feature actually has multiple values, each derived from any one of many “related but independent” sources. While I can perform the LSTM on each source separately, I would like to try maximizing my sample size by performing the LSTM on the aggregate of all of the sources (since the sources seem to follow similar behavior to each other, but not necessarily within the same time window). Or at least I would like to see what the results of that aggregated model looks like. My only question is: does it make more sense to perform the data input standardization separately for each source (so each source is standardized to mean of zero and SD of 1, and has equal weighting in the model), versus standardizing once across all sources in the aggregated data.

    (I am relatively new to machine learning, so I apologize if my question is a bit naive.)
    Thank you for your thoughts.

    Jim

  178. Avatar
    Jam April 5, 2020 at 7:58 pm #

    Hey Jason, thanks for your helpful blog. Could you please help me with a case?
    My data has a fixed input size of (1, 16, 2), but the output differs in the number of timesteps: one may be (1, 2, 2) and another may be (1, 20, 2). I thought of using the Encoder-Decoder format, but the problem is determining the dimension of “RepeatVector()”. How should I do that?
    Is it possible to adjust its size for each input?

    • Avatar
      Jason Brownlee April 6, 2020 at 6:04 am #

      Perhaps try padding all output sequences to the same length and use an encoder-decoder model to that length.
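A minimal sketch of the padding step, assuming the targets are held as a list of NumPy arrays shaped [timesteps, features]. The helper name `pad_outputs` and the zero pad value are assumptions for illustration, not part of the tutorial:

```python
import numpy as np


def pad_outputs(sequences, n_features, pad_value=0.0):
    """Pad a list of [timesteps, features] arrays to a common length (hypothetical helper)."""
    max_len = max(len(seq) for seq in sequences)
    padded = np.full((len(sequences), max_len, n_features), pad_value)
    for i, seq in enumerate(sequences):
        # Copy the real values to the front; the tail keeps the pad value.
        padded[i, :len(seq), :] = seq
    return padded, max_len


# Three made-up targets with 2, 20 and 7 timesteps of 2 features each.
outputs = [np.ones((2, 2)), np.ones((20, 2)), np.ones((7, 2))]
y, max_len = pad_outputs(outputs, n_features=2)
print(y.shape)   # (3, 20, 2)
print(max_len)   # 20
```

Here `max_len` is the fixed length the decoder would repeat to. The pad value should be one that cannot be confused with real data, and the padded steps are usually ignored when predictions are interpreted.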

  179. Avatar
    Abhishek Neema April 7, 2020 at 7:41 am #

    Sir please can you explain
    Why in multiple input series the input shape is (3,2) while in multiple parallel series it is (3,3)?

  180. Avatar
    Nuwan Madhusanka April 7, 2020 at 12:56 pm #

    What about using the EarlyStopping, ModelCheckpoint, and ReduceLROnPlateau callbacks with an LSTM? Also, I want to update my model as new data is received, i.e. train my model again after every new batch of data. How can I do that?

  181. Avatar
    郑锋淇 April 8, 2020 at 5:08 pm #

    Don’t you need to test whether the data fits?

  182. Avatar
    Dan April 9, 2020 at 12:35 am #

    Hey Jason, very well written!

    I have a question on your 1DConv LSTM network below:

    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))

    I’m wondering what the intuition behind applying a convolution with a 1D kernel on a sequence of data is? What does this involve – is this equivalent to taking a single value as a feature, to represent the input sequence?

    Thanks for this resource!

    • Avatar
      Jason Brownlee April 9, 2020 at 8:05 am #

      Thanks.

      It attempts to extract patterns from the sequence.

      • Avatar
        daniel April 9, 2020 at 6:57 pm #

        Will it not just be applying the same constant filter across the whole sequence equally, transforming it by some constant?

        • Avatar
          Jason Brownlee April 10, 2020 at 8:26 am #

          Each filter will extract different patterns from the sequence – in an analogy to filters extracting patterns from an image.
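To make the kernel_size=1 case concrete, here is a small NumPy re-implementation of what such a convolution computes. It is an illustration with random, untrained weights, not Keras code, and the sizes are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(1)

n_steps, n_features, n_filters = 4, 1, 3
x = np.array([[10.0], [20.0], [30.0], [40.0]])   # shape [n_steps, n_features]

# A kernel of size 1 holds one weight per (input feature, filter) pair, plus one bias per filter.
weights = rng.normal(size=(n_features, n_filters))
bias = rng.normal(size=n_filters)

# Each time step is mapped on its own: out[t] = relu(x[t] @ weights + bias).
out = np.maximum(0.0, x @ weights + bias)
print(out.shape)   # (4, 3): one value per time step per filter
```

With a kernel of size 1, each filter rescales and shifts every time step independently, so no information is mixed across neighbouring steps at this layer. Patterns that span several steps need a larger kernel_size, or are left to the LSTM that follows.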

  183. Avatar
    Ricardo April 12, 2020 at 10:17 am #

    Hi Jason,

    What are the limits of LSTM models on multistep prediction length?, like if we have N samples and M features and we are predicting K future samples of a 1-D variable. Is there a way to relate N, M and K? or to have a quick rule of thumb on how large K can go before it doesn’t make sense anymore?

    Thanks!

    • Avatar
      Jason Brownlee April 12, 2020 at 1:15 pm #

      The further you predict into the future, the more errors will compound.

      Harder problems are more challenging to forecast.

      That is about as general as we can go – you will need to test specific models on specific datasets to learn more.

  184. Avatar
    younes April 16, 2020 at 8:42 pm #

    Hi Jason,
    Is there a method to choose the best n-steps-in, knowing that I need to make a prediction for 3 days and I have 1 year of data (8760 observations)?

    • Avatar
      Jason Brownlee April 17, 2020 at 6:19 am #

      Test different values for your dataset and use the configuration that results in the best average performance.

  185. Avatar
    Dkb April 16, 2020 at 10:24 pm #

    Sir, can you write the same code using the functional API?

  186. Avatar
    Abdül Meral April 17, 2020 at 4:40 am #

    Thank you, Dr. Jason,
    it is very helpful as always.

    I applied it to my dataset:
    https://www.kaggle.com/abdulmeral/rnn-4-models-for-lstm

  187. Avatar
    esyraq ekram April 18, 2020 at 1:18 am #

    omg omg omg omg omg, I just graduated last year and I did my internship. There I learned the great wonders of machine learning. For 1 year I have looked at so many tutorials saying this and that, blah blah blah, and then it takes me a couple of hours to try to run that nonsense. Then I saw this, omg, I can’t stop saying that. This is what I want. This is it: I just want simple code that I can run myself, I don’t need the million lines of explanation. I just want to know what works. You, sir, are my hero. Thank you so much from the bottom of my heart. I really feel that I don’t deserve this kindness of free usable knowledge. Thank you, Dr Jason Brownlee, my hero.

  188. Avatar
    Jung Hwan Kim April 29, 2020 at 4:59 pm #

    Professor Brownlee, what made you put the relu activation function on the LSTM? When I tested my own project, the loss value increased astronomically (e.g. Loss: 2382585115.4067 Acc: 0.23).
    When I removed the relu function from the LSTM, it ran like a charm. Could you explain more about this topic?

    • Avatar
      Jason Brownlee April 30, 2020 at 6:36 am #

      I find relu works well in lstm when we don’t scale inputs.

      If it doesn’t work well for your data, don’t use it. Find what works best on your project.

  189. Avatar
    H. Joner May 4, 2020 at 3:37 am #

    Hi professor Brownlee,

    Thank you for this excellent work.

    I’m wondering why, when I use “Multiple Input Multi-Step Output” and select just 1 for n_steps_out, I don’t get the same result as when I use a simple Multiple Input Series model to predict the next output.
    Shouldn’t I get the same result?

    Thank you,

    • Avatar
      Jason Brownlee May 4, 2020 at 6:26 am #

      The models and data are very small in these cases, they are to show you how to use them, not actually solve the tiny prediction tasks.

  190. Avatar
    Ehsan Ameri May 9, 2020 at 8:30 pm #

    Hello everyone.
    Thanks to Mr.Jason Brownlee

    I reviewed some examples on the site (airplane passengers and shampoo) and I am totally confused about the concepts of timesteps and features.

    Here we assume that the data looks like:
    X,            y
    10, 20, 30    40
    20, 30, 40    50
    30, 40, 50    60

    and then it is concluded that the number of timesteps is 3 and the number of features is 1.
    But in the shampoo sales prediction we always change the data shape to
    X = X.reshape(X.shape[0], 1, X.shape[1])
    no matter how many lags we use in the model. That means we assume the number of timesteps is 1 and the number of features equals the number of lags.

    I would appreciate it if anyone could help me understand these concepts.
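The two layouts in this question can be put side by side. A minimal NumPy sketch using the same three samples (the variable names are made up for illustration):

```python
import numpy as np

X = np.array([[10, 20, 30],
              [20, 30, 40],
              [30, 40, 50]])          # shape [samples, lags]

# Lags as time steps: 3 steps of 1 feature each (the layout used in this tutorial).
as_timesteps = X.reshape((X.shape[0], X.shape[1], 1))
# Lags as features: 1 step carrying 3 features (the reshape quoted from the shampoo example).
as_features = X.reshape((X.shape[0], 1, X.shape[1]))

print(as_timesteps.shape)  # (3, 3, 1)
print(as_features.shape)   # (3, 1, 3)
```

Both arrays hold the same numbers. In the first layout the LSTM steps through the three values one at a time and carries its state between them; in the second it sees all three values at once in a single step.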

  191. Avatar
    Juan Pablo May 15, 2020 at 8:49 am #

    Hi Jason, thanks for such a nice article.
    I have a large set of number sequences, labeled as “good” or “bad”. I need to build a model, so given a new sequence, it can classify as “good” or “bad” based on training. I’m not sure what model to use, because I need to classify, not to predict next value.
    It’s like classifying dogs and cats from pictures, but instead of pictures I have sequence of numbers where the order matters.
    Thank you!

  192. Avatar
    Hrishikesh Bawane May 16, 2020 at 6:29 am #

    Thank you so much for this. Preparing data is always a great task. I have a chat dataset which I want to use to create a chatbot. How should I prepare data for the encoder-decoder model?

  193. Avatar
    Suraj Bhatia May 20, 2020 at 6:46 am #

    Hi Jason, thank you so much for this article. I really liked this and learned many things.
    I have time series data with 60 input data points and I have to predict 1 output at the last layer of the LSTM, so basically I want my LSTM to be

    first_day_data-->lstm_unit1-->second_day_data-->lstm_unit2-->….60th_day_data-->lstm_unit60-->denseLayer-->output.

    Something like this:
    the data is [1,2,3,4….60] and the output is only a single value, e.g. 5.6. How do I construct this model using Keras?

  194. Avatar
    Makarand Datar May 20, 2020 at 1:01 pm #

    Hi Jason,
    Suppose I have a time series with, say, 600000 time steps as output and 3 time series for 3 features, all of the same length. I then form sequences from my data (say 60 consecutive values at a time: the first sequence will be 1-60, the second 2-61, the third 3-62 and so on), and now I want to split them into training and testing sets.
    If I shuffle the sequences (within a sequence, all the points will still be in chronological order) and then split the data 80-20 into train and test sets, is that OK, or would it lead to data leakage into the test set?
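On the shuffling question: consecutive windows built with a stride of 1 share 59 of their 60 time steps, so if the windows are shuffled before the split, nearly every test window overlaps windows in the training set. A minimal sketch of one common alternative, splitting the raw series chronologically and only then building windows, is shown below. It is not from the tutorial; `make_windows` is a hypothetical helper that tracks time indices only, and the series is shortened to 1000 steps:

```python
def make_windows(start, stop, window):
    """Return windows as lists of time indices covering [start, stop) (hypothetical helper)."""
    return [list(range(i, i + window)) for i in range(start, stop - window + 1)]


n_obs, window = 1000, 60
split = int(n_obs * 0.8)

# Split the raw series chronologically first, then build windows inside each part.
train_windows = make_windows(0, split, window)
test_windows = make_windows(split, n_obs, window)

train_steps = {t for w in train_windows for t in w}
test_steps = {t for w in test_windows for t in w}
print(len(train_windows), len(test_windows))  # 741 141
print(train_steps & test_steps)               # set()
```

Once the split is done this way, the training windows can still be shuffled among themselves without any time step from the test period appearing in training.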

  195. Avatar
    Makarand Datar May 20, 2020 at 1:02 pm #

    Also, thank you for this and all the other articles!

  196. Avatar
    Matheus May 23, 2020 at 4:17 am #

    Is there some tutorial for LSTM + Time series in R?

    With best regards

  197. Avatar
    tnuich May 31, 2020 at 5:16 am #

    Hi! Great job with this website! It is very useful.

    I have a question: Does the order of input train samples matter?
    Example:
    product_id | day_1 | day_2 | day_3 | day_4 | day_5 | day_6
    1 | 10 | 3 | 2 | 5 | 9 | 10
    2 | 11 | 5 | 2 | 4 | 3 | 2
    3 | 14 | 8 | 5 | 0 | 2 | 14
    4 | 10 | 0 | 1 | 5 | 1 | 1
    train dataset:
    [10,3,2,5] -> [9,10] #item 1
    [11,5,2,4] -> [3,2] #item 2
    [14,8,5,0] -> [2,14] #item 3
    [10,0,1,5] -> [1,1] #item4

    (to be more precise I have x products with sales for z days and I want to make each of the products a train sample, but by doing this the days will be repeated for each product is it correct? or should I build a train sample to contain all items?) * I mention that I’ve implemented the sliding window approach to build the dataset for each item

    • Avatar
      Jason Brownlee May 31, 2020 at 6:32 am #

      Thanks!

      Yes, the order of samples probably matters both in splitting data for train/eval and within the training and test set themselves.

      • Avatar
        tnuich June 10, 2020 at 4:48 pm #

        model.fit shuffles the samples automatically, so in this case does the order of the samples (items) in the training set still matter?

  198. Avatar
    tuteja June 3, 2020 at 5:45 am #

    Hi Jason,
    Could you provide your opinion on this use case? I am working on a multivariate, multi-step time series problem to forecast sales for each of several cities. I understand from your tutorials how to use an LSTM with vector output on such a problem, but how do I handle forecasting by city? One approach I read about is to build separate models for each city and then concatenate the models at the end. What are your thoughts? Do you have a post on it that I can refer to?

    Your blog has been a “go to” solution for all my problems. Thanks for sharing knowledge and keeping it simple!

  199. Avatar
    Gopikrishna K S June 8, 2020 at 3:21 am #

    Thanks a lot for the article. Can you please briefly explain how to do the same in Java using the Deeplearning4j library?

  200. Avatar
    Adideva June 10, 2020 at 4:04 am #

    Hey Jason

    Thanks for the wonderful article. Can you help me with the data reshaping for the Multiple Parallel Series for a CNN LSTM model? It would be great if you could provide a Python function for the same. As a beginner, it is a bit tricky to understand the data shapes needed for the different models. Thanks

  201. Avatar
    Onur June 14, 2020 at 8:43 pm #

    Hi Jason,

    What should we do when using string data as input?

    I get an error saying that string data could not be converted to float type.

    How can I solve this problem?
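That error typically means a column of strings reached a numeric array; the usual fix is to encode the strings as numbers first. A minimal plain-Python sketch of one-hot encoding follows. The helpers `fit_vocab` and `one_hot` and the weather column are made up for illustration; a library encoder would work equally well:

```python
def fit_vocab(values):
    """Map each distinct string to an integer index (sorted for reproducibility)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(values, vocab):
    """Encode each string as a list with a single 1.0 at its index."""
    rows = []
    for v in values:
        row = [0.0] * len(vocab)
        row[vocab[v]] = 1.0
        rows.append(row)
    return rows


weather = ['sunny', 'rain', 'sunny', 'cloudy']   # made-up string column
vocab = fit_vocab(weather)
encoded = one_hot(weather, vocab)
print(vocab)       # {'cloudy': 0, 'rain': 1, 'sunny': 2}
print(encoded[0])  # [0.0, 0.0, 1.0]
```

Each encoded row then contributes as many features per time step as there are categories, so `n_features` grows accordingly.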

  202. Avatar
    Higo Felipe Pires June 16, 2020 at 4:31 pm #

    Hi, Jason. Thank you for your helpful blog and post.

    I’m doing a project to predict COVID-19 growth in countries/regions. My plan is to use data of a handful of chosen countries in training and do the prediction with only one country (dataset: https://github.com/datasets/covid-19/blob/master/data/time-series-19-covid-combined.csv). Is this possible with the knowledge exposed in the post? If yes, which type of time series I’ll have to apply? Univariate? Multivariate? Multi-step?

    Best regards,

    Higo

    • Avatar
      Higo Felipe Silva Pires June 17, 2020 at 5:45 am #

      To be a little more specific:

      I want to use “Confirmed”, “Recovered” and “Deaths” to predict “Cases” (and eventually “Deaths”).

    • Avatar
      Jason Brownlee June 17, 2020 at 6:18 am #

      The growth rate can be modelled directly with an exponential function; use the GROWTH() function in Excel.

      • Avatar
        Higo Felipe Pires June 17, 2020 at 7:03 am #

        Jason, thanks for the reply, but I don’t think I expressed myself in the best way.

        What I intend to do on my project is to train an LSTM with data from confirmed cases, recovered patients and deaths from a certain set of countries and try to predict the number of cases in another country. The dataset is that on my first comment.

        For example: training the LSTM with data from Australia, Costa Rica, Greece, Hungary and Israel (from 2020-01-22 to 2020-06-15) and trying to predict the number of cases in Brazil. Here I would like to try two approaches: a validation with predictions in the same range, 2020-01-22 to 2020-06-15, and another aimed at predicting future cases beyond 2020-06-15.

        Which of the approaches exposed in the article should I use? It is not yet clear to me which would be the best.

        Thanks in advance.

  203. Avatar
    Syed Nazir Hussain June 20, 2020 at 1:05 am #

    Good day sir,

    I would like to know how I can get the next week’s forecast from the vanilla LSTM model. In the example on this site, we only get a single forecast value.
    Can you help me with this scenario?

  204. Avatar
    Kyu June 22, 2020 at 8:26 am #

    Hi Jason,

    Thank you for the valuable post. I have a question regarding multi-step LSTM model. I was trying to apply CNN-LSTM for the multi-step model, but I am a bit confused on reshaping [sample, timesteps] into [sample, subsequences, time steps, features].

    The example code for the stacked LSTM is
    X = X.reshape((X.shape[0], X.shape[1], n_features))

    but in case of CNN-LSTM, we need the number of subsequence for the CNN model. But whenever I input n_seq=2 and run the code
    X = X.reshape((X.shape[0], n_seq, X.shape[1], n_features))

    , the error occurred: ValueError: cannot reshape array of size 15 into shape (5,2,3,3)

    Would you please help me resolve the problem?

    Thank you in advance.

    • Avatar
      Jason Brownlee June 22, 2020 at 1:27 pm #

      You’re welcome.

      You may need to experiment with different input shapes that are divisible by the number of timesteps in each sample.
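A reshape only rearranges values, so the element count must stay the same: 5 samples of 3 steps hold 15 values, while the target shape (5, 2, 3, 3) needs 90, hence the error. A minimal NumPy sketch of a split that works, assuming windows of 4 time steps and 1 feature (sizes chosen for illustration):

```python
import numpy as np

n_steps, n_features = 4, 1
X = np.arange(5 * n_steps).reshape((5, n_steps))   # 5 samples of 4 time steps

n_seq = 2                            # subsequences per sample
n_steps_per_seq = n_steps // n_seq   # 2 time steps in each subsequence
assert n_seq * n_steps_per_seq == n_steps, 'n_steps must be divisible by n_seq'

# [samples, timesteps] -> [samples, subsequences, timesteps per subsequence, features]
X_cnn = X.reshape((X.shape[0], n_seq, n_steps_per_seq, n_features))
print(X_cnn.shape)  # (5, 2, 2, 1)
```

With windows of 3 steps, n_seq=2 cannot divide the window evenly; preparing the data with 4 or 6 input steps avoids the problem.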

  205. Avatar
    mike mirza June 24, 2020 at 4:58 am #

    Hi, and thank you for the great explanation.
    I have another situation. Let’s say I have
    20 33
    30 43
    40 53
    50 63
    60 ?

    So I need to predict one time series with the help of another that I already have. What’s the best approach?

  206. Avatar
    busssard June 25, 2020 at 7:43 am #

    Hi Jason, thank you so much for all your work!
    It is a blessing to have such a talented educator as you to teach the practical side of ML.

    I used this tutorial to create a time series forecast for COVID-19.
    I was wondering, can I use different data generators (in my specific practice case: countries) to learn the behavior?
    In your example of the shampoo sales: can I use different companies’ sales numbers to predict?
    Or do I have to fit one net per data generator?

  207. Avatar
    Amelie July 6, 2020 at 8:50 am #

    Hello,

    I am testing some forecasting algorithms, including the LSTM model. I would like to find out its complexity in terms of memory and computing time.
    So, if you allow me, what are the complexities for the univariate time series forecasting example presented above?

    Thank you so much

    • Avatar
      Jason Brownlee July 6, 2020 at 2:05 pm #

      Not sure off hand, sorry. You might have to check the literature if anyone has estimated the big-O for the method.

  208. Avatar
    Julian July 10, 2020 at 5:07 pm #

    Hi Jason,

    I have a little complicated, but I think not so rare, forecast problem I’d like to solve.

    Example description:
    Let’s say we do climate measurements at ground level but also at 15 km height. For the last 2 years we launched weather balloons every day to measure, e.g., the pressure at 15 km height. Weather balloons are expensive and not really environmentally friendly, so we would like to reduce the number of weather balloons we need.
    The idea:
    From now on, we could launch a weather balloon only every Sunday. On the following 6 days we would predict the pressure at 15 km height based on the current measurements of each day at ground level, and on Sundays we could ‘refocus’ our model using the real-world measurement.
    This sounds feasible to me, but I do not know where to start.

    My first idea:
    Not really what I want, but possible:
    put all the input data for last week together ((Sunday+)? Monday-Sunday) in one feature set and build a standard RNN to predict the pressure at 15 km height for Monday-Saturday. I think this would work, but then I would only get the values for last week. If I would like to have an estimate for today, I do not see a way.

    I think there are many processes you could optimise this way, also in industry, where products of one batch often have rather similar properties. We could drastically reduce the production time if we predicted calibration measurements, which take a long time to perform, based on rather simple measurements.

    Do you have an idea how I could start building such a model? Do you know a good book with a similar example?

    Cheers,
    Julian

    • Avatar
      Jason Brownlee July 11, 2020 at 6:05 am #

      That sounds like a fun project.

      Generally, I would encourage you to prototype and evaluate each approach you can think of, rather than guess a priori what might be best – use results to guide you.

  209. Avatar
    Joe July 13, 2020 at 11:56 pm #

    Thank you for your insightful work.

    Why does the input shape contain the number of steps:
    model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
    model.add(Dense(1))

    It seems that the actual shape of the input can do the job:
    model.add(LSTM(50, activation='relu', input_shape=(X.shape[1], n_features)))

    Thanks again.

    • Avatar
      Jason Brownlee July 14, 2020 at 6:27 am #

      You can use either, as long as the model matches the data.

  210. Avatar
    Emmanuel July 16, 2020 at 5:27 pm #

    Great tutorial, your work has always been of help to me. I am trying to develop a predictive model for a belt drive. In this case, my time series data is not necessarily for forecasting; instead, the trained model predicts the status of the belt drive based on new time series data. Is an LSTM nevertheless optimal, or are there two or three neural networks you can recommend in this case?

    • Avatar
      Jason Brownlee July 17, 2020 at 6:02 am #

      You’re welcome.

      Good question. I recommend testing a suite of algorithms and algorithm configurations in order to discover what works best for your specific dataset.

  211. Avatar
    Melissa July 22, 2020 at 2:58 am #

    Hey Jason, I have very much enjoyed your tutorial! In your opinion, is there a ‘right’ number of data points (e.g., rows) to feed into an LSTM model? I was thinking of using around 500000 – 1M data points and I was wondering whether that is too much, and what the limitations would be of using a small dataset vs a very large one?

    Thanks, love your website!

  212. Avatar
    Enes August 3, 2020 at 7:19 am #

    Hello Jason,

    Thanks for your great tutorials, they have been always very helpful.

    I’m interested in the calculation process behind the LSTM. I’m familiar with all the formulas used in an LSTM, but I’m not sure what the input is at each calculation step in the Vanilla LSTM example.

    For example, let’s suppose that the input time series is [30, 40, 50].

    So, at the first step, using C_{0} (cell memory), H_{0} (cell output) and the number 30 (from the time series above), we calculate C_{1} and H_{1}.

    Next, using C_{1}, H_{1} and 40, we calculate C_{2} and H_{2}, and so on. Right?

    I’m a little confused because in sentence time series, each word can be represented as a one-hot vector and in that example, the sentence would be time series of one-hot vectors and at each calculation step, the input in the formula would be one one-hot vector.

    Regards, Enes
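The step-by-step reading in this comment matches the standard LSTM recurrence. The sketch below walks through it in NumPy for the series [30, 40, 50]. It is an illustration only: the weights are random and untrained, the layer has 2 units instead of 50, the gate ordering inside `W`, `U` and `b` is a choice made here, and the standard tanh activation is used where the tutorial's model sets activation='relu'.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell; W, U, b hold the 4 gates stacked as [i, f, g, o]."""
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0 * units:1 * units])   # input gate
    f = sigmoid(z[1 * units:2 * units])   # forget gate
    g = np.tanh(z[2 * units:3 * units])   # candidate cell values
    o = sigmoid(z[3 * units:4 * units])   # output gate
    c_t = f * c_prev + i * g              # new cell memory C_t
    h_t = o * np.tanh(c_t)                # new cell output H_t
    return h_t, c_t


units, n_features = 2, 1
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * units, n_features))  # input weights (random, untrained)
U = rng.normal(scale=0.1, size=(4 * units, units))       # recurrent weights
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)   # H_0 and C_0
states = []
for value in [30.0, 40.0, 50.0]:
    x_t = np.array([value])               # one feature per time step
    h, c = lstm_step(x_t, h, c, W, U, b)  # (H_t, C_t) from (x_t, H_{t-1}, C_{t-1})
    states.append((h.copy(), c.copy()))
print(len(states), states[-1][0].shape)   # 3 (2,)
```

For a sentence of one-hot vectors the loop is identical; `x_t` is simply a vector with one entry per vocabulary word instead of a single number, and `W` has that many columns.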

  213. Avatar
    Jinhui August 3, 2020 at 8:05 pm #

    Hi, Jason, thank you so much for your great tutorials.

    I am using multiple-variables multiple-steps encoder-decoder LSTM. In my case, the input steps, output steps, and n_features are 150, 15, and 11, respectively. But I have a really large number of timesteps (~100,000).

    So the input [100000, 150, 11] and output [100000, 15, 11] are used for training. I set the epochs to 50 and got the model after 4 hours of training. But I find that all the predictions of this model stay constant, i.e. [0, 15, 11], [1, 15, 11], [2, 15, 11], … are the same.

    I will be grateful if you could give me some possible reasons that I should check.

    Thank you!

  214. Avatar
    Gab August 7, 2020 at 4:42 am #

    Hi Jason, great article!!!

    I have a dataset with 3 years of historical precipitation and radiation data.

    Which of the above models would be more logical to use so that I could predict both variables at the same time?

    Is this enough data for a forecast?

    How would I predict the next 30 days of the month from the last dataset date?

    Sorry for so many questions!

  215. Avatar
    Sumedha Sandip Borde August 8, 2020 at 6:31 pm #

    Hello
    I have an EEG (brain signal) dataset which I want to use for classification. 64 electrodes are attached to every subject (patient) and 5012 samples are recorded for every electrode. This way every subject has 64 series of 5012 samples and one class label. There are 108 such subjects.
    Can you suggest the right deep learning method that can be used for classification?

  216. Avatar
    Mike August 17, 2020 at 6:42 pm #

    I have a question about building a test harness for testing LSTMs against other models.

    My data is structured as follows:

    Input: Information on weather, construction works, accidents in a road network
    Output: cars passing a counter

    Accidents that happened in the morning would affect traffic in the afternoon and traffic patterns that developed in the morning due to these accidents will as well. Hence I thought an LSTM could help. But I want to test against simpler models.

    I would imagine model performance varies over the course of the day so my performance measure would be a graph showing the errors of the model over the course of the day as a distribution as the test set would include multiple days.

    Where I am stuck is the training part: I selected a few characteristic days over the past years that I want to pass to the model. I assume that no effects spill over from one day to the next as there’s almost no traffic at night. So in effect my train data set consists of a number of days that shall be taken individually. That way I don’t have to pass years of data but can select typical days and only train on these. How do I pass these to the model and avoid at the same time that the “memory” takes info from previous days into consideration?

    Should I just use one model.fit(X, y) where I add a dummy variable to X representing the day? That doesn’t seem like good practice to me. If I do not point out the day specifically, the model may think that the state of the neural network from the day before would affect the following day.

    Or fit the model multiple times, e.g.:

    for day in sample_days:
        model.test(X_day, y_day)

    • Avatar
      Mike August 17, 2020 at 6:43 pm #

      Sorry, mistake in the last code snippet. That would have to be:

      for day in sample_days:
          model.fit(X_day, y_day)

    • Avatar
      Jason Brownlee August 18, 2020 at 6:00 am #

      Perhaps you can use all prior data up to the day you want to test as training, then test on the hold out day. Repeat for each day you want to evaluate.
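
      A minimal sketch of this evaluation scheme, assuming the data is already grouped by day. The function name `walk_forward_day_splits` and the day labels are hypothetical, and model fitting and scoring are left out:

```python
def walk_forward_day_splits(days, n_test_days):
    """Yield (train_days, test_day) pairs for walk-forward evaluation.

    For each of the last n_test_days days, every earlier day is training
    data and the day itself is the hold-out test set.
    """
    first_test = len(days) - n_test_days
    for i in range(first_test, len(days)):
        yield days[:i], days[i]


# Hypothetical day labels in chronological order.
days = ["d1", "d2", "d3", "d4", "d5"]
splits = list(walk_forward_day_splits(days, n_test_days=2))
for train_days, test_day in splits:
    print("train on", train_days, "-> test on", test_day)
```

      Each hold-out day is scored by a model that has only seen earlier days, so errors can still be collected per time of day across the test days.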

  217. Avatar
    emmanuel August 19, 2020 at 10:58 pm #

    Hello Jason, thank you for this tutorial which is very useful. I’m working on panel data right now, i.e. I observe certain variables on several individuals at different times. I have a dataset of 719 individuals and 11 variables observed daily over 10 years (2010 to 2019).

    Can we apply an LSTM model on these data?
    If yes, how to prepare the data (reshape).

    Thank you.

    • Avatar
      Jason Brownlee August 20, 2020 at 6:42 am #

      LSTM might be appropriate if each subject is a time series and you want to learn across subjects.

      This will help you prepare the data:
      https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
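
      As a rough illustration of the [samples, timesteps, features] layout for panel data, here is a sketch that assumes a long-format table sorted by individual and then by date. The sizes are small stand-ins and the variable names are hypothetical:

```python
import numpy as np

# Small stand-in for the panel: 4 individuals, 6 days, 3 variables.
# (The dataset in the question would be 719 individuals and 11 variables.)
n_individuals, n_days, n_features = 4, 6, 3

# Long-format table: one row per (individual, day), sorted by individual
# and then by date, with one column per variable.
long_table = np.arange(n_individuals * n_days * n_features, dtype=float)
long_table = long_table.reshape(n_individuals * n_days, n_features)

# One sample per individual: [samples, timesteps, features].
X = long_table.reshape(n_individuals, n_days, n_features)
print(X.shape)
```

      With ten years of daily data, each individual's series would probably be cut further into shorter windows rather than used as one very long sample.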

      • Avatar
        Guanta January 19, 2023 at 5:28 am #

        Thank you Jason, Emmanuel. I read the link on data preparation – very useful. I have a question of clarification:

        I have panel data on 200 different companies. Each company belongs to one of 12 sectors, labelled numerically 1-12.

        For each company there are 8 different pieces of price information such as price, market capitalisation, volume, and so forth.

        I then have a column with the future company stock price, 10 days ahead. My aim is to predict this column.

        The date range is from 2010 – 2012. Weekly, 104 dates for all 200 companies.

        My understanding is that this means 200 samples, 104 timesteps, 9 features including the 10 day ahead stock price.

        Would this mean I need to train 200 different models? How would you go about this problem if you were given this dataset?

        Sorry if this is a daft question. I am new to ML.

  218. Avatar
    Malik Elam August 23, 2020 at 8:53 pm #

    Hi Jason,

    Your answers were always inspiring for me. I am always thankful for that.

    I have a question please.

    I’m working on a stock price forecasting problem.

    Assuming (t-current time, t+1 -future time), I prepared my data set as follows: Xt -> Yt+1, so that data links current feature inputs with future change in the price.

    As I understand, RNNs try to map Yt -> Yt+1. In my case I cannot include (Yt+1, Yt+2,…) in the training set as upon using the trained model for forecasting; these will be unknown future values that cannot be fed into the model.

    On the other hand, using a data set of Xt -> Yt does not hold the core information: the future price change of the stock, so as to be able to forecast it.

    What would be your advice?

    How can I make use of say: Xt-n, … ,Xt, Yt-n,…,Yt to forecast Yt+1 ?

  219. Avatar
    Asish August 25, 2020 at 4:29 am #

    Hi Jason, I have time series eye data such as diameter, number of blinks and duration of fixation, and each feature has a different threshold, e.g. a diameter above 3.5 means high cognitive load for the eyes. Which LSTM can I use for this dataset to measure cognitive load? Or would any other ML method fit this problem?

    • Avatar
      Jason Brownlee August 25, 2020 at 6:44 am #

      I recommend testing a suite of different models and model configurations, not just LSTMs, in order to discover what works best for your specific dataset.

      • Avatar
        Asish August 28, 2020 at 6:27 am #

        Hi Jason,
        Thanks for your suggestions.
        I don’t have ground truth data. I’m recording data using a device and I’m thinking of asking the user to label the last 2/3 min of recorded data. The downside is that many rows get the same label. Is there any way to generate ground truth data?

        Thanks

        • Avatar
          Jason Brownlee August 28, 2020 at 6:58 am #

          You can take each candidate answer as a separate row, or try consolidating each row using the mode or mean estimate.

  220. Avatar
    Simon PERROTT August 25, 2020 at 6:52 pm #

    Thank you Jason,

    I love your articles so much I’ve bought several of your books which I find excellent.

    I have a data prep question….

    I’m training an LSTM multi-class classification model;
    I find that the classes in my training set (training data is chronologically before the val & test data) are very unbalanced.
    I’m particularly interested in the minority classes (their accurate prediction is more important to me).
    Given the dependent nature of timeseries observations and how I’m training in batches with each batch maintaining state (even in the stateless LSTM)…
    Am I correct in saying that I cannot upsample or downsample the training data to balance the classes in the training dataset? (because either omitting or adding any data points, in this case there’s a datapoint for every day, would mess up the timeseries in a batch).

    Do you have any advice for how I can balance out my training dataset?

    Appreciate your insight,
    Thanks,
    Simon

    • Avatar
      Jason Brownlee August 26, 2020 at 6:48 am #

      Thank you deeply Simon!

      Great question.

      First, select an appropriate metric, not accuracy.

      Second, try a cost-sensitive LSTM (and other neural nets). Try weights that balance the classes first, then try more aggressive over-corrective weights and see if you can do better.

      Finally, try simple duplication of input patterns for the minority class and add Gaussian noise to the observations – e.g. a primitive form of random oversampling.
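
      A minimal sketch of the last two suggestions, assuming the series has already been cut into fixed-length windows so that whole windows (not single days) are duplicated. The data, the noise scale of 0.01 and the "balanced" weighting formula are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical windowed training data: [samples, timesteps, features].
X = rng.normal(size=(10, 5, 2))
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # class 1 is the minority

# Balanced class weights: n_samples / (n_classes * count of each class).
classes, counts = np.unique(y, return_counts=True)
class_weight = {
    int(c): float(len(y) / (len(classes) * n)) for c, n in zip(classes, counts)
}
print(class_weight)

# Primitive random oversampling: duplicate whole minority windows and add
# small Gaussian noise, so the order inside each window is untouched.
minority = X[y == 1]
n_extra = int(counts.max() - counts.min())
picks = rng.integers(0, len(minority), size=n_extra)
noisy = minority[picks] + rng.normal(scale=0.01, size=minority[picks].shape)

X_bal = np.concatenate([X, noisy])
y_bal = np.concatenate([y, np.ones(n_extra, dtype=y.dtype)])
print(X_bal.shape, np.bincount(y_bal))
```

      A dictionary like `class_weight` is the form that the `class_weight` argument of Keras `fit()` accepts. Only the training windows would be resampled; validation and test data stay untouched.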

      Let me know how you go.

      • Avatar
        Simon Perrott August 26, 2020 at 8:51 pm #

        Brilliant suggestions Jason, thank you!

        I’ll try those out, I’m learning a lot from you and appreciate your explanations

  221. Avatar
    Khin Thida San August 27, 2020 at 2:16 pm #

    Thanks so much for your articles. I have been learning deep NNs and LSTMs, and this helps me a lot to understand them in depth and to build my own model for time series analysis.

  222. Avatar
    Shinichiro Imoto August 31, 2020 at 12:09 pm #

    Hi Jason!
    I always appreciate your blog posts. They help me understand the deeper ideas of DNNs with precious sample code.
    Now, I’m struggling a little with a CNN + LSTM model for a multivariate, multi-step time series forecasting problem.
    I experimentally added a CNN before the LSTM layer, and your blog made me notice that I needed the TimeDistributed wrapper on the layers before the LSTM layers. To do so, I reshaped the input as follows, as well as the x validation set.

    [Before adding CNN]
    InputLayer(input_shape=(x_train.shape[1], x_train.shape[2]), batch_size=BATCH_SIZE)
    x_train.shape[1]: time steps (e.g. 600)
    x_train.shape[2]: # of features (e.g. 4, since it’s multivariate)
    batch_size: I specified it as 128 or 256 since stateful=True in the LSTM args.

    [Now]
    InputLayer(input_shape=(x_train_multi.shape[1], x_train_multi.shape[2], x_train_multi.shape[3]), batch_size=BATCH_SIZE)
    x_train_multi.shape[1]: subsequences (e.g. 600)
    x_train_multi.shape[2]: time steps (e.g. 1)
    x_train_multi.shape[3]: # of features, no change.
    batch_size: no change.
    I adjusted the ratio of [1]:[2], then found 600:1 is the best.

    After all, the following is my current model snippet.

    model.add(InputLayer( “AS [Now] ABOVE” ))
    model.add(TimeDistributed(Conv1D(filters=200, kernel_size=3, strides=1, padding="causal", activation="relu")))
    ## model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(150, stateful=True, return_sequences=True))
    model.add(LSTM(150, stateful=True, return_sequences=False))
    model.add(Dense(150, activation='relu'))
    model.add(Dense(8))  # forecast 8 time points

    “fit()” works normally and the accuracy is almost the same as before I added the CNN.
    However, when I enable the MaxPooling1D layer after Conv1D, the layer throws a ValueError regarding the input shape. When I delete padding="causal" from the Conv1D args, Conv1D throws the same ValueError too.

    I’m sorry for this long question, but if you see any wrong part especially about the input shape, please give me your comment.
    Thank you.

    • Avatar
      Jason Brownlee August 31, 2020 at 1:24 pm #

      Well done!

      Sorry, I don’t have a good off the cuff answer for you, you will need to tune the model for your problem including ensuring the architecture is a good match for the shape of the data flowing through the model. I cannot debug the model for you.

      • Avatar
        Shinichiro Imoto September 1, 2020 at 12:39 pm #

        Thank you for your reply, Jason!
        Your comment cheers me up since I’m the only one who is doing ML in my office.

        I found the cause of this ValueError. It is because the number of “timesteps” has to be at least the pool size of MaxPooling1D. As I posted, I reshaped the original 600 time steps into 600 x 1 (subsequences x “timesteps” in [samples, subsequences, “timesteps”, features]).
        It has to be 300 x *2* (or more) since the pooling size is *2*.

        But the absence of errors does not mean that the model is correct. I hope this will fit well.
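
        The shape arithmetic behind this can be sketched in plain Python. The helper names are hypothetical, and the formulas assume stride-1 convolution and a pooling stride equal to the pool size with "valid" padding (the Keras defaults, as far as I know):

```python
def conv1d_out_len(length, kernel_size, padding, stride=1):
    """Output length of a 1D convolution along the time axis."""
    if padding in ("same", "causal"):
        return -(-length // stride)  # ceil(length / stride)
    # "valid" padding: the kernel must fit entirely inside the input
    return (length - kernel_size) // stride + 1


def pool1d_out_len(length, pool_size):
    """Output length of 1D max pooling with 'valid' padding and a
    stride equal to pool_size."""
    return (length - pool_size) // pool_size + 1


for timesteps in (1, 2):
    after_conv = conv1d_out_len(timesteps, kernel_size=3, padding="causal")
    after_pool = pool1d_out_len(after_conv, pool_size=2)
    print(timesteps, after_conv, after_pool)
```

        An output length of zero or less means the layer cannot produce any output. With 1 time step per subsequence, both the pooling layer and a "valid" convolution with kernel_size=3 end up in that situation, which is consistent with the ValueError described above.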

  223. Avatar
    Suwei September 9, 2020 at 10:40 pm #

    Hi Jason, thank you for your tutorial. I have a question: I want to predict floods, but my data is not continuous. For example, for 2019 I have data for part of the weeks in months 5, 8, and for 2020 I have data for months 3, 6. How should I make the prediction?

  224. Avatar
    Victor September 16, 2020 at 6:52 pm #

    Sorry Jason, I have read this many times, along with some of the questions people asked above. I still don’t understand the difference between using a RepeatVector and an LSTM with return_sequences=True. Is there any easy way to understand the major difference? I would like to understand when each method would be ideal to use.

    Much appreciated!

    • Avatar
      Jason Brownlee September 17, 2020 at 6:43 am #

      RepeatVector feeds the same single output vector from the encoder to the decoder for every output step it creates.

      Return sequences provides the encoder’s output for each input time step.
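
      The difference shows up in the shapes. This is only a NumPy stand-in that mimics the shapes involved, not Keras itself; the array values are arbitrary and the variable names are hypothetical:

```python
import numpy as np

n_steps_in, n_steps_out, n_units = 3, 2, 4

# Stand-in for what an encoder LSTM would compute for one sample:
# one hidden vector per input time step (values here are arbitrary).
per_step = np.arange(n_steps_in * n_units, dtype=float)
per_step = per_step.reshape(1, n_steps_in, n_units)

# return_sequences=True keeps every step: [samples, n_steps_in, units].
seq_output = per_step

# return_sequences=False keeps only the last step: [samples, units].
last_output = per_step[:, -1, :]

# RepeatVector(n_steps_out) copies that single vector once per output
# step: [samples, n_steps_out, units].
repeated = np.repeat(last_output[:, np.newaxis, :], n_steps_out, axis=1)

print(seq_output.shape, last_output.shape, repeated.shape)
```

      So return_sequences=True yields as many vectors as there are input steps, while RepeatVector yields as many identical copies as there are output steps.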

  225. Avatar
    Rudiger September 18, 2020 at 6:28 am #

    Hi Jason,
    Thank you for this wonderful post!
    I have tried out your multi-step example “Vector Output Model” with exactly the same numbers and the same code. Some of the important data:

    raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
    n_steps_in, n_steps_out = 3, 2
    x_input = array([70, 80, 90])

    print(yhat)
    [[124.500435 137.70433 ]]

    Normally yhat should be close to 100 and 110. Do you have an explanation what is happening or possibly going wrong?

  226. Avatar
    Rudiger September 22, 2020 at 6:48 am #

    I am going back to your multi-step LSTM example.
    You have the following parameters in the example:
    raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
    n_steps_in, n_steps_out = 3, 2
    I am wondering, how many steps could be predicted at most?
    Imagine that I have a series of 100 time points. What would be the maximum reliable multi-step forecast and the optimal split of X and y?

  227. Avatar
    Armin October 6, 2020 at 6:42 am #

    Hello Jason,
    thank you very much for your great work. Please let me ask you two questions:

    a) Can I train (fit) an LSTM on series with e.g. timestep = 5, but make predictions with data with timestep = 1?

    b) Is there a relationship between the number of timesteps and the number of hidden layers or neurons per layer?

    Thanks
    BR
    Armin

    • Avatar
      Jason Brownlee October 6, 2020 at 7:02 am #

      I don’t see why not. You might have to re-define the model input layer after it is fit.

      Yes, it varies by dataset and model. Run a sensitivity analysis to see how performance varies with model capacity in your case.

  228. Avatar
    Armin October 6, 2020 at 7:39 am #

    Thank you very much and greetings from Bavaria.

  229. Avatar
    jean October 8, 2020 at 6:30 am #

    Thanks for the algorithms. I have a question about how to optimize the LSTM hyperparameters. Is there some algorithm that does this?

    • Avatar
      Jason Brownlee October 8, 2020 at 8:36 am #

      Yes, a grid search or a random search is a good start.
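
      A bare-bones sketch of both searches. The `evaluate` function is a hypothetical stand-in for "fit a model with this configuration and return its validation error", and the hyperparameter names and values are made up for illustration:

```python
import itertools
import random


def evaluate(config):
    """Hypothetical stand-in for 'fit a model and return validation error'."""
    return abs(config["n_units"] - 50) + 100 * abs(config["learning_rate"] - 0.01)


grid = {"n_units": [10, 50, 100], "learning_rate": [0.1, 0.01, 0.001]}

# Grid search: try every combination.
all_configs = [
    dict(zip(grid, values)) for values in itertools.product(*grid.values())
]
best_grid = min(all_configs, key=evaluate)

# Random search: try a fixed number of randomly drawn combinations.
rng = random.Random(1)
sampled = [
    {name: rng.choice(options) for name, options in grid.items()}
    for _ in range(5)
]
best_random = min(sampled, key=evaluate)

print(len(all_configs), best_grid, best_random)
```

      Grid search cost grows multiplicatively with each added hyperparameter, whereas random search keeps the number of model fits fixed, which matters when each fit takes hours.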

  230. Avatar
    Manoj October 20, 2020 at 9:28 pm #

    I have difficulty understanding the LSTM input shape. For example, I have 50 videos; 25 of these are categorized as Awake (0) and 25 as Drowsy (1). I preprocessed them to extract the Eye Aspect Ratio and Mouth Aspect Ratio as features every second.

    Now my data has ( VideoFileName, Time Series, EAR, MAR, Label )

    Video1 1 0.30 0.25 0
    Video1 2 0.31 0.27 0
    Video1 3 0.35 0.25 0
    Video2 1 0.30 0.25 1
    Video2 2 0.27 0.28 1
    Video2 3 0.31 0.29 1
    Video2 4 0.33 0.30 1

    I extracted the above data from the first 3 and 4 seconds of the two videos respectively, as the length of the videos may differ.

    I have a very basic question here. How should I feed this data to an LSTM? Any code example would be fine. I know the input shape should be [Batch Size, Time Step, Features], but I’m confused about how to feed this to the LSTM. Should I feed each video’s data in a loop?

    Please help me to clear my doubt.

  231. Avatar
    Adrien October 23, 2020 at 1:21 am #

    Hello Jason,

    Great article!

    just a quick question about the split sequence method for Multiple Input Multi-Step Output.

    On this line, you select only the first two features in X and the last feature in Y.

    seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]

    Why not include the 3 features in X?

    That is to say, use the 3 features to predict only the 3rd.

    Would that be a problem?

    Thank you
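
    To make the slice in this question concrete, here is a small sketch on a toy dataset. The data values and the definition of `out_end_ix` are assumptions chosen for illustration; `seq_x_all` is a hypothetical name for the three-feature variant being asked about:

```python
import numpy as np

# Toy dataset: two input series and an output series that is their sum.
in1 = np.array([10, 20, 30, 40, 50, 60])
in2 = np.array([15, 25, 35, 45, 55, 65])
out = in1 + in2
sequences = np.column_stack([in1, in2, out])

i, n_steps_in, n_steps_out = 0, 3, 2
end_ix = i + n_steps_in
out_end_ix = end_ix + n_steps_out - 1  # assumed definition

# Slice from the question: inputs are all columns except the last.
seq_x = sequences[i:end_ix, :-1]
seq_y = sequences[end_ix - 1:out_end_ix, -1]

# Variant raised in the question: keep all three columns as inputs.
seq_x_all = sequences[i:end_ix, :]

print(seq_x.shape, seq_x_all.shape, seq_y)
```

    In this sketch the first target value also appears in the last row of `seq_x_all`, because the output slice starts at `end_ix - 1`. Using all three features as input would therefore presumably require starting the output slice at `end_ix`, so that the model is not handed part of the answer.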

  232. Avatar
    Xu Ji October 29, 2020 at 1:13 pm #

    Thank you very much for this. I learned a lot from your different posts, especially on LSTMs. Just wondering if you have recommendations on using LSTMs for anomaly detection? Thank you!

  233. Avatar
    yjk November 3, 2020 at 12:33 am #

    Thanks for sharing this! I learned a lot about LSTM models, and I am wondering why you used a Vanilla LSTM in the ‘Multiple Input Series’ part, and why I can’t use other models such as a Stacked LSTM, Bidirectional LSTM, or ConvLSTM. Is it because of the dimensionality of the input?

    • Avatar
      Jason Brownlee November 3, 2020 at 6:55 am #

      In some cases yes, in other cases because one model performs better than the others for a given dataset.

  234. Avatar
    Eugene November 15, 2020 at 1:25 pm #

    Thanks for sharing. How would you model a regression problem to predict at various arbitrary time steps? For example, when predicting an inflection point, we are interested in the value at which the inflection occurs and the time step at which it will happen, e.g. the next predicted inflection point, at 12345, occurs at t+136.

    Would it be the same multi-step LSTM model as above, or is it a completely different problem? How can we approach this?

    • Avatar
      Jason Brownlee November 16, 2020 at 6:24 am #

      There are many ways to approach the problem, perhaps prototype a few and discover what works well/best for your dataset.

      e.g. time series classification – is an event expected to occur in the next interval.
      or multi-step forecast and use an if-statement to post-process the predictions.
      etc.
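
      A sketch of the second option, assuming the model has already produced a multi-step forecast. The function name is hypothetical, and "inflection" is interpreted here as the first change of direction (a local maximum or minimum), which is an assumption about what the question means:

```python
def first_turning_point(forecast):
    """Return (step, value) of the first local max/min in a forecast,
    or None if the forecast never changes direction.

    step is 1-based: step k means t+k.
    """
    for k in range(1, len(forecast) - 1):
        before = forecast[k] - forecast[k - 1]
        after = forecast[k + 1] - forecast[k]
        if before * after < 0:  # direction changes at index k
            return k + 1, forecast[k]
    return None


# Hypothetical multi-step forecast for t+1 .. t+6.
yhat = [100.0, 104.0, 109.0, 107.0, 103.0, 101.0]
print(first_turning_point(yhat))
```

      The forecast horizon has to be long enough to contain the event, so a prediction such as t+136 would need at least that many output steps.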

  235. Avatar
    Arslan November 15, 2020 at 11:22 pm #

    First, thanks for this great article, I just found it on Linkedin.

    Currently I am working on a project where I want to predict how many pieces of a material should be ordered for the next three months.
    I have purchasing data for 20,000 materials (different time series) on a monthly basis, which correlate with each other in terms of seasonality but are very short (50-80 data points).
    For example:
    date | mat | amount | workload |
    2020-08 | A | 20.0 | 0.8

    Does it make sense to build an LSTM model for this kind of problem?

    As regressors I could include the month (for seasonality) and also the workload for that month.
    I could train the model with all time series and 80-90% of the data points (the other 10-20% for the test set).

    Maybe another model is better? (S)ARIMA is only a univariate approach, so I can’t include the workload.

    Thank you!

    • Avatar
      Jason Brownlee November 16, 2020 at 6:26 am #

      Good question, I recommend evaluating a suite of different algorithms/configs and discover what works well or best for your dataset.

      This framework may help:
      https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/

      • Avatar
        Arslan December 3, 2020 at 3:46 am #

        Thank you for your response!
        Do you have any approach for handling time series data that is affected by COVID-19?

        For example: I have time series from 2017 to 2020 on a monthly basis.
        For two months (March and April 2020) the production went down, so I have small values for these two months and also some high values in the following months, because production went up again (in total there are outliers over 4-6 months).

        I have tried several approaches, but these COVID-19 outliers make it hard to get good forecasting results (training/testing). Also, as I mentioned before, there are thousands of time series that are affected (differently, but of course with some correlation).

        Do you have any advice?

  236. Avatar
    Konstantinos November 25, 2020 at 7:42 am #

    Is the model trained only with the training data, or for every prediction is the actual value added to the training data and the model retrained?

    • Avatar
      Jason Brownlee November 25, 2020 at 7:53 am #

      You can re-train the model as new observations become available if you like – both in walk forward validation and when the model is deployed.
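
      A minimal sketch of walk-forward validation with re-training at every step. The tiny least-squares model is only a hypothetical stand-in for refitting an LSTM, and the series is made up:

```python
def fit_ratio_model(history):
    """Hypothetical stand-in for model training: least-squares fit of
    y[t] = a * y[t-1] on the history, returning a."""
    num = sum(prev * cur for prev, cur in zip(history, history[1:]))
    den = sum(prev * prev for prev in history[:-1])
    return num / den


series = [10.0, 20.0, 40.0, 80.0, 160.0, 320.0]
n_test = 3
history = series[:-n_test]
predictions = []
for actual in series[-n_test:]:
    a = fit_ratio_model(history)         # re-train on everything seen so far
    predictions.append(a * history[-1])  # one-step forecast
    history.append(actual)               # the real observation joins the training data

print(predictions)
```

      Moving the `fit_ratio_model` call above the loop gives the other variant, where the model is trained once and only the input history grows.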

  237. Avatar
    Tiago November 29, 2020 at 10:19 pm #

    Hi Jason, amazing article covering many of the shapes of the LSTM!

    I have one question:

    I am using PyTorch instead of Keras and would like to reproduce your vanilla LSTM. Could you please explain more about what is the ‘input’ parameter of the LSTM?

    Thanks!

  238. Avatar
    sarah December 7, 2020 at 6:39 am #

    Hi jason,

    I am trying to build an LSTM for time series data. Unfortunately, I am unable to reshape 4D input data into 3D input data to fit my LSTM model. Do you know how this is possible?

  239. Avatar
    Ítalo Romani December 8, 2020 at 11:34 am #

    Hi Jason, I am trying to develop a custom loss function for my LSTM model, which was based on yours, like:

    model = Sequential()
    model.add(LSTM(neurons, activation='relu', input_shape=(n_steps_in, n_features)))
    model.add(RepeatVector(n_steps_out))
    model.add(LSTM(neurons, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(1)))
    model.compile(optimizer='adam', loss=my_loss, run_eagerly=True)

    The custom loss my_loss function receives from Keras the parameters (y_true,y_pred), in my understanding they should have shape like input_shape. However, regardless of the input shape I use, y_true comes with shape (32,1,1), even if I remove all layers and leave a bare Sequential model.

    I am trying to understand the logic of this, googled around but so far nothing helped me to explain this.

  240. Avatar
    Ítalo Romani December 8, 2020 at 12:16 pm #

    Actually, I caused some confusion with this question. Let me try to explain again: imagine that I am trying to predict a series with a single (1-dimensional) value at each time step, so y_true and y_pred should have shape (total_time_steps, 1). However, I always get shape (32,1,1), with values that bear no resemblance to the actual values.

    • Avatar
      Jason Brownlee December 8, 2020 at 1:32 pm #

      Sorry, I don’t understand what you’re asking exactly. Perhaps you can rephrase.

      • Avatar
        Italo Romani December 8, 2020 at 9:14 pm #

        Hi Jason, thanks so much for the reply. I did a further search and found the answers to my problem. First, y_true and y_pred come with sizes defined by batch_size; second, by default, their values come shuffled, so I have to use shuffle=False. Third, and most important, I don’t know if what I am trying to do is even possible with Keras, because all the operations in the loss function have to be tensor operations, otherwise the loss function cannot provide gradients to the optimiser. My intended loss function goes sequentially over each element of y_true and y_pred, compares each pair and updates an accumulation function that cannot be defined by custom algebraic/symbolic functions. It’s a bit large and too specific to share here, but if you are interested I can share the details of what I am trying to do.

        • Avatar
          Italo Romani December 8, 2020 at 9:23 pm #

          Perhaps there’s an optimiser in Keras that does not require gradients, but not that I know of.

        • Avatar
          Jason Brownlee December 9, 2020 at 6:18 am #

          Well done!

  241. Avatar
    Jam December 9, 2020 at 11:48 am #

    Hi Jason, I learn a lot from your articles. Could you please help with a network? I have an input of presumably (4, 10, 2), where (10, 2) are time steps and features, respectively. There is a lot of data in this shape, and for each one I propose to train an LSTM and then apply a convolution layer across them. So with a Conv1D(1), I expect the output (3, 10, 2).
    Please correct me if I am wrong. I reshaped the data into (1, 4, 10, 2). Then I used the TimeDistributed wrapper for prediction, but then I am not able to apply a convolution over shape[0] (I mean 4). What I get is a convolution over shape[2] (I mean 2). Can you help me with how to arrange the data for the network, or tell me whether my network is correct or not?

    • Avatar
      Jason Brownlee December 9, 2020 at 1:26 pm #

      Typically you would use a CNN then an LSTM, not the other way around.

      I have not tried LSTM-CNN, but I expect it would be challenging and you may need to debug the model yourself.

  242. Avatar
    John White December 25, 2020 at 2:42 pm #

    Hello Jason!

    First off, thanks for being here for my machine learning journey! So I have a base scenario to check for understanding:

    Context: I have a supervised binary classification dataset on weather temperatures with 4 features. The target variable at time t is 0 or 1: 0 if the temperature at t+30 is down, 1 if up.

    Framing the Problem for LSTM: Say timesteps is 60. So we take the previous 60 timesteps of data to predict 0 or 1 for t+1. In doing so, we can predict if the weather temperature is up or down in 30 days. Input shape would be (60, 4). I would have to chunk the training dataset and reshape it to be compatible with the (60, 4) input shape.

    Is my understanding correct? Thank you!

  243. Avatar
    Mary December 27, 2020 at 5:24 am #

    Dear Jason,
    The tutorial was really useful, as ever.
    But I have not seen in your tutorials that you applied any **Bilstm** network for regression to predict ** Multivariate and Multi-step ahead** data.

    I have created a Bilstm to forecast 9 features in terms of 3-time steps ahead.

    model = Sequential()
    model.add(Bidirectional(LSTM(200, return_sequences=True), activation='relu', input_shape=(n_steps_in, n_features)))
    model.add(RepeatVector(n_steps_out))
    model.add(Dropout(0.5))
    model.add(Bidirectional(LSTM(100, activation='relu', return_sequences=False)))
    model.add(Dense(3))
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')

    I am really eager to know whether the given model is correct or not.
    Moreover, the output prediction is not good, so I would like to know the answers to some questions.

    1- Is it common to use **Bilstm** for regression in the case of ** Multivariate and Multi-step ahead**??

    2- what is the best model for regression in the case of ** Multivariate and Multi-step ahead**??

    3- is the given model created correctly or not?

    I am really sorry for writing so much, but I am really looking forward to getting an answer.

    Best
    Mary

    • Avatar
      Jason Brownlee December 27, 2020 at 6:14 am #

      No, typically bidirectional LSTMs are not used in the encoder-decoder architecture, but I don’t see any reason why they couldn’t be used.

      We cannot know the best model for a given dataset in advance; the job of a machine learning practitioner is to use careful experiments to discover what works well or best.

  244. Avatar
    Mary December 27, 2020 at 7:16 am #

    Dear Jason,
    I really appreciate your quick reply.

    But I did not get the answer to this question:

    1- Is the below architecture correct logically?

    ( I am a beginner in using Bilstm in regression, so I am not sure whether I made the layers correctly or not)?

    model = Sequential()
    model.add(Bidirectional(LSTM(200, return_sequences=True), activation='relu', input_shape=(n_steps_in, n_features)))
    model.add(RepeatVector(n_steps_out))
    model.add(Dropout(0.5))
    model.add(Bidirectional(LSTM(100, activation='relu', return_sequences=False)))
    model.add(Dense(3))
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')

    I have created a Bilstm to forecast 9 features in terms of 3-time steps ahead.

    2- Is it common to use **Bilstm** for regression in the case of ** Multivariate and Multi-step ahead**??

    I am really looking forward to your clear answer, as I did not understand the meaning of your previous answer.

    Best

    Mary

    • Avatar
      Jason Brownlee December 27, 2020 at 9:25 am #

      I don’t have the capacity to review and comment on your model architecture:
      https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code

      LSTMs are used for sequence prediction, not regression. In numeric sequence prediction, bidirectional LSTMs are rarely used – but if one gives the best results for your dataset, then use it.

      • Avatar
        Mary December 28, 2020 at 4:48 am #

        Dear Jason,
        I am grateful for your reply.
        As you mentioned “LSTMs are used for sequence prediction, not regression”,

        so do you have a tutorial post to introduce the best techniques for numeric sequence regression?

        As I cannot differentiate between sequence prediction and sequence regression.

        I want to predict 9 features in terms of 3-time steps ahead.

        would you please introduce me to some useful methods?

        Best
        Mary

        • Avatar
          Jason Brownlee December 28, 2020 at 6:04 am #

          “regression” is a row of data without sequence.

          “sequence prediction” or “sequence regression” are the same kind of thing. The above examples fall into this.

          You cannot accurately refer to “sequence prediction” or “sequence regression” as “regression” as an LSTM cannot be used for the latter, but can be used for the former.

          I hope that is clearer.

          • Avatar
            Mary December 29, 2020 at 10:37 pm #

            Dear Jason,
            Thank you very much for taking the time to answer so clearly.

            Best

            Mary

          • Avatar
            Jason Brownlee December 30, 2020 at 6:37 am #

            You’re welcome.

  245. Avatar
    Devendra January 4, 2021 at 1:30 am #

    I want to make an earthquake prediction using an RNN (LSTM). I am having difficulty coding it. Can you please help me?

    • Avatar
      Jason Brownlee January 4, 2021 at 6:09 am #

      Is there a specific problem you are having that I can perhaps address?

      • Avatar
        Devendra January 4, 2021 at 3:25 pm #

        Can I get it from the code you have provided here? If yes, then which LSTM should I follow?

        • Avatar
          Jason Brownlee January 5, 2021 at 6:16 am #

          I have no examples of “earthquake prediction”.

          Perhaps you can start with a model listed above and adapt it for your specific dataset.

  246. Avatar
    Bensayah Abdallah January 8, 2021 at 5:25 am #

    Many thanks

    This is my situation: I have several companies, each described by 20 measurable features that vary from year to year. For ten years, we have a binary classification Fail/Success.
    My question is: which model is adequate for this problem, to train the machine to predict the probable success or failure of a given company from its successive features?

    Many thanks

  247. Avatar
    Hnin January 9, 2021 at 8:10 pm #

    Thanks Jason for the insights. I have one question regarding ConvLSTM. It can extract spatio-temporal features. How can we input the spatial data? Is there example code for it?

    • Avatar
      Jason Brownlee January 10, 2021 at 5:39 am #

      Yes, ConvLSTM is designed for spatio-temporal data.

      It takes a sequence of images as input.

      • Avatar
        Hnin January 13, 2021 at 6:29 pm #

        Thanks a lot for your kind reply.

        Do you mean that to extract spatio-temporal features, we need to input a sequence of images rather than a sequence of values?

  248. Avatar
    Hyundong January 11, 2021 at 7:44 pm #

    Thank you for your great tutorial. I learned through this article but I have a question about the number of samples.
    If [10, 20, 30, 40, 50, 60, 70, 80, 90] is one sample, I have about 10,000 such samples; each sample is independent and has the same characteristics.
    For instance, [10.01, 20, 30.035, 40.102, 50.1, 60, 70.364, 80.112, 90.623], [10.541, 20.983, 30.097, 40.152, 50.2, 60.942, 70.73, 80, 90.53], [10.543, 20.486, 30.897, 40, 50.766, 60.519, 70.132, 80.11, 90.445], …
    In this case, I am wondering if there is a way to apply all 10,000 samples to training the model.

    Thank you.

  249. Avatar
    Yilma January 12, 2021 at 12:50 am #

    Dear Jason,
    I am not familiar with Python. Do you have this tutorial in R?
    Best
    Yilma

  250. Avatar
    Alex January 28, 2021 at 2:06 am #

    Hi Jason

    regarding the case ‘Multiple Parallel Series’… my problem has 150k time series, and for each I need to predict the future value.
    I guess this means that I will have 150k features.
    So the input array for my LSTM NN will have dimensions [n_samples, n_steps, 150k].
    The size of the array is too large! I get the error:
    ‘Unable to allocate 606. MiB for an array with shape (365, 3, 150000) and data type float32’.

    What should I do? is this the right way to approach the problem?

    Many thanks!

  251. Avatar
    Alex January 29, 2021 at 2:21 am #

    Hi Jason

    I want to train my LSTM NN with random samples taken from a timeseries.
    Should I normalize the whole series or each sample individually?

    Thanks

  252. Avatar
    Raheel Anjum February 8, 2021 at 1:51 pm #

    Hi Jason,
    I intend to do my research on electricity price forecasting. I have six years of electricity price data, from 2012 to 2017, and I need to forecast the values for 2017 using neural network autoregression (NN-AR) in R. The data set ranges from January 1st 2012 to December 31st 2017 (52608 observations, covering 2192 days). Each day comprises 24 observations, where each observation corresponds to a load period. For modeling and forecasting purposes, the data set is divided into two sets: January 1st 2012 to December 31st 2016 (43848 observations, covering 1827 days) for identification and estimation of the models, and January 1st 2017 to December 31st 2017 (8760 observations, covering 365 days) for evaluating the one-day-ahead out-of-sample forecasting accuracy of the models. I need your help. I’ve tried searching but couldn’t find specific code for one-day-ahead forecasting with NN-AR. Could you kindly send me neural network autoregression code that makes one-day-ahead out-of-sample forecasts for the complete year 2017? I would be highly obliged. Thank you and have a nice day.

  253. Avatar
    Nigel February 14, 2021 at 12:32 pm #

    Hi Jason,

    Great book! Wish I understood more, but I’m on my way.

    About the tutorial. You’ve stacked the output sequence with the input sequence, and I’m trying to understand how it differentiates x from y.

    Let’s say, I have 10 input_seq and 1 out_seq how would you approach this?

    I tried it myself with some random numbers, but the code predicts all values along the x-axis, which takes forever with LSTM. Should I stack the output_seq at the end of the input_seqs?

    Thanks in advance!

    • Avatar
      Jason Brownlee February 14, 2021 at 2:17 pm #

      Thanks.

      They are past observations of the target that we believe will help to predict future values of the target.

  254. Avatar
    Robet February 16, 2021 at 8:08 am #

    Hello I am new to machine learning and trying to wrap my head around some of the examples to find the best use cases for each.

    In the section on ‘Multiple Parallel Series’, is this processed as multiple parallel univariate predictions or multiple multivariate predictions?

    I am looking for a solution where it is the latter. I was considering creating separate multivariate models for each output, but I am wondering if the parallel series might be the better way to go.

    • Avatar
      Jason Brownlee February 16, 2021 at 10:04 am #

      Multiple parallel univariate time series, which is a multivariate input time series.

      Perhaps experiment with a few of the approaches and see what is a good fit for your data.

  255. Avatar
    Gerard Church February 16, 2021 at 9:54 am #

    Hi Jason,

    Really great and informative article. My first time working with LSTMs but the input format is really clear and has been easy to understand.

    I am trying to adapt this to a problem I am trying to solve. I am trying to predict net income from a financial income statement using 31 balance sheet and income statement items. I am using 3 years of quarterly data to predict this, thus a time step of 12. For each yhat, my x_train contains 12 lists, one per quarter, each holding the 31 independent balance sheet/income statement variables used to predict my yhat.

    Since my y_train has a length of 63, my input data is 63 x 12 x 31. This is stored as a list of arrays, each with 12 lists containing the 31 variable values for each quarter. The LSTM model really doesn’t like this format and gives the error:

    ValueError: Failed to find data adapter that can handle input: ( containing values of types {“”}), ( containing values of types {“”})

    Do you have any advice as to how to format this input into my LSTM? Hope the question is clear and thanks for the help!
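
A "Failed to find data adapter" error is often raised when Keras is handed a Python list of arrays instead of a single NumPy array. A minimal sketch of the conversion follows, assuming every sample really has the same 12 x 31 shape; the random numbers are stand-ins for the real statements, and `x_train_list` and `y_train_list` are hypothetical names.

```python
import numpy as np

# Stand-in data: 63 samples, each a list of 12 quarters with 31 values.
# (Random numbers replace the real statements, which are not shown.)
rng = np.random.default_rng(0)
x_train_list = [[rng.random(31).tolist() for _ in range(12)] for _ in range(63)]
y_train_list = rng.random(63).tolist()

# Convert the nested lists to dense float arrays before calling model.fit().
X_train = np.asarray(x_train_list, dtype="float32")
y_train = np.asarray(y_train_list, dtype="float32")

print(X_train.shape, y_train.shape)
```

If the conversion fails or yields an object array, at least one sample probably has a different number of quarters or variables and needs padding or trimming first.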

  256. Avatar
    Anshuka Anshuka February 24, 2021 at 3:05 pm #

    Hi Jason,

    I have a question regarding Multivariate predictions.

    Say for example I have two sets of multivariate datasets with parallel input series in both.

    How can we use dataset (X) which is multivariate and has parallel input time series , to predict dataset (Y), which again is a multivariate dataset with parallel input series.

    Looking forward to your response.

    • Avatar
      Jason Brownlee February 25, 2021 at 5:23 am #

      The above examples under “Multivariate LSTM Models” can be used as a starting point and adapted directly.

  257. Avatar
    Alireza February 26, 2021 at 3:01 am #

    Hi Jason,

    Do you have any example for univariate multi-step time series?

    Thanks

    • Avatar
      Jason Brownlee February 26, 2021 at 5:04 am #

      Yes many, you can use the search box at the top of the page.

  258. Avatar
    H. March 2, 2021 at 6:57 am #

    There is an issue with the line


    model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))

    from the sections ‘Vector Output Model’ and ‘Encoder-Decoder Model’,

    since the following exception is thrown


    NotImplementedError: Cannot convert a symbolic Tensor (lstm/strided_slice:0) to a numpy array. This error may indicate that you’re trying to pass a Tensor to a NumPy call, which is not supported

    How can this be resolved?

  259. Avatar
    Martin March 2, 2021 at 5:22 pm #

    Hello Jason,

    Thanks for this brilliant blog post. it has really been helpful to me.

    However, I have got a real-world Spatio-temporal traffic dataset and I reckon that the procedure to model it as a supervised learning problem would be quite different from multivariate time series (as the order of the spatial variables matter).

    As an example: take the Spatio-temporal matrix

    T1 T2 T3 T4 T5 T6
    S1 | 67 | 34 | 24 | 54 | 49 | 67 |
    S2 | 61 | 55 | 23 | 42 | 53 | 78 |
    S3 | 74 | 83 | 55 | 50 | 62 | 68 |
    S4 | 48 | 73 | 78 | 56 | 61 | 78 |
    S5 | 80 | 58 | 67 | 54 | 51 | 89 |

    where the rows represent the spatial identity (the position of the detectors) and the columns represent the time interval for collection of the data.

    In formulating this as a supervised learning problem with 5 time-step per sample and 1 step prediction made at S3, would this be a logical formulation?

    Input:
    T1 T2 T3 T4 T5
    S1 | 67 | 34 | 24 | 54 | 49 |
    S2 | 61 | 55 | 23 | 42 | 53 |
    S3 | 74 | 83 | 55 | 50 | 62 |
    S4 | 48 | 73 | 78 | 56 | 61 |
    S5 | 80 | 58 | 67 | 54 | 51 |

    Output:
    68

    Also, Since I am working with a real Spatio-temporal dataset, do I need to split the samples into subsequences when using the ConvLSTM module?

    If No, for the example above, would this input to the ConvLSTM be correct:
    [no of samples, time-step=5, rows=spatial, columns=temporal, features=1]

    • Avatar
      Jason Brownlee March 3, 2021 at 5:26 am #

      It’s hard to be prescriptive, perhaps experiment and see what works/makes sense for your dataset.
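
As one possible framing of the detector matrix above (a sketch, not the only valid formulation), each sample can hold 5 time steps of all 5 detectors, with the next value at S3 as the target. `make_samples` and `target_row` are hypothetical names.

```python
import numpy as np

# Rows are detectors S1..S5, columns are time intervals T1..T6.
grid = np.array([
    [67, 34, 24, 54, 49, 67],
    [61, 55, 23, 42, 53, 78],
    [74, 83, 55, 50, 62, 68],
    [48, 73, 78, 56, 61, 78],
    [80, 58, 67, 54, 51, 89],
])

def make_samples(grid, n_steps, target_row):
    """Window over time; each sample is [timesteps, detectors]."""
    series = grid.T  # time-major: one row per time interval
    X, y = [], []
    for start in range(series.shape[0] - n_steps):
        end = start + n_steps
        X.append(series[start:end, :])
        y.append(series[end, target_row])
    return np.array(X), np.array(y)

X, y = make_samples(grid, n_steps=5, target_row=2)  # row 2 is S3
print(X.shape, y)
```

With only six intervals this yields a single sample whose target is 68, matching the formulation in the question. For a ConvLSTM2D layer, which expects 5D input [samples, timesteps, rows, columns, channels], one option would be `X.reshape((X.shape[0], 5, 1, 5, 1))`, which keeps the detector order along the columns axis.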

  260. Avatar
    Rigveda Sengupta March 3, 2021 at 11:25 pm #

    Hi, just a quick question I am working with a multiple multivariate timeseries. Will the structure remain the same as the Multiple Input Series model discussed above?

  261. Avatar
    Zhou March 5, 2021 at 11:45 am #

    Hi Jason,
    Thank you for such an informative tutorial.
    But I’m having problems using the convLSTM module for multivariate time series prediction. I hope you can answer this for me, it is really important for me and I would appreciate it.

    My topic is to learn the train dataset to perform outlier detection on the test dataset. If the test set has no outliers, the convLSTM module can predict well. However, when I add outliers, the predictions change and I can’t do outlier detection.I can’t explain it very well.

    Only a simple example can be given.
    Suppose a feature in my training set is [1, 2, 3, 4, 5, 6, 7]
    And the corresponding test set is [8, 9, 10, 11, 11, 11, 14]

    Ideally, the prediction generated by learning the train set would be [8, 9, 10, 11, 12, 13, 14], which is used to prove that there are 2 outliers in my test set.But the real situation is that I get predictions similar to[8, 9, 10, 11, 11.1241, 11.3661, 14].

    Questions:
    1). The data in the prediction set and the test set are too close to each other, so I can’t do outlier detection.
    2). How to use the convLSTM module to perform multi-step prediction for multivariate sequences? Because I guess the reason for the first problem is that I am using the convLSTM module for single-step time series prediction.

    • Avatar
      Jason Brownlee March 5, 2021 at 1:38 pm #

      You’re welcome.

      Sorry, I don’t understand your first question. Outlier detection would probably occur prior to modeling as a data prep step.

      You can perform multi-step prediction a few ways, all described above, e.g. a vector output model or an encoder-decoder model.

  262. Avatar
    ZHAO, WENYU March 8, 2021 at 8:47 pm #

    Hi Jason!

    I would like to ask another question. After training a multivariate LSTM model, how do we know if the model is good or not?

  263. Avatar
    Aditya March 12, 2021 at 6:35 am #

    Hi Jason,

    I’m currently working on stock price prediction. So far, I’ve used historical end-of-day ‘Closing’ prices ONLY, as univariate sequences. My aim is to further improve the model by giving it more than just old ‘closing’ prices. I want to give it open, high and low too. From your article, I understand that I can achieve this using multivariate sequences. I have gained so much knowledge from this and I can make my project even better.

    Thanks a lot! I would be really happy if you can give me some tips!

  264. Avatar
    Hooman March 19, 2021 at 3:32 am #

    Hello Jason,

    Now I know how to develop a Multivariate Multistep forecasting model for the hourly weather forecasting task.

    But in case we are also given a day ahead “weather guess” dataset, how can I use these guessed values in a model? do you know any tutorial or blog post?

    In fact, we have a history of guesses and a history of actual values.
    Then a day ahead guess is passed to us, and we should make an accurate prediction using the history of this guess entity and the actual values

  265. Avatar
    MIchele April 1, 2021 at 1:28 am #

    Hi,

    thank you for this nice tutorial.
    I would like to know how to modify the multivariate multi-step forecasting in order to use keras’ SimpleRNN instead of LSTM.
    In particular, I would like to use an Elman RNN. I have read that it can be implemented by connecting one SimpleRNN layer with a TimeDistributed(Dense) layer, but it is not clear to me how to do it.

    I have tried the following code:

    model = Sequential()
    model.add(SimpleRNN(100, return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(TimeDistributed(Dense(n_steps_out, activation='tanh')))
    model.compile(optimizer='rmsprop', loss='mse')

    # fit model
    model.fit(X, y, epochs=200, verbose=0)

    but fit() fails raising the error:

    tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [6,3,2] vs. [6,2]

    Thank you in advance

    • Avatar
      Jason Brownlee April 1, 2021 at 8:19 am #

      You’re welcome.

      Sorry, I don’t have an example, perhaps use a little trial and error and discover how to make the required changes.

  266. Avatar
    Green April 1, 2021 at 5:17 am #

    Hello Jason,

    Is it possible to use LSTM without time. Just for coordinates. Input = coordinates, output = value (like temperature). For extrapolation task or interpolation.

  267. Avatar
    Michele April 3, 2021 at 12:20 am #

    Actually, I am using your multivariate multi-step example (version with one LSTM layer), just replacing LSTM with SimpleRNN and Dense with RimeDistributed(Dense).

    Apparently, the problem is the shape of the y data structure. I made the following change:

    X, y = split_sequences(dataset, n_steps_in, n_steps_out)
    y = y.reshape(y.shape[0],1,y.shape[1]) # <– added this one
    print(X.shape, y.shape)

    Now, the model design, train and test is:

    # define model (Elman RNN)
    model = Sequential()
    model.add(SimpleRNN(100, activation="sigmoid", return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(TimeDistributed(Dense(n_steps_out, activation='tanh')))
    model.compile(optimizer='rmsprop', loss='mse')
    # fit model
    model.fit(X, y, epochs=200, verbose=0)
    # demonstrate prediction
    x_input = array([[70, 75], [80, 85], [90, 95]])
    x_input = x_input.reshape((1, n_steps_in, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(yhat)

    The resulting yhat is:

    [[[1. 1.]
    [1. 1.]
    [1. 1.]]]

    which is not good in shape and values. What am I still missing?

    • Avatar
      Jason Brownlee April 3, 2021 at 5:34 am #

      Sorry, I have not used “SimpleRNN” and “RimeDistributed”. I don’t know the cause of your problem.

      Perhaps these tips will help:
      https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code

      • Avatar
        Michele April 3, 2021 at 8:14 am #

        RimeDistribuited was a typo, I actually meant: TimeDistributed

        I was not asking to debug my code, of course.

        Recently, I purchased a couple of your books, which unfortunately do not help me in solving the problem. I thought you were at least able to provide useful hints – not just a link to the FAQ.

        Nevermind, I will find the solution and publish it for free. 😀
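
One likely factor in the all-ones prediction in this thread is the `tanh` activation on the output layer: it bounds every output to the range [-1, 1], while the targets in the example are on the order of 100. The sketch below only demonstrates that saturation and one hypothetical workaround (scaling the targets); the target values are illustrative, not taken from the thread.

```python
import numpy as np

# Targets in the example are on the order of 100, but tanh cannot exceed 1.
pre_activations = np.array([10.0, 100.0, 185.0])
print(np.tanh(pre_activations))  # every value is pinned at (almost exactly) 1

# Hypothetical workaround: scale targets into tanh's range before training,
# then invert the scaling on the predictions.
y = np.array([[185.0, 206.0], [125.0, 140.0]])
scale = np.abs(y).max()
y_scaled = y / scale
restored = y_scaled * scale
print(y_scaled.min(), y_scaled.max())
```

Another option is to leave the output Dense layer with its default linear activation, as the tutorial's models do. Separately, `return_sequences=True` makes the network emit one output per input time step, which would explain a prediction of shape (1, 3, 2) rather than (1, 2).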

  268. Avatar
    Lu April 7, 2021 at 12:41 am #

    Hi Jason,

    Thanks for the post. It is very helpful.

    I created a LSTM model:

    model = Sequential()
    model.add(LSTM(20, activation='relu', return_sequences=True, input_shape=(5,12)))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    As you can see, the dimension of the model output is (1,)

    However, after training, when I run model prediction I get multiple outputs:

    predictions = model.predict_classes(X_test[0].reshape((1, 5, 12)))
    predictions.shape, predictions

    Output:

    ((1, 5, 1),
    array([[[0],
    [0],
    [0],
    [0],
    [0]]]))
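
The (1, 5, 1) shape is consistent with `return_sequences=True`: the LSTM then emits one hidden vector per time step, and a Dense layer acts only on the last axis, so each of the 5 steps gets its own output. The NumPy sketch below imitates just that shape logic; the random arrays are stand-ins, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
batch, timesteps, units = 1, 5, 20

# With return_sequences=True the LSTM emits one hidden vector per time step.
hidden_seq = rng.random((batch, timesteps, units))
weights = rng.random((units, 1))  # stand-in for the final Dense(1) kernel

# A Dense layer acts on the last axis only, so every time step gets an output.
per_step = hidden_seq @ weights
print(per_step.shape)

# With return_sequences=False only the final hidden vector is passed on.
last_only = hidden_seq[:, -1, :] @ weights
print(last_only.shape)
```

If a single prediction per sample is wanted, setting `return_sequences=False` (the default) on the LSTM layer should give an output of shape (1, 1).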

  269. Avatar
    SURBHI SINGH April 13, 2021 at 4:54 am #

    I have a question, if i want to get back the test data used in the model in its original form , so as to plot it against the predicted values with the dates on the x-axis, is there a way to do it ?

  270. Avatar
    Trishala April 14, 2021 at 3:42 pm #

    Hello Jason,

    I have a question regarding creating samples. I want to create samples for the Closing price for 60 days window but give labels to them. Using this code

    from numpy import array

    # split a univariate sequence into samples
    def split_sequence(sequence, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequence)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the sequence
            if out_end_ix > len(sequence):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence
    raw_seq = df['Close']
    # choose a number of time steps
    n_steps_in, n_steps_out = 60, 60
    # split into samples
    X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
    # summarize the data
    for i in range(len(X)):
        print(X[i], y[i])

    I am able to create X samples but for Y samples I want to give them labels 0 and 1. On the condition

    X_T: T+1, T+2, …, T+60

    Y_T: ==1, if the price increases by 6% before going down 3% within 3 trading days; ==0, otherwise.

    How should I do this ?
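
The labeling rule described above could be sketched as follows. This is one interpretation, assuming the label compares the prices after each window against the last price in the window, and that the 3-day horizon applies to both levels; `label_window` and `make_labels` are hypothetical names and the price list is made up.

```python
def label_window(entry_price, future_prices, up=0.06, down=0.03):
    """Return 1 if the +6% level is reached before the -3% level, else 0."""
    upper = entry_price * (1 + up)
    lower = entry_price * (1 - down)
    for price in future_prices:
        if price <= lower:
            return 0  # the lower level was hit first
        if price >= upper:
            return 1  # the upper level was hit first
    return 0  # neither level was reached within the horizon

def make_labels(prices, n_steps_in, horizon=3):
    """One label per input window, based on the following `horizon` prices."""
    labels = []
    for i in range(len(prices) - n_steps_in - horizon + 1):
        entry = prices[i + n_steps_in - 1]  # last price in the window
        future = prices[i + n_steps_in:i + n_steps_in + horizon]
        labels.append(label_window(entry, future))
    return labels

prices = [100, 100, 101, 107, 104, 100, 96, 110]
print(make_labels(prices, n_steps_in=2))
```

The resulting list lines up with the input windows in order, so it can replace `y` when training a binary classifier.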

  271. Avatar
    Marimuthu S April 14, 2021 at 5:34 pm #

    Hello Jason, Greetings.

    Hope you are doing well. Thanks for the post. This is very useful. However, I have a doubt.

    How do I choose the optimal time step for my data? Should I use an ACF or PACF plot to choose it? Please advise. Thanks in advance.

    • Avatar
      Jason Brownlee April 15, 2021 at 5:24 am #

      Perhaps ACF/PACF plots will help, perhaps grid search, perhaps trial and error.

  272. Avatar
    Kerry April 16, 2021 at 7:08 am #

    Dear Jason,
    I appreciate your instructive blog. I learn a lot from you.
    I am trying to train several LSTMs in a supervised manner and then apply max pooling across their hidden states. Can you tell me whether this capability is built into LSTMs, or do I need to implement it myself?

    • Avatar
      Jason Brownlee April 17, 2021 at 6:01 am #

      You may need to write some custom code or a custom layer.

  273. Avatar
    Marimuthu S April 19, 2021 at 1:27 pm #

    Thanks for the reply, Jason.

  274. Avatar
    Bahadir April 19, 2021 at 11:22 pm #

    Hello Jason, thanks for this great tutorial!
    I have a question and I would be glad if you share your idea.

    I have a dataset of frames obtained from a gameplay video and each frame (row) in the dataset has the following columns (in a simplified manner): time, bitrate_kbps, game stage (0: Exploration, 1: Combat)
    As an example of random 6 adjacent frames:
    2.2, 208, 1
    2.3, 211, 1
    2.5, 215, 1
    2.6, 219, 0
    2.7, 222, 0
    2.9, 221, 1

    My goal is to train a model (e.g. with LSTM) on this time-series data to classify game stages according to the bitrate data. The model should be able to assign the correct game stage labels to an unlabeled time series of frames such as: time, bitrate_kbps.
    What kind of approach would be a good way to train such a model? Thanks!

  275. Avatar
    George G April 23, 2021 at 12:09 am #

    Hi Jason and thanks for your posts!

    In your multivariate multi-step stacked lstm example, If I had:

    n_steps_in, n_steps_out = 3, 2

    and for x_input another one line, so:

    x_input = array([[[70, 75], [80, 85], [90, 95]],
    [[100, 105], [110, 115], [120, 125]]])

    then the output would be:

    yhat = array([[182.84283, 212.43597],
    [247.65134, 288.84436]], dtype=float32)

    Now, let’s say that I have the dates information also
    (so all this refers to data in certain dates by every day step).

    So for the first data which is on 1/4/21

    [[[ 70, 75],
    [ 80, 85],
    [ 90, 95]] the +1 day value is 182.84283 (2/4/21) and the +2 days is 212.43597 (3/4/21) ?

    And for the next set of input which is on 2/4/21

    [[100, 105],
    [110, 115],
    [120, 125]] the +1 day value is 247.65134 (3/4/21) and the +2 days is 288.84436 (4/4/21) ?

    But on 3/4/21 I have two values now!

    Could you please clarify? I am confused!

    Thank you!

      • Avatar
        George G April 23, 2021 at 5:08 pm #

        Hi Jason,

        So, since I have 2 samples and 3 timesteps:

        1st sample
        ———–
        [[[ 70, 75] -> 1/4/21
        [ 80, 85] -> 2/4/21
        [ 90, 95]] -> 3/4/21

        the output is:
        182.84283 is on 4/4/21 and 212.43597 on 5/4/21 , right?

        2nd sample
        ———-
        [[100, 105] -> 4/4/21
        [110, 115] -> 5/4/21
        [120, 125]] -> 6/4/21

        the output is:
        247.65134 is on 7/4/21 and 288.84436 on 8/4/21, right?

        So,I am predicting for 4,5,7,8 of April?
        Where is the prediction for 6/4 ?

        • Avatar
          Jason Brownlee April 24, 2021 at 5:17 am #

          You can frame the data any way you want.

          I think it would be better to shift each sample down by one time step, instead of 3, but you can do whatever you think is best for your dataset and model. If you’re not sure, perhaps try a few different approaches and compare results.

          • Avatar
            George G April 24, 2021 at 5:55 am #

            Ok, but what if I have this frame as above?
            3 steps in and 2 steps out. How to deal with the dates, that’s my problem.

          • Avatar
            Jason Brownlee April 25, 2021 at 5:12 am #

            My point is you can prepare your data so you have [3,4,5]->[6,7] if you want.
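
To make the date bookkeeping in this thread concrete, the sketch below slides the window forward one day at a time and lists which dates each sample's inputs and outputs refer to. `forecast_dates` is a hypothetical helper and the dates are read as day/month/year, which is an assumption.

```python
from datetime import date, timedelta

def forecast_dates(first_day, n_days, n_steps_in, n_steps_out):
    """Pair each input window's dates with the dates its outputs refer to."""
    days = [first_day + timedelta(days=k) for k in range(n_days)]
    pairs = []
    for i in range(n_days - n_steps_in - n_steps_out + 1):
        in_days = days[i:i + n_steps_in]
        out_days = days[i + n_steps_in:i + n_steps_in + n_steps_out]
        pairs.append((in_days, out_days))
    return pairs

pairs = forecast_dates(date(2021, 4, 1), n_days=8, n_steps_in=3, n_steps_out=2)
for in_days, out_days in pairs:
    print([d.day for d in in_days], "->", [d.day for d in out_days])
```

With a one-day shift, consecutive samples forecast overlapping dates (for example, both the first and second samples produce a value for 5 April). Those are forecasts made from different starting points, so it is up to you whether to keep the most recent one or combine them.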

  276. Avatar
    Abraham Rodarte April 26, 2021 at 7:03 pm #

    Do you have any work about Multiple Parallel Input, Multi-Step Output and Multiple Output for Time Series Forecasting?
    The problem I have is that I have 6 features and I want to predict 3 with their respective test and training like the air pollution blog.

  277. Avatar
    Mingkai April 29, 2021 at 10:29 am #

    Hi Jason,

    Thanks for your post, it was very helpful for me to start LSTM.
    My problem is to predict a time series, say prices over time, and apart from the historic real prices, I also have some forecasted prices from another source, for the next n time intervals, and I want to use them as additional features.

    To test the accuracy of the model, I substitute the forecasted prices with real price. Say I want to predict a price that follows pi: [3 1 4 1 5 9 2 6 7 ..], I use a data input structure look like this:

    X[0,:,:] =
    [[ 3 1 4]
    [ 1 4 1]
    [ 4 1 5]
    [ 1 5 9]]

    Y[0,:] = [5 9]

    X[1,:,:] =

    [[ 1 4 1]
    [ 4 1 5]
    [ 1 5 9]
    [ 5 9 2]]

    Y[1,:] = [9 2]

    and so on,

    As a test, I used a simple single layer LSTM + a dense layer as output.

    model.add(LSTM(10, activation='relu', return_sequences=False, input_shape=(4, 3)))
    model.add(Dropout(0.1))
    model.add(Dense(2))

    But it seems the current configuration can not figure out there is a relationship between the diagonal element in the input, even the inputs already have the answer. The error is quite large.

    Is there any LSTM or other model structure you see will be helpful?

    Thank you very much!

    MK

  278. Avatar
    Vishnu May 1, 2021 at 6:39 pm #

    Hello,

    I want to know the significance of the number of steps we use.

    In these examples the number of steps used are 3 ? does this mean that every time the LSTM is trained it looks only at the last 3 time steps ?

    Does this mean that if we want the LSTM to look over temporal dependencies over a longer time period we need to increase the number of steps accordingly ?

    I don’t understand this part.

    • Avatar
      Jason Brownlee May 2, 2021 at 5:29 am #

      The configuration was arbitrary. I recommend tuning the problem representation and model for your specific dataset.

  279. Avatar
    emma May 2, 2021 at 2:31 am #

    hello jason
    Please can you explain the function split_sequence? I can’t understand how the function works,
    specifically these two lines:
    # gather input and output parts of the pattern
    seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
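
A commented version of the helper may make the two lines clearer. This sketch follows the tutorial's univariate `split_sequence`, but returns plain lists instead of NumPy arrays to keep it dependency-free.

```python
def split_sequence(sequence, n_steps):
    X, y = [], []
    for i in range(len(sequence)):
        end_ix = i + n_steps              # index just past the input window
        if end_ix > len(sequence) - 1:    # no value left to use as the target
            break
        seq_x = sequence[i:end_ix]        # n_steps inputs: positions i .. end_ix-1
        seq_y = sequence[end_ix]          # the single value that follows them
        X.append(seq_x)
        y.append(seq_y)
    return X, y

X, y = split_sequence([10, 20, 30, 40, 50, 60, 70, 80, 90], n_steps=3)
for inputs, target in zip(X, y):
    print(inputs, target)
```

Because Python slices exclude their end index, `sequence[i:end_ix]` stops one position before `end_ix`, and `sequence[end_ix]` is the next observation, which becomes the value to predict. The first pair is therefore [10, 20, 30] and 40.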

  280. Avatar
    Atul Upadhyay May 7, 2021 at 5:04 am #

    Hey Jason, Needed some help with my project.
    I am working on a project to predict future demands.
    It is a univariate forecasting problem (only two columns, i.e. Date and Demand).

    I have trained my model for the year 2015-2016 (having the data only of both these year), and want to predict for the year 2017 (the next 365 days).

    How can I do this?
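
One common approach to a long horizon is recursive forecasting: predict one step, append the prediction to the input window, and repeat. The sketch below shows only the loop; `recursive_forecast` and `toy_model` are hypothetical stand-ins, and with a Keras model `predict_next` would wrap a `model.predict()` call on the reshaped window.

```python
def recursive_forecast(history, predict_next, n_steps, horizon):
    """Forecast `horizon` steps by feeding each prediction back as input."""
    window = list(history[-n_steps:])
    forecasts = []
    for _ in range(horizon):
        yhat = predict_next(window)
        forecasts.append(yhat)
        window = window[1:] + [yhat]  # drop the oldest value, append the newest
    return forecasts

# Stand-in for a trained model: continues the last observed difference.
def toy_model(window):
    return window[-1] + (window[-1] - window[-2])

print(recursive_forecast([10, 20, 30, 40], toy_model, n_steps=3, horizon=5))
```

Errors tend to accumulate over a horizon as long as 365 steps, so a direct multi-step model (vector output or encoder-decoder, as in the tutorial) may be worth comparing.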

  281. Avatar
    Eva May 7, 2021 at 11:31 pm #

    Thanks for this great tutorial, Dr. Jason.

    In the univariate LSTM model that uses a CNN as a feature extractor, you use a kernel of size 1.

    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'),
                              input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))

    The configuration may have been chosen arbitrarily, but the model performed better with kernel size 1. What is the intuition behind this size?

    Thanks

    • Avatar
      Jason Brownlee May 8, 2021 at 6:37 am #

      It may suggest the CNN is not adding any value to the model.
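
Part of the intuition may simply be arithmetic on the sequence length. The sketch below assumes each subsequence has 2 time steps (an assumption about the configuration being discussed) and uses the Keras defaults of 'valid' padding, a convolution stride of 1, and a pooling stride equal to the pool size. The helper names are hypothetical.

```python
def conv1d_length(n_steps, kernel_size):
    """Output length of a 1D convolution with 'valid' padding and stride 1."""
    return n_steps - kernel_size + 1

def maxpool1d_length(length, pool_size):
    """Output length of max pooling with 'valid' padding and stride = pool_size."""
    return max(0, (length - pool_size) // pool_size + 1)

n_steps = 2  # assumed subsequence length
for kernel_size in (1, 2):
    conv_len = conv1d_length(n_steps, kernel_size)
    pool_len = maxpool1d_length(conv_len, pool_size=2)
    print(kernel_size, conv_len, pool_len)
```

Under these assumptions a kernel of size 1 leaves 2 steps for the pooling layer to reduce to 1, whereas a kernel of size 2 leaves a single step and nothing for a pool of size 2 to work with. A size-1 kernel looks at one time step at a time, which fits the reply above that the convolution may not be adding much.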

  282. Avatar
    Nima May 9, 2021 at 10:33 am #

    Hi Jason. I wanna use the last model ” Multiple Parallel Input and Multi-Step Output” for stock prediction, but I face this error: “AttributeError: module ‘tensorflow.python.framework.ops’ has no attribute ‘_TensorLike'”

    The code that I have been using is as follows. I exactly copied the code and transformed my data to fit the model but I faced an error.

    Thanks

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline
    import yfinance as yf
    from datetime import date
    from dateutil.relativedelta import *
    from copy import deepcopy
    import pickle

    import warnings
    warnings.filterwarnings("ignore")

    from numpy import array
    from numpy import hstack
    from keras.models import Sequential
    from keras.layers import LSTM
    from keras.layers import Dense
    from keras.layers import RepeatVector
    from keras.layers import TimeDistributed

    stocks = ['AAPL', 'TSLA', 'UPS', 'FDX', 'FB']
    today = date.today()
    Initial_period = today + relativedelta(months=-24)

    data = pd.DataFrame(columns=stocks)

    for s in stocks:
        dt = yf.download(s, Initial_period, today)
        data[s] = dt.reset_index()['Close'].values

    # split a multivariate sequence into samples
    def split_sequences(sequences, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequences)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the dataset
            if out_end_ix > len(sequences):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # choose a number of time steps
    n_steps_in, n_steps_out = 50, 7
    # convert into input/output
    X, y = split_sequences(data.values, n_steps_in, n_steps_out)
    print(X.shape, y.shape)

    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_steps_in, n_features)))
    model.add(RepeatVector(n_steps_out))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')

  283. Avatar
    George G May 10, 2021 at 7:48 pm #

    Hi Jason, I have one question . Can you please check this question?

    https://stackoverflow.com/questions/67467590/lstm-timesteps-and-features-selection

    Thanks!

      • Avatar
        George G May 11, 2021 at 4:02 pm #

        Ok Jason, so

        I am using 6 features and each feature has 7 timesteps, so I have:

        feature1(t-7) feature2(t-7) feature3(t-7) … feature6(t-7)… feature5(t) feature6(t) .. feature5(t+1) feature6(t+1)

        I am predicting the t and t+1 timesteps.

        So, my input data is [?, 7, 42] (6 features * 7 timesteps).

        Now, at first I was doing:

        X_train = X_train.reshape((X_train.shape[0] , 1 , X_train.shape[1]))

        and

        nb_timesteps, nb_features = 7, X_train.shape[2]

        I want to use 7 timesteps, but as you can see the input data has shape

        [?, 1, 42] and not [?, 7, 42]

        so I get a warning about that.

        How can I overcome this, if I want to use 7 timesteps?

        My solution is to reshape data (after confirming that my length of data is a multiple of 7)

        X_train = X_train.reshape((X_train.shape[0] , 7 , X_train.shape[1] // 7))

        but now I am using 7 timesteps (ok I want that) and 6 features instead of 42.

        I want to ask if this is ok. I mean, with this setup I am using the 6 features only for the (t-7) step and at the same time I am using 7 timesteps.

          • Avatar
            George G May 12, 2021 at 4:31 pm #

            I was just saying that if I do reshape, the data is mixed up.
            Then , what features should I place in the last dimension? [samples, timesteps, features].

            Should I have all 42 features? (t-7),(t-6)…(t-1) ?

            Or should I have 6 features ? And at what time reference? (t-7) , (t-6) .. (t-1)?

          • Avatar
            Jason Brownlee May 13, 2021 at 6:00 am #

            I try to avoid being prescriptive as I never have all of the details of a reader’s dataset.

            I guess it is a design decision, likely based on the native structure of the data you are working with.

            The link I provided should help you think it through; otherwise, prototype some approaches with pen and paper or some vanilla Python and print() the results to see what makes sense.
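
Such a prototype could look like the sketch below. It assumes the 42 flat columns are ordered time step by time step (all 6 features at t-7, then all 6 at t-6, and so on), and encodes each value so its origin can be read back after the reshape. The encoding scheme is purely illustrative.

```python
import numpy as np

n_samples, n_timesteps, n_features = 2, 7, 6

# Encode each value as sample*1000 + timestep*10 + feature so it can be traced.
flat = np.array([
    [s * 1000 + t * 10 + f for t in range(n_timesteps) for f in range(n_features)]
    for s in range(n_samples)
])
print(flat.shape)  # (2, 42): columns ordered f1(t-7)..f6(t-7), f1(t-6)..

X = flat.reshape((n_samples, n_timesteps, n_features))
print(X.shape)
print(X[0, 0])  # all six features at the oldest time step
print(X[0, 6])  # all six features at the most recent time step
```

With that column order, the reshape to [samples, 7, 6] keeps every feature at every time step, so nothing is lost or mixed up. If the columns were instead grouped by feature (all lags of feature 1, then all lags of feature 2), reshaping to (samples, 6, 7) followed by `transpose(0, 2, 1)` would be needed.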

          • Avatar
            Sam B June 19, 2022 at 5:23 pm #

            Hi, I’ve got a silly question but I see variables named like nb_timesteps, nb_features. What does nb actually mean? Thanks!

          • Avatar
            James Carmichael June 20, 2022 at 11:39 am #

            Hi Sam… I do not see what you are referencing; however, there would be no particular significance to it, as it is just part of the variable name. In other words, you could also just call them “nx_timesteps”, “ab_features” and the like.

  284. Avatar
    Jose May 11, 2021 at 1:59 pm #

    Good evening, thanks for all the material you have published; as a newbie, it has been a great help to me. I am working on a time series problem that consists of the disaggregation of residential electrical energy. My problem can be summarized as follows: I have two time series as input, which can be interpreted in a certain way as the sum of the output series. I have the two input series and 22 output time series. The objective is that once the model receives the two input series, it can reconstruct the 22 series that compose them. Could you please point me to which of your tutorials and books would be most appropriate for my case? Can I reference the book that I purchase? Thank you.

  285. Avatar
    Ming May 13, 2021 at 11:59 am #

    Hi Jason, why does the Encoder-Decoder Model give a bad result when I use it for multi-step forecasting (24 steps)? It can only predict the trend for me. Can you help me? Thank you very much.

    • Avatar
      Jason Brownlee May 14, 2021 at 6:19 am #

      It may or may not give a good result for a given dataset. We cannot know beforehand.

  286. Avatar
    Sean May 13, 2021 at 7:12 pm #

    Why do you define the input_shape as shape of 2D? What is the difference between input_shape and batch_input_shape?

  287. Avatar
    David Espinosa May 21, 2021 at 10:33 am #

    Good day Jason, first, thanks for the awesome tutorial.

    Second, I have two doubts regarding RNN in general.

    1) I have read in some forums that “each sample ‘should’ be of an integer type”, and in others they say that “RNN can deal with series of numbers, no matter the type”. Plus, the examples used in some of your other tutorials (https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/), show some float numbers used inside each sample. Which type is “preferred” for working with RNN?
    2) Related to the previous question, I am exploring the behaviour of some DNN architectures for binary classification. I have a mixed-type dataset (with both integer and float numbers), but I don’t know if I can use it “as is” or should convert it into some specific format (all integer, all float, one-hot encoding for categories, standardized / normalized)…

    I think the two questions largely overlap, but I want to make sure I am well understood.

    Thanks in advance for your thoughts on my query, and stay safe.

    • Avatar
      David Espinosa May 21, 2021 at 10:35 am #

      Hello Jason,

      I just wanted to clarify, regarding my ‘doubt #2’, that I am focusing specifically on LSTM-RNNs.

      Thank you again.

    • Avatar
      Jason Brownlee May 22, 2021 at 5:30 am #

      Yes, generally RNNs should take small floats as input.

      Try your model on the raw data and compare to scaled data and use whatever works best for you.
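One way to run the comparison suggested above is to fit the same model twice, once on the raw values and once on rescaled values. The sketch below shows only the rescaling step in plain Python. `min_max_scale` and `invert_scale` are hypothetical helper names (not part of Keras or the tutorial), and min-max scaling to [0, 1] is just one common choice, assumed here for illustration.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]; also return the (min, max) used."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # constant series: map everything to 0.0 to avoid dividing by zero
        return [0.0 for _ in values], (lo, hi)
    return [(v - lo) / span for v in values], (lo, hi)


def invert_scale(scaled, bounds):
    """Map scaled values (e.g. model predictions) back to the original units."""
    lo, hi = bounds
    return [s * (hi - lo) + lo for s in scaled]


raw = [10, 20, 30, 40, 50]
scaled, bounds = min_max_scale(raw)
restored = invert_scale(scaled, bounds)
```

When comparing raw and scaled inputs, the bounds would normally be computed on the training portion only and then reused for any later data, so that no information from the future leaks into the scaling.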

  288. Avatar
    Rasoul May 25, 2021 at 12:15 am #

    The code worked for me with the following changes:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM
    from tensorflow.keras.layers import Dense

    I had to install tensorflow==1.12.0 and keras==2.2.4 (my python version is 3.6.8, and I am on Windows 10, no anaconda!)

    Hope this is helpful for other people facing problems regarding package incompatibility.

  289. Avatar
    Krisna GIta June 7, 2021 at 10:59 am #

    Hi Jason,
    I tried your CNN-LSTM model and I got an error message. The error message was “ValueError: Please initialize TimeDistributed layer with a tf.keras.layers.Layer instance. You passed: ”

    Could you please help me solve this error? Thank you.

    • Avatar
      Jason Brownlee June 8, 2021 at 7:10 am #

      I recommend using the Keras API directly instead of tf.keras.

  290. Avatar
    mhr June 7, 2021 at 2:00 pm #

    I was thinking about building a single model for multiple separate series (sales forecasts of a shop for different products). I have found different approaches, but none of them are concrete. I have studied the ESRNN library from GitHub, but it seems the magnitude of my data is too low, for example:
    product_id,date,count
    1101,1-5-2020,1
    1101,2-5-2020,4
    1101,3-5-2020,0
    1101,4-5-2020,0
    1101,5-5-2020,4 ….
    Is it possible to add an embedding layer to encode the id and then use your split_sequences method to train a model that works for all products?

  291. Avatar
    Peter June 7, 2021 at 8:33 pm #

    Thanks Jason for the tutorial. I have a question regarding the Multiple Input Multi-Step Output example. You use the last 3 time steps of the 2 time series [(10, 15); (20, 25); (30, 35)] to predict the next 2 time steps [65, 85]. But the 65 is from the same time step as the (30, 35). So why would you want to predict a value from a time slot that you have already observed (otherwise you would not have the input (30, 35))? Would it not make more sense to predict the 2 time slots after the one with (30, 35), which led to 65? That is, you should predict [85, 105] when given [(10, 15); (20, 25); (30, 35)] as input.

    I’d appreciate every comment and would be quite thankful for your help.

    • Avatar
      Jason Brownlee June 8, 2021 at 7:15 am #

      We are evaluating the model using walk-forward validation.

      Once you choose a model and config, you fit the model on all data and start making predictions on new data.

      Perhaps this will help:
      https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/

      • Avatar
        Peter June 8, 2021 at 7:37 pm #

        Thanks Jason for your answer. I know that you are using walk-forward validation and I know how it works. This was not the point of my question. I am wondering why you forecast the values of the same time slot for which you have the inputs. Normally you should forecast the values of the NEXT time slot because this is, by definition, what a forecast is supposed to do.

        • Avatar
          Peter June 8, 2021 at 7:39 pm #

          You are forecasting the output (Timeseries_3) of Timeslot_3 which is 65 using – amongst others – the inputs of Timeslot_3 (Timeseries_1:30 and Timeseries_2: 35). For me this does not make sense. Surely it makes sense to forecast Timeseries_3 of Timeslot_4 because this is a future value while Timeseries_3 for Timeslot_3 is not a future value when you are in Timeslot_3.

          So why do you not use Timeslot_1, Timeslot_2 and Timeslot_3 to forecast Timeslot_4 and Timeslot_5? You are using Timeslot_1, Timeslot_2 and Timeslot_3 to forecast Timeslot_3 and Timeslot_4

        • Avatar
          Peter June 8, 2021 at 7:41 pm #

          Timeslot_1: Timeseries_1: 10, Timeseries_2: 15, Timeseries_3: 25

          Timeslot_2: Timeseries_1: 20, Timeseries_2: 25, Timeseries_3: 45

          Timeslot_3: Timeseries_1: 30, Timeseries_2: 35, Timeseries_3: 65

        • Avatar
          Jason Brownlee June 9, 2021 at 5:43 am #

          That was the framing of the problem I was solving. You can frame the prediction problem any way you like.
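For readers who prefer the framing Peter describes, where the outputs start at the time step after the input window, a minimal sketch follows. It works on plain Python lists rather than NumPy arrays, and `split_future_only` is a hypothetical name; it is one possible variant of the splitting idea, not the tutorial's own function.

```python
def split_future_only(rows, n_steps_in, n_steps_out):
    """Inputs: all columns but the last, for n_steps_in rows.
    Outputs: the last column for the n_steps_out rows that FOLLOW the input window."""
    X, y = [], []
    for i in range(len(rows)):
        end_ix = i + n_steps_in            # first row after the input window
        out_end_ix = end_ix + n_steps_out  # one past the last output row
        if out_end_ix > len(rows):
            break
        X.append([row[:-1] for row in rows[i:end_ix]])
        y.append([row[-1] for row in rows[end_ix:out_end_ix]])
    return X, y


# the contrived dataset discussed above: the third column is the sum of the first two
rows = [[a, a + 5, 2 * a + 5] for a in range(10, 100, 10)]
X, y = split_future_only(rows, n_steps_in=3, n_steps_out=2)
```

With this framing the first sample pairs the inputs (10, 15), (20, 25), (30, 35) with the outputs 85 and 105, and one fewer sample is produced than when the output window overlaps the last input row.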

          • Avatar
            Peter June 9, 2021 at 5:14 pm #

            Ah okay. Thanks a lot for your tremendous help. I really appreciate it.

  292. Avatar
    Amelie June 15, 2021 at 12:20 am #

    Hello,

    The LSTM models my time series well, with acceptable errors.

    However, the forecast values (beyond the test set of my real time series) are very far from what would be considered normal data.

    Is this normal?
    Can you tell me more?

    • Avatar
      Jason Brownlee June 15, 2021 at 6:07 am #

      Perhaps you need to prepare the data prior to modeling?
      Perhaps you need to tune the model?
      Perhaps the model is not appropriate for your dataset?

  293. Avatar
    Ibrahim Adigun June 21, 2021 at 12:28 am #

    Hello,

    I am about to use an LSTM for a price prediction case, but I have additional data such as Age, Region, Town, payment method, different dates (first and last payment), and so on.

    I want to know whether I will be able to use those for the LSTM model. This is my first project on NN.

    Thank you

  294. Avatar
    Anu A June 21, 2021 at 6:02 pm #

    Hi Jason, thank you for the informative and detailed tutorials! I noted that you use the ‘relu’ activation function for the LSTM layers instead of the default ‘tanh’ activation. May I ask why? Thank you!

  295. Avatar
    Anu A June 22, 2021 at 1:25 pm #

    Thank you very much for your reply! Sorry, but could you please clarify in what way it is more effective, and in what cases it might be preferred? Thank you!

    • Avatar
      Jason Brownlee June 23, 2021 at 5:32 am #

      I noticed empirically on some problems that using RELU for some simple univariate time series was more effective.

      I recommend that you test a suite of model configurations and discover what works best for your specific dataset and model.

      • Avatar
        Anu A June 23, 2021 at 11:40 am #

        Thank you for your clarification!

  296. Avatar
    Irene June 22, 2021 at 5:20 pm #

    Thank you for your informative post!
    I have a question for ‘Multiple Input Multi-Step Output’ process.

    When training, I’d like to add a validation set.
    Is adding a validation set a good approach?
    If it is, how can I set it up?

    Is it right to split the train/validation/test sets disjointly?

    Thank you in advance!

    • Avatar
      Jason Brownlee June 23, 2021 at 5:35 am #

      I don’t think using a validation set with an LSTM model is appropriate.

  297. Avatar
    Irene June 23, 2021 at 9:47 am #

    Can I ask why? I don’t know much about CNNs or LSTMs yet…

    • Avatar
      Jason Brownlee June 24, 2021 at 5:57 am #

      Because we cannot perform walk-forward validation on future time steps and use the same time steps (or different future time steps) for validation.
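As a rough illustration of the walk-forward scheme mentioned above, the sketch below generates the index splits only. `walk_forward_splits` is a hypothetical helper, and the expanding training window is an assumption made for this example.

```python
def walk_forward_splits(n_samples, n_test):
    """Yield (train_indices, test_index) pairs in time order.
    The last n_test samples are predicted one at a time; the training window grows each step."""
    first_test = n_samples - n_test
    for test_ix in range(first_test, n_samples):
        yield list(range(test_ix)), test_ix


splits = list(walk_forward_splits(n_samples=6, n_test=3))
```

Each test sample is predicted using only samples that precede it in time, which is why the held-out steps cannot also be set aside as a separate validation set.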

  298. Avatar
    George June 24, 2021 at 6:44 pm #

    Hi Jason,

    In your example Multivariate Multi-Step LSTM Models->Multiple Input Multi-Step Output,

    where you use n_steps_in, n_steps_out = 3, 2 , if we use for example sigmoid for the last layer and binary crossentropy loss:


    n_steps_in, n_steps_out = 3, 3

    X, y = split_sequences(dataset, n_steps_in, n_steps_out)

    n_features = X.shape[2]

    model = Sequential()
    model.add((LSTM(5, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features))))

    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    model.fit(X, y, epochs=20, verbose=0, batch_size=1)

    it runs ok.

    BUT, if we use n_steps_in, n_steps_out = 3, 2, it gives:


    ValueError: Dimensions must be equal, but are 2 and 3 for ‘{{node binary_crossentropy/mul}} = Mul[T=DT_FLOAT](binary_crossentropy/Cast, binary_crossentropy/Log)’ with input shapes: [1,2], [1,3].

    Any ideas what is that and how to deal with it?

    Thank you!

    • Avatar
      Jason Brownlee June 25, 2021 at 6:12 am #

      Sorry, it’s not clear what the issue may be. You may need to use a little trial and error in adapting the model for your specific use case.
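A possible explanation, offered as an assumption rather than something confirmed in the thread: with return_sequences=True the LSTM emits one vector per input time step, and a following Dense layer is applied to each of those steps, so the model produces n_steps_in values per sample while the targets hold n_steps_out values. The two only line up when n_steps_in equals n_steps_out. The small shape calculator below illustrates this; `model_output_shape` is a hypothetical helper that mimics the Keras shape rules and does not call Keras.

```python
def model_output_shape(n_steps_in, return_sequences, dense_units):
    """Per-sample output shape of LSTM(...) followed by Dense(dense_units)."""
    if return_sequences:
        # the LSTM emits one vector per input time step and Dense is applied to each
        return (n_steps_in, dense_units)
    # the LSTM emits a single vector for the whole sequence
    return (dense_units,)


n_steps_in, n_steps_out = 3, 2
target_shape = (n_steps_out,)

as_posted = model_output_shape(n_steps_in, return_sequences=True, dense_units=1)
vector_output = model_output_shape(n_steps_in, return_sequences=False, dense_units=n_steps_out)
```

Under this reading, one option is to drop return_sequences=True and size the final Dense layer as n_steps_out, so that the output shape matches the targets.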

  299. Avatar
    fan zhang June 25, 2021 at 11:13 pm #

    Hi Jason, thanks for the tutorial, it’s very helpful. I found that changing the batch_size in the predict() method changes the predicted values (I used your univariate stacked LSTM example and only changed the batch_size in the predict() calls below).
    The yhat values are almost the same as yhat1 (because the default batch size of 32 is similar to 41), but the yhat2 values differ a lot from yhat1 and yhat. Since it is a stateless LSTM, how can changing the batch size in the predict() method change the predicted values?

    I really appreciate your time and help in advance 🙂

    # univariate stacked lstm example
    from numpy import array
    from keras.models import Sequential
    from keras.layers import LSTM
    from keras.layers import Dense
    from keras.utils import plot_model

    # split a univariate sequence
    def split_sequence(sequence, n_steps):
        X, y = list(), list()
        for i in range(len(sequence)):
            # find the end of this pattern
            end_ix = i + n_steps
            # check if we are beyond the sequence
            if end_ix > len(sequence)-1:
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence
    raw_seq = list(range(1,65))
    # choose a number of time steps
    n_steps = 2

    # split into samples
    X, y = split_sequence(raw_seq, n_steps)
    # reshape from [samples, timesteps] into [samples, timesteps, features]
    n_features = 1
    X = X.reshape((X.shape[0], X.shape[1], n_features))
    # define model
    model = Sequential()
    model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    # fit model
    model.fit(X, y, epochs=200, verbose=0)
    plot_model(model)

    # demonstrate prediction
    x_input = array(list(range(2,166)))
    x_input = x_input.reshape((-1, n_steps, n_features))

    yhat = model.predict(x_input, verbose=0, batch_size=41)

    yhat1 = model.predict(x_input, verbose=0)

    yhat2 = model.predict(x_input, verbose=0, batch_size=2)

    and yhat != yhat2 != yhat1

  300. Avatar
    fan zhang June 25, 2021 at 11:37 pm #

    Um, just a follow-up comment: the differences are quite minor (and can probably be ignored):

    yhat2[-1]
    Out[3]: array([169.57353], dtype=float32)

    yhat1[-1]
    Out[4]: array([169.57355], dtype=float32)

    yhat1[-2]
    Out[5]: array([167.4769], dtype=float32)

    yhat2[-2]
    Out[6]: array([167.47688], dtype=float32)

    yhat2[-4]
    Out[7]: array([163.28676], dtype=float32)

    yhat1[-4]
    Out[8]: array([163.28674], dtype=float32)
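Differences in the seventh significant digit, like those listed above, are typical of float32 rounding. A likely cause (an assumption, not verified against the Keras backend) is that different batch sizes lead the backend to group its arithmetic differently, and floating-point addition gives slightly different results depending on grouping. The sketch below shows that effect in plain Python and compares the reported values with a tolerance instead of exact equality; the tolerance of 1e-6 is an arbitrary choice for illustration.

```python
import math

# floating-point addition is not associative: grouping changes the last bits
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
exactly_equal = left_to_right == right_to_left

# values reported above for the same input with different batch sizes
pairs = [(169.57353, 169.57355), (167.4769, 167.47688), (163.28676, 163.28674)]
close_enough = all(math.isclose(a, b, rel_tol=1e-6) for a, b in pairs)
```

Comparing predictions with a tolerance in this way is usually more informative than checking yhat != yhat2 directly.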

  301. Avatar
    sinfer July 8, 2021 at 3:31 am #

    HI Jason,

    Can you give me an idea of how to choose the number of time steps for an LSTM model used for fault detection and diagnosis on time series data, with 7 faults and normal-condition data labeled within the dataset? I have decided to go with 8 time steps since there are 8 types of conditions (7 faults and normal) and 8 time series.

    Finally, I want to send the last 10 data points to the predict function and get back the condition (fault type or normal): multiple data points as input, with the class label predicted from them. It is a multi-class classification problem.

    Thanks

    • Avatar
      Jason Brownlee July 8, 2021 at 6:09 am #

      Perhaps you can test a suite of configurations and discover what works best for your specific dataset.

  302. Avatar
    Lakmini July 9, 2021 at 3:05 pm #

    Hi Jason,

    This is a great article. Can we use LSTM to impute missing data in time series?

    • Avatar
      Jason Brownlee July 10, 2021 at 6:05 am #

      Yes, perhaps try it and compare results to other methods.

  303. Avatar
    Robin Bartmann July 9, 2021 at 11:12 pm #

    Hey Jason,

    Thanks for these fantastic blogposts!
    I used a lot of your inputs to develop the code for my thesis – Forecasting carbon market prices with Bayesian and Machine Learning methods. I performed 1step and 4step ahead forecasts with a multivariate (6 covariates), direct rolling window forecast with 3 models to compare:
    1) normal linear regression
    2) a shrinkage time varying parameter model (shrinkTVP in R)
    3) LSTM model (from your blogposts)

    I am still finalizing the results and will post them here to compare the performance between these models over time. I use weekly data from 2013-2020. Let me know if you are interested in something particular / if there is something that would help this community most.

    A really big thank you for the great resources. I am an economist and will continue to use all the resources here to advance econometric methods!

    • Avatar
      Jason Brownlee July 10, 2021 at 6:11 am #

      Well done!

      Sharing may help other people using the same methods or working on the same problem.

  304. Avatar
    Lio July 12, 2021 at 11:53 pm #

    Hi Jason,
    Thank you for providing such a good article for us!
    In the process of learning about LSTMs, I have run into a problem and hope to get your advice.
    I find that the predicted values lag behind the actual values: the predicted curve looks like the actual curve shifted in parallel. What causes this? Is there a solution?
    I hope to hear from you soon.

  305. Avatar
    Lio July 15, 2021 at 6:39 pm #

    OK, thank you for your reply. I hope I can learn more from your article.
    If you can learn more about lag, I hope you can tell me. I will be indebted forever.

    • Avatar
      Jason Brownlee July 16, 2021 at 5:22 am #

      You can vary the amount of lag used as input in order to discover what works well or best for your specific dataset and model.

      • Avatar
        Lio July 19, 2021 at 12:52 pm #

        Thank you very much. I will try as you say.

  306. Avatar
    Ansh July 19, 2021 at 5:40 pm #

    Hi Jason,

    I have a question regarding the splitting of data for multivariate analysis.

    According to the book Deep Learning for Time Series Forecasting Predict the Future with MLPs, CNNs and LSTMs in Python for the following example:

    time, measure1, measure2
    1, 0.2, 88
    2, 0.5, 89
    3, 0.7, 87

    The data can be converted into supervised series as follows:
    time, measure1, measure2
    1, ?, 88
    2, 0.2, 89
    3, 0.5, 87
    4, 0.7, ?

    Which means the first and last rows fall off.

    However, in your multivariate example for this dataset and window = 3
    [[10, 15, 25]
    [ 20, 25, 45]
    [ 30, 35, 65]
    [40, 45, 85]
    [50, 55, 105]
    [60, 65, 125]
    ………………..]]

    When given an input of:
    10, 15
    20, 25
    30, 35

    The output is :
    65
    85

    Shouldn’t the output be [85, 105], assuming the first output value [65] falls off, as in the first example?

    I also reran the same example with a window size of 1, and for the first row of data [10, 15] the output was 25. Shouldn’t it be 45 instead, given that there is no previous data to predict 25 and the first row should fall off?

    This is the split function I am using :
    def msplit_sequence(sequences, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequences)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out-1
            # check if we are beyond the dataset
            if out_end_ix > len(sequences):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    Looking forward to your response.

    • Avatar
      Jason Brownlee July 20, 2021 at 5:34 am #

      Each problem has different requirements and expectations. You can define the input and output of your problem any way you like.

      • Avatar
        Dion August 16, 2022 at 6:19 am #

        Good day
        I came across your blog while searching Google and it is very interesting. I’m a beginner just starting to learn prediction….

        I need to know whether this can be done:

        Race : 1
        5 runners, 400m track race

        1. At 100m , 10.5s / 200m, 19.8s/300m,30.1s/400m, 43.5s
        Then runner 2,
        Then runner 3,
        Then runner 4
        Then runner 5.
        All with times at 100m, 200m,300m,400m individually performed

        Can I predict who will finish 1st/2nd/3rd/4th, with new predicted times for each runner at the 100m/200m/300m/400m intervals?

        Race :2
        Another scenario: I have a 400m race with only the 300m sectional time (e.g. 30.1s) and the final 400m time (e.g. 43.5s) recorded for each runner. Can I still predict each runner’s interval times and positions at 100m/200m/300m/400m? Is this possible?

        Can result be in this format ?

        Example
        100m – 5, 10.5s / 3, 10.3s /1, 09.9s/4, 10.2s / 2, 11.3
        200m- same like 100m calculations
        300m- same like 100m calculations
        400m – same like 100m calculations

        Appreciate your assistance….

        Await your response

        Thanks

        • Avatar
          James Carmichael August 16, 2022 at 9:39 am #

          Hi Dion… Please narrow your query to a single question so that we may better assist you.

  307. Avatar
    Omar July 23, 2021 at 9:10 pm #

    Hi Jason,
    I’m new to Time Series Forecasting. I would appreciate your help. I am currently trying to predict how much a person drinks each day. I have timestamps every 30 minutes and a corresponding value that represents the drunk amount within those 30 minutes. You can already imagine I have a lot of 0 values in the middle. Moreover, a person only drinks from 8AM until 8PM but the data nevertheless spans the whole day (So always 0s from 8PM until 8AM the next day and 1 day is 48 entries). I have also another version of the dataset where the data spans only 8AM till 8PM (1 day is 24 entries).
    I already tried Croston’s Method, but I am looking for a dynamic solution, so I am trying to implement a neural network for this. Would you point me in the right direction? Will LSTMs, for example, work for intermittent data? Which version of the data would make the model less complicated?
    Ps: Your blog is extremely helpful, thanks a lot.
    Best,
    Omar

    • Avatar
      Jason Brownlee July 24, 2021 at 5:14 am #

      I would recommend testing a suite of different framing of the problem, different models, different configurations until you find a technique that works well for your dataset.

  308. Avatar
    Mohamed Elhaj Abdou August 4, 2021 at 2:23 pm #

    I have a time series forecasting dataset that includes both categorical and numeric columns.

    Here is a sample of it:

    Date | categorical _fature_1 |categorical _fature_2| Feature_1_numeric | feature_2_numeric | price

    1-1-2020 | USA | A | 5.5 | 7.6 | 100

    1-1-2020 | USA | B | 8.3 | 1.7| 20

    1-1-2020 | USA | C | 3.6 | 2.1 | 17

    1-2-2020 | USA | D | 5.5 | 7.6 | 40

    1-2-2020 | USA | E | 77.5 | 35 | 22

    1-2-2020 | USA | F | 69.5 | 2 | 22

    As you can see in the sample, for a given date, say **1-1-2020**, we have multiple observations.

    I want to predict the **Price** column as the **Y_label**, taking **categorical _fature_1**, **categorical _fature_2**, **Feature_1_numeric**, and **Feature_2_numeric** as the **X_features**.

    From my understanding, since I’m using **multiple features** to forecast the **Price** column, this is called **Multivariate Time-Series Forecasting**.

    My questions are:

    1- How can I handle multiple observations at the same time step? For example, on **1-1-2020** we have **three** different observations.

    2- I believe that if we have multiple observations at the same time/date, then we have a different kind of time series forecasting. What is it called: Multi-timestep Multivariate Time-Series Forecasting, or something else?

    thanks

    • Avatar
      Jason Brownlee August 5, 2021 at 5:15 am #

      Perhaps you can test different framings of the problem and discover what works well or best for you, e.g. multiple-input model vs treating the observations as separate time steps.

  309. Avatar
    okas August 17, 2021 at 9:19 am #

    Hi Jason , thank you for your amazing tutorial. I have a dataset that contains test results and multiple features for multiple users. for example

    date | user_Id | feature _1| feature _2| test_output
    1-Jan-2020 | A | 5.5 | 7.6 | 100

    2-Jan-2020 | A | 8.3 | 1.7 | 20

    3-Jan-2020 | A | 3.6 | 2.1 | 17

    1-Jan-2020 | B | 5.5 | 7.6 | 40

    2-Jan-2020 | B | 77.5 | 35 | 22

    3-Jan-2020 | B | 69.5 | 2 | 22

    I want to predict the output for the next day, and I want to achieve it using LSTMs if possible and all suggestions are welcome.
    I want to train my model with multiple users so that it can predict the next day’s output for any given (unseen) user, but I could not find a way to create/reshape my data before feeding it into the LSTM.

    • Avatar
      Adrian Tam August 17, 2021 at 11:45 am #

      A quick way is to use groupby() in dataframe to create a subset on each user, then set target to be dataframe["target"]=dataframe["feature"].shift(-1) so you can see the next-period data as a column. Is that what you mean by reshape?
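The group-then-shift idea can also be sketched without pandas. The version below assumes the value to predict is the test_output column from the sample table and that each user's rows are already sorted by date; `add_next_day_target` is a hypothetical helper name.

```python
from collections import defaultdict


def add_next_day_target(rows):
    """rows: dicts with 'date', 'user_id', 'test_output', already sorted by date per user.
    Returns new rows with a 'target' key holding that user's next test_output."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append(row)

    labelled = []
    for user_rows in by_user.values():
        # pair each row with the following row of the SAME user
        for current, following in zip(user_rows, user_rows[1:]):
            labelled.append({**current, "target": following["test_output"]})
        # the user's last row has no next-day value, so it is dropped
    return labelled


rows = [
    {"date": "1-Jan-2020", "user_id": "A", "test_output": 100},
    {"date": "2-Jan-2020", "user_id": "A", "test_output": 20},
    {"date": "3-Jan-2020", "user_id": "A", "test_output": 17},
    {"date": "1-Jan-2020", "user_id": "B", "test_output": 40},
    {"date": "2-Jan-2020", "user_id": "B", "test_output": 22},
    {"date": "3-Jan-2020", "user_id": "B", "test_output": 22},
]
labelled = add_next_day_target(rows)
```

Because the pairing happens inside each user's group, the last row of user A is never matched with the first row of user B.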

  310. Avatar
    okas August 17, 2021 at 8:45 pm #

    thank you for your reply

    1- I want to understand and visualize the data preparation process (as in the examples above) before feeding the data into the LSTM model, and how to deal with data like mine that relates to multiple users.

    2- Shouldn’t I put the output in the “next-period” column instead of the features?
    dataframe["target"]=dataframe["output"].shift(-1) ?
    3- If I want to prepare my code for multi-step forecasting in general, what changes should I make to the examples illustrated above?

    • Avatar
      Adrian Tam August 18, 2021 at 3:17 am #

      You’re correct for (2). For (1), I don’t see any issue with multiple users here. You still train the model the same way as long as you do not mix the data from different time series. For (3), that depends on your design. One way is to feed the LSTM output back into the input so we can predict for one more step, then repeat for yet one more step, etc.
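The feed-back approach described for (3) can be sketched as a loop around any one-step predictor. In the sketch below `recursive_forecast` and `toy_model` are hypothetical names; `toy_model` is only a stand-in for a trained LSTM's one-step prediction, used so that the loop can run on its own.

```python
def recursive_forecast(predict_one_step, history, n_steps_in, n_steps_out):
    """Forecast n_steps_out values by feeding each prediction back as input.
    predict_one_step: any callable mapping a window (list) to the next value."""
    window = list(history[-n_steps_in:])
    forecasts = []
    for _ in range(n_steps_out):
        yhat = predict_one_step(window)
        forecasts.append(yhat)
        # slide the window: drop the oldest value, append the prediction
        window = window[1:] + [yhat]
    return forecasts


def toy_model(window):
    """Stand-in for a trained model: continues the last observed difference."""
    return window[-1] + (window[-1] - window[-2])


forecasts = recursive_forecast(toy_model, [10, 20, 30, 40, 50], n_steps_in=3, n_steps_out=3)
```

A trade-off of this design is that errors can accumulate, since later steps are predicted from earlier predictions rather than from observed values.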

      • Avatar
        okas August 18, 2021 at 5:00 am #

        Regarding point 1, can you explain what you mean by “as long as you do not mix the data from different time series”, and how can I make sure that I am not mixing the data during the training phase? In other words, how can I make sure my model understands that there are multiple users that share the same time series?

  311. Avatar
    Liliana August 24, 2021 at 6:55 am #

    Hi Jason:

    I have a question. When using an LSTM to forecast a time series of the Multiple Parallel Input and Multi-step Output type, the Vector Output and Encoder-Decoder LSTM models can be used. In both cases, can the Vanilla LSTM, Stacked LSTM, Bidirectional LSTM, CNN-LSTM and ConvLSTM also be used?

    Thanks for your attention.

    • Avatar
      Adrian Tam August 24, 2021 at 11:52 am #

      Yes, there are different variations of LSTM. All have the feature that they can learn and remember the state, but each variant will have some subtle differences.

      • Avatar
        Liliana December 19, 2021 at 9:54 am #

        Hello Jason:

        I would like to know, if I want to make the forecast for a time series of Multiple Parallel Input and Multi-step Output type, using an LSTM Encoder-Decoder, to obtain multivector output. Could I do the following?:

        Configure the Encoder in any of the following ways:

        Vanilla LSTM
        Stacked LSTM
        Bidirectional LSTM
        CNN-LSTM
        ConvLSTM

        And, configure the Decoder in any of the following ways:

        Vanilla LSTM
        Stacked LSTM
        Bidirectional LSTM
        CNN-LSTM
        ConvLSTM

        And do any combination of LSTM Encoder-Decoder settings to get my multi-step, multi-vector forecast?

        Or are there any of these combinations that I cannot do for an LSTM Encoder-Decoder?

        Thanks for your attention.

        • Avatar
          Adrian Tam December 19, 2021 at 2:18 pm #

          All seem possible. Did you try anything?

          • Avatar
            Liliana December 21, 2021 at 9:06 am #

            Hi Adrian, yes, now that you mention it, I’m testing each of these combinations.

            Thank you so much.

          • Avatar
            Liliana January 19, 2022 at 9:15 am #

            Hello Adrian

            While doing these tests, I would like to ask you: in an LSTM Encoder-Decoder model, could I really use a CNN-LSTM model or a ConvLSTM model as the Decoder?

            I ask this because these two models use an input with specific characteristics and in the case of being used as Decoders, the input comes with a RepeatVector layer that does not correspond to the input form for a CNN-LSTM model or a ConvLSTM model.

            Thanks for your attention.

  312. Avatar
    davidg September 5, 2021 at 5:17 am #

    Hi Jason,
    I’m trying to learn how LSTMs actually work under the hood (as opposed to how to use them). One very confusing point is this: What exactly is an LSTM unit? There seem to be contradictory definitions in the literature. In particular, referring to your very first example, in which you separate a 10-long integer sequence into six sets of three consecutive terms with the next term as the desired output, the best interpretation I have come up with so far is that by a “unit” you mean a set of six LSTM cells wired in series, where each cell takes a 3-dimensional vector as input and outputs a scalar. Here a “cell” is the usual collection of 4 (or 3, depending again on murky definitions) gates. So there would be a total of 6×50 = 300 cells all wired up in series, all having the same set of affine parameters (weights and biases). Another unanswered question then is: what is the dimension of the state vector?

    It would be great if you could notify my email when you respond, or better yet, copy your response to my email.

    Thanks so much for any help!

    • Avatar
      Jason Brownlee September 6, 2021 at 5:16 am #

      In Keras, there are no cells, just units/nodes. Or a cell is a unit is a node.
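To make the units/state question concrete, here is a small sketch under the standard LSTM formulation with bias terms, which is what a Keras LSTM layer with default settings is understood to use. The layer has four weight sets, the same weights are reused at every time step, and both the hidden state and the cell state have one entry per unit. `lstm_param_count` is a hypothetical helper written for this illustration.

```python
def lstm_param_count(units, n_features):
    """Trainable parameters in a standard LSTM layer with bias terms.
    Four weight sets (input, forget, cell candidate, output), each with
    input weights, recurrent weights and a bias."""
    per_gate = units * n_features + units * units + units
    return 4 * per_gate


units, n_features = 50, 1
n_params = lstm_param_count(units, n_features)
state_size = units  # the hidden state and the cell state are each vectors of this length
```

Note that the number of time steps in a sample does not appear in the formula: a longer window means more applications of the same weights, not more cells with their own parameters.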

  313. Avatar
    Jacques Musonda September 8, 2021 at 1:51 am #

    Thank you for this clear and helpful tutorial.

  314. Avatar
    Preet September 22, 2021 at 5:11 pm #

    Thanks a lot! Amazing tutorial.

    • Avatar
      Adrian Tam September 23, 2021 at 3:37 am #

      Glad you like it!

  315. Avatar
    ilovepython October 5, 2021 at 10:51 pm #

    def split_sequence(sequence, n_steps):
        X, y = list(), list()
        for i in range(len(sequence)):
            # find the end of this pattern
            end_ix = i + n_steps
            # check if we are beyond the sequence
            if end_ix > len(sequence)-1:
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence
    raw_seq = [2456, 1829, 2141, 1362, 1634, 1241, 1617, 1434, 2279, 1131,
    1192, 1065, 725, 997, 1161, 2033, 1815, 1123, 1136, 929, 1340,
    1476, 1962, 2199, 1276, 1351, 1201, 1078, 1397, 2181, 2042, 1117,
    1284, 1114, 1416, 1163, 1931, 1753, 1073, 1168, 1022, 1251, 3167,
    3958, 4002, 2033, 1362, 1099, 1506, 1614, 2838, 2569, 1708, 1536,
    1443, 1734, 1970, 2755, 3101, 1790, 1223, 1369, 1651, 2101, 3255,
    2559, 1711, 1738, 1612, 1878, 2064, 3504, 3855, 3425, 2829, 2846,
    4503, 4300, 4099, 3829, 1694, 1633, 1579, 2404, 2520, 4544, 4435,
    2227, 2173, 1690]

    # choose a number of time steps
    n_steps = 7
    # split into samples
    X, y = split_sequence(raw_seq, n_steps)
    # reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
    n_seq = 1
    n_steps = 2
    n_features = 1
    X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
    # define model
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    # fit model
    model.fit(X, y, epochs=500, verbose=0)
    # demonstrate prediction
    x_input = array([4300, 4099, 3829, 1694, 1633, 1579, 2404])
    x_input = x_input.reshape((1, n_seq, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(yhat)

    Sir, I tried replicating your code and changed n_steps to 7, but it gave me this error: ValueError: cannot reshape array of size 581 into shape (83,2,2,1). What should I do? Sorry, I am very new. Thank you. 🙁

    • Avatar
      Adrian Tam October 6, 2021 at 10:37 am #

      You redefined n_steps to 2 later on, so the reshape no longer matches the 7-step samples.
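The reshape into [samples, subsequences, timesteps, features] can only succeed when the number of subsequences times the steps per subsequence equals the window length used to split the data. The 581 in the error message is 83 samples of 7 values each. The sketch below lists the combinations that fit a given window length; `valid_subsequence_shapes` is a hypothetical helper, not part of the tutorial.

```python
def valid_subsequence_shapes(window_length):
    """All (n_seq, n_steps) pairs whose product equals the window length."""
    return [
        (n_seq, window_length // n_seq)
        for n_seq in range(1, window_length + 1)
        if window_length % n_seq == 0
    ]


shapes_for_7 = valid_subsequence_shapes(7)   # 7 is prime, so only two options
shapes_for_4 = valid_subsequence_shapes(4)   # includes a (2, 2) split
```

With a 7-step window, the choices are one subsequence of 7 steps or 7 subsequences of 1 step; picking a window length such as 6 or 8 would allow more balanced splits.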

  316. Avatar
    ZHuang October 11, 2021 at 12:19 am #

    Jason, thank you for your great post.
    I am just wondering whether this can be used for a non-parallel series problem,
    for example:
    out_seq = array([in_seq1[i-10]+in_seq2[i-5] for i in range(len(in_seq1))])
    I tried using random noise for in_seq1 and in_seq2 to predict out_seq. The whole purpose is to let the network learn the hidden mapping between the differently lagged seq1/seq2. The result is not good. Any idea how to tackle this kind of problem, or did I miss something?

    • Avatar
      Adrian Tam October 13, 2021 at 7:09 am #

      Garbage in garbage out. If your input is random noise, usually the result would not make sense.
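Separately from the choice of input data, two details of the lagged setup are worth checking. In the expression as posted, i-10 and i-5 are negative for the first few values of i, and negative indices in Python wrap around to the end of the sequence, so the first targets are built from the wrong elements. The input window also has to reach back at least as far as the largest lag, otherwise the values that determine the target are never shown to the model. The sketch below builds the target only where both lags exist; `lagged_sum` is a hypothetical helper and the random inputs mirror the experiment described above.

```python
import random


def lagged_sum(in_seq1, in_seq2, lag1, lag2):
    """out[t] = in_seq1[t - lag1] + in_seq2[t - lag2], only where both lags exist.
    Returns (first_valid_index, outputs)."""
    start = max(lag1, lag2)  # before this index, t - lag would be negative
    outputs = [in_seq1[t - lag1] + in_seq2[t - lag2] for t in range(start, len(in_seq1))]
    return start, outputs


random.seed(1)
in_seq1 = [random.random() for _ in range(50)]
in_seq2 = [random.random() for _ in range(50)]
start, out_seq = lagged_sum(in_seq1, in_seq2, lag1=10, lag2=5)
```

Here out_seq[0] corresponds to time step 10, so the input series would need to be trimmed by the same offset before being split into samples.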

  317. Avatar
    Abbas October 20, 2021 at 9:13 am #

    Jason, thank you for your helpful post.
    I am a PhD student. I used a Bidirectional LSTM with a CNN to forecast solar energy. I got good accuracy when I compared my results with another model on the same dataset, but I need some advice on how to make a contribution with the model.

  318. Avatar
    huang hui November 20, 2021 at 11:52 am #

    Hi Jason,

    I have followed your website since 2018. It has benefited me a lot. Thank you very much for sharing these tutorials and code publicly.

    I used ConvLSTM for spatio-temporal forecasting. My dataset has shape [2880, 6], where 6 is the number of spatial points and 2880 is the number of time steps.

    n_features = 6
    n_seq = 6
    n_steps = 2
    model.add(ConvLSTM2D(filters=6, kernel_size=(6,2), activation='relu', input_shape=(n_seq, 6, n_steps, n_featurs)))

    But I get this error:

    ValueError:
    Input 0 of layer sequential is incompatible with the layer: expected ndim=5, found ndim=3. Full shape received: [None, 5, 6]

    I cannot find a solution. Could you give me any advice? Thanks!

    • Avatar
      Adrian Tam November 20, 2021 at 1:44 pm #

      ndim=5 is expected because you set “input_shape=(n_seq, 6, n_steps, n_featurs)”, and ndim=3 refers to your input dataset. I think you need to check how you shape your input before it is passed into the network.
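As a rough illustration of the shape check suggested above, a [2880, 6] array can be windowed and reshaped into the five dimensions that ConvLSTM2D expects, [samples, time, rows, cols, channels]. This is a sketch under assumptions: the data is made up, the 6 locations are treated as channels, and n_seq = 2 and n_steps = 2 are arbitrary choices.

```python
import numpy as np

# Made-up stand-in for the [2880, 6] dataset: 2880 time steps, 6 locations.
data = np.arange(2880 * 6, dtype=float).reshape(2880, 6)

n_seq, n_steps, n_features = 2, 2, data.shape[1]
window = n_seq * n_steps  # time steps per sample

# Slide a window over time; the target is all 6 locations at the next step.
X = np.array([data[i:i + window] for i in range(len(data) - window)])
y = data[window:]
print(X.shape, y.shape)  # (2876, 4, 6) (2876, 6)

# Split each window into n_seq subsequences of n_steps, with one row per subsequence.
X = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))
print(X.shape)  # (2876, 2, 1, 2, 6)
```

With this layout the matching input_shape would be (n_seq, 1, n_steps, n_features), and because there is only one row, a kernel such as (1, 2) fits while (6, 2) does not.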

  319. Avatar
    Alper Ozel December 3, 2021 at 6:01 am #

    This was a great tutorial, the most comprehensive one out there. Thank you for your work. I have one question: do you have a comparison of the neural network algorithms for time series prediction? Is there anything better than LSTM?

    • Avatar
      Adrian Tam December 8, 2021 at 6:39 am #

      I don’t think any comparison would be absolutely fair; it is more a matter of which model fits which problem. As for LSTM, people have seen GRU as a faster alternative, but it is not always better.

  320. Avatar
    Riti December 5, 2021 at 6:42 pm #

    Hi Jason,

    Thanks a lot for this wonderful tutorial. Extremely helpful for me!
    I have a query regarding the input shape to LSTM model. I would like to provide 8 dimensional time series (i.e. 8 features) where each time sample has a label (or output) associated with it. So, I want the network to learn the mapping from the time series to label series (where time series features also have temporal dependencies). For example- let’s say I have 10000 x 8 length of input series, and 10000 x 1 is the corresponding output size. Now if I set time_steps=10, and feat_size=8, I will have (1000, 10,8) as size of input and (1000,10) as size of output. How can I train LSTM for this ? Should I set return_seq as True and it will take care of learning map from feat to corresponding label ? I am not sure if I am correct here and would like to know if this approach is fine. Thanks again!

    • Avatar
      Adrian Tam December 8, 2021 at 7:36 am #

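      If you set return_sequences to True, the LSTM returns one output per time step, so with a Dense(1) applied to each step your output is (1000,10,1); if it is False, you get only the last step, which gives (1000,1). The sequence length in LSTM just means that the memory is reset after this many steps.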
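For the data preparation side of this question, the sketch below shows how a 10000 x 8 input series and a 10000 x 1 label series could be cut into non-overlapping blocks of 10 time steps. The random arrays are placeholders for real data, and non-overlapping blocks are just one possible choice.

```python
import numpy as np

# Placeholder data: 10000 time samples, 8 features, one label per time sample.
series = np.random.rand(10000, 8)
labels = np.random.rand(10000, 1)

time_steps = 10
n_samples = len(series) // time_steps

# Cut both arrays into non-overlapping blocks of 10 consecutive time steps.
X = series[:n_samples * time_steps].reshape(n_samples, time_steps, 8)
y = labels[:n_samples * time_steps].reshape(n_samples, time_steps, 1)
print(X.shape, y.shape)  # (1000, 10, 8) (1000, 10, 1)
```

A model whose LSTM layers use return_sequences=True, followed by a Dense(1) applied at every time step, produces an output of the same shape as y here.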

  321. Avatar
    dalia December 5, 2021 at 7:55 pm #

    Thank you for this clear and helpful tutorial,

    What if I need to work with CSV data as input instead of the sample data above?

  322. Avatar
    IW December 6, 2021 at 4:48 pm #

    Hi

    This blog is super helpful, thank you!

    I am really stuck on this matter and maybe you could help me?

    I have 500 different observations, each of shape (100,2) (100 data points, 2 features).
    I am reshaping my data to predict 5 time steps ahead based on the past 3 time steps. So, after reshaping my data, I have
    input_shape = (94,3,2)
    output_shape=(94,5,2)

    but because I have 500 different observations I essentially have the data in the shape of,
    input_shape = (500,94,3,2)
    output_shape=(500,94,5,2)

    the only way I could train my model is by using a for loop to feed each of the 500 observations.

    is there a better way to do this?

    • Avatar
      Adrian Tam December 8, 2021 at 7:41 am #

      You’re wrong on the shape here. Your LSTM is predicting with 3 steps and 2 features, so your input is (N,3,2). You should combine the 500 observations together.
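Combining the observations as suggested can be done with a single reshape that merges the first two axes. A minimal sketch, using random placeholder arrays with the shapes given in the question:

```python
import numpy as np

# Placeholder arrays with the shapes from the question.
X_all = np.random.rand(500, 94, 3, 2)
y_all = np.random.rand(500, 94, 5, 2)

# Merge the "observation" and "window" axes into one sample axis.
X = X_all.reshape(-1, 3, 2)
y = y_all.reshape(-1, 5, 2)
print(X.shape, y.shape)  # (47000, 3, 2) (47000, 5, 2)
```

Each window was built inside a single observation before merging, so no sample mixes data from two observations, and the whole set can be passed to one fit() call instead of a loop.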

  323. Avatar
    Bharathi December 15, 2021 at 7:25 pm #

    Can you please tell me how you chose the values below?
    I understand using 3 time steps for input and 1 for output, but not the ones below.

    n_steps_in, n_steps_out = 3, 2
    n_features = X.shape[2]

    • Avatar
      Adrian Tam December 17, 2021 at 6:57 am #

      For example, if you have data [10, 20, 30, 40, 50, …], it means you use [10, 20, 30] to predict [40, 50]; hence you use 3 steps in the input and 2 steps in the output. In this case, each time step is a single number, hence n_features is 1.
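The example in the reply above can be reproduced in a few lines. This is a small sketch in the spirit of the tutorial's split_sequence helper, rewritten here for illustration, with the tutorial's contrived univariate series as input.

```python
import numpy as np

def split_sequence(sequence, n_steps_in, n_steps_out):
    """Split a univariate sequence into (input window, output window) pairs."""
    X, y = [], []
    for i in range(len(sequence) - n_steps_in - n_steps_out + 1):
        end_ix = i + n_steps_in
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix:end_ix + n_steps_out])
    return np.array(X), np.array(y)

raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_steps_in, n_steps_out = 3, 2
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
for xi, yi in zip(X, y):
    print(xi, yi)  # first line: [10 20 30] [40 50]

# Add the feature axis: each time step holds a single number.
X = X.reshape((X.shape[0], X.shape[1], 1))
n_features = X.shape[2]
print(X.shape, n_features)  # (5, 3, 1) 1
```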

    • Avatar
      James Carmichael December 21, 2021 at 11:23 pm #

      Hi Bharathi…Could you please post the exact code block you have questions about?

      -Regards,

  324. Avatar
    Alex December 18, 2021 at 4:49 pm #

    Hi Jason

    My name is Alex. I’m looking for an algorithm such as a Multi-Modal Deep Prediction Model using LSTM.

    • Avatar
      Adrian Tam December 19, 2021 at 1:49 pm #

      Can you explain what you mean by multi-modal prediction?

    • Avatar
      James Carmichael December 21, 2021 at 11:30 am #

      Hi Alex…Please explain more about what you are specifically trying to accomplish.

  325. Avatar
    Luigi January 5, 2022 at 3:18 am #

    Hi Jason,
    amazing post thanks a lot for it! super, super!

    I would have a question if you do not mind.
    I have a dataset of 100 financial indices.

    I want to make prediction of 1 or more samples ahead (doesnt matter).
    However, since my variables share some information (common variance) there is some redundancy therefore I would like to compress my dataset same as a PCA or a factor analysis does, but I want to use the LSTM Autoencoder (or how you call it here Encoder-Decoder Model).

    The point is that I want to run the autoencoder as you coded here, however what I would keep at the end are the compressed variables at the bottleneck of the autoencoder (end of the encoder), so remove the decoder, and make a prediction only on those compressed set..
    because i believe those compressed variables can represent better my dataset (removing redundancy)

    This would be also useful for denoising (I would let the hyperparameter tuning to choose the dimension of the bottleneck).

    Do you have a reference for coding this?
    Or can you briefly indicate me please how to modify your Encoder-Decoder Model?

    My idea is that the code you show here will stay the same during training, but there must be some modifications, such as setting the number of dimensions of the bottleneck (which I cannot see in your code) and running predict() using the model without the decoder.

    Many thanks in advance
    Luigi

    • Avatar
      James Carmichael January 7, 2022 at 6:33 am #

      Hi Luigi…I appreciate the kind words! I would be able to help you better if you could direct any questions to specific code listings and examples provided on machinelearningmastery.com.

      Regards,

  326. Avatar
    Luigi January 7, 2022 at 7:55 pm #

    Hi James,
    thanks for willing to help me.

    I found your post https://machinelearningmastery.com/lstm-autoencoders/
    more relevant to my case so I will open/continue the discussion in there if you don’t mind

    Thanks again for your offer to help, very kind
    Luigi

    • Avatar
      James Carmichael January 8, 2022 at 11:05 am #

      Hi Luigi…You are very welcome! Yes, please feel free to continue the discussion in the indicated post.

      Regards,

  327. Avatar
    Mocha January 11, 2022 at 6:35 pm #

    Hello, Sir! Thanks for your explanation. I want to ask about ConvLSTM. Can I use it for weather data with spatial and temporal features stored in grib2 or nc files? We can get the spatial features from the longitude and latitude and the temporal features from the time. I want to use that data to predict rain. Also, can I use ConvLSTM to predict probabilities?

    I hope you’ll answer my question, thank you Sir.

  328. Avatar
    Mocha January 12, 2022 at 12:10 pm #

    Thanks for your answer, Sir!

    But, can I still use Conv-LSTM or just LSTM? Because my data aren’t image, Sir.

  329. Avatar
    jesu January 20, 2022 at 4:40 am #

    How can I understand the way to build the model?
    I mean, how many LSTM for example? How many dense layers? dropout?

    I have a multivariate time series with 5 features

    • Avatar
      James Carmichael January 20, 2022 at 7:47 am #

      Hello Jesu…More nodes and layers means more capacity for the network to learn, but results in a model that is more challenging and slower to train.

      You must find the right balance of network capacity and trainability for your specific problem.

      There is no reliable analytical way to calculate the number of nodes or the number of layers required in a neural network for a specific predictive modeling problem.

      My general suggestion is to use experimentation to discover what configuration works best for your problem.

      This post has advice on systematically evaluating neural network models:

      How to Evaluate the Skill of Deep Learning Models
      Some further ideas include:

      Use intuition about the domain or about how to configure neural networks.
      Use deep networks, as empirically, deeper networks have been shown to perform better on hard problems.
      Use ideas from the literature, such as papers published on predictive problems similar to your problem.
      Use a search across network configurations, such as a random search, grid search, heuristic search, or exhaustive search.
      Use heuristic methods to configure the network, there are hundreds of published methods, none appear reliable to me.
      More information here:

      How to Configure the Number of Layers and Nodes in a Neural Network
      Regardless of the configuration you choose, you must carefully and systematically evaluate the configuration of the model on your dataset and compare it to a baseline method in order to demonstrate skill.

  330. Avatar
    Liliana January 21, 2022 at 6:35 am #

    Hello Jason

    While running these tests, I would like to ask: in an LSTM Encoder-Decoder model, can I really use a CNN-LSTM model or a ConvLSTM model as the Decoder?

    I ask this because these two models use an input with specific characteristics and in the case of being used as Decoders, the input comes with a RepeatVector layer that does not correspond to the input form for a CNN-LSTM model or a ConvLSTM model.

    Thanks for your attention.

    • Avatar
      James Carmichael January 21, 2022 at 9:32 am #

      Hi Liliana…You should try both and compare the results in my opinion. Also, it would be a good idea to try SARIMA. Sometimes it even outperforms newer deep learning methods!

      https://machinelearningmastery.com/sarima-for-time-series-forecasting-in-python/

      • Avatar
        Liliana January 21, 2022 at 10:30 am #

        Yes, I have already tried it and I have the problem that I describe, that is to say that I cannot make the CNN-LSTM and the ConvLSTM serve as a Decoder due to the form of input they require, which is not like the one provided by the previous layer of the model which is a Repeat Vector layer, hence my question, actually, can I use these models as a Decoder?

        Thanks for the advice, already use a VAR model.

        I am attentive, thank you.

  331. Avatar
    Dwiki Setiawan January 27, 2022 at 4:05 pm #

    how about this,
    i have a time series data (2 years) with one variable (amount per day). And i want to predict based on that data. How to do that?

    *i’m 100% newbie

  332. Avatar
    Ugur Kahveci January 27, 2022 at 11:48 pm #

    Hello Jason, great tutorial as always!

    I am having trouble finding any sensible result in my LSTM algorithm. I am trying to use Early Stopping and Model Checkpoint together but when I try to monitor validation accuracy for model checkpoint, validation accuracy becomes zero and does not improve over epochs. I changed the monitor parameter to validation loss and now validation loss seems to be very high. After model completes training, the results are zero for both train and test accuracies.

    I am thinking if I made a mistake seperating the dataset into train and test datasets because in your article you mention that datasets should be in a certain format to use LSTM.

  333. Avatar
    Paniz January 28, 2022 at 5:13 am #

    Hi,
    Thank you so much for the thorough tutorial.

    As for the out_seq, I see in almost all examples that is a summation of the input_seqs. I understand these are examples. But what if you know there is a dependency between the in and out seqs but u do NOT know what it is exactly. Then how do you set his up? Any tips? thanks

  334. Avatar
    Kostas February 22, 2022 at 2:38 am #

    Hello, thanks for tutorial.

    I tried to use a Vector Output to model your last example (Multiple Parallel Input and Multi-Step Output) instead of an encoder-decoder model, but I keep getting an error.

    Here’s the code.

    # split a multivariate sequence into samples
    def split_sequences(sequences, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequences)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the dataset
            if out_end_ix > len(sequences):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence
    in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
    in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
    out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
    # convert to [rows, columns] structure
    in_seq1 = in_seq1.reshape((len(in_seq1), 1))
    in_seq2 = in_seq2.reshape((len(in_seq2), 1))
    out_seq = out_seq.reshape((len(out_seq), 1))
    # horizontally stack columns
    dataset = hstack((in_seq1, in_seq2, out_seq))
    # choose a number of time steps
    n_steps_in, n_steps_out = 3, 2
    # covert into input/output
    X, y = split_sequences(dataset, n_steps_in, n_steps_out)
    # the dataset knows the number of features, e.g. 2
    n_features = X.shape[2]
    model = Sequential()
    model.add(LSTM(200, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, epochs=300, verbose=0)
    # demonstrate prediction
    x_input = array([[60, 65, 125], [70, 75, 145], [80, 85, 165]])
    x_input = x_input.reshape((1, n_steps_in, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(yhat)

    Plz help !
    Thanx in advance

    • Avatar
      James Carmichael February 26, 2022 at 12:45 pm #

      Hi Kostas…Please clarify your question so that we may better assist you.

  335. Avatar
    kostas February 22, 2022 at 8:26 pm #

    Thank you for the tutorial, but I have a question.

    I tried implementing a Multiple Parallel Input and Multi-Step Output model by using a vector output model instead of a encoder-decoder (as you did at the end of your tutorial) but I keep getting some errors.

    The code is presented below. Could you please help me out ?

    Thanks in advance!

    from numpy import array
    from keras.models import Sequential
    from keras.layers import LSTM
    from keras.layers import Dense
    from keras.layers import Bidirectional
    from keras.layers import Flatten
    from keras.layers import TimeDistributed
    from keras.layers.convolutional import Conv1D
    from keras.layers.convolutional import MaxPooling1D
    from keras.layers import ConvLSTM2D
    from numpy import hstack
    from keras.layers import RepeatVector

    # split a multivariate sequence into samples
    def split_sequences(sequences, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequences)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the dataset
            if out_end_ix > len(sequences):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence

    in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
    in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
    out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
    # convert to [rows, columns] structure
    in_seq1 = in_seq1.reshape((len(in_seq1), 1))
    in_seq2 = in_seq2.reshape((len(in_seq2), 1))
    out_seq = out_seq.reshape((len(out_seq), 1))
    # horizontally stack columns
    dataset = hstack((in_seq1, in_seq2, out_seq))
    # choose a number of time steps
    n_steps_in, n_steps_out = 3, 2
    # covert into input/output
    X, y = split_sequences(dataset, n_steps_in, n_steps_out)
    # the dataset knows the number of features, e.g. 2
    n_features = X.shape[2]

    model = Sequential()
    model.add(LSTM(200, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(2)))
    model.compile(optimizer='adam', loss='mse')

    model.summary()

    # fit model
    model.fit(X, y, epochs=300, verbose=0)
    # demonstrate prediction
    x_input = array([[60, 65, 125], [70, 75, 145], [80, 85, 165]])
    x_input = x_input.reshape((1, n_steps_in, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(yhat)

    • Avatar
      James Carmichael February 23, 2022 at 12:24 pm #

      Hi Kostas…Thanks for asking.

      I’m eager to help, but I just don’t have the capacity to debug code for you.

      I am happy to make some suggestions:

      Consider aggressively cutting the code back to the minimum required. This will help you isolate the problem and focus on it.
      Consider cutting the problem back to just one or a few simple examples.
      Consider finding other similar code examples that do work and slowly modify them to meet your needs. This might expose your misstep.
      Consider posting your question and code to StackOverflow.

      • Avatar
        kostas February 23, 2022 at 9:36 pm #

        Thanks for the reply, but the code I posted is actually a copy of your last implementation, which is “Multiple Parallel Input and Multi-Step Output” implementation.

        In your article, I quote :
        “A vector output or an encoder-decoder model could be used. In this case, we will demonstrate a vector output with a Stacked LSTM.”

        I tried using a Stacked LSTM instead of an encoder-decoder model, but I did not work, because I’m using three timesteps for training and I’m trying do predict a 2 timesteps series.

  336. Avatar
    kostas February 24, 2022 at 1:55 am #

    #Correction

    Thanks for the reply, but the code I posted is actually a copy of your last implementation, which is “Multiple Parallel Input and Multi-Step Output” implementation.

    In your article, I quote :
    “We can use either the Vector Output or Encoder-Decoder LSTM to model this problem. In this case, we will use the Encoder-Decoder model”

    I tried using a Stacked LSTM instead of an encoder-decoder model, but it did not work, because I’m using three timesteps for training and I’m trying do predict a 2 timesteps series.

    How can I solve the issue plz ?

    • Avatar
      James Carmichael February 24, 2022 at 2:42 pm #

      Hi Kostas…What error(s) are you encountering?
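A likely source of trouble in the code posted above is a shape mismatch: with return_sequences=True on every LSTM layer the model emits one output per input step (3 steps), while each target has 2 steps of 3 features. One way to get a vector output, sketched below with NumPy only, is to flatten each target into a single vector of n_steps_out * n_features values and reshape the predictions back afterwards. The name n_output and the faked prediction are assumptions made for illustration; the model itself is not built here.

```python
import numpy as np

def split_sequences(sequences, n_steps_in, n_steps_out):
    """Split a multivariate sequence into input and multi-step output samples."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :])
        y.append(sequences[end_ix:out_end_ix, :])
    return np.array(X), np.array(y)

in_seq1 = np.arange(10, 100, 10)
in_seq2 = np.arange(15, 105, 10)
dataset = np.column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))

n_steps_in, n_steps_out = 3, 2
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
n_features = X.shape[2]
print(X.shape, y.shape)  # (5, 3, 3) (5, 2, 3)

# Vector output: one flat target vector per sample.
n_output = n_steps_out * n_features
y_flat = y.reshape((y.shape[0], n_output))
print(y_flat.shape)  # (5, 6)

# Stand-in for a prediction of shape (1, n_output); reshape it back to steps x features.
yhat = y_flat[:1]
print(yhat.reshape((n_steps_out, n_features)))
```

Under this scheme the last LSTM layer would leave return_sequences at its default of False, and the output layer would be a plain Dense(n_output) trained on y_flat.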

  337. Avatar
    Emmy February 26, 2022 at 7:14 pm #

    Dear Sir,

    Thank you so much for this great tutorial.

    I am working on a project that requires me to feed real-time IoT data (with four variables) to the vanilla LSTM model to enable me to predict an outcome.

    Kindly provide me with a guide on this.

    Thank you

    • Avatar
      James Carmichael February 27, 2022 at 12:24 pm #

      Hi Emmy…Thanks for asking.

      Sorry, I cannot help you with your project.

      I’m eager to help, but I don’t have the capacity to get involved in your project at the level you need or at a level to do a good job.

      I’m sure you can understand my position, as I get many requests to help with projects each day.

      Nevertheless, I am happy to answer any specific questions you have about machine learning.

  338. Avatar
    Marimuthu S March 2, 2022 at 5:56 pm #

    Hello Jason,

    Greetings.

    Does RNN use one-hot encoding in each time step for time series data forecasting?

    for instance, input=[10,20, 30]

    In 1st time step input is [10, 0, 0],

    In 2nd time step input is [0, 20, 0], and

    In 3rd time step input is [0, 0, 30]

    Isn’t it?

    Thanks in advance.

  339. Avatar
    McanP March 7, 2022 at 5:34 pm #

    Hello. First, thank you for your support to developers.

    I am having a lot of trouble while I’m trying to estimate if my univariate data is forecastable or not.
    What am I doing?:
    1- Using StandardScaler to scale my data
    2- Using the “difference” method to make my data stationary.
    3- Testing my data’s stationarity with null hypothesis.

    My questions:
    -When I use MinMaxScaler, my prediction is absolutely flat (I tried relu, sigmoid, even None). Why do you think that is?
    -My validation loss increases, then stabilizes. Why?

    I can publish my code if you want,
    Thanks in advance!

  340. Avatar
    Lochan Luca March 18, 2022 at 1:47 pm #

    Firstly, thanks for this blog. I am developing LSTM forecasting model for stock price. For company X LSTM model with 2 layers, epoch 5, batch size 1 works well with 10 future steps (Recursive Multi-step Forecast). I get RMSE between predicted and actual values less than 5. But the same model with company Y with same rows of data does not work well. RMSE is larger than 20. I am not able to figure out why this happens.
    Apart from RMSE can you suggest method to check how accurate predictions are done by the model.

    • Avatar
      James Carmichael March 20, 2022 at 7:25 am #

      Hi Lochan…Machine learning model performance is relative, not absolute.

      Start by evaluating a baseline method, for example:

      Classification: Predict the most common class value.
      Regression: Predict the average output value.
      Time Series: Predict the previous time step as the current time step.
      Evaluate the performance of the baseline method.

      A model has skill if the performance is better than the performance of the baseline model. This is what we mean when we talk about model skill being relative, not absolute, it is relative to the skill of the baseline method.

      Additionally, model skill is best interpreted by experts in the problem domain.

      For more on this topic, see the post:

      How To Know if Your Machine Learning Model Has Good Performance
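The time series baseline mentioned above (predict the previous time step as the current one) takes only a few lines to evaluate. The sketch below uses a short made-up price series; rmse is a small helper written for this example.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up series standing in for real prices.
series = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0])

# Persistence forecast: the prediction for time t is the value at time t-1.
actual = series[1:]
persistence = series[:-1]
baseline_rmse = rmse(actual, persistence)
print("persistence RMSE:", baseline_rmse)
```

An LSTM's RMSE on the same test period can then be compared with baseline_rmse separately for each company; an RMSE of 20 may still show skill if the persistence error for that series is larger.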

  341. Avatar
    Lochan Luca March 18, 2022 at 1:58 pm #

    When I feed the test dataset to the model for predictions, the model predicts with almost 0 variation from test data for the first 70% of test data. I am predicting only a single outcome and for the next outcome I am using the original test value, not my predicted value. Still, for the last 30% of data, the variation (or deviation) between test data and predicted data starts increasing. Plotting it, I found that for the last 30% of test dataset, the deviation between expected and predicted data is even bigger than 25 digits. No matter how big or small dataset I am using, results are always bad for last 30% predictions. What should I do to get more accurate predictions.

  342. Avatar
    Ilenia April 12, 2022 at 7:25 pm #

    Hi!
    Thank you very much for this useful tutorial.
    I have a question on the first example (Vanilla LSTM). You showed how to make one prediction, but how can I proceed in making more?
    I mean, should I use the same model and then just pass as input the two last trained values plus the first prediction (if n_steps = 3, for instance)? Or should I retrain the model using the first prediction value as part of the new training set and go on like that?

    Thanks for the help!
    Ilenia

    • Avatar
      James Carmichael April 14, 2022 at 2:41 am #

      Hi Ilenia…Are you wanting to extend the forecast time period?

      • Avatar
        Ilenia April 14, 2022 at 9:58 pm #

        Hi James!
        Yes, basically, that’s what I would like to do. Let’s say I want to forecast up to 3 future values, instead of just one, what should I do?

        Thanks!
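One common option for this is a recursive forecast: keep the trained one-step model as it is, and feed each prediction back in as the newest input value. The sketch below is an assumption-laden illustration; recursive_forecast and toy_predict are hypothetical names, and toy_predict merely stands in for a trained model so that the example runs without Keras.

```python
import numpy as np

def recursive_forecast(predict_one, history, n_steps, n_ahead):
    """Forecast n_ahead values by feeding each prediction back into the window."""
    window = list(history[-n_steps:])
    forecasts = []
    for _ in range(n_ahead):
        x_input = np.array(window, dtype=float).reshape((1, n_steps, 1))
        yhat = float(predict_one(x_input))
        forecasts.append(yhat)
        # Drop the oldest value and append the prediction.
        window = window[1:] + [yhat]
    return forecasts

# Hypothetical stand-in for a trained model: continues the last difference.
def toy_predict(x):
    return x[0, -1, 0] + (x[0, -1, 0] - x[0, -2, 0])

history = [10, 20, 30, 40, 50, 60, 70, 80, 90]
result = recursive_forecast(toy_predict, history, n_steps=3, n_ahead=3)
print(result)  # [100.0, 110.0, 120.0]
```

With a Keras model, predict_one could be something like lambda x: model.predict(x, verbose=0)[0, 0]. Errors accumulate with each step, so the multi-step models in the tutorial, which output several values at once, are the other option to compare against. Retraining after each prediction is not required for either approach.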

  343. Avatar
    javvv April 20, 2022 at 7:07 pm #

    Hey , I’m new to LSTM. I have to start learning this for my fyp where I have to train model to predict future sensor values. Can you guide me how to start?, what are the pre-requisites and how I can do better? What language tool, software to use. I’m familiar with python and practicing on VS Code but not sure where to run all this?

  344. Avatar
    Ye May 9, 2022 at 11:19 am #

    Hi Jason,

    Thank you for the tutorials. They are very helpful.

    If I have multivariate time series, dependent time series, however, instead of predicting time series, I would like to get the target output from multiple input variables in the same time stamp,

    For example, the first column is input variable 1, the 2nd column is the input variable 2, and the 3rd column is the target variable.
    [[ 10 15 25]
    [ 20 25 45]
    [ 30 35 65]
    [ 40 45 85]
    [ 50 55 105]]

    I would like to have the input of 10, 15 to output 25, 20, 25 to 45, 30, 35 to 65 etc.

    Can I simply follow the examples you’d discussed in the “Multivariate LSTM Models” section, but set n_steps=1? Or there are other methods to deal with such situation?

    Thank you
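For reference, this is what the n_steps=1 idea from the question looks like in terms of array shapes. It is a sketch only, using the five rows given in the question.

```python
import numpy as np

dataset = np.array([[10, 15, 25],
                    [20, 25, 45],
                    [30, 35, 65],
                    [40, 45, 85],
                    [50, 55, 105]])

# Inputs are the first two columns; the target is the last column of the same row.
X, y = dataset[:, :-1], dataset[:, -1]

# One time step per sample: [samples, timesteps, features].
n_steps = 1
X = X.reshape((X.shape[0], n_steps, X.shape[1]))
print(X.shape, y.shape)  # (5, 1, 2) (5,)
```

With a single time step there is no sequence for the LSTM to exploit, so a plain Dense network on the (5, 2) inputs is a reasonable alternative to try alongside it.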

  345. Avatar
    mat May 10, 2022 at 6:57 am #

    Hi James,
    Awesome tutorial.
    If I want to train the same model on several sequences, how would you do this?
    Thanks in advance for the answer.

  346. Avatar
    mat May 10, 2022 at 6:57 pm #

    Thanks James for the link. I implemented the model and iterated it on several sequences.
    LSTM is clearly very heavy (it takes very long to iterate 100 epochs on only 1 sequence).
    I have to find another solution. But thanks for the support, I really appreciated it, and your blog is a huge source of information. Thanks for the work and the knowledge you share, and congratulations.

  347. Avatar
    Brijesh Soni June 1, 2022 at 12:56 am #

    Hi Jason, thanks for your tutorials.

    Is it possible to train LSTM for different lookback values in different epochs/iterations? Kindly suggest your views

  348. Avatar
    Brijesh Soni June 3, 2022 at 9:12 am #

    Thanks James! I mean to say: Instead of fixed lookback, is it possible that lstm-network learns the lookback value on its own?

  349. Avatar
    skr June 26, 2022 at 8:19 am #

    Hi Jason
    I am using LSTM for sequence-to-sequence modelling in a computer networking scenario. I am considering multiple parallel series and multi-step forecasting. However, in my scenario the number of parallel input series is not fixed. How can I handle this? I would appreciate your guidance.
    Regards

  350. Avatar
    Budha July 7, 2022 at 9:37 pm #

    Hi Jason,

    Thank you for this. I am new to LSTM, so this really helped me. I would like to ask a question. I have a small data of 24 time points with a clear trend of increase over time. Is it fine to use LSTM or should I go with classical time series methods such as ARIMA?

    Thanks once again,

    • Avatar
      James Carmichael July 8, 2022 at 5:59 am #

      Hi Budha…My recommendation would be to apply ARIMA and an LSTM model and compare results. One is not necessarily the best option in all cases.

      • Avatar
        Budha July 8, 2022 at 12:40 pm #

        Thank you so much for the reply. I will definitely try both models. Love reading your tutorials.

  351. Avatar
    ewind July 18, 2022 at 12:31 pm #

    In the section “Multiple Input Series”, it is very strange to see that the result is not 100% precise. Shouldn’t it be very easy for the network to learn the add operation? (The output is just the sum of the current time step’s inputs.)

  352. Avatar
    Hilton Fernandes July 25, 2022 at 9:15 pm #

    Interestingly, I could only replicate your results with Multi-Step LSTM Models when I increased the number of iterations, the length of the sampling time series and the size of the input data. Was that because I haven’t any GPU hardware, that TensorFlow would use ? BTW, in my current setup, TensorFlow is complaining about how Keras uses it.

  353. Avatar
    Olaitan Folashade August 2, 2022 at 9:31 pm #

    Hi Jason, thank you for the tutorial. I have a question about the Multiple Parallel Input and Multi-Step Output.

    The number of features is specified in the Dense output layer for MultiVariate-MultiStep-MultiParallel forecast, as in the last example above where the number of features in the input and output sequences are the same.

    How is this done when the number of features for the input and output are not the same? For example, I am using 15 input variables and only want to forecast 4 of them in a multi-step forecast.

    I will appreciate your response. Thank you
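One way to handle different input and output feature counts is to keep all columns in the input windows and select only the target columns when building the outputs. The sketch below is an illustration under assumptions: the target_cols argument is an addition to the tutorial's split_sequences, and the data and the chosen column indices are made up.

```python
import numpy as np

def split_sequences(sequences, n_steps_in, n_steps_out, target_cols):
    """Inputs use all columns; outputs keep only the columns in target_cols."""
    X, y = [], []
    for i in range(len(sequences) - n_steps_in - n_steps_out + 1):
        end_ix = i + n_steps_in
        X.append(sequences[i:end_ix, :])
        y.append(sequences[end_ix:end_ix + n_steps_out][:, target_cols])
    return np.array(X), np.array(y)

# Made-up data: 100 time steps, 15 variables; forecast 4 of them.
data = np.random.rand(100, 15)
target_cols = [0, 3, 7, 11]

X, y = split_sequences(data, n_steps_in=3, n_steps_out=2, target_cols=target_cols)
n_features_in, n_features_out = X.shape[2], y.shape[2]
print(X.shape, y.shape)  # (96, 3, 15) (96, 2, 4)
```

The input layer would then be sized with n_features_in and the output Dense layer with n_features_out, instead of using one n_features value for both.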

  354. Avatar
    Andre August 27, 2022 at 4:15 am #

    Hi Jason,

    Thanks for your tutorial. It is very useful for me.
    I have one question what if I have multiple series with different dimensions?

    Thanks for your answer

    • Avatar
      James Carmichael August 27, 2022 at 6:11 am #

      Hi Andre…You are very welcome! With limited knowledge of your application, you may want to investigate ensemble learning:

      https://machinelearningmastery.com/ensemble-machine-learning-with-python-7-day-mini-course/

      • Avatar
        Andre August 27, 2022 at 1:08 pm #

        Hi James,

        Sorry, I didn’t explain it well. My doubt is about “Multiple Input Series”. I have data from multiple sites and I want to forecast the precipitation area of each site. These sites have the same features but different numbers of time steps. I understood that LSTM can learn parallel input series. Can I apply it in this case too? If yes, how would you recommend I start?

        Thank you

  355. Avatar
    Pranav August 30, 2022 at 9:34 pm #

    Hi Jason thank you so much for the tutorial I had one doubt

    I have 2 series of x,y coordinates

    s1 = [[x1,y1],[x2,y2],[x3,y3],[x4,y4],[x5,y5],[x6,y6],[x7,y7],[x8,y8]]

    s2 = [[a1,b1],[a2,b2],[a3,b3],[a4,b4],[a5,b5],[a6,b6],[a7,b7],[a8,b8]]

    I need to send both of them as inputs to an LSTM. What would you suggest I do? This is a multiple input series problem with more than one value at each time step.
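A simple way to treat two coordinate series like these is to stack them side by side, so each time step has four features (x, y, a, b). The sketch below uses made-up numbers in place of the symbolic coordinates, and predicting all four values at the next step is only one possible choice of target.

```python
import numpy as np

# Made-up stand-ins for s1 = [[x1, y1], ...] and s2 = [[a1, b1], ...], 8 steps each.
s1 = np.array([[i, i + 0.5] for i in range(1, 9)], dtype=float)
s2 = np.array([[10 * i, 10 * i + 5] for i in range(1, 9)], dtype=float)

# Stack side by side: each row is one time step with 4 features.
dataset = np.hstack((s1, s2))
print(dataset.shape)  # (8, 4)

# Windows of 3 steps as input, the following step as output.
n_steps = 3
X = np.array([dataset[i:i + n_steps] for i in range(len(dataset) - n_steps)])
y = dataset[n_steps:]
print(X.shape, y.shape)  # (5, 3, 4) (5, 4)
```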

  356. Avatar
    Inam September 4, 2022 at 11:14 pm #

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, LSTM

    # Our Input data X
    X = q_cqi
    X = X.reshape(1, -1)[0]
    X.shape

    # Creating a window of 10
    window_size = 10
    X_train = []
    y = []
    inc = 0
    for i in range(len(X) - window_size):
        if inc + window_size + 2 > len(X):
            break
        row = [[a] for a in X[inc:inc + window_size]]
        X_train.append(row)
        idx = inc + window_size + 1
        y.append(X[idx])
        inc += 1
    X = X_train

    #converting list back into arrays
    X=np.array(X)
    y=np.array(y)

    #Splitting data into train, test and validation
    X_train, y_train = X[:25000], y[:25000]
    X_val, y_val = X[25000:27200], y[25000:27200]
    X_test, y_test = X[27200:], y[27200:]

    n_steps=10
    n_features=1

    # define model
    model = Sequential()
    model.add(LSTM(128, return_sequences= True ,activation=’linear’, input_shape=(n_steps, n_features)))
    model.add(LSTM(64 ,activation=’linear’))
    model.add(Dense(32, ‘linear’))
    model.add(Dense(16, ‘linear’))
    model.add(Dense(1))

    #Compiling the model
    #model.compile(loss=MeanAbsoluteError(), optimizer=’Adam’,metrics=[RootMeanSquaredError()])
    model.summary()

    So above are my input data and my LSTM model. Now I am confused about how to generate the new data. What I mean is, when I create the new vector q_cqi again like this:

    # Our Input data X
    X = q_cqi
    X = X.reshape(1, -1)[0]

    what would be the next step? How can I reshape it? Do I need the target value y in this new data? How can I choose the data: suppose that, from this input vector of length 35000, I want to do prediction on the last 1500 or the first 1000 values, how could I do this? And how can I change the following section, i.e. creating the window etc.?

    window_size = 10
    X_train = []
    y = []
    inc = 0
    for i in range(len(X) - window_size):
        if inc + window_size + 2 > len(X):
            break
        row = [[a] for a in X[inc:inc + window_size]]
        X_train.append(row)
        idx = inc + window_size + 1
        y.append(X[idx])
        inc += 1
    X = X_train

    Do I need the target value y? How can I choose the new input? Could you please explain how I can generate the new data and how to apply my trained model to it?
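
A minimal sketch of one way to window new data for prediction only, assuming the same window size of 10 and a single feature as in the comment above. No target values are needed to make predictions; they are only needed if the predictions are to be scored afterwards. The helper name `make_windows` and the synthetic `new_series` are hypothetical stand-ins, and the final `model.predict` call is left as a comment because it relies on the trained model from the comment.

```python
import numpy as np


def make_windows(series, window_size):
    """Return every full window of the series, shaped (samples, window_size, 1)."""
    series = np.asarray(series, dtype=float).ravel()
    n_windows = len(series) - window_size + 1
    if n_windows < 1:
        raise ValueError("series is shorter than window_size")
    windows = np.stack([series[i:i + window_size] for i in range(n_windows)])
    return windows[..., np.newaxis]  # add the features axis expected by an LSTM


# Hypothetical stand-in for a new q_cqi vector of length 35000.
new_series = np.arange(35000, dtype=float)

# Pick the part to predict on, for example the last 1500 values
# (use new_series[:1000] for the first 1000 instead).
segment = new_series[-1500:]

X_new = make_windows(segment, window_size=10)
print(X_new.shape)  # (1491, 10, 1)

# y_hat = model.predict(X_new)  # assumes `model` is the trained network from the comment
```

Each row of `X_new` is one input sample, so the model would return one prediction per window.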

  357. Avatar
    Inam September 5, 2022 at 8:12 am #

    Hi James! Thank you for your great posts. I am working on a project. It is a regression problem and I am using an LSTM model to predict the next value. I trained my LSTM model, then tested and validated it on the same data. Now I want to generate new data like the previous data, but I am confused about whether I will have the target value in this new data or not. Also, how can I reshape it to use it with my trained LSTM model? The following are my LSTM model and input data. My input vector has a length of around 35000.

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, LSTM

    # Our Input data X
    X = q_cqi
    X = X.reshape(1, -1)[0]
    X.shape

    # Creating a window of 10
    window_size = 10
    X_train = []
    y = []
    inc = 0
    for i in range(len(X) - window_size):
        if inc + window_size + 2 > len(X):
            break
        row = [[a] for a in X[inc:inc + window_size]]
        X_train.append(row)
        idx = inc + window_size + 1
        y.append(X[idx])
        inc += 1
    X = X_train

    # Converting lists back into arrays
    X = np.array(X)
    y = np.array(y)

    # Splitting data into train, test and validation
    X_train, y_train = X[:25000], y[:25000]
    X_val, y_val = X[25000:27200], y[25000:27200]
    X_test, y_test = X[27200:], y[27200:]

    n_steps = 10
    n_features = 1

    # define model
    model = Sequential()
    model.add(LSTM(128, return_sequences=True, activation='linear', input_shape=(n_steps, n_features)))
    model.add(LSTM(64, activation='linear'))
    model.add(Dense(32, 'linear'))
    model.add(Dense(16, 'linear'))
    model.add(Dense(1))

    # Compiling the model
    # model.compile(loss=MeanAbsoluteError(), optimizer='Adam', metrics=[RootMeanSquaredError()])
    model.summary()

    Thanks in advance

  358. Avatar
    Inam September 5, 2022 at 11:26 pm #

    Thank you James!

  359. Avatar
    Francis Tucket October 5, 2022 at 2:23 am #

    Hello, I’ve been able to create an LSTM model for my fourth-year project, which is about forecasting forex price movements, but the problem comes when I try to implement it in real time. I trained the model on 30-minute data, so the idea was to make the model into an API that takes 10-20 closing prices of a particular forex pair, e.g. GBP/USD, and has the model predict at least 2 hours into the future, i.e. four 30-minute periods, which the API would then return. Thank you in advance for your help.

    • Avatar
      James Carmichael October 5, 2022 at 7:28 am #

      Hi Francis…While we cannot recommend any particular model for your project, it would be helpful if you could elaborate on a specific question regarding our content so that we may better assist you.
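
One common way to get several steps ahead from a model that predicts a single next value is to feed each prediction back in as input (recursive forecasting). The sketch below illustrates the idea under stated assumptions: `recursive_forecast` and `naive_predictor` are hypothetical names, the closing prices are made up, and the stand-in predictor simply extends the last observed change so that the example runs without a trained network. A model with a vector output, as in the multi-step section of the tutorial, is an alternative that avoids feeding predictions back in.

```python
import numpy as np


def recursive_forecast(predict_one, history, n_steps_in, n_ahead):
    """Predict n_ahead values by repeatedly appending each prediction to the input window."""
    window = list(history[-n_steps_in:])
    forecasts = []
    for _ in range(n_ahead):
        x = np.array(window, dtype=float).reshape(1, n_steps_in, 1)
        y_hat = float(predict_one(x))
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, add the newest prediction
    return forecasts


def naive_predictor(x):
    """Stand-in for a trained model: last value plus the last observed change."""
    return x[0, -1, 0] + (x[0, -1, 0] - x[0, -2, 0])


# Hypothetical closing prices at 30-minute intervals.
closes = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]

# Four 30-minute periods ahead, i.e. two hours.
print(recursive_forecast(naive_predictor, closes, n_steps_in=10, n_ahead=4))
# [110.0, 120.0, 130.0, 140.0]

# With a trained Keras model, the predictor could be (assumption, not tested here):
# predict_one = lambda x: model.predict(x, verbose=0)[0, 0]
```

Errors tend to compound with this approach, because later predictions are based on earlier predictions rather than on observations.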

  360. Avatar
    Arun October 13, 2022 at 10:05 pm #

    Which is best for time series prediction like stock price prediction?

  361. Avatar
    Inam October 14, 2022 at 9:41 pm #

    Hello James! I hope you are well. Thanks for your great posts.
    I am trying to plot the performance evaluation of 2 methods (the LSTM and the Ideal)
    and I want to compare the two. I also want to plot the bit-error-rate against the
    achieved throughput, [e_DRNN1,thr_DRNN1]. How could I do this? The following is my code with
    the respective output for each method.

    #Method LSTM
    [e_DRNN1,thr_DRNN1]=e_short_pkts(p.L_pkt,gamma_real,gamma_DRNN1,p)
    e_DRNN1,thr_DRNN1
    (array([[0.00000000e+00, 9.83990470e-01, 4.78419178e-07, …,
    0.00000000e+00, 1.62437153e-03, 4.77800111e-02]]),
    array([[2.20861316, 0.05908644, 3.07646398, …, 3.5582583 , 4.1410422 ,
    4.0731042 ]]))

    #Method Ideal
    [e_ideal,thr_ideal]=e_short_pkts(p.L_pkt,gamma_real,gamma_ideal,p)
    e_ideal,thr_ideal

    (array([[0. , 0.98399047, 0.97368655, …, 0.08990624, 0.15850721,
    0.12215858]]),
    array([[2.20861316, 0.05908644, 0.09711517, …, 3.89260928, 3.63755763,
    3.79468346]]))

    Thank you

  362. Avatar
    Anwar Ali November 18, 2022 at 1:46 am #

    awsm tutorial

    • Avatar
      James Carmichael November 18, 2022 at 6:02 am #

      Thank you Anwar for your feedback! We appreciate it!

  363. Avatar
    Avi Ofek November 21, 2022 at 10:02 pm #

    Thank you very much for making it easy to understand James.
    As a beginner I tried to get one output from 5 random sets of numbers , letting the model learn by itself.
    How can I get single output from the 5 sets of input please?
    Thank you very much anyway
    Avi Ofek

  364. Avatar
    Sagar Padhiyar December 28, 2022 at 4:35 am #

    Hello Jason,

    Thank you for this blog. It is helpful as always.

    I have one doubt. How to prepare data for future prediction? let’s say I want to forecast energy consumption for the next 3 years in an hourly manner. For training data, we have a date and energy consumption hour wise. How do I prepare testing data where I only have a date?

    Thank you

  365. Avatar
    mayan January 16, 2023 at 7:27 pm #

    Hi
    Thanks for the tutorial. For the univariate series, is there a reason to use ConvLSTM2D and not ConvLSTM1D ?

  366. Avatar
    mayan January 16, 2023 at 11:39 pm #

    Hi,
    I did not really understand why it was necessary to use subsequences instead of the sequences in the CNN-LSTM model. Could you please detail that ?
    Thanks

  367. Avatar
    mayan January 16, 2023 at 11:41 pm #

    Hi again

    In the ConvLSTM could we have used ConvLSTM1D instead of ConvLSTM2D ?

  368. Avatar
    Guantan January 19, 2023 at 3:12 am #

    Hi all, I am trying to find the solution to a similar problem and I wonder if you can help.

    I have panel data on 200 different stocks. Each stock belongs to one of 12 different sectors, hot encoded 1-12. For each stock there are 8 different pieces of price information such as price, market capitalisation, volume, and so forth. I then have a column of future stock prices on which to train the model.

    Would this mean I need to train 200 different models? How would you go about this problem if you were given this dataset?

    Sorry if this is a daft question. I am new to ML.

  369. Avatar
    Arnold January 21, 2023 at 1:21 pm #

    Hi Jason, massive fan of your work throughout the years.
    Keeping it short as I assume you have hundreds of messages a day!

    If one has a dataset on 400 patients’ health through time.
    X variables are: Patient ID, Age Group (Binary i.e OLD 1 and Young 2), Distance walked during the day, Amount of calories eaten that day.
    Y variable to be predicted is: Amount of non-fatal heart attacks.

    My idea was that one could run 400 different LSTM time series models on each individual to predict the amount of non-fatal heart attacks.

    My question is! These results would gain no information from the other predictions, is there a way you know of linking this information?

    For example, if one was to train a model on an OLD patient, is there any way that the model can learn that OLD patients have tended to have more non-fatal heart attacks in the other regressions so the model incorporates more non-fatal heart attacks to this old patients predictions?

    Maybe I am thinking about it wrong, please help!
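
One way to let a model share information across individuals is to train a single pooled model on windows drawn from every patient, with static attributes such as age group repeated at each time step as an extra feature. The sketch below only shows the data preparation; the helper name `patient_windows`, the random data, the three example patients, and the window length of 4 are all assumptions made for illustration.

```python
import numpy as np


def patient_windows(daily, age_group, targets, n_steps):
    """Build (samples, n_steps, features) windows for one patient.

    daily:     array (days, n_daily_features), e.g. distance walked and calories eaten
    age_group: scalar static attribute, repeated at every time step
    targets:   array (days,), e.g. number of non-fatal heart attacks
    """
    static = np.full((len(daily), 1), age_group, dtype=float)
    features = np.hstack([daily, static])
    X, y = [], []
    for i in range(len(features) - n_steps):
        X.append(features[i:i + n_steps])
        y.append(targets[i + n_steps])
    return np.array(X), np.array(y)


rng = np.random.default_rng(0)
X_parts, y_parts = [], []
for patient_id in range(3):          # 3 hypothetical patients instead of 400
    days = 10 + patient_id           # patients may have histories of different lengths
    daily = rng.random((days, 2))    # made-up distance and calorie values
    targets = rng.integers(0, 3, size=days)
    age_group = 1 + patient_id % 2   # 1 = old, 2 = young, as coded in the comment
    Xp, yp = patient_windows(daily, age_group, targets, n_steps=4)
    X_parts.append(Xp)
    y_parts.append(yp)

# Pool the windows from all patients into one training set for a single model.
X = np.concatenate(X_parts)
y = np.concatenate(y_parts)
print(X.shape, y.shape)  # (21, 4, 3) (21,)
```

Because windows never cross from one patient to another, the pooled model sees each patient's history separately while its weights are learned from all of them.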

  370. Avatar
    frr June 11, 2023 at 3:04 pm #

    Hi, is there a “multi parallel & multi inputs(features)” LSTM model? Thanks!

  371. Avatar
    Iman June 29, 2023 at 6:25 am #

    Hi, I searched a lot and even used ChatGPT, but I’m so confused. I have a company data set and I should find a model for customer churn using an LSTM. I have the behavior of customers (identified by IDs) over 12 months; I mean I know the churn label for ID 1445 in the first month, the second month, and so on. This data set has features like monthly_visit, the age of customers, sim_type, contract_type, and so on. How can I define the LSTM input and output? I would like to predict the churn for customer 1445 for month 12 based on months 11, 10, 9 and 8, then for the same customer predict month 11 based on months 10, 9, 8 and 7, and so on, and then jump to the next customer and do the same for them. How can I use an LSTM for this problem? Sorry for the long explanation.

    • Avatar
      James Carmichael June 29, 2023 at 8:50 am #

      Hi Iman…Please narrow your query to a single question so that we may better assist you.

  372. Avatar
    Iman June 29, 2023 at 7:10 pm #

    Sorry … Is it possible to predict customer churn using LSTM when you have monthly behavior of customers? I mean what’s the X(input) and y(output) for LSTM ?

  373. Avatar
    Iman June 29, 2023 at 7:14 pm #

    Is it possible to use LSTM for customer churn prediction when you have monthly behavior of customer and the churn label of each month ? I mean what should be the X(input) and y (output) for LSTM ?

  374. Avatar
    Mani June 29, 2023 at 7:23 pm #

    Is it possible to use LSTM for customer churn prediction ? what’s the X and y for LSTM model. note that I have the monthly behavior of each customers in 12 months.

  375. Avatar
    David June 30, 2023 at 8:20 pm #

    Hi, I have a dataset that represents the monthly behavior of customers, with 1 million rows and 8 columns. Every 12 rows of the dataset belong to one customer, and I want to build a churn prediction model for these customers using an LSTM. How should I construct the input and output for my LSTM model when I have a dataset of monthly customer behavior?
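
A minimal sketch of one way to frame monthly churn data like this for an LSTM: reshape the flat table into (customers, months, columns) and slide a window of past months over each customer, using the churn label of the following month as the target. It assumes the rows are sorted by customer and then by month, that the last column holds the churn label, and that four past months are used as input; the data is random and the helper name `churn_windows` is hypothetical.

```python
import numpy as np

n_customers, n_months, n_cols = 5, 12, 8  # small stand-in for the 1-million-row table

rng = np.random.default_rng(1)
table = rng.random((n_customers * n_months, n_cols))
# Assumption: the last column is the 0/1 churn label for that month.
table[:, -1] = rng.integers(0, 2, size=n_customers * n_months)

# Every 12 consecutive rows belong to one customer.
panel = table.reshape(n_customers, n_months, n_cols)


def churn_windows(panel, n_steps):
    """X: n_steps past months of all columns; y: churn label of the following month."""
    X, y = [], []
    for customer in panel:
        for start in range(panel.shape[1] - n_steps):
            X.append(customer[start:start + n_steps])
            y.append(customer[start + n_steps, -1])
    return np.array(X), np.array(y)


X, y = churn_windows(panel, n_steps=4)
print(X.shape, y.shape)  # (40, 4, 8) (40,)
```

With this framing the last window of each customer uses months 8 to 11 to predict the label of month 12, and a Dense(1) output with a sigmoid activation and binary cross-entropy loss would be a natural fit for the binary target.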

  376. Avatar
    Justin Goh October 8, 2023 at 12:32 pm #

    Hi Jason,

    Appreciate your guide for LSTM time series model. It is really helpful.
    I have followed your step to make my own time series LSTM model but encountered a question.

    At stage 1, I had multivariate single step forecasting.(simple LSTM model with 3 dense layers)
    At stage 2, I converted it to multivariate multi-step forecasting by using Encoder-Decoder model.
    But in doing so, the complexity of my dense layers dropped, which I didn’t want.
    Can you give any suggestions on how to maintain the complexity of the dense layers while using the Encoder-Decoder model?

    Please see the code and model summaries below.

    At stage 1(Simple LSTM model)

    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(window_size, n_character)),
        tf.keras.layers.LSTM(100, return_sequences=True),
        tf.keras.layers.LSTM(100),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(n_outPut_charactor)
    ])

    model.summary()
    Model: "sequential"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #
    =================================================================
     lstm (LSTM)                 (None, 20, 100)           59600

     lstm_1 (LSTM)               (None, 100)               80400

     dense (Dense)               (None, 100)               10100

     dense_1 (Dense)             (None, 100)               10100

     dense_2 (Dense)             (None, 44)                4444

    =================================================================
    Total params: 164644 (643.14 KB)
    Trainable params: 164644 (643.14 KB)

    At stage 2 (Encoder-Decoder model)

    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(window_size, n_character)),
        tf.keras.layers.LSTM(100),
        tf.keras.layers.RepeatVector(n_step_out),
        tf.keras.layers.LSTM(100, return_sequences=True),
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_outPut_charactor, activation='relu')),
    ])
    Model: "sequential"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #
    =================================================================
     lstm (LSTM)                 (None, 100)               59600

     repeat_vector (RepeatVecto  (None, 3, 100)            0
     r)

     lstm_1 (LSTM)               (None, 3, 100)            80400

     time_distributed (TimeDist  (None, 3, 44)             4444
     ributed)

    =================================================================
    Total params: 144444 (564.23 KB)
    Trainable params: 144444 (564.23 KB)
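
The difference between the two summaries is exactly the two Dense(100) layers, 2 x 10100 = 20200 parameters. One possible way to restore that capacity in the encoder-decoder model is to wrap the same two hidden Dense layers in TimeDistributed before the output layer; TimeDistributed applies one set of weights at every output step, so each wrapped layer has the same parameter count as the plain Dense layer. The arithmetic below checks this with the standard parameter formulas. The helper names are hypothetical, and the input width of 48 features is not stated in the comment; it is inferred from the 59600 parameters of the first LSTM.

```python
def lstm_params(n_inputs, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (n_inputs + units) + units)


def dense_params(n_inputs, units):
    """One weight per input plus one bias, for each unit."""
    return (n_inputs + 1) * units


n_features = 48  # inferred from the summary: 4 * (100 * (48 + 100) + 100) = 59600
n_outputs = 44

stage1 = (lstm_params(n_features, 100) + lstm_params(100, 100)
          + 2 * dense_params(100, 100) + dense_params(100, n_outputs))

stage2 = (lstm_params(n_features, 100) + lstm_params(100, 100)
          + dense_params(100, n_outputs))

# Stage 2 with two TimeDistributed(Dense(100)) layers added before the output layer.
stage2_with_hidden = stage2 + 2 * dense_params(100, 100)

print(stage1, stage2, stage2_with_hidden)  # 164644 144444 164644
```

In the model definition this would mean adding two `tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(100, activation="relu"))` lines after the decoder LSTM. Note also that the stage 2 output layer uses a relu activation, which restricts predictions to non-negative values, whereas the stage 1 output layer is linear.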

  377. Avatar
    Ahmad December 26, 2023 at 4:11 pm #

    Hi James,

    I aim to develop an ML predictive model (forecasting) to predict the next failure time

    I have the following data type:

    -Failure date (dd/mm/yy)
    -Failure time (11:00 am)
    -Recovery date (dd/mm/yy)
    -Recovery time (11:30 am)
    -Operational delay (30 min)
    -Age of equipment
    -Number of Failures last time

    Q:

    1- Can you suggest models to be used for prediction?
    2- Is there an example of this type of prediction?
    3- How should I pre-process the date and time data?

    Regards,
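
A minimal sketch of one way to turn date and time fields like these into numeric features, using only the Python standard library. The three failure/recovery records are made up, the date format string is an assumption based on the dd/mm/yy and 11:00 am examples in the comment, and the derived features (downtime, time since the previous recovery, hour of day, day of week) are just one reasonable choice.

```python
from datetime import datetime

FMT = "%d/%m/%y %I:%M %p"  # assumed format: dd/mm/yy followed by a 12-hour clock time

# Hypothetical (failure, recovery) records.
records = [
    ("01/03/23 11:00 am", "01/03/23 11:30 am"),
    ("04/03/23 02:15 pm", "04/03/23 03:00 pm"),
    ("09/03/23 09:00 am", "09/03/23 09:20 am"),
]

failures = [datetime.strptime(f, FMT) for f, _ in records]
recoveries = [datetime.strptime(r, FMT) for _, r in records]

# Operational delay of each failure, in minutes.
downtime_min = [(r - f).total_seconds() / 60 for f, r in zip(failures, recoveries)]

# Hours of normal operation between one recovery and the next failure.
# This is a natural target when the goal is to predict the next failure time.
uptime_hours = [
    (failures[i] - recoveries[i - 1]).total_seconds() / 3600
    for i in range(1, len(failures))
]

# Calendar features that a model can consume directly.
hour_of_day = [f.hour + f.minute / 60 for f in failures]
day_of_week = [f.weekday() for f in failures]  # Monday = 0

print(downtime_min)  # [30.0, 45.0, 20.0]
print(uptime_hours)  # [74.75, 114.0]
```

Framed this way, the series of uptimes and downtimes, together with equipment age and failure counts, can be windowed like any other multivariate series.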

  378. Avatar
    Julia December 29, 2023 at 11:36 pm #

    Thank you, it seems that you explained the LSTM model implementation quite well, but I cannot run your code. Why are there no indented blocks in the for loops and if statements?

    • Avatar
      James Carmichael December 30, 2023 at 9:30 am #

      Hi Julia…did you type the code or copy and paste it? There could be formatting issues resulting from the way in which the code was entered into your Python environment.

      • Avatar
        Julia December 31, 2023 at 7:04 am #

        Hi James, thank you for your answer. I have found the way to copy the code properly, by clicking “toggle plain code”.

  379. Avatar
    Bassel January 16, 2024 at 1:00 pm #

    Thank you Jason for the great resource. I have a question: I am trying to train an LSTM autoencoder model on a multivariate time series to detect anomalies using reconstruction error. I want to train the model on the normal operating mode, and I have 2 years of data. A fault occurs 4 months into the time series, so I have normal operating mode data before the fault and more normal data after the fault. How can I use those two sub-series, before and after the fault, to train the model? As far as I know, the time series should have a consistent time interval and no cuts in time. What do you suggest? I was considering adding time features to the existing features fed to the model, explicitly feeding the model with time information. Another option would be to train the model on the first sub-series before the fault and then update it with the second sub-series after the fault, but I am not sure this is possible.
    Thank you for your time again.
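
One possible way to use both normal segments is to window each segment separately and then concatenate the windows, so that no training window spans the gap around the fault. For a model trained on independent fixed-length windows (that is, not a stateful LSTM carrying state across batches), the samples do not need to come from one unbroken series. The sketch below uses random data; the segment lengths, the 3 features, the window length of 24, and the helper name `segment_windows` are assumptions.

```python
import numpy as np


def segment_windows(segment, n_steps):
    """All full windows of n_steps rows from one contiguous segment."""
    return np.array([
        segment[i:i + n_steps]
        for i in range(len(segment) - n_steps + 1)
    ])


rng = np.random.default_rng(2)
before_fault = rng.random((120, 3))  # hypothetical normal data before the fault
after_fault = rng.random((600, 3))   # hypothetical normal data after the fault

n_steps = 24
X_train = np.concatenate([
    segment_windows(before_fault, n_steps),
    segment_windows(after_fault, n_steps),
])
print(X_train.shape)  # (674, 24, 3)

# For a reconstruction autoencoder the target is the input itself, for example:
# model.fit(X_train, X_train, ...)  # assumes `model` is the LSTM autoencoder
```

The sampling interval inside each window stays consistent, which is what matters for the windows the network actually sees.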

  380. Avatar
    martin February 1, 2024 at 3:18 pm #

    Hello Dr. Brownlee

    Thank you for putting this together! It really helped me understand the operations behind LSTMs.

    I have a couple of questions, if you can answer them:
    1. In the vanilla/stacked/etc. LSTMs you use “model.add(LSTM(50,” ... why 50? The Keras LSTM doc specifies this field as “units: Positive integer, dimensionality of the output space.”, which makes me think we should use n_steps or n_features, but when I tried to run it with either of those two options the result was nowhere near what it should be.
    2. In Multiple Input Series > Multiple Input Series, shouldn’t the “Output” be 85 and not 65, since 85 is the output at the next time step in the data series, in the same way that the input was 10, 20, 30 and the output was 40?

    • Avatar
      James Carmichael February 2, 2024 at 10:36 am #

      Hi Martin…

      Determining the input and output parameters of Long Short-Term Memory (LSTM) models is crucial for designing neural networks that can effectively process sequence data (e.g., time series, natural language text). LSTM models are a type of recurrent neural network (RNN) capable of learning long-term dependencies in data, making them suitable for tasks like language modeling, time series forecasting, and more.

      ### Input Parameters

      1. **Input Shape:**
      – The input shape to an LSTM layer is typically (batch_size, time_steps, features):
      – **batch_size**: How many sequences you’re passing through the network at once. It can be left unspecified (None) during model definition for flexibility.
      – **time_steps**: The length of the sequence, i.e., how many time steps or elements are in each sequence.
      – **features**: The number of features in each time step. For instance, in text processing, it could be the size of the word embedding vector; in time series, the number of variables at each time step.

      2. **Timesteps and Feature Selection:**
      – Based on the problem, decide how many past observations (time steps) your model should consider for predicting the future value or next sequence element. This will define your window size or the sequence length.
      – The features depend on the data available and the nature of the problem. For instance, in a stock price forecasting problem, features could include past prices, volume, and other technical indicators.

      ### Output Parameters

      1. **Output Shape:**
      – The output of an LSTM can be tailored based on the task:
      – **Many-to-One**: For tasks like sentiment analysis, where the entire sequence maps to a single label. The output shape would be (batch_size, units), where units refer to the number of LSTM units (neurons).
      – **Many-to-Many**: For tasks like machine translation or sequence generation, where each input time step corresponds to an output time step. This can be achieved by setting return_sequences=True in LSTM layers, resulting in an output shape of (batch_size, time_steps, units).
      – **Custom**: Using techniques like sequence-to-sequence models, where an encoder LSTM’s output is used as an input to a decoder LSTM, allowing for flexible input-output configurations.

      2. **Number of Units:**
      – This parameter defines the dimensionality of the output space of the LSTM layer, i.e., the size of the hidden state vector the layer produces at each time step. It is chosen independently of the number of time steps and the number of features, and it is a crucial parameter to tune based on the complexity of the task and the amount of data available.

      ### Design Considerations

      – **Sequence Padding:** If your input sequences have variable lengths, you’ll need to pad them to ensure they have the same length for batch processing.
      – **Batch Size:** The choice of batch size can affect training dynamics and performance. Smaller batches might lead to faster convergence but can be noisier. Larger batches provide more stable but potentially slower convergence.
      – **Statefulness:** Decide whether your LSTM model should remember its state (hidden states) across batches. Stateful LSTMs can be beneficial for time series data where the sequence continuity across batches is important.

      ### Practical Steps

      1. **Preprocessing**:
      – Normalize/standardize your input data.
      – Convert text data into numerical form (e.g., embeddings for NLP tasks).
      – Ensure sequences have a fixed length (padding/truncating where necessary).

      2. **Model Definition**:
      – Choose the appropriate architecture (e.g., stacked LSTMs, bidirectional LSTMs) based on your problem.
      – Experiment with different numbers of units, batch sizes, and sequence lengths.

      3. **Training**:
      – Use a validation set to monitor performance and avoid overfitting.
      – Adjust learning rate, optimization algorithm, and other hyperparameters as needed.

      Determining the optimal input and output parameters for LSTM models often requires experimentation and is guided by the specific requirements and constraints of your application.
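
The shapes described above can be checked with a toy LSTM forward pass written in NumPy. This is a sketch with random, untrained weights, not the Keras implementation; the function name `lstm_forward` and the gate ordering are choices made for illustration. It shows that the number of units sets the size of the output and is independent of the number of time steps and features, which is why a value such as 50 is a free configuration choice (the tutorial notes that its model configurations are arbitrary and not optimized).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x, units, return_sequences=False, seed=0):
    """Toy LSTM forward pass; x has shape (batch, time_steps, features)."""
    batch, time_steps, features = x.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(features, 4 * units))  # input weights for 4 gates
    U = rng.normal(scale=0.1, size=(units, 4 * units))     # recurrent weights for 4 gates
    b = np.zeros(4 * units)                                # biases for 4 gates
    h = np.zeros((batch, units))  # hidden state
    c = np.zeros((batch, units))  # cell state
    outputs = []
    for t in range(time_steps):
        z = x[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1) if return_sequences else h


# Two samples, three time steps, one feature, as in a univariate example.
x = np.ones((2, 3, 1))

print(lstm_forward(x, units=50).shape)                         # (2, 50)
print(lstm_forward(x, units=50, return_sequences=True).shape)  # (2, 3, 50)
```

Setting units to n_steps or n_features (3 or 1 here) would leave the layer with a very small hidden state, which is consistent with the poor results reported in the question.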

  381. Avatar
    Mesabo Mesman February 2, 2024 at 5:02 pm #

    With your tutorials, it took me only a week to acquire the LSTM knowledge needed to work on a real-world problem. Thank you so much!

    • Avatar
      James Carmichael February 3, 2024 at 9:45 am #

      Hi Mesabo…You are very welcome! Thank you for sharing your success!

  382. Avatar
    Arsalan February 20, 2024 at 1:09 am #

    Hello
    I have 21 images (TIFF files), each with 60 bands, and each image covers one year (2000-2020). One of these bands is the land cover of the pixel.
    I want to forecast the land cover change for the next year.
    Which model do you suggest? ConvLSTM?

    • Avatar
      James Carmichael February 20, 2024 at 7:02 am #

      Hi Arsalan…That would be a great model type to start with! Let us know how it goes!

  383. Avatar
    Charitini February 25, 2024 at 11:12 pm #

    Hello,

    First, I would like to say, that this is an amazing tutorial!

    My question is, at the Multiple Parallel Series example where we have three input series and three output series (3 features) in a single LSTM net, how is the loss computed? Is it the average of the losses in each of the three parallel series?

    Best!!
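
Assuming the model is compiled with mean squared error as the loss, the error is averaged over all output values: over the three outputs of each sample and then over the samples in the batch. When every sample has all three outputs, this is the same number as the average of the three per-series MSE values. The small NumPy check below illustrates this with made-up targets and predictions.

```python
import numpy as np

# Made-up targets and predictions: 2 samples, 3 parallel output series.
y_true = np.array([[100.0, 105.0, 205.0],
                   [110.0, 115.0, 225.0]])
y_pred = np.array([[101.0, 107.0, 208.0],
                   [110.0, 113.0, 221.0]])

squared_error = (y_true - y_pred) ** 2

# Single loss value: mean over every sample and every output.
overall_mse = squared_error.mean()

# MSE of each series on its own, then averaged across the three series.
per_series_mse = squared_error.mean(axis=0)

print(per_series_mse)  # [ 0.5  4.  12.5]
print(np.isclose(overall_mse, per_series_mse.mean()))  # True
```

One consequence is that a series with a larger scale contributes more to the loss, so scaling the series before training may be worth considering.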

Leave a Reply