Long Short-Term Memory networks, or LSTMs for short, can be applied to time series forecasting.

There are many types of LSTM models that can be used for each specific type of time series forecasting problem.

In this tutorial, you will discover how to develop a suite of LSTM models for a range of standard time series forecasting problems.

The objective of this tutorial is to provide standalone examples of each model on each type of time series problem as a template that you can copy and adapt for your specific time series forecasting problem.

After completing this tutorial, you will know:

- How to develop LSTM models for univariate time series forecasting.
- How to develop LSTM models for multivariate time series forecasting.
- How to develop LSTM models for multi-step time series forecasting.

This is a large and important post; you may want to bookmark it for future reference.

Let’s get started.

## Tutorial Overview

In this tutorial, we will explore how to develop a suite of different types of LSTM models for time series forecasting.

The models are demonstrated on small contrived time series problems intended to give the flavor of the type of time series problem being addressed. The chosen configuration of the models is arbitrary and not optimized for each problem; that was not the goal.

This tutorial is divided into four parts; they are:

- Univariate LSTM Models
- Multivariate LSTM Models
- Multi-Step LSTM Models
- Multivariate Multi-Step LSTM Models

## Univariate LSTM Models

LSTMs can be used to model univariate time series forecasting problems.

These are problems that consist of a single series of observations, where a model is required to learn from the series of past observations to predict the next value in the sequence.

We will demonstrate a number of variations of the LSTM model for univariate time series forecasting.

This section is divided into six parts; they are:

- Data Preparation
- Vanilla LSTM
- Stacked LSTM
- Bidirectional LSTM
- CNN LSTM
- ConvLSTM

Each of these models is demonstrated for one-step univariate time series forecasting, but can easily be adapted and used as the input part of a model for other types of time series forecasting problems.

### Data Preparation

Before a univariate series can be modeled, it must be prepared.

The LSTM model will learn a function that maps a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple examples from which the LSTM can learn.

Consider a given univariate sequence:

```
[10, 20, 30, 40, 50, 60, 70, 80, 90]
```

We can divide the sequence into multiple input/output patterns called samples, where three time steps are used as input and one time step is used as output for the one-step prediction that is being learned.

```
X,              y
10, 20, 30      40
20, 30, 40      50
30, 40, 50      60
...
```

The *split_sequence()* function below implements this behavior and will split a given univariate sequence into multiple samples where each sample has a specified number of time steps and the output is a single time step.

```python
# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
```

We can demonstrate this function on our small contrived dataset above.

The complete example is listed below.

```python
# univariate data preparation
from numpy import array

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])
```

Running the example splits the univariate series into six samples where each sample has three input time steps and one output time step.

```
[10 20 30] 40
[20 30 40] 50
[30 40 50] 60
[40 50 60] 70
[50 60 70] 80
[60 70 80] 90
```
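For longer series, the same windows can be built without an explicit loop. The sketch below is an alternative and not the approach used in the rest of this tutorial. It assumes NumPy 1.20 or later, where `sliding_window_view` is available, and the helper name `split_sequence_windowed` is made up for illustration.

```python
from numpy import array
from numpy.lib.stride_tricks import sliding_window_view

# hypothetical vectorized variant of split_sequence() (assumes NumPy >= 1.20)
def split_sequence_windowed(sequence, n_steps):
    # each window holds n_steps input values followed by one output value
    windows = sliding_window_view(array(sequence), n_steps + 1)
    # copy so the results are ordinary writable arrays, not read-only views
    X, y = windows[:, :-1].copy(), windows[:, -1].copy()
    return X, y

raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
X, y = split_sequence_windowed(raw_seq, 3)
print(X.shape, y.shape)
for i in range(len(X)):
    print(X[i], y[i])
```

This should produce the same six samples as the loop-based function, with X of shape (6, 3) and y of shape (6,).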

Now that we know how to prepare a univariate series for modeling, let’s look at developing LSTM models that can learn the mapping of inputs to outputs, starting with a Vanilla LSTM.

### Vanilla LSTM

A Vanilla LSTM is an LSTM model that has a single hidden layer of LSTM units, and an output layer used to make a prediction.

We can define a Vanilla LSTM for univariate time series forecasting as follows.

```python
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```

Key in the definition is the shape of the input; that is what the model expects as input for each sample in terms of the number of time steps and the number of features.

We are working with a univariate series, so the number of features is one, for one variable.

The number of time steps as input is the number we chose when preparing our dataset as an argument to the *split_sequence()* function.

The shape of the input for each sample is specified in the *input_shape* argument on the definition of the first hidden layer.

We almost always have multiple samples; therefore, the model will expect the input component of the training data to have the dimensions or shape:

```
[samples, timesteps, features]
```

Our *split_sequence()* function in the previous section outputs X with the shape [*samples, timesteps*], so we can easily reshape it to have an additional dimension for the one feature.

```python
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
```
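A quick way to confirm the reshape did what we intended is to print the shape before and after. The sketch below hard-codes the six samples from the data preparation step so that it runs on its own; the names `X_3d` and `X_alt` are illustrative only.

```python
from numpy import array, newaxis

# the six univariate samples produced by split_sequence() with n_steps = 3
X = array([[10, 20, 30], [20, 30, 40], [30, 40, 50],
           [40, 50, 60], [50, 60, 70], [60, 70, 80]])
print(X.shape)

# add the trailing feature dimension expected by the LSTM
n_features = 1
X_3d = X.reshape((X.shape[0], X.shape[1], n_features))
print(X_3d.shape)

# an equivalent alternative: index with a new trailing axis
X_alt = X[..., newaxis]
print((X_3d == X_alt).all())
```

The values are unchanged; only the array gains a final axis of length one.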

In this case, we define a model with 50 LSTM units in the hidden layer and an output layer that predicts a single numerical value.

The model is fit using the efficient Adam version of stochastic gradient descent and optimized using the mean squared error, or '*mse*', loss function.
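As a reminder of what the loss measures, mean squared error is the average of the squared differences between targets and predictions. The short sketch below computes it by hand; the target and prediction values are invented for illustration and are not output from any model in this tutorial.

```python
from numpy import array, mean

# hypothetical targets and predictions, for illustration only
y_true = array([40.0, 50.0, 60.0])
y_pred = array([42.0, 49.0, 60.0])

# mean squared error: average of the squared differences
mse = mean((y_true - y_pred) ** 2)
print(mse)
```

Here the squared errors are 4, 1, and 0, so the result is 5/3.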

Once the model is defined, we can fit it on the training dataset.

```python
# fit model
model.fit(X, y, epochs=200, verbose=0)
```

After the model is fit, we can use it to make a prediction.

We can predict the next value in the sequence by providing the input:

```
[70, 80, 90]
```

And expecting the model to predict something like:

```
[100]
```

The model expects the input shape to be three-dimensional with [*samples, timesteps, features*]; therefore, we must reshape the single input sample before making the prediction.

```python
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
```

We can tie all of this together and demonstrate how to develop a Vanilla LSTM for univariate time series forecasting and make a single prediction.

```python
# univariate lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

Running the example prepares the data, fits the model, and makes a prediction.

Your results may vary given the stochastic nature of the algorithm; try running the example a few times.

We can see that the model predicts the next value in the sequence.

```
[[102.09213]]
```

### Stacked LSTM

Multiple hidden LSTM layers can be stacked one on top of another in what is referred to as a Stacked LSTM model.

An LSTM layer requires a three-dimensional input, but by default an LSTM produces a two-dimensional output as an interpretation from the end of the sequence, which cannot be fed directly into another LSTM layer.

We can address this by having the LSTM output a value for each time step in the input data by setting the *return_sequences=True* argument on the layer. This allows the 3D output from one hidden LSTM layer to be used as input to the next.

We can therefore define a Stacked LSTM as follows.

```python
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```
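To make the difference in output shape concrete, the sketch below uses a toy recurrent layer written in NumPy. It is a plain tanh recurrence with random weights, not an LSTM and not how Keras implements one; the function name `toy_recurrent_layer` and its `seed` argument are assumptions made for illustration. It only demonstrates the shapes involved.

```python
import numpy as np

# toy tanh recurrence with random weights; a stand-in used only to show shapes,
# not an LSTM and not how Keras implements one
def toy_recurrent_layer(X, units, return_sequences=False, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, n_timesteps, n_features = X.shape
    W_in = rng.normal(scale=0.1, size=(n_features, units))
    W_rec = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros((n_samples, units))
    states = []
    for t in range(n_timesteps):
        # update the hidden state from the current time step and previous state
        h = np.tanh(X[:, t, :] @ W_in + h @ W_rec)
        states.append(h)
    if return_sequences:
        # one output per time step: [samples, timesteps, units]
        return np.stack(states, axis=1)
    # only the final state: [samples, units]
    return h

X = np.ones((6, 3, 1))
last = toy_recurrent_layer(X, 50)
seq = toy_recurrent_layer(X, 50, return_sequences=True)
print(last.shape)
print(seq.shape)
```

The first call returns a 2D array of shape (6, 50), while the second returns a 3D array of shape (6, 3, 50), which is the form a following recurrent layer needs as input.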

We can tie this together; the complete code example is listed below.

```python
# univariate stacked lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a univariate sequence
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

Running the example predicts the next value in the sequence, which we expect would be 100.

```
[[102.47341]]
```

### Bidirectional LSTM

On some sequence prediction problems, it can be beneficial to allow the LSTM model to learn the input sequence both forward and backwards and concatenate both interpretations.

This is called a Bidirectional LSTM.

We can implement a Bidirectional LSTM for univariate time series forecasting by wrapping the first hidden layer in a wrapper layer called Bidirectional.

An example of defining a Bidirectional LSTM to read input both forward and backward is as follows.

```python
# define model
model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```
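The idea of reading the sequence in both directions and concatenating the two interpretations can be sketched in NumPy. As before, this uses a toy tanh recurrence with random weights as a stand-in, not a real LSTM; the helper name `toy_recurrent_pass` is hypothetical. With concatenation, which is the default merge mode of the Keras Bidirectional wrapper, the merged output has twice as many values per sample as a single direction.

```python
import numpy as np

# toy tanh recurrence with random weights (a stand-in, not a real LSTM)
def toy_recurrent_pass(X, units, rng):
    W_in = rng.normal(scale=0.1, size=(X.shape[2], units))
    W_rec = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros((X.shape[0], units))
    for t in range(X.shape[1]):
        h = np.tanh(X[:, t, :] @ W_in + h @ W_rec)
    # return only the final hidden state: [samples, units]
    return h

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3, 1))
# forward pass reads time steps 0, 1, 2
forward = toy_recurrent_pass(X, 50, rng)
# backward pass reads the same samples with the time axis reversed
backward = toy_recurrent_pass(X[:, ::-1, :], 50, rng)
# concatenate both interpretations along the last axis
merged = np.concatenate([forward, backward], axis=-1)
print(forward.shape, backward.shape, merged.shape)
```

Each direction has its own weights, which is why the two passes draw separate random matrices here.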

The complete example of the Bidirectional LSTM for univariate time series forecasting is listed below.

```python
# univariate bidirectional lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Bidirectional

# split a univariate sequence
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model
model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

Running the example predicts the next value in the sequence, which we expect would be 100.

```
[[101.48093]]
```

### CNN LSTM

A convolutional neural network, or CNN for short, is a type of neural network developed for working with two-dimensional image data.

A CNN can also be very effective at automatically extracting and learning features from one-dimensional sequence data such as univariate time series data.

A CNN model can be used in a hybrid model with an LSTM backend where the CNN is used to interpret subsequences of input that together are provided as a sequence to an LSTM model to interpret. This hybrid model is called a CNN-LSTM.

The first step is to split the input sequences into subsequences that can be processed by the CNN model. For example, we can first split our univariate time series data into input/output samples with four steps as input and one as output. Each sample can then be split into two sub-samples, each with two time steps. The CNN can interpret each subsequence of two time steps and provide a time series of interpretations of the subsequences to the LSTM model to process as input.

We can parameterize this and define the number of subsequences as *n_seq* and the number of time steps per subsequence as *n_steps*. The input data can then be reshaped to have the required structure:

```
[samples, subsequences, timesteps, features]
```

For example:

```python
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
```
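It can help to look at a single sample after this reshape. The sketch below hard-codes the five samples that result from splitting the nine-value series with four input time steps, so it runs without the *split_sequence()* function; the name `X_4d` is illustrative only.

```python
from numpy import array

# the five samples produced by split_sequence() with four input time steps
X = array([[10, 20, 30, 40], [20, 30, 40, 50], [30, 40, 50, 60],
           [40, 50, 60, 70], [50, 60, 70, 80]])
n_features, n_seq, n_steps = 1, 2, 2
# reshape into [samples, subsequences, timesteps, features]
X_4d = X.reshape((X.shape[0], n_seq, n_steps, n_features))
print(X_4d.shape)
# first sample: two subsequences of two time steps each
print(X_4d[0, :, :, 0])
```

The first sample [10, 20, 30, 40] becomes the two subsequences [10, 20] and [30, 40], each of which is read separately by the CNN.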

We want to reuse the same CNN model when reading in each sub-sequence of data separately.

This can be achieved by wrapping the entire CNN model in a TimeDistributed wrapper that will apply the entire model once per input, in this case, once per input subsequence.

The CNN model first has a convolutional layer for reading across the subsequence that requires a number of filters and a kernel size to be specified. The number of filters is the number of reads or interpretations of the input sequence. The kernel size is the number of time steps included in each 'read' operation of the input sequence.

The convolution layer is followed by a max pooling layer that distills the filter maps down to half of their size, keeping the most salient features. These structures are then flattened into a single one-dimensional vector to be used as a single input time step to the LSTM layer.

```python
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
```
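The size of the vector handed to the LSTM for each subsequence can be worked out by hand. The sketch below tracks the length of one subsequence through the three layers, assuming the Keras defaults of 'valid' padding and a stride of 1 for the convolution, and a pooling stride equal to the pool size. The helper names `conv1d_length` and `pool1d_length` are made up for this illustration.

```python
# output length of a 1D convolution with 'valid' padding and stride 1
def conv1d_length(n_steps, kernel_size):
    return n_steps - kernel_size + 1

# output length of 1D max pooling with stride equal to the pool size
def pool1d_length(n_steps, pool_size):
    return (n_steps - pool_size) // pool_size + 1

n_steps, filters = 2, 64
# convolution keeps both time steps when the kernel size is 1
after_conv = conv1d_length(n_steps, kernel_size=1)
# pooling halves the two time steps down to one
after_pool = pool1d_length(after_conv, pool_size=2)
# flattening yields one vector per subsequence
flattened = after_pool * filters
print(after_conv, after_pool, flattened)
```

Under these assumptions each two-step subsequence is reduced to a single vector of 64 values, so the LSTM sees a sequence of two such vectors per sample.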

Next, we can define the LSTM part of the model that interprets the CNN model’s read of the input sequence and makes a prediction.

```python
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
```

We can tie all of this together; the complete example of a CNN-LSTM model for univariate time series forecasting is listed below.

```python
# univariate cnn lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import TimeDistributed
from keras.layers import Conv1D
from keras.layers import MaxPooling1D

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
# define model
model = Sequential()
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=500, verbose=0)
# demonstrate prediction
x_input = array([60, 70, 80, 90])
x_input = x_input.reshape((1, n_seq, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

Running the example predicts the next value in the sequence, which we expect would be 100.

```
[[101.69263]]
```

### ConvLSTM

A type of LSTM related to the CNN-LSTM is the ConvLSTM, where the convolutional reading of input is built directly into each LSTM unit.

The ConvLSTM was developed for reading two-dimensional spatial-temporal data, but can be adapted for use with univariate time series forecasting.

The layer expects input as a sequence of two-dimensional images, therefore the shape of input data must be:

```
[samples, timesteps, rows, columns, features]
```

For our purposes, we can split each sample into subsequences where timesteps will become the number of subsequences, or *n_seq*, and columns will be the number of time steps for each subsequence, or *n_steps*. The number of rows is fixed at 1 as we are working with one-dimensional data.

We can now reshape the prepared samples into the required structure.

```python
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))
```
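This five-dimensional layout is the same data as the four-dimensional layout used for the CNN-LSTM, with one extra axis of length one for the rows. The sketch below checks that relationship on the five hard-coded samples; the names `X_4d` and `X_5d` are illustrative only.

```python
from numpy import array, expand_dims

# the five samples produced by split_sequence() with four input time steps
X = array([[10, 20, 30, 40], [20, 30, 40, 50], [30, 40, 50, 60],
           [40, 50, 60, 70], [50, 60, 70, 80]])
n_features, n_seq, n_steps = 1, 2, 2
# layout used for the CNN-LSTM: [samples, subsequences, timesteps, features]
X_4d = X.reshape((X.shape[0], n_seq, n_steps, n_features))
# layout used for the ConvLSTM: [samples, timesteps, rows, columns, features]
X_5d = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))
print(X_5d.shape)
# the 5D layout is the 4D layout with a rows axis of size 1 inserted
print((expand_dims(X_4d, axis=2) == X_5d).all())
```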

We can define the ConvLSTM as a single layer in terms of the number of filters and a two-dimensional kernel size in terms of (rows, columns). As we are working with a one-dimensional series, the number of rows is always fixed to 1 in the kernel.

The output of the model must then be flattened before it can be interpreted and a prediction made.

```python
model.add(ConvLSTM2D(filters=64, kernel_size=(1,2), activation='relu', input_shape=(n_seq, 1, n_steps, n_features)))
model.add(Flatten())
```

The complete example of a ConvLSTM for one-step univariate time series forecasting is listed below.

```python
# univariate convlstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import ConvLSTM2D

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))
# define model
model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(1,2), activation='relu', input_shape=(n_seq, 1, n_steps, n_features)))
model.add(Flatten())
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=500, verbose=0)
# demonstrate prediction
x_input = array([60, 70, 80, 90])
x_input = x_input.reshape((1, n_seq, 1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

Running the example predicts the next value in the sequence, which we expect would be 100.

```
[[103.68166]]
```

Now that we have looked at LSTM models for univariate data, let’s turn our attention to multivariate data.

## Multivariate LSTM Models

Multivariate time series data means data where there is more than one observation for each time step.

There are two main models that we may require with multivariate time series data; they are:

- Multiple Input Series.
- Multiple Parallel Series.

Let’s take a look at each in turn.

### Multiple Input Series

A problem may have two or more parallel input time series and an output time series that is dependent on the input time series.

The input time series are parallel because each series has an observation at the same time steps.

We can demonstrate this with a simple example of two parallel input time series where the output series is the simple addition of the input series.

```python
# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
```

We can reshape these three arrays of data as a single dataset where each row is a time step, and each column is a separate time series. This is a standard way of storing parallel time series in a CSV file.

```python
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
```

The complete example is listed below.

```python
# multivariate data preparation
from numpy import array
from numpy import hstack
# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
print(dataset)
```

Running the example prints the dataset with one row per time step and one column for each of the two input and one output parallel time series.

```
[[ 10  15  25]
 [ 20  25  45]
 [ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]
 [ 80  85 165]
 [ 90  95 185]]
```
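As an aside, the same dataset can be assembled more compactly. The sketch below is an alternative to the reshape-then-hstack approach, using element-wise addition for the output series and NumPy's `column_stack`, which treats each one-dimensional array as a column.

```python
from numpy import array, column_stack

# define input sequences
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
# element-wise sum of the two input series
out_seq = in_seq1 + in_seq2
# stack the 1D arrays as columns of a [rows, columns] dataset
dataset = column_stack((in_seq1, in_seq2, out_seq))
print(dataset.shape)
print(dataset[0], dataset[-1])
```

The result has nine rows and three columns, matching the dataset printed above.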

As with the univariate time series, we must structure these data into samples with input and output elements.

An LSTM model needs sufficient context to learn a mapping from an input sequence to an output value. LSTMs can support parallel input time series as separate variables or features. Therefore, we need to split the data into samples maintaining the order of observations across the two input sequences.

If we chose three input time steps, then the first sample would look as follows:

Input:

```
10, 15
20, 25
30, 35
```

Output:

```
65
```

That is, the first three time steps of each parallel series are provided as input to the model and the model associates this with the value in the output series at the third time step, in this case, 65.

We can see that, in transforming the time series into input/output samples to train the model, we will have to discard some values from the output time series where we do not have values in the input time series at prior time steps. In turn, the choice of the number of input time steps will have an important effect on how much of the training data is used.
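The trade-off can be made concrete with a little arithmetic. When each output is aligned with the last input time step, as in the sample above, a dataset with a given number of rows yields rows minus time steps plus one samples. The helper name `count_samples` below is hypothetical and used only for this illustration.

```python
# number of samples when the output is aligned with the last input time step
def count_samples(n_rows, n_steps):
    return max(n_rows - n_steps + 1, 0)

n_rows = 9
for n_steps in (1, 3, 5, 9):
    n_samples = count_samples(n_rows, n_steps)
    # output values at the start of the series that never serve as a target
    unused_outputs = n_rows - n_samples
    print(n_steps, n_samples, unused_outputs)
```

For our nine-row dataset, three input time steps give seven samples and leave the first two output values unused.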

We can define a function named *split_sequences()* that will take a dataset as we have defined it with rows for time steps and columns for parallel series and return input/output samples.

```python
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
```

We can test this function on our dataset using three time steps for each input time series as input.

The complete example is listed below.

```python
# multivariate data preparation
from numpy import array
from numpy import hstack

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps = 3
# convert into input/output
X, y = split_sequences(dataset, n_steps)
print(X.shape, y.shape)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])
```

Running the example first prints the shape of the X and y components.

We can see that the X component has a three-dimensional structure.

The first dimension is the number of samples, in this case 7. The second dimension is the number of time steps per sample, in this case 3, the value specified to the function. Finally, the last dimension specifies the number of parallel time series or the number of variables, in this case 2 for the two parallel series.

This is the exact three-dimensional structure expected by an LSTM as input. The data is ready to use without further reshaping.

We can then see that the input and output for each sample are printed, showing the three time steps for each of the two input series and the associated output for each sample.

```
(7, 3, 2) (7,)

[[10 15]
 [20 25]
 [30 35]] 65
[[20 25]
 [30 35]
 [40 45]] 85
[[30 35]
 [40 45]
 [50 55]] 105
[[40 45]
 [50 55]
 [60 65]] 125
[[50 55]
 [60 65]
 [70 75]] 145
[[60 65]
 [70 75]
 [80 85]] 165
[[70 75]
 [80 85]
 [90 95]] 185
```

We are now ready to fit an LSTM model on this data.

Any of the varieties of LSTMs in the previous section can be used, such as a Vanilla, Stacked, Bidirectional, CNN-LSTM, or ConvLSTM model.

We will use a Vanilla LSTM where the number of time steps and parallel series (features) are specified for the input layer via the *input_shape* argument.

```python
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```

When making a prediction, the model expects three time steps for two input time series.

We can predict the next value in the output series by providing the following input values:

```
80, 85
90, 95
100, 105
```

The shape of the one sample with three time steps and two variables must be [1, 3, 2].

We would expect the next value in the sequence to be 100 + 105, or 205.

```python
# demonstrate prediction
x_input = array([[80, 85], [90, 95], [100, 105]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
```
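Before calling the model, it is easy to verify the input shape and the value we hope to see. The sketch below needs only NumPy; the names `reshaped` and `expected` are illustrative, and the expected value follows from the way the contrived output series was built as the sum of the two inputs.

```python
from numpy import array

# three time steps of the two input series
x_input = array([[80, 85], [90, 95], [100, 105]])
print(x_input.shape)

# add the leading samples dimension: [samples, timesteps, features]
n_steps, n_features = 3, 2
reshaped = x_input.reshape((1, n_steps, n_features))
print(reshaped.shape)

# the output series is the sum of the inputs at the last time step
expected = x_input[-1].sum()
print(expected)
```

A well-fit model should predict something close to this expected value of 205.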

The complete example is listed below.

```python
# multivariate lstm example
from numpy import array
from numpy import hstack
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps = 3
# convert into input/output
X, y = split_sequences(dataset, n_steps)
# the dataset knows the number of features, e.g. 2
n_features = X.shape[2]
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([[80, 85], [90, 95], [100, 105]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

Running the example prepares the data, fits the model, and makes a prediction.

```
[[208.13531]]
```

## Multiple Parallel Series

An alternative time series problem is the case where there are multiple parallel time series and a value must be predicted for each.

For example, given the data from the previous section:

```
[[ 10  15  25]
 [ 20  25  45]
 [ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]
 [ 80  85 165]
 [ 90  95 185]]
```

We may want to predict the value for each of the three time series for the next time step.

This might be referred to as multivariate forecasting.

Again, the data must be split into input/output samples in order to train a model.

The first sample of this dataset would be:

Input:

```
10, 15, 25
20, 25, 45
30, 35, 65
```

Output:

```
40, 45, 85
```

The *split_sequences()* function below will split multiple parallel time series with rows for time steps and one series per column into the required input/output shape.

```python
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
```

We can demonstrate this on the contrived problem; the complete example is listed below.

```python
# multivariate output data prep
from numpy import array
from numpy import hstack

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps = 3
# convert into input/output
X, y = split_sequences(dataset, n_steps)
print(X.shape, y.shape)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])
```

Running the example first prints the shape of the prepared X and y components.

The shape of X is three-dimensional, including the number of samples (6), the number of time steps chosen per sample (3), and the number of parallel time series or features (3).

The shape of y is two-dimensional, as we might expect: the number of samples (6) and the number of values to be predicted per sample (3), one for each series.

The data is ready to use in an LSTM model that expects three-dimensional input and two-dimensional output shapes for the X and y components of each sample.

Then, each of the samples is printed showing the input and output components of each sample.

```
(6, 3, 3) (6, 3)

[[10 15 25]
 [20 25 45]
 [30 35 65]] [40 45 85]
[[20 25 45]
 [30 35 65]
 [40 45 85]] [ 50  55 105]
[[ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]] [ 60  65 125]
[[ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]] [ 70  75 145]
[[ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]] [ 80  85 165]
[[ 60  65 125]
 [ 70  75 145]
 [ 80  85 165]] [ 90  95 185]
```
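As an aside, the same (6, 3, 3) and (6, 3) arrays can be built without an explicit loop. The sketch below is an alternative, not the approach used in this tutorial. It assumes NumPy 1.20 or later for `sliding_window_view`, and the function name `split_parallel` is made up for illustration.

```python
# vectorized alternative to split_sequences() for parallel series (illustrative sketch)
from numpy import array
from numpy import hstack
from numpy.lib.stride_tricks import sliding_window_view

def split_parallel(sequences, n_steps):
    # sliding_window_view returns [n_windows, features, n_steps];
    # transpose so that each window is [n_steps, features]
    windows = sliding_window_view(sequences, n_steps, axis=0).transpose(0, 2, 1)
    # the final window has no following row to predict, so drop it
    X = windows[:-1]
    # the target for each window is the row immediately after it
    y = sequences[n_steps:]
    return X, y

# rebuild the contrived dataset used above
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2
dataset = hstack((in_seq1.reshape(-1, 1), in_seq2.reshape(-1, 1), out_seq.reshape(-1, 1)))

X, y = split_parallel(dataset, 3)
print(X.shape, y.shape)
```

The loop-based function is easier to read and adapt, which is likely why it is preferred in a template; the windowed version mainly helps when the series is long.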

We are now ready to fit an LSTM model on this data.

Any of the varieties of LSTMs in the previous section can be used, such as a Vanilla, Stacked, Bidirectional, CNN-LSTM, or ConvLSTM model.

We will use a Stacked LSTM where the number of time steps and parallel series (features) are specified for the input layer via the *input_shape* argument. The number of parallel series is also used in the specification of the number of values to predict by the model in the output layer; again, this is three.

```python
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_features))
model.compile(optimizer='adam', loss='mse')
```

We can predict the next value in each of the three parallel series by providing an input of three time steps for each series.

```
70, 75, 145
80, 85, 165
90, 95, 185
```

The shape of the input for making a single prediction must be 1 sample, 3 time steps, and 3 features, or [1, 3, 3].

```python
# demonstrate prediction
x_input = array([[70,75,145], [80,85,165], [90,95,185]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
```

We would expect the vector output to be:

```
[100, 105, 205]
```
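The expected vector follows from how the contrived series were built: each input series rises by 10 per time step, and the third series is the sum of the first two. A small sketch of that arithmetic, where `step` is an assumed name for the per-step increment:

```python
# derive the expected next row of the contrived dataset
from numpy import array

last_row = array([90, 95, 185])   # final row of the dataset
step = 10                         # both input series increase by 10 per time step
next_in1 = last_row[0] + step
next_in2 = last_row[1] + step
# the third series is defined as the sum of the two input series
expected = array([next_in1, next_in2, next_in1 + next_in2])
print(expected)  # [100 105 205]
```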

We can tie all of this together and demonstrate a Stacked LSTM for multivariate output time series forecasting below.

```python
# multivariate output stacked lstm example
from numpy import array
from numpy import hstack
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps = 3
# convert into input/output
X, y = split_sequences(dataset, n_steps)
# the dataset knows the number of features, e.g. 3
n_features = X.shape[2]
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_features))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=400, verbose=0)
# demonstrate prediction
x_input = array([[70,75,145], [80,85,165], [90,95,185]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

Running the example prepares the data, fits the model, and makes a prediction.

```
[[101.76599  108.730484 206.63577 ]]
```

## Multi-Step LSTM Models

A time series forecasting problem that requires a prediction of multiple time steps into the future can be referred to as multi-step time series forecasting.

Specifically, these are problems where the forecast horizon or interval is more than one time step.

There are two main types of LSTM models that can be used for multi-step forecasting; they are:

- Vector Output Model
- Encoder-Decoder Model

Before we look at these models, let’s first look at the preparation of data for multi-step forecasting.

### Data Preparation

As with one-step forecasting, a time series used for multi-step time series forecasting must be split into samples with input and output components.

Both the input and output components will be comprised of multiple time steps and may or may not have the same number of steps.

For example, given the univariate time series:

```
[10, 20, 30, 40, 50, 60, 70, 80, 90]
```

We could use the last three time steps as input and forecast the next two time steps.

The first sample would look as follows:

Input:

```
[10, 20, 30]
```

Output:

```
[40, 50]
```

The *split_sequence()* function below implements this behavior and will split a given univariate time series into samples with a specified number of input and output time steps.

```python
# split a univariate sequence into samples
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
```

We can demonstrate this function on the small contrived dataset.

The complete example is listed below.

```python
# multi-step data preparation
from numpy import array

# split a univariate sequence into samples
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])
```

Running the example splits the univariate series into input and output time steps and prints the input and output components of each.

```
[10 20 30] [40 50]
[20 30 40] [50 60]
[30 40 50] [60 70]
[40 50 60] [70 80]
[50 60 70] [80 90]
```
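The number of samples is fixed by the series length and the two window sizes: each sample consumes `n_steps_in` inputs followed by `n_steps_out` outputs, and the window slides one step at a time. A minimal sketch of that count, using a hypothetical helper named `count_samples`:

```python
# number of samples produced by sliding an input/output window over a univariate series
def count_samples(n_obs, n_steps_in, n_steps_out):
    # a sample needs n_steps_in + n_steps_out consecutive observations;
    # return 0 when the series is too short to form even one sample
    return max(0, n_obs - n_steps_in - n_steps_out + 1)

raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(count_samples(len(raw_seq), 3, 2))  # 5
```

This is a quick way to check how much training data a given choice of input and output steps leaves you with before fitting a model.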

Now that we know how to prepare data for multi-step forecasting, let’s look at some LSTM models that can learn this mapping.

### Vector Output Model

Like other types of neural network models, the LSTM can output a vector directly that can be interpreted as a multi-step forecast.

This approach was seen in the previous section, where one time step of each output time series was forecast as a vector.

As with the LSTMs for univariate data in a prior section, the prepared samples must first be reshaped. The LSTM expects data to have a three-dimensional structure of [*samples, timesteps, features*], and in this case, we only have one feature so the reshape is straightforward.

```python
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
```
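To make the effect of the reshape concrete, the sketch below applies it to the five input samples from the data preparation example and prints the shape before and after. The variable names `shape_before` and `shape_after` are only for illustration.

```python
# show the shape of X before and after adding the features dimension
from numpy import array

# the five input samples from the multi-step data preparation example
X = array([[10, 20, 30], [20, 30, 40], [30, 40, 50], [40, 50, 60], [50, 60, 70]])
shape_before = X.shape
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
shape_after = X.shape
print(shape_before, shape_after)  # (5, 3) (5, 3, 1)
# each time step is now a one-element vector of features
print(X[0].tolist())  # [[10], [20], [30]]
```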

With the number of input and output steps specified in the *n_steps_in* and *n_steps_out* variables, we can define a multi-step time-series forecasting model.

Any of the presented LSTM model types could be used, such as Vanilla, Stacked, Bidirectional, CNN-LSTM, or ConvLSTM. The snippet below defines a Stacked LSTM for multi-step forecasting.

```python
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
```

The model can make a prediction for a single sample. We can predict the next two steps beyond the end of the dataset by providing the input:

```
[70, 80, 90]
```

We would expect the predicted output to be:

```
[100, 110]
```

As expected by the model, the shape of the single sample of input data when making the prediction must be [1, 3, 1] for the 1 sample, 3 time steps of the input, and the single feature.

```python
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
```

Tying all of this together, the Stacked LSTM for multi-step forecasting with a univariate time series is listed below.

```python
# univariate multi-step vector-output stacked lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a univariate sequence into samples
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=50, verbose=0)
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

Running the example forecasts and prints the next two time steps in the sequence.

```
[[100.98096 113.28924]]
```

### Encoder-Decoder Model

A model specifically developed for forecasting variable length output sequences is called the Encoder-Decoder LSTM.

The model was designed for prediction problems where there are both input and output sequences, so-called sequence-to-sequence, or seq2seq problems, such as translating text from one language to another.

This model can be used for multi-step time series forecasting.

As its name suggests, the model is comprised of two sub-models: the encoder and the decoder.

The encoder is a model responsible for reading and interpreting the input sequence. The output of the encoder is a fixed length vector that represents the model’s interpretation of the sequence. The encoder is traditionally a Vanilla LSTM model, although other encoder models can be used such as Stacked, Bidirectional, and CNN models.

```python
model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))
```

The decoder uses the output of the encoder as an input.

First, the fixed-length output of the encoder is repeated, once for each required time step in the output sequence.

```python
model.add(RepeatVector(n_steps_out))
```
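To see what this repetition does to the data, here is a NumPy-only sketch of the same shape change: a [samples, units] array becomes [samples, n_steps_out, units]. It mimics the effect of the layer rather than calling Keras, and the four-unit `encoded` vector is a made-up stand-in for the 100-unit encoder output.

```python
# NumPy illustration of the shape change performed by RepeatVector
import numpy as np

n_steps_out = 2
# stand-in for the encoder output: 1 sample, 4 units (the tutorial model uses 100)
encoded = np.array([[0.1, 0.2, 0.3, 0.4]])
# copy the vector once per output time step:
# [samples, units] -> [samples, n_steps_out, units]
repeated = np.repeat(encoded[:, np.newaxis, :], n_steps_out, axis=1)
print(repeated.shape)  # (1, 2, 4)
```

Every output time step therefore starts from an identical copy of the encoded vector; it is the decoder LSTM that turns those copies into different values per step.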

This sequence is then provided to an LSTM decoder model. The decoder must output a value for each time step in the output sequence, which can then be interpreted by a single output model.

```python
model.add(LSTM(100, activation='relu', return_sequences=True))
```

We can use the same output layer or layers to make each one-step prediction in the output sequence. This can be achieved by wrapping the output part of the model in a TimeDistributed wrapper.

```python
model.add(TimeDistributed(Dense(1)))
```
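The idea of reusing one output layer at every time step can be illustrated with plain NumPy: a single weight matrix and bias are applied to each step of the decoder output. This is a sketch of the computation only, with random stand-in values and a small unit count chosen for readability; `W` and `b` are hypothetical names for the shared Dense parameters.

```python
# NumPy illustration of applying one Dense(1) layer to every time step
import numpy as np

rng = np.random.default_rng(0)
# stand-in for the decoder output: [samples, n_steps_out, units]
decoded = rng.normal(size=(1, 2, 4))
# a single set of Dense(1) parameters shared by all time steps
W = rng.normal(size=(4, 1))
b = np.zeros(1)
# matrix multiplication broadcasts over the sample and time axes
out = decoded @ W + b
print(out.shape)  # (1, 2, 1)
```

The result has one value per output time step, which is why the training targets for this model need a trailing features dimension as well.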

The full definition for an Encoder-Decoder model for multi-step time series forecasting is listed below.

```python
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
```

As with other LSTM models, the input data must be reshaped into the expected three-dimensional shape of [*samples, timesteps, features*].

```python
X = X.reshape((X.shape[0], X.shape[1], n_features))
```

In the case of the Encoder-Decoder model, the output, or y part, of the training dataset must also have this shape. This is because the model will predict a given number of time steps with a given number of features for each input sample.

```python
y = y.reshape((y.shape[0], y.shape[1], n_features))
```

The complete example of an Encoder-Decoder LSTM for multi-step time series forecasting is listed below.

```python
# univariate multi-step encoder-decoder lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed

# split a univariate sequence into samples
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
y = y.reshape((y.shape[0], y.shape[1], n_features))
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=100, verbose=0)
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

Running the example forecasts and prints the next two time steps in the sequence.

```
[[[101.9736  ]
  [116.213615]]]
```
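Note that the prediction is three-dimensional, [samples, timesteps, features], mirroring the shape of the training targets. If a flat list of forecast values is more convenient, the single sample and single feature can be indexed away, as in this small sketch that reuses the values printed above (the name `forecast` is illustrative):

```python
# flatten a [1, n_steps_out, 1] prediction into a 1-D forecast
from numpy import array

# the prediction printed above, copied by hand
yhat = array([[[101.9736], [116.213615]]])
# take the first (only) sample and the first (only) feature
forecast = yhat[0, :, 0]
print(forecast.shape)  # (2,)
```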

## Multivariate Multi-Step LSTM Models

In the previous sections, we have looked at univariate, multivariate, and multi-step time series forecasting.

It is possible to mix and match the different types of LSTM models presented so far for the different problems. This too applies to time series forecasting problems that involve multivariate and multi-step forecasting, but it may be a little more challenging.

In this section, we will provide short examples of data preparation and modeling for multivariate multi-step time series forecasting as a template to ease this challenge, specifically:

- Multiple Input Multi-Step Output.
- Multiple Parallel Input and Multi-Step Output.

Perhaps the biggest stumbling block is in the preparation of data, so this is where we will focus our attention.

### Multiple Input Multi-Step Output

There are those multivariate time series forecasting problems where the output series is separate but dependent upon the input time series, and multiple time steps are required for the output series.

For example, consider our multivariate time series from a prior section:

```
[[ 10  15  25]
 [ 20  25  45]
 [ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]
 [ 80  85 165]
 [ 90  95 185]]
```

We may use three prior time steps of each of the two input time series to predict two time steps of the output time series.

Input:

```
10, 15
20, 25
30, 35
```

Output:

```
65
85
```

The *split_sequences()* function below implements this behavior.

```python
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out-1
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
```

We can demonstrate this on our contrived dataset.

The complete example is listed below.

```python
# multivariate multi-step data preparation
from numpy import array
from numpy import hstack

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out-1
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# convert into input/output
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
print(X.shape, y.shape)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])
```

Running the example first prints the shape of the prepared training data.

We can see that the shape of the input portion of the samples is three-dimensional, comprised of six samples, with three time steps, and two variables for the 2 input time series.

The output portion of the samples is two-dimensional for the six samples and the two time steps for each sample to be predicted.

The prepared samples are then printed to confirm that the data was prepared as we specified.

```
(6, 3, 2) (6, 2)

[[10 15]
 [20 25]
 [30 35]] [65 85]
[[20 25]
 [30 35]
 [40 45]] [ 85 105]
[[30 35]
 [40 45]
 [50 55]] [105 125]
[[40 45]
 [50 55]
 [60 65]] [125 145]
[[50 55]
 [60 65]
 [70 75]] [145 165]
[[60 65]
 [70 75]
 [80 85]] [165 185]
```
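Notice that the first output value in each sample shares a row with the last input step: the output series at a given time is the sum of the two inputs at that same time, so the output window starts at `end_ix-1` rather than `end_ix`. The sketch below reproduces only the index arithmetic of `split_sequences()` to show which dataset rows feed each sample; the list name `windows` is made up for illustration.

```python
# trace which dataset rows are used for the input and output of each sample
n_rows, n_steps_in, n_steps_out = 9, 3, 2
windows = []
for i in range(n_rows):
    end_ix = i + n_steps_in
    out_end_ix = end_ix + n_steps_out - 1
    # stop once the output window would run past the last row
    if out_end_ix > n_rows:
        break
    input_rows = list(range(i, end_ix))
    # the output window begins on the final input row
    output_rows = list(range(end_ix - 1, out_end_ix))
    windows.append((input_rows, output_rows))

for input_rows, output_rows in windows:
    print(input_rows, output_rows)
```

The first line printed is `[0, 1, 2] [2, 3]`, matching the first sample above, where the output 65 sits on the same row as the inputs 30 and 35.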

We can now develop an LSTM model for multi-step predictions.

A vector output or an encoder-decoder model could be used. In this case, we will demonstrate a vector output with a Stacked LSTM.

The complete example is listed below.

```python
# multivariate multi-step stacked lstm example
from numpy import array
from numpy import hstack
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out-1
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# convert into input/output
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
# the dataset knows the number of features, e.g. 2
n_features = X.shape[2]
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([[70, 75], [80, 85], [90, 95]])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

Running the example fits the model and predicts the next two time steps of the output sequence beyond the dataset.

We would expect the next two steps to be: [185, 205].

It is a challenging framing of the problem with very little data, and the arbitrarily configured version of the model gets close.

```
[[188.70619 210.16513]]
```

### Multiple Parallel Input and Multi-Step Output

A problem with parallel time series may require the prediction of multiple time steps of each time series.

For example, consider our multivariate time series from a prior section:

```
[[ 10  15  25]
 [ 20  25  45]
 [ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]
 [ 80  85 165]
 [ 90  95 185]]
```

We may use the last three time steps from each of the three time series as input to the model and predict the next two time steps of each of the three time series as output.

The first sample in the training dataset would be the following.

Input:

```
10, 15, 25
20, 25, 45
30, 35, 65
```

Output:

```
40, 45, 85
50, 55, 105
```

The *split_sequences()* function below implements this behavior.

```python
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
```

We can demonstrate this function on the small contrived dataset.

The complete example is listed below.

```python
# multivariate multi-step data preparation
from numpy import array
from numpy import hstack

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# convert into input/output
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
print(X.shape, y.shape)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])
```

Running the example first prints the shape of the prepared training dataset.

We can see that both the input (X) and output (y) elements of the dataset are three-dimensional, for the number of samples, time steps, and variables or parallel time series, respectively.

The input and output elements of each sample are then printed side by side so that we can confirm that the data was prepared as we expected.

```
(5, 3, 3) (5, 2, 3)

[[10 15 25]
 [20 25 45]
 [30 35 65]] [[ 40  45  85]
 [ 50  55 105]]
[[20 25 45]
 [30 35 65]
 [40 45 85]] [[ 50  55 105]
 [ 60  65 125]]
[[ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]] [[ 60  65 125]
 [ 70  75 145]]
[[ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]] [[ 70  75 145]
 [ 80  85 165]]
[[ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]] [[ 80  85 165]
 [ 90  95 185]]
```
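The three-dimensional y suits a model that emits a sequence of vectors. A model that ends in a single Dense layer outputs a flat vector instead, so one option for that case (an assumption on our part, not something demonstrated in this tutorial) would be to flatten each target to `n_steps_out * n_features` values and restore the shape after prediction. A minimal sketch using the first target sample from above:

```python
# flatten [samples, n_steps_out, n_features] targets for a flat output layer, then restore
from numpy import array

# the first target sample from the prepared dataset
y = array([[[40, 45, 85], [50, 55, 105]]])
n_samples, n_steps_out, n_features = y.shape
# one row per sample, time steps laid end to end
y_flat = y.reshape((n_samples, n_steps_out * n_features))
print(y_flat.tolist())  # [[40, 45, 85, 50, 55, 105]]
# reverse the operation to recover the original layout
y_back = y_flat.reshape((n_samples, n_steps_out, n_features))
print(y_back.shape)  # (1, 2, 3)
```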

We can use either the Vector Output or Encoder-Decoder LSTM to model this problem. In this case, we will use the Encoder-Decoder model.

The complete example is listed below.

```python
# multivariate multi-step encoder-decoder lstm example
from numpy import array
from numpy import hstack
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# convert into input/output
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
# the dataset knows the number of features, e.g. 3
n_features = X.shape[2]
# define model
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(200, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=300, verbose=0)
# demonstrate prediction
x_input = array([[60, 65, 125], [70, 75, 145], [80, 85, 165]])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

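Running the example fits the model and predicts the values of each of the three time series for the two time steps that follow the provided input. Note that the input ends at [80, 85, 165], so the first predicted step corresponds to the final row of the dataset and the second lies one step beyond it.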

We would expect the values for these series and time steps to be as follows:

```
90, 95, 185
100, 105, 205
```

We can see that the model forecast gets reasonably close to the expected values.

    [[[ 91.86044   97.77231  189.66768 ]
      [103.299355 109.18123  212.6863  ]]]
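To put a number on "reasonably close", the forecast can be scored against the expected values. The snippet below is a minimal sketch that computes the root mean squared error (RMSE) per series and overall, using the expected and forecast values listed above; the variable names are chosen here for illustration only.

```python
from math import sqrt
from numpy import array

# expected values for the two forecast time steps (rows) and three series (columns)
expected = array([[90, 95, 185],
                  [100, 105, 205]])
# forecast values reported above
forecast = array([[91.86044, 97.77231, 189.66768],
                  [103.299355, 109.18123, 212.6863]])

# squared errors for every time step and series
squared_errors = (forecast - expected) ** 2
# RMSE for each series, averaging over the two time steps
rmse_per_series = squared_errors.mean(axis=0) ** 0.5
# a single RMSE over all forecast values
rmse_overall = sqrt(squared_errors.mean())

print(rmse_per_series)
print('Overall RMSE: %.3f' % rmse_overall)
```

Because the learning algorithm is stochastic, the forecast values, and therefore the scores, will differ from run to run.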

## Summary

In this tutorial, you discovered how to develop a suite of LSTM models for a range of standard time series forecasting problems.

Specifically, you learned:

- How to develop LSTM models for univariate time series forecasting.
- How to develop LSTM models for multivariate time series forecasting.
- How to develop LSTM models for multi-step time series forecasting.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

This tutorial is so helpful to me. Thank you very much!

It will be more helpful in the real projects if the dataset is split into batches. Hope you will mention this in the future.

Keras will split the dataset into batches.
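As a rough illustration of what that splitting amounts to, the sketch below slices an array of samples into mini-batches by hand. It assumes a batch size of 32, which is the default used by Keras `fit()` when none is given, and it ignores the shuffling that Keras applies by default; `iterate_batches` is a hypothetical helper written for this example, not part of Keras.

```python
from math import ceil
from numpy import arange

def iterate_batches(X, y, batch_size=32):
    # yield consecutive slices of at most batch_size samples
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

# contrived data: 100 samples, 3 time steps, 1 feature
X = arange(100 * 3).reshape((100, 3, 1))
y = arange(100)

batch_size = 32
sizes = [len(batch_X) for batch_X, _ in iterate_batches(X, y, batch_size)]
n_batches = ceil(len(X) / batch_size)

print(sizes)
print(n_batches)
```

The last batch is smaller whenever the number of samples is not a multiple of the batch size.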

I think this blog ( https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/) may answer my question. I will do more research. Thanks a lot.

Great!

Hi!

I would like to cite your book “Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python.” Is there an appropriate format for doing this?

Yes, see here:

https://machinelearningmastery.com/faq/single-faq/how-do-i-reference-or-cite-a-book-or-blog-post

Hi Jason,

Could you please share an example of sliding window-based support vector regression for prediction?

Do you have such an example?

Thanks a lot

Thanks for the suggestion.

Hello Jason,

Thank you so much for your post, it was super helpful. For the multiple time step output LSTM model, I am wondering what the difference in performance would be between model-1 and model-2. Model-1 is your multiple time step output LSTM model: for example, we input the last 7 days of data features, and the output is the next 5 days of prices. Model-2 is the simple 1-timestep output LSTM model, where the input is the last 7 days of data features and the output is the next day’s price. We then use the predicted price as a new input to predict future prices until we have predicted all of the next 5 days of prices.

I am wondering what are the key differences between those 2 strategies to predict the next 5 days prices? What are the advantages and disadvantages of those 2 LSTM models?

Thank you,

Good question, the differences really depend on the choice of model and complexity of the dataset.

This post compares the different approaches:

https://machinelearningmastery.com/multi-step-time-series-forecasting/
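The second strategy in the question (predict one step, feed the prediction back in, repeat) is often called recursive forecasting. Here is a minimal sketch of that loop, assuming a 7-step input window and a 5-step horizon as in the question. The function `one_step_model` is a hypothetical stand-in for a fitted one-step model; it simply extrapolates the last difference so the example runs without training anything.

```python
def one_step_model(window):
    # hypothetical stand-in for a fitted one-step model:
    # extrapolate the most recent difference
    return window[-1] + (window[-1] - window[-2])

def recursive_forecast(history, n_steps_in, n_steps_out, predict_one=one_step_model):
    # start from the most recent n_steps_in observations
    window = list(history[-n_steps_in:])
    forecasts = []
    for _ in range(n_steps_out):
        yhat = predict_one(window)
        forecasts.append(yhat)
        # slide the window: drop the oldest value, append the prediction
        window = window[1:] + [yhat]
    return forecasts

history = [10, 20, 30, 40, 50, 60, 70]
forecasts = recursive_forecast(history, n_steps_in=7, n_steps_out=5)
print(forecasts)
```

Note that from the second step onward the window contains predictions rather than observations, so any error in an early step is carried into later steps. A model that outputs all 5 steps at once avoids that feedback but has to learn the whole horizon directly.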

Thanks Jason for this good tutorial. I have a question. When we have two different time series, 1 and 2. Time series 1 will influence time series 2 and our goal is to predict the future value of time series 2. How can we use LSTM for this case?

I call this a dependent time series problem. I give an example of how to model it in this post:

https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

That link points to the current page. Is that what you meant?

Yes, I give an example above.

Thanks Jason for this good tutorial. I have been reading your tutorials for a long time, and I have a question. How can an LSTM model be used to forecast a multi-site multivariate time series, such as the EMC Data Science Global Hackathon dataset? Thank you very much!

I have advice for multi-site forecasting here:

https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites

Thank you for sharing. I found that the results of time series prediction using LSTM are similar to the results of one step behind the original sequence. What do you think?

Sounds like the model has learned a persistence model and may not be skillful.
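One way to check for this is to score a persistence forecast (each value predicted as the previous observation) and compare its error with the LSTM's error on the same data. The sketch below computes that baseline on a contrived series; `persistence_forecast` and `rmse` are illustrative helpers defined here, not library functions.

```python
from math import sqrt

def persistence_forecast(series):
    # predict each value as the observation one step before it
    return series[:-1]

def rmse(actual, predicted):
    # root mean squared error between two equal-length sequences
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(total / len(actual))

series = [10, 20, 30, 40, 50, 60, 70, 80, 90]
actual = series[1:]
baseline = rmse(actual, persistence_forecast(series))
print('Persistence RMSE: %.3f' % baseline)
```

If the LSTM's RMSE on the same observations is not clearly lower than this baseline, the model is adding little beyond copying the last value.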

I have a question.

If I have an LSTM model, I want to know the percentage accuracy of a new prediction.

How can I find the percentage accuracy of a new forecast?

Thank you

If your model is predicting a class label, you can specify the accuracy measure as a metric in the call to compile() then use the evaluate() function to calculate the accuracy.

You can learn how for an MLP in this post which will be the same for an LSTM:

http://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

Thanks a lot! I have read your websites for a long time!

I have a question, in “Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras” you said that:

“LSTMs are sensitive to the scale of the input data, specifically when the sigmoid (default) or tanh activation functions are used. It can be a good practice to rescale the data to the range of 0-to-1, also called normalizing. ”

So why don’t you normalize input here?

Because you used relu? Because the data is increasing (so we can’t normalize the future input)? Or because you just give us an example?

Do you suggest normalizing here?

It would be a good idea to prepare the data with normalization or similar here.

I chose not to because it seems to confuse more readers than it helps. Also, choice of relu does make the model a lot more robust to unscaled data.
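For readers who do want to normalize, the sketch below shows one possible approach: min-max scaling with the minimum and maximum taken from the training rows only, then applied to later rows and inverted again. The helper names (`fit_minmax`, `scale`, `unscale`) and the train/test split point are assumptions made for this example.

```python
from numpy import array

def fit_minmax(train):
    # per-column minimum and maximum, estimated on training data only
    return train.min(axis=0), train.max(axis=0)

def scale(data, lo, hi):
    # map the training range of each column onto 0-to-1
    return (data - lo) / (hi - lo)

def unscale(data, lo, hi):
    # invert the scaling to recover the original units
    return data * (hi - lo) + lo

dataset = array([[10, 15, 25], [20, 25, 45], [30, 35, 65],
                 [40, 45, 85], [50, 55, 105], [60, 65, 125]], dtype=float)
train, test = dataset[:4], dataset[4:]

lo, hi = fit_minmax(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)

print(train_scaled)
print(test_scaled)
print(unscale(test_scaled, lo, hi))
```

Because the contrived series keeps increasing, the scaled test values fall above 1. That is the concern raised in the question, and it is one reason to consider differencing a trending series before scaling it.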

Thanks for a great article. Minor typo or confusion:

For the Multiple input case in Multivariate series, if we use three time steps and

10,15

20,25

30,35

as our inputs, shouldn’t the output (predicted val used for training) be

85

instead of 65?

In the chosen framing of the problem, we want to predict the output at t not t+1, given inputs up to and including t.

You can choose to frame the problem differently if you like. It is arbitrary.

You can also reference ‘Multiple Parallel …’

So you can find the differences in function ‘split_sequences’

if you want to predict 85, you can change the code to:

    if end_ix > len(sequences)-1:
        break
    seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix, -1]

Notice ‘len(sequences)-1’, and ‘sequences[end_ix, -1]’
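Put together, the change suggested in this comment gives a split function like the sketch below, which pairs three time steps of the two input series with the output value at the following time step. The function name `split_sequences_next_step` is made up here to distinguish it from the tutorial's `split_sequences`.

```python
from numpy import array, hstack

def split_sequences_next_step(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # stop one row early, because the target sits at index end_ix
        if end_ix > len(sequences) - 1:
            break
        # inputs: first columns up to t; output: last column at t+1
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2
dataset = hstack((in_seq1.reshape((-1, 1)),
                  in_seq2.reshape((-1, 1)),
                  out_seq.reshape((-1, 1))))

X, y = split_sequences_next_step(dataset, 3)
print(X.shape, y.shape)
print(X[0], y[0])
```

With this framing the first sample maps the inputs 10,15 / 20,25 / 30,35 to 85 rather than 65, at the cost of one fewer sample.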

Thanks sooooo much Jason.

It helped me a lot.

I’m happy to hear that.

Hi Jason,

Thanks for this nice blog! I am new to LSTM in time-series, and I need your help.

Most info on internet is for a single time series and for next-step forecasting. I want to produce 6 months ahead forecast using previous 15 months for 100 different time series, each of length 54 months.

So, there are 34 windows for each time series if we use sliding windows, and my initial X_train has a shape of (3400,15). I then reshape X_train to [samples, timesteps, features] as follows: (3400, 15, 1). Is this reshaping correct? In general, how can we choose the “timesteps” and “features” arguments in this multi-input multi-step forecast?

Also, how can I choose “batch_size” and “units”? Since I want 6 months ahead forecast, my output should be a matrix with dimensions (100,6). I chose units=6, and batch_size=1. Are these numbers correct?

Thanks for your help!

Looks good.

Time steps is really problem specific – e.g. how much history do you need to make a prediction. Perhaps test with your data.

Batch size and units – again, depends on your problem. Test. 6 units is too few. Start with 100, try 500, 1000, etc. Batch size of 1 seems small, perhaps also try 32, 64, etc.

Let me know how you go.
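The window count in the question can be checked with a short sketch: 54 months with 15 input steps and 6 output steps gives 54 - 15 - 6 + 1 = 34 windows per series. The code below uses contrived data in place of the 100 real series and windows each series separately so that no window straddles two series; `split_sequence` here is a local helper written for this example.

```python
from numpy import arange, array, concatenate

def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # stop when the output window would run past the series
        if out_end_ix > len(sequence):
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix:out_end_ix])
    return array(X), array(y)

n_series, n_months = 100, 54
n_steps_in, n_steps_out = 15, 6

# contrived stand-in for 100 monthly series of length 54
data = arange(n_series * n_months, dtype=float).reshape((n_series, n_months))

X_parts, y_parts = list(), list()
for series in data:
    X_s, y_s = split_sequence(series, n_steps_in, n_steps_out)
    X_parts.append(X_s)
    y_parts.append(y_s)

X = concatenate(X_parts)
y = concatenate(y_parts)
# reshape to [samples, timesteps, features] with a single feature
X = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape, y.shape)
```

The target array has one row of 6 values per window, which is why the output layer needs 6 nodes regardless of how many units the LSTM layer has.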

Hi Jason,

Thanks for your response.

I don’t understand “6 units is too few”. In documentation of lstm functions in R, units is defined as “dimensionality of the output space”. Since I need an output with 6 columns (6 months forecast), I define units=6. Any other number does not produce the output I want. Is there anything wrong in my interpretation?

I recommend using a Dense layer as the output rather than the outputting from the LSTM directly.

Then dramatically increase the capacity of the model by increasing the number of LSTM units.

Bidirectional LSTM works better than LSTM. Can you please explain how a bidirectional LSTM works? Since we do not know future values, how do we make predictions?

It has two LSTM layers, one that processes the input sequence forwards, and one that processes it backwards.

You can learn more here:

https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/

In the last encoder-decoder model, if I have different features of input and output, is it correct that I change the code like this?

    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_steps_in, n_features_in)))
    model.add(RepeatVector(n_steps_out))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(n_features_out)))
    model.compile(optimizer='adam', loss='mse')

I’m not sure I understand, what do you mean exactly?

I am sorry for not expressing my question clearly.

In the last part of your tutorial, you gave an example like this:

    [[10 15 25]
     [20 25 45]
     [30 35 65]]

    [[ 40  45  85]
     [ 50  55 105]]

Then, you introduced the Encoder-Decoder LSTM to model this problem.

Suppose I want to use the last three time steps from each of the three time series as input to the model and predict the next two time steps of the third time series as output. My input and output elements are then like the following. The shapes of input and output are (5, 3, 3) and (5, 2, 1) respectively.

    [[10 15 25]
     [20 25 45]
     [30 35 65]]

    [[ 85]
     [105]]

When I define the Encoder-Decoder LSTM model, the code will be like this:

    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(3,3)))
    model.add(RepeatVector(2))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(1)))
    model.compile(optimizer='adam', loss='mse')

Is it correct?

Thank you very much!

It looks correct, but I don’t have the capacity to test the code to be sure.

Thank you!

I tested the code, and I want to show you what I got.

I assume the input sequences:

    in_seq1 = np.arange(10,1000,10)
    in_seq2 = np.arange(15,1005,10)

Define the prediction input:

    x_input = np.array([[960, 965, 1925], [970, 975, 1945], [980, 985, 1965]])

I expect the output values would be as follows:

[ [1985] [2005] ]

And the model forecasts: [ [1997.1425] [2026.6136] ]

I think this means that the model can work.

Nice work! Now you can start tuning the model to lift skill.
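For anyone wanting to reproduce the data side of this thread, the sketch below shows one way the tutorial's `split_sequences` function might be adapted so that all three columns are inputs and only the last column is the output. It is an assumption about how the commenter prepared the data, not code taken from the comment, and it uses the tutorial's short 9-step dataset so the shapes match those quoted above.

```python
from numpy import array, hstack

def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # inputs keep every column; outputs keep only the last column,
        # sliced with -1: so the feature axis is preserved
        seq_x = sequences[i:end_ix, :]
        seq_y = sequences[end_ix:out_end_ix, -1:]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2
dataset = hstack((in_seq1.reshape((-1, 1)),
                  in_seq2.reshape((-1, 1)),
                  out_seq.reshape((-1, 1))))

X, y = split_sequences(dataset, 3, 2)
n_features_in, n_features_out = X.shape[2], y.shape[2]
print(X.shape, y.shape)
print(y[0])
```

The two feature counts can then be passed to the `input_shape` argument and the final `Dense` layer respectively, as in the model definition quoted in the question.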

How can we test these examples if we have a big Excel dataset of time series data? Could you kindly refer me to a link?

Save as a CSV file then use code in this post to prepare it for modeling:

https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/

Can a CNN-LSTM model be applied to multivariate time series?

Yes, I have a good beginner example here:

https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

I want to predict visibility on one airport for the next 120 hours.

I already built an LSTM to predict the visibility for the next hour, solely based on visibility observations. (Basically, the network learned that persistence is a good algorithm.)

My next step is to include a weather model forecast of say humidity as input.

I have then as input:

visibility observation on the airport (past and present)

prediction of humidity for the next 120 hours.

I have trouble combining these two pieces of information.

Do you have suggestions?

What trouble are you having exactly?

let’s say:

Input : last 120 h of measured visibility

weather forecast for the next 120 h

Output: visibility prediction for the next 120 h

Implementation:

make visibility prediction every hour for the next 120 h

I have trouble seeing how the LSTM will update its state every hour, since the only new information it gets is the measured visibility for the last hour, not the full 120 h prediction.

I must say that I’m a newbie in ML.

The model is only aware of the data that you provide it.

Thanks a lot for your post. Your work is a great resource on forecasts with lstm!

Assume, I have dependent time series (heating costs and temperature) and I want to predict the dependent (heating costs), how could I implement temperature predictions (from other weather forecasts) into my model for heating cost predictions?

Do you know of any common approaches to this? Or any papers on how to handle external forecasts for independent variables?

I recommend this process generally:

https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/

Hi Jason,

I think I saw you mention that the activation function ‘relu’ usually works better than ‘tanh’ in LSTM models, but I forget which post I saw this in. I can’t find any post on your blog that focuses on how to choose the activation function, so I am submitting this question under this post and hope you don’t mind.

Is it true that ‘relu’ often works better than ‘tanh’ in your experience? If you have any post talking about activation function, please give me the title or URL.

Thank you very much!

It really depends on the dataset, I have found LSTMs with relu more robust on some problems.

Thank you! So, the way I can make sure which activation function is the best for my dataset is to enumerate and see the results?

Yes. It will almost certainly be relu or tanh.

This is awesome for someone starting out with LSTM.

All the content on your site is amazing, I really appreciate it. Thank you.

Thanks!

Hi Jason,

Still lovin’ your work!

1 question: can you please explain the purpose of the out_seq series in the Multiple Parallel Series example?

Many thanks,

Andrew

It is the output sequence, dependent upon the input sequences.

another great article, Jason! I’m trying to get started on a project that is similar to the LSTM model described in this article: https://medium.com/bcggamma/using-deep-learning-to-predict-not-just-what-but-when-fae6515acb1b

I’d greatly appreciate your input on how to develop an LSTM model that can predict ‘what’ a consumer may buy and ‘when’ they will buy it;

Based on your article, it looks like the right model to choose would be Multiple Parallel Input and Multi-Step Output. Would you agree or do you think i should choose a different model? Any pointers or links to relevant articles would help!

Thanks,

I’d encourage you to prototype and explore a suite of different framings of the problem in order to discover what works best for your specific dataset.

I have used your code to get started, but at the last step I am getting the error below:

NameError: name ‘to_list’ is not defined

Could you please help? I am not sure what I am missing here.

Thanks for your help

Ensure you have copied all of the code from the tutorial, I have more suggestions here:

https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me

Hi Jason,

Thanks for taking the time. I have copied your code line by line and checked it a couple of times as well. The example is from the Vanilla LSTM section.

Checks done:

- I was getting an error, so I followed Stack Overflow and downgraded Keras to version 2.1.5.
- I searched Stack Overflow for related questions and even posted my question there.

Your help is appreciated.

I recommend using the latest version of Keras and TensorFlow.

Please, do you have an example of an LSTM encoder-decoder with train/test evaluation partitions?

I tried the following, but it does not work:

    # split into samples
    trainX, trainy = split_sequence(train, n_steps_in, n_steps_out)
    testX, testy = split_sequence(test, n_steps_in, n_steps_out)
    # reshape
    trainX = trainX.reshape((trainX.shape[0], trainX.shape[1], n_features))
    testX = testX.reshape((testX.shape[0], testX.shape[1], n_features))
    ….
    # fit model
    model.fit(trainX, trainy, epochs=5, verbose=2)
    # make predictions
    trainPredict = model.predict(trainX)
    testPredict = model.predict(testX)
    # calculate root mean squared error
    trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
    print('Train Score: %.2f RMSE' % (trainScore))
    testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
    print('Test Score: %.2f RMSE' % (testScore))

thank you very much

I may; you can use the search box to find all tutorials that use the encoder-decoder pattern.

Hi Jason,

Thanks for this tutorial. I am quite new to time series forecasting with LSTMs. I have a question about the part “Multiple Parallel Input and Multi-Step Output”. The output data shape is (5,2,3). That is, each instance of the output is not just a sequence; it is a sequence of sequences. You have shown an example there with the Encoder-Decoder model. I would like to implement a Stacked or Bidirectional LSTM instead, but I am not sure which number to give the Dense layer. For example, in the previous examples the output shape is like (6,2), and it is obvious we should use 2 for the Dense layer. But I cannot figure out the right value for the Stacked LSTM. Do you have an example tutorial for this?

Kind Regards,

Gunay

With multi-step output, the number of nodes in the output layer must match the number of output time steps.

With multivariate multi-step, a vanilla or bidirectional LSTM is not suited. You could force it, but you will need n x m nodes in the output for n time steps for m time series. The time steps of each series would be flattened in this structure. You must interpret each of the outputs as a specific time step for a specific series consistently during training and prediction.

I don’t have an example, it is not an ideal approach.
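To make the flattening described above concrete, the sketch below reshapes a target array of shape (5, 2, 3) into (5, 6), which is what an output layer with n x m = 6 nodes would be trained against, and then restores the structure of a prediction. The prediction here is a copy of the flattened targets standing in for `model.predict` output, so no model is needed to run it.

```python
from numpy import arange

n_samples, n_steps_out, n_series = 5, 2, 3

# contrived targets shaped [samples, time steps, series]
y = arange(n_samples * n_steps_out * n_series).reshape(
    (n_samples, n_steps_out, n_series))

# flatten time steps and series into one vector per sample
y_flat = y.reshape((n_samples, n_steps_out * n_series))
n_output_nodes = y_flat.shape[1]

# stand-in for model.predict(...) output with the same flat layout
yhat_flat = y_flat.copy()

# restore [samples, time steps, series] using the same ordering
yhat = yhat_flat.reshape((-1, n_steps_out, n_series))

print(y_flat.shape, n_output_nodes)
print(yhat.shape)
```

The same reshape order has to be used for training targets and for predictions, otherwise an output node will be read as the wrong time step or series.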

Thank you!

Is there any alternative structure for this kind of problems except Encoder-Decoder?

Yes, the one I described. There may be others, it is good to brainstorm and prototype approaches.

Thanks for your great tutorial. I just wonder should we avoid using bidirectional LSTM for time series data? Does it mean we use future data to train the past model parameters?

No, it means the model will process the input sequence forwards and backwards at the same time.

Hi Jason,

I have run into a problem and am curious whether you have dealt with it before. I have a forecasting problem like Multiple Input Multi-Step Output, but a little different. Let’s assume my input (the feature dataset) and output (the target we want to forecast) datasets both have historic data, and I need to forecast the target one week ahead. But I also have a one-week-ahead forecast of the input dataset (produced by another system). I should use both the historic input and the one-week-ahead forecasted input to forecast the output one week ahead. But I do not know how to use that one-week-ahead forecasted input data during the learning process. Can you give me any hint?

Perhaps use a model with two heads, one for the historical data and one for the other forecast?

The functional API will help you to design a model of this type:

https://machinelearningmastery.com/keras-functional-api-deep-learning/

What if we want to predict anything for the next 20 upcoming days! Here sequentially we have to predict for 20 days. How can we apply LSTM here?

Yes, although the further you predict into the future, the more error the model will make.

This is called multi-step forecasting, there are many examples, perhaps start here:

https://machinelearningmastery.com/start-here/#deep_learning_time_series

HI Jason, thanks for all the tutorials. They are really helpful. I am looking to try and implement an LSTM that returns a sequence, and had read this tutorial – https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/

One thing I am having trouble understanding is how to really shape the input data and get a sequence output using TensorFlow / Keras. I am looking to predict the sequence T to T+12 hours using T-1 to T-48 hours. So predicting the next 12 hours from the last 48 hours in 1 hour increments. Each hour of data has a dozen or so features for that time step. From what I have read of yours so far it seems as if each of the 48 previous time steps should be considered features of the time step T to predict a sequence for the next 12 hours. And so basically, from what I gather, I would end up with the input for time step T having 576 columns (48 time steps, each with 12 features). Does that seem right? I am also a bit unsure of what particular model I should use… is it going to be a multi-step, multi-input network… I am just a bit confused by the jargon as well, and maybe that’s why I’m having trouble figuring out what I need to do.

Looking at some of your books too, but not sure what might be the right one to help guide me through a problem like this.

Thanks,

Aaron

Perhaps this will help:

https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
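Applied to the numbers in the question, the 576 columns are better kept as two separate axes. The sketch below reshapes a flat table of 20 samples by 576 columns into [samples, timesteps, features] = [20, 48, 12]. It assumes the flat columns are ordered one time step at a time (all 12 features of the oldest hour first, then the next hour, and so on); a different column order would need a different reshape.

```python
from numpy import arange

n_samples, n_timesteps, n_features = 20, 48, 12

# contrived flat table: one row per sample, 48 * 12 = 576 columns
flat = arange(n_samples * n_timesteps * n_features).reshape(
    (n_samples, n_timesteps * n_features))

# split the 576 columns into 48 time steps of 12 features each
X = flat.reshape((n_samples, n_timesteps, n_features))

print(flat.shape)
print(X.shape)
```

With this layout, `X[0, 1, 0]` is the first feature of the second time step of the first sample, which corresponds to column 12 of the first flat row.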

Thanks! That definitely makes sense now from the input shape standpoint. If I have 20 samples with 48 timesteps and 12 features the input shape would be [20, 48, 12]

For the output however, looking through the Keras docs https://keras.io/layers/recurrent/, I am trying to get a return sequence. Would I be using a 3D tensor? (batch_size, timesteps, units) where it would look like (20, 12, 1)? Since I am trying to find 1 value at each of the 12 time steps for the sample size of 20

Thanks again!

Aaron

I don’t recommend returning a sequence from the LSTM itself, instead use an encoder-decoder model:

https://machinelearningmastery.com/start-here/#lstm

Why don’t you recommend returning a sequence from the LSTM? If I was using the below encoder-decoder model from another one of your posts, what would the output of the first LSTM be?

    model = Sequential()
    model.add(LSTM(…, input_shape=(…)))
    model.add(RepeatVector(…))
    model.add(LSTM(…, return_sequences=True))
    model.add(TimeDistributed(Dense(…)))

Generally the output sequence from an LSTM is the activation of the nodes from each step in the input sequence. It is unlikely to capture anything meaningful.

It is better to interpret these activations, or the final activations, with more LSTM or Dense layers, and then output a sequence of the same or a different length using a separate model.

Hi there,

I love this tutorial, all of your tutorials actually, but this one I have found the most helpful. Questions about the MIMO LSTM output shape have come up a few times, and I am also having trouble with it.

I am trying to use a Dense layer as my final layer as you suggest, passing it n_steps_out as an argument. I am predicting 3 variables and n_steps_out is 10.

Keras complains that it is expecting the dense layer to have 2 dimensions, but I am passing it an array with shape (n_samples,n_steps_out,n_features)

Can you help me make sense of this?

Thank you

I would recommend a model with a time distributed wrapper or decoder for multivariate multi-step output, so you can output one vector for each time step.

Hi Jason,

I have a question: are LSTMs suitable for predicting on a test set whose inputs are of the same nature as those of the train set? That is, as in other prediction cases, the train set has input signals that the model works on, plus the memory that comes from the entries being ordered.

I trained an LSTM on top of a CNN model acting on ordered images to predict a time series. In the test set I have the subsequent set of images, ordered by time. I guess there is no concept of a horizon here. How should I improve my model, and what is the starting point for predicting the test set in this case?

Many thanks.

I would recommend modeling the raw time series directly, instead of images of the time series.

Hello Jason,

Many thanks for the helpful article..

I copied the code for “Multiple Parallel Input and Multi-Step Output” and ran it exactly as is, without any changes, but I got different results from yours.

    [[[147.56306 167.8626  312.92883]
      [185.38152 205.36024 385.96536]]]

Is there any reason for that?

Best regards,

Tayson

Yes, this is due to the stochastic nature of the learning algorithm, more here:

https://machinelearningmastery.com/faq/single-faq/why-do-i-get-different-results-each-time-i-run-the-code

Hi Jason,

How would you handle building the LSTM model for time series data with irregular time intervals (e.g. Jan 1, Jan 2, Jan 4, Jan 7, Jan 13, Jan 14, etc…)?

It appears this model presupposes a regular time-interval spacing.

You could fill the “missing” days with zeros or impute them with, say, the mean of the last 3 values, but I would like to know how to make the LSTM model without filling/imputing the time series data. How would you handle this?

Thanks, and great lesson.

Yes, I would try many approaches and compare results, such as:

– model as is

– normalize interval with padding

– upsample/downsample to new intervals

– etc.
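Two of these options can be sketched with NumPy alone, using the dates from the question and made-up observation values. The first places the observations on a regular daily grid and pads the missing days with a marker value; the second resamples to the daily grid by linear interpolation. The marker value of -1.0 is an arbitrary choice for this example and would need to be a value that cannot occur in the real data.

```python
from numpy import arange, array, full, interp

# day of month for each observation (Jan 1, 2, 4, 7, 13, 14)
days = array([1, 2, 4, 7, 13, 14])
# contrived observation values for those days
values = array([10.0, 12.0, 15.0, 20.0, 31.0, 33.0])

# regular daily grid covering the observed period
grid = arange(days.min(), days.max() + 1)

# option 1: pad the missing days with a marker value
padded = full(len(grid), -1.0)
padded[days - days.min()] = values

# option 2: upsample to daily values by linear interpolation
interpolated = interp(grid, days, values)

print(grid)
print(padded)
print(interpolated)
```

Either series can then be split into samples in the usual way and the resulting models compared against a model fit on the irregular data as is.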

Follow-up to this question

Holding the number of features constant, are the various combinations of models above able to cope when the number of time steps per sample is variable?

Or do the underlying model assumptions break in some way?

Yes, you can either pad all samples to the same length or use a dynamic RNN. Assumptions of the model hold for both cases.
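The padding option can be sketched as follows: samples with different numbers of time steps are copied into a fixed-size array, with zeros placed before the real observations. The helper `pad_samples`, the choice of zero as the padding value, and padding at the start rather than the end are all assumptions made for this example.

```python
from numpy import array, full

def pad_samples(samples, n_features, value=0.0):
    # the longest sample sets the common number of time steps
    max_len = max(len(sample) for sample in samples)
    padded = full((len(samples), max_len, n_features), value)
    for i, sample in enumerate(samples):
        data = array(sample, dtype=float).reshape((len(sample), n_features))
        # place the real observations at the end, padding at the start
        padded[i, max_len - len(sample):, :] = data
    return padded

# three samples with 3, 2 and 1 time steps, each with 2 features
samples = [
    [[10, 15], [20, 25], [30, 35]],
    [[40, 45], [50, 55]],
    [[60, 65]],
]

X = pad_samples(samples, n_features=2)
print(X.shape)
print(X)
```

The result has the fixed [samples, timesteps, features] shape the models in this tutorial expect.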

If we are forecasting in monthly buckets and using 5 years of data, how do we know how many months of data to have on each row?

Perhaps perform a sensitivity analysis of the model to see how history impacts model performance.

There will be a sweet spot for a given dataset.
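A sensitivity analysis of this kind could look like the sketch below: for each candidate number of input months, split the series into samples, fit a model on the earlier samples, and score it on the last 12. To keep the example quick and self-contained, a least-squares linear model stands in for the LSTM, and the 60-month series is contrived (a trend plus a yearly cycle); both are assumptions for illustration only.

```python
from math import sin, pi
from numpy import array, hstack, ones
from numpy.linalg import lstsq

def split_sequence(sequence, n_steps):
    # pair each window of n_steps values with the value that follows it
    X, y = list(), list()
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return array(X), array(y)

def evaluate_history(series, n_steps, n_test=12):
    X, y = split_sequence(series, n_steps)
    # hold out the last n_test samples for evaluation
    X_train, y_train = X[:-n_test], y[:-n_test]
    X_test, y_test = X[-n_test:], y[-n_test:]
    # stand-in model: linear regression on the lagged values plus a bias
    A_train = hstack((X_train, ones((len(X_train), 1))))
    coef = lstsq(A_train, y_train, rcond=None)[0]
    A_test = hstack((X_test, ones((len(X_test), 1))))
    yhat = A_test.dot(coef)
    # RMSE on the held-out samples
    return float(((yhat - y_test) ** 2).mean() ** 0.5)

# contrived 5 years of monthly data: trend plus a yearly cycle
series = [100 + 2 * t + 10 * sin(2 * pi * t / 12) for t in range(60)]

scores = {n: evaluate_history(series, n) for n in [1, 3, 6, 12]}
for n_steps, score in scores.items():
    print('n_steps=%d, RMSE=%.3f' % (n_steps, score))
```

With a real LSTM, each configuration would normally be fit several times and the scores averaged, since results vary from run to run.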

Thanks Jason! If the history has distinct patterns for each quarter, should we have 3 months in each row? How would the results differ when we keep 12 months on each row versus 3 months on each row versus 1 month on each row?

Depends on the dataset, I recommend testing to discover the specific answers with your data and model.

Hi Jason,

I am trying to predict high and low value of a time series in next X days, my output layer in RNN is :

    model.add(Dense(2, activation='linear'))

so basically output vector is [y_high, y_low], the model works pretty well however it sometimes outputs y_low > y_high, which of course doesn’t make any sense, is there a way to enforce model so that condition y_high >= y_low is always met.

Interesting, perhaps you could simplify the problem and predict a value in a discrete ordinal interval, e.g. each category is a block of values?

I was trying to modify loss function but I am unable to access y_pred individual members, I don’t even know whether it’s ultimately possible.

I don’t follow, why not? What is the error?

Hi Jason, a colleague and I are thinking of trying an LSTM model for time series forecasting. We are faced with over a thousand potential predictors, and would like to select only a smaller number for the final model. In particular, I have recently become fascinated by SHAP values; e.g., see this informal blog post by Scott Lundberg himself, in the context of XGBoost.

https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27

Tantalizingly, Scott L. demonstrates SHAP values in the context of an LSTM model here:

https://slundberg.github.io/shap/notebooks/deep_explainer/Keras%20LSTM%20for%20IMDB%20Sentiment%20Classification.html

But that is using text input (sentiment classification in the IMDB data set), which involves an Embedding layer just before the LSTM layer. For a non-text problem like time series forecasting, we would exclude the Embedding layer. But doing so breaks the code.

Do you have any suggestions how SHAP values might be used in the context of LSTMs for time series forecasting (not text processing)? If not, do you have any suggestions for feature selection in that context?

Thanks!

I don’t know what SHAP is, sorry.

Hi Jason,

Thanks for this useful tutorial.

I am confused about how to invert the scaling of my data after splitting it into the form:

x(data_length, n_step, feature)

This is because the scaler can only be used on 2D data.

What I want to do is evaluate the RMSE between predictions and true values, so I have to inverse transform the data. Could you please tell me how to deal with this problem?

Yes, I show how here:

https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
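One common way around the 2D restriction is to collapse the sample and time step axes, apply the transform per feature, and reshape back. The sketch below does this with a hand-written min-max transform so that it runs with NumPy alone; a 2D scaler object, such as one from scikit-learn, could be applied at the same points instead. The array sizes are contrived.

```python
from numpy import arange

# contrived data shaped [data_length, n_step, feature]
X = arange(5 * 3 * 2, dtype=float).reshape((5, 3, 2))
n_samples, n_steps, n_features = X.shape

# collapse samples and time steps into rows: one column per feature
flat = X.reshape((n_samples * n_steps, n_features))

# fit and apply a per-feature min-max scaling on the 2D view
lo, hi = flat.min(axis=0), flat.max(axis=0)
scaled = ((flat - lo) / (hi - lo)).reshape(X.shape)

# invert: flatten again, undo the scaling, restore the 3D shape
restored = (scaled.reshape((-1, n_features)) * (hi - lo) + lo).reshape(X.shape)

print(scaled.shape)
print(abs(restored - X).max())
```

Predictions with more than one output column can be inverted the same way, as long as the columns correspond to the features the scaling was fit on.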

Hi Jason,

Firstly, I must say you have a fabulous chunk of articles on ML/DL. Thanks for helping out the community at large.

Coming to LSTMs, I have been stuck on one problem for the last few days. Here is how it goes:

I have 3 columns, namely customer id, basket_index and timestamp. For every customer, each row represents one timestamp. Let’s say there are 3 customers with a variable number of timestamps: the first has 30, the second has 25 and the third has 50. So the total number of rows is 105. For the basket_index column, each row holds the list of product keys bought by a customer at a particular timestamp. Here is a snapshot of the dataset:

    CustomerID  basket_index  timestamp  predicted_basket
    111         [1,2,3]       1          [4,5]
    111         [4,5]         2          [9,7]
    111         [9,7]         3          [3,5,6,1]
    ...
    222         [6,2,3]       1          [1,0,2,5]
    222         [1,0,2,5]     2          [7,5]
    ...
    333
    ...

and so on.

Now, since every customer has a different time series,

1) How to pass everything into one network?

2) Do I have to build multiple LSTM models (one for each customer) in this case?

3) Also, I am creating an embedding layer for both customer and product keys (taking mean for every basket). How to specify how many steps back does every time series look in such cases?

4) How should I specify batch size in this case?

Your help will be really appreciated. Thanks!

Great question, I have some suggestions here that might help:

https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites

Generally, I would encourage you to try to learn across customers.

Thanks Jason for nice post.

I have one question I hope you can guide me on: with an LSTM, we can’t stop at saying the model is good; what matters most is how to use the outcome of a good model.

For example, take predicting flu or no flu for patients. I want to predict flu for the coming half year (Jun-2019 to Dec-2019), but what I have is history data (the past 4 years of flu data for those people, and the target for that model is the half year from 6-1-2018 to 12-31-2018).

How can I apply the outcome of the LSTM fit on history to predict the future?

Can I get a list of important features from the history model with some value (like a weight) and apply this to my future data?

Or can I get the list of important features from a well-fit LSTM model, showing which features are more important than others?

Appreciate your guide!

This is a common question that I answer here:

https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/

Hi Jason,

Amazing work! Thanks for sharing your knowledge with us, this tutorial was so helpful.

I’m new to ML/DL, and I’m trying to predict sales in a company for the next six months using an LSTM. But I have an issue: I’m not sure how to get more than one next step from your code using just one input vector x. I’m using a monthly time step.

Could you help me to understand a little bit better how to get it?