Multivariate Time Series Forecasting with LSTMs in Keras

By Jason Brownlee on October 21, 2020 in Deep Learning for Time Series

Neural networks like Long Short-Term Memory (LSTM) recurrent neural networks are able to almost seamlessly model problems with multiple input variables.

This is a great benefit in time series forecasting, where classical linear methods can be difficult to adapt to multivariate or multiple input forecasting problems.

In this tutorial, you will discover how you can develop an LSTM model for multivariate time series forecasting with the Keras deep learning library.

After completing this tutorial, you will know:

How to transform a raw dataset into something we can use for time series forecasting.
How to prepare data and fit an LSTM for a multivariate time series forecasting problem.
How to make a forecast and rescale the result back into the original units.

Let’s get started.

Update Aug/2017: Fixed a bug where yhat was compared to obs at the previous time step when calculating the final RMSE. Thanks, Songbin Xu and David Righart.
Update Oct/2017: Added a new example showing how to train on multiple prior time steps due to popular demand.
Update Sep/2018: Updated link to dataset.
Update Jun/2020: Fixed missing imports for LSTM data prep example.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

1. Air Pollution Forecasting
2. Basic Data Preparation
3. Multivariate LSTM Forecast Model
    1. LSTM Data Preparation
    2. Define and Fit Model
    3. Evaluate Model
    4. Complete Example
4. Train On Multiple Lag Timesteps Example

Python Environment

This tutorial assumes you have a Python SciPy environment installed. I recommend that you use Python 3 with this tutorial.

You must have Keras (2.0 or higher) installed with either the TensorFlow or Theano backend, ideally Keras 2.3 and TensorFlow 2.2, or higher.

The tutorial also assumes you have scikit-learn, Pandas, NumPy and Matplotlib installed.

If you need help with your environment, see this post:

How to Setup a Python Environment for Machine Learning

1. Air Pollution Forecasting

In this tutorial, we are going to use the Air Quality dataset.

This is a dataset that reports on the weather and the level of pollution each hour for five years at the US embassy in Beijing, China.

The data includes the date-time, the PM2.5 pollution concentration, and weather information including dew point, temperature, pressure, wind direction, wind speed and the cumulative number of hours of snow and rain. The complete feature list in the raw data is as follows:

No: row number
year: year of data in this row
month: month of data in this row
day: day of data in this row
hour: hour of data in this row
pm2.5: PM2.5 concentration
DEWP: Dew Point
TEMP: Temperature
PRES: Pressure
cbwd: Combined wind direction
Iws: Cumulated wind speed
Is: Cumulated hours of snow
Ir: Cumulated hours of rain

We can use this data and frame a forecasting problem where, given the weather conditions and pollution for prior hours, we forecast the pollution at the next hour.

This dataset can be used to frame other forecasting problems.
Do you have good ideas? Let me know in the comments below.

You can download the dataset from the UCI Machine Learning Repository.

Update, I have mirrored the dataset here because UCI has become unreliable:

Beijing PM2.5 Data Set

Download the dataset and place it in your current working directory with the filename “raw.csv”.

2. Basic Data Preparation

The data is not ready to use. We must prepare it first.

Below are the first few rows of the raw dataset.

No,year,month,day,hour,pm2.5,DEWP,TEMP,PRES,cbwd,Iws,Is,Ir
1,2010,1,1,0,NA,-21,-11,1021,NW,1.79,0,0
2,2010,1,1,1,NA,-21,-12,1020,NW,4.92,0,0
3,2010,1,1,2,NA,-21,-11,1019,NW,6.71,0,0
4,2010,1,1,3,NA,-21,-14,1019,NW,9.84,0,0
5,2010,1,1,4,NA,-20,-12,1018,NW,12.97,0,0

The first step is to consolidate the date-time information into a single date-time so that we can use it as an index in Pandas.

A quick check reveals NA values for pm2.5 for the first 24 hours. We will, therefore, need to remove the first 24 rows of data. There are also a few scattered “NA” values later in the dataset; we can mark them with 0 values for now.
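
One way to run such a check is sketched below on a small made-up frame rather than the real file. The toy values and the helper name leading_na_count are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw pm2.5 column: NA at the start, one NA later on.
toy = pd.DataFrame({'pm2.5': [np.nan, np.nan, np.nan, 129.0, 148.0, np.nan, 181.0]})

def leading_na_count(series):
    """Return how many consecutive NA values appear at the start of the series."""
    is_na = series.isna().to_numpy()
    # argmin gives the position of the first False (first non-NA value);
    # a series that is entirely NA has no False, so handle it separately.
    return len(is_na) if is_na.all() else int(is_na.argmin())

total_na = int(toy['pm2.5'].isna().sum())
leading_na = leading_na_count(toy['pm2.5'])
print('total NA: %d, leading NA: %d' % (total_na, leading_na))
```

Applied to the real pm2.5 column, the same two numbers would show how many missing values sit in the leading block and how many are scattered later.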

The script below loads the raw dataset and parses the date-time information as the Pandas DataFrame index. The “No” column is dropped and then clearer names are specified for each column. Finally, the NA values are replaced with “0” values and the first 24 hours are removed.

from pandas import read_csv
from pandas import to_datetime
# load data
dataset = read_csv('raw.csv')
# combine the year, month, day and hour columns into a single date-time index
# (this avoids the date_parser and nested parse_dates arguments, which are deprecated in recent Pandas releases)
dataset.index = to_datetime(dataset[['year', 'month', 'day', 'hour']])
# drop the row number and the now-redundant date-time columns
dataset.drop(['No', 'year', 'month', 'day', 'hour'], axis=1, inplace=True)
# manually specify column names
dataset.columns = ['pollution', 'dew', 'temp', 'press', 'wnd_dir', 'wnd_spd', 'snow', 'rain']
dataset.index.name = 'date'
# mark all NA values with 0
dataset['pollution'] = dataset['pollution'].fillna(0)
# drop the first 24 hours
dataset = dataset[24:]
# summarize first 5 rows
print(dataset.head(5))
# save to file
dataset.to_csv('pollution.csv')

Running the example prints the first 5 rows of the transformed dataset and saves the dataset to “pollution.csv”.

                     pollution  dew  temp   press wnd_dir  wnd_spd  snow  rain
date
2010-01-02 00:00:00      129.0  -16  -4.0  1020.0      SE     1.79     0     0
2010-01-02 01:00:00      148.0  -15  -4.0  1020.0      SE     2.68     0     0
2010-01-02 02:00:00      159.0  -11  -5.0  1021.0      SE     3.57     0     0
2010-01-02 03:00:00      181.0   -7  -5.0  1022.0      SE     5.36     1     0
2010-01-02 04:00:00      138.0   -7  -5.0  1022.0      SE     6.25     2     0

Now that we have the data in an easy-to-use form, we can create a quick plot of each series and see what we have.

The code below loads the new “pollution.csv” file and plots each series as a separate subplot, except wind direction, which is categorical.

from pandas import read_csv
from matplotlib import pyplot
# load dataset
dataset = read_csv('pollution.csv', header=0, index_col=0)
values = dataset.values
# specify columns to plot
groups = [0, 1, 2, 3, 5, 6, 7]
i = 1
# plot each column
pyplot.figure()
for group in groups:
	pyplot.subplot(len(groups), 1, i)
	pyplot.plot(values[:, group])
	pyplot.title(dataset.columns[group], y=0.5, loc='right')
	i += 1
pyplot.show()

Running the example creates a plot with 7 subplots showing the 5 years of data for each variable.

Line Plots of Air Pollution Time Series

3. Multivariate LSTM Forecast Model

In this section, we will fit an LSTM to the problem.

LSTM Data Preparation

The first step is to prepare the pollution dataset for the LSTM.

This involves framing the dataset as a supervised learning problem and normalizing the input variables.

We will frame the supervised learning problem as predicting the pollution at the current hour (t) given the pollution measurement and weather conditions at the prior time step.

This formulation is straightforward and just for this demonstration. Some alternate formulations you could explore include:

Predict the pollution for the next hour based on the weather conditions and pollution over the last 24 hours.
Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.
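
The one-step framing used in this tutorial (all variables at t-1 as input, pollution at t as output) can be sketched on a toy series with the Pandas shift() function. The two-column frame and its values below are made up for illustration.

```python
import pandas as pd

# Toy hourly data: two variables standing in for pollution and temperature.
toy = pd.DataFrame({'pollution': [10.0, 20.0, 30.0, 40.0],
                    'temp': [1.0, 2.0, 3.0, 4.0]})

# Inputs are all variables at t-1; the target is pollution at t.
inputs = toy.shift(1).add_suffix('(t-1)')
target = toy[['pollution']].add_suffix('(t)')

# The first row has no prior hour, so it contains NaN and is dropped.
framed = pd.concat([inputs, target], axis=1).dropna()
print(framed)
```

Each remaining row pairs the previous hour's observations with the current hour's pollution, which is the shape of problem a supervised learning model expects.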

We can transform the dataset using the series_to_supervised() function developed in the blog post:

How to Convert a Time Series to a Supervised Learning Problem in Python

First, the “pollution.csv” dataset is loaded. The wind direction feature is label encoded (integer encoded). This could further be one-hot encoded in the future if you are interested in exploring it.

Next, all features are normalized, then the dataset is transformed into a supervised learning problem. The weather variables for the hour to be predicted (t) are then removed.

The complete code listing is provided below.

# prepare data for lstm
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler

# convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg

# load dataset
dataset = read_csv('pollution.csv', header=0, index_col=0)
values = dataset.values
# integer encode direction
encoder = LabelEncoder()
values[:,4] = encoder.fit_transform(values[:,4])
# ensure all data is float
values = values.astype('float32')
# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
# frame as supervised learning
reframed = series_to_supervised(scaled, 1, 1)
# drop columns we don't want to predict
reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)
print(reframed.head())

Running the example prints the first 5 rows of the transformed dataset. We can see the 8 input variables (input series) and the 1 output variable (pollution level at the current hour).

   var1(t-1)  var2(t-1)  var3(t-1)  var4(t-1)  var5(t-1)  var6(t-1)  \
1   0.129779   0.352941   0.245902   0.527273   0.666667   0.002290
2   0.148893   0.367647   0.245902   0.527273   0.666667   0.003811
3   0.159960   0.426471   0.229508   0.545454   0.666667   0.005332
4   0.182093   0.485294   0.229508   0.563637   0.666667   0.008391
5   0.138833   0.485294   0.229508   0.563637   0.666667   0.009912
 
   var7(t-1)  var8(t-1)   var1(t)
1   0.000000        0.0  0.148893
2   0.000000        0.0  0.159960
3   0.000000        0.0  0.182093
4   0.037037        0.0  0.138833
5   0.074074        0.0  0.109658
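
The value 0.666667 in the var5(t-1) column comes from integer encoding the wind direction and then scaling it. A minimal sketch follows, assuming the column holds the four categories NE, NW, SE and cv. That category list is an assumption, so check encoder.classes_ on the real data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Assumed wind direction categories; the real column may differ.
wind = np.array(['SE', 'cv', 'NE', 'NW', 'SE'])

encoder = LabelEncoder()
codes = encoder.fit_transform(wind)   # classes are sorted, then numbered from 0
print(list(encoder.classes_), codes)

# Min-max scaling maps the integer codes onto the range [0, 1].
scaled = MinMaxScaler().fit_transform(codes.reshape(-1, 1).astype('float32'))
print(scaled.ravel())
```

Under that assumption SE is encoded as 2 out of the codes 0 to 3, which scales to 2/3. Note that integer codes imply an ordering between directions that has no physical meaning, which is one motivation for one-hot encoding.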

This data preparation is simple and there is more we could explore. Some ideas you could look at include:

One-hot encoding wind direction.
Making all series stationary with differencing and seasonal adjustment.
Providing more than 1 hour of input time steps.

This last point is perhaps the most important given the use of Backpropagation through time by LSTMs when learning sequence prediction problems.
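
As a sketch of the first idea, one-hot encoding can be done with the Pandas get_dummies() function. The toy frame below is made up for illustration and is not part of the tutorial's pipeline.

```python
import pandas as pd

# Toy frame with a categorical wind direction column (values are illustrative).
toy = pd.DataFrame({'pollution': [129.0, 148.0, 159.0],
                    'wnd_dir': ['SE', 'NW', 'SE']})

# Replace the categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(toy, columns=['wnd_dir'], dtype='float32')
print(encoded)
```

One-hot encoding increases the number of features, so any hard-coded column indices and feature counts used later in the tutorial would need to be adjusted to match.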

Define and Fit Model

In this section, we will fit an LSTM on the multivariate input data.

First, we must split the prepared dataset into train and test sets. To speed up the training of the model for this demonstration, we will only fit the model on the first year of data, then evaluate it on the remaining 4 years of data. If you have time, consider exploring the inverted version of this test harness.

The example below splits the dataset into train and test sets, then splits the train and test sets into input and output variables. Finally, the inputs (X) are reshaped into the 3D format expected by LSTMs, namely [samples, timesteps, features].

...
# split into train and test sets
values = reframed.values
n_train_hours = 365 * 24
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

Running this example prints the shape of the train and test input and output sets with about 9K hours of data for training and about 35K hours for testing.

(8760, 1, 8) (8760,) (35039, 1, 8) (35039,)
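
These sample counts can be checked with simple arithmetic. The sketch below assumes the raw file covers every hour from the start of 2010 to the end of 2014; that date range is an assumption, not something stated in the listing above.

```python
from datetime import datetime

# Assumed date range of the raw file: 2010-01-01 00:00 through 2014-12-31 23:00.
start = datetime(2010, 1, 1, 0)
end = datetime(2015, 1, 1, 0)
raw_hours = int((end - start).total_seconds() // 3600)

prepared = raw_hours - 24        # first 24 hours dropped during data preparation
framed = prepared - 1            # one row lost to the shift in series_to_supervised()
n_train_hours = 365 * 24
n_test_hours = framed - n_train_hours
print(raw_hours, n_train_hours, n_test_hours)
```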

Now we can define and fit our LSTM model.

We will define the LSTM with 50 neurons in the first hidden layer and 1 neuron in the output layer for predicting pollution. The input shape will be 1 time step with 8 features.

We will use the Mean Absolute Error (MAE) loss function and the efficient Adam version of stochastic gradient descent.

The model will be fit for 50 training epochs with a batch size of 72. Remember that the internal state of the LSTM in Keras is reset at the end of each batch, so an internal state that is a function of a number of days may be helpful (try testing this).

Finally, we keep track of both the training and test loss during training by setting the validation_data argument in the fit() function. At the end of the run both the training and test loss are plotted.
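
To make the batch size concrete, the short calculation below shows how many weight updates one epoch involves and how many days of data a batch spans. It assumes the final partial batch is kept, which is an assumption about the training loop rather than something shown in this tutorial.

```python
import math

n_train = 365 * 24      # training samples, one per hour
batch_size = 72         # 72 hourly samples = 3 days of data

batches_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (batches_per_epoch - 1) * batch_size
days_per_batch = batch_size / 24
print(batches_per_epoch, last_batch, days_per_batch)
```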

...
# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=50, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)
# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

Evaluate Model

After the model is fit, we can forecast for the entire test dataset.

We combine the forecast with the test dataset and invert the scaling. We also invert scaling on the test dataset with the expected pollution numbers.

With forecasts and actual values in their original scale, we can then calculate an error score for the model. In this case, we calculate the Root Mean Squared Error (RMSE) that gives error in the same units as the variable itself.

...
# make a prediction
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)
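
The reason for concatenating the forecast with the other 7 columns is that the fitted scaler expects rows with 8 columns. The sketch below, on random toy data rather than the pollution dataset, suggests that the padding columns do not affect the result for column 0, because MinMaxScaler scales each column independently. The direct inversion shown is an alternative for comparison, not the method used in this tutorial.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data: 8 columns, with the target in column 0 as in the tutorial.
rng = np.random.default_rng(1)
values = rng.uniform(0, 500, size=(20, 8))
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)

# Pretend these are scaled forecasts for the first column.
yhat = scaled[:, :1]

# Approach used in the tutorial: pad to 8 columns, invert, keep column 0.
padded = np.concatenate((yhat, scaled[:, 1:]), axis=1)
inv_padded = scaler.inverse_transform(padded)[:, 0]

# Direct inversion of column 0 using the minimum and maximum seen during fit.
col_min, col_max = scaler.data_min_[0], scaler.data_max_[0]
inv_direct = yhat[:, 0] * (col_max - col_min) + col_min

print(np.allclose(inv_padded, inv_direct))
```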

Complete Example

The complete example is listed below.

NOTE: This example assumes you have prepared the data correctly, e.g. converted the downloaded “raw.csv” to the prepared “pollution.csv”. See the first part of this tutorial.

from math import sqrt
from numpy import concatenate
from matplotlib import pyplot
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
 
# convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg
 
# load dataset
dataset = read_csv('pollution.csv', header=0, index_col=0)
values = dataset.values
# integer encode direction
encoder = LabelEncoder()
values[:,4] = encoder.fit_transform(values[:,4])
# ensure all data is float
values = values.astype('float32')
# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
# frame as supervised learning
reframed = series_to_supervised(scaled, 1, 1)
# drop columns we don't want to predict
reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)
print(reframed.head())
 
# split into train and test sets
values = reframed.values
n_train_hours = 365 * 24
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
 
# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=50, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)
# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
 
# make a prediction
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)

Running the example first creates a plot showing the train and test loss during training.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

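Interestingly, we can see that test loss drops below training loss. This is the reverse of the usual overfitting pattern, in which training loss keeps falling while test loss rises; it may instead reflect differences between the single year of training data and the four years of test data. Measuring and plotting RMSE during training may shed more light on this.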

Line Plot of Train and Test Loss from the Multivariate LSTM During Training

The train and test loss are printed at the end of each training epoch. At the end of the run, the final RMSE of the model on the test dataset is printed.

We can see that the model achieves a respectable RMSE of 26.496, which is lower than an RMSE of 30 found with a persistence model.

...
Epoch 46/50
0s - loss: 0.0143 - val_loss: 0.0133
Epoch 47/50
0s - loss: 0.0143 - val_loss: 0.0133
Epoch 48/50
0s - loss: 0.0144 - val_loss: 0.0133
Epoch 49/50
0s - loss: 0.0143 - val_loss: 0.0133
Epoch 50/50
0s - loss: 0.0144 - val_loss: 0.0133
Test RMSE: 26.496
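
The persistence baseline mentioned above is not listed in this tutorial. A minimal sketch of how such a baseline could be scored is shown below, using the five pollution values printed earlier as a toy series. The result on five points says nothing about the full dataset.

```python
from math import sqrt

# First five pollution readings from the prepared dataset, used as a toy series.
series = [129.0, 148.0, 159.0, 181.0, 138.0]

# Persistence: the forecast for hour t is the observation at hour t-1.
actual = series[1:]
forecast = series[:-1]

squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
rmse = sqrt(sum(squared_errors) / len(squared_errors))
print('Persistence RMSE: %.3f' % rmse)
```

Running the same calculation over the test portion of the real series gives the score that the LSTM needs to beat in order to be considered skillful.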

This model is not tuned. Can you do better?
Let me know your problem framing, model configuration, and RMSE in the comments below.

Train On Multiple Lag Timesteps Example

There have been many requests for advice on how to adapt the above example to train the model on multiple previous time steps.

I had tried this and a myriad of other configurations when writing the original post and decided not to include them because they did not lift model skill.

Nevertheless, I have included this example below as a reference template that you could adapt for your own problems.

The changes needed to train the model on multiple previous time steps are quite minimal, as follows:

First, you must frame the problem suitably when calling series_to_supervised(). We will use 3 hours of data as input. Also note that we no longer explicitly drop the columns for all of the other fields at obs(t).

...
# specify the number of lag hours
n_hours = 3
n_features = 8
# frame as supervised learning
reframed = series_to_supervised(scaled, n_hours, 1)

Next, we need to be more careful in specifying the column for input and output.

We have 3 * 8 + 8 columns in our framed dataset. We will take 3 * 8 or 24 columns as input for the obs of all features across the previous 3 hours. We will take just the pollution variable as output at the following hour, as follows:

...
# split into input and outputs
n_obs = n_hours * n_features
train_X, train_y = train[:, :n_obs], train[:, -n_features]
test_X, test_y = test[:, :n_obs], test[:, -n_features]
print(train_X.shape, len(train_X), train_y.shape)
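
The indexing above can be checked by rebuilding the column names in the order that series_to_supervised() generates them. This is a small sketch using names only, with no data involved.

```python
n_hours = 3
n_features = 8

# Rebuild the column names in the order series_to_supervised() produces them:
# all features at t-3, then t-2, then t-1, then all features at t.
names = ['var%d(t-%d)' % (j + 1, i)
         for i in range(n_hours, 0, -1)
         for j in range(n_features)]
names += ['var%d(t)' % (j + 1) for j in range(n_features)]

n_obs = n_hours * n_features
input_names = names[:n_obs]        # the 24 lagged columns
target_name = names[-n_features]   # first of the 8 columns at time t
print(len(names), input_names[0], input_names[-1], target_name)
```

The index -n_features selects the first of the final 8 columns, which is var1(t), the pollution at the hour being predicted.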

Next, we can reshape our input data correctly to reflect the time steps and features.

...
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], n_hours, n_features))
test_X = test_X.reshape((test_X.shape[0], n_hours, n_features))
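
To see why this reshape keeps each hour's features together, the sketch below reshapes a single toy sample whose values encode their own position. The encoding scheme is made up purely to make the layout visible.

```python
import numpy as np

n_hours, n_features = 3, 8

# One flat sample laid out as [t-3 features, t-2 features, t-1 features].
# The value 10*h + f encodes lag block h and feature f.
flat = np.array([[10 * h + f for h in range(n_hours) for f in range(n_features)]])

cube = flat.reshape((flat.shape[0], n_hours, n_features))
print(cube.shape)
print(cube[0, 0])   # all features for the oldest lag (t-3)
print(cube[0, -1])  # all features for the most recent lag (t-1)
```

Because the flat columns are ordered by lag first and feature second, the reshape places one complete hour of observations in each time step.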

Fitting the model is the same.

The only other small change is in how we evaluate the model, specifically in how we reconstruct the rows with 8 columns suitable for reversing the scaling operation, so that y and yhat are back in the original scale and we can calculate the RMSE.

The gist of the change is that we concatenate the y or yhat column with the last 7 features of the test dataset in order to invert the scaling, as follows:

...
# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, -7:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, -7:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]

We can tie all of these modifications to the above example together. The complete example of multivariate time series forecasting with multiple lag inputs is listed below:

from math import sqrt
from numpy import concatenate
from matplotlib import pyplot
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg

# load dataset
dataset = read_csv('pollution.csv', header=0, index_col=0)
values = dataset.values
# integer encode direction
encoder = LabelEncoder()
values[:,4] = encoder.fit_transform(values[:,4])
# ensure all data is float
values = values.astype('float32')
# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
# specify the number of lag hours
n_hours = 3
n_features = 8
# frame as supervised learning
reframed = series_to_supervised(scaled, n_hours, 1)
print(reframed.shape)

# split into train and test sets
values = reframed.values
n_train_hours = 365 * 24
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]
# split into input and outputs
n_obs = n_hours * n_features
train_X, train_y = train[:, :n_obs], train[:, -n_features]
test_X, test_y = test[:, :n_obs], test[:, -n_features]
print(train_X.shape, len(train_X), train_y.shape)
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], n_hours, n_features))
test_X = test_X.reshape((test_X.shape[0], n_hours, n_features))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=50, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)
# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

# make a prediction
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], n_hours*n_features))
# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, -7:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, -7:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)

The model is fit as before in a minute or two.

...
Epoch 45/50
1s - loss: 0.0143 - val_loss: 0.0154
Epoch 46/50
1s - loss: 0.0143 - val_loss: 0.0148
Epoch 47/50
1s - loss: 0.0143 - val_loss: 0.0152
Epoch 48/50
1s - loss: 0.0143 - val_loss: 0.0151
Epoch 49/50
1s - loss: 0.0143 - val_loss: 0.0152
Epoch 50/50
1s - loss: 0.0144 - val_loss: 0.0149

A line plot of the train and test loss over the epochs is created.

Plot of Loss on the Train and Test Datasets

Finally, the Test RMSE is printed, not really showing any advantage in skill, at least on this problem.

Test RMSE: 27.177
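One way to put a score like this in context is a persistence baseline, which forecasts each hour with the previous hour's observation. The sketch below is not part of the original example; rmse and persistence_rmse are hypothetical helper names and the series is made up. In the tutorial's setting, the input would be the actual pollution values in original units, such as inv_y.

```python
from math import sqrt
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return sqrt(np.mean((actual - predicted) ** 2))

def persistence_rmse(series):
    """RMSE of forecasting each value with the value one step earlier."""
    series = np.asarray(series, dtype=float)
    return rmse(series[1:], series[:-1])

toy_series = [1.0, 2.0, 4.0, 7.0]
baseline = persistence_rmse(toy_series)
print('Persistence RMSE: %.3f' % baseline)
```

A model RMSE well below the persistence value suggests the model adds skill beyond copying the last observation.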

I would add that the LSTM does not appear to be suitable for autoregression type problems and that you may be better off exploring an MLP with a large window.

I hope this example helps you with your own time series forecasting experiments.

Summary

In this tutorial, you discovered how to fit an LSTM to a multivariate time series forecasting problem.

Specifically, you learned:

How to transform a raw dataset into something we can use for time series forecasting.
How to prepare data and fit an LSTM for a multivariate time series forecasting problem.
How to make a forecast and rescale the result back into the original units.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

2,751 Responses to Multivariate Time Series Forecasting with LSTMs in Keras

zorg August 14, 2017 at 7:08 pm #

except wind *dir*, which is categorical.

- Jason Brownlee August 15, 2017 at 6:33 am #
  
  Thanks, fixed!
  
  - bhupesh January 12, 2018 at 4:24 am #
    
    how to use grid search for neurons
    
    - bhupesh January 12, 2018 at 4:39 am #
      
      I want to apply grid search in this to tune neurons and add layers
      
      - lstm January 12, 2018 at 4:40 am #
        
        and to find best parameters
    - Jason Brownlee January 12, 2018 at 5:54 am #
      
      See this post:
      https://machinelearningmastery.com/tune-lstm-hyperparameters-keras-time-series-forecasting/
      
  - qing January 15, 2018 at 3:12 am #
    
    hello Jason,
    I have run the code in my spyder and I know the RMSE index is good enough for this model. However, I added the accuracy index in this code, that is
    model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
    and the accuracy is totally the same in each epoch and is very low (0.0761). I also use my own data to run your code, and the result is the same, with good RMSE values but bad accuracy. I have troubled by this for several days and looking forward to your reply.
    
    - Jason Brownlee January 15, 2018 at 7:00 am #
      
      You cannot measure accuracy for regression.
      
      Learn more here:
      https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
      
      - Miao April 3, 2020 at 3:18 pm #
        
        Hi，Jason.
        I have the same problem as qing.I don‘t know why we cannot measure accuracy for regression.And the website you provided cannot be opened.
        Could you please help me with that?
      - Jason Brownlee April 4, 2020 at 6:15 am #
        
        Accuracy summarizes correct predictions for class labels. It cannot be used for regression. Instead you must calculate an error metric, like RMSE.
        
        Learn more here:
        https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-classification-and-regression
        
        And here:
        https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
      - Miao April 6, 2020 at 10:12 pm #
        
        Thank you very much!
      - Jason Brownlee April 7, 2020 at 5:49 am #
        
        You’re welcome.
      - Emmanuel March 5, 2021 at 7:51 am #
        
        That is correct! You can only use accuracy for class labels. You could calculate RMSE or R^2 instead
  - Mike September 19, 2018 at 5:45 pm #
    
    hi，Jason，I‘m a new learner. There is no real curve and predicted curve in your tutorial.
    I want to know how can I get it? I mean how to write it in the code?
    
    - Jason Brownlee September 20, 2018 at 7:52 am #
      
      Sorry, I don’t understand your question Mike, can you elaborate?
      
      - Ramon September 25, 2018 at 12:01 am #
        
        I guess he means the predicted value vs ground truth chart.
      - Jason Brownlee September 25, 2018 at 6:24 am #
        
        I see.
        
        You can call model.predict() to get yhat and create a line plot with y and yhat.
        
        I have done this in some other tutorials, for example:
        https://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/
        
        If this is a challenge for you, I would suggest this tutorial is too advanced for you and I would encourage you to start with intro to time series here:
        https://machinelearningmastery.com/start-here/#timeseries
  - Nischal Sehrawat April 16, 2019 at 10:55 pm #
    
    Hi Jason, in all this implementation, how does the feedback implementation occur? How do we account for lags in predicted time series?
    
    - Jason Brownlee April 17, 2019 at 7:01 am #
      
      Lags are accounted for as input time steps to the model.
      
      Perhaps read this:
      https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
      
  - Silvia July 25, 2019 at 2:00 am #
    
    Many thanks for this incredibly useful example!
    I think I might have a small suggestion: I’ve downloaded the “pollution” data set from the Github link provided, and I found out that maybe the column to be encoded is now column 8 and not 4 like in the original code, so I made this amendment and it all worked: (apologies if I’m missing something):
    # I’ve replaced this line:
    #values[:,4] = encoder.fit_transform(values[:,4])
    # … with this line:
    values[:,8] = encoder.fit_transform(values[:,8])
    
    Thanks for your help!
    
    - Jason Brownlee July 25, 2019 at 7:56 am #
      
      Perhaps you downloaded the wrong dataset?
      
      Here it is:
      https://raw.githubusercontent.com/jbrownlee/Datasets/master/pollution.csv
      
  - neha June 20, 2020 at 8:10 pm #
    
    good afternoon,i m new to machine learning and trying to run ur code on google colabs,but i getting the following error.
    
    2003
    2004 if not is_integer(x):
    -> 2005 x = names.index(x)
    2006
    2007 self._reader.set_noconvert(x)
    
    ValueError: ‘year’ is not in list
    
    pls help me to slove out
    
    - Jason Brownlee June 21, 2020 at 6:20 am #
      
      Sorry, I don’t know about colab.
      
      Try running the example on your workstation.
      
- Wallace March 20, 2020 at 10:57 pm #
  
  Hi Jason. Do you know why i can’t inverse scaler transform in inv_yhat and why appear this error?
  
  operands could not be broadcast together with shapes (157,13) (7,) (157,13)
  
  - Jason Brownlee March 21, 2020 at 8:23 am #
    
    Perhaps this will help:
    https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
    
  - Pam April 9, 2020 at 10:57 pm #
    
    I know how I can help you! In Jason’s code it is as follows:
    
    inv_yhat = concatenate((yhat, test_X[:, -7:]), axis=1)
    
    But make sure instead of 7 you use number_of_features -1, otherwise you have the value error.
    
    So in my case, I use 31 features (including the one I wanna predict), and it is the following code:
    inv_yhat = concatenate((yhat, test_X[:, -30:]), axis=1)
    
    as well as for inv_y:
    inv_y = concatenate((test_y, test_X[:, -30:]), axis=1)
    
    Hope this helps!
    
Francois AKOA August 15, 2017 at 7:16 am #

Great post Jason. Thank you so much for making this material available for the community..

- Jason Brownlee August 15, 2017 at 4:54 pm #
  
  Thanks Francois, I’m glad it helped!
  
yao August 15, 2017 at 2:02 pm #

hi, jason. There were some problems under my environment which were keras2.0.4and tensorflow-GPU0.12.0rc0.

And Bug was that “TypeError: Expected int32, got list containing Tensors of type ‘_Message’ instead.”

The sentence that “model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))” was located.

Could you please help me with that?

Regards,

yao

- Jason Brownlee August 15, 2017 at 4:54 pm #
  
  I would recommend this tutorial for setting up your environment:
  https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
  
  - yao August 16, 2017 at 7:18 pm #
    
    Thx a lot, doctor, it works! fabulous! 🙂
    
    - Jason Brownlee August 17, 2017 at 6:40 am #
      
      I’m glad to hear that.
      
      - Shirley Yang August 18, 2017 at 12:00 pm #
        
        Dr.Jason, I update TensorFlow then it works!
        Sorry to bother you.
        Thank you very much !
        Best wishes !
      - Jason Brownlee August 18, 2017 at 4:40 pm #
        
        I’m glad to hear that!
    - Shirley Yang August 17, 2017 at 8:54 pm #
      
      I met the same problem .
      
      Did you uninstall all the programs previously installed or just set up the environment again?
      
      Thx a lot!
      
  - Shirley Yang August 18, 2017 at 11:43 am #
    
    Hi Jason,I set up my environment as the your tutorial.
    
    scipy: 0.19.0
    numpy: 1.12.1
    matplotlib: 2.0.2
    pandas: 0.20.1
    statsmodels: 0.8.0
    sklearn: 0.18.1
    
    theano: 0.9.0.dev-c697eeab84e5b8a74908da654b66ec9eca4f1291
    tensorflow: 0.12.1
    Using TensorFlow backend.
    keras: 2.0.5
    
    But the bug still existed.Is the version of tensorFlow too odd?How could I do?
    Thanks!
    
    - Jason Brownlee August 18, 2017 at 4:39 pm #
      
      It might be, I am running v1.2.1.
      
      Perhaps try running Keras off Theano instead (e.g. change the backend in the ~/.keras/keras.json config)
      
Songbin Xu August 15, 2017 at 10:42 pm #

It seems that inv_y = scaler.inverse_transform(test_X)[:,0] is not the actual, should inv_yhat be compared with test_y but not pollution(t-1)? Because I think this inv_y here means pollution(t-1). Is this prediction equals to only making a time shifting from the current known pollution value (which means the models just take pollution(t) as the prediction of pollution(t+1))?

- Jason Brownlee August 16, 2017 at 6:35 am #
  
  Sorry, I’m not sure I follow. Can you please restate your question, perhaps with an example?
  
  - Songbin Xu August 16, 2017 at 7:36 pm #
    
    Sorry for the confusing expression. In fact, the series_to_supervised() function would create a DataFrame whose columns are: [ var1(t-1), var2(t-1), …, var1(t) ] where ‘var1’ represents ‘pollution’, therefore, the first dimension in test_X (that is, test_X[:,0]) would be ‘pollution(t-1)’. However, in the code you calculate the rmse between inv_yhat and test_X[:,0], even though the rmse is low, it could only shows that the model’s prediction for t+1 is close to what it has known at t.
    I am asking this question because I’ve ran through the codes and saw the models prediction pollution(t+1) looks just like pollution(t). I’ve also tried to use t-1, t-2 and so on for training, but still changed nothing.
    Do you think the model tends to learn to just take the pollution value at current moment as the prediction for the next moment?
    
    thanks 🙂
    
    - Jason Brownlee August 17, 2017 at 6:42 am #
      
      If we predict t for t+1 that is called persistence, and we show in the tutorial that the LSTM does a lot better than persistence.
      
      Perhaps I don’t understand your question? Can you give me an example of what you are asking?
      
      - Songbin Xu August 17, 2017 at 10:53 am #
        
        Hmm, it’s difficult to explain without a graph.
        
        In a word, and also it’s an example, I want to ask two questions:
        
        1. In the “make a prediction” part of your codes, why it computes rmse between predicted t+1 and real t, but not between predicted t+1 and real t+1?
        
        2. After the “make a prediction” part of your codes run, it turns out that rmse between predicted t+1 and real t is small, is it an evidence that LSTM is making persistence?
      - Jason Brownlee August 17, 2017 at 4:52 pm #
        
        RMSE is calculated for y and yhat for the same time periods (well, that was the intent), why do you think they are not?
        
        Is there a bug?
      - David Righart August 18, 2017 at 5:30 am #
        
        I think Songbin Xu is right. By executing the statement at line 90: inv_y = inv_y[:,0], you compare the inv_yhat with inv_y. inv_y is the pollution(t-1) and inv_yhat is the predicted pollution(t).
        
        On line 50 the second parameter the function series_to_supervised can be changed to 3 or 5, so more days of history are used. If you do so, an error occurs in the scaler.inverse_transform (line 89).
        
        No worries, great tutorial and I learned a lot so far!
      - Jason Brownlee August 18, 2017 at 6:54 am #
        
        I see now, you guys are 100% correct. Thank you!
        
        I have updated the calculation of RMSE and the final score reported in the post.
        
        Note, I ran a ton of experiments on AWS with many different lag values > 1 and none achieved better results than a simple lag=1 model (e.g. an LSTM model with no BPTT). I see this as a bad sign for the use of LSTMs for autoregression problems.
      - Chen-Yeou Yu February 3, 2019 at 2:21 am #
        
        Hi Dr. Jason,
        
        As for this:
        Updated Aug/2017: Fixed a bug where yhat was compared to obs at the previous time step when calculating the final RMSE. Thanks, Songbin Xu and David Righart.
        
        It seems to have some errors on calculating RMSE based on (t-1) vs (t) different time slots before. I’m just curious how it is corrected? Can you elaborate that little bit more? Because for me, I’m still thinking it is RMSE based on (t-1) vs (t)
        
        Thanks
      - Jason Brownlee February 3, 2019 at 6:20 am #
        
        I have updated tutorials that I think have better code and are easier to follow, you can get started here:
        https://machinelearningmastery.com/start-here/#deep_learning_time_series
      - SUNNY April 5, 2019 at 3:39 pm #
        
        hey,Janson.The RMSE before you updated it was 3.386. Is this article RMSE 26.496 the correct answer after you updated it? In other words,inv_y = scaler.inverse_transform(test_X)[:,0] is not true，test_y = test_y.reshape((len(test_y), 1))
        inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
        inv_y = scaler.inverse_transform(inv_y) is the correct code,is it right?I find so many people use the incorrect code .
      - Jason Brownlee April 6, 2019 at 6:39 am #
        
        I don’t recall.
        
        I recommend starting with a more recent tutorial using modern methods:
        https://machinelearningmastery.com/start-here/#deep_learning_time_series
Simone August 16, 2017 at 1:11 am #

Hi Jason, great post!

Is it necessary remove seasonality (by seasonal differentiation) when we are using LSTM?

- Jason Brownlee August 16, 2017 at 6:37 am #
  
  No, but results are often better.
  
Slavenya August 16, 2017 at 5:18 am #

Good article, thank.

Two questions:
What changes will be required if your data is sporadic? Meaning sometimes it could be 5 hours without the report.

And how do you add more timesteps into your model? Obviously you have to reshape it properly but you also have to calculate it properly.

- Jason Brownlee August 16, 2017 at 6:41 am #
  
  You could fill in the missing data by imputing or ignore the gaps using masking.
  
  What do you mean by “add more timesteps”?
  
  - Slavenya August 16, 2017 at 7:00 pm #
    
    But what should I do if all data is stochastic time sequence?
    
    For example predicting time till the next event – when events frequency is stochastically distributed on the timeline.
    
    - Jason Brownlee August 17, 2017 at 6:39 am #
      
      Good question, this sounds like survival analysis to me, perhaps see if it applies:
      https://en.wikipedia.org/wiki/Survival_analysis
      
Jack Dan August 16, 2017 at 5:48 am #

Dr.Jason,

Thank you for an awesome post.
(I was practicing on load forecast using MLP and SVR (You also suggested on a comment in your other LSTM tutorials). I also tried with LSTM and it did almost perform like SVR. However, in LSTM, I did not consider time lags because I have predicted future predictor variables that I was feeding as test set. I will try this method with time lags to cross validate the models)

- Jason Brownlee August 16, 2017 at 6:42 am #
  
  Nice Jack, let me know how you go.
  

Adam August 16, 2017 at 1:03 pm #

Hi Jason,

Can I use ‘look back'(Using t-2 , t-1 steps data to predict t step air pollution) in this case?
If it’s available,that my input data shape will be [samples , look back , features] isn’t it?

Jason Brownlee August 16, 2017 at 5:00 pm #

You can Adam, see the series_to_supervised() function and its usage in the tutorial.

Adam August 18, 2017 at 6:07 pm #

Hi Jason,

If I used n_in=5 in series_to_supervised() function,in your tutorial the input shape will be [samples, 1 , features*5].Can I reshape it to [samples, 5 , features]?If I can, what is the difference between these two shape?

Jason Brownlee August 19, 2017 at 6:09 am #

The second dimension is time steps (e.g. BPTT) and the third dimension are the features (e.g. observations at each time step). You can use features as time steps, but it would not really make sense and I expect performance to be poor.

Here’s how to build a model with multiple time steps for multiple features:

# specify number of hours
n_hours = 2
reframed = series_to_supervised(scaled, n_hours, 1)
...

# no longer just drop those columns
# reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)
# print(reframed.head())
...

# be more careful about choosing columns for input and output
n_features = 7
n_obs = n_hours * n_features
train_X, train_y = train[:, 0:n_obs], train[:, -n_features]
test_X, test_y = test[:, 0:n_obs], test[:, -n_features]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], n_hours, n_features))
test_X = test_X.reshape((test_X.shape[0], n_hours, n_features))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
...

And that’s it. I just tested and it looks good. The RMSE calculation will blow up, but you guys can fix that up I figure.

George Khoury August 19, 2017 at 11:55 pm #

Jason, great post, very clear, and very useful!! I’m about 90% with you and think a few folks may be stuck on this final point if they try to implement multi-feature, multi-hour-lookback LSTM.

Seems like by making adjustments above, I’m able to make a prediction, but the scaling inversion doesn’t want to cooperate. The reshape step now that we have multiple features and multiple timesteps has a mismatch in the shape, and even if I make the shape work, the concatenation and inversion still don’t work. Could you share what else you changed in this section to make it work? I’m not so concerned about the RMSE as much as that I can extract useful predictions. Thank you for any insight since you’ve been able to do it successfully.

# make a prediction
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]

…
Lg September 2, 2017 at 12:40 am #

Hi Jason,

Great and useful article.

I am somewhat puzzled by the number of features you specify to forecast the pollution rate based on data from the previous 24 hours.

Do not we have 8 features for each time-step and not 7?

After generating data to supervise with the function series_to_supervised(scaled,24, 1), the resulting array has a shape of (43800, 200) which is 25 * 8.

To invert the scaling for forecast I made few modifications. I used scaled.shape[1] below but in my opinion it could be n_features. Moreover, I don’t know if the values concatenated to yhat and test_y really matter, as long as they have been scaled with fit_transform and the array has the right shape.

yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], n_obs))

# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, 1:scaled.shape[1]]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]

# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:scaled.shape[1]]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]

The model has 4 layers with dropout.
After 200 epochs I have got
loss: 0.0169 – val_loss: 0.0162
And a rmse = 29.173

Regards.
Jason Brownlee September 2, 2017 at 6:13 am #

We have 7 features because we drop one in section “2. Basic Data Preparation”.
lg September 2, 2017 at 5:59 pm #

Hi Jason,

It’s really weird to me :(, as I used your code to prepare the data (pollution.csv) and I have 9 fields in the resulting file.

[date, pollution, dew, temp, press, wnd_dir, wnd_spd, snow, rain]

😯
Jason Brownlee September 3, 2017 at 5:40 am #

Date and wind direction are dropped during data preparation, perhaps you accidentally skipped a step or are reviewing a different file from the output file?
Lg September 3, 2017 at 6:22 pm #

Hi Jason,

So that’s fine, in my case I have 8 features.

When reading the file, the field ‘date’ becomes the index of the dataframe and the field ‘wnd_dir’ is later label encoded, as you do above in “The complete example” lines 42-43.

It is now much clearer for me. I am not puzzled anymore. 😉

Thanks a lot for all the information contained in your articles and your e-books.

They are really very informative.

🙂
Jason Brownlee September 4, 2017 at 4:26 am #

I’m glad to hear that!
Cloud September 20, 2017 at 8:06 pm #

Hi Jason,
I think the output is column var1(t), that means:
train_X, train_y = train[:, 0:n_obs], train[:, -(n_features+1)]
am I right?
In case the “pollution” is in the last column, it is easy to get train[:, -1]
am i right?
I just want to verify that I understand your post.
Thank you, Jason
Hesam October 11, 2017 at 9:39 pm #

I have some confusion for this problem.

I want to use a bigger windows (I want to go back in time more, for example t-5 to include more data to make a prediction of the time t) and use all of this to predict one variable (such as just the pollution), like you did. I think predicting one variable will be more accurate than predicting many. Such as pollution and temperature.

What should I do to apply more shift?
Jason Brownlee October 12, 2017 at 5:29 am #

I show in another comment how to update the example to use lag obs as input.

I will update the post and add an example to make it clearer.
Kentor October 19, 2017 at 10:01 pm #

First of all, thanks for your work and the effort you put in!

I tried to implement your suggestion for increasing the timesteps (BPTT). I have intergrated your code but I keep getting this error in when reshaping test_X in the prediction step:

test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
ValueError: cannot reshape array of size 490532 into shape (35038,7)

Do you have any tips on how to proceed?
Jason Brownlee October 20, 2017 at 5:34 am #

I will update the post with a worked example. Adding to trello now…
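The numbers in the error above suggest what is going on: 490532 values do not fit into 35038 rows of 7 columns, but they do fit into 35038 rows of 14. A minimal sketch, assuming 2 time steps and 7 features (guessed from the error message, not confirmed by the commenter):

```python
import numpy as np

n_hours, n_features = 2, 7      # assumed: 35038 * 2 * 7 == 490532
n_samples = 35038
test_X = np.zeros((n_samples, n_hours, n_features), dtype="float32")

# with several time steps, the flattened second axis must hold
# n_hours * n_features values, not just the size of the last axis
flat = test_X.reshape((test_X.shape[0], n_hours * n_features))

print(test_X.size)   # 490532
print(flat.shape)    # (35038, 14)
```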
Robert Dan November 23, 2017 at 10:29 pm #

Hi Jason.
In the code you wrote above, should the following code:

train_X = train_X.reshape((train_X.shape[0], n_hours, n_features))

be actually

train_X = train_X.reshape((train_X.shape[0]/n_hours, n_hours, n_features))
Jason Brownlee November 24, 2017 at 9:44 am #

Why is that?
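A small NumPy check may help with this question. It assumes each row of train_X already holds all n_hours * n_features lag values for one sample, so the number of rows is already the number of samples and no division is needed. The sizes below are hypothetical.

```python
import numpy as np

n_hours, n_features = 3, 8
n_samples = 5                    # small hypothetical sample count

# one row per sample, each row holding every lag value for that sample
train_X = np.arange(n_samples * n_hours * n_features, dtype="float32")
train_X = train_X.reshape(n_samples, n_hours * n_features)

train_X_3d = train_X.reshape((train_X.shape[0], n_hours, n_features))
print(train_X_3d.shape)          # (5, 3, 8)

# the first time step of sample 0 is the first n_features values of row 0
print(np.array_equal(train_X_3d[0, 0], train_X[0, :n_features]))   # True
```

Note also that shape[0]/n_hours would produce a float in Python 3, which reshape does not accept as a dimension.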
vivi March 7, 2020 at 2:10 pm #

Hi, Jason. I am a new learner. First, thank you for sharing! But when I run the complete code, it raises an error: pyplot.plot(history.history['val_loss'], label='test')
KeyError: 'val_loss'

How can I solve it?
Jason Brownlee March 8, 2020 at 6:03 am #

Perhaps you did not use a validation dataset when fitting the model. In that case you cannot plot validation loss.
Anjana Rajakumar August 27, 2020 at 12:48 am #

Hi Jason,
Thank you for this excellent tutorial. I recently started working with LSTM methods. I have a question regarding the input shape. If n_hours > 1, how do I inverse transform the scaled values? Thanks in advance.
Jason Brownlee August 27, 2020 at 6:19 am #

You’re welcome.

This will help with the input shape:
https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input

Arun August 18, 2017 at 12:45 am #

Hi Jason, I get the following error from line # 82 of your ‘Complete Example’ code.

ValueError: Error when checking : expected lstm_1_input to have 3 dimensions, but got array with shape (34895, 8)

I think LSTM() is looking for (sequences, timesteps, dimensions). In your code, line # 70, I believe 50 is timesteps while input_shape (1,8) represents the dimensions. Maybe it’s missing ‘sequences’?

Appreciate your response.

Reply
- Jason Brownlee August 18, 2017 at 6:25 am #
  
  Ensure that you first prepare the data (e.g. convert “raw.csv” to “pollution.csv”).
  
  Reply
- Sameer January 31, 2018 at 11:53 pm #
  
  I have the same error too. Cannot figure out what’s wrong
  
  Reply
  - Timmy January 25, 2019 at 2:18 am #
    
    Something changed, the problem is on the model evaluation section, specifically the reshape line
    
    test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
    
    as it is, is 2 dimensions (34895, 8)
    
    we need to add one dimension but I can’t figure out how (noob here)
    
    tried this: test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
    
    but didn’t work (IndexError: tuple index out of range)
    
    any ideas anyone?
    
    Reply
    - Jason Brownlee January 25, 2019 at 8:46 am #
      
      You can use the reshape() function or the expand_dims() function in NumPy.
      https://docs.scipy.org/doc/numpy/
      
      Does that help?
      
      Reply
- Edward October 26, 2018 at 2:42 am #
  
  Greetings Sir..
  
  I’ve run into the same problem as well. And I’m confident that I’m using “pollution.csv” data.. How can I rectify this?
  
  Reply
  - Jason Brownlee October 26, 2018 at 5:39 am #
    
    I have some suggestions here:
    https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
    
    Reply
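The two NumPy options mentioned in this thread for turning a 2D array into the 3D input an LSTM expects can be sketched as follows. The array is a zero-filled stand-in with the (34895, 8) shape quoted in the error message.

```python
import numpy as np

test_X = np.zeros((34895, 8), dtype="float32")   # 2D: (samples, features)

# option 1: reshape to (samples, timesteps, features) with a single time step
a = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

# option 2: insert a new axis of length 1 at position 1
b = np.expand_dims(test_X, axis=1)

print(a.shape, b.shape)   # (34895, 1, 8) (34895, 1, 8)
```

The IndexError reported above is consistent with calling test_X.shape[2] on an array that only has two axes.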
Neal Valiant August 18, 2017 at 2:35 am #

Hi Jason, I am wondering what is causing the issue I’m getting, maybe a different type of dataset than the example one. Basically, when I fit the model and check history.history.keys(), I only get back ‘loss’ as my only key.

Reply
- Jason Brownlee August 18, 2017 at 6:27 am #
  
  You must specify the metrics to collect when you compile the model.
  
  For example, in classification:
  
  model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
  
  Reply
  - max ver April 15, 2019 at 4:40 am #
    
    Hi Jason,
    
    If you replace the target in this example with a binary target, say one that indicates whether var_1 goes up or not in the next move, thus:
    
    reframed['var1(t)_diff'] = reframed['var1(t)'].diff(1)
    reframed['target_diff'] = reframed['var1(t)_diff'].apply(lambda x: (x > 0) * 1)
    
    it gives this error:
    "You are passing a target array of shape (8760, 1) while using as loss categorical_crossentropy. categorical_crossentropy expects targets to be binary matrices (1s and 0s) of shape (samples, classes). If your targets are integer classes, you can convert them to the expected format via:"
    
    I have :
    test_y.shape as (35038,)
    
    but if we follow another example from you with the PIMA dataset on a simple classification : https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
    
    which was :
    X = dataset[:,0:8]
    Y = dataset[:,8]
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X, Y, epochs=150, batch_size=10)
    
    it gives no error even though Y has the same shape. Why?
    
    How can we make it work for LSTM classification, please?
    
    Thanks
    
    Reply
    - Jason Brownlee April 15, 2019 at 7:55 am #
      
      I have an example of LSTMs for time series classification here:
      https://machinelearningmastery.com/how-to-develop-rnn-models-for-human-activity-recognition-time-series-classification/
      
      Reply
      - max ver April 16, 2019 at 11:21 pm #
        
        Yes thanks I looked at it:
        If you take one example from it:
        trainX, trainy = load_dataset_group('train', path + 'HARDataset/')
        trainy = trainy - 1
        
        Note :
        set(list(pd.DataFrame(trainy)[0]))
        Out[217]: {0, 1, 2, 3, 4, 5}
        
        But
        trainy_postcat = to_categorical(trainy)
        print(trainy_postcat.shape)
        gives
        (7352, 7)
        
        which means one additional variable has been created while we were expecting 6 dummies only.
        
        pd.DataFrame(trainy_postcat)[0].sum() gives 0 so empty column for 1st one
        
        Coming back to the shape for the LSTM,
        the output of your preprocessing gives:
        
        trainy_postcat.shape
        Out[219]: (7352, 7)
        
        which, for a single dummy (the case of this article and my original question),
        is analogous to
        "You are passing a target array of shape (8760, 1)"
        which should be fine.
        
        Any ideas? The activity recognition example does not resolve the shape issue.
      - Jason Brownlee April 17, 2019 at 7:02 am #
        
        Sorry, I don’t have the capacity to review/debug your code, more here:
        https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
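The extra column reported in this thread is what one-hot encoding produces when integer labels start at 1 instead of 0. To my knowledge, Keras' to_categorical sizes its output as the largest label plus one when the number of classes is not given. The `one_hot` helper below is a hypothetical NumPy stand-in written for illustration, not the Keras function itself.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """NumPy stand-in for one-hot encoding of integer class labels."""
    labels = np.asarray(labels, dtype=int).ravel()
    if num_classes is None:
        num_classes = labels.max() + 1        # columns 0 .. largest label
    out = np.zeros((labels.size, num_classes), dtype="float32")
    out[np.arange(labels.size), labels] = 1.0
    return out

raw = np.array([1, 2, 3, 4, 5, 6])            # labels that start at 1

print(one_hot(raw).shape)                     # (6, 7)
print(one_hot(raw)[:, 0].sum())               # 0.0
print(one_hot(raw - 1).shape)                 # (6, 6)
```

For a single 0/1 target of shape (samples, 1), as in the original question, the usual pairing is one sigmoid output unit with binary_crossentropy, as in the Pima example quoted above, rather than categorical_crossentropy.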
Aman Garg August 18, 2017 at 4:18 pm #

Hello Jason,

Thank you for such a nice tutorial.

Since you have published a similar topic and few other related topics in one of your paid books (LSTM networks), should the reader also expect some different topics covered in it?

I’m an ardent fan of your blog since it covers most of the learning material, and therefore it makes me wonder what will be different in your book?

Reply
- Jason Brownlee August 18, 2017 at 4:42 pm #
  
  Thanks Aman.
  
  The book does not cover time series, instead it focuses on teaching you how to implement a suite of different LSTM architectures, as well as prepare data for your problems.
  
  Some ideas were tested on the blog first, most are only in the book.
  
  You can see the full table of contents here:
  https://machinelearningmastery.com/lstms-with-python/
  
  The book provides all the content in one place, code as well, more access to me, updates as I fix bugs and adapt to new APIs, and it is a great way to support my site so I can keep doing this.
  
  Reply
Songbin Xu August 18, 2017 at 6:54 pm #

Thank you for accepting my opinions, such a pleasure!

Running the code you modified, something still puzzles me:

1. Have you drawn the waveforms of inv_y and inv_yhat in the same plot? I think they look quite like persistence.

2. Curiously, I computed the RMSE between pollution(t) and pollution(t-1) in test_X; it’s 4.629, much lower than your final score of 26.496. Does it mean the LSTM performs even worse than persistence?

3. I’ve tried removing var1 at t-1, t-2, …, and I’ve also tried using lag values > 1 and assigning different weights to the inputs at different timesteps, but none of them improved; they performed even worse.

Do you have any other ideas to prevent the whole model from learning persistence?

Looking forward to your advice 🙂

Reply
- Jason Brownlee August 19, 2017 at 6:14 am #
  
  Thank you for pointing out the fault!
  
  The final line plot shows loss on the transformed train and test sets.
  
  Yes, LSTMs are no good at autoregression, yet I keep getting asked to develop examples (tens of emails per day)… See here:
  https://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/
  
  Consider developing a baseline with an MLP, you’ll find it tough to beat it with an LSTM!
  
  Reply
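The persistence comparison raised in this thread can be sketched in a few lines of NumPy. The readings below are made-up values, and the `rmse` helper is a hypothetical convenience function. For a fair comparison, both the model's RMSE and the baseline's RMSE should be computed in the same units (both scaled or both original).

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# hypothetical hourly pollution readings, in original units
pollution = np.array([30.0, 34.0, 33.0, 40.0, 38.0, 45.0])

actual = pollution[1:]         # value at time t
persistence = pollution[:-1]   # forecast for t is simply the observation at t-1

print(round(rmse(actual, persistence), 3))   # 4.879
```

A model whose test RMSE is not clearly below this baseline has learned little beyond copying the previous value.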
Varuna Jayasiri August 19, 2017 at 2:51 pm #

Why are you only training with a single timestep (or sequence length)? Shouldn’t you use more timesteps for better training/prediction? For instance in https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py they use 40 (maxlen) timesteps

Reply
- Jason Brownlee August 20, 2017 at 6:05 am #
  
  Yes, it is just an example to help you get started. I do recommend using multiple time steps in order to get the full BPTT.
  
  Reply
  - Long.Ye August 23, 2017 at 11:06 am #
    
    Hi Jason and Varuna,
    
    When the timesteps = 1 as you mentioned, does it mean the value at time t-1 was used to predict the value at time t? Is a moving window a method for using multiple time steps? Is there any other way? Does Keras have any functions for a moving window?
    
    Thank you very much.
    
    Reply
    - Jason Brownlee August 23, 2017 at 4:23 pm #
      
      Keras treats the “time steps” of a sequence as the window, kind of. It is the closest match I can think of.
      
      Reply
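The idea of treating the time steps as a moving window can be illustrated with plain NumPy. `make_windows` is a hypothetical helper written for this sketch; it assumes the target is the first column one step after each window.

```python
import numpy as np

def make_windows(series, n_steps):
    """Split a (time, features) array into overlapping windows and next-step targets."""
    X, y = [], []
    for start in range(len(series) - n_steps):
        X.append(series[start:start + n_steps])   # n_steps consecutive rows
        y.append(series[start + n_steps, 0])      # first column, one step later
    return np.array(X), np.array(y)

series = np.arange(20, dtype="float32").reshape(10, 2)   # 10 time steps, 2 features
X, y = make_windows(series, n_steps=3)

print(X.shape, y.shape)   # (7, 3, 2) (7,)
```

The resulting X already has the (samples, timesteps, features) layout, so with n_steps=1 each sample would carry only the observation at t-1.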
lymlin August 20, 2017 at 4:28 pm #

Hi Jason,
I ran into a problem while studying your code.

dataset = read_csv(‘D:\Geany\scriptslym\raw.csv’, parse_dates = [[‘year’, ‘month’, ‘day’, ‘hour’]],index_col=0, data_parser=parse)
Traceback (most recent call last):
File “”, line 1, in
dataset = read_csv(‘D:\Geany\scriptslym\raw.csv’, parse_dates = [[‘year’, ‘month’, ‘day’, ‘hour’]],index_col=0, data_parser=parse)
NameError: name ‘parse’ is not defined
>>>

Reply
- Jason Brownlee August 21, 2017 at 6:04 am #
  
  It looks like you have specified a function “parse” but not defined it.
  
  Reply
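For reference, a date-parsing helper of the kind this call expects might look like the sketch below. It assumes the year, month, day and hour columns reach the function joined by single spaces; the sample string is made up.

```python
from datetime import datetime

def parse(x):
    # assumes input such as '2010 1 2 0' (year month day hour, space separated)
    return datetime.strptime(x, '%Y %m %d %H')

print(parse('2010 1 2 0'))   # 2010-01-02 00:00:00
```

The function has to be defined before read_csv is called. Note also that, in the pandas releases this tutorial was written against, the keyword is spelled date_parser rather than data_parser as typed in the snippet above.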
guntama August 21, 2017 at 11:30 am #

Hi Jason,
Can I use “keras.layers.normalization.BatchNormalization” as a substitute for “sklearn.preprocessing.MinMaxScaler”?

Reply
- Jason Brownlee August 21, 2017 at 4:22 pm #
  
  No, they do very different things.
  
  Reply
Naveen Koneti August 21, 2017 at 10:56 pm #

Hi Jason, it’s a very informative article. Thanks. I have a question regarding forecasting in time series. You trained on all the columns after the variable transformations, and the same was done for the test data. The test data, along with all its variables, was used during prediction. For instance, if I want to predict the pollution for a future date, do I need to know the other inputs like dew, pressure, wind direction, etc. for that future date, which I don’t have? Another question: suppose we have the same data for multiple regions (let us assume that the pollution among these regions is not negligible). How can we build the model so that the input when predicting is the region name along with the time, to forecast just that one region?

Reply
- Jason Brownlee August 22, 2017 at 6:43 am #
  
  It depends on how you define your model.
  
  The model defined above uses the variables from the prior time step as inputs to predict the next pollution value.
  
  In your case, maybe you want to build a separate model per region, perhaps a model that improves performance by combining models across regions. You must experiment to see what works best for your data.
  
  Reply
  - Naveen Koneti August 24, 2017 at 4:12 pm #
    
    Thanks! I missed the trick of converting the time-series to supervised learning problem. That alone is sufficient even for multiple regions I guess. We just have to submit the input parameters of the previous time stamp for the specific region during prediction. We may also try one-hot encoding on the region variable too during data preprocessing.
    
    Reply
  - LY September 7, 2017 at 8:12 pm #
    
    Thank you for your excellent blog, Jason. I’ve really learnt a lot from your work recently. After this post, I now know how to transform data into the format an LSTM expects and how to construct an LSTM model.
    
    I have the same puzzle as the question asked by Naveen Koneti.
    Recently I’ve worked on some clinical data. The data is not like the data used in this demo. It consists of hundreds of patients, and each patient has several vital sign records. If it were one individual’s records over many years, I could process the data as you showed us. I wonder how I can handle this kind of data. Could you give me some advice, or tell me where I can find solutions for it?
    If I didn’t state my question clearly and you’re interested in it, please let me know.
    Thanks in advance.
    
    PS. the data set in my situation is like this
    [ID date feature1 feature2 feature3 ]
    [patient1 date1 value11 value12 value13 ]
    [patient1 date2 value21 value22 value23 ]
    [patient2 date1 value31 value32 value33 ]
    [patient2 date2……………………………………..]
    [patient3 ……………………………………………..]
    
    Reply
    - Jason Brownlee September 9, 2017 at 11:43 am #
      
      You could model one patient at a time, or groups or all of them. Try different approaches and see what works best.
      
      I cannot tell you what would work best – I have no idea – you must discover it.
      
      See this post:
      https://machinelearningmastery.com/a-data-driven-approach-to-machine-learning/
      
      Reply
- Fabio Ferrari March 28, 2018 at 7:12 pm #
  
  Hi Naveen, I have the same question as you: the model is defined such that if you know the input features at time t, then you can predict the target value at time t+1. If you want to predict the target variable at time t+2, though, you would need to know the input features at time t+1. If a feature does not change over time, it is no problem; but if a feature changes over time, then its value at time t+1 is not known and may be different from its value at time t.
  I am thinking that to solve this, you would need to define such features as output of the model as well as the target variable. In this way, at time t, you can predict the target variable for time t+1, but also the feature for time t+1, so that this predicted value can be used as input to predict the target variable for time t+2.
  
  What do you think about that? Did you think of a different solution?
  Many thanks
  
  Reply
Chris August 21, 2017 at 11:23 pm #

Hi,
again a nice post for the use of lstm’s!

I had the following idea when reading.

I would like to build a network, in which each feature has its own LSTM neuron/layer, so that the input is not fully connected.
My idea is adding a lstm layer for each feature and merge it with the merge layer and feed these results to the output neurons.

Is there a better way to do this? Or would you recommend to avoid this because the features are poorly abstracted? On the other hand, this might also be interesting.

Thank you!

Reply
- Jason Brownlee August 22, 2017 at 6:44 am #
  
  Try it and see if it can out-perform a model that learns all features together.
  
  Also, contrast to an MLP with a window – that often does better than LSTMs on autoregression problems.
  
  Reply
Tryfon August 22, 2017 at 5:20 am #

Hi Jason,

I have two questions:

1) I have a question/observation regarding the scaling of the Y variable (pollution). The way you implement the rescaling to [0, 1], you consider the entire length of the array (all of the 43799 observations, after the dropna).

Is it right to rescale it that way? By doing so we are incorporating information from the future (test set) into the past (train set), because the scaler is “exposed” to both of them, and therefore we introduce bias.

If you agree with my point what could be a fix?

2) Also the activation function of the output (Y variable) is sigmoid, that’s why we rescale it within the [0,1] range. Am I correct?

Thanks for sharing the article!

Reply
- Jason Brownlee August 22, 2017 at 6:49 am #
  
  No, ideally you would develop a scaling procedure on the training data and use it on test and when making predictions on new data.
  
  I tried to keep the tutorial simple by scaling all data together.
  
  The activation on the output layer is ‘linear’, the default. This must be the case because we are predicting a real-value.
  
  Reply
  - Fati March 7, 2018 at 9:44 pm #
    
    Hi,
    
    First, I want to thank you for your helpful and practical blog.
    
    I tried to separate the train and test sets to do normalization on the training set only, but I got an error related to the test set shape, something like “ValueError: cannot reshape array of size 136 into shape (34,2,4)”, which I don’t know how to fix!
    Do you have an example of an LSTM that fits normalization on train and applies it to test, or do you explain that in your book?
    
    Thanks
    
    Reply
    - Jason Brownlee March 8, 2018 at 6:28 am #
      
      This post will help you learn how to reshape your input data:
      https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
      
      Reply
  - Fati March 7, 2018 at 10:25 pm #
    
    Hi,
    
    I made some changes and now only use the transform method on the test set. Is that correct?
    First, I divided my dataset into two different sets (train and test).
    Second, I ran fit_transform on the train set and transform on the test set.
    
    But I get RMSE = 0, which seems weird. Am I correct?
    
    Reply
    - Jason Brownlee March 8, 2018 at 6:30 am #
      
      Sounds correct.
      
      An RMSE of zero suggests a bug or a very simple modeling problem.
      
      Reply
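The train-only scaling discussed in this thread can be sketched without any library beyond NumPy. The data is random and purely illustrative. With scikit-learn, the equivalent steps would be the fit_transform on train and transform on test that Fati describes.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(0, 100, size=(10, 3))   # hypothetical (time, features) data
train, test = values[:7], values[7:]          # split BEFORE scaling, keeping time order

# learn the column minimum and maximum from the training rows only
col_min = train.min(axis=0)
col_max = train.max(axis=0)

train_scaled = (train - col_min) / (col_max - col_min)
test_scaled = (test - col_min) / (col_max - col_min)   # may fall outside [0, 1]

print(train_scaled.min(), train_scaled.max())   # 0.0 1.0
print(test_scaled.shape)                        # (3, 3)
```

Scaling this way leaves the arrays 2D, so the reshape to (samples, timesteps, features) still has to match the number of values actually present in each set.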
WCH August 22, 2017 at 5:25 pm #

Thank you very much for your tutorial.

I have one question:

I failed to read the NW value in pollution.csv (cbwd column).

values = values.astype('float32')
ValueError: could not convert string to float: NW

How do you fix it?

Reply
- WCH August 22, 2017 at 5:30 pm #
  
  sorry, I saw the text above and solved it.
  
  Reply
  - Jason Brownlee August 23, 2017 at 6:42 am #
    
    Glad to hear it!
    
    Reply
- Juno Huang June 29, 2018 at 7:08 am #
  
  Hi, I would like to know how did you fix it? I still have that problem, tried to find the solution above but didn’t find one. Thank you !
  
  Reply
  - Can Altas August 17, 2018 at 3:35 pm #
    
    You have to prepare the data before you convert (see “Basic Data Preparation”). In Jason’s complete example of the LSTM, this preparation step is missing (more likely left out).
    
    Reply
    - Jason Brownlee August 18, 2018 at 5:33 am #
      
      Yes the note above the complete example says clearly:
      
      NOTE: This example assumes you have prepared the data correctly, e.g. converted the downloaded “raw.csv” to the prepared “pollution.csv“. See the first part of this tutorial.
      
      Reply
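The "could not convert string to float: NW" error in this thread comes from the wind-direction column still holding text when astype('float32') is called. The sketch below shows the integer-encoding step with NumPy as a stand-in for a label encoder; the sample strings are made up.

```python
import numpy as np

wnd_dir = np.array(['SE', 'NW', 'cv', 'NE', 'NW'])   # hypothetical wind-direction strings

# np.unique sorts the distinct labels and returns, for each row,
# its index into that sorted list
classes, encoded = np.unique(wnd_dir, return_inverse=True)

print(classes)                     # ['NE' 'NW' 'SE' 'cv']
print(encoded)                     # [2 1 3 0 1]
print(encoded.astype('float32'))   # [2. 1. 3. 0. 1.]
```

Once the column holds integers, the float conversion goes through without complaint.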
Dmitry August 22, 2017 at 5:58 pm #

Hi Jason!
I think there is a small mistake when you calculate RMSE on the test data.
You need to add this code before calculating RMSE:

inv_y = inv_y[:-1]
inv_yhat = inv_yhat[1:]

Thus, RMSE equals 10.6 (on the same data, in my case), that is much less than 26.5 in your case.

Reply
- Jason Brownlee August 23, 2017 at 6:44 am #
  
  Sorry, I don’t understand your comment and snippet of code, can you spell out the bug you see?
  
  Reply
  - Tommy November 12, 2017 at 2:50 pm #
    
    This bears further exploration
    
    Reply
  - Azhar Khan December 22, 2017 at 11:42 pm #
    
    I agree with @Dmitry here. The prediction “inv_yhat” is one index ahead of real output “inv_y”.
    
    It can be seen by plotting predicted output vs. real output:
    pyplot.plot(inv_y[:-1,], color='green', marker='o', label='Real Screening Count')
    pyplot.plot(inv_yhat[1:,], color='red', marker='o', label='Predicted Screening Count')
    pyplot.legend()
    pyplot.show()
    
    Compute RMSE by skipping the first element of inv_yhat, and a better RMSE score results:
    rmse = sqrt(mean_squared_error(inv_y[:-1,], inv_yhat[1:,]))
    print('Test RMSE: %.3f' % rmse)
    
    rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
    print('Test RMSE: %.3f' % rmse)
    
    Reply
jan August 22, 2017 at 11:01 pm #

Hi Jason,

great post! I was waiting for meteo problems to infiltrate the machinelearningmastery world.

Could you write something about a changed scenario where, given the weather conditions and pollution for some time, we can predict the pollution for another time or place with given weather conditions?

For example: we have the weather conditions and pollution for Beijing in 2016, and we have the weather conditions for Chengde (a city close to Beijing), also in 2016. Now we want to know what the pollution was in Chengde in 2016.

Would be great to learn about that!

Reply
- Jason Brownlee August 23, 2017 at 6:52 am #
  
  Great suggestion, I like it. An approach would be to train the model to generalize across geographical domains based only on weather conditions.
  
  I have tried not to use too many weather examples – I came from 6 years of work in severe weather, it’s too close to home 🙂
  
  Reply
Simone August 23, 2017 at 9:43 am #

Hi Jason,
I have read many of your posts about LSTMs. The difference between the batch_size and time_steps parameters is still not completely clear to me. Batch_size determines when the memory is reset (right?), but shouldn’t this have the same value as time_steps, which, if I have understood correctly, determines how often the system makes a prediction?

Reply
- Jason Brownlee August 23, 2017 at 4:22 pm #
  
  Great question!
  
  Batch size is the number of samples (e.g. sequences) that are used to estimate the gradient before the weights are updated. The internal state is reset at the end of each batch after the weights are updated.
  
  One sample is comprised of 1 or more time steps that are stepped over during backpropagation through time. Each time step may have one or more features (e.g. observations recorded at that time).
  
  Time steps and batch size are generally not related.
  
  You can split up a sequence to have one time step per sequence. In that case you will not get the benefit of learning across time (e.g. BPTT), but you can reset state at the end of the time steps for one sequence. This is an odd config though, and really only good for showing off the LSTM’s memory capability.
  
  Does that help?
  
  Reply
  - Simone August 24, 2017 at 6:26 am #
    
    Thanks, now it’s more clear!
    
    Reply
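The distinction drawn in this thread can be made concrete with array shapes. The sizes below are assumptions for illustration (8760 samples of one time step and 8 features, and a batch size of 72).

```python
import math
import numpy as np

n_samples, n_timesteps, n_features = 8760, 1, 8   # assumed input dimensions
batch_size = 72                                    # assumed batch size

X = np.zeros((n_samples, n_timesteps, n_features), dtype="float32")

# time steps live inside each sample (axis 1);
# batch size only groups samples (axis 0) per weight update
updates_per_epoch = math.ceil(X.shape[0] / batch_size)

print(X.shape)              # (8760, 1, 8)
print(updates_per_epoch)    # 122
```

Changing n_timesteps alters what each sample contains, while changing batch_size only alters how many samples contribute to each gradient estimate.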
Pedro August 23, 2017 at 8:58 pm #

Hi, I get this error at this step, could you help me please?

model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))

—————————————————————————
TypeError Traceback (most recent call last)
in ()
—-> 1 model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))

C:\Anaconda3\lib\site-packages\keras\models.py in add(self, layer)
431 # and create the node connecting the current layer
432 # to the input layer we just created.
–> 433 layer(x)
434
435 if len(layer.inbound_nodes) != 1:

C:\Anaconda3\lib\site-packages\keras\layers\recurrent.py in __call__(self, inputs, initial_state, **kwargs)
241 # modify the input spec to include the state.
242 if initial_state is None:
–> 243 return super(Recurrent, self).__call__(inputs, **kwargs)
244
245 if not isinstance(initial_state, (list, tuple)):

C:\Anaconda3\lib\site-packages\keras\engine\topology.py in __call__(self, inputs, **kwargs)
556 ‘layer.build(batch_input_shape)‘)
557 if len(input_shapes) == 1:
–> 558 self.build(input_shapes[0])
559 else:
560 self.build(input_shapes)

C:\Anaconda3\lib\site-packages\keras\layers\recurrent.py in build(self, input_shape)
1010 initializer=bias_initializer,
1011 regularizer=self.bias_regularizer,
-> 1012 constraint=self.bias_constraint)
1013 else:
1014 self.bias = None

C:\Anaconda3\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
86 warnings.warn(‘Update your ' + object_name + 87 ' call to the Keras 2 API: ‘ + signature, stacklevel=2)
—> 88 return func(*args, **kwargs)
89 wrapper._legacy_support_signature = inspect.getargspec(func)
90 return wrapper

C:\Anaconda3\lib\site-packages\keras\engine\topology.py in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint)
389 if dtype is None:
390 dtype = K.floatx()
–> 391 weight = K.variable(initializer(shape), dtype=dtype, name=name)
392 if regularizer is not None:
393 self.add_loss(regularizer(weight))

C:\Anaconda3\lib\site-packages\keras\layers\recurrent.py in bias_initializer(shape, *args, **kwargs)
1002 self.bias_initializer((self.units,), *args, **kwargs),
1003 initializers.Ones()((self.units,), *args, **kwargs),
-> 1004 self.bias_initializer((self.units * 2,), *args, **kwargs),
1005 ])
1006 else:

C:\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py in concatenate(tensors, axis)
1679 return tf.sparse_concat(axis, tensors)
1680 else:
-> 1681 return tf.concat([to_dense(x) for x in tensors], axis)
1682
1683

C:\Anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py in concat(concat_dim, values, name)
998 ops.convert_to_tensor(concat_dim,
999 name=”concat_dim”,
-> 1000 dtype=dtypes.int32).get_shape(
1001 ).assert_is_compatible_with(tensor_shape.scalar())
1002 return identity(values[0], name=scope)

C:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
667
668 if ret is None:
–> 669 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
670
671 if ret is NotImplemented:

C:\Anaconda3\lib\site-packages\tensorflow\python\framework\constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
174 as_ref=False):
175 _ = as_ref
–> 176 return constant(v, dtype=dtype, name=name)
177
178

C:\Anaconda3\lib\site-packages\tensorflow\python\framework\constant_op.py in constant(value, dtype, shape, name, verify_shape)
163 tensor_value = attr_value_pb2.AttrValue()
164 tensor_value.tensor.CopyFrom(
–> 165 tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
166 dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
167 const_tensor = g.create_op(

C:\Anaconda3\lib\site-packages\tensorflow\python\framework\tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape)
365 nparray = np.empty(shape, dtype=np_dt)
366 else:
–> 367 _AssertCompatible(values, dtype)
368 nparray = np.array(values, dtype=np_dt)
369 # check to them.

C:\Anaconda3\lib\site-packages\tensorflow\python\framework\tensor_util.py in _AssertCompatible(values, dtype)
300 else:
301 raise TypeError(“Expected %s, got %s of type ‘%s’ instead.” %
–> 302 (dtype.name, repr(mismatch), type(mismatch).__name__))
303
304

TypeError: Expected int32, got list containing Tensors of type ‘_Message’ instead.

Reply
- Jason Brownlee August 24, 2017 at 6:36 am #
  
  Perhaps check that your environment is setup correctly:
  https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
  
  Also, ensure that you have copied all of the code.
  
  Reply
Neal Valiant August 24, 2017 at 2:49 am #

Hi Jason,
I was curious if you can point me in the right direction for converting data back to the actual values instead of scaled.

Reply
- Jason Brownlee August 24, 2017 at 6:48 am #
  
  Yes, you can invert the scaling.
  
  This tutorial demonstrates how to do that Neal.
  
  Reply
  - Neal Valiant August 25, 2017 at 7:34 am #
    
    Hi Jason, I did have an issue converting back to actual values, but was able to get past it by dropping columns on the reframed data.
    
    When looking at my predicted values vs. actual values, I notice that my first column has a prediction and a true value, but for every other variable I only see what I assume is a prediction. Does this make a prediction for every column, or just one particular one?
    
    I’m sorry for asking a question like this; I think I’m just confusing myself looking at my results.
    
    Reply
    - Jason Brownlee August 25, 2017 at 3:56 pm #
      
      The code in the tutorial only predicts pollution.
      
      Reply
Jack Dan August 24, 2017 at 3:24 am #

Dr. Jason,
I have been trying with my own dataset and I am getting an error “ValueError: operands could not be broadcast together with shapes (168,39) (41,) (168,39)” when I try to do inv_yhat = scaler.inverse_transform(inv_yhat) as you have in line 86 in your script. I still can not figure out where my issue is. I have yhat.shape as (168,1) and test_X.shape as (168,38). When I do this, inv_yhat = np.concatenate((yhat, test_X[:, 1:]), axis=1), my inv_yhat.shape is (168,39). I still can not figure why inverse_transform gives that error.

Reply
- Jason Brownlee August 24, 2017 at 6:50 am #
  
  The shape of the data must be the same when inverting the scale as when it was originally scaled.
  
  This means, if you scaled with the entire test dataset (all columns), then you need to tack the yhat onto the test dataset for the inverse. We jump through these exact hoops at the end of the example when calculating RMSE.
  
  Reply
  - Jay Regalia August 24, 2017 at 7:29 am #
    
    This seems to be the same issue I am having at the moment. I concatenate my inv_yhat with my test_X as you said, but the shape of inv_yhat afterwards still does not match the second shape in the error (in the post’s case, (41,)).
    
    Reply
    - Jack Dan August 26, 2017 at 6:00 am #
      
      Ask a question in stackoverflow and post the link, I should be able to help. I spent lots of time on this and have a decent idea now.
      
      Reply
  - Jack Dan August 24, 2017 at 7:39 am #
    
    Yes, you’re right! I did that and it worked, nice! Thank you for your comment!
    
    Reply
    - Jason Brownlee August 24, 2017 at 4:24 pm #
      
      Glad to hear that Jack.
      
      Reply
    - jehung January 24, 2018 at 2:27 pm #
      
      How did you solve the problem??
      
      Reply
    - Tom June 8, 2018 at 7:58 pm #
      
      Here’s a link to a solution on Stack Exchange:
      https://datascience.stackexchange.com/questions/22488/value-error-operands-could-not-be-broadcast-together-with-shapes-lstm
      
      Reply
      - Jason Brownlee June 9, 2018 at 6:51 am #
        
        Nice!
  - John Regilina August 24, 2017 at 8:38 am #
    
    I am having the same problem, but cannot solve the issue. Every time I try to concatenate them together, there is no change to my inv_yhat variable. I am still unable to understand this issue; if you could expand a bit more, that would be amazing.
    
    Reply
    - Jack Dan August 26, 2017 at 6:08 am #
      
      @John Regilina,
      Check the shape of the data after you scale it, and then check the shape again after you do the concatenation. Remember, your yhat shape will be (rowlength, 1), and after concatenation inv_yhat should have the same shape as the data had when you scaled it. Look at Dr. Jason’s answer to my comment/question. Hope that will help. (Thanks to Dr. Jason, who saved a lot of my time.)
      
      Reply
  - Sabyasachi Purkayastha May 18, 2018 at 10:48 pm #
    
    Hello Sir, thank you for the awesome tutorial. But I still couldn’t understand what exactly needs to be done. I am getting the error:
    > operands could not be broadcast together with shapes (12852,27) (14,) (12852,27)
    This is the line which generates the error:
    inv_yhat = scaler.inverse_transform(inv_yhat).fit()
    Could you please give me a small example to understand what went wrong. Thanks in advance Sir.
    
    Reply
- Shan September 19, 2017 at 1:59 pm #
  
  I am also stuck with same thing. How did you fix it?
  
  Reply
  - anna March 26, 2018 at 11:33 pm #
    
    Same question here, how did everyone fix this? From your answers I cannot deduce what exactly went wrong in your case, and what you did to solve it.
    
    Reply
- Machiraju Yashwanth May 10, 2021 at 5:55 am #
  
  I am running into the same problem when I try it on my dataset, where np.shape(test_X) is (89070,13). Kindly help me out if you have a solution.
  
  Reply
  - Jason Brownlee May 10, 2021 at 6:23 am #
    
    This will help with preparing data for LSTMs:
    https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
    
    Reply
Lizzie August 24, 2017 at 4:23 am #

Hi Jason, in dataset.drop('No', axis=1, inplace=True), what is the purpose of 'axis' and 'inplace'?

Reply
- Jason Brownlee August 24, 2017 at 6:50 am #
  
  Great question.
  
  We specify axis=1 to remove a column (rather than a row), and inplace=True to modify the existing DataFrame in memory rather than return a copy with the column removed.
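A small sketch of the difference, using a made-up three-row frame with the same 'No' column name as the tutorial's dataset:

```python
import pandas as pd

# Hypothetical miniature of the raw dataset
dataset = pd.DataFrame({
    "No": [1, 2, 3],
    "pm2.5": [129.0, 148.0, 159.0],
    "TEMP": [-4.0, -4.0, -5.0],
})

# Without inplace, drop() returns a new DataFrame and leaves the original alone.
copy = dataset.drop("No", axis=1)
cols_before = list(dataset.columns)  # still includes 'No'

# With inplace=True, the original is modified and the call returns None.
result = dataset.drop("No", axis=1, inplace=True)
cols_after = list(dataset.columns)

print(cols_before, cols_after, result)
```

With axis=0 (the default), pandas would look for a row labelled 'No' instead of a column.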
  
  Reply
Lizzie August 24, 2017 at 4:44 am #

Fabulous tutorials Jason!

Reply
- Jason Brownlee August 24, 2017 at 6:51 am #
  
  Thanks Lizzie.
  
  Reply
Jaskaran August 24, 2017 at 5:19 am #

Can you show what the multivariate forecast looks like?
It looks like you missed it in the article.

Reply
- Jason Brownlee August 24, 2017 at 6:56 am #
  
  Sure,
  
  You can plot all predictions as follows:
  
  pyplot.plot(inv_yhat)
  pyplot.plot(inv_y)
  pyplot.show()
  
  You get:
  
  It’s a mess; you can plot just the last 100 time steps as follows:
  
  pyplot.plot(inv_yhat[-100:])
  pyplot.plot(inv_y[-100:])
  pyplot.show()
  
  You get:
  
  The predictions look like persistence.
  
  Reply
  - BEN BECKER August 29, 2017 at 1:33 pm #
    
    Jason, what am I missing? Looking at your plot of the most recent 100 time steps, it looks like the predicted value is always one time period after the actual. If on step 90 the actual is 17, but the predicted value shows 17 for step 91, we are one time period off. That is, if we shifted the predicted values back one step, they would overlap with the actuals, which doesn’t really buy us much, since the next-hour prediction seems to simply align with the prior actual. Am I missing something looking at this chart?
    
    Reply
    - Jason Brownlee August 29, 2017 at 5:16 pm #
      
      This is what a persistence forecast looks like: predicted value(t) = observed value(t-1).
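A minimal sketch of a persistence baseline on a made-up series. The helper names are hypothetical; the idea is to compute the RMSE of "predict the previous value" so that a model's test RMSE can be compared against it.

```python
import numpy as np

def persistence_forecast(series):
    """Return (predictions, actuals) where each prediction is the prior observation."""
    series = np.asarray(series, dtype=float)
    return series[:-1], series[1:]

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

# Hypothetical hourly pollution readings
obs = [10, 12, 15, 15, 11]
pred, actual = persistence_forecast(obs)
baseline_rmse = rmse(actual, pred)
print(baseline_rmse)
```

If a trained model's RMSE is close to this baseline, and its plot looks like the actuals shifted by one step, it has probably learned little beyond persistence.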
      
      Reply
      - BECKER August 29, 2017 at 9:22 pm #
        
        So how would you get the true predicted value(t)? I am thinking of the last record in the time series where we are trying to predict the value for the next hour.
      - Jason Brownlee August 30, 2017 at 6:15 am #
        
        Sorry, I don’t follow. Perhaps you can restate your question?
      - Anna October 2, 2017 at 4:38 pm #
        
        Hello Jason Brownlee
        
        Thank you for your great posts. I ran the model above on my data and it works perfectly. However, when I plot the real data (the blue line, inv_y) and the prediction (the orange line, inv_yhat), the prediction is delayed by one step; it should come one step earlier, as in your graph. Your model is the same as the MATLAB tool:
        https://nl.mathworks.com/videos/maglev-modeling-with-neural-time-series-tool-68797.html
        
        After training, I applied this model in real time to my problem to compute inv_yhat at every step. The result was really bad, since I never have the real inv_y; I fed the prediction back in as the input (instead of the real data inv_y).
        
        My problem is: I receive some signals as inputs, then label them offline to get the output (the real data inv_y, or the first column in train_X).
        
        Do you have a model that trains without the real data in the first column? Thank you.
      - Jason Brownlee October 3, 2017 at 5:40 am #
        
        Your model may have low skill and be simply predicting the input as the output (e.g. persistence).
        
        You may need to continue to develop your model, I list some ideas for lifting model skill here:
        https://machinelearningmastery.com/improve-deep-learning-performance/
    - Li Yue March 20, 2018 at 6:46 pm #
      
      Hi, I have the same confusion as you. I think the prediction problem should be value_predict(t-1) = value_real(t). The label “train_y” indicates value_real(t+1). We input train_x(t) into the model to get the prediction, and the prediction should match “train_y”, not one step after “train_y”. Did you solve this problem?
      
      Reply
  - Tyler Byers October 26, 2017 at 3:40 am #
    
    It’s definitely similar to a persistence model since we trained the model using the var1(t-1) feature (i.e. the lagged pollution feature). The model certainly found that to be the strongest predictor. This would be ok if we were doing predictions later on an hour-by-hour basis. But, if, say we want to predict the pollution 20 hours from now, we aren’t yet going to know what the hour-19 pollution is. So it seems like cheating to include this variable in the training and prediction sets.
    
    I removed this variable to train the model, leaving other parameters about the same, and was then only able to get a minimum validation loss of 0.55 and test RMSE of 87.02
    
    Reply
    - Jason Brownlee October 26, 2017 at 5:33 am #
      
      Nice work.
      
      It’s not cheating, it comes down to different framings of the problem based on the requirements of the problem.
      
      This post can help if you want to explore direct multi-step forecasting:
      https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
      
      Reply
  - xeo December 26, 2017 at 4:00 am #
    
    It looks like the prediction is pretty good. Can we say the LSTM model is good?
    
    Reply
    - Jason Brownlee December 26, 2017 at 5:18 am #
      
      I think LSTMs are poor at autoregression.
      
      Reply
  - Fiona January 27, 2019 at 10:51 pm #
    
    Hi Jason. I have a question about the transform: I found that the predicted data after inverse_transform() were not on the same scale as the original values. For example, my original data ranges from 0 to 850, but the predictions range from 0 to 8. Is there a problem?
    
    Reply
    - Jason Brownlee January 28, 2019 at 7:14 am #
      
      Perhaps there is a bug in your implementation?
      
      Reply
  - Jay October 23, 2019 at 11:17 am #
    
    Hi Jason
    
    I have two questions:
    
    (a) Based on the graphs that you have shown for inv_y and inv_yhat, it looks like your model has overfit on the test set. Don’t you agree?
    
    (b) In all time series prediction posts I have seen, the validation part uses the tail of the data to do validation (predict(yhat)). How can we modify the code in order to predict the future which is not covered in the dataset.
    
    Reply
    - Jason Brownlee October 23, 2019 at 1:50 pm #
      
      The model in this tutorial is probably underfit – e.g. it learned a persistence model.
      
      Fit the model on all available data, then call model.predict() to predict out of sample.
      
      Reply
gammarayburst August 24, 2017 at 11:32 pm #

Wind dir is label encoded not wind speed!!!

Reply
- Jason Brownlee August 25, 2017 at 6:43 am #
  
  Yes.
  
  Reply
Filipe August 27, 2017 at 4:16 am #

First of all, thanks. All of this material on the blog is super interesting, and helpful and making me learn a lot.

Of course… I have a question.

I’m surprised by the use of LSTMs here. The property of them being “stateful” I guess is being used. But is there “sequence” information flowing?

So when I used LSTMs in Keras for text classification tasks (sentence, outcome), each “sentence” is a sequence. Each observation is a sequence. It’s an ordered array of the words in the sentence (and its outcome).
In this example, I could not see a sense in which var1(t-1) is linked to var1(t-2). Aren’t they being treated as independent Xs in a regression problem? (predicting var8(t))

Reply
- Jason Brownlee August 27, 2017 at 5:53 am #
  
  Correct, we are not providing a sequence of observations and therefore not getting good BPTT.
  
  Based on my tests, I have found LSTMs to be poor at autoregression, and in this case, as I added more history to the model (longer sequences), performance degraded.
  
  I would strongly encourage you to use an MLP as a baseline that any LSTM would have to out-perform.
  
  See this post for more on the limitations of LSTM for time series:
  https://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/
  
  Reply
STYLIANOS IORDANIS August 27, 2017 at 5:23 am #

Awesome article, as always.
Btw, what is your view on using an autoencoder/restricted Boltzmann layer to compress features before feeding an LSTM network? For example, if one has a financial time series to forecast, e.g. a classifier trying to predict an increase or decrease in a look-ahead time window, via numerous technical indicators and/or other candidate exogenous leading indicators…
Could you write an article based on that idea?

Reply
- Jason Brownlee August 27, 2017 at 5:53 am #
  
  I have seen better results from large MLPs, nevertheless, try it and see how you go.
  
  Reply
  - STYLIANOS IORDANIS August 27, 2017 at 7:25 am #
    
    Autoencoder/restricted Boltzmann layers also deal with multicollinearity issues… Do MLPs also cope with multicollinearity in the features?
    
    Reply
    - Jason Brownlee August 28, 2017 at 6:46 am #
      
      MLPs are more robust to multicollinearity than linear models.
      
      Reply
Hee Un August 29, 2017 at 12:28 am #

Hi, I am always amazed at your article. Thank you.
I have a question.
Does this LSTM code weight each feature?
These days I’m predicting precipitation; the trend is correct, but the amount is not right.
What’s wrong with that? :(

Reply
- Jason Brownlee August 29, 2017 at 5:06 pm #
  
  Thanks!
  
  Sorry, I’m not sure I understand the question, perhaps you could rephrase it?
  
  I can say that I would expect better skill if the data was further prepared – e.g. made stationary.
  
  Reply
Vipul August 30, 2017 at 7:53 pm #

Hi Jason,

Thanks for wonderful explanation!
Could you please help me understand the concept of dimensionality reduction? Should PCA or a statistical approach be used before feeding the data to the LSTM, or will the LSTM learn the correlations among the provided inputs on its own? How should I approach a regression problem with an LSTM when I have a large set of features?

Your reply is greatly appreciated!

Reply
- Jason Brownlee August 31, 2017 at 6:18 am #
  
  Generally, if you make the problem simpler using data preparation, the LSTM or any model will perform better.
  
  Reply
Nader August 31, 2017 at 2:42 am #

How can I predict a single input ?
for example :

[0.036, 0.338, 0.197, 0.836, 0.333, 0.128, 0.00000001, 0.0000001]

how do i reshape and do a model.predict () ?

Thank you

Reply
- Jason Brownlee August 31, 2017 at 6:23 am #
  
  Perhaps this post will make it clearer:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
  - Nader August 31, 2017 at 12:48 pm #
    
    Thank you, Jason.
    I applied:
    
    my_x = np.array([0.036, 0.338, 0.197, 0.836, 0.333, 0.128, 0.00000001, 0.0000001])
    print(my_x.shape) # (8,)
    my_x = my_x.reshape((1, 1, 8))
    my_pred = model.predict(my_x)
    print(my_pred)
    
    The answer is the “scaled” answer, which is 0.03436.
    
    I tried applying scaler.inverse_transform(my_pred) to get the actual number.
    
    But I get the following error:
    
    non-broadcastable output operand with shape (1,1) doesn’t match the broadcast shape (1,8)
    
    Thank you
    
    Reply
    - Jason Brownlee September 1, 2017 at 6:40 am #
      
      Yes, the transform requires data in the same form as when you “fit” it.
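Two hedged ways to invert a single scaled prediction, assuming a MinMaxScaler fit on several columns with the target in column 0. The toy data and the value 0.5 are made up. The second option relies on the scaler's fitted min_ and scale_ attributes and is an alternative to the padding approach, not the tutorial's own method.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data: column 0 is the target, column 1 another feature.
data = np.array([[10.0, 1.0], [20.0, 3.0], [40.0, 5.0]])
scaler = MinMaxScaler(feature_range=(0, 1)).fit(data)

my_pred = np.array([[0.5]])  # scaled prediction for column 0, shape (1, 1)

# Option 1: pad the prediction out to the width the scaler was fit on.
padded = np.zeros((my_pred.shape[0], data.shape[1]))
padded[:, 0] = my_pred[:, 0]
value_padded = scaler.inverse_transform(padded)[:, 0]

# Option 2: invert column 0 directly; MinMaxScaler computes X * scale_ + min_.
value_direct = (my_pred[:, 0] - scaler.min_[0]) / scaler.scale_[0]

print(value_padded, value_direct)
```

Both give the same number because the scaler treats each column independently, so the filler values in the other columns do not affect column 0.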
      
      Reply
      - David September 23, 2017 at 3:27 pm #
        
        Then what if I use multiple time steps for prediction (several lags)?
        In that case y_hat and X_test cannot have the same dimensions.
      - Jason Brownlee September 24, 2017 at 5:13 am #
        
        If the size of X or y must vary, you can use padding.
Fejwin August 31, 2017 at 3:52 am #

Hi Jason,
Thanks for the tutorial!
Maybe I missed something, but it seems that you provided the model with all of remaining data as ‘testdata’ and then tried predicting it? Isn’t that kind of pointless, since we should be interested in predicting unknown data in the future, instead of data that the model has already seen? Wouldn’t it make more sense to try the model to predict a first timestep into the future that neither the training nor the test data knew anything about? (Perhaps only give the model training data, but no test data, and afterwards ask it to predict first time step after training data?) How would I have to change the code to achieve that?

Reply
- Jason Brownlee August 31, 2017 at 6:25 am #
  
  The model is fit on the training data, then makes a prediction for each step in the test data. The model did not “know” the answer to the test data prior to making each prediction.
  
  Normally we would use walk-forward validation:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
  
  I did use walk-forward validation on other LSTM examples (use the blog search), but it seems to confuse readers more than it helps.
  
  Reply
  - Guillermo November 8, 2017 at 9:19 pm #
    
    Hi Jason.
    
    I am digging into your example and maybe missing something because I agree with Fejwin.
    
    I mean, as long as the real Pollution at t-1 is included in the test_X set, instead of the predicted Pollution at t-1, the outputs of model.predict(test_X) are never used as inputs for later predictions.
    
    That is, with all the features, including real Pollution(t-1), the model predicts an output: predicted Pollution(t). But on the next step, when the model predicts Pollution(t+1), it doesn’t take predicted Pollution(t); it takes real Pollution(t) instead.
    
    Can you clarify this point please?
    
    Thank you.
    
    Reply
    - Jason Brownlee November 9, 2017 at 9:58 am #
      
      Yes, the assumption in the setup of the problem is that each prior hour’s pollution is available when predicting t+1.
      
      You could change the framing of the problem if you wish.
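One alternative framing is a recursive forecast, where each prediction is fed back in as the lagged pollution input. The sketch below uses a made-up linear function in place of a trained network; one_step_model and its coefficients are purely illustrative.

```python
import numpy as np

def one_step_model(prev_pollution, weather):
    # Hypothetical stand-in for model.predict(); not the tutorial's LSTM.
    return 0.9 * prev_pollution + 0.1 * weather

def recursive_forecast(last_obs, future_weather):
    """Forecast several steps ahead, reusing each prediction as the next lagged input."""
    preds = []
    prev = last_obs
    for w in future_weather:
        prev = one_step_model(prev, w)
        preds.append(prev)
    return np.array(preds)

forecast = recursive_forecast(100.0, [0.0, 0.0, 0.0])
print(forecast)
```

Errors can compound under this framing, because every step after the first depends on an earlier prediction rather than an observation.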
      
      Reply
      - vivi November 13, 2020 at 1:06 am #
        
        Hi Jason,
        
        I applied your code to my real dataset and it worked fine all the way to getting predictions for the test dataset. But I’m stuck on how to get predicted values for the future, beyond the max timestamp in the actual input dataset. I know one way is to iteratively feed each prediction back in as input, but I’m concerned about the error growing as I keep using predicted values as input.
      - Jason Brownlee November 13, 2020 at 6:34 am #
        
        Perhaps this will help:
        https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
        
        And this:
        https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
- David September 24, 2017 at 1:01 pm #
  
  Can I use part of trainX to predict testY ? (lags needed to predict testY is in trainX) Not sure if it is a logical way to do it.
  
  Reply
  - Jason Brownlee September 25, 2017 at 5:36 am #
    
    Yes.
    
    Reply
hadi September 1, 2017 at 12:08 pm #

Dear Jason Brownlee,

I have a slightly different question. I have a sequence of characters as input and I want to project it into a multidimensional space.
I mean I want to project each sequence of chars (let’s say a word) to a vector of 100 real numbers across my corpus, so my input is a sequence of chars (any char embedding is welcome) and my output is a vector for each sequence (which is a word), and I’m really confused about how to define the model.
I would appreciate any clue, help, or sample code to define my model.

Thanks a lot in advance.

Reply
- Jason Brownlee September 1, 2017 at 3:26 pm #
  
  Keras provides an Embedding layer that you can use directly:
  https://keras.io/layers/embeddings/
  
  Reply
- Balint Takacs May 1, 2020 at 1:09 am #
  
  Hi,
  I am also having trouble understanding the difference between the walk-forward validation (prediction) method, and the “simple” prediction method being carried out here in the example.
  
  Why does the walk-forward prediction (with an appended history) give different predictions than simply calling predict on the test set, if the model is not re-fitted (that is, trained again with the newly available observations)?
  Does the cumbersome walk-forward approach have any advantage over the approach in this example?
  Can walk-forward validation also be carried out for multivariate multi-step forecasting?
  
  Thanks,
  Balint
  
  Reply
  - Jason Brownlee May 1, 2020 at 6:41 am #
    
    Walk-forward validation simulates how we expect to use the model in practice, it evaluates the model under those conditions.
    
    The procedure can be adapted based on how you want to use the model, e.g. when to refit, when new obs are available, how many steps to predict, etc.
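A rough sketch of the walk-forward loop, using a fixed linear function as a hypothetical stand-in for a fitted model. It assumes the model is stateless and is not refit between steps; under those assumptions the step-by-step predictions match a single batch call. A refit inside the loop, or a stateful LSTM, could change that.

```python
import numpy as np

# Hypothetical fitted model: a fixed linear map (stateless, no refitting)
weights = np.array([0.5, -0.2, 0.1])

def predict(X):
    return np.atleast_2d(X) @ weights

rng = np.random.default_rng(1)
test_X = rng.normal(size=(5, 3))

# One call over the whole test set
batch = predict(test_X)

# Walk-forward: one step at a time, as new observations become available
walk = []
for i in range(len(test_X)):
    yhat = predict(test_X[i])[0]
    walk.append(yhat)
    # A refit on the history up to step i would go here, if the procedure calls for it.
walk = np.array(walk)

print(np.allclose(batch, walk))
```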
    
    You can learn more about walk-forward validation here:
    https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
    
    Reply
    - Balint Takacs May 1, 2020 at 9:41 pm #
      
      Hey, thanks for the quick answer.
      
      So as far as I see your point, the walk forward approach, without refitting the model at each iteration, is the same as calling model.predict(X_test) at once.
      And the reason why you still implement it without refitting, is to provide the framework properly, and make it easier for us to work further with it, right ?
      
      If I am wrong, and it is not the same, why is it not the same? I went through many of your posts, including the one you posted, but I didnt manage to comprehend the difference, if there is any, so far.
      
      For example: https://machinelearningmastery.com/update-lstm-networks-training-time-series-forecasting/
      
      Here you explain the updating, which is awesome, but in the baseline part, where you do not apply updating (so no iterative re-fit), you still do iterative walk-forward prediction instead of calling model.predict() on the test set as a whole. Would that be the same in the no-update case?
      Sorry for being annoying. I really appreciate your help, and time.
      
      Many thanks
      Balint
      
      Reply
      - Jason Brownlee May 2, 2020 at 5:46 am #
        
        Probably.
        
        Sometimes I like to drive the epochs manually for lots of reasons – e.g. so I have more control over the process/do things in between epochs.
        
        We use walk-forward validation as it is the only valid approach for evaluating models on sequence data:
        https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
Sai k September 2, 2017 at 12:12 am #

Hi Jason,

Thanks for the wonderful tutorial!
Could you please explain how to handle the following situation: “Predict the pollution for the complete month (assume the month has 30 days, t+1…t+30), given the “expected” weather features for that month, assuming we have been provided historic pollution and weather data on a daily basis.”

How should the data be prepared and how should it be fed into the LSTM?

As I am new to LSTM models, I have trouble understanding the data preparation and feeding the data to the LSTM.

Thanks in advance for your response

Reply
- Jason Brownlee September 2, 2017 at 6:11 am #
  
  Predicting for a month is called multi-step forecasting.
  
  Here is a post on the general approach:
  https://machinelearningmastery.com/multi-step-time-series-forecasting/
  
  Here is an example of doing multi-step forecasting with an LSTM:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
Adrian September 5, 2017 at 5:29 am #

Hi Jason,

Thanks for sharing. I added accuracy info to the model while training using metrics=['accuracy'].

So model.compile(loss='mae', optimizer='adam') becomes:

model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])

This adds acc & val_acc to the output. After 100 epochs the acc value appears quite low (0.0761):
Epoch 100/100
1s - loss: 0.0143 - acc: 0.0761 - val_loss: 0.0132 - val_acc: 0.0393

The accuracy of the model appears very low. Is this expected?

Further info on acc & val_acc values : https://github.com/tflearn/tflearn/issues/357 “acc is the accuracy of a batch of training data and val_acc is the accuracy of a batch of testing data.”

Reply
- Jason Brownlee September 7, 2017 at 12:38 pm #
  
  This is a regression problem. Accuracy does not make sense.
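To illustrate with made-up numbers: error metrics such as MAE and RMSE measure how far real-valued predictions are from the targets, whereas an accuracy-style count of exact matches is close to meaningless for continuous values.

```python
import numpy as np

# Hypothetical scaled targets and predictions
y_true = np.array([0.10, 0.25, 0.40, 0.55])
y_pred = np.array([0.12, 0.20, 0.41, 0.50])

mae = np.mean(np.abs(y_true - y_pred))
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# Fraction of predictions that equal the target exactly
exact_match = np.mean(y_true == y_pred)

print(mae, rmse, exact_match)
```

Here the predictions are all close to the targets, yet none matches exactly, so the match rate is zero even though the errors are small.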
  
  Reply
Eric H September 5, 2017 at 6:33 am #

Hi Jason, I’ve recently discovered your site and have been so pleased with your information – thank you. I’ve been trying to model data which is much like the air quality data described here, but every few time steps there will be a change in the number of features present.
Example: in my data a time step = 1 day and a sequence can be 800 – 1200 days long. Normally the data consists of features
– pm2.5: PM2.5 concentration
– DEWP: Dew Point
– TEMP: Temperature
– PRES: Pressure
– cbwd: Combined wind direction
– Iws: Cumulated wind speed
– Is: Cumulated hours of snow
– Ir: Cumulated hours of rain

But then every (random-ish amount of time) there will be an additional number of features for a day and then back to the baseline number of features.

I’ve no idea on how to handle variable feature length. I’ve seen and played with plenty of variable sequence length examples, but I have both variable sequenceS and features. I’d love your input!
Thanks!
-Eric

Reply
- Jason Brownlee September 7, 2017 at 12:40 pm #
  
  You will need to normalize the number of features to be consistent for all time.
  
  Reply
  - Eric Hiller September 10, 2017 at 5:21 am #
    
    Is it possible to use (what in TensorFlow – land is called) SparseFeatures or SparseTensors to represent sparse datasets, or is there a fundamental issue with handling sparse datasets within RNNs?
    
    Reply
    - Jason Brownlee September 11, 2017 at 12:04 pm #
      
      Good question, I’m not sure off the cuff. Keras may support sparse numpy arrays – try it and see?
      
      Reply
Ali Haidar September 8, 2017 at 1:56 am #

Hi Jason,

Thanks for the amazing articles. They are really helpful.

Let’s say I want to forecast with lead 2. By that I mean forecasting values at time t using t-2 values, without using t-1 elements. I have to remove columns from reframed after running the series_to_supervised function, right? That is, remove all columns with t-1 values?
reframed.drop(reframed.columns[…])

Thanks

Reply
- Jason Brownlee September 9, 2017 at 11:46 am #
  
  Yep, looks good.
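A small sketch of dropping the t-1 columns by name, assuming the reframed DataFrame uses column names in the var1(t-2), var1(t-1), var1(t) style. The frame below is made up.

```python
import pandas as pd

# Hypothetical reframed data with two lags of two variables and one target
reframed = pd.DataFrame({
    "var1(t-2)": [0.1, 0.2],
    "var2(t-2)": [0.5, 0.6],
    "var1(t-1)": [0.2, 0.3],
    "var2(t-1)": [0.6, 0.7],
    "var1(t)": [0.3, 0.4],
})

# Select every column that holds a t-1 value, then drop them together
lag1_cols = [c for c in reframed.columns if c.endswith("(t-1)")]
lead2 = reframed.drop(columns=lag1_cols)

print(list(lead2.columns))
```

Selecting by name rather than by position avoids off-by-one mistakes when the number of variables changes.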
  
  Reply
Inna September 11, 2017 at 7:53 pm #

Hello!
Thanks for articles.

I have a question related with time series. Is it possible to forecast all variables? For example, I have ‘pollution’, ‘dew’, ‘temp’, ‘press’, ‘wnd_dir’, ‘wnd_spd’, ‘snow’, ‘rain’ and want to predict all of them for the next hour. We know about trends and common rules (because of data amount: few years), so we can do forecasting. Where can I find more info about it?

Reply
- Jason Brownlee September 13, 2017 at 12:22 pm #
  
  Yes, this example can be modified to predict each variable.
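A hedged sketch of the data side of that change, assuming 8 features and one lag step, and assuming the reframed array keeps all 8 columns at time t instead of only the pollution column. The array here is filled with placeholder numbers.

```python
import numpy as np

n_features = 8

# Hypothetical reframed values: 8 lagged inputs followed by all 8 values at time t
values = np.arange(5 * 2 * n_features, dtype=float).reshape(5, 2 * n_features)

# Inputs are the lagged columns; outputs are every variable at time t
X, y = values[:, :n_features], values[:, n_features:]

# Reshape input to 3D [samples, timesteps, features] as the LSTM expects
X = X.reshape((X.shape[0], 1, X.shape[1]))

print(X.shape, y.shape)
```

The output layer would then need one unit per predicted variable, for example Dense(n_features) instead of Dense(1), and the inverse scaling step would no longer need padding because y already has all columns.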
  
  Reply
appreciator September 12, 2017 at 10:59 am #

Thank you Jason for the great tutorial! I’m adapting it for different data, and I’m trying to use more than one time step. However, I noticed something strange in series_to_supervised: since the first loop ends at 0 and the last loop starts at 0, won’t there be two columns that are the same?

Reply
- Jason Brownlee September 13, 2017 at 12:26 pm #
  
  No, try it with the data and see.
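A compact sketch modeled on the tutorial's series_to_supervised (rewritten here, so details may differ from the original). The first loop uses range(n_in, 0, -1), which stops at 1 and never reaches 0, so the lag columns and the time-t columns do not overlap.

```python
import pandas as pd

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    df = pd.DataFrame(data)
    n_vars = df.shape[1]
    cols, names = [], []
    # Input sequence: t-n_in ... t-1 (the range excludes 0)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [f"var{j + 1}(t-{i})" for j in range(n_vars)]
    # Forecast sequence: t, t+1, ... (the range starts at 0)
    for i in range(n_out):
        cols.append(df.shift(-i))
        suffix = "(t)" if i == 0 else f"(t+{i})"
        names += [f"var{j + 1}{suffix}" for j in range(n_vars)]
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    return agg.dropna() if dropnan else agg

# Toy series with two variables and two lag steps
frame = series_to_supervised([[1, 10], [2, 20], [3, 30], [4, 40]], n_in=2, n_out=1)
print(frame)
```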
  
  Reply
Eric September 12, 2017 at 11:49 am #

Hi Jason,

Thanks for the tutorial. I had just one question though.
I’ve seen tutorials using multivariate time series to train on many datasets at the same time (all correlated with each other) that were able to predict for each dataset used.

For the sake of argument, let’s say that one of the datasets is broken: the sensor that provides its data is out of service (say, at some point one of the columns only contains 0 instead of real values). Do you think we could use the other spots to continue predicting the broken one? (They are correlated, and there would be a lot of non-broken data from before the fault.)

Best regards,

Reply
- Jason Brownlee September 13, 2017 at 12:27 pm #
  
  Yes, you could try it and see. Or impute the missing data and see if that is better.
  
  Reply
  - Eric September 14, 2017 at 2:22 pm #
    
    Thank you Jason,
    
    I shall try that as soon as possible. I guess the overall accuracy will drop for every prediction (since my goal is to use a multivariate model, feed it every spot’s dataset and predict each of them, with the possibility of predicting a broken one), so one spot being fed “wrong” data should lower each spot’s accuracy, no?
    
    Best regards,
    
    Reply
    - Jason Brownlee September 15, 2017 at 12:10 pm #
      
      It will.
      
      Reply
Shan September 13, 2017 at 3:46 am #

Is there any time parser like date parser? I am working with data which is in milliseconds.

Reply
- Jason Brownlee September 13, 2017 at 12:33 pm #
  
  It can handle parsing dates and times I believe.
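For example, pandas.to_datetime accepts a format string with %f for fractional seconds, and a unit argument for integer epoch timestamps. The timestamps below are made up.

```python
import pandas as pd

# Hypothetical timestamps with millisecond precision
stamps = pd.Series(["2017-09-13 03:46:00.125", "2017-09-13 03:46:00.250"])
parsed = pd.to_datetime(stamps, format="%Y-%m-%d %H:%M:%S.%f")

# Difference between the two readings, expressed in milliseconds
delta_ms = (parsed.iloc[1] - parsed.iloc[0]) / pd.Timedelta(milliseconds=1)

# Integer epoch values in milliseconds can be parsed with unit="ms"
from_epoch = pd.to_datetime(1505274360125, unit="ms")

print(parsed.dtype, delta_ms, from_epoch)
```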
  
  Reply
kumar September 13, 2017 at 10:00 pm #

i got this error when i tried to run the program

pyplot.plot(history.history['val_loss'], label='test')
KeyError: 'val_loss'

Reply
- Jason Brownlee September 15, 2017 at 12:05 pm #
  
  Ensure you copy all of the code.
  
  Reply
Simon September 15, 2017 at 9:55 pm #

Hi Jason,

Wouldn’t it be better to scale the data after you run the series_to_supervised function? As it stands now, the inverse scaling doesn’t work if n_in > 1 since the dimensions don’t line up anymore.

Reply
- Jason Brownlee September 16, 2017 at 8:41 am #
  
  It would, but the scaling would be column-wise and incorrect.
  
  Reply
  - Simon September 17, 2017 at 11:26 am #
    
    Could you expand more on this and how the code might be modified to incorporate multi-step? I’m also playing around with turning this into a classification problem, would it still work if the feature we are trying to predict is a classifier?
    
    Reply
    - Jason Brownlee September 18, 2017 at 5:42 am #
      
      I give the code to do this in another comment.
      
      For classification, you will need to change the number of neurons in the output layer, the activation function in the output layer and the loss function.
      
      Reply
Agrippa Sulla September 16, 2017 at 5:18 am #

I have a little question. I’ve successfully built my own LSTM multivariate NN using your code as a basis (thanks!). It forecasts export growth for the UK using past export growth and GDP. It performs decently, but the financial crisis kind of messes things up.

Now I want to add data to this model, but I can’t go further back than 1980 for the time-series (not for now at least). So what I want to do is add the GDP growth rate of all the UK’s major trading partners. Should I be worried about adding another 20 input neurons (e.g. countries)? Do you have a post talking about the risks of using data that is low in rows (e.g. years) but high in columns (e.g. inputs).

I hope my question makes sense.

Cheers

Reply
- Jason Brownlee September 16, 2017 at 8:46 am #
  
  I don’t have posts on the topic of more columns than rows. It does require careful handling.
  
  As a start, I would recommend developing a strong test harness, then try adding data and see how it impacts the model skill. Experiment.
  
  Reply
Ed September 16, 2017 at 6:00 am #

Jason
Thanks a lot for your tutorial!
Is there a feature importance plot for cases like this?
sometimes is very important to know it

Reply
- Jason Brownlee September 16, 2017 at 8:47 am #
  
  Good question. I’m not sure about feature importance plots for LSTMs. I would expect that if feature importance can be calculated for MLPs, then it could be calculated for LSTMs, but this is not something I have looked into sorry.
  
  Reply
  - Ed September 17, 2017 at 2:49 am #
    
    Thanks a lot, Jason!
    
    Reply
    - Jason Brownlee September 17, 2017 at 5:29 am #
      
      No problem.
      
      Reply
Kuldeep September 20, 2017 at 12:53 am #

Hi Jason,

Great post as always!

I have a question regarding scaling. My problem is quite different, as I have to apply the series_to_supervised function first to data coming from different sources and then combine the data. My question is, can I apply scaling at the end? Should scaling be applied column-wise or on the complete matrix/array?

Reply
- Jason Brownlee September 20, 2017 at 5:58 am #
  
  The key is being able to scale the data consistently. The place in the pipeline is less important.
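One common reading of "consistently" is to fit the scaler once and reuse the same fitted object everywhere. A minimal sketch with made-up arrays, assuming a column-wise MinMaxScaler:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical train and test splits with two columns each
train = np.array([[0.0, 100.0], [5.0, 200.0], [10.0, 300.0]])
test = np.array([[2.5, 150.0], [12.0, 350.0]])

# Fit on the training data only, then apply the same mapping to both splits
scaler = MinMaxScaler().fit(train)
train_s = scaler.transform(train)
test_s = scaler.transform(test)

print(test_s)
```

Test values can fall outside [0, 1] when they exceed the range seen in training; that is expected with this approach.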
  
  Reply
Nejra September 21, 2017 at 1:25 am #

Hi Jason thank you very much for your tutorials!
I’m trying to develop an LSTM for time series prediction with 3 input features (2 measurements, plus a third that is a sort of control signal for the system), where the output (the value to predict) is not a single value but a vector of 6 values. So, at every time step my network should be able to predict this entire vector. Two questions:
1. Since my inputs are not correlated with each other, their order in the input array will not influence my predictions, right?
2. How can I shape my output in order to estimate all 6 values of the vector for each time step?
Thanks for any kind of help!

Reply
- Jason Brownlee September 21, 2017 at 5:51 am #
  
  This post will help you understand how to prepare data for multi-step forecasting:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
Mitchel Myers September 22, 2017 at 5:34 am #

I replicated the example described on this page, and saved my test_y and yhat vectors to csv so that I could manually check how my prediction compared with the true values. However, when I did this, I discovered that every yhat value in my array is the exact same value (~34). I was expecting a unique yhat value for each input vector. Do you have any suggestions to help fix this?

Reply
Mitchel Myers September 23, 2017 at 3:25 am #

Follow up on this — when this error arose, I was using my own data set that I want to perform time series forecasting on. When I duplicated the guide exactly as described above, the issue goes away. Do you have any idea why this issue comes up (where every predicted yhat value is the exact same) when I use a different data set?

Reply
- Jason Brownlee September 23, 2017 at 5:44 am #
  
  Perhaps the model needs to be tuned to your specific dataset?
  
  Reply
zwj September 25, 2017 at 1:10 pm #

Hi Jason, thank you very much for your tutorials! I tried deleting the columns [‘dew’, ‘temp’, ‘press’, ‘wnd_dir’, ‘wnd_spd’, ‘snow’, ‘rain’] from the train_X data, and I get almost the same test RMSE: 26.461. This seems to show that these weather variables have no effect on the prediction result. The code is below.

from keras.models import Sequential
from keras.layers import Dense, LSTM

# fit an LSTM network to training data
def fit_lstm(train, test, batch_size, neurons):
    # split into input and outputs: only the first column is used as input,
    # the last column is the target
    train_X, train_y = train[:, 0:1], train[:, -1]
    test_X, test_y = test[:, 0:1], test[:, -1]

    # reshape input to be 3D [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

    # design network
    model = Sequential()
    model.add(LSTM(neurons, input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(Dense(1))
    model.compile(loss='mae', optimizer='adam')

    # fit network
    history = model.fit(train_X, train_y, epochs=50, batch_size=batch_size, validation_data=(test_X, test_y), verbose=2, shuffle=False)
    #history = model.fit(train_X, train_y, epochs=50, batch_size=72, verbose=2, shuffle=False)

    return model

# make a prediction
def make_forecasts(model, test_X):
    # keep only the first column, matching the input used for training
    test_X = test_X[:, 0:1]
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    forecasts = model.predict(test_X)

    return forecasts

Reply
- Jason Brownlee September 25, 2017 at 3:26 pm #
  
  Nice one!
  
  The real motivation for me writing this post was to help the 100s of people asking how to develop a multivariate LSTM.
  
  Reply
  - Tommy November 13, 2017 at 4:07 am #
    
    This is more substantial than I think is being acknowledged. What is the point of creating a multivariate lstm if all of the other variables don’t have an impact on the outcome? Has this been attempted with other data sets?
    
    Reply
    - Jason Brownlee November 13, 2017 at 10:19 am #
      
      It is an example for those who want to explore the approach.
      
      I don’t have more examples because it turns out the method is outperformed by MLPs for autoregression problems. At least in my experience.
      
      Reply
      - Sena April 10, 2020 at 6:15 pm #
        
        even when we are looking at multivariate times series forecasting?
      - Jason Brownlee April 11, 2020 at 6:11 am #
        
        It really depends.
        
        I recommend this framework:
        https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
      - William Xu March 1, 2021 at 9:54 pm #
        
        Hi Dr. Brownlee,
        You mentioned that MLPs usually perform well on autoregression problems. Do you have a post with example code for that? Thanks.
      - Jason Brownlee March 2, 2021 at 5:45 am #
        
        Yes, many examples – use the search box.
        
        Perhaps start here:
        https://machinelearningmastery.com/how-to-develop-multilayer-perceptron-models-for-time-series-forecasting/
Mitchel September 27, 2017 at 1:39 am #

Can you explain why the train_X and test_X data sets are reshaped to this?

train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

Reply
- Jason Brownlee September 27, 2017 at 5:44 am #
  
  The shape is: samples, time steps, features.
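
A small sketch of what that reshape does, using a toy array in place of the real data (the sizes here are made up for illustration):

```python
import numpy as np

n_samples, n_features = 5, 8
train_X = np.arange(n_samples * n_features, dtype=float).reshape(n_samples, n_features)

# one time step per sample; every column becomes a feature of that single step
train_X_3d = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
print(train_X_3d.shape)  # (5, 1, 8)
```

The values are unchanged; only a middle axis of length 1 (the time steps) is added so the array matches the 3D input an LSTM layer expects.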
  
  Reply
Lino September 28, 2017 at 12:59 pm #

Hi Jason

Great post.
Suppose I want to predict the next 24 hours using the previous year of data. How can we do it?
Thanks

Reply
- Jason Brownlee September 28, 2017 at 4:45 pm #
  
  I give an example in another comment.
  
  Also, generally, see this post on multi-step forecasting with LSTMs:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
Nels September 29, 2017 at 5:56 am #

I think I’m missing something fundamental in my understanding of LSTM/s and BPTT. I’ve read through many of your posts and have come to understand RNN’s and LSTM in particular much better because of them, so thank you for that!

My question that I hope you can shed some light on is what is the difference between passing the past information, i.e. var(t-n)…var(t-1) in the input vector for a single sample, and passing multiple sequences, of length n as a single sample?

To help clarify, using timesteps of length N, I have a configuration that looks like this:

Input to LSTM is [samples, timesteps, features].
Each sample/observation consists of a vector of timestamps (of size N+1) where each of these vector’s values corresponds to the input feature’s values I.e.

Observations for each time t, with features f and r
[
time t
[
[ f(t-N) r(t-N) ]
[ f(t-N+1) r(t-N+1) ]
[ f(t-N+2) r(t-N+2) ]
. .
. .
. .
[ f(t) r(t) ]
]
]
And for each observation/sequence the target is Y(t).

Or, as many of your examples do, you can include the past information in the form of a windowed input, with a single time step, so something like:

Input is [samples, 1, features]. So for every observation, we include previous time values as features

Observations for each time t, with features f and r
[
time t
[
[ f(t-N), r(t-N), f(t-N+1), r(t-N+1), f(t-N+2), r(t-N+2), f(t), r(t) ]
]
]
And again, for each observation, the target is Y(t).

I understand that having sequences longer than 1 allows BPTT to work over the length of those sequences, but I don’t think I really understand the difference in these two methods.

I have tried the two options described, and I find that the latter performs better based on preliminary tests. I can use a window size of 3 and a sequence length of 1 and get good results, but if I use the first approach and a window size of 12, the model actually fails to learn within the same amount of time.

Hence, I wonder if I don’t have a fundamental misconception. If you have some time, I would like to hear your explanation on this difference and how the LSTM responds in terms of “memory” based on these two different types of input setup. (I have read a lot of articles, blogs, git hub issues, and stack overflow posts trying to wrap my head around this, but I haven’t found anything that address this directly.)

Thanks!

Reply
- Jason Brownlee September 30, 2017 at 7:31 am #
  
  Generally, the multiple steps for one sequence are required for BPTT:
  https://machinelearningmastery.com/gentle-introduction-backpropagation-time/
  
  Without the history, the training will not have sufficient context to estimate the error gradient and your model will learn a function mapping rather than a sequence prediction problem.
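
To make the two layouts from the question concrete, here is a minimal NumPy sketch. The window length and the toy series are assumptions for illustration only.

```python
import numpy as np

n_steps = 3  # window length (hypothetical value)
series = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, features f and r

# one sample per window of n_steps consecutive rows
windows = np.stack([series[i:i + n_steps] for i in range(len(series) - n_steps + 1)])

# layout A: [samples, timesteps, features], the LSTM steps through the window
as_sequence = windows

# layout B: [samples, 1, timesteps * features], the lags are flattened into one step
as_features = windows.reshape(windows.shape[0], 1, n_steps * series.shape[1])

print(as_sequence.shape, as_features.shape)  # (8, 3, 2) (8, 1, 6)
```

Both arrays hold the same numbers. In layout A the recurrent state is carried across the three steps of each sample; in layout B the layer sees a single step, so it behaves more like a function of a flat feature vector.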
  
  Does that help?
  
  Reply
Paul September 29, 2017 at 12:28 pm #

With this line…

# drop columns we don’t want to predict
reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)

I don’t understand the numbers used here. The data doesn’t even have that many columns, does it? There are 8 feature columns and 1 index column.

I’m adapting this code for my own use and have very different features but I’m not sure I’m getting that line adapted right.

Thanks for the great post!

Reply
- Paul September 29, 2017 at 1:29 pm #
  
  Nevermind! I figured it out.
  
  Reply
  - Jason Brownlee September 30, 2017 at 7:34 am #
    
    Glad to hear it Paul.
    
    Reply
- Jason Brownlee September 30, 2017 at 7:33 am #
  
  It does have that many columns after we reshape it to be a supervised learning problem.
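
A quick way to see where the indices come from, assuming 8 variables framed with one lag step and one forecast step (the column naming below mimics the framing used in the post, but is written out by hand here):

```python
n_features = 8
n_in = 1  # number of lag steps

# lagged columns first, then the columns at time t
columns = [f"var{j}(t-{i})" for i in range(n_in, 0, -1) for j in range(1, n_features + 1)]
columns += [f"var{j}(t)" for j in range(1, n_features + 1)]
print(len(columns))  # 16

# dropping positions 9..15 removes var2(t)..var8(t)
drop = {9, 10, 11, 12, 13, 14, 15}
kept = [name for idx, name in enumerate(columns) if idx not in drop]
```

What remains is the 8 lagged inputs plus `var1(t)`, the value to predict. With a different number of features the indices to drop change accordingly.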
  
  Reply
Wenhan Wang September 30, 2017 at 2:05 pm #

This is awesome!
Helping me a lot in my real work!

Reply
- Jason Brownlee October 1, 2017 at 9:07 am #
  
  Thanks, I’m glad to hear that.
  
  Reply
Vilmara Sanchez October 4, 2017 at 3:54 pm #

Hi Dr. Jason, I am working on a project for sleep stage classification where the number of timesteps (observations) in the input series (ECG signal) is different than the number of timesteps in the output series (sleep stage scores).

The issue here is that the input and output time series are not equal in terms of timesteps as the examples you have shown in your problems.

I have tried to frame the problem in different ways without getting results that make sense. Could you please provide guidance on how to approach this problem?

Thanks,

Vilmara

Reply
- Jason Brownlee October 5, 2017 at 5:21 am #
  
  Generally, I would recommend an encoder-decoder model:
  https://machinelearningmastery.com/encoder-decoder-long-short-term-memory-networks/
  
  Reply
Devakar Verma October 6, 2017 at 6:06 pm #

Hi Jason,
How can we predict multiple features as output when we also have multiple features as input? For example, the input variables are temperature and humidity and we want to predict both temperature and humidity. Can we solve this with a single LSTM model?

Thanks for your anticipated response.

Reply
- Jason Brownlee October 7, 2017 at 5:50 am #
  
  Yes you can. Change the multivariate input model to output more than one value in the output layer.
  
  Reply
Brent October 7, 2017 at 5:55 am #

Hi Jason,

Thank you for taking the time to write such an excellent post and follow up with questions. The mechanics of the data conversion & training work great.

However, my first reaction is that the LSTM doesn’t seem to have learned anything more than to copy the previous value. As BECKER states:

> it looks like the predicted value is always 1 time period after the actual?

These are the same results as in your Shampoo example: the predicted value appears to be equal to the previous value (possibly with some constant offset).

Have you found a different network architecture that performs better than a DNN without LSTM layers?

Reply
- Jason Brownlee October 7, 2017 at 5:58 am #
  
  Agreed, LSTMs do not seem to be very good for autoregression. I would generally recommend using an MLP with a window for time series forecasting instead.
  
  See this post:
  https://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/
  
  Reply
  - Avinish May 3, 2020 at 2:35 am #
    
    Hi Jason,
    
    Would like to understand how to go about when the problem statement is framed like below.
    
    Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour. And this is to be done for next n days at hourly level, ie n * 24 time steps in the future with other variables given at those time steps.
    
    Hope you can point out to some resources and if LSTM would be a good way to go for this formulation.
    
    Thanks,
    Avinish
    
    Reply
    - Jason Brownlee May 3, 2020 at 6:16 am #
      
      You may need a multi-input model, e.g. one input for the sequence, and one for the static data, this will help:
      https://machinelearningmastery.com/keras-functional-api-deep-learning/
      
      Reply
sathvik October 9, 2017 at 1:34 pm #

Thank you so much Jason for the wonderful article, I learnt a lot… I wanted to show a comparison of multivariate statistical methods and neural networks, and I was looking for a post or article on multivariate time series modeling using ARIMA. I would be glad to hear of anything you know on this.

Thank you

Reply
- Jason Brownlee October 9, 2017 at 4:46 pm #
  
  You will need to look into using SARIMAX, sorry I do not have an example at this stage.
  
  Reply
Shan October 12, 2017 at 4:34 am #

Hi Jason, is there any library available to perform feature extraction or dimensionality reduction for a sequential LSTM model?

Reply
- Jason Brownlee October 12, 2017 at 5:37 am #
  
  Often an embedding layer is used to project observations at each time step prior to feeding them into the LSTM.
  
  Reply
Terry October 12, 2017 at 6:15 pm #

How does multivariate LSTM compare to Multivariate ARIMAX? Are there use cases where one model outperforms the other?

Reply
- Jason Brownlee October 13, 2017 at 5:45 am #
  
  I would recommend using a linear model first and only moving to a neural net if it delivers better results on your specific problem.
  
  Reply
Hesam October 13, 2017 at 4:27 am #

Hello,

There is a problem with scaling back when we use more than one shift in time. I mean something like this:

reframed = series_to_supervised(scaled, 6, 1)

I can train and test the model, but some errors appear in the scaling back section which I couldn’t fix.

Please have a look. I really appreciate it.

Reply
Anil Maddala October 13, 2017 at 9:59 am #

Hi Jason, thanks for the great series of articles. How should I modify the LSTM code to change it from prediction to classification?

One sample input data is 60 time steps over 2 features and I want to classify the 60 step input sequence into 3 classes. To start with, is LSTM the right approach?

Hoping that you would take requests, I would definitely love to see an article on multivariate classification in Keras using LSTM/GRU; it would be really helpful for analyzing sensor data. You could look at the Human Activity Recognition dataset.

Reply
- Jason Brownlee October 13, 2017 at 2:55 pm #
  
  Change the loss function and the activation function of the output layer to categorical_crossentropy and softmax respectively.
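
For a 3-class problem this also means the targets change: categorical cross-entropy works on one-hot encoded labels and a softmax output gives one probability per class. The NumPy sketch below only illustrates those two pieces by hand; the labels and raw scores are made up, and in Keras the loss and activation would be set on the model rather than computed like this.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])      # hypothetical integer class labels
n_classes = 3
one_hot = np.eye(n_classes)[labels]  # shape (4, 3), one column per class

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# hypothetical raw outputs of a 3-unit output layer
logits = np.array([[2.0, 0.5, 0.1],
                   [0.1, 0.2, 3.0],
                   [0.3, 1.5, 0.2],
                   [1.0, 1.0, 1.0]])
probs = softmax(logits)

# categorical cross-entropy averaged over the samples
loss = -np.mean(np.sum(one_hot * np.log(probs), axis=1))
predicted = probs.argmax(axis=1)
```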
  
  Reply
heeun October 13, 2017 at 6:31 pm #

Hi Jason, thanks for your nice article.

I have a question!

That algorithm is many-to-one, right?

How can I solve many-to-many? For example, I want to predict pollution and rain.

Reply
- Jason Brownlee October 14, 2017 at 5:42 am #
  
  It is many-to-one in terms of features.
  
  You can change it to be many-to-many by outputting multiple features.
  
  Reply
Pau October 14, 2017 at 1:13 pm #

3 Things:
1) Thanks so much for this. I’ve used this as a basis for some code I’m writing and it gave me a great head start.
2) One thing that would be great to help with understanding the meanings of variables you’re using is to first put them into variables rather than using the integers. For example,

x_size = 1
train_X, train_y = train[:, :-x_size], train[:, -x_size:]
test_X, test_y = test[:, :-x_size], test[:, -x_size:]

This way, as people are reading the code they understand why it’s “-1” in case their adapted usage has different dimensions, they can change one variable and have it used everywhere it’s needed.

3) For instance, I’m trying to make this code output multiple predictions and am having a bit of trouble figuring out all the variables I need to change.

I have 368 columns of data, the first 168 are what will be predicted based on the other 200 points.

x_size = 200
# split into input and outputs
train_X, train_y = train[:, :-x_size], train[:, -x_size:]
test_X, test_y = test[:, :-x_size], test[:, -x_size:]

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))

I get the error:
ValueError: Error when checking target: expected dense_1 to have shape (None, 1) but got array with shape (659, 200)

Should the Dense(1) be Dense(x_size) where for me that is 200? (this is why it would be great to use variables so I know what that 1 means). When I try it as 168 (which is what it seems like it should be), I get an error.

When I switch to x_size, it actually runs without errors, but I’m not sure if that means I’m correct or not.

I’m so confused.

Thanks!

Reply
- Jason Brownlee October 15, 2017 at 5:18 am #
  
  I have an example of multiple timestep outputs here that you could use as a starting point:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
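
On the slicing in the question: with 368 columns where the first 168 are the targets, `train[:, :-x_size]` with `x_size = 200` selects the first 168 columns as input and the last 200 as output, which would explain the reported target shape of (659, 200). A minimal sketch of the slicing the description seems to call for, using random stand-in data and the hypothetical names `n_targets` and `n_inputs`:

```python
import numpy as np

n_targets = 168  # leading columns to predict (from the comment above)
n_inputs = 200   # trailing columns used as predictors
data = np.random.rand(659, n_targets + n_inputs)

# the targets come first in this layout, so slice them from the front
data_y = data[:, :n_targets]
data_X = data[:, n_targets:]

# reshape input to be 3D [samples, timesteps, features]
data_X = data_X.reshape((data_X.shape[0], 1, data_X.shape[1]))
print(data_X.shape, data_y.shape)  # (659, 1, 200) (659, 168)
```

With targets of this shape the output layer would need one unit per target, for example `Dense(n_targets)` rather than `Dense(1)`.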
  
  Reply
  - Paul October 16, 2017 at 4:35 pm #
    
    Rather than trying to predict many timestep outputs, I’m looking to output multiple predicted values per timestep.
    
    One thing I don’t understand is this section:
    
    # invert scaling for forecast
    inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
    inv_yhat = scaler.inverse_transform(inv_yhat)
    inv_yhat = inv_yhat[:,0]
    
    Why is it inserting the yhat values as the *first* column? The scaler has a different scale per column so positioning is important, and the Y data had been the last column in the row, hadn’t it? So won’t it get scaled incorrectly?
    
    Reply
    - Jason Brownlee October 17, 2017 at 5:38 am #
      
      The first column is the pollution value, we remove it from the test data, concat our prediction so we have enough columns for the transform’s expectations, then invert the transform and get the predicted pollution values in the correct scale.
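
A small sketch of that trick. The function `inverse_transform` below is a hand-written stand-in for the fitted scaler, and the per-column minimum and range values are invented for illustration.

```python
import numpy as np

# stand-in for a fitted scaler: per-column minimum and range (hypothetical values)
col_min = np.array([0.0, -40.0, 990.0])
col_range = np.array([500.0, 80.0, 60.0])

def inverse_transform(scaled):
    """Undo min-max scaling; expects the same column count and order as at fit time."""
    return scaled * col_range + col_min

yhat = np.array([[0.10], [0.20]])       # scaled predictions for column 0
test_X = np.array([[0.08, 0.5, 0.25],   # scaled inputs; column 0 is pollution(t-1)
                   [0.12, 0.6, 0.50]])

# put the prediction where the pollution column was; the other columns are padding
inv_yhat = np.concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = inverse_transform(inv_yhat)[:, 0]
```

Because the prediction sits in column 0, it is rescaled with the pollution column's own minimum and range. The padded columns are rescaled too, but they are discarded by the final `[:, 0]`.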
      
      Does that help?
      
      Reply
Rui October 14, 2017 at 9:35 pm #

First of all, thanks a lot for the great tutorial Jason.

I just have one question regarding the achieved predictions using the LSTM network.

I just don’t understand why are you making “trainPredict = model.predict(trainX)” .

I get the predict method using the testset testX, but using this method for trainX is not like if you were in some way cheating? I say this because we train the network using the trainX and trainY and trainY corresponds to the labels you are trying to predict in the predict method using trainX.

Is it performed for validation purposes only?

I’m still learning to work with the Keras API so I might be confused with the syntax of it

Many thanks

Reply
- Jason Brownlee October 15, 2017 at 5:21 am #
  
  Where am I doing that exactly?
  
  Reply
Kai Li October 17, 2017 at 1:05 pm #

Jason
Thanks a lot for your tutorial!
I still have a question and am looking forward to your answer.
If I want to use feature(t), feature(t-1) and pollution(t-1) to predict pollution(t), how should I reshape my input?
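
One possible framing for this question, as a hedged sketch rather than the tutorial's method: keep the other variables at time t as inputs instead of dropping them, and use only pollution(t) as the target. The toy array below assumes pollution is column 0.

```python
import numpy as np

# toy data: column 0 is pollution, columns 1-2 are other features (hypothetical)
values = np.arange(15, dtype=float).reshape(5, 3)

prev = values[:-1]  # rows at t-1: pollution(t-1) and features(t-1)
curr = values[1:]   # rows at t

# inputs: pollution(t-1), features(t-1), features(t)
X = np.concatenate((prev, curr[:, 1:]), axis=1)
# target: pollution(t)
y = curr[:, 0]

# reshape input to be 3D [samples, timesteps, features]
X = X.reshape((X.shape[0], 1, X.shape[1]))
```

This only makes sense if the features at time t are actually available when the forecast is made, for example as a weather forecast.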

Reply
DC October 17, 2017 at 8:21 pm #

Hi Jason, Thank you very much for the wonderful post. I have a few questions.

1. You did not de-trend by using diff for above example. Diff from multi step only works for series. Can you please share how can we de-trend of multivariate time series?

2. I’d like to use past 3 days of above data to predict 3 time steps for multivariate data as above. Can you please let me know how I can do that with the example above?

Thanks for your help.

Reply
- Jason Brownlee October 18, 2017 at 5:36 am #
  
  You could de-trend each input series separately. Here is an example of using diff to detrend:
  https://machinelearningmastery.com/remove-trends-seasonality-difference-transform-python/
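
A minimal sketch of differencing each column of a multivariate array on its own, with toy numbers:

```python
import numpy as np

values = np.array([[1.0, 10.0],
                   [2.0, 13.0],
                   [4.0, 15.0],
                   [7.0, 20.0]])

# first-order difference of each column (series) independently
diffed = np.diff(values, n=1, axis=0)

# invert: cumulative sum of the differences added to the first observation
restored = np.concatenate((values[:1], values[0] + np.cumsum(diffed, axis=0)), axis=0)
```

The differenced array has one row fewer than the original, so the first observation has to be kept if the values are to be converted back later.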
  
  I give an example in another comment of how to use multiple lag obs as input.
  
  Reply
Xie October 19, 2017 at 12:30 am #

Hi, Jason. First of all, many thanks for your post. I have some problems.

1. I don’t really get the meaning of hidden_units. Can you please explain a little bit?
2. I am building an LSTM network as you do. I followed your approach to build the network but got an error, as described here: https://stackoverflow.com/questions/46811085/dimension-error-building-lstm-with-keras. Could you please help me?

Thanks!!

Reply
- Jason Brownlee October 19, 2017 at 5:37 am #
  
  A hidden unit is a neuron or cell in a hidden layer.
  
  A hidden layer is a layer that is not the output or the input layer.
  
  Change your code to set “return_sequences” to be “False”.
  
  Reply
Argie October 19, 2017 at 3:16 am #

So in your example you are using the data this way:

No,year, month,day,hour,pm2.5,DEWP,TEMP,PRES,cbwd,Iws,Is,Ir
1, 2010,1,1,0,NA,-21,-11,1021,NW,1.79,0,0

Is it possible to use the data in a way that lets us have multiple input numbers in one of the columns? For example, having
No, year, month, day, hour, pm2.5, newVariable
and in the new variable position instead of having just one integer like 20
to have a sequence of integers like (5,10,3,50,23)

Would that be possible using it on the same context, or is there any scenario that we could
use the data the way I mentioned ?

Reply
- Jason Brownlee October 19, 2017 at 5:40 am #
  
  If you mean, can you predict a sequence output, then yes. Here is an example:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
  - Argie October 19, 2017 at 7:31 am #
    
    I might have not been clear enough, and sorry for that.
    
    What I mean is that as an input I will have 4 different categories of data lets call them A, B, C, and D, that each one of them will have more than one integer, to be exact they will have 10 integers
    so for example:
    
    A = {3,4,6,8,34,65,43,1,54} and so on with the other three categories.
    
    The sequence of numbers within the four categories belong on different time stamps, for example 3 -> t0 , 4-> t1 and so on.
    
    So what I need is to classify them for different data samples.
    
    Reply
    - Jason Brownlee October 19, 2017 at 3:55 pm #
      
      These would be parallel series (columns) that could be all fed to one LSTM model like the example in the above tutorial.
      
      The model will process the parallel series one time step at a time.
      
      If the series extends beyond 200-400 time steps, then they could be split into multiple samples (e.g. multiple sub-parallel series).
      
      Does that help?
      
      Reply
      - Argie October 20, 2017 at 11:31 am #
        
        So so helpful, I tried it and worked like a charm.
        
        Great job, and so helpful all the material you provide, and the way you do it !!
        
        Thanks a lot Jason !!
      - Jason Brownlee October 21, 2017 at 5:23 am #
        
        I’m glad to hear that, well done!
Tim October 19, 2017 at 4:59 am #

Really appreciate all the work you have done!

Reply
- Jason Brownlee October 19, 2017 at 5:40 am #
  
  Thanks Tim.
  
  Reply
Abhinav October 19, 2017 at 6:36 am #

Hi Dr Brownlee. Thank you for this tutorial.

inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)

inv_yhat = scaler.inverse_transform(inv_yhat)

What do these steps do?

Because I am getting a ValueError: operands could not be broadcast together with shapes (1822,11) (6,) (1822,11) on this step.
I am applying on my own dataset

Reply
- Jason Brownlee October 19, 2017 at 3:52 pm #
  
  These steps add the prediction to the test input data so that we can inverse the transform and get the prediction back into the scale we care about.
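
The broadcast error reports an array with 11 columns meeting scaler parameters of length 6, which suggests (this is an assumption, since the code is not shown) that the concatenated array is wider than the data the scaler was fit on. A sketch of keeping the widths aligned when several lag steps are used, with placeholder arrays of zeros:

```python
import numpy as np

n_hours, n_features = 3, 8
n_rows = 4
yhat = np.zeros((n_rows, 1))
test_X = np.zeros((n_rows, n_hours, n_features))

# flatten the lag steps back to 2D before concatenating
test_X_2d = test_X.reshape((test_X.shape[0], n_hours * n_features))

# keep only as many extra columns as the scaler needs: n_features - 1
inv_yhat = np.concatenate((yhat, test_X_2d[:, -(n_features - 1):]), axis=1)
print(inv_yhat.shape)  # (4, 8)
```

The width of the array passed to the inverse transform has to equal the number of columns the scaler saw when it was fit, here `n_features`.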
  
  Reply
- Neha Aggarwal December 21, 2018 at 12:12 pm #
  
  Hi Abhinav,
  
  I am facing a similar problem. What did you do to rectify it ?
  
  Thanks
  
  Reply
TvT October 19, 2017 at 8:08 pm #

Hi Jason,

Thanks for sharing your awesome work, I’ve been learning a lot from you!

I have been struggling with increasing the second dimension to fully benefit from the BPTT though. I keep getting lost in the shapes. Would you mind sharing your code for multiple time steps as well?
That would be awesome!

Keep up the good work!

Reply
- Jason Brownlee October 20, 2017 at 5:32 am #
  
  This post might help clear things up:
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Reply
Dirk October 20, 2017 at 7:42 pm #

Awesome work, thanks for sharing it!

Could it be possible that you switched up the chronological order of your predictions?
It looks to me that you predict the pollution of the previous hour, instead of predicting the future.

Reply
- Jason Brownlee October 21, 2017 at 5:33 am #
  
  That is what a persistence model looks like exactly.
  
  Reply
Craig October 21, 2017 at 3:22 am #

Hi Jason, I’m new to Deep Learning, so sorry if this is a fundamental question. I am trying to use an LSTM NN to create a super fast surrogate for a coastal circulation model (something sort of similar to this, but with time dependency: https://arxiv.org/pdf/1709.08725.pdf)

My training set looks something like this:

-samples: 2000 – (I modeled a year with hourly output)
-timesteps: 7 – (t-6, t-5, …, t)
-features: 4 – (offshore boundary tide, 1st derivative of offshore boundary tide, boundary river discharge for river-1, and boundary river discharge for river-2)

Currently, my target is velocity magnitude for one node in my model domain ([2000,1]).

My question is: When you do this tutorial, you assign the time steps as additional features (i.e. for my problem, our train_X = [2000,1,28]). I did this and it works fine, but eventually I’d like to scale this, and I thought I’d try to reshape my data to its intended shape for the model (i.e. [2000,7,4]). However, when I do this, my training time goes way up (it’s probably 3-4x slower).

Does the model treat these two shapes differently? If not, why does it take so much longer to train with the latter shape?

Reply
- Jason Brownlee October 21, 2017 at 5:44 am #
  
  More time steps is slower.
  
  Perhaps this post will clear things up re input shapes:
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Reply
Amir Aaron October 22, 2017 at 5:58 pm #

Hi Jason,
Great article.
I have a small question:
In a previous article you pointed out that we need to make the data stationary.
Do we need to do that for multivariate data as well?

Reply
- Jason Brownlee October 23, 2017 at 5:43 am #
  
  Ideally, yes.
  
  Reply
Andriy October 24, 2017 at 12:39 pm #

Nice article! I think one question remains unanswered. Why use RNNs if we only use one previous step to predict the next step? Why not SVM for example?

Reply
- Jason Brownlee October 24, 2017 at 4:00 pm #
  
  No reason at all, we cannot know what will work best for a given problem.
  
  Try it and compare the results!
  
  Reply
Ali Abdul October 25, 2017 at 7:39 pm #

Hi Jason,

Thanks for this very informative post! Before applying to my financial dataset, I would like to consult you about my case. The type of my data is almost the same. I have financial risk factors like equity values, interest rates, foreign exchanges etc. values on daily basis and their corresponding dependent variable which is profit or loss of a portfolio. My goal is to detect the patterns and features (if any) responsible for the highest profits or lowest losses. So my question is can I convert your code above to a classification problem if I label my classes as 0 for the lowest losses and 1 for the highest profits?

Thanks in advance!

Reply
- Jason Brownlee October 26, 2017 at 5:25 am #
  
  Sure.
  
  Reply
  - Ali Abdul October 27, 2017 at 1:28 am #
    
    Great! One more small thing. When dealing with tails (let’s say 0 for lower, 1 for other than tail, 2 for upper tail), the classes and the features of course will be highly imbalanced. What would your approach be?
    
    Reply
    - Jason Brownlee October 27, 2017 at 5:23 am #
      
      You might need to adjust the distribution via rescaling to make the least represented classes better represented.
      
      Reply
Mehmet Abd October 26, 2017 at 8:28 pm #

Hi Jason,

Thanks for this very informative post! Before applying to my financial dataset, I would like to consult you about my case. The type of my data is almost the same. I have financial risk factors like equity values, interest rates, foreign exchanges etc. values on daily basis and their corresponding dependent variable which is profit or loss of a portfolio. My goal is to detect the patterns and features (if any) responsible for the highest profits or lowest losses. So my question is can I convert your code above to a classification problem if I label my classes as 0 for the lowest losses and 1 for the highest profits?

Thanks in advance!

Reply
- Jason Brownlee October 27, 2017 at 5:19 am #
  
  Try it and see.
  
  Reply
Hesam October 29, 2017 at 8:22 pm #

Hello

What we should do if the time itself would be a value that we must predict, such as predicting time and date for the next rainfall?

Reply
- Jason Brownlee October 30, 2017 at 5:37 am #
  
  You could predict the likelihood of rainfall for each hour and then use code (an if statement) to interpret those predictions and only output the predictions with a probability above a given threshold.
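
A minimal sketch of that post-processing step, assuming the model has already produced one rain probability per future hour (the numbers and the threshold below are made up):

```python
import numpy as np

# hypothetical predicted probability of rain for each of the next 8 hours
probs = np.array([0.05, 0.10, 0.30, 0.65, 0.80, 0.40, 0.20, 0.70])
threshold = 0.6

# offsets (in hours from now) where the probability exceeds the threshold
rainy_hours = np.flatnonzero(probs > threshold)

# the first such offset is the predicted time of the next rainfall, if any
next_rain = int(rainy_hours[0]) if rainy_hours.size else None
```

Adding `next_rain` hours to the timestamp of the last observation gives a date and time for the predicted rainfall.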
  
  Reply
Thabet October 30, 2017 at 3:33 am #

Hello Jason,

Could you perhaps show me exactly where to change as to predict the temperature instead of pollution?

Reply
- Jason Brownlee October 30, 2017 at 5:42 am #
  
  You can change the column used as the output variable when fitting the model.
  
  Around line 52 in the full example where we drop columns we don’t care about. Change it to drop the pollution as well and not drop temperature.
  
  Reply
  - Thabet October 31, 2017 at 10:14 am #
    
    Can you please help me further, as I can’t manage to find where to change the code to predict the temperature instead of pollution?
    
    Next, we need to be more careful in specifying the column for input and output.
    We have 3 * 8 + 8 columns in our framed dataset. We will take 3 * 8 or 24 columns as input for the obs of all features across the previous 3 hours. We will take just the pollution variable as output at the following hour, as follows:
    
    # split into input and outputs
    n_obs = n_hours * n_features
    train_X, train_y = train[:, :n_obs], train[:, -n_features]
    test_X, test_y = test[:, :n_obs], test[:, -n_features]
    print(train_X.shape, len(train_X), train_y.shape)
    
    Where and how should I change this to choose the temperature column?
    
    Reply
    - Jason Brownlee October 31, 2017 at 2:51 pm #
      
      Sorry, I cannot prepare an example for you.
      
      You might want to explore getting more familiar with NumPy arrays first:
      https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
      
      Reply
      - Thabet November 1, 2017 at 7:52 am #
        
        Thanks Jason
        can you at least point to me where in these lines the clue is?
        
        train_X, train_y = train[:, :n_obs], train[:, -n_features]
        test_X, test_y = test[:, :n_obs], test[:, -n_features]
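
A hedged sketch of how the target column could be selected in those lines. It assumes the variables are ordered pollution, dew, temp, press, wnd_dir, wnd_spd, snow, rain, and uses a toy array in place of the framed data; `target` is a hypothetical name.

```python
import numpy as np

# assumed column order of the variables
feature_names = ['pollution', 'dew', 'temp', 'press', 'wnd_dir', 'wnd_spd', 'snow', 'rain']
n_hours, n_features = 3, len(feature_names)
n_obs = n_hours * n_features

# stand-in for the framed data: 3 * 8 lagged columns followed by 8 columns at time t
train = np.arange(2 * (n_obs + n_features), dtype=float).reshape(2, n_obs + n_features)

target = feature_names.index('temp')  # position of temp among the variables

train_X = train[:, :n_obs]
# train[:, -n_features] is pollution(t); offsetting by `target` selects temp(t)
train_y = train[:, n_obs + target]
```

If the target changes, the inverse scaling step has to change with it: the prediction would need to be placed in the temperature column's position, not the first column, before inverting the transform.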
Allen November 1, 2017 at 7:03 pm #

Hi Jason,

Thanks for sharing your awesome work, I’ve been learning a lot from you!

I have a small question:

In previous article you pointed out that “Predict the pollution for the next hour as above and
given the “expected” weather conditions for the next hour.” , eg “pollution,dew,temp”.

What would your approach be?

Reply
- Jason Brownlee November 2, 2017 at 5:11 am #
  
  For the case: “Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.”
  
  You would not need to transform the dataset, you would simply pretend that the actual weather conditions for the next hour are a forecast and predict the pollution value at that time.
  
  Reply
Ali November 2, 2017 at 3:42 am #

First, thanks for the post, I learned a lot. I have a fundamental question about LSTMs. Let’s say I have 3 variables X, Y, and Z. I want to predict Z.

if I make the input(train_X in example above) time lagged. So I pass it x(t), x(t-1), x(t-2), x(t-3) etc…. then will the time component of LSTM matter or not? For example we have:

t, x, y, x-1, x-2, y-1, y-2, z-1, z-2, z
1, 1, 2, 0, 0, 0, 0, 0, 0, 3
2, 2, 4, 1, 0, 2, 0, 3, 0, 3
3, 3, 6, 2, 1, 4, 2, 3, 3, 6
4, 4, 8, 3, 2, 6, 4, 6, 3, 6
5, 5, 10, 4, 3, 8, 6, 6, 6, 9

Traditionally we would train on variables (x, y, x-1, x-2, y-1, y-2, z-1, z-2) on the first 4 time steps, then evaluate on the 5th.

My question is: if I train it on time steps (1, 2, 4, 5) and evaluate on step 5, will I have the same result? Mainly, if I add the time lag as an input, can I reshuffle the data?

Reply
- Jason Brownlee November 2, 2017 at 5:13 am #
  
  If you reshuffle the data and the result is better/same then the LSTM is probably not the right method to use. I would recommend using an MLP. See this post:
  https://machinelearningmastery.com/get-the-most-out-of-lstms/
  
  Reply
Ali November 2, 2017 at 4:40 am #

Hi Jason,

if we pass in previous time lag can we shuffle the data around in the model? in other words make the input timeless?

Reply
- Ali November 2, 2017 at 4:41 am #
  
  sorry when I refreshed my question didn’t appear, I thought it did not go through….did not mean to impatiently spam. apologies.
  
  Reply
  - Jason Brownlee November 2, 2017 at 5:14 am #
    
    No problem, I moderate comments so there is some delay before they appear.
    
    Reply
Gus C November 3, 2017 at 3:41 am #

Thanks for this great post.
So how do you assess graphically your forecast with the actual?

Reply
- Jason Brownlee November 3, 2017 at 5:21 am #
  
  You could plot both with matplotlib.
  
  Reply
Num November 3, 2017 at 4:44 am #

Hello, I have a problem that’s highly related to this guide.

I have a time series where the predicted variable is (allegedly) in part dependent on some features from that time step, and these features are known before it (they are “planned prices” and “expected value” for different features). I would like to include them as input into the LSTM.
For one output, this turned out to be easy (just keep them in), but if I try to predict several outputs, I am having trouble formatting the input correctly.

For better understanding, the desired input would be features x1 through x8 for t-1,t-2…etc and then x1 through x7 for t,t+1,t+2…etc.

Is this even possible with the example given here?

Reply
- Jason Brownlee November 3, 2017 at 5:22 am #
  
  I believe you could adapt the example for your problem.
  
  Spend some time with this post:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
Geoffrey Anderson November 3, 2017 at 4:58 am #

PM2.5 is just one time series to predict, clearly. Predicting say 3 (or even 100,000) time series would be nice to look at too. A real-life example where it’s useful is inventory management in retailing businesses. How many units will be sold in the next day of eggs, mascara, paper plates, frozen corn, 2% milk, skim milk, etc. Many of these TS will be correlated. Might need multi-tasking neural network outputs. LSTM would offer more automatic feature engineering than, say, using a boosted tree traditional machine learning algorithm which is natively unaware of time series. The latter needs manual feature creation of time-windowed aggregates by the data scientist. The LSTM, by contrast, just inputs the raw time series values directly, finding its own features. A bonus when using the LSTM is there may be some time-window or other features the human didn’t know about in advance. Another bonus is multiple-output (multitasking) that neural networks can naturally provide, unlike boosted trees for example. I’d suggest starting with only 2 or 3 TS at first, because a whole grocery store’s worth of items for even just a one day example is way too cumbersome to look at and manipulate easily on one small monitor screen. Just a warning: This may be frontier research, believe it or not.

Reply
- Jason Brownlee November 3, 2017 at 5:23 am #
  
  Thanks for the suggestion Geoffrey. I hope to spend more time on this soon.
  
  Reply
Lu November 6, 2017 at 8:35 pm #

I plotted inv_yhat and inv_y in the same figure, and I found an interesting fact: the predicted result is shifted to the right by an hour compared with the ground truth. That is to say, the predicted result is almost the data from one hour ago, or X_t = X_{t-1} approximately.
Actually, the best estimate for the RNN is to output the latest result, without doing any prediction. What do you think about this?

Reply
- Jason Brownlee November 7, 2017 at 9:48 am #
  
  When a prediction looks like a shifted input it means the model has no skill because it is predicting the input as output, e.g. a persistence model:
  https://machinelearningmastery.com/persistence-time-series-forecasting-with-python/
  
  Reply
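
One way to check whether a forecast has skill is to compare its error with that of a persistence forecast, which simply repeats the previous observation. The sketch below is a minimal illustration with made-up numbers; the `rmse` helper and the toy `series` are assumptions for this example, not code from the tutorial.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Toy hourly series (hypothetical values, in original units).
series = np.array([10.0, 12.0, 11.0, 15.0, 14.0, 18.0])

# Persistence forecast: the prediction for time t is the observation at t-1.
actual = series[1:]
persistence = series[:-1]

baseline_rmse = rmse(actual, persistence)
print("Persistence RMSE: %.3f" % baseline_rmse)
```

If the LSTM's RMSE on the same test period is not clearly lower than this baseline, the model is probably doing little more than copying the last value.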
Rafael November 7, 2017 at 6:32 am #

I’m using my own dataset and I’m not using the series_to_supervised method because I already have the dataset prepared in 2 files, train and test files. I still have the error:

Traceback (most recent call last):
File "teste.py", line 64, in <module>
inv_yhat = scaler.inverse_transform(inv_yhat)
File "C:\Users\rafae\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\preprocessing\data.py", line 385, in inverse_transform
X -= self.min_
ValueError: operands could not be broadcast together with shapes (52,12585) (12586,) (52,12585)

Reply
- Rafael November 7, 2017 at 6:34 am #
  
  To load the datasets
  
  #Train dataset
  dataset = read_csv('trainning_small.csv', header=None, index_col=None)
  dataset.drop(dataset.columns[[0]], axis=1, inplace=True)
  train = dataset.values
  
  encoder = LabelEncoder()
  train[:,-1] = encoder.fit_transform(train[:,-1])
  train = train.astype('float32')
  
  scaler = MinMaxScaler(feature_range=(0, 1))
  train = scaler.fit_transform(train)
  
  #Test dataset
  dataset_test = read_csv('test_passare.csv', header=None, index_col=None)
  dataset_test.drop(dataset_test.columns[[0]], axis=1, inplace=True)
  test = dataset_test.values
  
  encoder = LabelEncoder()
  test[:,-1] = encoder.fit_transform(test[:,-1])
  test = test.astype('float32')
  
  test = scaler.fit_transform(test)
  
  train_x, train_y = train[:, :-1], train[:, -1]
  test_x, test_y = test[:, :-1], test[:, -1]
  
  train_x = train_x.reshape((train_x.shape[0], 1, train_x.shape[1]))
  test_x = test_x.reshape((test_x.shape[0], 1, test_x.shape[1]))
  print(train_x.shape, train_y.shape, test_x.shape, test_y.shape)
  
  THE RESULT FOR THE PRINT:
  (838, 1, 12585) (838,) (52, 1, 12585) (52,)
  
  Reply
Fred November 7, 2017 at 4:30 pm #

Dr. Brownlee,

First of all, thanks for this wonderful post. I have applied your code with the following parameters:
lags=8, features=8, epochs=50, batch=104, neurons=150

And got almost perfect match between train and test. The test RMSE is 26.526.

My question is: what does this result stand for?

Reply
- Jason Brownlee November 8, 2017 at 9:18 am #
  
  Well done. The result is a summary of the error between predicted and expected values.
  
  Reply
Vlad November 12, 2017 at 5:37 am #

I launched this example on my notebook (AMD FX-8800P Radeon R7, 8GB RAM). It has already been running for 4 hours and I can’t even see what is going on with the model training or how long it will run. Is it possible to include in the example some monitoring and visualization of the training process, e.g. using callbacks.RemoteMonitor?

P.S. previously I worked with Matlab, it was so nice to see number of epochs, accuracy, error, and many other parameters during the training process. It helped a lot to understand should I continue training, or should I change the model.

Reply
- Jason Brownlee November 12, 2017 at 9:08 am #
  
  You should see the progress for each epoch and across epochs as output on the command line.
  
  Reply
Vlad November 12, 2017 at 7:56 am #

Hm, relaunched the example step-by-step and found out it’s stuck not at training, but at model compilation. Working for hours at 100% CPU load on block:
# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
What’s wrong?
Ubuntu 16.4, Keras 2.0.6, Theano 0.9.0, Python 3.6.2, Anaconda custom

Reply
- Jason Brownlee November 12, 2017 at 9:09 am #
  
  Are you running on the command line? If you run in a notebook, you may hide error or verbose messages.
  
  Reply
Vlad November 12, 2017 at 9:57 am #

I updated all libraries and anaconda and python and now it works! Sorry for the disturbance 🙂 BTW, a monitoring tool that can be used with callbacks.RemoteMonitor is hualos-master

Reply
- Jason Brownlee November 13, 2017 at 10:11 am #
  
  I’m glad to hear that, well done!
  
  Reply
Tommy November 13, 2017 at 5:20 am #

Thanks for the very well written article. I really appreciate the detailed walkthrough.

I have been looking for a way to apply multivariate input to a machine learning prediction model of any sort. I’m doing this in order to predict the growth of compute systems in excess of hundreds of thousands of nodes based on 6 years of daily samples. Simply looking at the Y growth over time and feeding that into something like Facebook prophet has proved somewhat insufficient because it only looks at the problem as a function of past behavior.

In reality there are more variables at play that control or affect that line of growth. As such, simple univariate approaches fall short and the predictions can be very good or very bad.

When I found this article I thought to myself, Eureka! I will be able to use this approach in order to feed in multivariate data along with the growth of my systems in order to get better predictions. However I was somewhat crestfallen at the revelation of 2 key problems discussed over the last several months here in the comments…

One problem you acknowledged as a potential/known issue and linked to another article explaining why autoregression time series problems may not be best solved with lstm neural networks. The article posits that better results might be obtained by stacking or using more layers. Have you tried this? If so, what did it look like and what results did you get?

The second and more concerning problem was when one commenter performed the same exercise as laid out in this article, but removed all of the multivariate data and still obtained the same rmse rate as you did. It was as if none of the other variables had any bearing on the prediction. This is deeply concerning, because as I see it, either this event was anomalous and driven by the input data, or the overall approach itself may be flawed, or the implementation thereof is broken. I’m not sufficiently versed in the technology to make a value statement on any of those points.

I’m hoping that you would be willing to share your thoughts on possible answers to these questions.

Reply
- Jason Brownlee November 13, 2017 at 10:22 am #
  
  The tutorial is a demonstration of a method, not the best way of solving or even framing the presented problem.
  
  I should have made that clearer, but that is the philosophy behind every single blog post on my site. I show how to use the methods, not how to get the best results (for a specific problem). The former problem is tractable; the latter is not.
  
  Reply
  - Tommy November 13, 2017 at 12:14 pm #
    
    Thanks for the clarity and candor! As a long-time comp-sci person, I find it very strange to run these tensorflow sessions and get different results for the same inputs (I’ve been putting your code through the paces) … I found I needed to add this, or every subsequent run would result in predictions that seemed to augment each previous run:
    
    try:
        keras.backend.clear_session()
    except:
        pass
    
    For what it’s worth, I zeroed out all the other variables (instead of eliminating them) and it /did/ have bearing on the output. I don’t think this methodology can be dismissed as ineffective. It seems to be approximating a workable solution. More exploration is necessary.
    
    Thank you for setting me on the path!
    
    Reply
    - Jason Brownlee November 14, 2017 at 10:06 am #
      
      Damn.
      
      Well, these are stochastic algorithms in general, but a single trained model should be deterministic and when it’s not, we’re in trouble.
      
      Reply
      - Tommy November 14, 2017 at 11:48 am #
        
        Have you tried running multiple iterations and examining yhat_inv?
        
        I keep getting different output, and I didn’t expect that. Am I looking in the wrong place?
        
        I can send a catalog of my results if that helps…
      - Jason Brownlee November 15, 2017 at 9:45 am #
        
        I have not.
        
        In general, we do expect different results across different runs given the stochastic nature of neural networks (forgive me if I am missing the point):
        https://machinelearningmastery.com/randomness-in-machine-learning/
sam November 15, 2017 at 10:23 pm #

Hi Jason,

Is multivariate time series forecasting possible for multi-step?

Reply
- Jason Brownlee November 16, 2017 at 10:30 am #
  
  Sure.
  
  Reply
  - sam November 16, 2017 at 6:23 pm #
    
    Hi,
    
    Jason, can you please explain how to prepare the dataset for training models? Let’s suppose I have 5 features and I want to predict the value at t + 5.
    
    For example..
    
    x1 = (2,3,4,3,1,6,8,9,4,1)
    x2 = (5,2,5,7,9,9,6,3,1,3)
    x3 = (2,3,4,8,1,6,8,9,1,1)
    x4 = (5,1,5,7,9,9,6,3,1,7)
    x5 = (2,3,4,6,8,3,1,3,5,7)
    y = (8,7,6,5,4,3,2,8,9,7)
    
    Thanks,
    
    Reply
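
One possible framing for the question above is to pair the feature values at time t with the target value at time t + 5. The sketch below uses the arrays from the comment; the variable names `horizon`, `features`, `target` and `X_lstm` are assumptions for illustration, and this is only one of several ways to frame a multi-step problem.

```python
import numpy as np

# Series from the comment above.
x1 = [2, 3, 4, 3, 1, 6, 8, 9, 4, 1]
x2 = [5, 2, 5, 7, 9, 9, 6, 3, 1, 3]
x3 = [2, 3, 4, 8, 1, 6, 8, 9, 1, 1]
x4 = [5, 1, 5, 7, 9, 9, 6, 3, 1, 7]
x5 = [2, 3, 4, 6, 8, 3, 1, 3, 5, 7]
y = np.array([8, 7, 6, 5, 4, 3, 2, 8, 9, 7])

horizon = 5  # predict the target 5 steps ahead

# One row per time step, one column per feature: shape (10, 5).
features = np.column_stack([x1, x2, x3, x4, x5])

# Inputs at time t are paired with the target at time t + horizon.
X = features[:-horizon]
target = y[horizon:]

# Keras LSTMs expect [samples, timesteps, features]; one timestep here.
X_lstm = X.reshape((X.shape[0], 1, X.shape[1]))
print(X_lstm.shape, target.shape)
```

The last `horizon` rows of the inputs have no known target yet, so they are dropped from training and can be used to make the actual forecast.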
Tommy November 18, 2017 at 3:54 pm #

What do you think about putting a dropout layer between the LSTM and Dense layers to address the overfitting phenomenon?

Reply
- Jason Brownlee November 19, 2017 at 11:08 am #
  
  Try it and see, I’d love to hear how it goes.
  
  Reply
Abdulrauf Garba November 19, 2017 at 10:36 pm #

Hi, Jason, we need a similar tutorial of Multivariate time series using the Recurrent neural network in R.

Reply
- Jason Brownlee November 20, 2017 at 10:17 am #
  
  Thanks for the suggestion.
  
  Reply
Louis November 22, 2017 at 1:51 am #

Hello Jason!

You say in your post:

“We can use this data and frame a forecasting problem where, given the weather conditions and pollution for prior hours, we forecast the pollution at the next hour.”

Is it possible to do the same without prior knowledge of the pollution levels?

I am working on a very similar time series forecasting problem. However, in my case, I don’t have access to intermediate level of pollution.

Thank you

Reply
- Jason Brownlee November 22, 2017 at 11:13 am #
  
  Yes, but it is important to spend time exploring different framings of the problem.
  
  Reply
Shantanu November 22, 2017 at 5:50 am #

Hi,

I have a question about splitting the data.
I have the data month wise for around 20 years.
How should I split it?
Thanks.

Reply
- Jason Brownlee November 22, 2017 at 11:14 am #
  
  See this post:
  https://machinelearningmastery.com/prepare-univariate-time-series-data-long-short-term-memory-networks/
  
  Reply
michael November 22, 2017 at 9:21 am #

Hi Jason,

Thank you for this excellent tutorial!

This may or may not be a slight variation of your “Train On Multiple Lag Timesteps Example”, but I was wondering how I should modify your example to do a multivariate one to multiple time step prediction i.e. look at one time step of 8 dimensional data and predict 10 time steps of 8 dimensional data. Or a multivariate seq2seq prediction i.e. show 10 time steps of 8 dimensional data and predict 10 time steps of 8 dimensional data.

Thanks

Reply
- Jason Brownlee November 22, 2017 at 11:22 am #
  
  Hmmm, I have to think about that. It might be best to do a multiple output model:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
Sammy November 23, 2017 at 1:20 pm #

Hi Jason,
First of all, thank you very much for this excellent post. I would be grateful if you can show how to do multivariate time series forecasting per group. In other words, lets say we have data for many cities and we would like to add the forecasting per city ? How we can feed the data to LSTM for a given city and get inv_y, inv_yhat to compare to see how model does ?
Thanks again,
Sammy

Reply
- Jason Brownlee November 24, 2017 at 9:31 am #
  
  You could model each city separately or combine all cities into a single dataset, or do both and ensemble the result.
  
  Reply
Nagabhushan S Baddi November 23, 2017 at 7:50 pm #

Hi Jason.
I have a dataset of 169307 rows and 41 features. I want to use a timestep of 5. So, when I use X=np.reshape(X, (169307, 5, 41)), I get an error that “cannot reshape array of size 6941587 into shape (169307,5,41)”. Does this mean that n_samples*n_features in the original dataset should be divisible by n_timesteps? If this is true, then how can I use a timestep of my choice?

Reply
- Jason Brownlee November 24, 2017 at 9:37 am #
  
  Perhaps this post will help:
  https://machinelearningmastery.com/prepare-univariate-time-series-data-long-short-term-memory-networks/
  
  Reply
  - Nagabhushan November 24, 2017 at 7:10 pm #
    
    Hi Jason.
    
    I referred to this post. But it explains data preprocessing in which only 1 feature is present. But my dataset has multiple features..I am confused on how to reformulate the data and then reshape it…for example, let us say, the following is my dataset:
    Slno f1 f2 f3 target
    1 2 3 1 0
    2 1 7 9 1
    3 3 3 1 1
    ……
    
    Here it has three features, f1, f2 and f3, and a target label with two classes. The classification cannot be done on the current feature vector alone, since the output depends on previous feature vectors. Can you please explain how to format the data for this case into the shape (n_samples, time steps, n_features), where n_samples is the same as the number of samples in the original dataset X and n_features is the number of features, i.e. 3? Let’s say the time step is 5. Please help with this.
    
    Reply
    - Jason Brownlee November 25, 2017 at 10:15 am #
      
      This post will help you frame your data as a supervised learning problem:
      https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
      
      Reply
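
The reshape error discussed in this thread arises because a plain reshape cannot create new values: 169307 rows of 41 features hold 6941587 numbers, while a (169307, 5, 41) array would need five times as many. One common alternative is to build overlapping windows, so that each sample holds the last few rows. The sketch below is a minimal version on toy data; `make_windows` is a hypothetical helper written for this example, not a function from the tutorial or from NumPy.

```python
import numpy as np

def make_windows(data, n_timesteps):
    """Stack overlapping windows into shape (n_windows, n_timesteps, n_features)."""
    data = np.asarray(data)
    n_windows = data.shape[0] - n_timesteps + 1
    return np.stack([data[i:i + n_timesteps] for i in range(n_windows)])

n_timesteps = 5

# Toy stand-in for the real dataset: 12 rows, 3 features, binary labels.
features = np.arange(12 * 3, dtype=float).reshape(12, 3)
labels = np.arange(12) % 2

X = make_windows(features, n_timesteps)
# Each window is labelled with the target of its last row.
y = labels[n_timesteps - 1:]
print(X.shape, y.shape)
```

Note that the number of samples shrinks by `n_timesteps - 1`, so 169307 rows with a timestep of 5 would give 169303 windows rather than 169307.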
Chris November 25, 2017 at 11:27 pm #

Hi Jason,
I’m a little confused about the range of scaling.

In many other posts you mentioned the following:
“Transform the observations to have a specific scale. Specifically, to rescale the data to values between -1 and 1 to meet the default hyperbolic tangent activation function of the LSTM model.”

Is there a reason for the use of 0 to 1 ?
Isn’t -1 to 1 better for scaling, since the activation function is tanh?

Thank you,
Chris

Reply
- Jason Brownlee November 26, 2017 at 7:32 am #
  
  Great question, a scale of 0-1 results in better skill in my experience.
  
  Reply
Somayeh November 28, 2017 at 1:44 am #

Hi Jason,

Thank you so much for the wonderful tutorial! That was so helpful for me.
When I read your post, my question about how to predict a multi-input, multi-output system in a multi-step time series was answered, thanks to your great illustration.

But I have a question: in my problem, we have many observations for some cases at each time step (about 500), so we have multiple input and output series at each time step.

Could you please help me with how to solve this issue?

Any help will be useful for me. I would very much appreciate your help.

Thank you,

Somayeh

Reply
- Jason Brownlee November 28, 2017 at 8:39 am #
  
  I would recommend exploring many different framings of the problem to see what works best and consider a baseline MLP model.
  
  Reply
- Max July 20, 2018 at 12:55 am #
  
  May I ask how you solved your problem of multiple outputs? I am having trouble implementing it.
  
  Reply
Michael November 29, 2017 at 6:35 am #

I see this question has been raised before, I’m sorry for beating a dead horse. I’ve been struggling with the inverse_transform step.
I tried to implement this algorithm using my own dataset and had trouble with it. Then I tried to run the example with the example dataset as in the tutorial and also had an error on the inverse_transform step.

inv_yhat = scaler.inverse_transform(inv_yhat)

(on my data)
ValueError: operands could not be broadcast together with shapes (15357,287) (8,) (15357,287)

on the tutorial data set:
ValueError: operands could not be broadcast together with shapes (35037,24) (8,) (35037,24)

PS. your blog is great. Keep up the good work!

Reply
- Jason Brownlee November 29, 2017 at 8:30 am #
  
  Generally, you must make sure that the data has the same shape and that columns have the same index when transforming and inverse transforming.
  
  Confirm this before performing each operation.
  
  Does that help? Let me know how you go.
  
  Reply
  - Abby November 1, 2019 at 8:45 am #
    
    Hi Jason,
    
    I am unable to fix a similar valueerror. Initially when the data is normalized the shape is different. Can you give an example of what needs to be done from your tutorial?
    
    Reply
  - Michael Brown June 20, 2020 at 1:01 pm #
    
    First of all, a lot of people are getting this same mistake, I am not an exception, and I followed the exact code. There might be some problems in the code itself. This answer is so general and does not help at all.
    
    Reply
    - Jason Brownlee June 21, 2020 at 6:00 am #
      
      Sorry, here are some specific things to check:
      https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
      
      Reply
- Cynthia June 20, 2020 at 1:16 pm #
  
  This error is because he applied scaler.fit_transform on the dataframe that only had 8 columns (the original dataframe), but then he applied scaler.inverse_transform on the test_X dataframe which had 16 columns; hence, the mismatch. I don’t know why he was able to upload the full code without reproducing this error.
  
  Reply
  - Jason Brownlee June 21, 2020 at 6:10 am #
    
    The code works as is.
    
    Ensure you have copied the code from the complete example.
    
    Reply
    - Igor January 19, 2021 at 9:50 pm #
      
      The code doesn’t work, and you don’t help. Is it so hard to answer: what can I do about this error? I have copied the code from the example correctly.
      
      Reply
      - Jason Brownlee January 20, 2021 at 5:43 am #
        
        Perhaps these tips will help:
        https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
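
The broadcast errors reported in this thread have the form "(rows, 24) (8,)": the array passed to the inverse transform has more columns than the scaler was fitted on. The sketch below reproduces that situation with a hand-rolled min-max scaler in NumPy, used here only as a stand-in for a fitted MinMaxScaler; the toy data, `inverse_min_max` and the variable values are assumptions for illustration. It assumes the predicted variable is the first of `n_features` columns, as with pollution in the tutorial.

```python
import numpy as np

n_features = 8  # columns the scaler was fitted on
n_lags = 3      # lag steps used to build the model inputs

rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 100.0, size=(50, n_features))

# Hand-rolled stand-in for a scaler fitted on 8 columns with range (0, 1).
data_min = raw.min(axis=0)
data_max = raw.max(axis=0)
scaled = (raw - data_min) / (data_max - data_min)

def inverse_min_max(values):
    """Undo the scaling; needs exactly n_features columns to broadcast."""
    return values * (data_max - data_min) + data_min

# Pretend inputs with n_lags * n_features = 24 columns, and predictions
# that happen to equal the scaled true values of the first column.
test_X = np.tile(scaled[:10], (1, n_lags))
yhat = scaled[:10, :1]

# Wrong: padding yhat with 23 columns gives 24 columns against 8 minimums.
try:
    inverse_min_max(np.concatenate((yhat, test_X[:, 1:]), axis=1))
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Right: pad yhat with only the last n_features - 1 columns, then keep column 0.
padded = np.concatenate((yhat, test_X[:, -(n_features - 1):]), axis=1)
inv_yhat = inverse_min_max(padded)[:, 0]
print(broadcast_failed, padded.shape, inv_yhat.shape)
```

When changing the number of lag steps, it helps to print the shape of the array just before the inverse transform and confirm that its column count matches the number of columns the scaler was fitted on.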
Abdur Rehman Nadeem November 29, 2017 at 8:21 am #

Hi Jason,

Thanks for the great tutorial. I have a question: how should I choose the number of timesteps, as you always choose 1 timestep? Where can I see the predicted values, as the graph only shows the training of the model, and how can I predict the value for different time intervals (e.g. if I want to predict the value for the next 1, 2 or 4 hours)?

Reply
- Jason Brownlee November 29, 2017 at 8:31 am #
  
  I recommend experimenting with different numbers of time steps on your problem to see what works best.
  
  You can collect predicted values and plot your own graph using matplotlib. I provide examples on other posts, for example:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
Ahmed Ali Mbarak November 29, 2017 at 4:07 pm #

Hello Mr Jason Brownlee, Your tutorial is awesome, it helped me in my project. I have been really interested in machine learning and this place has given me a lot.

My next move was to find a way to input data to my code and predict the future value. For example, for predicting air pollution, a user would enter today’s data, like NO2 and wind speed, and the code would output tomorrow’s air pollution. In other words, how do I apply the code in practice?

Thank you.

Reply
- Abdur Rehman Nadeem November 30, 2017 at 12:46 am #
  
  I think “yhat” is the predicted value regarding “test_X” actual value because we are providing test_X as input to predict.
  
  Reply
  - Jason Brownlee November 30, 2017 at 8:18 am #
    
    Sounds correct.
    
    Reply
- Jason Brownlee November 30, 2017 at 8:04 am #
  
  Here is an example:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Abdur Rehman Nadeem November 29, 2017 at 8:25 pm #

Hi Jason,

In the series_to_supervised() function, when we change the value of the variable “n_in” (e.g. to 2 in this example), does it mean we are now predicting for the next two hours, because the dataframe will now have 16 columns instead of 8? Please also explain how the value of “n_out” affects the result.

Best Regards,

Reply
- Jason Brownlee November 30, 2017 at 8:11 am #
  
  You can learn more about that function here:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
Abdur Rehman Nadeem November 30, 2017 at 12:21 am #

Hi Jason,

I took the “yhat” array as my predicted values and the “test_X” array as my actual values, because we predicted on the test_X array, and drew a plot using matplotlib. Did I do it right?

Reply
Sammy November 30, 2017 at 7:15 am #

Hi Jason,
I wanted to have n_in: Number of lag observations as input (X) set to 3 (using my own data) as can be seen below
# frame as supervised learning
reframed = series_to_supervised(scaled, 3, 1)

I make the data samples
inv_yhat = scaler.inverse_transform(inv_yhat)
and I get the following error:
File “/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/preprocessing/data.py”, line 385, in inverse_transform
X -= self.min_
ValueError: operands could not be broadcast together with shapes (67112,57) (19,) (67112,57)
I initially have 19 variables and I have the number of lag observations set to 3, so test_X has the following shape
>>> test_X.shape
(67112, 1, 57)
yhat = model.predict(test_X) and
>>> yhat.shape
(67112, 1)

I don’t understand the error above. I would be grateful if you can help me see what I am doing wrong.
Again, thanks a lot. You are awesome !
Sammy

Reply
- Jason Brownlee November 30, 2017 at 8:40 am #
  
  Hi Sammy, did you try the section “Update: Train On Multiple Lag Timesteps Example”?
  
  Reply
  - Sammy November 30, 2017 at 9:00 am #
    
    No as I didn’t see the update before. I will try it now. Thanks a lot
    
    Reply
    - Jason Brownlee November 30, 2017 at 10:07 am #
      
      No problem.
      
      Reply
Miha December 1, 2017 at 2:37 am #

Hi Jason,

First of all, many thanks for this great tutorial!

I’m trying to apply this to my own problem. However, I’m facing some problems.
Let’s say we have the time series of multivariate data structured like this:

x1,x2,x3,…x30, y1
x1,x2,x3,…x30, y2
….

where x1 – x30 are numeric (continuous) values and y1 – yn are labels which I want to predict.
Y can only be 1 (on) or 0 (off). Some of these parameters are raw sensor data, which increase or decrease over n samples, so I know that this problem is ideal for RNN.

But I am not sure if my approach is ok.

Is it ok to re-factor the data in a way, that I take the first 10 samples (without y values of course), create the 2D array of them and try to predict the output of sample n10 and then move for 1 place and take next 10 samples and predict sample n11 and so on… So not to combine them into one vector like you did.

For example, if I have 10,000 samples, each for 100ms and I want to look at the last 10 samples (1 second) I train the data with samples of shape (99990, 10, 30 ) where 99990 represent the number of samples, each containing 10 readings (1 second) with the dimension of 30.

My current model looks like this, but it is not as successful as I want it to be (I think it can be a lot better):

model = Sequential()
model.add(LSTM(100, input_shape=(nsamples, nbatch, ndimension)))
model.add(Dropout(0.2))
model.add(LSTM(100))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

Can you please point me in the right direction?

Reply
- Abdur Rehman Nadeem December 2, 2017 at 9:28 am #
  
  Hi Miha,
  
  Can you tell me why you are applying an activation function only to the output layer? I mean, why is there no activation function for the hidden layers?
  
  Reply
  - Jason Brownlee December 3, 2017 at 5:22 am #
    
    We are using the default activation functions for the LSTM hidden layers.
    
    Reply
Silvia December 3, 2017 at 4:01 am #

train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

I’m having a lot of troubles with these two lines.

I don’t understand why it isn’t like so

train_X = train_X.reshape((1, train_X.shape[0], train_X.shape[1]))
test_X = test_X.reshape((1, test_X.shape[0], test_X.shape[1]))

I thought (and obviously I’m wrong, but I want to know why) that we had 1 sample because we have one city, but have multiple timesteps one for each set of measurements.

If we had 3 cities would we then have 3 instead of 1?

Reply
- Jason Brownlee December 3, 2017 at 5:26 am #
  
  In this example, we are only using a single time step per sample.
  
  It is unrelated to the number of cities.
  
  See this post for more on how to reshape data for LSTMs:
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Reply
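
To make the difference between the two reshapes in the question above concrete, the sketch below applies both to a small toy array. The array contents and the names `one_step_per_sample` and `one_long_sample` are assumptions for illustration; the axis order [samples, timesteps, features] is the one used in the tutorial.

```python
import numpy as np

# Toy stand-in for train_X: 6 rows (hours), 8 features per row.
train_X = np.arange(6 * 8, dtype=float).reshape(6, 8)

# As in the tutorial: 6 samples, each with 1 timestep of 8 features.
one_step_per_sample = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))

# The alternative from the question: 1 sample with 6 timesteps of 8 features.
one_long_sample = train_X.reshape((1, train_X.shape[0], train_X.shape[1]))

print(one_step_per_sample.shape)
print(one_long_sample.shape)
```

With the second layout the whole training set would be a single sample, so the model would see only one input sequence, which would need one target for that sequence rather than one per hour.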
Mahesh December 3, 2017 at 12:50 pm #

Hi Jason,

If I have data for every city, how can I build one LSTM model? Here the data is for only one city and we have to forecast pollution. Suppose I append data for other cities; can we then predict pollution using a single LSTM?
Yes, we can build a model for each city separately, but can we build a single model?

Reply
- Jason Brownlee December 4, 2017 at 7:44 am #
  
  There is no one best way. I would encourage you to explore different ways to frame this problem, perhaps one model per city, perhaps one model for regions or all cities, perhaps ensembles of models. See what works best for your data.
  
  Reply
lucy80 December 3, 2017 at 10:47 pm #

Hi Jason,

If instead of a single time series we have multiple time series, how should we normalize the data?
i.e. if we have pollution data for 100 cities, should normalization be done city-wise or across all cities?

Reply
- Jason Brownlee December 4, 2017 at 7:47 am #
  
  It really depends on the model that you are constructing.
  
  Your goal is to ensure input data to the model is consistent.
  
  Reply
Mangesh Divate December 9, 2017 at 7:38 am #

Hello Jason, one question: why didn’t you use the scikit-learn train_test_split function instead of

# split into train and test sets
values = reframed.values
n_train_hours = 365 * 24
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]

Reply
- Jason Brownlee December 9, 2017 at 9:22 am #
  
  By all means, try it. Note that you cannot shuffle the series.
  
  Reply
james December 11, 2017 at 1:16 am #

Oh, Jason,
on my computer, every epoch takes 191s! This is too long.
I want to ask: did you use a GPU to speed it up, or is there some other problem?
Thank you!!

Reply
- Jason Brownlee December 11, 2017 at 5:27 am #
  
  GPU can speed up LSTMs somewhat, but not as much as MLPs.
  
  Reply
Mark December 11, 2017 at 8:23 am #

Hi Jason,

Thank you so much for your brilliant website helping us all get good at machine learning!

Please could you clarify the line of code that outputs the next hour’s pollution reading? I’ve run the model and it returns the RMSE but I’m interested to see the t+1 prediction.

What code would I add at the end so that when the model has finished running it prints the next hour’s predicted pollution reading?

Many thanks!

Reply
- Jason Brownlee December 11, 2017 at 4:51 pm #
  
  Thanks Mark!
  
  See this post on how to make predictions with a finalized LSTM model:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Mark December 13, 2017 at 12:49 am #

Thank you, Jason.

I’m almost ready to apply what you’ve taught me here to my use case. The only other thing that isn’t 100% clear to me is the dropping columns number references 9,10,11,12,13,14,15 (below):

# drop columns we don’t want to predict
reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)

I get that you’re dropping the columns after ‘pollution’ because you only want to predict the pollution readings but why are they referenced 9-15?

Thank you in advance!

Reply
- Jason Brownlee December 13, 2017 at 5:40 am #
  
  We are dropping variables that we do not want to predict at the next time step. We only want to predict pollution.
  
  Reply
  - Mark December 13, 2017 at 7:50 am #
    
    I understand that. My question was around the numbering. If we’re dropping columns ‘dew’ through to ‘rain’ i.e. columns number 3 to 9 in the prepared “pollution.csv” dataset above then why isn’t the code written:
    
    reframed.drop(reframed.columns[[3,4,5,6,7,8,9]], axis=1, inplace=True)
    
    It’s the 9 – 15 that I just need an explanation for please.
    
    Many thanks
    
    Reply
    - Jason Brownlee December 13, 2017 at 4:10 pm #
      
      We are dropping them from the new dataset that has lag variables.
      
      Try printing the version of the dataset that we are modifying to get an idea of its shape.
      
      Reply
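
To make the 9-15 numbering concrete, here is a small sketch that prints the column positions of a reframed dataset. It assumes 8 variables and one lag step, with column names following the `var1(t-1)` pattern shown elsewhere in these comments; the values are dummies.

```python
import numpy as np
import pandas as pd

n_vars = 8  # pollution plus 7 weather variables, as in the tutorial
# Layout after framing with one lag step: 8 lagged inputs, then 8 current values.
names = [f"var{j}(t-1)" for j in range(1, n_vars + 1)]
names += [f"var{j}(t)" for j in range(1, n_vars + 1)]
reframed = pd.DataFrame(np.zeros((3, len(names))), columns=names)  # dummy values

for i, name in enumerate(reframed.columns):
    print(i, name)

# Indices 0-7 are the inputs at t-1, index 8 is var1(t) (the pollution to predict),
# and indices 9-15 are the other variables at time t, which are dropped.
reframed.drop(reframed.columns[[9, 10, 11, 12, 13, 14, 15]], axis=1, inplace=True)
print(list(reframed.columns))
```

The indices refer to the 16-column lagged frame, not to the 8 columns of pollution.csv, which is why they do not start at 3.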
Chris December 13, 2017 at 11:07 pm #

Hello Jason,
again a very successful contribution.

What I would like to build is something like an early warning system that raises an alert as early and as reliably as possible (for example for natural disasters, financial forecasts or driving data), based on the prediction output of a multivariate time series LSTM forecast.

Suppose the predictions are labels, e.g. x, y and z, and each region labeled x or z must be K units long each time it occurs. x and z make up 10 percent of the data.

The ground truth and Prediction would then look like e.g.
GT:y y y y y y y y x x x x x x y y y y y y z z z z z z y y y y y y y y y y y y y y y y
PR:y y y x x y y y x x x x x x y y y x y y y z z z y y y y y y y y y z z y y y x x y y

Now I would like to determine an overall probability for an event, based on the PR sequence.
Op:y – – – – – – – – X – – – – – – Y – – – – – – Z- – – – – -Y – – – – – – – – – – – – – – – – –

I had the idea of a window with a threshold or a sequence classification task.

I am fairly new to machine learning, but I suspect this problem has been discussed and solved many times, so I would be very happy to get your advice.

Reply
- Jason Brownlee December 14, 2017 at 5:39 am #
  
  There is not one best way to solve a problem like this, but many. I’d encourage you to brainstorm different ways of framing this as a prediction problem and see what works best.
  
  Reply
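
One possible framing of the "window with a threshold" idea from the question, as a rough sketch only. The function name `window_alarm`, the window length, the threshold and the background label `y` are all hypothetical choices, not something from the tutorial.

```python
def window_alarm(pred, window=4, threshold=0.75, background="y"):
    """Emit a non-background label only when it fills at least
    `threshold` of the last `window` predictions."""
    out = []
    for i in range(len(pred)):
        recent = pred[max(0, i - window + 1): i + 1]
        alarm = background
        for label in set(recent) - {background}:
            # Only decide once a full window of predictions is available.
            if len(recent) == window and recent.count(label) / window >= threshold:
                alarm = label
        out.append(alarm)
    return out

# Predicted sequence adapted from the PR line in the question.
pr = "y y y x x y y y x x x x x x y y y x y y y z z z y y".split()
print(" ".join(window_alarm(pr)))
```

With a threshold above 0.5 at most one label can win in any window, so isolated false positives are suppressed at the cost of a delay of a few steps before an alarm is raised.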
Abdur Rehman Nadeem December 14, 2017 at 4:14 am #

Hi Jason,

These days LSTM is also popular for sentiment analysis. Have you written any tutorial on sentiment analysis using LSTM or something like that?

Reply
- Jason Brownlee December 14, 2017 at 5:42 am #
  
  Yes, see here:
  https://machinelearningmastery.com/develop-word-embedding-model-predicting-movie-review-sentiment/
  
  Reply
Mike December 14, 2017 at 5:42 pm #

Hi, Jason,
can I save my model? I don’t want to train it every time.
Also, do you have an article on how to predict the next n steps in multivariate time series forecasting with LSTMs in Keras?
Thank you!

Reply
- Jason Brownlee December 15, 2017 at 5:29 am #
  
  Yes you can save your model, here’s how:
  https://machinelearningmastery.com/save-load-keras-deep-learning-models/
  
  Here’s how to make predictions:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Tony December 15, 2017 at 11:26 pm #

Hi, Jason,
I read your article and ran the code, but I have some questions. Can you give me some suggestions?
1. In this article, you prepare the pollution dataset for the LSTM: all features are normalized and the dataset is transformed into a supervised learning problem. Why is the code ‘MinMaxScaler(feature_range=(0, 1))’ rather than ‘MinMaxScaler(feature_range=(-1, 1))’? I remember the default activation function for LSTMs is the hyperbolic tangent (tanh), which outputs values between -1 and 1. Why do we set (0, 1) here?
2. In this code, we don’t make the time series stationary. Why? I thought we must make the time series stationary. It’s necessary, right?
3. The important arguments are batch_size, n_neuron and epochs. How should I adjust them?
4. Can I use a CNN to predict multivariate time series? Many people think LSTM is the best way. Really?
Thank you very much!

Reply
- Jason Brownlee December 16, 2017 at 5:29 am #
  
  Results are better if you normalize the data.
  
  Making the data stationary may improve the skill of the model. I was trying to keep the example simple.
  
  Use experiments to see what values give the best results. Be systematic.
  
  I think MLP is better at time series, here’s why:
  https://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/
  
  Reply
  - Tony December 16, 2017 at 7:15 pm #
    
    Thank you, Jason,
    your reply is very useful. But I still don’t understand why the code is MinMaxScaler(feature_range=(0, 1)). In your other article, you use feature_range=(0, 1),
    so I’m wondering: what is the reason? Is the activation function for LSTMs changeable?
    
    Reply
    - Jason Brownlee December 17, 2017 at 8:51 am #
      
      Sorry, I don’t follow?
      
      Reply
      - Tony December 17, 2017 at 1:47 pm #
        
        I am foolish, I wrote it wrongly, I am sorry.
        My question is:
        I still don’t understand why the code is MinMaxScaler(feature_range=(0, 1)). In your other article, you use feature_range=(-1, 1). The activation function for LSTMs is tanh? I think tanh is in (-1, 1), so why do we use (0, 1) here?
        Thank you so much.
      - Jason Brownlee December 18, 2017 at 5:20 am #
        
        LSTMs generally perform better with normalized data (in the range 0-1).
      - slouchpie January 18, 2018 at 12:49 pm #
        
        Hi Jason, great article.
        Can you please explain why it is OK to use feature_range [0, 1] as opposed to [-1, 1].
        In another article (https://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/) you said that the feature_range should be [-1, 1] in order to be the same range as the hyperbolic tan (tanh) function, which default LSTM uses. In fact, you said “This is the preferred range for the time series data.”.
        I am not sure why it is OK to now use [0, 1]. Are you taking absolute value of tanh somewhere in your LSTM layer?
      - Jason Brownlee January 19, 2018 at 6:26 am #
        
        The range [0,1] results in better skill.
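
A small sketch to make the two choices in this thread concrete, using made-up readings. Both ranges are linear rescalings of the same data, and the inputs pass through learned weights before any tanh is applied, so neither range is required by the activation; which one gives better skill is something to test on your own data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up pollution readings, one feature column.
x = np.array([[10.0], [25.0], [40.0], [160.0]])

x01 = MinMaxScaler(feature_range=(0, 1)).fit_transform(x)
x11 = MinMaxScaler(feature_range=(-1, 1)).fit_transform(x)

print(x01.ravel())
print(x11.ravel())
# The two scalings differ only by a fixed linear map.
assert np.allclose(x11, 2 * x01 - 1)
```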
soloyuyang December 16, 2017 at 12:06 am #

Hi, Jason,
The work you have done is wonderful. I’m interested in time series forecasting with LSTMs.
I have two questions.
1. In some cases in time series forecasting, especially with a single series, the features are the values at previous times (t-1, t-2, …). For example, given only the pm2.5 series, I want to predict the value at t+1 from the data at t-k, …, t-1, t. How should I set the “time-steps” and “features”: [samples, k+1, 1] or [samples, 1, k+1] (treating the previous data as features)?
2. You have mentioned that “LSTM does not appear to be suitable for autoregression type problems”. Did you mean that LSTMs do not perform well in cases like the example in my first question (a single series, predicting t+1 from the data up to t)?

Reply
- Jason Brownlee December 16, 2017 at 5:31 am #
  
  This post may help you with preparing the data:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  And this post has an example:
  https://machinelearningmastery.com/prepare-univariate-time-series-data-long-short-term-memory-networks/
  
  Correct.
  
  Reply
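
The two layouts from the first question can be sketched as follows. This is an illustration on a made-up series; `make_windows` is a hypothetical helper, not a function from the tutorial.

```python
import numpy as np

def make_windows(series, k):
    """Use the k+1 values t-k..t as input and the value at t+1 as target."""
    n_in = k + 1
    X = np.array([series[i:i + n_in] for i in range(len(series) - n_in)])
    y = np.array([series[i + n_in] for i in range(len(series) - n_in)])
    return X, y

series = np.arange(10, dtype=float)  # made-up univariate series
X, y = make_windows(series, k=2)

# Option 1: lags as time steps -> [samples, k+1, 1]
X_steps = X.reshape((X.shape[0], X.shape[1], 1))
# Option 2: lags as features of a single time step -> [samples, 1, k+1]
X_feats = X.reshape((X.shape[0], 1, X.shape[1]))
print(X_steps.shape, X_feats.shape, y.shape)
```

Both arrays hold the same numbers; only the first layout presents the lags to the LSTM as a sequence of time steps.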
Ahmed Mbarak December 17, 2017 at 1:17 pm #

Hello Jason,

I hope you are doing fine.

I am getting this error and I don’t know why. I used my own data set for Amarillo, Texas.

Traceback (most recent call last):
File “/Users/Ahmed/Desktop/Coding/P.prediction.py”, line 118, in
inv_yhat = scaler.inverse_transform(inv_yhat)
File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/preprocessing/data.py”, line 385, in inverse_transform
X -= self.min_
ValueError: operands could not be broadcast together with shapes (3567,13) (10,) (3567,13)

Reply
- Jason Brownlee December 18, 2017 at 5:20 am #
  
  The size of your data may not match the expectations of your model?
  
  Reply
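
The shapes in the error message suggest the array passed to inverse_transform had 13 columns while the scaler was fit on 10. A minimal sketch of that constraint, assuming made-up data and a scaler fit on 10 columns:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
n_features = 10                            # columns the scaler was fit on (assumed)
data = rng.random((100, n_features)) * 50.0
scaler = MinMaxScaler(feature_range=(0, 1)).fit(data)

yhat = rng.random((20, 1))                 # stand-in for model.predict(test_X)
other = rng.random((20, n_features - 1))   # stand-in for the remaining scaled features

# The array passed to inverse_transform must have exactly n_features columns.
inv_yhat = np.concatenate((yhat, other), axis=1)
assert inv_yhat.shape[1] == scaler.min_.shape[0]
inv_yhat = scaler.inverse_transform(inv_yhat)[:, 0]

# A wider array reproduces the kind of error reported above.
try:
    scaler.inverse_transform(np.zeros((20, 13)))
except ValueError as err:
    print("ValueError:", err)
```

Printing the shape of the concatenated array just before the inverse_transform call is a quick way to check this on your own data.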
Abdur Rehman Nadeem December 17, 2017 at 11:43 pm #

Hi Jason,

Currently I am working on a project and I am following your tutorials. They are great, but I have some questions regarding LSTMs. First, can you briefly explain what a timestep is exactly and how it affects the performance of the model?

In the above example, we used model.add(LSTM(50)). If we increase the number of LSTM cells, how will that affect the performance of the model?

In the above example, why did you set shuffle=False? If we set it to True, don’t you think that will increase the performance?

How can I check for underfitting and overfitting of my model, and the accuracy of its results?

Best Regards,

Reply
- Jason Brownlee December 18, 2017 at 5:25 am #
  
  You can learn more about LSTM inputs here:
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  I recommend testing different numbers of cells on your problem to see what works best.
  
  We do not want to shuffle inputs because all samples are sequential, learn more here:
  https://machinelearningmastery.com/handle-long-sequences-long-short-term-memory-recurrent-neural-networks/
  
  More about model diagnostics here:
  https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
  
  Reply
TAMER A. FARRAG December 18, 2017 at 6:25 am #

Hi Jason, I want to ask why you normalize (scale) the data before the “series to supervised” operation. For example, this may cause denormalization errors when using n_in=2, n_out=1.
So, is it better to normalize after the “series to supervised” operation?

Reply
- Jason Brownlee December 18, 2017 at 3:22 pm #
  
  I recommend normalizing before splitting the series into multiple features.
  
  Reply
Abdur Rehman Nadeem December 18, 2017 at 8:01 am #

Hi Jason,

Again appreciation for your blogs and thanks for the quick response but still have some queries.

I am working on a time series dataset of approximately 2.5 million rows with more than 10 features, sampled at a 5-minute interval. In my case, should I use Truncated Backpropagation Through Time, or should I just increase the number of timesteps to 250-500 as mentioned in one of your blog posts?

I have followed many of your tutorials but I did not see “dropout” anywhere, although I have read in some places that it decreases the learning time?

Does the number of timesteps tell how many times we are going to backpropagate? Please correct me if I am wrong.

One big confusion is when to use an LSTM and when to use a Bidirectional LSTM, e.g. for my dataset described above, which would be more useful?

Best Regards,

Reply
- Jason Brownlee December 18, 2017 at 3:25 pm #
  
  Here are some ideas on strategies for dealing with long sequences:
  https://machinelearningmastery.com/handle-long-sequences-long-short-term-memory-recurrent-neural-networks/
  
  Here is an example of dropout with LSTMs:
  https://machinelearningmastery.com/use-dropout-lstm-networks-time-series-forecasting/
  
  Yes, time steps define BPTT, here’s more on BPTT:
  https://machinelearningmastery.com/gentle-introduction-backpropagation-time/
  
  Try bidirectional and see if it lifts model skill, here is an example:
  https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/
  
  Reply
Rui December 18, 2017 at 2:43 pm #

hello, nice example.

If you wanted to “compress” time with a 1D ConvNet before feeding the LSTM, how would you do it?

thanks in advance,
Rui

Reply
- Jason Brownlee December 18, 2017 at 3:32 pm #
  
  Depends on the problem.
  
  Perhaps you can compress all obs from an hour, day or week into a CNN output vector to feed into an LSTM.
  
  Reply
Stefano December 19, 2017 at 4:13 am #

Hi Jason,
I do not understand why you swap “samples” and “timesteps” meaning. From the Keras’ FAQ, a sample is an element of the dataset. In the case of timeseries prediction, an element of the dataset is a timeseries. In this case, you have just one timeseries. Instead you have N timeseries with just 1 timestep. A timeseries with 1 timestep is not really a timeseries. Anyway, you are not even setting the stateful property and the internal state is going to be reset at each step (sample in your case). So, how does the network remember?

Best regards

Reply
- Jason Brownlee December 19, 2017 at 5:21 am #
  
  When we frame our time series problem as a supervised learning problem, we can choose what constitutes a sample or a time step.
  
  Indeed, we need multiple timesteps in order to achieve true BPTT:
  https://machinelearningmastery.com/gentle-introduction-backpropagation-time/
  
  LSTMs can remember across samples if internal state is not reset.
  
  Reply
Abdur Rehman Nadeem December 19, 2017 at 9:41 am #

Hi Jason,

Really great blogs. I have never seen such nice blogs. But again I am disturbing you.

If I have a time series dataset at 5min interval which contain 250000 rows and 10 features and I want to predict one feature and If I apply Backpropagation Through Time (BPTT) using 200 timesteps:

1-> I have to reshape into [samples, timesteps, features] = [ 250000, 200, 10] ?

or

2-> I will have to split the 250000 time steps into 1250 sub-sequences of 200 time steps each and I have to reshape into [samples, timesteps, features] = [ 1250, 200, 10] ?

Which approach is right for BPTT? Both of them are mentioned in your blog posts and now I am totally confused between the two.

And kindly give the [samples, timesteps, features] shape for the above example in the case of Truncated Backpropagation Through Time (TBPTT).

Regards,

Reply
- Jason Brownlee December 19, 2017 at 3:59 pm #
  
  Good question, here are some ideas that may help:
  https://machinelearningmastery.com/handle-long-sequences-long-short-term-memory-recurrent-neural-networks/
  
  Reply
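
As an illustration of the second option in the question (non-overlapping sub-sequences), here is a minimal NumPy sketch. The zero array is a placeholder for the real data, and this shows only one of the strategies discussed in the linked post.

```python
import numpy as np

n_rows, n_features, n_steps = 250_000, 10, 200
data = np.zeros((n_rows, n_features), dtype=np.float32)  # placeholder for the real series

# Non-overlapping sub-sequences: 250000 / 200 = 1250 samples of 200 time steps.
n_samples = n_rows // n_steps
X = data[: n_samples * n_steps].reshape((n_samples, n_steps, n_features))
print(X.shape)
```

The first option in the question corresponds to overlapping windows that advance one row at a time. That gives 250000 - 200 + 1 = 249801 samples rather than 250000, and it is far more expensive to hold in memory.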
Mahesh December 19, 2017 at 5:46 pm #

Dear Jason,
I am trying to Solve a problem using RNN and wish to explain that problem using this example and want to know how to apply RNN
If the test data had every other data other than PM2.5 ( Pollution) for few days , how to predict pollution using the Training data and test data with RNN
thanks

Reply
- Jason Brownlee December 20, 2017 at 5:39 am #
  
  Sorry, I’m not sure I follow. Can you perhaps rephrase your question?
  
  Reply
  - Mahesh December 21, 2017 at 11:59 pm #
    
    Dear Jason,
    Let me Rephrase my question
    We have a problem to solve similar to example you have explained above.
    Instead of explaining my problem, I would like to pose a question on this problem hoping that would provide some clues to solve my problem
    You had Stated
    
    Predict the pollution for the next hour based on the weather conditions and pollution over the last 24 hours.
    Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.
    
    The first one is clear. But the second line is not clear to me
    Are you predicting the pollution for next hour based on Model created using past data AND using weather conditions like temperature, pressure for next hour ?
    if yes, then i would go ahead and read more on the solution you have posted
    if no, i am wondering how RNN can be used to solve a problem like
    Predict the pollution , not just for next hour but , say, for next 15 hours based on past data and with weather conditions also provided for those 15 hours
    
    Thanks
    
    Reply
    - Jason Brownlee December 22, 2017 at 5:35 am #
      
      Yes, I use the weather conditions for the next hour, with the conceit that we pretend they are forecast weather conditions rather than observations.
      
      Reply
jack December 19, 2017 at 11:58 pm #

Hi, Jason,
I want to do multivariate time series classification with LSTMs in Keras.
What should I do? My dataset is Y: a class variable (0/1), X1: numerical variable, X2: numerical variable, X3: numerical variable, and all of these variables are time series. I want to predict Y’s class.
Thank you very much!

Reply
- Jason Brownlee December 20, 2017 at 5:46 am #
  
  Perhaps you can use the above tutorial as a guide?
  
  Reply
Abdur Rehman Nadeem December 20, 2017 at 3:04 am #

HI Jason,

You are not using “stateful = True” in this post, so how will your network remember the previous history?

When do we use the property “return_sequences = True”?

Please give a brief description.

Reply
- Anton December 20, 2017 at 5:39 am #
  
  This model is not a rolling forecast, so we don’t need to manually reset the cell memory with the reset_states() method, and therefore the model is not required to be “stateful = True”.
  
  “return_sequences = True” is necessary when stacking multiple LSTM layers (and probably in other cases too), where each layer should return one output per time step it received, for the next layer to consume. In this post, Jason used only one LSTM layer, so it only needs to pass one flat vector to the Dense(1) layer.
  
  Am i right?
  
  Reply
- Jason Brownlee December 20, 2017 at 5:49 am #
  
  The LSTM is still stateful, although state is reset at the end of each batch.
  
  Return sequences is appropriate when stacking LSTMs or when outputting a sequence.
  
  Reply
Anton December 20, 2017 at 5:31 am #

Hi Jason!

Is it important (or even necessary) to include the pollution of the previous timestep as the feature of observation to predict next?

var1(t-1) var2(t-1) var3(t-1) var4(t-1) var5(t-1) var6(t-1) \
1 0.129779 0.352941 0.245902 0.527273 0.666667 0.002290

var7(t-1) var8(t-1) var1(t)
1 0.000000 0.0 0.148893

I’m asking about var1(t-1)

Bacause if the pollution value is a result of all the other variables in the past, so why should we feed it to the LSTM?

Thanks for your great work!

Reply
- Jason Brownlee December 20, 2017 at 5:54 am #
  
  Test and see.
  
  Reply
Franzi December 21, 2017 at 1:07 am #

Hello Jason,

thank you very much for your tutorial. I am wondering if it is possible to adapt your code to a multi-step forecasting problem.
Can I predict multiple time steps of the pollution value, taking the other variables into account?

Thank you for your great work!

Reply
- Jason Brownlee December 21, 2017 at 5:27 am #
  
  Yes, use this post as a template:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
Ismael December 21, 2017 at 6:51 am #

Hi Jason!

Thanks for your tutorial, and for the time you have dedicated to making it and answering all of us. And sorry for my bad English!

I’m building a prediction model for water consumption. As inputs I have the real aggregated consumption of a pool of people for the previous day, the previous day’s forecast of consumption for the day, whether the day is a working day or not, the day of the week, and the average annual consumption and standard deviation for 10 subtypes of persons.

For these last inputs, I have 20 columns: 10 for average consumption and 10 for standard deviation.

With this, my question is: can I link the average consumption and standard deviation in some way, similar to a tuple, as input? I’m afraid that the model will misunderstand the relationship between them.

Thank you in advance!! Best regards.

Reply
- Jason Brownlee December 21, 2017 at 3:33 pm #
  
  I would recommend brainstorming many different ways of framing the problem and test each to see what works best for your data, even ensemble a few of them together.
  
  Reply
Ankit Mishra December 23, 2017 at 7:53 pm #

Thanks for this blog post on using RNNs and LSTMs for forecasting.
It is very enlightening.

I have been working on an energy dataset with dimensions (87647, 7) (approximately five years of data). The data is collected every half hour.

I have trained my model using a single LSTM and Dense layer with test batchsize of 4 years and predicted and validated over 1 year of data.

The test RMSE is about 0.458 and the train RMSE is 0.058. Does this mean my model badly overfits the data? I have scaled the data using a min-max scaler just like in your post.

I have read your other blog post on diagnosing underfitting and overfitting and played with batch size and epochs, but it doesn’t help much.

Can you give me insights on how to improve my model performance?
Does an LSTM regressor work well?

Reply
- Jason Brownlee December 24, 2017 at 4:52 am #
  
  Great work! I have some ideas for lifting model skill here that might help as a first step:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
Adam December 24, 2017 at 12:57 pm #

Hello Jason
thank you for such a great tutorial, I implement the code and it works fine with no problem.
But I was wondering about the future: how can we predict the next 10 hours or 5 days after the dataset ends, based on this trained model?

Reply
- Jason Brownlee December 25, 2017 at 5:22 am #
  
  You can call model.predict()
  
  Learn more here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Sara December 25, 2017 at 7:49 pm #

Hi,

Why have you trained both examples till the 50 epochs? because the lowest validation error on each example might happen somewhere before the 50th epoch. for example, 10th at the first one and 15th at the second one.

the 50th epoch might not be the best point.

Reply
- Jason Brownlee December 26, 2017 at 5:15 am #
  
  It is just a demonstration. You can tune the model with early stopping or any way you wish.
  
  Reply
tom December 28, 2017 at 1:23 am #

Hi Jason
Thanks for this awesome web site where I learned a lot about deep learning, but I have a question:
How do I feed multiple data sources (several csv files) to a neural network, especially if these files are time series?
We may have multiple data frames, different date formats with different time steps, maybe different data formats, etc.

Reply
- Jason Brownlee December 28, 2017 at 5:24 am #
  
  Perhaps you can use a multi-headed model, see an example on this post:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
karim December 28, 2017 at 1:42 am #

Hello Jason
thanks for the tutorial, I did the example you did with no problems at all, thanks for the detailed description you did, but I have a question about what’s next.
I mean, how can I publish this model as a complete application that can make predictions on different data without repeating the whole training process again and again?

Reply
- Jason Brownlee December 28, 2017 at 5:25 am #
  
  I have notes on finalizing the model here:
  https://machinelearningmastery.com/train-final-machine-learning-model/
  
  I have some ideas on moving a model to production here:
  https://machinelearningmastery.com/deploy-machine-learning-model-to-production/
  
  Reply
Rui December 29, 2017 at 12:51 pm #

hello

when using K.set_image_dim_ordering(“th”)

on LSTM the input_shape(timeSteps,variables) becomes input_shape(variables, timeSteps) ?

Reply
- Jason Brownlee December 29, 2017 at 2:38 pm #
  
  I don’t know, try it and see.
  
  Reply
  - Rui December 30, 2017 at 6:28 am #
    
    I tried. In my problem I am using K.set_image_dim_ordering(“th”), and the accuracy drops when I use input_shape(variables, timeSteps). Looking on the internet (at completely different approaches), it looks like it does not change the dim ordering for LSTMs like it does for ConvNets.
    
    With all that I assume the dim_order is always the same in LSTM : input_shape(timeSteps,variables)
    
    for K.set_image_dim_ordering(‘th’) or K.set_image_dim_ordering(‘tf’)
    
    Reply
    - Jason Brownlee December 31, 2017 at 5:19 am #
      
      I believe dimensional order is always the same for LSTMs and that changing dim ordering is only for images (e.g. impacting CNNs) as the name suggests.
      
      Reply
Peter Cserna December 29, 2017 at 10:04 pm #

Hello Jason,

I am wondering: if I one-hot encode the wind feature, what modifications should be made to the shape of the input?
Br,

Reply
- Jason Brownlee December 30, 2017 at 5:21 am #
  
  The length of the binary vector would be added to the number of input features.
  
  Reply
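
A minimal sketch of how the feature count changes, assuming four wind-direction categories and seven other features (both counts and all values here are made up for illustration):

```python
import numpy as np

# Made-up integer-encoded wind directions (4 categories) and 7 other features.
wind_codes = np.array([0, 2, 1, 3, 2])
other = np.zeros((5, 7))

one_hot = np.eye(4)[wind_codes]            # shape (5, 4), one column per direction
features = np.hstack((other, one_hot))     # 7 + 4 = 11 input features instead of 8

# With one time step per sample the LSTM input becomes [samples, 1, 11].
X = features.reshape((features.shape[0], 1, features.shape[1]))
print(X.shape)
```

The single integer column is replaced by the binary vector, so the features dimension of the input shape grows by the vector length minus one.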
Peter K. December 31, 2017 at 10:01 am #

Jason,

Great tutorial, and outstanding book btw. I have two related conceptual questions and would appreciate your expertise:

1. Given that LSTM is stateful and has memory, what would be a valid reason to use multi-lag input? Is it just to force a quasi-working memory onto the LSTM or are there some other reasons?

2. You mention that LSTM is not ideal for autoregression. I don’t get this. Doesn’t the inbuilt memory make LSTM ideal for autoregressive time series?

And one more question: what’s your view on combining convolutional NN with LSTM for time series predictions, for instance to capture multi-scale patterns?

Happy New Year!

Reply
- Jason Brownlee January 1, 2018 at 5:28 am #
  
  Time steps are required for BPTT:
  https://machinelearningmastery.com/gentle-introduction-backpropagation-time/
  
  The memory in LSTMs is simple and cannot act like a stack making it poor for autoregression. Please read this post:
  https://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/
  
  I have not tried combining CNN and LSTM for time series, but I have for video classification:
  https://machinelearningmastery.com/cnn-long-short-term-memory-networks/
  
  Reply
Choi.HD January 3, 2018 at 11:01 pm #

Hello. Thank you so much. Dr.Jason

I have a question. How can we see a graph of a prediction, not loss graph? like 1 year after

Reply
- Jason Brownlee January 4, 2018 at 8:10 am #
  
  You can collect predictions and plot them using matplotlib.
  
  Reply
Vlad Gorlov January 4, 2018 at 7:53 am #

As far as I can understand so far (and I am a beginner in the deep learning space), LSTMs cannot handle trends or seasonality (you recommend making all series stationary with differencing and seasonal adjustment first). In practical business problems, trends and seasonality are the most important aspects of forecasting, so separating them out leaves us very little to work with. Any thoughts on how trends and seasonality could be handled by NNs? In principle, NNs are good at finding patterns, and these are exactly that.

Many thanks!

Reply
- Jason Brownlee January 4, 2018 at 8:18 am #
  
  Exactly as you say, model the structure and remove it, then model what you have left.
  
  I would encourage you to explore MLPs and only move to LSTMs if they lift model skill.
  
  Also, get creative about inputs to the model.
  
  Reply
  - Vlad Gorlov January 5, 2018 at 2:40 am #
    
    I keep hoping that given deep learning success across such a variety of applications it can also be used eventually to pick up these patterns just like today it can handle video. I doubt that there is something structurally intractable about trends, seasonality, lifecycles, etc. If people can do it, ML should be able to even if not just yet.
    
    Was looking at your CNN LSTM tutorial. Seems like a step in that direction. Of course, there we are dealing with a sequence of patterns each of which can be interpreted by CNN and then submitted to LSTM. Time series are not quite like that, they are sequences WITH patterns. But hopefully there is an architecture to handle that too
    
    Reply
    - Jason Brownlee January 5, 2018 at 5:28 am #
      
      It might come down to how the series is presented to the network.
      
      Reply
Antonio January 4, 2018 at 11:51 pm #

Hi Jason,

I have a question; I am new to ML, so please don’t get annoyed. I am trying to understand why the prediction does not have the same shape as test_X. I have fed the model with my data, which is originally a time series with 3 values of a parameter: max, min and avg. I have converted it to a supervised learning problem. I would like to predict these 3 values, so I’d expect the prediction to have more than one column, but I always get one column as output and I don’t understand which of the parameter values (min, max or avg) it is predicting.

Thanks a lot,
Antonio

Reply
- Jason Brownlee January 5, 2018 at 5:26 am #
  
  See this post for a good explanation of input shape:
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Reply
  - Antonio January 9, 2018 at 5:23 am #
    
    Thanks!
    
    Reply
Franziska January 6, 2018 at 4:17 am #

Hey, I would like to predict the pollution data for the next 10 timesteps, t+1 to t+10, knowing only the ‘dew’, ‘temp’, ‘press’, ‘wnd_dir’, ‘wnd_spd’, ‘snow’, ‘rain’ data of timestep t.

Is this possible? What do I have to change in the definition of the series_to_supervised function?

Thanks a lot in advance!

Reply
- Jason Brownlee January 6, 2018 at 5:56 am #
  
  This post has an example of multi-step forecasting:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
vlad January 7, 2018 at 6:31 am #

Hello and a happy new year! 😀

I’m back with more pertinent questions. I finally managed to create the ML environment and ran this example with my own data (the values are all integers, so I have not used the label encoding step used here for the wind direction).

I’ve transformed the data so it resembles the pollution data input and trained the model, but when executing

inv_yhat = scaler.inverse_transform(inv_yhat)

it returns the following error:

Traceback (most recent call last):
File “/Users/vlad/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py”, line 2862, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File “”, line 1, in
inv_yhat = scaler.inverse_transform(inv_yhat)
File “/Users/vlad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py”, line 385, in inverse_transform
X -= self.min_
ValueError: operands could not be broadcast together with shapes (13,13) (7,) (13,13)

the data structure is 303 rows x 7 columns (excluding the date)
training data size is 289.

Any idea what i’m doing wrong?

Reply
- Jason Brownlee January 8, 2018 at 5:38 am #
  
  Does the example in the post work as-is?
  
  Perhaps this post will help you reshape your data:
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Reply
Nathan D. January 9, 2018 at 1:41 am #

Hi Jason, thank you for the great post. I have a short question that hope you may address:

Is it fair to normalize both training and test datasets at the same time? I think in your post the test dataset is really the validation set, so it should be OK. However, how do we normalize and rescale unseen test data in the future, which may contain values (for some features) larger or smaller than the max/min seen in the training data?

Reply
- Jason Brownlee January 9, 2018 at 5:34 am #
  
  Yes, normalize the training dataset and use the min/max from training to normalize the test set.
  
  sklearn makes this really easy with their data transform objects.
  
  Reply
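
A minimal sketch of the train-only fit described in the reply, on made-up numbers. It uses MinMaxScaler with its default settings, which do not clip values that fall outside the training range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature series; some test values fall outside the training range.
train = np.array([[10.0], [20.0], [30.0], [50.0]])
test = np.array([[40.0], [60.0], [5.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)   # min/max learned from train only
test_scaled = scaler.transform(test)         # reuses the training min/max

print(test_scaled.ravel())
# Values outside the training range map outside [0, 1].
assert test_scaled.max() > 1.0 and test_scaled.min() < 0.0
```

Unseen values beyond the training min/max therefore land slightly outside [0, 1]. Whether that is acceptable or needs handling (for example by clipping) depends on the problem.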
Leo January 9, 2018 at 2:10 pm #

Hey Jason, great work, thank you very much for your blog, it has helped me a lot.

However, I have a question. Your code removed the other 7 features from the test data, so we need to restore them in the last section to invert the scaling. But in the code:

concatenate((yhat, test_X[:, -7:]), axis=1),

should test_X be replaced by test_y in this line? Is that right? Or does it not matter?

Thanks again and happy new year!

Reply
- Leo January 9, 2018 at 2:15 pm #
  
  In fact, I mean: should test_X[:, -7:] be replaced by test[:, -n_features:]?
  
  Reply
  - Jason Brownlee January 9, 2018 at 3:20 pm #
    
    Did you try it, does it work?
    
    Reply
  - chiu0602 May 17, 2018 at 1:59 am #
    
    I think it should be test_X[:, -(n_features-1):]
    
    By the way, thanks Dr Jason a lot for the useful articles and help through comments!
    
    Reply
- Jason Brownlee January 9, 2018 at 3:19 pm #
  
  It does not matter.
  
  Reply
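
One way to see why the choice of filler columns does not matter: min-max scaling is applied column by column, so inverting column 0 only uses column 0's own minimum and maximum. The sketch below uses hand-rolled scaling on invented data (8 columns, target first) rather than the tutorial's scaler object.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.uniform(0.0, 50.0, size=(100, 8))   # invented: target in column 0, 7 other features

data_min = data.min(axis=0)
data_range = data.max(axis=0) - data_min

def inverse_scale(scaled):
    # Column-wise inverse of min-max scaling.
    return scaled * data_range + data_min

scaled = (data - data_min) / data_range
yhat = scaled[:10, :1]                          # stand-in for scaled predictions, shape (10, 1)

# Two different choices of filler for the remaining 7 columns.
filler_a = scaled[:10, 1:]
filler_b = np.zeros((10, 7))

inv_a = inverse_scale(np.concatenate((yhat, filler_a), axis=1))[:, 0]
inv_b = inverse_scale(np.concatenate((yhat, filler_b), axis=1))[:, 0]

print(np.allclose(inv_a, inv_b))  # the recovered target column is the same either way
```

What does matter is the column count: the filler must bring the total back to the number of columns the scaler was fitted on.
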
Dan Baez January 9, 2018 at 3:49 pm #

Hi Jason, thanks for the great post, and I recently purchased your book, which is equally helpful for learning (I’m completely new to ML techniques!). I have a question which is probably straightforward but has me puzzled.

In the China pollution multivariate prediction code, what exactly is required to predict and print the next hour’s prediction once I have updated all the other variables with new data in the pollution.csv file? I have read other posts but it is still not clear to me. Essentially, I have run all the code as provided above with no problems. I have now updated pollution.csv with my own variable data, but I can’t copy and paste any of the code provided to obtain new predictions. What is the exact code to use so that a pollution value is predicted and printed? Thanks in advance!

Reply
- Jason Brownlee January 10, 2018 at 5:20 am #
  
  Good question, I spell out how to make a prediction here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Sergio January 10, 2018 at 12:02 am #

Dr. Jason. Thank you so much always.

I have a question.

The result in your air pollution example was a value like 0.xxx. In other words, it is a new value.

But in my case, the results already exist. For example, area, weather and person are multivariate inputs that depend on time, and the number of ice creams sold is the result determined by area, weather, person, etc. I want to predict the number of ice creams sold in real time as the data comes in. How can I write this code? I think it could be regression, or a mix of regression and time series.

Thank you!

Reply
- Jason Brownlee January 10, 2018 at 5:27 am #
  
  Sorry, I don’t follow. Perhaps use the above code as a template for your problem?
  
  Reply
Antonio January 10, 2018 at 1:20 am #

Hi Jason,

I have a question I hope you can answer. The predictions you make with your model are step-by-step predictions; that is, you use the current pollution value to predict the next one, so the variations are not very big, and I assume the predictions are very accurate because of that. My question is: how would I predict all the values of the next hour based on past data? In other words, how would you predict the shape of the pollution function for the next x seconds based on past data?

Reply
- Jason Brownlee January 10, 2018 at 5:29 am #
  
  This is called multi-step forecasting, see this tutorial:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
JB January 11, 2018 at 6:48 am #

Jason-
This example is fantastic, but I have some questions. If I alter the model to where n_in = 12 and n_out =3, am I correct in understanding that I am essentially using the last 12 time points to forecast the next three in time? If that is so, wouldn’t there theoretically be multiple forecasts for each point in time? If so, how do we come up with the values that are output?

Reply
- Jason Brownlee January 12, 2018 at 5:46 am #
  
  There are multiple ways to predict 3 time steps ahead:
  https://machinelearningmastery.com/multi-step-time-series-forecasting/
  
  I would recommend this approach:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
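
The framing in this question (12 past steps in, 3 future steps out) can be illustrated on a toy series. This is a minimal sketch with a hypothetical helper, make_windows, written for this example; it is not the tutorial's series_to_supervised function.

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Split a 1-D series into (input window, output window) pairs."""
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(y)

series = np.arange(20)          # toy series: 0, 1, ..., 19
X, y = make_windows(series, n_in=12, n_out=3)

print(X.shape, y.shape)         # (6, 12) (6, 3)
print(y)
```

In the printed targets, time step 14 appears in three different rows: as the 3-step-ahead, 2-step-ahead and 1-step-ahead target. So overlapping forecasts for the same time do exist under this framing, and which of them to use (or whether to combine them) is a design decision.
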
Maria January 12, 2018 at 10:44 pm #

Hello, Brownlee.

First of all, thanks!

If your problem were classification, which “loss” function would you recommend?
What changes would you make in the “design network” section?

Thanks

Reply
- Maria January 12, 2018 at 11:27 pm #
  
  I intend to use the “categorical_crossentropy” loss function.
  My problem has 3 possible output classes (0, 1 or 2).
  
  So I put 3 neurons in the last layer. Right?
  
  Before all of this, I need to use the LabelEncoder class and the np_utils.to_categorical() method. Right?
  
  My doubt is about which activation function is best for my problem.
  
  Reply
  - Jason Brownlee January 13, 2018 at 5:33 am #
    
    Nope, you need to use categorical_crossentropy for > 2 classes.
    
    Reply
- Jason Brownlee January 13, 2018 at 5:33 am #
  
  binary_crossentropy for 2 classes otherwise categorical_crossentropy.
  
  Reply
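
For three classes, a typical Keras setup is an output layer of 3 units with a softmax activation, one-hot encoded targets and the categorical_crossentropy loss. The sketch below shows in plain NumPy what those pieces compute. The labels and raw layer outputs are invented for illustration.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])                 # integer class labels (0, 1 or 2)
one_hot = np.eye(3)[labels]                     # one row per sample, a 1 in the label's column

# Invented raw outputs of a 3-unit final layer for the 4 samples.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.3, 1.5],
                   [0.1, 1.2, 0.3],
                   [1.0, 0.2, 0.9]])

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

probs = softmax(logits)                          # each row sums to 1
loss = -np.mean(np.sum(one_hot * np.log(probs), axis=1))   # categorical cross-entropy
predicted = probs.argmax(axis=1)                 # most probable class per sample

print(predicted, loss)
```

Softmax is the usual choice for the output activation here because it turns the three outputs into class probabilities that sum to one, which is what the cross-entropy loss expects.
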
steven January 13, 2018 at 9:57 am #

Hi Jason,

Nice example, very detailed, with great responses to questions. I just found this post when I tried to see if LSTM outperforms normal statistical learning methods. From your answers, you alluded to two important points:

1. LSTM is not great for autoregression, compared to MLP
2. SARIMA is better fit to this particular dataset

Can you elaborate on the first point? Do you mean there is an AR model in the dataset, especially in pollution? I ran acf and pacf on pollution (in R, not Python):

acf(pollution,plot=T)

Autocorrelations of series ‘pollution’, by lag

    0     1     2     3     4     5     6     7     8     9    10    11    12
1.000 0.659 0.507 0.405 0.328 0.273 0.228 0.193 0.164 0.143 0.127 0.111 0.102 …

pacf(pollution,plot=T)

Partial autocorrelations of series ‘pollution’, by lag

    1     2     3     4     5     6     7     8     9    10    11
0.659 0.128 0.053 0.024 0.023 0.012 0.012 0.006 0.011 0.011 0.004

From your experience, how would you compare the performance of an MLP and linear regression (SARIMA or whatever)? I understand you don’t have an example on linear regression yet, so let’s just keep the discussion general.

Thanks,
Steven

Reply
- Jason Brownlee January 14, 2018 at 6:35 am #
  
  Yes, but we are modeling it as an AR: t = f(t-1, t-2, …).
  
  See more here:
  https://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/
  
  Compare the methods based on skill directly. Perhaps I don’t understand your question.
  
  Reply
  - Steven January 14, 2018 at 5:26 pm #
    
    Ah, after reading your link I now realize that what you refer to as “AR” is different from what I meant. Your AR describes the learning method: the model’s prediction is based on previous knowledge at t-1, t-2, etc. What I was referring to was time-parametric behavior in the data itself. In other words, whether the dataset itself can or can’t be fit by AR, ARIMA, etc. models, and thus whether an LSTM would have an advantage over these parametric modeling methods.
    
    Reply
Sara January 16, 2018 at 4:21 am #

Hi Jason,
Thank you for this perfect post.
For prediction with the multivariate model, after saving the model, how should I call it back?

Reply
- Jason Brownlee January 16, 2018 at 7:40 am #
  
  This post will help you understand how to make predictions:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Zack Stinnett January 17, 2018 at 9:20 am #

I have been working on this and I added the accuracy metric to compile and the results were really low. Is the accuracy supposed to be low?

model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])

Epoch 50/50
1s – loss: 0.0143 – acc: 0.0761 – val_loss: 0.0141 – val_acc: 0.0393

Reply
- Jason Brownlee January 17, 2018 at 10:02 am #
  
  You cannot measure accuracy for regression.
  
  Learn more here:
  https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
  
  Reply
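
For a regression problem like this one, error measures such as MAE or RMSE are the usual way to report skill. Here is a minimal sketch on invented numbers, which also shows why an accuracy-style exact-match rate is not informative for real-valued targets.

```python
import numpy as np

# Invented observed and predicted values in original units.
y_true = np.array([30.0, 42.0, 55.0, 61.0])
y_pred = np.array([28.0, 45.0, 50.0, 64.0])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))            # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))     # root mean squared error

# An exact-match rate, in the spirit of accuracy: real-valued predictions
# almost never equal the observations exactly, so it stays near zero.
exact_match_rate = np.mean(y_pred == y_true)

print(mae, rmse, exact_match_rate)
```
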
John January 18, 2018 at 12:34 am #

Hi Jason,
Thank you for your article. I have a question about the encoding part right after the normalization. Why are you doing that, since we don’t have classes, as the data are time series?
Thanks in advance.

Reply
- Jason Brownlee January 18, 2018 at 10:09 am #
  
  Sorry, I don’t follow, John. What do you mean by encoding?
  
  Reply
Sacha Jacob January 19, 2018 at 9:29 am #

Dear Jason,
Thank you very much for this tutorial, it helped me a lot.
I have one question: how should we model our LSTM to produce predictions for the next N days instead of just the current hour?
It makes more sense to produce a larger prediction window for other applications such as sales forecasting or weather forecasting.

Regards.

Reply
- Jason Brownlee January 20, 2018 at 8:14 am #
  
  See this post:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
Arpita January 21, 2018 at 2:30 am #

Your explanation is awesome and most helpful. My problem has multiple variables (5 input variables) over the previous 24 time steps as input, where n_in=24*5=120, and the output (forecast) is only one variable over the next 24 time steps, where n_out=24*1=24. How can I solve this problem? Please help me.

Reply
John January 21, 2018 at 10:12 am #

Hello Jason. I am working on a project where I try to predict the evolution of a stock index. I used your function series_to_supervised to have one feature (which is obtained by offsetting the stock index by one step). I trained my model on the data I have up to today. Then I tried to predict tomorrow’s index using the model. Then I trained the model on the previous data plus the newly predicted value for tomorrow, in order to have a model that will be used to predict the stock index of day 2. But the problem is that, besides taking a lot of time, the result isn’t good. Do you have any idea how I can improve my algorithm? Thank you

Reply
- Jason Brownlee January 22, 2018 at 4:40 am #
  
  Perhaps try an MLP instead? LSTMs are generally poor at autoregression type problems.
  
  Reply
Dan January 22, 2018 at 7:55 pm #

Hi Jason, when you refer to LSTMs being generally poor at autoregression type problems, would you be able to elaborate a little? The reason is that I am confused by some literature which describes LSTMs as being superior to ARIMA models for certain time series applications, and I thought ARIMA was an autoregressive type model. Perhaps I am misunderstanding something. Thanks!

Reply
- Jason Brownlee January 23, 2018 at 7:54 am #
  
  What literature have you seen Dan? I’d love to see. Any links?
  
  I see MLPs or ARIMA outperform LSTM on pretty much every time series problem I try.
  
  Also see this post with some refs on why LSTMs are poor at autoregression:
  https://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/
  
  Reply
Bartek January 22, 2018 at 11:22 pm #

Hello Jason,

How can I add a forecast for a “date” to your code? Let’s suppose that we now have the test RMSE for the ***next value***. How can I print something like “The dust for 1/22/2018 will be around 9.16”, and add forecasts for longer time periods like one month or one year?

Bartek

Reply
- Jason Brownlee January 23, 2018 at 7:56 am #
  
  This post explains how to make predictions in more detail:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Sergio January 23, 2018 at 1:00 pm #

Thank you so much.
I have a question about this concept.
Does the LSTM learn one formula, apply that formula to test_X,
and then compare the prediction from test_X with test_y?

If it works like that, where can we see that formula?

Thank you!

Reply
- Jason Brownlee January 24, 2018 at 9:49 am #
  
  There is no overarching formula. There is an opaque model.
  
  Reply
Michel January 25, 2018 at 7:37 am #

Hi Jason!
Thank you for your post!
How would you predict future values? As you don’t have future values of your features, how will you manage to get future y_hat values? Does it mean that you make your predictions step by step and use the y_hat of day 1 as the feature that will be used for day 2, etc.?

Reply
- Jason Brownlee January 25, 2018 at 9:10 am #
  
  model.predict(…)
  
  Reply
  - Michel January 25, 2018 at 10:08 am #
    
    Do I need to retrain the model, including the predicted future value, in order to predict the ones after it? Or do we keep the same model?
    
    Reply
    - Jason Brownlee January 26, 2018 at 5:35 am #
      
      You can try both approaches and see which results in the most accurate predictions.
      
      Reply
      - Michel January 26, 2018 at 9:15 am #
        
        I tried both of them. Both gave very bad results. Do you have any ideas for improving them?
      - Jason Brownlee January 27, 2018 at 5:48 am #
        
        Yes, I have a few ideas:
        https://machinelearningmastery.com/improve-deep-learning-performance/
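
The step-by-step idea discussed in this thread (use the prediction for day 1 as an input for day 2, and so on) can be sketched as a loop. The one_step_model function below is a hypothetical stand-in for a fitted model's predict call; here it just averages the window so that the example runs without any training. In this sketch the same model is reused at every step, without retraining.

```python
import numpy as np

def one_step_model(window):
    """Stand-in for model.predict: forecasts the next value as the mean of the window."""
    return float(np.mean(window))

def recursive_forecast(history, n_lags, n_steps):
    """Forecast n_steps ahead, feeding each prediction back in as an input."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(n_steps):
        yhat = one_step_model(window)
        forecasts.append(yhat)
        window = window[1:] + [yhat]      # drop the oldest value, append the prediction
    return forecasts

history = [10.0, 12.0, 14.0, 16.0]
print(recursive_forecast(history, n_lags=2, n_steps=3))   # [15.0, 15.5, 15.25]
```

Because every later step is built on earlier predictions rather than observations, errors can accumulate quickly, which is one plausible reason this kind of forecast degrades as the horizon grows.
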
Hugues Laliberte January 29, 2018 at 5:53 am #

Hi Jason,

thanks for sharing all this knowledge, much appreciated.

I managed to run my model and I have a few observations/questions. My data is composed of 20’000 minutes of 10 inputs and 1 feature. My test data is the next 8’000 minutes. My objective is to forecast the next minute’s feature. So far I have used only the last minute’s data to train.

– On the first run, I let the model use the feature as an input and I got excellent results. But in reality I do not have that feature available, at least not for the last hour or two.

– So I removed the feature from the input (using the reframed.drop command) and then the results got pretty poor. I could not calculate the RMSE, though, as I got the error (operands could not be broadcast together with shapes (8098,9) (10,) (8098,9)) on the instruction inv_yhat = scaler.inverse_transform(inv_yhat). Any idea how I can get around that?

– To improve on this, I will use the code in the second part of your tutorial above to use more than a 1-minute step as input, ideally 15, 30 or 60 minutes if possible and not too slow to train.

– In the discussions above, you often mention that MLPs should give better results than LSTMs for time series, or at least that they should be tried first. You gave the link where this is discussed, but have you made a tutorial on how to set up a model with an MLP? Or is it part of your book? I would like to try.

thanks for all your help,

Hugues

Reply
- Hugues Laliberte January 29, 2018 at 6:18 am #
  
  Sorry, above I wrongly used “feature”; I meant to say output or target.
  
  Reply
- Jason Brownlee January 29, 2018 at 8:20 am #
  
  This post will help you prepare the data so you can try an MLP:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  I hope to write a book on this topic soon to address all of these questions.
  
  Reply
  - Hugues Laliberte January 29, 2018 at 5:43 pm #
    
    Thanks Jason,
    after googling MLP for time series, I came across this article of yours: https://machinelearningmastery.com/exploratory-configuration-multilayer-perceptron-network-time-series-forecasting/
    Can I follow this to build an MLP network for my time series?
    
    Reply
    - Jason Brownlee January 30, 2018 at 9:47 am #
      
      You could use it as a starting point.
      
      Reply
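
The broadcast error described in this thread arises when the array passed to inverse_transform has a different number of columns from the array the scaler was fitted on. One way to avoid it when the target is removed from the inputs is to give the target its own scaler. This is a minimal sketch on invented data (10 inputs, 1 target); the names x_scaler and y_scaler are made up for this example, and in practice the scalers would be fitted on the training rows only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(2)
inputs = rng.uniform(0.0, 10.0, size=(200, 10))     # invented: 10 input variables
target = rng.uniform(100.0, 200.0, size=(200, 1))   # invented: 1 output variable

x_scaler = MinMaxScaler(feature_range=(0, 1))
y_scaler = MinMaxScaler(feature_range=(0, 1))

X_scaled = x_scaler.fit_transform(inputs)
y_scaled = y_scaler.fit_transform(target)

# Stand-in for model.predict(...): scaled predictions with shape (n_samples, 1).
yhat_scaled = y_scaled[:5]

# The target scaler only ever saw one column, so no padding is needed.
yhat = y_scaler.inverse_transform(yhat_scaled)
print(yhat.shape)
```

With this arrangement, columns can be added to or dropped from the inputs without affecting how predictions are converted back to the original units.
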
sayan January 30, 2018 at 7:21 am #

Hi,
I’m running the code:

however there is an error in model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))

the error is Expected int32, got of type ‘Variable’ instead.

How can I resolve it?

Reply
- Jason Brownlee January 30, 2018 at 9:57 am #
  
  Sorry to hear that, I have not seen this error.
  
  Reply
Matt C January 30, 2018 at 4:48 pm #

Dear Dr. Jason,

Hello, I was wondering why you ignore the first column of X_test here in this line:

inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)

where you look at “test_X[:, 1:]”. Aren’t you losing one of the weather condition features? If not, then what is that column?

Thank you

Reply
- Jason Brownlee January 31, 2018 at 9:39 am #
  
  Why do you think that? Perhaps inspect the test_X to confirm what is going on.
  
  Reply
sushrut January 31, 2018 at 6:32 pm #

Hi Jason,

I have specific business problems and want to implement LSTM for the same.

1. Sales forecast with the effect of promotions: actual sales have trend and seasonality AND are affected by promotions. I want to capture both the time series pattern AND the effect of promotions on sales to get a final sales forecast.

2. Order forecast: my partner places orders with the company, which have their own pattern and are ALSO affected by the inventory levels and the sales of a particular week.

Kindly advise on how to use LSTM for both cases, since both have their own (autocorrelated) time series pattern AND are affected by other variables.

Reply
- Jason Brownlee February 1, 2018 at 7:17 am #
  
  I would recommend getting a handle on time series forecasting first:
  https://machinelearningmastery.com/start-here/#timeseries
  
  Reply
  - sushrut February 1, 2018 at 10:04 pm #
    
    Hi Jason,
    
    Thanks for the reply. Don’t you think that time series models alone won’t help in my case, since I need to capture not only the pattern of the order forecast but also how inventory is affecting the order pattern?
    
    Kindly advise.
    Thanks.
    
    Reply
    - Jason Brownlee February 2, 2018 at 8:19 am #
      
      Perhaps try a few methods and see what works best.
      
      Reply
Neumaier C February 1, 2018 at 2:25 am #

Dear Dr. Jason,

your post here helped me a lot to get my LSTM model working.

I tried to create a second model, also using a multivariate time series, but this time I did not want to predict a single value from the data; I wanted to predict all the data for the next timestep.

Assuming we have the data: [1.0, 0.2, 0.3], [0.9, 0.3, 0.1], [0.7, 0.1, 0.5]
I want to predict the whole term, not a single value. So for example [0.9, 0.3, 0.1] instead of [0.9].

I am kind of stuck on how to modify the model settings and I cannot find any good references on this.

Do you have any suggestions?

Thanks a lot

Reply
- Jason Brownlee February 1, 2018 at 7:24 am #
  
  This post will help you prepare your data:
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Reply
  - Neumaier C February 1, 2018 at 7:04 pm #
    
    Thanks a lot for the answer.
    
    But does changing the input_shape have any effect on the output?
    
    That’s the point I am struggling with. I am trying to understand, or find a way, to tell the network how many values it should predict. Shouldn’t I change the shape of the train_Y data?
    
    I already prepared the data according to my plans: 8 features, 10 rumratend. For both train_X and train_Y. But when I try to fit the model, I get an input_dim error (expected shape (None, 10)), so 10 single values rather than 10 series of data vectors.
    
    Thanks and best regards
    
    Reply
    - Jason Brownlee February 2, 2018 at 8:11 am #
      
      Yes, if you want to predict multiple output variables, you will need to shape your y variable accordingly.
      
      Maybe this post will help:
      https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
      
      Reply
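
Shaping y for multiple output variables can be illustrated with the three observations quoted in the question. This is a minimal sketch, assuming one time step of input per sample. With targets of this shape, the output layer would need one unit per feature (for example Dense(n_features) rather than Dense(1)).

```python
import numpy as np

# The three observations from the comment, one row per time step.
data = np.array([[1.0, 0.2, 0.3],
                 [0.9, 0.3, 0.1],
                 [0.7, 0.1, 0.5]])

n_features = data.shape[1]

# Inputs are the rows at time t, targets are the complete rows at time t+1.
X = data[:-1].reshape((-1, 1, n_features))   # [samples, timesteps, features]
y = data[1:]                                  # [samples, n_features]

print(X.shape, y.shape)   # (2, 1, 3) (2, 3)
```
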
Hugues Laliberte February 1, 2018 at 5:59 am #

Hello again Jason,

I’m making good progress.

I’m trying your multiple lag timesteps code above.

The results are pretty good, but again, my output is fed in as an input, which is not realistic for me. In the single-step code, I managed to change your code to remove my output from my input (by playing with the reframed.drop line).

But in the multiple lag timesteps example, you do not show this reframed.drop line. I tried to add it, but for some reason it does not change my inputs, so my output is still in. Any idea how I can remove my output from my inputs in this scenario?

Reply
- Hugues Laliberte February 1, 2018 at 6:39 am #
  
  I forgot to mention above: I reduced the n_features parameter, but it did not change my input data.
  
  Reply
simon February 1, 2018 at 5:33 pm #

Thanks for the good examples!!!

I wonder whether the concept of this code is only to predict ‘pollution’ when we have the other parameters (dew temp, press, wnd_spd, snow, rain) for the time of the prediction.

But can we predict all columns beyond 2014-12-31 23:00:00 (the last entry of the data)?

Let’s say we want to predict the pollution level at 2015-01-01 01:00:00 and our current time is 2014-12-31 23:00:00. Since we don’t have any data about dewtemp, press, wnd_spd, snow, rain for the time 2015-01-01 01:00:00, how can we predict the pollution level at 2015-01-01 01:00:00?

Thanks,

Reply
- Jason Brownlee February 2, 2018 at 8:07 am #
  
  You can frame the prediction model to predict tomorrow from today.
  
  Reply
simon February 1, 2018 at 6:10 pm #

And can we predict all column data at once?
Thanks,

Reply
- Jason Brownlee February 2, 2018 at 8:07 am #
  
  Sure.
  
  Reply
Shivam February 5, 2018 at 8:58 pm #

Hi, I know this is completely off-topic but would it be possible to code this in R?

Reply
- Jason Brownlee February 6, 2018 at 9:14 am #
  
  I don’t see why not.
  
  Reply
Patrik February 7, 2018 at 10:22 pm #

Hello and thank you for all the information on the site.
I may be confused here, but it seems to me that all the examples given throughout various posts deal with predicting the future based on historical data (e.g. predicting pollution for tomorrow based on observations from today, yesterday, etc.). Am I correct in assuming that this is what you refer to as “auto-regression problem”?
The scenario I would like to solve is a bit different: I want to predict the future based on predicted observations for the future (e.g. predicting pollution for tomorrow based on predicted temperature, dew, etc. for tomorrow (perhaps in addition to real, measured data from today, yesterday, etc.)). Is this a completely different problem category, or is it just a variation on the examples you have provided? Are LSTMs the right tool for this kind of problem?

Reply
- Jason Brownlee February 8, 2018 at 8:26 am #
  
  Autoregression means that output is a function of observations at prior time steps.
  
  Making predictions from predictions can become very unstable. It is called a recursive model in this post:
  https://machinelearningmastery.com/multi-step-time-series-forecasting/
  
  Reply
  - Patrik February 8, 2018 at 6:33 pm #
    
    Thank you, but this is not quite what I had in mind. Recursive model predicts the future and then uses those predictions to predict even further, and like you say, this can become unstable.
    The situation I am talking about is this:
    – we have historical observations for certain variables, including the target variable
    – we have future predictions about the same variables (except the target variable)
    – we want to predict the target variable based on the available predictions of other variables (let’s say we get predictions of temperature, dew, etc. from a meteorological service)
    In the pollution scenario, this would mean that we want to find the correlation between the temperature, pressure, etc. and pollution (and this correlation can exist between lagged inputs and current pollution, but also between current-time inputs and current pollution).
    When the net learns this correlation, we will feed it information about temperature, pressure, etc. “from the future” and expect the pollution at said future date.
    But in the given example, it seems to me that the net only searches for the correlation between current pollution (at time t) and historical observations of certain variables (temp, dew, ..) Or am I missing something? Because the “present-day” observations are dropped from the training array, so the net can’t learn this correlation at time t.
    So I guess what I’m really asking is, whether LSTMs are only suitable for predicting the future based on historical (and only historical) observations, or can they also use input at time t to predict for time t?
    
    Reply
    - Jason Brownlee February 9, 2018 at 9:03 am #
      
      There are no rules. Suitability is too hard to comment on. To check if the method is appropriate for your data, try it.
      
      What is the best framing for your specific data? No one can say. I’d recommend brainstorming 5-10 framings, test each and see what works best for your data.
      
      Reply
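
One of the framings raised in this thread, predicting the target at time t from its own past plus the other variables at time t, can be set up directly when preparing the data. This is a minimal sketch on invented numbers, with pollution in column 0 and two weather variables after it. At prediction time, the weather values for time t would have to come from an external forecast.

```python
import numpy as np

# Invented hourly data: column 0 is pollution (target), columns 1-2 are weather variables.
data = np.array([[50.0, 10.0, 1010.0],
                 [55.0, 11.0, 1009.0],
                 [60.0, 13.0, 1008.0],
                 [58.0, 12.0, 1007.0],
                 [52.0,  9.0, 1011.0]])

pollution = data[:, 0]
weather = data[:, 1:]

# Inputs for time t: pollution at t-1 plus the weather at t.
X = np.column_stack((pollution[:-1], weather[1:]))
# Target: pollution at t.
y = pollution[1:]

print(X[0], y[0])
```

Whether this framing works better than using only lagged observations is an empirical question, so it is worth evaluating it alongside the other candidate framings.
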
Luca February 9, 2018 at 4:39 am #

Hi Jason,

thanks for your post, it was really interesting and helpful!
I was wondering, why does scaling the values into the range (0,1) affect the accuracy of the prediction? Is it a common practice in time series forecasting?
In fact, I tried to repeat the experiment without scaling, and I got an RMSE of 100.35. Also, the loss functions were much less steep. Could you please help me understand why this happens?

Thank you in advance,
Luca

Reply
- Jason Brownlee February 9, 2018 at 9:15 am #
  
  This is a good practice for neural networks, although is not always required.
  
  Reply
Adam February 9, 2018 at 4:52 am #

Quick one from me — I’m finding that my model doesn’t converge, and is pretty spiky. See loss graph here:

https://drive.google.com/file/d/1fLmgtP_YgBH67GWI9Is_nb8tihQd_vMj/view

Gonna play around with learning rate, drop-off and regularization — but had a feeling folks might have seen a graph that looks exactly like this before.

Welcome any thoughts!

Reply
- Jason Brownlee February 9, 2018 at 9:16 am #
  
  Might also try a larger network.
  
  This post might help:
  https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
  
  Reply
  - Adam February 9, 2018 at 10:19 am #
    
    Thanks — I’ll give it a go. Both neurons and layers?
    
    The interesting thing I’m finding is that because of the spikes in test performance I can just get one run that’s pretty good, and the next one is terrible (with the same input). I realize I can fix seed, but I’m more worried about the results in “production”.
    
    Is it normal to do something like fitting for a few cycles, and forecasting each time and averaging the results? Or should I be trying to solve the “spiky-ness” problem directly?
    
    FYI: so far dropout and a decaying learning rate have helped a bit … regularization might, but it’s just then taking too damn long to get to an answer 🙂
    
    Thanks for an awesome resource.
    
    Reply
    - Jason Brownlee February 10, 2018 at 8:47 am #
      
      Yeah, this is common.
      
      You could search for a config that is more stable. You could also try and iron out the forecast skill by creating n models and making predictions with an ensemble of all of them.
      
      I have notes here on how to control for the stochastic nature of the method:
      https://machinelearningmastery.com/evaluate-skill-deep-learning-models/
      
      And more general notes here:
      https://machinelearningmastery.com/randomness-in-machine-learning/
      
      Reply
      - Adam February 10, 2018 at 11:56 am #
        
        Perfect — giving that a whirl and seems to be doing ok! many thanks.
      - Jason Brownlee February 11, 2018 at 7:51 am #
        
        Glad to hear it.
Tanya February 11, 2018 at 10:34 pm #

Hey Jason,
thanks for the great post.
I am pretty new to machine learning, but I need to predict the pollution for the next hour as above, given the “expected” weather conditions for the next hour. Can you help me with the code, or explain how to change the current one, to get such a prediction?

Thank you very much in advance!

Reply
- Jason Brownlee February 12, 2018 at 8:30 am #
  
  I believe you have everything you need to make this change.
  
  Reply
  - Tanya February 17, 2018 at 2:49 am #
    
    Hey Jason, thanks for answering. Unfortunately I have already tried and could not get it working. That is why I wrote to you. A hint or some code would really help me very much. As I said, I am pretty new to Python and machine learning…
    
    Reply
shamsul February 12, 2018 at 3:00 am #

# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')

SIR,

WHAT IS DENSE? HOW WILL IT BE VARIED? IS IT RELATED TO THE NUMBER OF DATA POINTS WE WOULD LIKE TO PREDICT IN SINGLE FORECAST?

Reply
- Jason Brownlee February 12, 2018 at 8:31 am #
  
  Dense just means a fully connected layer, the parameter is the number of neurons in that layer.
  
  Does that help?
  
  Reply
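
What a fully connected layer computes can be shown in a few lines of NumPy: each unit takes a weighted sum of all the inputs plus a bias, so the number of units sets the number of values produced per sample. The weights below are random stand-ins rather than trained values, and the 50 matches the size of the LSTM layer in the code quoted above.

```python
import numpy as np

rng = np.random.default_rng(3)

def dense(x, weights, bias):
    """A fully connected layer with linear activation: every input feeds every unit."""
    return x @ weights + bias

lstm_output = rng.normal(size=(4, 50))     # stand-in: 4 samples, 50 values from the LSTM layer

# One unit, as in Dense(1): a single forecast value per sample.
w1, b1 = rng.normal(size=(50, 1)), np.zeros(1)
# Three units, as in Dense(3): three output values per sample.
w3, b3 = rng.normal(size=(50, 3)), np.zeros(3)

print(dense(lstm_output, w1, b1).shape)    # (4, 1)
print(dense(lstm_output, w3, b3).shape)    # (4, 3)
```

So the parameter is related to how many values come out of a single forecast: forecasting three values at once would need three units in the output layer and targets with three columns.
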
rbk February 13, 2018 at 5:53 pm #

Do you have a recommendation for situations where we won’t have the target data available when using the NN? In this example, you may have a dataset that has monitored pollution, but you cannot measure it on an ongoing basis and, let’s suppose for the sake of argument, it cannot be easily calculated using an equation either. Therefore, perhaps the LSTM needs to have its own calculated pollution fed back, in addition to easier measurements like wind and rain, in order to make a prediction about pollution at the next step. Suggestions?

Reply
- Jason Brownlee February 14, 2018 at 8:15 am #
  
  I would recommend exploring multiple different framings of your problem, evaluate them and see what works best for your specific data.
  
  Reply
  - rbk February 14, 2018 at 10:10 am #
    
    Fair enough. It seems like you are suggesting that other NN formulations are more appropriate for such a problem. I think I agree.
    
    Reply
    - Jason Brownlee February 14, 2018 at 2:40 pm #
      
      In my experience MLPs perform better for autoregression type forecasting.
      
      Reply
Erik February 13, 2018 at 9:42 pm #

Jason, thank you for your guide.

I have a general question regarding training (and predicting) on multiple time-series. I understand that the answer might be “it depends”, but I hope you can give some insight (or point me in the right direction).

I have N time series of variable length Mi, each sample in each time series having the same dimension D. My goal is to have the network train on some fraction of these N series and then predict on the remaining series. That is, unlike your tutorials, I am not interested in training a fraction of ONE time series and predict the rest of the same series.

Currently, I pad each time series so they are all equal length and create a matrix of shape (N*M’ x D) where M’ is the length of the longest time series. I split the matrix into two smaller matrices (train/test) and during training I feed the RNN network with (1 x D) samples in batches of some batch size B.

That is, in my sequential keras-model, my first layer (SimpleRNN) has input_shape=(1, D) and since I am trying to predict the following F steps my Dense output layer is a Dense(F) layer.

This works (at least, I get a result) but I am wondering if there is a better way to do it. Is it possible (and if so, better) to feed the network with samples of shape (Mi x D) (i.e. one time series at a time)? Are there any “general rules” to follow when it comes to these sorts of things (if so, where can I read up on them)?

Thank you for a very interesting blog.
Cheers

Reply
- Jason Brownlee February 14, 2018 at 8:20 am #
  
  I would recommend brainstorming multiple framings of the problem and evaluate each to see what works best for your specific data.
  
  Also, consider starting with MLPs and only move to LSTMs or RNNs generally if they offer better results (often they don’t).
  
  Let me know how you go.
  
  Reply
TaZa February 17, 2018 at 2:50 am #

Hey Jason,

can I make out-of-sample forecast using LSTM network. Can you help and give me a hint how to do this in python.

Thank you very much in advance!
TZ

Reply
- Jason Brownlee February 17, 2018 at 8:50 am #
  
  Yes, I show exactly how in the tutorial above.
  
  Reply
Zou Yanyun February 17, 2018 at 2:02 pm #

Hi, thanks for your tutorial. It helps me a lot. And I’m wondering if there is only one hidden layer in this neural network. And how to determine the number of neurons?
Thank you very much.

Reply
- Jason Brownlee February 18, 2018 at 6:46 am #
  
  Trial and error is the best way to configure the number of layers and neurons. There are no reliable analytical methods to configure neural nets as far as I know.
  
  Reply
Haylee Ham February 25, 2018 at 7:19 am #

Jason, your website has been such an amazing resource for me. I have had trouble in my searches on Google Scholar and elsewhere in finding the appropriate way to construct a NN for panel data and any tips would be greatly appreciated.

How would the data preparation/model change here if you were using a panel data set? In that case, the date would not be unique and so I assume should not be used as an index.

Also, how do you create the LSTM in such a way that it will produce predictions for all locations at time period t?

Reply
- Jason Brownlee February 25, 2018 at 7:47 am #
  
  Sorry, I have not worked with panel data. I don’t have good off the cuff advice.
  
  Reply
Ahmed Torky February 26, 2018 at 11:40 pm #

Hello Jason, I have read your work and it has been great advice for me. I have tried to implement it on time series (dynamic) analysis of buildings due to ground motion. Could you kindly consider the following:
I have the input as the ground acceleration X(t) and target as the motion of the first floor Y(t). I would like to train the network on LSTM, or any other RNN that would be suitable. However, researchers have published ideas that make use of other RNNs and Wavenet, yet they do not share their codes.
Could you kindly have a look at my work and inform me if there are better techniques to work with? Do you have any idea on how to use Wavelet Neural Networks?
Thank you for considering it.
Work found here: https://www.dropbox.com/sh/lqt97olutq9uca2/AAB1aCWlfFtP3BRJcGjjqwXUa?dl=0

Reply
- Jason Brownlee February 27, 2018 at 6:31 am #
  
  Sorry, I am not familiar with that paper, perhaps contact the authors of the paper?
  
  Reply
  - Ahmed Torky February 27, 2018 at 6:56 am #
    
    Thank you for your reply. What do you think of having both the predictor and target variables in time, would you use LSTM, or would ConvLSTM2D be better? I am not entirely confident in LSTM, and have read that applications like DeepMind have had better results with Wavenet. I am looking forward to you sharing your ideas because I trust your opinion.
    Thank you.
    Ahmed
    
    Reply
    - Jason Brownlee February 27, 2018 at 2:54 pm #
      
      A good place to start would be an MLP. I’d only recommend moving to an LSTM if you can lift model skill.
      
      Reply
Bosco Raju February 27, 2018 at 5:30 am #

Hi Jason,

Thanks for the great resource. I have a question.

Shouldn’t you apply MinMaxScaler normalisation after splitting the dataset into train/test? If you apply MinMaxScaler normalisation before splitting the dataset, the LSTM model will have sufficient information about the test sample during training? Therefore, it is not a true “test” sample. Or does it only apply for standardisation (z-score)? Could you please clarify on this matter? Thanks.

Bosco

Reply
- Jason Brownlee February 27, 2018 at 6:38 am #
  
  Yes, correct. I was trying to keep things simple for the tutorial.
  
  Reply
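The point raised above can be sketched as follows: split first, compute the scaling statistics from the training rows only, then apply them to both sets. This is a minimal NumPy illustration with made-up data; `fit_min_max` and `apply_min_max` are hypothetical helper names, not functions from the tutorial.

```python
import numpy as np

def fit_min_max(train):
    """Return per-column min and range computed from the training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Scale data with statistics that were learned elsewhere (the training set)."""
    return (data - col_min) / col_range

values = np.arange(20, dtype=float).reshape(10, 2)  # made-up data: 10 rows, 2 features
train, test = values[:7], values[7:]                # chronological split comes first

col_min, col_range = fit_min_max(train)             # statistics come from train only
train_scaled = apply_min_max(train, col_min, col_range)
test_scaled = apply_min_max(test, col_min, col_range)

print(train_scaled.min(), train_scaled.max())  # 0.0 1.0
print(test_scaled.max() > 1.0)                 # True: test values may fall outside [0, 1]
```

The same idea applies with scikit-learn's MinMaxScaler: call fit (or fit_transform) on the training data and only transform on the test data.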
William February 27, 2018 at 9:08 am #

Hi Jason

Thank you for this tutorial. I am new to RNNs and this has helped me a lot. Is it possible to train an LSTM model to do forecasting using multiple multivariate time series?

I am currently working with a dataset that has N individuals and each individual has a time series that has 3 features and 16 samples (the time series are all of equal length, have the same time step and contain no missing values). What I would like to do is to train LSTM with the 3 feature values from t1, t2,…t15 to predict the 3 feature values at t16 for this sample population. Would you be able to offer some advice or point me to the right direction?

Thanks in advance

Reply
- Jason Brownlee February 27, 2018 at 2:55 pm #
  
  Yes. You could predict a vector for each time step, e.g. multiple units in the output layer and a TimeDistributed wrapper around the Dense layer for the time steps.
  
  Reply
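One possible framing of the question above (N individuals, 16 time steps, 3 features) is to treat each individual as one sample: the first 15 steps are the input and the 16th step is the target vector. The sketch below uses random placeholder data and only shows the array shapes; it is an assumption about how the data might be laid out, not the only option.

```python
import numpy as np

n_individuals, n_steps, n_features = 5, 16, 3
rng = np.random.default_rng(0)
data = rng.random((n_individuals, n_steps, n_features))  # placeholder data

X = data[:, :-1, :]   # steps t1..t15 for every individual -> (5, 15, 3)
y = data[:, -1, :]    # the 3 feature values at t16        -> (5, 3)
print(X.shape, y.shape)  # (5, 15, 3) (5, 3)
```

Under this framing, an LSTM with input_shape=(15, 3) followed by an output layer with 3 units would predict all three features at t16 in one shot.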
Jakob March 3, 2018 at 5:15 am #

Thank you for very interesting articles Jason.

Reply
- Jason Brownlee March 3, 2018 at 8:19 am #
  
  You’re welcome, I hope they help.
  
  Reply
vinyak March 6, 2018 at 5:48 pm #

Hello Jason,
I have a question about prediction in general.
1. Does it matter if you predict one value ahead or multiple values? for example: would 24 x one hour ahead forecast be more accurate than 24 hours ahead forecast if we do not use lags?

2. If we want to predict 24 values at a time for one day ahead forecast(wind, solar) how do we do that?

Reply
- Jason Brownlee March 7, 2018 at 6:11 am #
  
  One step forecasts are more accurate if you are using real obs as input to make the forecast.
  
  Forecasting a long time ahead with any model is really hard and will have a high error.
  
  In general, try multiple approaches with your data/model and see what has the lowest error.
  
  Reply
Hari March 7, 2018 at 5:07 pm #

Hi Jason,
Thanks for your articles. With a good combination of theory and code, it really helped me to get a kickstart in RNNs.

In your post, you mentioned that: “Remember that the internal state of the LSTM in Keras is reset at the end of each batch”. In addition, I would like to know if the LSTM reuses any hidden state among the instances within a batch.
For example, the first instance is: 0.129779 0.352941 0.245902 .. -> 0.148893. The second instance is 0.148893 0.367647 0.245902 .. -> 0.159960. If both belong to the same batch, will there be any hidden state which will get transferred to instance 2 after training based on instance 1 (or vice versa).

What I understood is that hidden states are maintained across timesteps within an instance. But hidden states are not reused/transferred across instances.

Reply
- Jason Brownlee March 8, 2018 at 6:21 am #
  
  Yes, state is reused between instances within a batch.
  
  Reply
  - Pavel Komarov July 23, 2021 at 4:35 am #
    
    According to the accepted answer here https://stackoverflow.com/questions/43882796/when-does-keras-reset-an-lstm-state, the states of samples within a batch are independent, stored in parallel, and completely don’t affect each other.
    
    Reply
    - Jason Brownlee July 23, 2021 at 6:03 am #
      
      Does not seem correct to me. Perhaps check the Keras API docs.
      
      Reply
weiliming March 7, 2018 at 9:17 pm #

Hi Jason, I’m so sorry, it’s too hard for me to read all of these comments.
My question is this: I have data from 80 cities, and every city has 4 years of 8 input variables (pm2.5, DEWP, TEMP, PRES, cbwd, Iws, Is, Ir). I want to train a model that uses the data from all 80 cities, but only predicts for a specified city.
I read some articles like “Example of LSTM with Multiple Input Features” and “How to Convert a Time Series to a Supervised Learning Problem”.
Q1: If I train a model with input shape (8760, 80, 8), how can I use the model to predict air pollution for a single city? I do not have data from the other 79 cities, so I can’t input (n, 80, 8), I can only input (n, 1, 8).
Q2: Converting the LSTM problem to supervised learning may solve this, but I want to use a time series RNN in the model, because in my dataset all features have a strong time series relationship.
There are so few articles about this multi-input single-output RNN case that I wonder if an LSTM cannot do it.

Reply
- Jason Brownlee March 8, 2018 at 6:27 am #
  
  Generally LSTMs are poor at autoregression type forecasting problems. I would recommend MLPs first and only jump to LSTMs if they give better skill.
  
  Generally, you could model each city separately or have one model for all cities. I would recommend testing both and see what works best.
  
  This post will help you to better understand how to reshape input data:
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Let me know how you go.
  
  Reply
Monty Shaw March 8, 2018 at 2:00 am #

I have a question about the graph. Should the test line match the train line? I understand why we plot the error for the train and for the test, but since the model is trained when computing the test data, should it not be a straight line across the bottom (assuming a well trained model)? I guess I am concerned about ‘over-fitting’, something else I am confused about.

I have modified the example given above, and I am getting Test RMSE: 22.027, and my line is fairly flat across the bottom, with a better RMSE than the training line.

I changed to use 90% of the data to train with, added another layer of LSTM, changed the number of neurons to 32/16, and set the epochs to 10, batch size of 24.

Thanks for these great tutorials

Reply
- Jason Brownlee March 8, 2018 at 6:34 am #
  
  They could match, in general it would be nice if they did. You may see different results each run given the stochastic nature of the algorithm.
  
  Reply
  - Monty Shaw March 8, 2018 at 11:30 am #
    
    Interesting, I don’t see why they would ever match, unless the training model was not working or a bug in the code. It seems counter intuitive to me.
    
    Thanks for the reply
    
    Reply
latiaoshusheng March 9, 2018 at 1:08 pm #

Interesting! This is very useful for me, but I have a question: the features contain the historical PM2.5, which is to say the whole training process contains y. I think that may not be right.

Reply
- Jason Brownlee March 10, 2018 at 6:17 am #
  
  Why is that?
  
  Reply
Sachin March 10, 2018 at 11:24 pm #

Hi Jason, while feeding the data to series_to_supervised function, it returns one row less than number of rows originally. Can you please have a look into it ?

Reply
- Jason Brownlee March 11, 2018 at 6:28 am #
  
  Yes, rows with NaN are removed.
  
  Perhaps read this post about why this is the case:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
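The row loss can be seen in a small sketch: shifting a series by one step leaves the first row without a previous value, and dropping rows that contain NaN removes it. This uses NumPy and a made-up series rather than the tutorial's series_to_supervised function, so treat it as an illustration of the idea only.

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # made-up series

# shifting by one step leaves no previous value for the first row (NaN)
lagged = np.concatenate(([np.nan], values[:-1]))
frame = np.column_stack((lagged, values))           # columns: var1(t-1), var1(t)

# dropping rows that contain NaN removes exactly one row for a lag of one
supervised = frame[~np.isnan(frame).any(axis=1)]
print(len(values), len(supervised))  # 5 4
```

With a lag of n steps, the first n rows have incomplete history and are dropped in the same way.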
NATALIE CARUANA March 11, 2018 at 10:08 pm #

Hi Jason,
Thanks a lot for this very interesting and useful tutorial!

Just one question… When you are scaling the data, you are using a range of (0,1). But then in the LSTM you are not specifying the activation function. Doesn’t Keras assume tanh by default? If so, shouldn’t the data be scaled between -1 and 1 then?

thanks

Reply
- Jason Brownlee March 12, 2018 at 6:31 am #
  
  My own experiments have shown that 0-1 results in faster learning for LSTMs. Experiment for your dataset and use what works best.
  
  Reply
Kevin Daftary March 12, 2018 at 9:34 pm #

I’m working on a project about bus trip scheduling where I need to predict values for a particular timeslot, say 10:00:00-11:00:00 for the next week based on data from earlier months. Can this timeseries forecasting model be used to keep the timeslot same and just increment the day?

Reply
- Jason Brownlee March 13, 2018 at 6:28 am #
  
  I would recommend exploring multiple different framings of the problem and see what works best for your specific data.
  
  Reply
Fati March 14, 2018 at 1:30 am #

Hi,

How can we use the sklearn train_test_split method for the second example?

Thanks

Reply
- Jason Brownlee March 14, 2018 at 6:27 am #
  
  What do you mean exactly?
  
  Reply
  - Fati March 14, 2018 at 8:18 pm #
    
    I meant instead of splitting data like this
    
    # split into train and test sets
    values = reframed.values
    n_train_hours = 365 * 24
    train = values[:n_train_hours, :]
    test = values[n_train_hours:, :]
    # split into input and outputs
    n_obs = n_hours * n_features
    train_X, train_y = train[:, :n_obs], train[:, -n_features]
    test_X, test_y = test[:, :n_obs], test[:, -n_features]
    print(train_X.shape, len(train_X), train_y.shape)
    
    What if we split the data using the sklearn method?
    I split the data using the sklearn method but I have a problem with reshaping, because I can’t use hours and features like you did.
    The reason for this question is that when I tried to use your sample I got rmse=0, which means overfitting, so I decided to first split the data into training and test sets and then normalize each set. I also want the split to be random, because in this sample the split is not random (we take the first row up to 365*24 for training and the rest is for test).
    I hope I was clear.
    I hope I was clear.
    
    Reply
    - Jason Brownlee March 15, 2018 at 6:28 am #
      
      This post can teach you about reshaping data for lstms:
      https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
      
      Reply
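As a rough illustration of the reshaping step discussed in this thread: however the rows are split, each row of the framed data holds n_hours * n_features input columns, and the LSTM expects them as (samples, timesteps, features). The sketch below uses NumPy with made-up sizes; the variable names are illustrative.

```python
import numpy as np

n_hours, n_features = 3, 8
n_samples = 10

# flat layout as in the split code above: n_hours * n_features columns per row
flat_X = np.arange(n_samples * n_hours * n_features, dtype=float)
flat_X = flat_X.reshape(n_samples, n_hours * n_features)

# reshape to the 3D layout expected by an LSTM: (samples, timesteps, features)
X_3d = flat_X.reshape((n_samples, n_hours, n_features))
print(X_3d.shape)  # (10, 3, 8)

# the first time step of sample 0 is the first n_features columns of its flat row
print(np.array_equal(X_3d[0, 0], flat_X[0, :n_features]))  # True
```

The reshape only depends on the number of columns per row, so it works the same way on any subset of rows, as long as the column order (oldest lag first) is unchanged.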
      - Fati March 15, 2018 at 7:13 pm #
        
        Hi,
        
        Thanks for your help.
      - Jason Brownlee March 16, 2018 at 6:11 am #
        
        No problem.
      - Fati March 15, 2018 at 8:21 pm #
        
        Hi,
        
        Do we need to scale output? for example I have y_train and y_test do I need to scale these or not?
        
        Thanks in advance
      - Jason Brownlee March 16, 2018 at 6:17 am #
        
        It can help.
Moma March 14, 2018 at 1:51 am #

Hi Jason,

I would like to predict the next 12 months of employee numbers based on 24 or more months of historical data.

I have multiple features for this task such as turnover, profit and salaries.

So my first concern is what parameters I should supply to the series_to_supervised function. Would (values,24,12) be an appropriate solution?

Next, how should I use the time series frame from series_to_supervised to train on 24 months and predict employee numbers for the next 12 months?

What should be the input for prediction model if I want to train on 24 months of 2016 and 2017 data and want to predict for whole 2018 year when I do not have any of the turnover, profit and salaries feature data for that year?

Thanks a lot!

Reply
- Jason Brownlee March 14, 2018 at 6:30 am #
  
  Perhaps you can use this post as a template:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
Rushabh Kapadia March 14, 2018 at 5:16 am #

Hi Jason,
I tried this code and modified it a bit according to my problem, the queries i had are:

1. The predicted forecast is yhat, right? If that is the case, then inv_yhat should be the forecast after scaling it back to the defined domain of values. I’m getting negative values in these forecasts, which should not be possible since the actual data does not have any negative values at all. (Assuming the min-max scaler would map it back to the actual domain and there aren’t negative values in the domain.)

2. If yhat isn’t the predicted forecast then which variable is?

This post was really helpful for implementing LSTM. Hopefully you can help me with my query. Thanks in advance.

Reply
Vishnu Prashanth Indramohan March 15, 2018 at 2:10 am #

Hi Jason,

Thank you so much for this great tutorial. I just need your suggestion/ reference to solve the business problem I have.

I have a dataset containing Dates, Product (Categorical Variable) and Quantities sold.
How can I forecast the Quantities sold for each Product(category)?

Say in this example, how can I use wnd_dir as categorical input to forecast the output?

Any suggestion would be highly helpful.

Thanks and Regards,
Vishnu

Reply
- Jason Brownlee March 15, 2018 at 6:32 am #
  
  This process will help you work through your problem systematically:
  https://machinelearningmastery.com/start-here/#process
  
  Reply
Moma March 15, 2018 at 3:36 am #

Thanks for your reply!

This is very useful post!

However, I still do not see if I use whole dataset of two years for training what should be the input in prediction model after I reframe to supervised sequence.

For example, if I would use the template from that post with series_to_supervised(values,1,3) with 6 features I would get a (46,24) dimension. So 3*6 is the number of input columns and the last 6 is the output.

So expected output would be 10 3-month forecasts, but what would be the input to prediction model in real case without splitting the test set from reframed dataset in order to predict sequence for the next 12 months?

Thanks a lot!

Reply
- Jason Brownlee March 15, 2018 at 6:35 am #
  
  You define the input and output of the model. To make a prediction, you provide the required input.
  
  Perhaps this post will make this input/output relationship clearer for you:
  https://machinelearningmastery.com/how-machine-learning-algorithms-work/
  
  Also, this post will show you how to call the predict function:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
  - Moma March 17, 2018 at 2:03 am #
    
    Thanks Jason!
    
    Something is still keeping me down, so to be sure I understand I will give some example:
    
    If I have this reframed series with one lag value and 12 predictions for each month in a year.
    
    var3 corresponds to value that should be predicted, using multistep approach, from December in last year to predict January in next year and use January to predict February and so on.
    
    So what should be training set in model.fit, is it the first two columns (13, 2) for X and the third column (13,1) for y?
    
    What should be the argument in model.predict(?) for each time step prediction?
    
    (13, 39)
    var1(t-1) var2(t-1) var3(t-1) var1(t) var2(t) var3(t) var1(t+1) \
    1 -20.0 43000.0 3.0 -18.0 50692.0 3.0 -15.0
    2 -18.0 50692.0 3.0 -15.0 66060.0 3.0 -12.0
    3 -15.0 66060.0 3.0 -12.0 87786.0 3.0 -10.0
    4 -12.0 87786.0 3.0 -10.0 117319.0 3.0 -8.0
    5 -10.0 117319.0 3.0 -8.0 152754.0 4.0 -6.0
    6 -8.0 152754.0 4.0 -6.0 196452.0 5.0 -4.0
    7 -6.0 196452.0 5.0 -4.0 247350.0 6.0 -2.0
    8 -4.0 247350.0 6.0 -2.0 303460.0 6.0 -1.0
    9 -2.0 303460.0 6.0 -1.0 368524.0 8.0 1.0
    10 -1.0 368524.0 8.0 1.0 438343.0 9.0 2.0
    11 1.0 438343.0 9.0 2.0 517572.0 10.0 3.0
    12 2.0 517572.0 10.0 3.0 604000.0 12.0 3.0
    13 3.0 604000.0 12.0 3.0 688251.0 13.0 4.0
    
    var2(t+1) var3(t+1) var1(t+2) … var3(t+8) var1(t+9) \
    1 66060.0 3.0 -12.0 … 8.0 1.0
    2 87786.0 3.0 -10.0 … 9.0 2.0
    3 117319.0 3.0 -8.0 … 10.0 3.0
    4 152754.0 4.0 -6.0 … 12.0 3.0
    5 196452.0 5.0 -4.0 … 13.0 4.0
    6 247350.0 6.0 -2.0 … 15.0 4.0
    7 303460.0 6.0 -1.0 … 16.0 4.0
    8 368524.0 8.0 1.0 … 18.0 4.0
    9 438343.0 9.0 2.0 … 20.0 4.0
    10 517572.0 10.0 3.0 … 23.0 4.0
    11 604000.0 12.0 3.0 … 25.0 3.0
    12 688251.0 13.0 4.0 … 27.0 2.0
    13 788380.0 15.0 4.0 … 30.0 1.0
    
    var2(t+9) var3(t+9) var1(t+10) var2(t+10) var3(t+10) var1(t+11) \
    1 438343.0 9.0 2.0 517572.0 10.0 3.0
    2 517572.0 10.0 3.0 604000.0 12.0 3.0
    3 604000.0 12.0 3.0 688251.0 13.0 4.0
    4 688251.0 13.0 4.0 788380.0 15.0 4.0
    5 788380.0 15.0 4.0 892134.0 16.0 4.0
    6 892134.0 16.0 4.0 1006428.0 18.0 4.0
    7 1006428.0 18.0 4.0 1123891.0 20.0 4.0
    8 1123891.0 20.0 4.0 1252351.0 23.0 4.0
    9 1252351.0 23.0 4.0 1388010.0 25.0 3.0
    10 1388010.0 25.0 3.0 1526148.0 27.0 2.0
    11 1526148.0 27.0 2.0 1675973.0 30.0 1.0
    12 1675973.0 30.0 1.0 1827819.0 33.0 0.0
    13 1827819.0 33.0 0.0 1991810.0 36.0 -2.0
    
    var2(t+11) var3(t+11)
    1 604000.0 12.0
    2 688251.0 13.0
    3 788380.0 15.0
    4 892134.0 16.0
    5 1006428.0 18.0
    6 1123891.0 20.0
    7 1252351.0 23.0
    8 1388010.0 25.0
    9 1526148.0 27.0
    10 1675973.0 30.0
    11 1827819.0 33.0
    12 1991810.0 36.0
    13 2163000.0 39.0
    
    I really appreciate your help!
    
    Reply
    - Jason Brownlee March 17, 2018 at 8:42 am #
      
      Think of your problem in terms of model inputs and outputs, X and Y.
      
      Reply
  - Moma March 17, 2018 at 7:28 am #
    
    Just a short explanation of the previous post. The thing is that I do not have the real features var1 and var2 available for the months that I need predictions for, so that is why I am confused. What I am looking for is behavior similar to ARIMA, where the prediction sequence is generated by passing the number of prediction steps rather than an input vector of var1 and var2.
    
    Thanks!!!
    
    Reply
Fafa March 15, 2018 at 4:35 am #

Hello, what if we have a dataset with both categorical and numerical variables? Does the code work fine?

Reply
- Jason Brownlee March 15, 2018 at 6:35 am #
  
  Categorical variables might need to be integer encoded or one hot encoded first.
  
  Reply
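The two encodings mentioned above can be sketched with NumPy alone. The wind direction labels below are a made-up sample, and this is only one way to do it (scikit-learn's LabelEncoder and OneHotEncoder are common alternatives).

```python
import numpy as np

wind_dir = np.array(["SE", "NW", "SE", "cv", "NE"])  # made-up sample of a categorical column

# integer encoding: each distinct label gets an index into the sorted label array
labels, int_encoded = np.unique(wind_dir, return_inverse=True)

# one hot encoding: one binary column per distinct label
one_hot = np.eye(len(labels))[int_encoded]

print(labels)
print(int_encoded)
print(one_hot.shape)  # (5, 4)
```

The encoded columns can then be stacked next to the numerical columns before scaling and framing the data.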
Marco March 15, 2018 at 7:25 am #

Hi Jason,
I have read the article on how to tune the parameters of the LSTM neural network and I have tried to do it on this dataset. My problem is the following: every time I run the model, even with the same number of epochs and without changing the parameters, I obtain different results in terms of RMSE. So even if I found that the optimal number of epochs is 90, when I run the model with 90 epochs I obtain different results every time.
Why does this happen? Do you have any suggestion ?

Reply
- Jason Brownlee March 15, 2018 at 2:44 pm #
  
  Yes, this is a feature of neural networks. Perhaps this post will make things clearer:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  
  See this post for a better way to evaluate neural networks:
  https://machinelearningmastery.com/evaluate-skill-deep-learning-models/
  
  See this post for locking down randomness if you want to go that route:
  https://machinelearningmastery.com/reproducible-results-neural-networks-keras/
  
  I hope that helps.
  
  Reply
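The repeated-evaluation idea from the posts linked above can be sketched as a loop that fits and scores the model several times and then summarizes the scores. Here `fit_and_score` is a hypothetical stand-in that returns a simulated number, not a real result; in practice it would build, fit and evaluate the LSTM.

```python
import random
import statistics

def fit_and_score(seed):
    """Hypothetical stand-in for: build the model, fit it, return the test RMSE."""
    rng = random.Random(seed)
    return 26.0 + rng.gauss(0.0, 0.5)  # simulated RMSE, not a real result

# repeat the experiment and summarize the distribution of scores
scores = [fit_and_score(seed) for seed in range(10)]
mean_rmse = statistics.mean(scores)
std_rmse = statistics.stdev(scores)
print("RMSE mean=%.3f std=%.3f over %d runs" % (mean_rmse, std_rmse, len(scores)))
```

Comparing configurations by their mean and standard deviation over repeats is usually more informative than comparing single runs.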
Zou Yanyun March 15, 2018 at 1:44 pm #

Hi Jason,
I want to predict the air pollution in next two, three or more hours instead of only next one hour, how can i modify the code?
Thank you so much.

Reply
- Jason Brownlee March 15, 2018 at 2:51 pm #
  
  Use this post as a template:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
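As a rough sketch of the multi-step framing: each training sample pairs a window of past values with the next few values as a vector target. The helper name `make_windows` and the toy series are assumptions for illustration, and the sketch covers a single variable only.

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Split a 1-D series into (n_in lag inputs, n_out future outputs) pairs."""
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(y)

pollution = np.arange(10, dtype=float)  # toy series standing in for hourly pollution
X, y = make_windows(pollution, n_in=3, n_out=2)
print(X.shape, y.shape)  # (6, 3) (6, 2)
print(X[0], y[0])        # [0. 1. 2.] [3. 4.]
```

With targets shaped like this, the output layer would need n_out units (for example Dense(2) for a two-hour forecast) instead of one.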
Beibei March 16, 2018 at 2:50 am #

Hi Jason,
Thanks for your excellent blogs and it gave me much help! I am confused about the sequence length, the lag timestep and timestep. Is lag timestep same as the sequence length? I used your codes on my data and I set the lag timestep as 12. When I used the built model to predict new data, the number of the result became less. For example, I want to predict the number of 13 but I only got 1 result data.

Reply
- Jason Brownlee March 16, 2018 at 6:22 am #
  
  Perhaps this post will help you understand lag obs:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  This post will help you understand how to reshape input:
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Reply
Li Yue March 16, 2018 at 7:23 pm #

hi, this post really helps me a lot. thank you. i am confused that why the test set has more samples than the training set and the loss on test set is smaller than the training set. wish to get your reply, thank you.

Reply
- Jason Brownlee March 17, 2018 at 8:34 am #
  
  The training set has more than test, 4 years vs 1 year.
  
  A test loss less than the training loss may be a statistical fluke.
  
  Reply
  - Li Yue March 18, 2018 at 10:08 pm #
    
    sorry, i missed these words in your blog: “To speed up the training of the model for this demonstration, we will only fit the model on the first year of data, then evaluate it on the remaining 4 years of data.” and “Interestingly, we can see that test loss drops below training loss. The model may be overfitting the training data.” I trained the model with the first year of data, and the test loss less than training loss maybe because “the model is overfitting the training data”. i will try to train the model with 4 years data, and calculate the loss on training set and test set to see if the overfitting can be solved. Thank you for your reply.
    
    Reply
  - Li Yue March 18, 2018 at 10:51 pm #
    
    I am also confused about these words in your blog: “Interestingly, we can see that test loss drops below training loss. The model may be overfitting the training data.” From what I have learned, if the model overfits the training data, the model will perform better on the training set than the test set, and the loss on the training set will be less than on the test set.
    
    Reply
Fati March 17, 2018 at 1:06 am #

Hi,

How can we use the model to predict values on new input data?
I saw you have a post that talks about saving and loading a model. If I want to apply this model to new data, what should the shape of the input be? (None, timesteps, features)?

Thanks,

Reply
- Jason Brownlee March 17, 2018 at 8:40 am #
  
  This post will show you how:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Med March 18, 2018 at 12:58 am #

Hi Dr.Jason,

Firstly, thank you very much for this tutorial!
My question is, how to interpret the result and make a prediction, how to make a prediction using a new data?
Thank you,

Reply
- Jason Brownlee March 18, 2018 at 6:04 am #
  
  See this post:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Jay March 19, 2018 at 4:22 pm #

Hi Dr.Jason,

Thank you very much for this tutorial!

Not sure if you have had the same problem as I had. Keras is using TensorFlow as the backend, and it was kind of using this code: x = tf.placeholder(dtype, shape=shape, name=name)
and then this error shows up.

TypeError: ‘NoneType’ object is not callable

Reply
- Jason Brownlee March 20, 2018 at 6:11 am #
  
  Sorry, I have not seen this error. Perhaps you could try posting to stackoverflow?
  
  Reply
Christian March 21, 2018 at 11:30 pm #

Hi,
Thanks for the post.
Is it possible to frame the supervised learning problem as predicting the pollution at the next time step based only on the weather conditions at the current time?
Cheers,
Christian

Reply
- Jason Brownlee March 22, 2018 at 6:24 am #
  
  Sure.
  
  Reply
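A minimal sketch of that framing, assuming the pollution reading sits in column 0 and the weather variables in the remaining columns. The numbers are made up for illustration.

```python
import numpy as np

# made-up hourly rows; column 0 is pollution, the rest are weather variables
data = np.array([
    [129.0, -16.0, -4.0, 1020.0],
    [148.0, -15.0, -4.0, 1020.0],
    [159.0, -11.0, -5.0, 1021.0],
    [181.0,  -7.0, -5.0, 1022.0],
])

X = data[:-1, 1:]   # weather at time t (pollution column excluded)
y = data[1:, 0]     # pollution at time t+1
print(X.shape, y.shape)  # (3, 3) (3,)
```

Each row of X is paired with the pollution value one hour later, so past pollution never appears among the inputs.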
Marco March 22, 2018 at 6:21 am #

Hi Jason,
As some other people notice when you plot the graph of predicted and real values, it seems that they are shifted by one. I think that the main reason of this problem is the following line:

68 – test_X, test_y = test[:, :n_obs], test[:, -n_features]

The problem is that in this way, when you do ‘test[:, :n_obs]’ you are using the data of the previous hour, while the corresponding labels that you have are shifted by one.
Instead if you do like this ‘test[:, n_obs : ] ‘ the results will be corrected and not shifted:

68 – test_X, test_y = test[:, n_obs : ], test[:, -n_features]

I have made some tests and i am quite sure that this is an error. Let me know what do you think

Reply
Nick WONG March 22, 2018 at 7:46 pm #

Hi Jason, I have followed your tutorials and they are very nice and helpful.
I’ve made an LSTM wheat price prediction model on Kaggle based on your tutorial.
Just want to share it and encourage others to try their hands on.

https://www.kaggle.com/nickwong64/lstm-wheat-price-predictions/

Reply
- Jason Brownlee March 23, 2018 at 6:05 am #
  
  Well done!
  
  Perhaps you could link back to where you copied the code from and credit the source?
  
  Reply
  - Nick WONG March 23, 2018 at 12:20 pm #
    
    Sure, added it back.
    
    Reply
- GuanYu April 2, 2018 at 4:28 pm #
  
  cool
  
  Reply
Moma March 22, 2018 at 11:24 pm #

Hi Jason,

I still have a problem with defining the input data in prediction. If I reframe the problem for example as 1 lag value and 1 prediction, from previous month to predict the next, I get (24,14) for 2 years of history data and 7 features so when I reshape it I get X for training with this dimension (24,1,13) and y (24,). I am using whole this history data to train LSTM and up to this step everything is ok when I design and train LSTM.

But if I pass last row from history data that represents December 2017, as input data in prediction method which is this dimension (1,1,13) I actually evaluating prediction of the last row for employee count that corresponds to this December, not generating new prediction for January 2018.

I do not have new features (salaries, turnover, etc) from the next month (January 2018) to generate prediction (number of employees) for that month.

I really do not understand what to pass as input in prediction to generate sequence of next 12 months from previous lag values. Can this be done like in ARIMA where we just pass the number of time steps for which we need prediction?

Thanks a lot!

I am sorry for bothering you with this!

Reply
Ruhin March 25, 2018 at 12:48 am #

Sir, can you please provide me with Python code for “NETWORK ANOMALY DETECTION IN RNN USING LSTM”?

Reply
- Jason Brownlee March 25, 2018 at 6:32 am #
  
  Thanks for the suggestion.
  
  Reply
Purvesh March 26, 2018 at 7:31 am #

TypeError: parse() takes 1 positional argument but 4 were given
while converting into timestamp

Reply
- Jason Brownlee March 26, 2018 at 10:04 am #
  
  Perhaps confirm that you are using Python3 and all libraries are up to date.
  
  Reply
  - Francesco Dainese June 8, 2018 at 8:07 pm #
    
    Hi Jason, I confirm I am using Python 3.6 with the Spyder IDE and I have just installed the DateTime package through conda, but the problem still remains. Also, I don’t understand the syntax in *date_parser=parse*. Shouldn’t it be *date_parser=parse(x)* with x being a tangible variable?
    PS: I’ve found and installed the DateTime package. Is it the one required? I have not found any other similar.
    
    Reply
    - Jason Brownlee June 9, 2018 at 6:52 am #
      
      We are providing the name of the function, not calling the function.
      
      Reply
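The difference between passing a function and calling it can be shown with a small sketch. It assumes the date strings use a year, month, day, hour format; `apply_parser` is a hypothetical stand-in for a library routine that accepts a parser callback.

```python
from datetime import datetime

def parse(x):
    # assumed date format: year month day hour, separated by spaces
    return datetime.strptime(x, '%Y %m %d %H')

def apply_parser(strings, date_parser):
    """Hypothetical stand-in for a library that calls the parser for us."""
    return [date_parser(s) for s in strings]

# pass the function object itself (no parentheses); the callee decides when to call it
dates = apply_parser(['2010 1 1 0', '2010 1 1 1'], date_parser=parse)
print(dates[1])  # 2010-01-01 01:00:00
```

Writing date_parser=parse(x) would call parse immediately and pass its return value (a single datetime) instead of the function.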
  - Francesco Dainese June 9, 2018 at 3:14 am #
    
    My bad, there was a misspelled % inside the function…
    
    Reply
    - Jason Brownlee June 9, 2018 at 6:55 am #
      
      Glad to hear you worked it out.
      
      Reply
- Nilani May 16, 2020 at 3:32 pm #
  
  You need to import datetime as follows:
  
  from datetime import datetime
  
  Reply
Yanyun Zou March 27, 2018 at 12:39 pm #

Hi Jason,
I’m considering the structure of this LSTM network. Is there a recurrent loop between hidden layer and output layer? Or is there a recurrent loop just in the hidden layer? I want to know where the circular structure is.

Reply
- Jason Brownlee March 27, 2018 at 4:19 pm #
  
  Recurrence occurs within the LSTM layer.
  
  Reply
joshua March 28, 2018 at 3:39 am #

Hi, I got the code examples to run, but I am curious how to make use of them.
I have seen this use case for tuning boilers or furnaces.
https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc451.htm

Reply
- Jason Brownlee March 28, 2018 at 6:29 am #
  
  You can use it however you like. Perhaps I misunderstand your question?
  
  Reply
Raj March 29, 2018 at 2:24 am #

Hi Jason,

I stumbled upon your website through a referral link in LinkedIn. You have some great tutorials and a great teaching style, kudos to you. I followed through this tutorial and have a question related to a problem that I’m trying to solve. I’ve a time series data similar to the example in this tutorial except it has the following format:

t-n: x1(t-n) x2(t-n) x3(t-n) x4(t-n) x5(t-n) y(t-n)
:
:
t-2: x1(t-2) x2(t-2) x3(t-2) x4(t-2) x5(t-2) y(t-2)
t-1: x1(t-1) x2(t-1) x3(t-1) x4(t-1) x5(t-1) y(t-1)
t : x1(t) x2(t) x3(t) x4(t) x5(t) y(t)

I’m trying to predict y but y can’t be part of the feature vector [x1, x2, x3, x4, x5]. Will a LSTM architecture be able to predict y(t-n),…, y(t-1), y(t) in such a scenario? Thinking of y as say, temperature, y could be increasing as a function of time even for the same set of values of the feature vector. Will the code and example in this tutorial be applied to this case?

Thank you.

Reply
- Jason Brownlee March 29, 2018 at 6:37 am #
  
  Generally LSTMs are pretty poor at time series. Perhaps explore using an MLP instead?
  
  Reply
Sam March 30, 2018 at 4:05 am #

Hello,

Thanks for your article and I have a question.
In most cases, as you explained in your article, the goal of model is to predict y(t) given x1(t-n), x1(t-n-1),…,xn(t), y(t-n), y(t-n-1),…,y(t-1).

But in my case, I have time series data for lots of people, like the following. So I don’t know how to split and use my data for model training.

[data for person #1]
x1(t-n) x1(t-n-1) … xn(t-1) xn(t) y(t-n) y(t-n-1) … y(t-1)
2011
2012
2013
…
2017

[data for person #2]
x1(t-n) x1(t-n-1) … xn(t-1) xn(t) y(t-n) y(t-n-1) … y(t-1)
2011
2012
2013
…
2017

…

[data for person #n]
x1(t-n) x1(t-n-1) … xn(t-1) xn(t) y(t-n) y(t-n-1) … y(t-1)
2011
2012
2013
…
2017

The goal of my model is to predict y(t) given a new person’s time series data.

Any opinions on how to design and train model will be appreciated.

Thank you

Reply
- Jason Brownlee March 30, 2018 at 6:46 am #
  
  Perhaps model per person, per group of people or for all people.
  
  Try each and see what works best?
  
  Reply
Med March 31, 2018 at 8:29 am #

Hi Mr Jason,
I made a prediction with this model using new data. I want to know what the relation is between the predicted value and the RMSE. For example: real prediction = model prediction + RMSE?

Thank you

Reply
- Jason Brownlee April 1, 2018 at 5:40 am #
  
  The RMSE is an estimation of the model error when making a prediction.
  
  It cannot be used directly for calculating a confidence interval or a prediction interval.
  
  Reply
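For readers wondering what the RMSE in the reply above actually measures, here is a minimal sketch of the calculation in plain Python. The `observed` and `forecast` values are made up for illustration and do not come from the tutorial's dataset.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values, for illustration only.
observed = [10.0, 12.0, 15.0, 11.0]
forecast = [11.0, 12.0, 13.0, 12.0]
score = rmse(observed, forecast)
print("RMSE: %.3f" % score)
```

The score is an average size of error over many forecasts, in the units of the target variable. It says nothing about the sign or size of the error on any single new prediction, which is why it cannot simply be added to a forecast to recover the real value.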
walau April 3, 2018 at 12:18 pm #

I transformed a new dataset using the function series_to_supervised, and some of the values became negative. How did that happen?

Reply
- Jason Brownlee April 4, 2018 at 6:04 am #
  
  That is surprising. The values are not changed. Perhaps check your original dataset?
  
  Reply
Raj April 5, 2018 at 2:08 am #

Is there a way to initialize the hidden state to a specific non-zero value in Keras? My understanding is that hidden and cell states are initialized to zero by default. Are you aware of any setting where I can set h0 to an arbitrary value for an LSTM/GRU layer?

Reply
- Jason Brownlee April 5, 2018 at 6:14 am #
  
  In all of my testing, initializing state or warming up state has had no effect on model skill.
  
  Reply
Marco April 7, 2018 at 6:45 am #

Hi Jason,
In this example you use the data of the previous n hours to predict the pollution measurement for the current hour, but if I understand correctly, you are not using the values of humidity, pressure, etc. for the hour you want to predict, only those of the previous hours. How can I also use the weather data of the hour that I want to predict?

Reply
- Jason Brownlee April 8, 2018 at 6:10 am #
  
  Take a look at the section “LSTM Data Preparation” to change the data you wish to feed into the model.
  
  Reply
  - Marco April 8, 2018 at 6:26 am #
    
    What i want to do is :
    ‘Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.’
    The problem is the following,the input_shape of the LSTM layer is :
    
    input_shape=(train_X.shape[1], train_X.shape[2])
    
    So if we have lag=3 and features=8, the shape will be (3,8).
    If I want to add the weather conditions of the next hour, I would have only 7 features for that step, since I cannot include the feature that contains the value I want to predict (the PM2.5 concentration), and this will raise an error in the LSTM.
    How can I solve this problem?
    
    Reply
    - Andy August 16, 2018 at 5:57 am #
      
      I have this same question/issue as Marco. Should we drop the pollution from the features altogether when training the model?
      
      Reply
      - Jason Brownlee August 16, 2018 at 6:15 am #
        
        Why? We have a time series of pollution values in the past, they may be useful in predicting pollution in the future.
      - Andy August 17, 2018 at 1:35 am #
        
        Hi Jason, thank you for your quick response. I agree I would like to keep the pollution for previous timesteps, but I encounter the issue I will attempt to describe below.
        
        My goal is to include the features of the current timestep when predicting the pollution of the current timestep. The problem I have is that the current timestep has only 7 features since we do not have the pollution, but the previous timesteps have 8 features since we do have the pollution for those.
        
        This creates a problem when attempting to reshape the features into (samples, timesteps, features) because the current timestep has 7 features and previous timesteps have 8. Does that make sense?
      - Jason Brownlee August 17, 2018 at 6:32 am #
        
        You need a new framing of the problem, where pollution at the current time step is not used as input.
        
        Remember, the way the model is trained is the way the model will be used when making predictions. Start with what you want to forecast and with what input and work backwards to the framing required to address it.
      - Marco Miglionico September 25, 2018 at 7:13 am #
        
        Hi Andy did you find any solution?
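One possible reframing for the question in this thread, offered only as a sketch and not as the tutorial's method: shift the target column by one step so that every time step row pairs pollution at the previous hour with the weather at the following hour. Each row then still has 8 features, and the weather of the hour being predicted is part of the input while its pollution is not. The function name `frame_with_known_weather` and the assumption that column 0 holds pollution are hypothetical.

```python
import numpy as np

def frame_with_known_weather(values, n_lag):
    """Frame data so each input step holds pollution(i) and weather(i + 1).

    values: array of shape [n_rows, n_features]; column 0 is assumed to be
    the target (pollution) and the remaining columns are weather features.
    Returns X with shape [samples, n_lag, n_features] and y with shape [samples].
    """
    target = values[:, 0]
    weather = values[:, 1:]
    # Row i of `steps` pairs pollution at i with the weather at i + 1.
    steps = np.column_stack([target[:-1], weather[1:]])
    X, y = [], []
    for end in range(n_lag, len(steps) + 1):
        # The last row of the window holds pollution(end - 1) and weather(end).
        X.append(steps[end - n_lag:end])
        # The value to predict is pollution at `end`, which is never in the input.
        y.append(target[end])
    return np.array(X), np.array(y)

# Hypothetical data: 10 hourly rows, 8 columns (pollution plus 7 weather features).
values = np.arange(80, dtype=float).reshape(10, 8)
X, y = frame_with_known_weather(values, n_lag=3)
print(X.shape, y.shape)
```

With this framing the array can be passed to an LSTM with `input_shape=(3, 8)` unchanged. At prediction time the weather columns for the final step would have to come from a weather forecast, so the model's skill would also depend on the quality of that forecast.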
ANWAR M ALQUTAN April 10, 2018 at 6:56 pm #

Excellent code, thanks a lot Jason.
How can we update the last example (train on multiple lag timesteps) so that it also forecasts multiple steps into the future
(i.e. multivariate and multi-step in the same code)?
I tried that but I faced some problems with difference
Thanks again

Reply
- Jason Brownlee April 11, 2018 at 6:34 am #
  
  Here’s an example of multi-step forecasting with LSTMs that you can use as a starting point:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
hugo April 16, 2018 at 9:49 pm #

Hi Doctor,
thanks for sharing.
I have a question:
Why use all the 35k samples for training? It’s too long. You said a good number of time steps is between 200 and 400. In this post the number of time steps is 1. Why not split it into 35k/20 samples?

Reply
- Jason Brownlee April 17, 2018 at 6:00 am #
  
  In the example, we do split up and only use the prior time step as input.
  
  Quoting from the post:
  
  We will frame the supervised learning problem as predicting the pollution at the current hour (t) given the pollution measurement and weather conditions at the prior time step.
  
  Reply
hugo April 17, 2018 at 2:59 pm #

Many thanks, Jason! You solved my big problem. So, if the sequence is long enough, I can use 200 time steps, right?

thanks a lot.

Reply
- Jason Brownlee April 17, 2018 at 3:01 pm #
  
  Yes.
  
  Reply
Chris Skywalker April 17, 2018 at 9:00 pm #

excellent blog!! Thanks Jason.
Can LSTM output more than one type of output at the same time? like pollution and rainfall.

Reply
- Jason Brownlee April 18, 2018 at 8:03 am #
  
  Absolutely.
  
  You can output a vector each time step.
  
  Reply
James April 18, 2018 at 12:19 am #

Dear Jason:
Is there any way to output more than one type of sequence at the same time? Like pollution and rainfall

Reply
- Jason Brownlee April 18, 2018 at 8:10 am #
  
  Yes, the model can output a vector at each time step.
  
  Reply
Gabriela April 19, 2018 at 12:52 am #

Hello everyone,
here you find a script for several hours forecasting based on Jason’s code: https://github.com/gabrielamolinar/LSTM_TSForecasting.git
I hope you find it useful.
Cheers!
Gabriela

Reply
- Jason Brownlee April 19, 2018 at 6:34 am #
  
  Nice work!
  
  Reply
Kingsley Udeh April 19, 2018 at 9:57 pm #

Hi Jason,

Thanks for the post and tutorial.

Having calculated the RMSE for the LSTM, how could we now show pollution(t) from the previous one time step, say pollution(t-1), using real values, after knowing the error?

For example, I want to feed in previous pollution value for the past one hour, and see the corresponding forecast for time t, or t+1.

Reply
- Jason Brownlee April 20, 2018 at 5:50 am #
  
  That is what the model is doing.
  
  Perhaps I misunderstand your question?
  
  Reply
  - Kingsley Udeh April 20, 2018 at 6:05 am #
    
    That is, looking at the Temperature variable, I want to see the value 148 printed when I feed in 129(previous value) to the model, just as we do in feed forward networks, or do we just conclude that since the test or validate error is close to train error, and these error values are small, that the model has accomplished its expectation?
    
    I used my dataset to adapt your code, with few modifications, for the multivariate, one time step forecasting, and I got a RMSE of about 3.5. I’m wondering if that should be indication that the model is performing well.
    
    Reply
    - Jason Brownlee April 20, 2018 at 2:19 pm #
      
      I’m not sure I follow, sorry.
      
      This post has more information on how to make predictions with LSTMs:
      https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
      
      This post has more information on how to determine if model performance is good or not:
      https://machinelearningmastery.com/how-to-know-if-your-machine-learning-model-has-good-performance/
      
      Does that help?
      
      Reply
      - Kingsley Udeh April 21, 2018 at 2:00 am #
        
        Thanks, Jason. The links you sent help.
Jack April 24, 2018 at 2:14 am #

Dr. Brownlee, thank you for your tutorial. I’ve learned so much from you.
Here is something I don’t understand. In this example, the past pollution data (t-1) is an input variable, but what if I don’t have this data? Say if I have the past pollution data and past weather condition data and the next-24-hour weather condition data, and I want to use it to predict the pollution values for the next 24 hours, what should I do? How does it work if I don’t have the true values of current pollution data and just want to predict it?

Reply
- Jason Brownlee April 24, 2018 at 6:36 am #
  
  You must design a model to predict based on what data you do have or expect to have.
  
  You can frame the problem any way you wish, there are just no guarantees that the problem can be learned sufficiently.
  
  Reply
  - Jack April 24, 2018 at 12:54 pm #
    
    You mean I have to design my own model? Is there any adjustments I can make to the LSTM model? I’m dealing with a time series prediction problem now. I want to combine the past time series data with the influencing factor data, and that’s why I choose to do it with LSTM, because I think this model incorporate both. My objective is to forecast the target variable in the future, but I can’t do it without the corresponding time series data. I’m kind of stuck here, so what do I do now?
    
    Reply
    - Jason Brownlee April 24, 2018 at 2:50 pm #
      
      Not quite, I meant that you have control over the inputs and the outputs of the model.
      
      Does that make sense? In the example above I took the dataset and decided what the inputs and outputs were going to be. It is not obvious, there are many ways to do it, and there is no one best way. Frame the problem in a way that makes sense based on data you do have, or enumerate many framings of the problem to see what works best (if you have the resources).
      
      These models are approximating a mapping function, learn more about this here:
      https://machinelearningmastery.com/how-machine-learning-algorithms-work/
      
      For help on defining this for your dataset, see this post:
      https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
      
      If you are struggling to prepare time series data for the model, perhaps this post will help:
      https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
      
      Reply
      - Jack April 25, 2018 at 6:19 pm #
        
        Can I do a multi-step multi-variable forecasting?
      - Jason Brownlee April 26, 2018 at 6:25 am #
        
        Sure.
      - Jack April 26, 2018 at 6:03 pm #
        
        Do you have an example of multi-step multi-variable forecasting?
      - Jason Brownlee April 27, 2018 at 6:03 am #
        
        Not directly, you can combine this tutorial (above) with this tutorial:
        https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
izsak April 24, 2018 at 2:12 pm #

‘ProgbarLogger’ object has no attribute ‘log_values’
Could you tell me how I can fix it? Thanks!

Reply
- Jason Brownlee April 24, 2018 at 2:52 pm #
  
  Sorry, I have not seen this error.
  
  Perhaps try searching and/or posting on StackOverflow?
  
  Reply
StarsDie April 25, 2018 at 2:15 pm #

Great article! I just have a quick question: because of the inherent nature of the RNN, if we’re trying to understand the fit for y(t), we use information from the past such as y(t-1), y(t-2).. etc. as part of the ‘features’. But when we’re performing prediction, in this example, it seems like the lag values are actually coming from the existing data as well.

In a real world scenario, should we predict for one time step at a time, and then use the predicted values as the ‘past’ values for the next prediction?

Reply
- Jason Brownlee April 26, 2018 at 6:20 am #
  
  Good question, there are multiple ways to solve this and I recommend testing each on your specific prediction problem.
  
  For an overview see this post:
  https://machinelearningmastery.com/multi-step-time-series-forecasting/
  
  For one example with LSTMs see this post:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
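One of the strategies covered in the linked overview is the recursive approach the question describes: forecast one step, append the forecast to the input window, and repeat. A minimal sketch follows. The `predict_one` callable is a stand-in for a fitted model (here a simple window mean, purely hypothetical), so only the looping logic should be taken from it.

```python
def recursive_forecast(predict_one, history, n_lag, n_steps):
    """Forecast n_steps ahead by feeding each prediction back in as input."""
    window = list(history[-n_lag:])
    forecasts = []
    for _ in range(n_steps):
        yhat = predict_one(window)
        forecasts.append(yhat)
        # Drop the oldest value and append the new forecast as the latest lag.
        window = window[1:] + [yhat]
    return forecasts

def mean_of_window(window):
    """Stand-in for a trained one-step model; not a real forecaster."""
    return sum(window) / len(window)

history = [1.0, 2.0, 3.0, 4.0]
result = recursive_forecast(mean_of_window, history, n_lag=2, n_steps=3)
print(result)
```

Because each step consumes earlier forecasts, errors can compound over the horizon. That is one reason to compare this against a model that outputs all future steps at once.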
Zou Yanyun April 27, 2018 at 12:08 pm #

Hi Jason,
This tutorial helps me a lot.
And I want to add more LSTM layers instead of only one LSTM layer, how can I modify the code?
Thank you very much.

Reply
- Jason Brownlee April 27, 2018 at 2:28 pm #
  
  Here is an example of adding more LSTM layers:
  https://machinelearningmastery.com/stacked-long-short-term-memory-networks/
  
  Reply
Anindya Sankar Chattopadhyay April 29, 2018 at 10:49 pm #

Hi Jason:

A quick one. The example here takes care of a multivariate, multi-lag time series. I’m wondering if there is an example of multiple dimensions. By that I mean, with the multivariate and multi-lag aspect remaining the same, we want to predict, say, the pollution of not just 1 place but 2 places.

Thanks

Reply
- Jason Brownlee April 30, 2018 at 5:36 am #
  
  Yes, this is called multi-step, you can see an example here:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
  - Anindya Sankar Chattopadhyay May 1, 2018 at 10:27 pm #
    
    Thanks
    
    Reply
Jeremiah April 30, 2018 at 10:12 pm #

Hi Jason, thank you for such a wonderful post. I am new to time series data and, to be honest, I do not know where to start. I have a dataset which I am using to predict the activity energy expenditure of a person. I wanted to find out whether I can use the same preprocessing, converting the data to a supervised learning problem, on my classification data. If yes, does it mean the t(1) value I want to predict here will be the labels I am to predict? Thanks in advance

Reply
- Jason Brownlee May 1, 2018 at 5:33 am #
  
  I would recommend starting here:
  https://machinelearningmastery.com/start-here/#timeseries
  
  Reply
Zou May 1, 2018 at 10:33 pm #

Hi Jason,
If I want to use multiple recent time steps to make the prediction for the next time step, that is the window method, how can I do? And I already read one of your tutorials named “Time Series Prediction With Deep Learning in Keras”, the window method was introduced, but there is only one variable in that case. So how can i use the window method when there are multiple input variables?
Thank you so much.

Reply
- Jason Brownlee May 2, 2018 at 5:40 am #
  
  This post will show you how to prepare data for the window method:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
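For the multivariate case asked about here, the window method amounts to slicing the same rows across all columns at once. The sketch below builds the `[samples, timesteps, features]` array that an LSTM expects using NumPy only. `make_windows` and its `target_col` parameter are hypothetical names, not part of the tutorial code.

```python
import numpy as np

def make_windows(values, n_lag, target_col=0):
    """Build sliding windows over a 2D array of shape [n_rows, n_features].

    Each sample holds n_lag consecutive rows of every variable; the target is
    the value of `target_col` in the row that follows the window.
    """
    X, y = [], []
    for start in range(len(values) - n_lag):
        end = start + n_lag
        X.append(values[start:end, :])
        y.append(values[end, target_col])
    return np.array(X), np.array(y)

# Hypothetical data: 10 time steps of 2 variables.
data = np.arange(20, dtype=float).reshape(10, 2)
X, y = make_windows(data, n_lag=3)
print(X.shape, y.shape)
```

For an MLP, the same windows can be flattened with `X.reshape(X.shape[0], -1)` so that each sample becomes one row of `n_lag * n_features` inputs.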
Zou Yanyun May 1, 2018 at 11:13 pm #

Hi Jason,
Thanks for your tutorial. And I’m wondering if I want to use multiple recent time steps to make the prediction for the next time step, what can I do? And I have read one of your posts named “Time Series Prediction With Deep Learning in Keras”, and you mentioned the window method to solve this problem, but there is only one variable in that case, so how can I apply the window method to multiple variables condition?
Thank you so much.

Reply
- Jason Brownlee May 2, 2018 at 5:44 am #
  
  This post will help you prepare your data for the window method:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
Yanyun Zou May 3, 2018 at 9:15 pm #

Hi Jason,
Thanks for your reply. I have tried the window method on an LSTM network, and it seems to work worse than using only one previous time step.
I want to try an MLP using the window method, and I have 13 variables. Do you have any tutorials about it?

Reply
- Jason Brownlee May 4, 2018 at 7:43 am #
  
  No but I hope to prepare some soon.
  
  Reply
Dee May 4, 2018 at 1:45 pm #

please help, i got an error when I try to change the codes,

AttributeError: ‘DataFrame’ object has no attribute ‘inverse_transform’

please let me know how to solve this, thanks

Reply
- Jason Brownlee May 5, 2018 at 6:16 am #
  
  It looks like you may have modified the code. Perhaps ensure that you copy all of the code exactly.
  
  This might help:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-copy-code-from-a-tutorial
  
  Reply
Jack May 8, 2018 at 12:12 pm #

Hi, Jason.
I have used the model to forecast the numbers of crime of every grid in a street. But the forecast result is exactly the same as test_y. How could I improve the model ?

Reply
- Jason Brownlee May 8, 2018 at 2:56 pm #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
  
  Reply
  - Jack May 13, 2018 at 12:29 pm #
    
    Thanks for your reply.
    And I want to use the trained model to predict the number of crimes in every grid.
    I input predict_X, but I don’t know predict_Y, so to use the model I have to assign values to predict_Y randomly.
    When I give different values to predict_Y, the final results are also different. Why is that?
    Theoretically, predict_Y should not influence the forecast results, right?
    
    Reply
    - Jason Brownlee May 14, 2018 at 6:32 am #
      
      I’m not sure I follow, sorry.
      
      Reply
Fatemeh May 15, 2018 at 6:23 am #

Hi Jason
Thanks for your well-explained examples.
I am using your code to predict the ice-jam occurrence in the rivers in Quebec (Canada) using daily hydrometeorological variables (i.e. temperature, precipitation, and river discharge). My problem is that I want to develop one model for all of the rivers, so there are various data for one day from different rivers. How can I handle this spatial problem?
Thanks

Reply
- Jason Brownlee May 15, 2018 at 8:08 am #
  
  Sure.
  
  Perhaps look up some similar examples in the literature to get an idea of the type/structure of the models used for similar spatial problems.
  
  Reply
  - Fatemeh May 15, 2018 at 11:27 pm #
    
    Thanks
    
    Reply
Bhanuteja May 16, 2018 at 5:21 am #

Hi Jason sir, I really like your research blogs on machine learning. May I ask how hard it is to become like you, and how much time you take to prepare each blog post and write up your findings? Can you summarise how to become a master in machine learning like you, and how we should prepare ourselves? I am enthusiastic about machine learning and AI, but I have been failing to publish a research paper. It seems hard to arrive at my own findings; I have been trying for the last year to publish a good research paper in machine learning and artificial intelligence. Can I have guidance on how to master the field clearly? I am personally a below-average student, which I regret. I would really appreciate a few words from you; they would help me to “regularise” myself and “fit to the AI/ML research world”. I request a few words from your precious time to correct myself and set out in this world of AI.

Reply
- Jason Brownlee May 16, 2018 at 6:09 am #
  
  My best advice is to write every day and get critical feedback from your advisor.
  
  Reply
Mah May 16, 2018 at 6:27 am #

Hi,

First of all thanks a lot for this nice article. I just have a question here. I have a similar use case when I would like to predict power based on sensors data. However, I have multiple assets (30 Turbines). I am wondering if I can just simply add an ID column (1 to 30) and use the same approach? I appreciate if you can help me with this.

Thanks so much,
Mah

Reply
- Ma May 16, 2018 at 8:43 am #
  
  Do you have an example of this:
  
  Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.
  
  First Question:
  Basically in a real world scenario, I have a set of features (sensors), for some of them like temperature, I have the expected value that I can use for prediction. However, for some of the features, I don’t have any expected value and I need to use the past values. Is it possible to do this?
  
  Second question: I was looking at this post:
  https://machinelearningmastery.com/multi-step-time-series-forecasting/
  In addition to predicting the outcome variable for time t, I would like to do that for more time steps ahead. If I don’t have any expected value of my features other than the current value at t, I assume I cannot do multi-step time series forecasting for t+1, t+2 and t+3, right? For some features, like temperature and wind speed, I have expected values, but not for every single feature.
  
  I really appreciate your comment.
  
  Thanks,
  Mah
  
  Reply
  - Jason Brownlee May 17, 2018 at 6:21 am #
    
    Yes, I believe this is what the tutorial shows.
    
    Yes, you can predict the future from the past, this is the field of time series forecasting:
    https://machinelearningmastery.com/start-here/#timeseries
    
    Yes, here is an example of multi-step forecasting with an LSTM:
    https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
    
    Reply
    - Mah May 17, 2018 at 11:22 am #
      
      Thanks so much for your quick reply.
      
      In your multivariate time series example in this post (Train On Multiple Lag Timesteps Example), can we do multi-step forecasting and predict pollution for t, t+1, t+2 and t+3?
      
      Reply
      - Jason Brownlee May 17, 2018 at 3:11 pm #
        
        Sure.
- Jason Brownlee May 17, 2018 at 6:16 am #
  
  You can try to model each case standalone or try to model groups or even all cases together.
  
  No need to add IDs, as they likely do not contain information.
  
  Reply
Marco May 17, 2018 at 5:12 am #

Hi Jason, how can i change the problem setting in case i have 4 different datasets, 1 for each monitoring station of PM2.5. Should i create a LSTM neural network model for each station or there is a way to do it with only one neural network?
Thank you

Reply
- Jason Brownlee May 17, 2018 at 6:40 am #
  
  Try modeling each standalone and all together and double down on what works best.
  
  Reply
Sara May 18, 2018 at 1:30 am #

I would like to thank you firstly for this nice job. I have a question that concerns a different case.
The idea is to make a prediction at time ‘t’ based on the values of this feature at time ‘t-1’ and another feature at time ‘t’.

A real use case: we want to predict tomorrow’s solar power production given the historical production data and tomorrow’s temperature (given today’s production and knowing that tomorrow will be hot, 35° for example, what is the estimated production for tomorrow?).
How can I use an RNN or LSTM in this case?

Reply
- Jason Brownlee May 18, 2018 at 6:27 am #
  
  You can use the above example as a template for getting started.
  
  What problem are you having exactly?
  
  Reply
Thabet May 19, 2018 at 5:05 pm #

Hi Jason,
Is there a way to find which input contributed the most to the output?

Reply
- Jason Brownlee May 20, 2018 at 6:35 am #
  
  You can try removing one feature at a time from the model and evaluate the impact.
  
  Reply
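The remove-one-feature-at-a-time idea in the reply above can be sketched as a loop that refits a model without each column and records how much the test error changes. To keep the example fast and self-contained, an ordinary least squares fit stands in for the LSTM. The function names and the synthetic data are hypothetical, and in practice `fit_and_score` would train and evaluate the real network.

```python
import numpy as np

def fit_and_score(train_X, train_y, test_X, test_y):
    """Stand-in model: least squares with an intercept, scored by test RMSE."""
    A = np.column_stack([train_X, np.ones(len(train_X))])
    coef, *_ = np.linalg.lstsq(A, train_y, rcond=None)
    B = np.column_stack([test_X, np.ones(len(test_X))])
    pred = B @ coef
    return float(np.sqrt(np.mean((test_y - pred) ** 2)))

def ablation(train_X, train_y, test_X, test_y):
    """Return the baseline RMSE and the RMSE increase when each column is dropped."""
    baseline = fit_and_score(train_X, train_y, test_X, test_y)
    impact = {}
    for col in range(train_X.shape[1]):
        keep = [c for c in range(train_X.shape[1]) if c != col]
        score = fit_and_score(train_X[:, keep], train_y, test_X[:, keep], test_y)
        impact[col] = score - baseline
    return baseline, impact

# Synthetic data: column 0 matters most, column 2 is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.1 * X[:, 1]
baseline, impact = ablation(X[:150], y[:150], X[150:], y[150:])
print(baseline, impact)
```

A larger increase in error suggests the removed feature carried more useful information. Because neural networks are stochastic, each configuration would ideally be trained several times and the scores averaged before drawing conclusions.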
ikram May 19, 2018 at 9:00 pm #

Hi Jason, thank you for this amazing article.
My question is: can we add more hidden layers, for example two or three? If yes, how can I do this? Which part of the code should I modify?

# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=50, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)

Reply
- Jason Brownlee May 20, 2018 at 6:38 am #
  
  Yes, here is more info on stacking LSTMs:
  https://machinelearningmastery.com/stacked-long-short-term-memory-networks/
  
  Reply
Anwesha May 21, 2018 at 10:43 am #

Hi! Do you know what embeddings are? Also, is it possible to use RNNs for unsupervised learning and prediction for multivariate time series?

Reply
- Jason Brownlee May 21, 2018 at 2:29 pm #
  
  I have an introduction to embeddings for NLP here:
  https://machinelearningmastery.com/what-are-word-embeddings/
  
  Reply
enri May 21, 2018 at 5:02 pm #

Would this be the right approach to take if I wanted forecast for a certain timeframe? Say, I have timeseries data in 2min buckets for January 2018 and February 2018, and I wanted to forecast based on two independent variables for the first week of March to determine some dependent variable. I have data for the two independent variables, also in 2min buckets, but I’m trying to predict the dependent variable. Reading through this blog, I think this is the approach I want to take — it makes a lot of sense to me. However, I’m having trouble making sense of predicting that first week of March in isolation.

1) Do I use the complete data from January 2018 and February 2018 to develop a training and testing sample, and then use the model to predict the March timeseries? Running into some errors in this, so I’m assuming this is not the right approach, but open to feedback.

2) Do I include the March timeseries in the testing sample and get the resulting values from that? If so, how does one map these testing values (predictions) to the original timeseries/timeframe?

Regardless of either approach, what is the right way of mapping back the prediction of ‘foo’ back to timeframe ‘bar’? Perhaps this is a straight index lookup? There should be an easier way, no?

Thank you very much! This blog was very helpful 🙂

Reply
- Jason Brownlee May 22, 2018 at 6:23 am #
  
  I recommend trying a suite of approaches and see what works best for your data.
  
  Be systematic and use data/results to make decisions around model design.
  
  Reply
kg May 21, 2018 at 7:12 pm #

Hi Jason, thanks a lot for all clear explanations. what if I want to predict all the variables at the time (t+1)?

Reply
- Jason Brownlee May 22, 2018 at 6:25 am #
  
  Change the model to output a vector or change the model to be a seq2seq model such as an encoder-decoder.
  
  I have examples of both approaches on the blog, use the search.
  
  Reply
Kevin May 23, 2018 at 10:36 pm #

Hello Jason, I read this article and ran the code.

However, it works just like a persistence model. I’m so confused.

Reply
- Jason Brownlee May 24, 2018 at 8:12 am #
  
  Generally, LSTMs and neural nets in general are poor at time series forecasting, learn more here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-use-lstms-for-time-series-forecasting
  
  Nevertheless, there is huge demand for knowing how to do it.
  
  Reply
- Hany May 21, 2020 at 3:43 pm #
  
  I cannot believe how helpful you are, Dr. Brownlee. You are really a great man.
  
  God bless you.
  
  Reply
  - Jason Brownlee May 22, 2020 at 6:03 am #
    
    You’re welcome!
    
    Reply
Gene May 25, 2018 at 12:08 pm #

Lots of fun debugging this code!!

Reply
- Jason Brownlee May 25, 2018 at 2:52 pm #
  
  Thanks!
  
  Reply
Anwar May 28, 2018 at 3:14 am #

Hi Jason,
Wouldn’t using the same test set for validation during training and then for prediction cause bias? If yes, how do I specify a validation set from the training set?

Thanks

Reply
- Jason Brownlee May 28, 2018 at 6:02 am #
  
  It would introduce a bias.
  
  Try not to use the validation set too often.
  
  Reply
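One way to avoid reusing the test set for validation is to carve a separate validation block out of the end of the training period, keeping the rows in time order. The sketch below is a suggestion rather than the tutorial's procedure; `chronological_split` and the row counts are hypothetical.

```python
def chronological_split(n_rows, n_val, n_test):
    """Return slices for train, validation and test, in time order.

    The most recent n_test rows are held back for the final evaluation and
    the n_val rows before them are used for validation during training.
    """
    if n_val + n_test >= n_rows:
        raise ValueError("not enough rows left for training")
    train_end = n_rows - n_val - n_test
    val_end = n_rows - n_test
    return slice(0, train_end), slice(train_end, val_end), slice(val_end, n_rows)

# Hypothetical series of 100 rows; any sliceable sequence or array works.
rows = list(range(100))
train_idx, val_idx, test_idx = chronological_split(len(rows), n_val=15, n_test=15)
train, val, test = rows[train_idx], rows[val_idx], rows[test_idx]
print(len(train), len(val), len(test))
```

The validation arrays would then be passed as `validation_data=(val_X, val_y)` in the call to `model.fit()`, and the test arrays used only once for the final RMSE.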
Phillip Otero May 28, 2018 at 9:09 am #

Hello Jason, Ran your model with a 15 min timestamp dataset with 11 features. Used a 2 period lookback var1(t-2) and a forecast with var1(t+1). Since the output is an array of 11 features, how can I reconcile these forecasted sequences (inv_yhat) with my original timestamps? Pls let me know if you want me to send my data and model.

Reply
- Jason Brownlee May 28, 2018 at 2:33 pm #
  
  The number of output time steps will match the number of input time steps directly.
  
  Reply
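To line forecasts up with the original timestamps, it can help to track which rows are lost during framing. The sketch below assumes the tutorial's framing, where each sample uses `n_lag` prior rows to predict the current row, so the first `n_lag` rows have no complete input, and where the first `n_train` supervised rows are used for training. The function name and the example timestamps are hypothetical.

```python
def forecast_timestamps(timestamps, n_lag, n_train):
    """Return the timestamp that each test-set forecast refers to.

    timestamps: the original index, one entry per raw observation.
    n_lag: number of lag rows used as input (rows dropped at the start).
    n_train: number of supervised rows assigned to the training set.
    """
    # Supervised row k predicts the raw observation at position k + n_lag.
    supervised_stamps = timestamps[n_lag:]
    return supervised_stamps[n_train:]

# Hypothetical 15-minute timestamps.
stamps = ["00:00", "00:15", "00:30", "00:45", "01:00", "01:15", "01:30", "01:45"]
test_stamps = forecast_timestamps(stamps, n_lag=2, n_train=4)
print(test_stamps)
```

The returned list has one entry per row of the test set, so it can be paired directly with the inverted forecasts, for example with `zip(test_stamps, inv_yhat)`. A framing that predicts further ahead, such as var1(t+1), would need the offset increased to match.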
Alex May 29, 2018 at 2:50 am #

How do I get the result values themselves (not the diffs). Let’s say I wanna output them.

Reply
- Jason Brownlee May 29, 2018 at 6:29 am #
  
  Invert the diff operation.
  
  There’s an example in this post for example:
  https://machinelearningmastery.com/remove-trends-seasonality-difference-transform-python/
  
  Reply
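As a minimal illustration of inverting a difference transform, each original value is recovered by adding the difference back onto the value one interval earlier. The helper names below are hypothetical and the series is made up.

```python
def difference(series, interval=1):
    """Difference a series: value at i minus value at i - interval."""
    return [series[i] - series[i - interval] for i in range(interval, len(series))]

def invert_difference(first_values, diffs, interval=1):
    """Rebuild the original series from its first `interval` values and the diffs."""
    restored = list(first_values)
    for d in diffs:
        # The value `interval` steps back is already known or already restored.
        restored.append(restored[-interval] + d)
    return restored

series = [10, 12, 15, 11, 14]
diffs = difference(series)
restored = invert_difference(series[:1], diffs)
print(diffs, restored)
```

For a one-step forecast made on differenced data, the same idea applies: add the predicted difference to the last known observation to get the forecast in original units.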
Jurek May 29, 2018 at 2:14 pm #

Hi,
First of all thank you for sharing your knowledge. I learn a ton of things reading your blog.

My question is a bit tricky. How would you approach a multivariate time series problem, not on one long observation like in the example, but on multiple smaller observations with different features of the same problem?
They can last for 40 to 180 days and can also overlap each other, so one starts and the next one starts after, let’s say, 14 days, and they run in parallel. Then the 3rd one starts, and so on.

What I came up with:
I was thinking of showing observations like “slides”, training on single observations and saving them somehow. The end slide will be the observation that I want to predict. My concern is that showing multiple “slides” will confuse the network and it won’t be able to give a good prediction.

Can you comment on that? How would you approach this problem? Maybe someone already did that and you can point me in the right direction?

Reply
- Jason Brownlee May 29, 2018 at 2:58 pm #
  
  I’m not sure I follow sorry.
  
  Do you mean discontinuous observations over time?
  
  Reply
  - Jurek May 29, 2018 at 9:07 pm #
    
    Ok, maybe example will cast some light:
    
    lets say you have time series starting 2016-01-01 it lasts for 90 days
    2nd starting 2016-01-20 lasting 90 days
    3rd starting 2016-02-10 lasting 90 days
    
    They all fall into same category but they have different features resulting in different outcomes
    
    Now I have 4th starting 2016-03-30 and it will last for 90 days. Based on trained data from 1, 2, 3 I want to forecast 4
    
    Reply
    - Jason Brownlee May 30, 2018 at 6:43 am #
      
      You must get creative and try many different framings of the problem to see what works best.
      
      Perhaps ignore the difference in periods and treat them as parallel variates?
      Perhaps pad all variates to the same lengths?
      Perhaps not all variates are useful?
      …
      
      Brainstorm and test.
      
      Reply
Sara May 30, 2018 at 12:50 am #

Thank you for the nice tutorial.
I have a problem when testing the code on my own data

after computing the inverse transform, the inv_y does not match with the original test data:

test=dataset['consom'].values
test.reshape(-1,1)
test[n_train:]

* n_train is n_train_hours in your code and consom is the output (to predict)

test: array([ 54.779979, 56.330428, 55.546604, …, 43.95959 , 43.196657,
43.160589])

inv_y: array([ 5.70597649, 5.62580299, 5.35393763, …, 4.44062805,
4.36259127, 4.35890198], dtype=float32)

could you help me please, thank you

Reply
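It is not possible to tell from the snippet above what went wrong, but one thing worth checking with min-max scaling is that it is applied per column, so a value can only be inverted correctly with the minimum and maximum of the column it came from. The sketch below shows this with NumPy and made-up numbers. It is an illustration of the scaling arithmetic, not a diagnosis of the code in the comment.

```python
import numpy as np

# Hypothetical data: column 0 is the target, column 1 is another feature.
data = np.array([[50.0, 1.0], [60.0, 3.0], [40.0, 2.0]])
col_min = data.min(axis=0)
col_max = data.max(axis=0)
# Per-column min-max scaling to the range [0, 1].
scaled = (data - col_min) / (col_max - col_min)

def invert_column(scaled_values, col):
    """Undo min-max scaling using the statistics of column `col`."""
    return scaled_values * (col_max[col] - col_min[col]) + col_min[col]

right = invert_column(scaled[:, 0], col=0)  # uses the target's own min and max
wrong = invert_column(scaled[:, 0], col=1)  # uses another column's min and max
print(right, wrong)
```

When a fitted scaler object is used instead, the equivalent check is that the predicted column sits in the same column position it had when the scaler was fit before the inverse transform is called.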
Sarthak May 30, 2018 at 1:28 am #

I really like your blogs and these are really knowledgeable. Thank you for doing this.

I have a question: when I graph test_y and predicted_y, predicted_y is shifted to the right. It is not completely shifted; it does overlap at some of the points, especially the minima. Is there a way to improve this?

Reply
- Sarthak May 30, 2018 at 4:28 am #
  
  I figured it out: I just added more time steps, but now the problem is that it is over-fitting. I have multiple datasets, and it works really well for most of them.
  
  Thank you again for this blog
  
  Reply
- Jason Brownlee May 30, 2018 at 6:45 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
  
  I have some suggestions here:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
Ferda May 30, 2018 at 1:36 am #

Hi, thanks for this great tutorial. Could you please explain why you gave the LSTM 50 neurons although the data has 7 features? Wouldn’t 7 nodes (the number of features) be enough for this? Or, for example, why didn’t you use 100?

Thanks in advance

Reply
- Jason Brownlee May 30, 2018 at 6:46 am #
  
  I configured the network via trial and error.
  
  The number of nodes in the hidden layer is unrelated to the number of input and output time steps.
  
  Reply
Ashley Kleinhans May 30, 2018 at 11:05 pm #

Hi. I am trying to build my own! I have data which is days rather than hours. I have 3 years of days and I want to predict a week ahead. So each week I would like to run my model on a Tuesday and produce an output (linear value of percentage) for each day of the coming week – Wednesday to Wednesday. Right now though I am just starting and what I have done is divide my training and test set as follows:

n_train_days = math.floor(tot_days * 0.8)
train = values[:n_train_days, :]
test = values[n_train_days:, :]

These are my training and test shapes:

(298, 21, 11) (298,) (75, 21, 11) (75,)

Which come originally from:

values = reframed.values

Which has the shape:

(373, 242)

– I have 11 features and n_days = 21 (so 3 weeks of training)

Everything runs and at the end, I get inv_yhat and inv_y to plot, but I have an issue: I want to plot them against another model (ARIMA) output and the actual dates that they occur. So I go back to my original csv file and I extract what I think is the dates:

data_csv = load_csv('data.csv')
test_dates = data_csv['DATES'][n_train_days:]
arima_out = data_csv['ARIMA'][n_train_days:]

Now I want to plot inv_yhat and arima_out against dates – but the lengths are different:

Length of test_dates : 96, Length of arima_out : 96, Length of inv_yhat : 75, Length of inv_y : 75

I am confusing myself. Can you help me, please?

Reply
- Jason Brownlee May 31, 2018 at 6:18 am #
  
  I’m eager to help, but I don’t have the capacity to debug your code. I’m sure you can understand.
  
  Reply
  - Ashley June 1, 2018 at 5:54 am #
    
    Thanks, it’s all working correctly! No debugging needed! I just need to understand why there is a change in dimensions when reframing the data as a supervised learning problem.
    
    My original data has 96 rows in the test set, but for some reason when making the test and training sets I get 75. It’s a 3-week difference, 21 days. But which three weeks is it? Does the training set actually have 21 more days, and the test set 21 fewer?
    
    Reply
    - Jason Brownlee June 1, 2018 at 8:27 am #
      
      Perhaps it is related to your chosen lag?
      
      Reply
      - Ashley June 1, 2018 at 10:57 pm #
        
        Yes. I think so. I shifted the input to just continue as if three weeks had already gone past. But I thought this was too simple. But simple is always better!
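
The 21-row difference discussed in this thread comes from the lag framing: each sample needs 21 prior rows, so the first 21 rows of the series can never be targets. The sketch below reproduces the shapes quoted above under a few assumptions: `frame_with_lag` is a hypothetical stand-in for the tutorial's series_to_supervised() step, the data is synthetic, and the total of 394 rows is inferred from the quoted shapes (373 framed samples plus 21 lag rows) rather than stated by the commenter.

```python
import numpy as np

def frame_with_lag(values, n_lag):
    """Hypothetical helper: n_lag past rows as input, column 0 of the next row as target."""
    X, y = [], []
    for i in range(n_lag, len(values)):
        X.append(values[i - n_lag:i])  # rows i-n_lag .. i-1 form one input window
        y.append(values[i, 0])         # target is taken from row i
    return np.array(X), np.array(y)

n_total, n_features, n_lag = 394, 11, 21  # n_total is inferred, not given in the thread
values = np.arange(n_total * n_features, dtype=float).reshape(n_total, n_features)

X, y = frame_with_lag(values, n_lag)
n_train = int(len(X) * 0.8)               # 80% of the framed samples
test_X, test_y = X[n_train:], y[n_train:]

# Framed sample j has its target at original row j + n_lag,
# so the test targets map to these original row positions:
test_rows = np.arange(n_train + n_lag, n_total)
print(X.shape, test_X.shape, len(test_rows))
```

Under these assumptions, slicing the date and ARIMA columns with [n_train_days + 21:] rather than [n_train_days:] would give 75 rows that line up with inv_yhat.
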
Dominik June 1, 2018 at 12:05 am #

In the line
inv_y = concatenate((test_y, test_X[:, -7:]), axis=1)
why is there a “-7”? I guess it has something to do with the number of features, but then why didn’t you use the “n_features” variable here?

Reply
- Jason Brownlee June 1, 2018 at 8:22 am #
  
  Correct.
  
  Learn more about working with numpy arrays here:
  https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
  
  Reply
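
The -7 in the line quoted above selects the last n_features - 1 columns, that is, every feature except the one being forecast. A minimal sketch with synthetic arrays, assuming 8 features and 3 lag hours as in the tutorial's multi-lag example; `n_other` is a name introduced here for illustration:

```python
import numpy as np

n_hours, n_features = 3, 8
n_samples = 5

# Flattened test inputs: one column per (lag hour, feature) pair
test_X = np.arange(n_samples * n_hours * n_features, dtype=float)
test_X = test_X.reshape(n_samples, n_hours * n_features)
test_y = np.zeros((n_samples, 1))

n_other = n_features - 1  # 7 when there are 8 features
inv_y = np.concatenate((test_y, test_X[:, -n_other:]), axis=1)

# Same result as the hard-coded slice
hard_coded = np.concatenate((test_y, test_X[:, -7:]), axis=1)
print(inv_y.shape)
```

Deriving the slice from n_features keeps the code correct if the number of input variables changes.
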
kay June 1, 2018 at 12:17 am #

Hello Matt,

Very good example, but I want to use LSTM method on my data. Due to multiple reasons, the time series includes 10% missing data. Do you have some suggestions on this problem?

Thanks,

Best,
jay

Reply
- Jason Brownlee June 1, 2018 at 8:22 am #
  
  Here are some examples for working with missing data:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-handle-missing-data
  
  Reply
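
One simple option for a modest share of missing values is linear interpolation between the nearest known observations. The sketch below uses NumPy's np.interp on a toy series; the helper name `interpolate_missing` is hypothetical, gaps are assumed to be marked as NaN, and whether interpolation is appropriate depends on the data.

```python
import numpy as np

def interpolate_missing(series):
    """Fill NaN gaps by linear interpolation over the sample index."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(len(series))
    known = ~np.isnan(series)
    # np.interp evaluates the piecewise-linear curve through the known points
    return np.interp(idx, idx[known], series[known])

raw = [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]
filled = interpolate_missing(raw)
print(filled)
```
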
Anderson D June 1, 2018 at 4:03 am #

Jason,

I have a question about your preprocessing step – a lot of sources state that data normalization should be done separately on test and train but in your example you normalize the data and then split into test and training datasets. Is there something that I’m missing or does this not matter?

Reply
- Jason Brownlee June 1, 2018 at 8:26 am #
  
  Yes, that is correct. I simplified data preparation in this tutorial to focus on the learning method.
  
  Reply
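
To avoid leaking test-set statistics into training, the scaling parameters can be estimated on the training rows only and then reused on the test rows. A NumPy-only sketch on random data; `fit_minmax` and `apply_minmax` are hypothetical helpers, and the 80/20 split is an arbitrary choice:

```python
import numpy as np

def fit_minmax(train):
    """Learn per-column minimum and range from the training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant columns
    return lo, span

def apply_minmax(data, lo, span):
    """Scale data with previously learned parameters."""
    return (data - lo) / span

rng = np.random.default_rng(1)
values = rng.normal(size=(100, 8))
n_train = 80
train, test = values[:n_train], values[n_train:]

lo, span = fit_minmax(train)
train_scaled = apply_minmax(train, lo, span)
test_scaled = apply_minmax(test, lo, span)  # may fall slightly outside [0, 1]
print(train_scaled.min(), train_scaled.max())
```

With scikit-learn, the equivalent pattern is to call fit() on the training rows and transform() on both splits.
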
Ashley June 1, 2018 at 11:57 pm #

If I want to predict more than a day ahead – so I have 3 weeks in and one week out (like 21*24 hrs and 7*24 out) do I just update the Dense(1) to be Dense(7) ?

When I try this I get an error:

ValueError: Error when checking target: expected dense_11 to have shape (7,) but got array with shape (1,)

And it comes from this line

history = model.fit(train_X, train_y, epochs=50, batch_size=12, validation_data=(test_X, test_y), verbose=2, shuffle=False)

Reply
- Jason Brownlee June 2, 2018 at 6:30 am #
  
  I have an example here of multiple-step forecasts that you can use as a template:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
  - Ashley June 5, 2018 at 5:10 am #
    
    Thank you!
    
    Reply
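
The error above indicates that the target array still has one value per sample while the output layer expects seven. A minimal sketch of framing a univariate series so that each sample has a 7-step target; `make_windows` is a hypothetical helper and the series is synthetic:

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Return inputs of length n_in and targets holding the next n_out values."""
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(y)

series = np.arange(100, dtype=float)
X, y = make_windows(series, n_in=21, n_out=7)
print(X.shape, y.shape)
```

With targets shaped (samples, 7), an output layer of 7 units matches the target width.
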
vincent granville June 6, 2018 at 12:22 am #

I just published a new book related to time series:

Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems. Published June 2, 2018. Author: Vincent Granville, PhD. (104 pages, 16 chapters.)

This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject. It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.

New ideas, advanced topics, and state-of-the-art research are discussed in simple English, without using jargon or arcane theory. It unifies topics that are usually part of different fields (data science, operations research, dynamical systems, computer science, number theory, probability) broadening the knowledge and interest of the reader in ways that are not found in any other book. This short book contains a large amount of condensed material that would typically be covered in 500 pages in traditional publications. Thanks to cross-references and redundancy, the chapters can be read independently, in random order.

This book is available for Data Science Central members exclusively. The text in blue consists of clickable links to provide the reader with additional references. Source code and Excel spreadsheets summarizing computations, are also accessible as hyperlinks for easy copy-and-paste or replication purposes. The most recent version of this book is available from this link, accessible to DSC members only.

About the author

Vincent Granville is a start-up entrepreneur, patent owner, author, investor, pioneering data scientist with 30 years of corporate experience in companies small and large (eBay, Microsoft, NBC, Wells Fargo, Visa, CNET) and a former VC-funded executive, with a strong academic and research background including Cambridge University.

For details about the book, go to https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes

Reply
- Jason Brownlee June 6, 2018 at 6:41 am #
  
  Thanks for sharing.
  
  Reply
akn June 6, 2018 at 8:57 am #

Hi Jason,
Your website is a treasure of knowledge on Neural networks and Machine learning. Thank you so much for sharing with others.
I am trying to implement a time series forecasting where each row in my dataset has 3 columns: timestamp, 2D numpy array(10000×6000), float32. The numpy array is my input data in each row.
I have decided to use input of previous 12 timesteps and predict output for 4 future timesteps. I have a couple of questions, and hoping to find answers here:

1. Can I have only the numpy arrays in my input sequence, without the output value? (In your example I see var1(t-1) … var8(t-1) and then var1(t). This means var1 is being forecasted and var1 is in the input sequence as well.)

2. What is the best way to use a 2D array as input? I am flattening it to a 1D array, but it’s too big.

3. If my dataset is a dataframe with columns X, y where X is my input and y is the output, can we use the LSTM to predict, say, y[11] through y[15] using X[1] through X[10] as input?

Reply
- Jason Brownlee June 6, 2018 at 2:00 pm #
  
  Thanks.
  
  Yes, data must be in numpy arrays.
  
  My best advice on preparing data is here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm
  
  Reply
akn June 6, 2018 at 1:11 pm #

Hi Jason,
Thank you for sharing your wealth of knowledge with every one !!

I am attempting to forecast a specific value for next 5 timesteps. Here is how my data looks:

time input output
timestamp 2D numpy array float32

In the above example, I have seen that var1 is the input, and var2 is the output. I see totally 8 variables created where var1 is mentioned twice. This indicates you are adding the output variable also as part of input sequence.

Is this mandatory? Or can we have input variables in the following way:
var2 through var8 for (t-1) and predict var1 for t, t+1

Also, is it valid to use a 1D numpy array as an input variable, just wanted to confirm since I haven’t seen this in examples

Reply
- Jason Brownlee June 6, 2018 at 2:01 pm #
  
  You can frame your problem any way you wish.
  
  Reply
Tin June 6, 2018 at 1:15 pm #

Hi Jason, thanks for the post, it is great. I have two quick questions after going through it.

1. How did you decide the batch size? Is there any rule to follow?
2. The input data you use for each time step is 1×8 (8 attributes for one feature). Could we change it to an n×m dimension? I mean that for each time step we have n training samples and each of them contains m attributes. If we could, where is the best place to change the code?

Thx, Tin

Reply
- Jason Brownlee June 6, 2018 at 2:02 pm #
  
  I used trial and error and careful experimentation.
  
  Yes, you can change the lag in the call to transform the data from time series to supervised to add more past observations as input.
  
  Reply
Ashish June 6, 2018 at 11:01 pm #

How many attributes did you use for predicting pollution? Kindly specify with code.
Where have you used the date in your code, given that it is present in the dataset?

Reply
- Jason Brownlee June 7, 2018 at 6:28 am #
  
  All input attributes were used.
  
  Date is discarded as the observations are contiguous and evenly spaced in time.
  
  Reply
Mah June 7, 2018 at 3:07 am #

Hi Jason.

I am getting the following error when I try to calculate the RMSE. The error actually comes from concatenate((yhat, test_X[:, 1:]), axis=1)
Any idea?

from numpy import concatenate
from keras.layers import concatenate
from keras.layers import *
# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]

Layer concatenate_1 was called with an input that isn’t a symbolic tensor. Received type: . Full input: [(array([[0.03575472],

Reply
- Jason Brownlee June 7, 2018 at 6:33 am #
  
  I have some ideas here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
- JJZ July 19, 2018 at 5:23 pm #
  
  I got the same error and found that you just have to remove the “from keras.layers import concatenate” line to fix it.
  
  The later Keras import shadows NumPy’s concatenate, so the Keras function is called instead of the array function.
  
  Reply
- JJT July 19, 2018 at 5:30 pm #
  
  I have a better idea for you.
  
  Python is picking up the wrong concatenate, because the Keras import overrides the NumPy one.
  
  Just remove “from keras.layers import concatenate” and you are good to go.
  
  Or better…
  
  import numpy as np
  
  then use np.concatenate instead of just concatenate from now on.
  
  Reply
- Mr.Wu August 7, 2018 at 8:01 pm #
  
  Hi, I ran into the same problem and solved it. You can try this:
  from numpy import concatenate, sqrt
  from sklearn.metrics import mean_squared_error
  
  Reply
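
The error in this thread comes from name shadowing: a later import that binds the name concatenate replaces NumPy's function. The sketch below imitates this without Keras by defining a stand-in function (hypothetical, for illustration only) and shows that the namespaced call np.concatenate is unaffected.

```python
import numpy as np
from numpy import concatenate  # binds the name to NumPy's array function

def concatenate(inputs, axis=-1):
    """Stand-in for a later import that reuses the same name."""
    raise TypeError("layer-style concatenate called with plain arrays")

yhat = np.zeros((4, 1))
rest = np.ones((4, 7))

try:
    concatenate((yhat, rest), axis=1)  # resolves to the most recent binding
    shadowed = False
except TypeError:
    shadowed = True

joined = np.concatenate((yhat, rest), axis=1)  # always NumPy's version
print(shadowed, joined.shape)
```

Using the np. prefix throughout avoids depending on import order.
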
AmitG June 7, 2018 at 10:01 pm #

Hi Jason,

I have been following your tutorial. You mention in the initial parts that one can predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour. I have been trying to do the same, but in my case I have 6 variables and I have to predict the sixth variable for time t+1 based on the expected five for t+1.
I have noted that you have kept pollution as the first variable in the data set that you have used. This quite nicely translates to your problem of predicting the pollution level for the next timestep when you are using some time lag, e.g. 3 in the tutorial, given pollution and weather variables for previous timesteps. This is because pollution at the next timestep becomes natural for the sequence, as a total of 24 data points are there in the sequence (after taking 3 lags) and the 25th one is naturally the pollution for the next timestep.
For the problem I have at hand, I am facing serious limitations in selecting the number of lags I can use for training. I had to keep the variable to be predicted in the sixth column in the dataset and take a lag of 5 and deliberately keep the features to be equal to 7. That created a sequence of length 36 (I have 6 variables in the data and lag used is 5) and taking the number of features equal to 7 framed the problem in a way that I can predict the 6th variable given the other five variables expected values for the next timestep. I cant use lag 4 because 4*6 = 24, 24-1 = 23 and 23 is not a composite number. I hope I have made the problem clear.

Question:
1) How can I generalize the data preparation for the prediction problem that I have been facing?

Please help!

Reply
- Jason Brownlee June 8, 2018 at 6:12 am #
  
  Perhaps this function will help:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
  - AmitG June 9, 2018 at 8:30 pm #
    
    Hi Jason! thanks for the link.
    I have a quick question.
    
    For example- Lets say I have a data frame of 6 variables. 5 of them are weather variables and 1 is a disease incidence variable for a plant. I need to predict the disease incidence given weather at the next timestep. I take a lag of 3, and I end up with 24 columns. So, technically I have to predict the 24th instance, which is the disease, in the sequence and I have to use the sequence of length 23 as the input. How can I achieve that?
    I had thought of using the input as (Number of samples, timestep = 1, features = 23).
    Is it appropriate if I don’t keep the number of timesteps in the input to LSTM equal to the number of lags I have taken?
    
    Thanks!
    
    Reply
    - Jason Brownlee June 10, 2018 at 6:01 am #
      
      You would have 3 time steps, and 5 features.
      
      Reply
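
Jason's answer can be read as reshaping the lagged columns into [samples, time steps, features]. The sketch below assumes the lag columns are ordered oldest step first with 6 variables per step, as series_to_supervised() produces, and that the disease variable is the last of the 6; keeping only the 5 weather variables as inputs is one interpretation of the reply, not something stated in it.

```python
import numpy as np

n_samples, n_lag, n_vars = 10, 3, 6

# Synthetic framed data: 18 lag columns followed by 6 columns for time t
framed = np.arange(n_samples * (n_lag + 1) * n_vars, dtype=float)
framed = framed.reshape(n_samples, (n_lag + 1) * n_vars)

lag_cols = framed[:, :n_lag * n_vars]           # (10, 18)
X = lag_cols.reshape(n_samples, n_lag, n_vars)  # (10, 3, 6)
X_weather = X[:, :, :5]                         # (10, 3, 5): weather only
y = framed[:, -1]                               # disease at time t
print(X_weather.shape, y.shape)
```

The number of time steps then equals the lag, and the feature axis holds the variables observed at each step.
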
Kingsley Udeh June 8, 2018 at 6:13 am #

Hi Jason,

Thank you so much for the time you have been devoting to questions asked on your blogs. I do really appreciate your selfless service.

I have a couple of questions regarding multistep, multivariate time series forecasting. I have already seen the articles you wrote on them, but I have to ask about the following section of your code that I’m adapting to my data:

1. I have 6 features (0-5), and I would like to predict the last feature. Is the following code correct?

# split into input and outputs
n_obs = hours_past * n_features
train_X, train_y = train[:, :n_obs], train[:, -1]
test_X, test_y = test[:, :n_obs], test[:, -1]

2. Inverting normalization of forecast and actual values like the following :

# invert normalization of forecast values
inv_yhat_i = concatenate((yhat_i, test_X_reshaped[:, 0:4]), axis=1)
inv_yhat_i = scaler.inverse_transform(inv_yhat_i)
inv_yhat_i = inv_yhat_i[:,0]

# invert normalization of actual values
inv_y_i = concatenate((test_y_i, test_X_reshaped[:, 0:4]), axis=1)
inv_y_i = scaler.inverse_transform(inv_y_i)
inv_y_i = inv_y_i[:,0]

produces the following ValueError:

“operands could not be broadcast together with shapes (2958,5) (6,) (2958,5) ”

What am I doing wrong here?
Also, can I forecast more than one features with values from other features?

Thank you again , in advance

Regards,
Kingsley

Reply
- Kingsley Udeh June 9, 2018 at 5:11 am #
  
  Hi Jason,
  
  I’m still waiting to hear from you regarding my previous post, at your earliest convenience.
  
  Thank you
  
  Reply
  - Jason Brownlee June 9, 2018 at 6:58 am #
    
    My best advice on how to prepare data for LSTMs is here:
    https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm
    
    Reply
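
The broadcast error above is what happens when the array passed to inverse_transform() has a different number of columns than the scaler was fitted on (5 versus 6 here). The sketch below uses a hand-written min-max scaling in NumPy instead of the tutorial's scaler, on synthetic data, and assumes the target is the last of 6 columns. The forecast is placed back in the target's original column position before inverting.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.uniform(10, 50, size=(20, 6))  # 6 features, target in the last column

# Fit min-max scaling on all 6 columns
lo = raw.min(axis=0)
span = raw.max(axis=0) - lo
scaled = (raw - lo) / span

yhat = scaled[:, -1:]    # stand-in for model output (a perfect forecast)
others = scaled[:, :-1]  # the 5 remaining scaled features

# Rebuild a 6-column array with the forecast in the target's position
full = np.concatenate((others, yhat), axis=1)
inv = full * span + lo   # inverse of the min-max scaling
inv_yhat = inv[:, -1]    # read the target column back out
print(inv_yhat[:3])
```

The same idea applies with a fitted scaler object: the array handed to the inverse transform needs all the original columns, in the original order.
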
Phillip Otero June 8, 2018 at 8:41 am #

Hello Jason, I have data consisting of 6,000 time steps by 11 features. I am looking back 3 steps and want to project 2 steps forward for all 11 features. train_X.shape is (1760,33) and train_y.shape is (1760,22). my network design is:

# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2]), return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=10, batch_size=32, validation_data=(test_X, test_y), verbose=2, shuffle=False)
print(model.summary())
# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
model.save('SP-LSTM.h5')

however I get the following error on my fit line:

ValueError: Error when checking target: expected dense_1 to have shape (1,) but got array with shape (22,)

Also what changes will I need to make to output the two forecast time steps with 11 forecasted features each?

Reply
- Jason Brownlee June 9, 2018 at 6:44 am #
  
  I explain how to prepare data for LSTMs here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm
  
  Reply
shamsul June 13, 2018 at 12:44 am #

sir,
From my understanding, here we are doing univariate forecasting with multivariate input. This can also be called a MISO (multiple variables as input and a single variable as output) technique.

How can we do MIMO (multiple variables as input and multiple variables as output)?

Please correct me if I am wrong.

Reply
- Jason Brownlee June 13, 2018 at 6:19 am #
  
  The above tutorial does exactly this.
  
  Reply
  - shamsul June 13, 2018 at 11:41 am #
    
    How can we do MIMO (multiple variables as input and multiple variables as output)?
    
    Reply
    - Jason Brownlee June 13, 2018 at 3:05 pm #
      
      This post has an example of multiple outputs that you can use as a template:
      https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
      
      Reply
    - saipavankumar March 9, 2020 at 6:07 pm #
      
      in dense layer just give Dense(train_.shape[1])
      
      Reply
Parsh June 13, 2018 at 1:51 am #

Hi Jason, such a wonderful post for getting started with multi-variable input.

After that I extended it with make_forecast to predict 5 more timesteps, using another post.
But the prediction comes back as 1 variable, and I need to feed it back in 5 more times while the model expects 8 inputs,

so I am getting this error:
ValueError: all the input arrays must have same number of dimensions. Any clue how to feed the prediction back in to get more predictions?

def make_forecast(model: Sequential, look_back_buffer: numpy.ndarray, timesteps: int=1, batch_size: int=1):
    forecast_predict = numpy.empty((1, 1), dtype=numpy.float32)
    for _ in trange(timesteps, desc='predicting data\t', mininterval=1.0):
        cur_predict = model.predict(look_back_buffer)
        forecast_predict = numpy.concatenate([forecast_predict, cur_predict], axis=0)
        # This is where I am not sure if I need to have 8 input variable.
        cur_predict = numpy.reshape(cur_predict.shape[0],1, cur_predict.shape[1])
        look_back_buffer = numpy.delete(look_back_buffer, 0, axis=1)
        look_back_buffer = numpy.concatenate([look_back_buffer, cur_predict], axis=1)
    return forecast_predict

Reply
- Jason Brownlee June 13, 2018 at 6:19 am #
  
  Sorry, I don’t have the capacity to review your modifications.
  
  Reply
Fredrik Hellander June 13, 2018 at 8:04 pm #

Hi Jason,

Thanks for an interesting tutorial!

I discussed some performance metrics with a colleague and he suggested comparing all results to a benchmark where we simply use the most recent value in the time series as the next forecast, i.e Pollution(t=n) = Pollution(t=n-1).

I then calculate the RMSE of the benchmark as:

rmse_bench = np.sqrt(mean_squared_error(inv_y[1:], inv_y[:-1]))

The trained LSTM gives me a RMSE of 26.4 and my Benchmark RMSE is 26.6. Do you think this is a valid comparison and in that case have we really added that much value by using the LSTM model?

Kind regards,

Fredrik

Reply
- Jason Brownlee June 14, 2018 at 6:00 am #
  
  Yes, this is called a persistence model or the naive model.
  
  Yes, it is an excellent baseline, I explain more here:
  https://machinelearningmastery.com/persistence-time-series-forecasting-with-python/
  
  LSTMs often don’t add much value for time series forecasting; I explain more here:
  https://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/
  
  Nevertheless, there is a huge demand for LSTMs applied to time series.
  
  Reply
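
The benchmark described above is a persistence forecast. A minimal sketch of its RMSE on a toy series; the helper names are hypothetical and the numbers are not from the pollution dataset:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def persistence_rmse(series):
    """Score the forecast 'next value equals the current value'."""
    series = np.asarray(series, dtype=float)
    return rmse(series[1:], series[:-1])

observations = [10, 12, 15, 15, 11]
baseline = persistence_rmse(observations)
print(round(baseline, 4))
```

A model only adds value if its RMSE on the same observations is clearly below this baseline.
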
anusha June 14, 2018 at 10:53 pm #

I couldn’t understand the ‘invert scaling for forecast’ section of the code. Can you please explain it briefly?
Also, in my case there are 62 features in total, where the 62nd feature is the one to be predicted.
test_X has shape (70080, 61) and yhat has shape (70080, 1). Hence the concatenation statement is a problem, as they are not of the same shape.

Reply
Xiaolu Wei June 16, 2018 at 10:49 am #

Hi Jason，
I wonder if there is any approach to forecasting multiple factors based on their historical data via LSTM?
Best Regards,
Xiaolu Wei

Reply
- Jason Brownlee June 17, 2018 at 5:37 am #
  
  Yes, you could predict each series using the LSTM via a seq2seq type model.
  
  Reply
Democrito June 18, 2018 at 3:07 am #

Hi Jason Brownlee!

Thank you for all your really useful Topics!

I’m wondering about a thing related to the timesteps. Let’s suppose in an LSTM that I have a batch_size equal to 5 and timestep equal to 1 (like your examples). Is this architecture like an MLP or does it take into account the memory cell between one prediction and the next one?

Thank you!

Reply
- Jason Brownlee June 18, 2018 at 6:44 am #
  
  The memory cell may add value, as it is not reset until the end of the batch.
  
  I would be skeptical though and strongly suggest comparing results to an MLP to ensure the LSTM is adding value.
  
  Reply
Ashish June 18, 2018 at 8:07 pm #

# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]

In this portion of the code you have taken all the lagged values with time step 1, including pollution at a lag of 1.
Why have you included the lag-1 pollution value in train_X and test_X?

Reply
- Jason Brownlee June 19, 2018 at 6:30 am #
  
  Perhaps re-read the definition of the problem.
  
  Reply
Ashish June 18, 2018 at 8:58 pm #

Can we predict future values of pollution in numeric form?

Reply
- Jason Brownlee June 19, 2018 at 6:31 am #
  
  Yes.
  
  Reply
  - James August 10, 2018 at 9:01 am #
    
    What code do we need to add to predict the future value? I might have missed that, sorry.
    
    Reply
    - Jason Brownlee August 10, 2018 at 2:17 pm #
      
      You can call model.predict() to make a prediction.
      
      This tutorial offers more help:
      https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
      
      Reply
Ashish June 18, 2018 at 11:50 pm #

How to validate or get model score of the lstm model which you have applied?

Reply
- Jason Brownlee June 19, 2018 at 6:34 am #
  
  We calculate the RMSE, perhaps re-read the tutorial.
  
  Reply
Ambika June 19, 2018 at 12:20 am #

What are the training and test losses in your code?

Reply
- Jason Brownlee June 19, 2018 at 6:34 am #
  
  They are the losses calculated on the training and test sets respectively.
  
  Reply
Mark June 19, 2018 at 7:52 pm #

Hi Jason,

thanks for all these interesting and useful tutorials!

I was wondering how to decide the range in which to scale our data. In the “Time Series Forecasting with the Long Short-Term Memory Network in Python” post, you suggest [-1, 1]; here [0, 1]. Is there a precise rule?

Moreover, do you have a tutorial, example or anything else about learning from several trajectories? For instance, I have N training examples of a paraboloid trajectory made of 3 features (x, y and z coordinates) and I want to predict the next point (so, again, x, y and z).
Instead of looping n_epochs times over the same trajectory (like for the shampoo dataset), I’d like to loop over these N trajectories.

Thanks!

Reply
- Jason Brownlee June 20, 2018 at 6:25 am #
  
  Normalizing to the range 0-1 is a good idea.
  
  I am currently preparing tutorials on activity recognition that I think will be helpful.
  
  Reply
  - Mark June 20, 2018 at 4:46 pm #
    
    Thank you very much! Can’t wait to read it.
    
    Do you have an estimate of the publishing period of these tutorials?
    
    Reply
    - Jason Brownlee June 21, 2018 at 6:12 am #
      
      August.
      
      Reply
Mark June 19, 2018 at 9:41 pm #

Hi Jason,

Thanks for your amazing work! It’s super useful.

I was wondering if you have a tutorial (or other suggested readings) about how to train a model on series of different length and with more than 1 feature. For instance, how to predict a 3D trajectory with (x,y,z) coordinates (3 features) training the model on N examples (possibly with different length, but not necessarily).

Thanks again!

(PS: I wrote something similar before, but I’m not sure it was sent successfully)

Reply
- Jason Brownlee June 20, 2018 at 6:26 am #
  
  Yes, you can pad all sequences to the same length, more on padding here:
  https://machinelearningmastery.com/data-preparation-variable-length-input-sequences-sequence-prediction/
  
  Reply
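
Padding as suggested above can be sketched in plain NumPy. The helper `pad_trajectories` is hypothetical; it pads at the start of each sequence with zeros, and the two toy trajectories stand in for real (x, y, z) data.

```python
import numpy as np

def pad_trajectories(trajectories, pad_value=0.0):
    """Left-pad variable-length [steps, features] arrays to a common length."""
    n_features = trajectories[0].shape[1]
    max_len = max(len(t) for t in trajectories)
    out = np.full((len(trajectories), max_len, n_features), pad_value)
    for i, t in enumerate(trajectories):
        out[i, max_len - len(t):] = t  # real values go at the end
    return out

shorter = np.ones((2, 3))      # 2 time steps of (x, y, z)
longer = 2 * np.ones((4, 3))   # 4 time steps of (x, y, z)
batch = pad_trajectories([shorter, longer])
print(batch.shape)
```

The result has the [samples, time steps, features] layout an LSTM expects. If zero is a legitimate coordinate value, a different pad value may be a safer choice.
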
Ambika June 20, 2018 at 4:01 pm #

Can you please provide me the code for predicting pollution in numeric form ?

Reply
- Jason Brownlee June 21, 2018 at 6:10 am #
  
  I don’t understand, sorry. Can you please explain what you mean by “numeric form”?
  
  Reply
  - Ambika June 21, 2018 at 2:57 pm #
    
    I mean I want to predict the pollution at the 48th hour as a number, not on a plot.
    How can I do that?
    
    Reply
    - Jason Brownlee June 21, 2018 at 4:58 pm #
      
      You can make a prediction with a fit model by calling the predict() function.
      
      I explain more here:
      https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
      
      Reply
Ambika June 20, 2018 at 4:33 pm #

Is there any post on a multivariate VAR model in Python?

Reply
- Jason Brownlee June 21, 2018 at 6:10 am #
  
  Yes, see here:
  http://www.statsmodels.org/dev/vector_ar.html
  
  Reply
Ashish June 20, 2018 at 4:35 pm #

What does the model.evaluate function do for an LSTM?
Can we compute the R² score of the LSTM model?

Reply
- Jason Brownlee June 21, 2018 at 6:11 am #
  
  It makes predictions using the model for a test set, then evaluates it.
  
  You can use any of the metrics or loss functions provided by keras or write your own. More here:
  https://machinelearningmastery.com/custom-metrics-deep-learning-keras-python/
  
  Reply
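
For reference, the coefficient of determination can also be computed directly from the inverted predictions. A small sketch with made-up values; `r2` is a hypothetical helper implementing the standard definition 1 - SS_res / SS_tot:

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

inv_y = [3.0, 5.0, 7.0, 9.0]
inv_yhat = [2.5, 5.5, 7.0, 9.5]
score = r2(inv_y, inv_yhat)
print(score)
```

A score of 1 means a perfect fit, and 0 means the forecast is no better than always predicting the mean of the observations.
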
Yuan Yao June 22, 2018 at 12:45 pm #

Hi Jason,
I am wondering whether this multivariate time series model is really just a combination of many univariate time series models.
For example, I use pm2.5, NO2 and SO2 data to predict the next month’s pm2.5. In the Keras model, does it really use the pm2.5, NO2 and SO2 data together to predict the next month’s pm2.5, NO2 and SO2? Or does it just use a pm2.5 LSTM to predict pm2.5, an NO2 LSTM to predict NO2, and an SO2 LSTM to predict SO2? That would be a kind of fake multivariate LSTM.

Reply
- Jason Brownlee June 22, 2018 at 2:57 pm #
  
  The model predicts pm2.5 from all input variates.
  
  Specifically, from the post:
  
  We will frame the supervised learning problem as predicting the pollution at the current hour (t) given the pollution measurement and weather conditions at the prior time step.
  
  Nothing fake about that.
  
  Reply
Jurgen June 22, 2018 at 10:01 pm #

Hi Jason, would encoding the wind direction with something like sin(dir) and cos(dir) where dir = “N” = 0, dir = “NE” = 45 etc. be better than integer or one-hot encoding? This (co)sine encoding would retain the “circularity” of the data in a sense, I think.

What do you think about this?

Reply
- Jason Brownlee June 23, 2018 at 6:18 am #
  
  Great idea!
  
  Reply
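
The cyclical encoding proposed above can be sketched as follows. The eight-point compass list and the handling of labels without a bearing (mapped to (0, 0), for example a calm reading) are assumptions made here for illustration; `encode_direction` and `distance` are hypothetical helpers.

```python
import math

COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def encode_direction(label):
    """Map a compass label to (sin, cos) of its bearing in degrees."""
    if label not in COMPASS:
        return (0.0, 0.0)  # assumption: no bearing, e.g. a calm reading
    angle = math.radians(45 * COMPASS.index(label))
    return (math.sin(angle), math.cos(angle))

def distance(a, b):
    """Euclidean distance between two encoded directions."""
    (s1, c1), (s2, c2) = encode_direction(a), encode_direction(b)
    return math.hypot(s1 - s2, c1 - c2)

# Neighbouring directions are equally far apart, including across north
print(distance("NW", "N"), distance("N", "NE"))
```

Unlike an integer code, this keeps NW and N as close to each other as N and NE, at the cost of two input columns instead of one.
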
Maurice June 22, 2018 at 10:35 pm #

I followed your example in both Python and R code and was able to get the same answers as per the tutorial. Then, I tried some variations, swapping the size of the Train /Test data to be four years of train followed by one year of test data. I also used a different method of normalisation based on percentiles instead of min/ max, and applied the train normalised dataset to the future test dataset. Running this model gave a RMSE = 35 versus 25 (the original method using min /max across both train /test).
Perhaps this result is the effect of the bias of using a normalisation method across both train and test data and not from the changed method of using percentiles which are a better reflection of the train dataset, especially so if you use an accurate data extraction technique such as a constrained cubic spline.
So the RMSE of 35 exceeds the RMSE of 30 for the persistence model, thus negating the LSTM’s supposedly superior forecasting!

Reply
- Jason Brownlee June 23, 2018 at 6:19 am #
  
  Yes, LSTMs and neural nets in general are terrible at time series forecasting. Yet, people are obsessed with using them.
  
  Reply
Arjun June 30, 2018 at 12:28 am #

Hi Dr. Brownlee! Thank you so much for these amazing tutorials! They’ve so deepened my understanding of both deep learning and python.

I’m working on a problem with my own multivariate dataset (I have 12 time series, one of which is also my prediction target). I’ve been using the pandas diff function, as you went over in another article, to convert all of my 12 series into 12 time series of differences over 1 time period. When I use this adjusted dataset as input to the model and train it, from the get-go the validation loss is weirdly lower than the training loss, for up to 300 epochs of training. If I don’t do “diff” on my dataset, this behavior does not occur. It’s been bewildering to me, and I’ve tried other random data on the network to make sure there is not a problem with the network, and there doesn’t seem to be. This behavior has been confounding me for over a week now, and I would really appreciate any suggestions or hints you may have. Thank you 🙂

Reply
- Jason Brownlee June 30, 2018 at 6:10 am #
  
  Test loss lower than training often means an unstable model:
  https://machinelearningmastery.com/faq/single-faq/what-if-model-skill-on-the-test-dataset-is-better-than-the-training-dataset
  
  Perhaps start by modeling the univariate series first and use the results as a baseline for more sophisticated methods:
  https://machinelearningmastery.com/start-here/#timeseries
  
  Reply
  - Arjun July 3, 2018 at 1:57 am #
    
    Hi Dr. Brownlee, thank you for the reply! I’ve varied the lookback, the test/training sizes, and model configurations, tried a univariate model, and tried modeling the time series with various lookbacks as a normal ANN, and the behavior was still exhibited. If you would indulge me, I have a couple of questions I could use your advice on!
    
    One thing I’ve noticed is that, even after inverting the predicted data back to the original scale, my models still have a hard time learning the proper magnitudes of the data. This is true for both the univariate and the multivariate models, of all varieties. For example, if the distribution of the actual inv_y is normal with its tails at [-5, 5], the model’s predicted data after inversion may or may not demonstrate Gaussian behavior, but its distribution’s tails are in the range of, say, [-0.5, 0], and the predicted values are always much smaller than the actual values. Sometimes the values are all positive, or all negative, too. Is this a known problem with a known solution?
    
    My dataset has approximately 2300-2600 samples, depending on how large of a lookback I choose for the series_to_supervised input. Is it possible that I just have far too few samples for any robust model to be developed, irrespective of the lookback?
    
    Lastly, I’m wondering if there’s a good rule of thumb for determining the proper range of hidden units in the LSTM layer. I’ve read your articles that touch upon this topic and paid especially close attention to the hyperparameter grid search article, and as you choose a pretty wide range, I’m wondering if you have a rule of thumb we could use in the initial stages of building our network. Thank you so much!
    
    Reply
    - Jason Brownlee July 3, 2018 at 6:28 am #
      
      This might help:
      https://machinelearningmastery.com/faq/single-faq/how-many-layers-and-nodes-do-i-need-in-my-neural-network
      
      Reply
Oscar Labrador July 3, 2018 at 12:35 am #

Hi Jason,

Really useful post. I have a problem running the code, in line

inv_yhat = scaler.inverse_transform(inv_yhat)

I get the following error

X -= self.min_
ValueError: operands could not be broadcast together with shapes (35063,23) (12,) (35063,23)

Do you know the reason why?

Regards

Reply
- Jason Brownlee July 3, 2018 at 6:26 am #
  
  Ensure that your libraries are up to date.
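
For readers who hit the same error: one likely cause (an assumption, since the full script is not shown) is that the array passed to inverse_transform() has a different number of columns (23) than the scaler was fit on (12). The NumPy sketch below illustrates the idea of padding the forecast to the fitted width, inverting, and keeping only the first column. The helper name invert_first_column and the hand-written min/max arithmetic are hypothetical stand-ins for a fitted MinMaxScaler with the default [0, 1] range.

```python
import numpy as np

def invert_first_column(yhat, data_min, data_max):
    """Undo [0, 1] min-max scaling for forecasts of column 0 only.

    data_min / data_max are the per-column minima and maxima seen when
    the scaler was fit (hypothetical stand-ins for a fitted scaler).
    """
    data_min = np.asarray(data_min, dtype=float)
    data_max = np.asarray(data_max, dtype=float)
    yhat = np.asarray(yhat, dtype=float).reshape(-1, 1)
    # Pad to the width the scaler was fit on; the other columns are dummies.
    padded = np.zeros((yhat.shape[0], data_min.shape[0]))
    padded[:, 0] = yhat[:, 0]
    # Per-column inverse of min-max scaling; widths now match, so this broadcasts.
    restored = padded * (data_max - data_min) + data_min
    return restored[:, 0]

# Hypothetical per-column ranges for three features.
data_min = np.array([0.0, -40.0, 990.0])
data_max = np.array([500.0, 40.0, 1050.0])
restored = invert_first_column([0.0, 0.5, 1.0], data_min, data_max)
print(restored)
```

The key point is that the number of columns going into the inverse operation has to equal the number of columns used when fitting, whatever values fill the unused columns.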
  
  Reply
Mah July 4, 2018 at 3:18 am #

Hi Jason,

I am just wondering how I can just invert scaling for forecast and skip the concatenate part? I just need to have the actual outcome values and I don’t need the rest of variable.

Thanks

# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, -7:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]

Reply
- Jason Brownlee July 4, 2018 at 8:30 am #
  
  The scaler expects data to have the same dimensions for the inverse operation.
  
  You could write your own function to do this if you wish.
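
A minimal sketch of such a function, assuming the default [0, 1] feature range: it inverts min-max scaling for a single column using only that column's training minimum and maximum, so no concatenation is needed. The function name invert_minmax is hypothetical; with scikit-learn's MinMaxScaler the two values could be read from scaler.data_min_[0] and scaler.data_max_[0].

```python
def invert_minmax(scaled_values, col_min, col_max):
    """Map values scaled to [0, 1] back to the original units of one column."""
    span = col_max - col_min
    return [v * span + col_min for v in scaled_values]

# Hypothetical example: the target column ranged from 0 to 994 in training.
inv_yhat = invert_minmax([0.0, 0.5, 1.0], col_min=0.0, col_max=994.0)
print(inv_yhat)
```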
  
  Reply
  - Mah July 4, 2018 at 3:55 pm #
    
    Thanks Jason.
    
    This might not be related to this example but I really would like to get your opinion.
    I have a MLP model and I standardized both input features and my outcome variable. I deployed my model as a web service. As part of deployment I have a scoring script.
    
    When I use the web service to score my raw data, the predicted value is between 0 and 1 because my outcome was scaled to 0 and 1 before training the model. How can I rescale the predicted values? In scoring script, I standardized my input values and use the web service to predict the outcome. So, in raw data I don’t have the outcome variable. I hope this makes sense.
    
    In summary, when we use scaled outcome in training the model, how can we have the predicted outcome in actual scale in scoring phase with new data.
    
    PS. I tried MLP without standardizing the outcome variable and I didn’t get accurate predictions.
    
    I really appreciate your input here.
    
    Thanks so much.
    
    Reply
    - Jason Brownlee July 5, 2018 at 7:38 am #
      
      You can invert the transform on the predicted values prior to evaluating them.
      
      In sklearn you can call inverse_transform(), otherwise you can do it manually if you know the mean and standard deviation used for standardization or the min/max used for normalization.
      
      Reply
      - Mah July 6, 2018 at 7:26 am #
        
        Hi Jason,
        
        I am confused. In a production case, when we call a web service (our deployed ML model), we have the raw data and the raw data is not normalized (like sensor data). However, the machine learning model was trained on standardized features. In this situation, I don’t know what we can do. Can we train an MLP model without standardization at all? I know that for a neural net we need to convert the features to [0,1].
        Can you help me and explain more?
        
        Thanks,
        Mah
      - Jason Brownlee July 7, 2018 at 6:08 am #
        
        We must hang onto the objects that prepared the data or the coefficients within those objects so that we can prepare new data in the same way as the training data.
      - Mah July 6, 2018 at 7:37 am #
        
        can I just use min and max values in the training dataset and change the scale to 0 and 1 for data coming from sensors?
      - Jason Brownlee July 7, 2018 at 6:08 am #
        
        Exactly.
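
The idea discussed in this thread can be sketched as follows: compute the min and max per column on the training data, store them with the model, and reuse them to scale raw sensor readings at scoring time. This is a pure-Python illustration under stated assumptions; the function names fit_minmax and apply_minmax and the sample numbers are hypothetical.

```python
import json

def fit_minmax(rows):
    """Return per-column minima and maxima from the training rows."""
    columns = list(zip(*rows))
    return {"min": [min(c) for c in columns], "max": [max(c) for c in columns]}

def apply_minmax(rows, params):
    """Scale rows using previously stored training min/max values."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi != lo else 0.0
            for v, lo, hi in zip(row, params["min"], params["max"])
        ])
    return scaled

# Training time: learn the coefficients and persist them (here as JSON text).
train = [[10.0, 1000.0], [20.0, 1010.0], [30.0, 1020.0]]
saved = json.dumps(fit_minmax(train))

# Scoring time: reload the coefficients and scale a raw sensor reading.
loaded = json.loads(saved)
scaled_reading = apply_minmax([[25.0, 1005.0]], loaded)
print(scaled_reading)
```

Note that a new reading outside the training range will scale to a value below 0 or above 1, so it may be worth checking for that in the scoring script.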
Alex July 4, 2018 at 11:14 pm #

Hi Jason,
About the scaler too: should we not use a different MinMaxScaler for each column of the dataset,
especially for the pollution column, for the inverse transform,
to keep the same scale as the pollution column of the raw file?

All of that is to calculate the RMSE.

Thank you

Reply
- Jason Brownlee July 5, 2018 at 7:44 am #
  
  A very important point Alex. In fact, it scales per column by design.
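
A small NumPy sketch of what per-column scaling means, written by hand rather than with MinMaxScaler so the arithmetic is visible. The data values are hypothetical. Each column is scaled with its own min and max, so one column can be inverted on its own.

```python
import numpy as np

# Two hypothetical columns on very different scales.
X = np.array([[10.0, 1000.0],
              [20.0, 1010.0],
              [30.0, 1030.0]])

col_min = X.min(axis=0)  # one minimum per column
col_max = X.max(axis=0)  # one maximum per column
scaled = (X - col_min) / (col_max - col_min)

# Invert only the first column using its own min and max.
restored_first = scaled[:, 0] * (col_max[0] - col_min[0]) + col_min[0]
print(scaled)
print(restored_first)
```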
  
  Reply
Jonathan Roy July 5, 2018 at 5:30 am #

Great demonstration and tutorial thank you very much!

I get stuck on a detail: how do I reshape my data if I have, for example,
6 features and 3 hourly time steps,
and feature #6 becomes my “y” at the last hour?

#1 to #5 are observed feature on all timestep include “t”
#6 t-1, t-2 and t-3 are observed too
I want to predict #6 at “t”

Thank you very much for your attention

Reply
- Jason Brownlee July 5, 2018 at 8:03 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm
  
  Reply
A Straker July 7, 2018 at 1:25 am #

Hi,
Great site, it’s proving to be a useful resource.

Perhaps I’m misunderstanding some LSTM fundamentals, but as I understand it, the ‘memory’ of the network is inherent in the structure of the LSTM node. Because of this, I’m a little confused why we structure the data as a lagged time series in the initial stages, in a manner similar to if we were using autoregression.

You say:
‘The LSTM is exposed to one input at a time with no fixed set of lag variables, as the windowed-multilayer Perceptron (MLP).’

in:
https://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/
which I think supports my thoughts. Could you perhaps explain this a little more please? Many thanks in advance.

Reply
- Jason Brownlee July 7, 2018 at 6:18 am #
  
  Yes, but we must still provide vectorized inputs to the model with the shape [samples, timesteps, features].
  
  Therefore we must take our data and shape it with this structure, the timesteps look like lags, they are just not treated as such by the model.
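
To make the [samples, timesteps, features] shape concrete, here is a NumPy sketch with hypothetical sizes (4 samples, 3 lag steps, 8 features). It assumes the flat rows hold the lag columns ordered oldest to newest, each lag carrying all 8 features.

```python
import numpy as np

n_samples, n_timesteps, n_features = 4, 3, 8

# Flat supervised-learning layout: one row per sample, 3 * 8 = 24 columns.
flat = np.arange(n_samples * n_timesteps * n_features, dtype=float)
flat = flat.reshape(n_samples, n_timesteps * n_features)

# Three timesteps of 8 features each.
as_three_steps = flat.reshape(n_samples, n_timesteps, n_features)
# One timestep holding all 24 values as features.
as_one_step = flat.reshape(n_samples, 1, n_timesteps * n_features)

print(as_three_steps.shape, as_one_step.shape)
```

Both arrays contain the same numbers. With three timesteps the LSTM steps through three 8-feature vectors per sample; with one timestep it sees a single 24-feature vector.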
  
  Does that help?
  
  Reply
Arjun Majumdar July 7, 2018 at 4:19 am #

Hello Jason, I am facing a problem as follows:

In this tutorial, the train and test splits have 8 features viz., ‘pollution’, ‘dew’, ‘temp’, ‘press’, ‘wnd_dir’, ‘wnd_spd’, ‘snow’, ‘rain’ at step ‘t-1’, while the output feature is ‘pollution’ at current step ‘t’.

After fitting the model to the training and testing data splits, what if I want to make predictions for a new dataset having 7 features since it does not have the ‘pollution’ feature in it (while the remaining 7 features remain the same).

How do I handle such a situation?

Thanks and excellent tutorial!

Reply
- Jason Brownlee July 7, 2018 at 6:20 am #
  
  I would recommend training a different model that does not use pollution as an input.
  
  Reply
  - Arjun Majumdar July 7, 2018 at 7:13 pm #
    
    Can you recommend some other different models capable of handling such situations?
    Thanks!
    
    Reply
    - Arjun Majumdar July 7, 2018 at 7:18 pm #
      
      Do you mean training a separate LSTM model as demonstrated above and not using ‘pollution’ as an input feature? If yes, how should the training be done?
      Because if the target variable (‘pollution’ for this tutorial) is not included while training the model, how will the model make predictions for it?
      
      Or, do you mean training a different type of a neural network, say a Multi layer Perceptron, etc. for Time Series Predictions?
      
      Reply
      - Jason Brownlee July 8, 2018 at 6:20 am #
        
        Lag pollution values are used in the above model.
    - Jason Brownlee July 8, 2018 at 6:19 am #
      
      I recommend testing a suite of methods to see what works best for your specific dataset.
      
      Reply
Jordan July 10, 2018 at 1:39 am #

Dr. Brownlee,

Thank you so much for such an interesting post. I am attempting to run this program but am getting the following error.

TypeError: while_loop() got an unexpected keyword argument ‘maximum_iterations’

Do you have an idea of how this could be fixed?

Thanks

Reply
- Jason Brownlee July 10, 2018 at 6:50 am #
  
  I have not had this error myself, perhaps try searching or posting on stackoverflow?
  
  Reply
- hyu July 18, 2018 at 4:01 pm #
  
  This is caused by the old version of tensorflow. Updating tensorflow should fix the problem!
  
  Reply
  - Jason Brownlee July 19, 2018 at 7:46 am #
    
    Good tip!
    
    Reply
- Poiuwn September 13, 2018 at 12:11 am #
  
  Hi Jordan,
  I had the same issue when I tried to run the code. Then I tried to upgrade my tensorflow, but it gave me this error: ImportError: cannot import name ‘abs’.
  
  Then I uninstalled keras and tensorflow, and reinstall tensorflow and keras. The problems all cleared after that.
  
  python -m pip install --upgrade pip # upgrade pip
  pip uninstall keras
  python -m pip uninstall tensorflow
  pip3 install tensorflow
  pip3 install keras
  
  Please note I am using Anaconda 3, and details are shown below:
  ‘3.6.3 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:27:45) [MSC v.1900 64 bit (AMD64)]’
  
  Hope this helps.
  
  Reply
  - Jason Brownlee September 13, 2018 at 8:04 am #
    
    Nice tip, thanks for sharing!
    
    Reply
Dave Craft July 10, 2018 at 8:34 am #

Just an aside: It looks to me like you are performing fit_transform() on the total
data set but performing inverse_transform() only on the test data set. An
inverse_transform() on a small subset of the original transformation may not result in equivalent scaling to the original (larger data set). Thus inv_yhat and inv_y are
comparable but they may now be in different ranges than train_y

Your work is extremely helpful! Like many others I read lots of different topics on ml and
you are a *go to* for better explanations.

Reply
- Jason Brownlee July 10, 2018 at 2:25 pm #
  
  I don’t see the problem. Perhaps I am missing something?
  
  Reply
José Mayorga July 10, 2018 at 2:15 pm #

Good job, doctor.

I want to know if this approach can be applied to stock indexes, for example to test whether stock index “x” affects the price movement (up or down) of stock index “y”.

Reply
- Jason Brownlee July 10, 2018 at 2:29 pm #
  
  More on stocks here:
  https://machinelearningmastery.com/faq/single-faq/can-you-help-me-with-machine-learning-for-finance-or-the-stock-market
  
  Reply
subhash July 10, 2018 at 11:14 pm #

Dear Jason,
I have one question, it may sound naive, but this is bugging me. For prediction you are using the data from step (t-1) as input, so at every time step you are using the (t-1) pollution data. Does that mean we can only predict one time step ahead?
What if I want to predict several time steps ahead, assuming that I have the data for all the other variables (wind, temp, etc.)? I want to use the pollution from the previous prediction as input.
Thank you for answering.

Reply
- Jason Brownlee July 11, 2018 at 5:58 am #
  
  This post gives an example of a multi-step forecast:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
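
The recursive strategy described in the question (feeding each predicted pollution value back in as the next lag input) can be sketched as below. This is an illustration only and is not taken from the linked post; toy_model is a hypothetical stand-in for a trained one-step model, and the exogenous values are made up.

```python
def recursive_forecast(predict_one_step, last_pollution, future_exog):
    """Forecast several steps by feeding each prediction back as the lag input.

    predict_one_step(pollution_lag, exog) returns the next pollution value.
    future_exog holds the known other variables for each future step.
    """
    forecasts = []
    lag = last_pollution
    for exog in future_exog:
        lag = predict_one_step(lag, exog)  # prediction becomes the next lag
        forecasts.append(lag)
    return forecasts

# Hypothetical stand-in for a trained model: half the lag plus a wind term.
def toy_model(pollution_lag, exog):
    return 0.5 * pollution_lag + exog["wind"]

steps = [{"wind": 1.0}, {"wind": 2.0}, {"wind": 0.0}]
forecasts = recursive_forecast(toy_model, 100.0, steps)
print(forecasts)
```

With a real model, errors in early predictions are carried into later steps, so it is worth comparing this against a model trained to output several steps directly.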
  
  Reply
INAVOLU Subhash July 10, 2018 at 11:38 pm #

Dear Jason,
I can see that to predict the pollution we are using the pollution data from time step (t-1).
What if I want to predict pollution several time steps ahead (t+1, t+2, t+3), using the predicted pollution (t, t+1, t+2) and the existing data from the other variables such as wind velocity?

Reply
- Jason Brownlee July 11, 2018 at 5:58 am #
  
  Here is an example of a multi-step forecast:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
Meursault July 11, 2018 at 10:20 am #

Hi Jason,

I noticed that when I set n_out=0 in the series_to_supervised method, my results are almost perfect. This is pretty suspicious to me, but I went through the code and can’t figure out what is going wrong, if anything. The model is still predicting the right column and using the other columns as the X data. I read your article linked above which discusses the method in more detail but couldn’t figure out what was going on from that. Interestingly, the results get worse as n_out increases, but when I look through the code, the future steps shouldn’t ever be used, so why is there any change at all? I’m pretty confused here, so any help would be greatly appreciated, and thanks for an awesome tutorial.

Reply
- Jason Brownlee July 11, 2018 at 2:56 pm #
  
  As you increase the number of output steps you will have less training data. This may explain the decrease in performance.
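
As a rough illustration of how many rows are lost, the sketch below counts the rows that remain after framing a series with lag and forecast columns. It assumes a framing function that shifts the data and then drops every row containing a missing value; the helper name supervised_rows and the series length are hypothetical.

```python
def supervised_rows(n_obs, n_in, n_out):
    """Rows left after framing n_obs observations with n_in lag steps and
    n_out forecast steps (t, t+1, ...) and dropping incomplete rows."""
    if n_in < 0 or n_out < 1:
        raise ValueError("expects n_in >= 0 and n_out >= 1")
    # n_in rows are lost at the start, n_out - 1 rows at the end.
    return max(n_obs - n_in - (n_out - 1), 0)

n_obs = 10  # hypothetical series length
for n_out in (1, 2, 3):
    print(n_out, supervised_rows(n_obs, n_in=1, n_out=n_out))
```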
  
  Reply
Sarthak July 11, 2018 at 10:42 pm #

Hi Jason,

This article has really helped me.

I have a question. I want to predict the next 30 days and I have a lag of 4. I give the required value for 1 variable and shift the window after each prediction. But since the inputs are scaled between 0 and 1, the predicted values drift away from that scale, causing problems after 10 days of predicted values. Is there any better way to predict the next 30 days with the model that you have above?

Reply
- Jason Brownlee July 12, 2018 at 6:25 am #
  
  Perhaps this post will help:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
Theo July 12, 2018 at 5:23 pm #

Hi Jason,
Very useful tutorial! I am trying this on a different dataset and the results are really good. However, I am afraid I am cheating by letting the output be part of the input?

Shouldn’t the non-shifted pollution column be dropped as well?

Reply
- Jason Brownlee July 13, 2018 at 7:33 am #
  
  The output at the time step being predicted is not part of the input.
  
  Reply
Jack July 12, 2018 at 5:38 pm #

Hi, Jason,
In section 2, Basic Data Preparation, when you plot all the data, how can I show dates on the horizontal axis instead of sample counts? Please help me.

Reply
- Jason Brownlee July 13, 2018 at 7:34 am #
  
  You can set the axis of the graph to be anything you wish.
  
  Reply
theodor| July 12, 2018 at 10:27 pm #

Hi Jason,
Why isn’t the pollution column removed when this is the one we are trying to predict? is it not cheating to use the actual values in the prediction?

Reply
- Jason Brownlee July 13, 2018 at 7:41 am #
  
  No, we are providing the pollution at the last time step as an input.
  
  Reply
Gabriel Mouzella Silva July 13, 2018 at 11:45 am #

I usually never comment on those things, but you just saved my skin. I’ve been trying to create a good and generic way to produce a multivariate data frame for LSTM analysis and this is the only one with a good explanation that I’ve found. Keep doing this amazing job.

Thank you!

Reply
- Jason Brownlee July 14, 2018 at 6:11 am #
  
  Thanks.
  
  Reply
Jay B. July 13, 2018 at 2:48 pm #

Fantastic article! It’s also great to see that you’re still actively helping students a year later.

So, to be clear, this setup does not work for more than a single time-step into the future (i.e. autoregression), is that correct? I encountered numerous problems, but one in particular I couldn’t solve is when extending this problem to both 1.) predict multiple time-steps down the road (by changing the respective value in the series_to_supervised() fxn); and 2.) predicting more than a single value at a particular time step, e.g. predicting the temperature and dew point at the same time. Please let me know if I’m overlooking anything.

Reply
- Jason Brownlee July 14, 2018 at 6:12 am #
  
  Here is an example of multi-step forecasts:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
hamid July 13, 2018 at 5:48 pm #

Hi Jason,
Thanks for your incredible posts and tutorials. I ran your model with some modifications for my own problem and it just worked well.
I have a few questions. It would be great to have some advice from you.

1) Is training neurons using a shape of ( number of samples, timesteps = 1, features = 24) the same as training using a shape of (number of samples, timesteps=3, features=8) ?

2) I don’t get the difference between the number of timesteps and the number of training samples. For example, if we use timesteps=1, does it mean that we don’t need samples before timestep t-1 for updating weights? Of course we do, but I don’t know how.

3) Is the validation set used for updating weights? If yes, why did you use the validation set to predict? This causes bias and overfitting.

Reply
- Jason Brownlee July 14, 2018 at 6:14 am #
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm
  
  The validation set is not used for updating weights.
  
  Reply
TC July 14, 2018 at 2:34 pm #

Dear Dr. Jason Brownlee,

First of all, thank you so much for a wonderful tutorial. It helps me learn neural networks faster and work faster on my project.

Today, I have a few questions about implementing an LSTM on multivariate time series data.

1. How do I modify the code if I would like to change the column to predict? For example, predicting wind speed from the other columns.

2. Similar to the first question, but what if I would like to predict a column from specific columns only? For example, predicting wind speed only from temperature and pollution values.

3. About the model, how do I know if it is already well tuned or needs more tuning? I am a little bit confused about it.

4. About RMSE, if I use another dataset, how can I know if the value is good or bad for a regression prediction?

5. This question may be outside the scope of this tutorial, but what if I would like to do a classification problem instead of regression? How would I work with multivariate time series data with an LSTM? Or if you have another suggestion, I would appreciate it.

I am sorry if some questions are too weird to ask, but I have been stuck on these problems for a while now. Also, sorry for my terrible English.

Thank you so much for your answer in advance. I am looking forward to hearing a response from an expert like you.

Reply
- Jason Brownlee July 15, 2018 at 6:07 am #
  
  More on data prep for LSTMs here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm
  
  You can use diagnostics to see if the model is well suited:
  https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
  
  You can use grid/random searches of hyperparameters to see if you can do better.
  
  Error is relative, and “good” performance is determined against a baseline method like persistence, more here:
  https://machinelearningmastery.com/how-to-know-if-your-machine-learning-model-has-good-performance/
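
  As a minimal sketch of a persistence baseline: predict each value with the value one step earlier and compute the RMSE, then compare the model's RMSE on the same data against it. The function names and the sample series are hypothetical.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def persistence_rmse(series):
    """RMSE of predicting each value with the value one step earlier."""
    return rmse(series[1:], series[:-1])

series = [10.0, 12.0, 11.0, 15.0, 14.0]  # hypothetical observations
baseline = persistence_rmse(series)
print(round(baseline, 3))
```

  A model is only adding value if its RMSE on the same test period is lower than this baseline.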
  
  More on how to change a neural net to/from regression to classification:
  https://machinelearningmastery.com/faq/single-faq/how-can-i-change-a-neural-network-from-regression-to-classification
  
  Reply
  - TC July 15, 2018 at 12:31 pm #
    
    Thank you so much for your knowledge resources, Dr. Jason.
    
    Reply
    - Jason Brownlee July 16, 2018 at 6:10 am #
      
      I’m glad it helps.
      
      Reply
Zhang Bo July 14, 2018 at 5:22 pm #

Hello, I have tried your univariate method and multivariate method on the problem of predicting bank business volume. The latter is much better. Thanks for your courses.
Do you have any suggestions on choosing between GRU, LSTM, or reLSTM for prediction?

Reply
- Jason Brownlee July 15, 2018 at 6:09 am #
  
  Well done!
  
  Perhaps try each and see what works best for your problem?
  
  Reply
Theodor July 16, 2018 at 7:28 pm #

Since we are providing the pollution from the last time step does that mean we are only forecasting tomorrow “then we wait until tomorrow, get the actual value” to predict the day after that?

I apologize for asking this a third time, I am quite new to this concept.

Reply
- Jason Brownlee July 17, 2018 at 6:15 am #
  
  We are providing pollution from the prior time step (hour) to predict the pollution at the next time step (hour).
  
  Generally, LSTMs are terrible at time series forecasting.
  
  Perhaps start with intro to time series:
  https://machinelearningmastery.com/start-here/#timeseries
  
  Reply
Will July 17, 2018 at 6:42 pm #

Thanks for your article Dr. Jason.

I have two question that would like to ask.
How can I improve the RMSE using the LSTM model? Which parameter(s) do I have to change in the code? I have tried editing some of the parameters but it did not work for me.
And is there any other way to predict the future besides the LSTM method?

Reply
- Jason Brownlee July 18, 2018 at 6:32 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Also, try other methods, LSTMs are terrible at time series forecasting.
  
  Reply
Alice July 18, 2018 at 12:24 am #

Thanks Jason for sharing. I am considering using RNN to predict customer attrition, that is given all customers’ purchase data in history and labelled attrition status, predict the churn probability of the customers who are still active. I am wondering if LSTM can be applied in such case with such time series data.

Reply
- Jason Brownlee July 18, 2018 at 6:36 am #
  
  Hi Alice, try a suite of methods and see what works best for your specific problem.
  
  Reply
Mallick July 18, 2018 at 2:49 am #

Why does the LSTM forecast for my data give a smooth curve instead of following the given data?

Reply
- Jason Brownlee July 18, 2018 at 6:38 am #
  
  It may not be a suitable model for your data.
  
  Reply
Neha July 18, 2018 at 9:31 pm #

Hi Jason,

Thanks for the good article Dr.Jason.

Is it possible for you to give pointers on multi-entity time series forecasting?
I need to forecast for 1000 customers, so I was wondering if there is a way of doing so using LSTMs or any other technique where multiple models are not required.

Appreciate the help.

Reply
- Jason Brownlee July 19, 2018 at 7:51 am #
  
  Good question.
  
  Some ideas:
  – try a model per customer
  – try a model per a group of customers
  – try a model for all customers
  
  Go with whatever works best.
  
  Reply
Ranjith July 19, 2018 at 1:26 am #

Dear Dr. Jason,

Thank you so much for your tutorial on air pollution.
I want to try a bidirectional LSTM in place of the above LSTM model to predict the air pollution.

I have the same pollution dataset that is used for the above LSTM model.

How can I develop a bidirectional LSTM for that pollution dataset?

Reply
- Jason Brownlee July 19, 2018 at 7:55 am #
  
  Here is an example of a bidirectional LSTM:
  https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/
  
  Reply
  - Ranjith July 20, 2018 at 10:02 pm #
    
    Dear Dr. Jason,
    
    I changed above code from LSTM to bidirectional LSTM model like
    
    model = Sequential()
    model.add(Bidirectional(LSTM(50, return_sequences=True), input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(Dense(1))
    model.compile(loss='mae', optimizer='adam')
    # fit network
    history = model.fit(train_X, train_y, epochs=50, batch_size=1, validation_data=(test_X, test_y), verbose=2, shuffle=False)
    ————————————————-
    
    but I got error like the following
    
    ValueError Traceback (most recent call last)
    in ()
    74 model.compile(loss='mae', optimizer='adam')
    75 # fit network
    ---> 76 history = model.fit(train_X, train_y, epochs=50, batch_size=1, validation_data=(test_X, test_y), verbose=2, shuffle=False)
    77 # plot history
    78 pyplot.plot(history.history['loss'], label='train')
    
    ValueError: Error when checking target: expected dense_1 to have 3 dimensions, but got array with shape (50, 1)
    —————————–
    
    can you help me to fix the error.
    
    Reply
    - Jason Brownlee July 21, 2018 at 6:35 am #
      
      Perhaps don’t return sequences.
      
      Reply
Max July 19, 2018 at 8:11 am #

Dear Dr. Jason,

I find this very helpful. I was wondering what changes in this code if you would want to predict each and every time series that you put as input (i.e. pollution, dew, snow, pressure, etc) not just one target variable.

Reply
- Jason Brownlee July 19, 2018 at 2:11 pm #
  
  You could use a TimeDistributed layer wrapping the output model (a dense layer) and have one node for each series to be predicted.
  
  I have no idea how this might perform.
  
  Reply
boraton July 20, 2018 at 7:00 pm #

Hi Jason,

Thank you very much for this tutorial, I found it very useful. I was wondering if you could share some insight into how to do precipitation forecasting using a set of images. I was tasked to train a model to take any number (determined by you) of daily precipitation maps as input, and generate precipitation forecast maps for one week (7 days) into the future. My challenge is how to transform the image dataset into something I can use for precipitation forecasting. Do you have any idea how I can convert the images to numerical data so that I can use the LSTM and follow the process in your tutorial? I would really appreciate your help since this is my first task in a machine learning project.

Reply
- Jason Brownlee July 21, 2018 at 6:33 am #
  
  A good model for working with a time series of images is the CNN LSTM, you can learn more here:
  https://machinelearningmastery.com/cnn-long-short-term-memory-networks/
  
  I also have an example in my LSTM book.
  
  Reply
Qian Wu July 21, 2018 at 5:19 am #

how can i resolve this problem?
“model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
Traceback (most recent call last):

File “”, line 1, in
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))

NameError: name ‘LSTM’ is not defined”

Thanks for your help.

Reply
- Jason Brownlee July 21, 2018 at 6:40 am #
  
  You need to import the LSTM layer.
  
  Perhaps make sure you have copied all of the code.
  
  Reply
Ian July 23, 2018 at 11:43 pm #

Hi Jason,

i get the following error: Input contains NaN, infinity or a value too large for dtype(‘float32’)

I suspect that the algorithm is still working with the wind direction column, which causes the error because its dtype is somehow still a string and cannot be converted to float.

Does anybody have the same problem and can help out?

Reply
- Jason Brownlee July 24, 2018 at 6:19 am #
  
  Did you copy all of the code? Perhaps you skipped a step?
  
  Reply
  - Murat August 4, 2018 at 3:23 am #
    
    Hi Jason
    
    I am having the same issue as Ian. My data does not have any NaN values. The algorithm produces this error: Input contains NaN, infinity or a value too large for dtype(‘float32’)
    with certain epochs or batch sizes. When I change the number of epochs or the batch size with the same data, I do not get this error.
    I did some research and found that having 0 values in the data can cause the NaNs. However, after I removed the 0s, I still get the same error. I don’t know how the epochs or batch size cause this problem.
    
    Thank you Jason and Ian
    
    Reply
    - Jason Brownlee August 4, 2018 at 6:13 am #
      
      Perhaps try scaling or not scaling the data before modeling?
      
      Reply
Ryan July 24, 2018 at 1:29 am #

Hi Jason, This is super helpful. You mention that LSTM is not good for time-series/sequence models. Why is that, and what would you recommend as the optimal algorithm to use for such models? Thanks!

Reply
- Jason Brownlee July 24, 2018 at 6:20 am #
  
  See this post for why:
  https://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/
  
  There are no optimal algorithms, just methods we test on a given problem to discover what works.
  
  I am finding CNNs to be very effective for time series problems at the moment. I have a ton of posts scheduled on the topic.
  
  Reply
  - Ryan August 31, 2018 at 6:46 am #
    
    This is perfect, thanks so much for your response!
    
    Reply
Muddassir July 24, 2018 at 3:44 pm #

I tried a random forest. It gives a lower MSE and RMSE compared to the LSTM.

Reply
- Jason Brownlee July 25, 2018 at 6:12 am #
  
  I’m not surprised.
  
  Reply
Malcom July 26, 2018 at 12:50 pm #

Dr. Jason,

Thanks for your tutorial. I am a little bit confused with # drop columns we don’t want to predict.

reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)

This line means that we predict pollution and use 1) pollution 2) dew 3) temp 4) press 5) wnd_dir 6) wnd_spd 7) snow 8) rain as features for the prediction model, am I right?

So if we change the numbers, we can predict another column, am I right?

And if I want to predict 2 or more columns and/or use only some of the features, what can I do?

Reply
- Jason Brownlee July 26, 2018 at 2:29 pm #
  
  Yes.
  
  You can predict more columns by having more nodes in the output layer.
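
  To see which columns survive the drop() call quoted in the question, the sketch below lists the column names by index. It assumes 8 variables framed with one lag step and one forecast step, and the var1(t-1) style naming is an assumption about how the framing function labels its columns.

```python
n_vars = 8

# Columns 0-7 are the lagged inputs, columns 8-15 the values at time t.
names = [f"var{j}(t-1)" for j in range(1, n_vars + 1)]
names += [f"var{j}(t)" for j in range(1, n_vars + 1)]

dropped = [9, 10, 11, 12, 13, 14, 15]
kept = [name for i, name in enumerate(names) if i not in dropped]
print(kept)
```

  The only time-t column left is index 8, var1(t), which is the value being predicted. Keeping a different time-t index would change the target, and keeping several would call for more output nodes.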
  
  Reply
  - Malcom July 26, 2018 at 3:55 pm #
    
    Thank you for your reply. I have some more question about split into input and output
    
    # split into input and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    
    In this code, what does -1 mean exactly? Thank you for your reply in advance.
    
    Reply
    - Jason Brownlee July 27, 2018 at 5:46 am #
      
      You can learn more about how to slice and split in Python here:
      https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
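
      A short NumPy sketch of the two slices in the question, using a small hypothetical array: `:-1` selects every column except the last, and `-1` selects only the last column.

```python
import numpy as np

# Hypothetical data: three input columns followed by one output column.
values = np.array([[0.1, 0.2, 0.3, 0.9],
                   [0.4, 0.5, 0.6, 0.8]])

X = values[:, :-1]  # all rows, every column except the last
y = values[:, -1]   # all rows, only the last column

print(X.shape, y.shape)
```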
      
      Reply
Luke July 27, 2018 at 5:44 pm #

Hello Jason, thanks for a nice article.
I ran into an error when I tried to predict more variables and use fewer variables as inputs. The error appears when the script is about to report the RMSE value.

ValueError: operands could not be broadcast together with shapes (10000,18) (16,) (10000,18)

in this line

—> inv_yhat = scaler.inverse_transform(inv_yhat)

Any suggestion to modify code?

Reply
- Jason Brownlee July 28, 2018 at 6:31 am #
  
  Looks like there is something going on with the shape of your data.
  
  Confirm you copied all of the code exactly?
  Confirm the shape and content of the data?
  
  Reply
  - Luke July 28, 2018 at 2:54 pm #
    
    Thanks for your reply Jason.
    
    I am pretty sure all of the code is the same except for the numbers in the drop-column call on reframed, because I want to try predicting another column. (I use another dataset. It works well when I predict only one column and use all columns as inputs, but it returns a ValueError when I want to predict more than one column or not use all columns as inputs.)
    
    Any advice, please?
    
    Reply
    - Jason Brownlee July 29, 2018 at 6:07 am #
      
      If you have changed the example, it is hard for me to help without debugging your changes, which I don’t have the capacity to do.
      
      Reply
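
One plausible reading of the error message above, which cannot be confirmed without seeing the modified code, is that the scaler was fit on 16 columns while the matrix passed to inverse_transform had 18. The sketch below reproduces that kind of mismatch with plain NumPy arithmetic. The data, `col_min`, `col_range` and `inverse_min_max` are all hypothetical stand-ins, not the tutorial's code.

```python
import numpy as np

# Hypothetical data: 5 rows, 3 columns, scaled to [0, 1] column by column.
data = np.array([[1.0, 10.0, 100.0],
                 [2.0, 20.0, 200.0],
                 [3.0, 30.0, 300.0],
                 [4.0, 40.0, 400.0],
                 [5.0, 50.0, 500.0]])
col_min = data.min(axis=0)
col_range = data.max(axis=0) - col_min
scaled = (data - col_min) / col_range

def inverse_min_max(x):
    # Same arithmetic a min-max scaler uses to undo the scaling.
    return x * col_range + col_min

# A matrix with a different number of columns (4 instead of 3) cannot be
# broadcast against the 3 stored per-column values.
too_wide = np.zeros((5, 4))
try:
    inverse_min_max(too_wide)
    failed = False
except ValueError:
    failed = True

# A matrix with exactly the number of columns used for fitting works.
restored = inverse_min_max(scaled)
print(failed, np.allclose(restored, data))  # True True
```

If that is the cause, the matrix rebuilt before inverting has to have the same number of columns, in the same order, as the data the scaler was originally fit on.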
jorge July 27, 2018 at 7:19 pm #

Hi Jason, again thanks for your tutorial.

In one of your tutorials you said, “It also requires explicit resetting of the network state after each exposure to the training data (epoch) by calls to model.reset_states()”.
The name of that tutorial is “Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras”.

I am wondering why that idea was not implemented in this code.

Thanks

Reply
- Jason Brownlee July 28, 2018 at 6:32 am #
  
  By all means you can try it.
  
  I wanted the focus of the tutorial to be how to get multivariate time series going with LSTM, not all the variations in which to do it.
  
  Reply
  - Luke July 29, 2018 at 9:24 pm #
    
    Thank you for your answer.
    
    One last question.
    Following your example, suppose I change the drop line like this:
    
    # drop columns we don’t want to predict
    reframed.drop(reframed.columns[[6, 7, 8,10,11,12,13,14,15]], axis=1, inplace=True)
    
    What else should I edit to make it work?
    
    Reply
    - Jason Brownlee July 30, 2018 at 5:47 am #
      
      I’m eager to help, but I don’t have the capacity to customize the tutorial for you.
      
      Reply
RYY July 28, 2018 at 2:25 am #

Hi Jason, Thank you so much for such useful code. It works very well.
By the way, my dataset has hundreds of features and the number of lag time steps to consider is over 10,000. As a result, the “series_to_supervised” function runs out of memory and the operation stops completely.

I think it can be solved by using model.fit_generator, but I cannot write generator code that incorporates the series_to_supervised function.

Could you tell me your opinion?
I really need your help.

Reply
- Jason Brownlee July 28, 2018 at 6:38 am #
  
  Perhaps try working with less data?
  Perhaps try working on a machine with more RAM?
  Perhaps try writing a custom data generator?
  
  Reply
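
As a rough sketch of the custom data generator idea, the function below builds lagged windows one batch at a time instead of materializing the full supervised table. The name `window_batches`, its parameters and the toy series are assumptions made for illustration; it is not the tutorial's series_to_supervised function.

```python
import numpy as np

def window_batches(series, n_lag, batch_size, target_col=0):
    """Yield (X, y) batches of lagged windows without building the full table.

    series: 2D array of shape [time steps, features]
    X has shape [batch, n_lag, features]; y is the target column at the next step.
    """
    n_samples = len(series) - n_lag
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        # Each window is a view of n_lag consecutive rows of the series.
        X = np.stack([series[i:i + n_lag] for i in range(start, stop)])
        # The target is the value that follows each window.
        y = np.array([series[i + n_lag, target_col] for i in range(start, stop)])
        yield X, y

# Tiny made-up series: 10 time steps, 2 features.
series = np.arange(20, dtype=float).reshape(10, 2)
batches = list(window_batches(series, n_lag=3, batch_size=4))
first_X, first_y = batches[0]
print(first_X.shape)  # (4, 3, 2)
print(first_y)        # [ 6.  8. 10. 12.]
```

Only one batch of windows is held in memory at a time. Keras generators are normally expected to loop over the data indefinitely, so this would need wrapping in a `while True:` loop before being passed to a fit call.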
jorge July 28, 2018 at 7:41 pm #

Hi Jason

I reset the states after every epoch as you suggested, and the results became better. Unfortunately, when I add stateful=True to the LSTM layer, the results get worse, and I am using time series data. So is it OK to train with both stateful and return_sequences set to False?

If stateful=False, does that mean the RNN does not learn the relation between sequences, so that sequence 1 is treated independently of sequence 2?

Reply
- Jason Brownlee July 29, 2018 at 6:11 am #
  
  Not quite. A stateful LSTM will give you control over when internal state is reset.
  
  A “stateless” LSTM will reset state after each batch of samples during training.
  
  Reply
Yang July 30, 2018 at 1:13 am #

Hi Jason, thank you for your great code.
I was following your directions and got an error at
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
The error says “while_loop() got an unexpected keyword argument ‘maximum_iterations’”.
Do you know how to solve this problem?

Reply
- Jason Brownlee July 30, 2018 at 5:51 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Kaushik Dey July 30, 2018 at 4:44 am #

Hi Jason,

Thanks for the code and explanations.. Really helped me get a handle on time series using RNNs..
One question, I found during the data preparation phase, if I use a StandardScaler as opposed to a MinMaxScaler, the accuracy deteriorates by huge amount.. Can you throw some light on why a standard scaling cannot provide even a close result on the same code which MinMax scaling can?

Thanks and really appreciate your work in this blog

Regards
Kaushik

Reply
- Jason Brownlee July 30, 2018 at 5:54 am #
  
  It depends on the data and the model that you are using.
  
  Reply
  - Kaushik Dey July 30, 2018 at 2:13 pm #
    
    I was using the same pollution data and the model is the LSTM, coded in the way, which you have shown… All I did was change the scaling to StandardScaler and the prediction accuracy just went out of bounds… Any pointers/ thoughts you can provide on this would be helpful..
    
    Thanks in advance…
    
    Reply
    - Le Van Duc October 9, 2018 at 4:56 pm #
      
      Hi,
      I also ran Jason’s tutorial, changing MinMaxScaler to StandardScaler (just scaler = StandardScaler(), with everything else unchanged), and I got a better RMSE of 24.619.
      Hope this helps!
      
      Reply
      - Kaushik Dey October 10, 2018 at 1:22 am #
        
        Thanks Le Van. However in my case using a Standard Scaler takes down the RMSE quite a few notches… Not sure why!!
      - Jason Brownlee October 10, 2018 at 6:01 am #
        
        Nice tip!
Jake July 30, 2018 at 12:11 pm #

I am curious about the RMSE part.

First, is inv_y equivalent to test_y, given that inv_y is the inverse of test_y?
And in the RMSE calculation, why don’t we use rmse = sqrt(mean_squared_error(yhat, test_y)) instead? The mean squared error should be calculated from the prediction and the test values. Or am I missing something?

Reply
- Jason Brownlee July 30, 2018 at 2:17 pm #
  
  We are inverting the transform on the prediction before comparing yhat to y_true in original units.
  
  Reply
  - Jake July 30, 2018 at 11:33 pm #
    
    Thank you for your answer. Nevertheless, I am really confused by the Evaluate Model part.
    
    In my understanding, we invert the scaling to bring the normalized values back to the same values as in the dataset.
    
    But when I print(inv_y), the result is
    
    [31. 20. 19. … 10. 8. 12.]
    
    If this is really the inverse of y, the thing we want to predict or test (pollution), it should be the same as the values in the dataset. But those first 3 values are not like any values in the pollution column of the dataset.
    
    To summarize my problem:
    
    inv_y : 31, 20, 19
    Pollution : 129, 148, 159
    
    They are not the same.
    
    I followed all of your code and it gives me an RMSE result, but I am a bit confused about this.
    Am I missing something? Thanks in advance for your reply.
    
    Reply
    - Jason Brownlee July 31, 2018 at 6:02 am #
      
      Are you sure you’re printing the correct column of data?
      
      Reply
      - Jake July 31, 2018 at 1:08 pm #
        
        I am pretty sure that I am printing the correct data. Even if I printed the wrong column, it should match some column in the dataset, but it does not match any column at all. That’s why I am curious about it.
        
        Thanks in advance for your answer; I look forward to your reply.
      - Jake July 31, 2018 at 2:14 pm #
        
        I just found something that may be useful.
        
        When I print test_y, the result looks like this:
        
        [[0.03118712]
        [0.02012072]
        [0.01911469]
        …
        [0.01006036]
        [0.00804829]
        [0.01207243]]
        
        And when I print inv_y, the result looks like this:
        
        [31. 20. 19. … 10. 8. 12.]
        
        For some reason, it looks like inv_y is test_y * 100, not the inverse of the data column.
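
The printed values in the comment above are consistent with an ordinary min-max inverse rather than a fixed multiplier. The sketch below checks the arithmetic, assuming the pollution column was scaled with a minimum of 0 and a maximum of 994. Those two numbers are inferred from the printed values (31 / 0.03118712 is about 994), not taken from the tutorial.

```python
# Scaled values copied from the printed test_y in the comment above.
scaled = [0.03118712, 0.02012072, 0.01911469]

# Assumption: pollution was scaled with a minimum of 0 and a maximum of 994.
# These two numbers are inferred from the printed values, not from the tutorial.
pollution_min, pollution_max = 0.0, 994.0

# Min-max scaling is x_scaled = (x - min) / (max - min), so the inverse is:
restored = [s * (pollution_max - pollution_min) + pollution_min for s in scaled]
print([round(v) for v in restored])  # [31, 20, 19]
```

Under that assumption inv_y is the correctly inverted target. It most likely differs from 129, 148, 159 because inv_y holds rows from the test portion of the data, while those three values look like the first rows of the file, which would fall in the training portion.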
Muddassir July 31, 2018 at 6:23 pm #

What if we don’t have the target variable ‘Y’ in the test set?
In the above case we have the target variable in both train and test.

Can you please suggest something?

Reply
- Jason Brownlee August 1, 2018 at 7:41 am #
  
  You must have targets in train and test.
  
  If you don’t have a target for a dataset, then you are making a prediction with a final model:
  https://machinelearningmastery.com/train-final-machine-learning-model/
  
  Here’s how:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
  - Muddassir August 3, 2018 at 1:59 pm #
    
    I have 4 years of data (2007, 2008, 2009, 2010) with 10 predictor variables and 1 target variable.
    The target variable is continuous.
    Train data: 2007, 2008, 2009
    Test data: 2010
    I have to forecast the target variable for 2011.
    Note: predictor variables are not given for the 2011 data.
    
    Reply
    - Jason Brownlee August 3, 2018 at 2:24 pm #
      
      Once you choose your model configuration, you can train a final model and use it to predict 2011.
      
      I have more on final models here:
      https://machinelearningmastery.com/train-final-machine-learning-model/
      
      Reply
      - Muddassir August 4, 2018 at 12:04 am #
        
        I will check and get back to you.
RUI J GONCALVES August 1, 2018 at 12:58 am #

Is it multivariate or multivariable? I think it is multivariable in this case (many “indicators” to predict one value: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3518362/ )

Reply
- Jason Brownlee August 1, 2018 at 7:46 am #
  
  Thanks for the ref.
  
  Reply
Michael August 3, 2018 at 5:05 am #

For evaluation, how about:

pyplot.scatter(test_y, yhat)

Reply
- Jason Brownlee August 3, 2018 at 6:05 am #
  
  Very nice.
  
  Reply
Simranjit Singh August 3, 2018 at 6:14 pm #

Sir, first of all, great tutorial.
I am new to this. In the tutorial, reframed.drop removes the columns we don’t want to predict. How can I make changes so that I can predict more columns?

Reply
- Jason Brownlee August 4, 2018 at 6:01 am #
  
  Change the model to have multiple nodes in the output layer, then change the data accordingly.
  
  Reply
Alan August 3, 2018 at 8:37 pm #

Hi Jason,
If I know the covariate values for the next 24 time steps and I want to estimate the value of pollution, how can I adjust the model you have published? Thanks, A

Reply
- Jason Brownlee August 4, 2018 at 6:04 am #
  
  I don’t follow, perhaps you can rephrase your question?
  
  Reply
FP August 4, 2018 at 10:46 pm #

Good Day Jason,

Thank you for this example. I hope you do not mind a couple of questions:

1. Are you perhaps aware of similar example for time series stock market forecasting?
2. Could you clarify whether the back propagation algorithm is used in this demonstration?

Tx, FP

Reply
- Jason Brownlee August 5, 2018 at 5:31 am #
  
  I have more on the stock market here:
  https://machinelearningmastery.com/faq/single-faq/can-you-help-me-with-machine-learning-for-finance-or-the-stock-market
  
  Backpropagation was used to train the model, specifically backpropagation through time (BPTT).
  
  Reply
jorge August 6, 2018 at 5:53 pm #

Dear Jason

Thanks for the good tutorial. I have another question regarding the usage of the dataset.
We used test_X, test_y as the validation dataset, and then used test_X, test_y again for prediction as the testing dataset. Some data science tutorials say you need three separate datasets for training, validation and testing. Is there no effect from using the same dataset both while fitting the model and when evaluating it?

Reply
- Jason Brownlee August 7, 2018 at 6:25 am #
  
  Yes, you can learn more about the use of the different datasets here:
  https://machinelearningmastery.com/difference-test-validation-datasets/
  
  Reply
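
For readers who want three separate datasets, here is a minimal sketch of a chronological train/validation/test split. The 70/15/15 proportions and the made-up series are assumptions for illustration; the tutorial itself uses a two-way split.

```python
# Made-up series of 100 hourly observations, oldest first.
observations = list(range(100))

# Assumed proportions: 70% train, 15% validation, the rest test.
n_train = len(observations) * 70 // 100
n_val = len(observations) * 15 // 100

# Slice in time order so later data never leaks into earlier sets.
train = observations[:n_train]
val = observations[n_train:n_train + n_val]
test = observations[n_train + n_val:]

print(len(train), len(val), len(test))  # 70 15 15
```

With this framing the validation slice would be passed as validation_data during fitting, and the test slice would only be touched once for the final RMSE.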
Katya August 12, 2018 at 2:03 am #

Hi Jason,

Thank you so much for creating this amazing blog. I have learned so much about time series modeling from you.

I’m new to machine learning and have a basic question about LSTMs. When you split your data into a test and training set as you did in your example, is the training set using the LSTM model-predicted value to predict the next time step value; or does the test set use the real previous day’s value to go to the next time step?

For example, I used an LSTM model with a 10 day lag and 7 independent variables to predict a dependent variable. All values are measured once a day and I had 2876 days of data. I made my training set the first 2000 values and used that model to predict the next 876 days. I got a RMSE of less than 1 and the plot between modeled and observed (real data) was extremely well fit. It was so well fit that it made me wonder if I was missing something.

To help illustrate my question, let’s say I’m looking at data point 2300, which is in the test set. Is the LSTM using the real dependent variables from days 2290-2299 to predict the dependent variable on day 2300 or is it using the predicted values for days 2290-2299 to predict day 2300? I understand that each day in the test set would use the real data for the 7 independent variables.

Please let me know if I need to clarify this further. I really look forward to hearing from you. Thanks.

Katya

Reply
- Jason Brownlee August 12, 2018 at 6:35 am #
  
  You can choose to model the problem any way you wish.
  
  I’d encourage you to explore a few framings of the problem in order to discover what works well.
  
  Reply
  - Katya August 12, 2018 at 6:44 am #
    
    Thanks Jason. I meant in your example, which way did you do this? Is your model predicting all of the data in your test set using predicted y-variables the whole time, or is each new y-prediction going back to the real data to forecast ahead? It seems the model is way more accurate if it can correctly simulate 800+ days of data when predicted values for t-1 and t-2 are used as opposed to using the real data, the x-variables, and the model to predict the next day’s value. Hopefully this makes sense.
    
    What I’m modeling varies between 60 and 200, and only moves up and down a few points each day. So it wouldn’t be hard to forecast it using a moving average if all you had to do was correctly guess the next day. But to correctly guess 800+ days in a row, which I thought is what a validation (test set) does, is much more impressive.
    
    I also had another question. Can you write a more detailed explanation of what “n_features = 8” means? I thought it would be something like the number of independent variables in your model, but there are only 7 of those so I am confused. Thanks.
    
    Reply
    - Jason Brownlee August 13, 2018 at 6:14 am #
      
      This post will better explain how to prepare input data for the LSTM:
      https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
      
      Reply
      - Katya August 17, 2018 at 12:48 am #
        
        Hi Jason. Thank you. I appreciate your help.
Rich Larrabee August 14, 2018 at 12:48 am #

Hi Jason,

Can this algorithm be used to find outliers or anomalies in the data set? If so, what changes would be used?

Thanks,

Rich

Reply
- Jason Brownlee August 14, 2018 at 6:21 am #
  
  Perhaps this will help:
  https://machinelearningmastery.com/how-to-use-statistics-to-identify-outliers-in-data/
  
  Reply
Tejpata August 14, 2018 at 4:49 am #

How can we do it for images ?

Reply
- Jason Brownlee August 14, 2018 at 6:25 am #
  
  Do what?
  
  Reply
Thibault August 14, 2018 at 9:55 pm #

Hello Jason, I took your example to make a hydrological forecast for the next hour, using the meteorological forecasts available at time t+0 as explanatory variables along with the hydrological variable at t-1. It works pretty well, thank you very much.
Given that I have weather forecasts for the next 72 hours, how do I run the model 72 times, each time taking my previous forecast (Y) as a new input, to get 72 hours of forecasts? I am stuck there.

Reply
- Jason Brownlee August 15, 2018 at 6:01 am #
  
  Perhaps try a for-loop.
  
  Reply
  - Thibault August 18, 2018 at 12:47 am #
    
    Hello again
    I tried to adapt the multivariate forecast above to several time steps for my case and it works. But I cannot write a loop that feeds my model outputs back in as model inputs, for example for the next 72 hours. How do you do that? It’s probably a bit like the example below, but multivariate. Please, I would take any help:
    
    # make one forecast with an LSTM
    def forecast_lstm(model, X, n_batch):
        # reshape input pattern to [samples, timesteps, features]
        X = X.reshape(1, 1, len(X))
        # make forecast
        forecast = model.predict(X, batch_size=n_batch)
        # convert to a list
        return [x for x in forecast[0, :]]
    
    # make a forecast for each row of the test set
    def make_forecasts(model, n_batch, train, test, n_lag, n_seq):
        forecasts = list()
        for i in range(len(test)):
            X, y = test[i, 0:n_lag], test[i, n_lag:]
            # make forecast
            forecast = forecast_lstm(model, X, n_batch)
            # store the forecast
            forecasts.append(forecast)
        return forecasts
    
    Reply
    - Jason Brownlee August 18, 2018 at 5:39 am #
      
      This may help:
      https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
      
      Reply
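
The for-loop idea discussed above can be sketched as a recursive forecast: each prediction becomes the lagged input for the next step, while the known weather forecast supplies the other input. Everything here is hypothetical. `predict_one_step` is a stand-in with arbitrary coefficients where a real call to model.predict on a reshaped sample would go, and the inputs are made up.

```python
def predict_one_step(prev_flow, weather_now):
    # Hypothetical stand-in for model.predict on a single sample.
    # A real model would take an array reshaped to [1, timesteps, features].
    return 0.5 * prev_flow + 0.25 * weather_now

def recursive_forecast(last_observed_flow, weather_forecasts):
    """Forecast one value per weather forecast, feeding each prediction back in."""
    forecasts = []
    prev_flow = last_observed_flow
    for weather_now in weather_forecasts:
        yhat = predict_one_step(prev_flow, weather_now)
        forecasts.append(yhat)
        prev_flow = yhat  # the prediction becomes the lag input for the next hour
    return forecasts

# Made-up inputs: last observed flow and three hours of rain forecasts.
print(recursive_forecast(10.0, [0.0, 20.0, 0.0]))  # [5.0, 7.5, 3.75]
```

For 72 hours, `weather_forecasts` would hold 72 entries. If the model was trained on scaled data, the fed-back prediction should stay in scaled units and only be inverted at the end. Errors tend to accumulate with this approach, because each step builds on the previous prediction.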
OR August 15, 2018 at 2:15 am #

What do you think about PyTorch, Jason? Is it going to replace Keras as a go-to toolkit for newbies? Maybe you can write an article comparing the two platforms and why you think one might be better than another.

Reply
- Jason Brownlee August 15, 2018 at 6:10 am #
  
  Perhaps, but not yet.
  
  Keras might be easier to use, pytorch might have more flexibility.
  
  Reply
Aynaz Biniyaz August 16, 2018 at 7:07 am #

Hi Jason
Thank you for your great blog and examples. I am very new to machine learning, and I was wondering whether we could predict the pollution based only on the other inputs, without including the pollution as an input. I appreciate your help in advance.

Reply
- Jason Brownlee August 16, 2018 at 1:56 pm #
  
  Yes, you will have to modify the example to only use the other variables as input.
  
  Reply
  - Yzhao March 4, 2020 at 5:24 am #
    
    Hello, could you please tell me how to modify it? I am new to Python. Thanks.
    
    Reply
    - Jason Brownlee March 4, 2020 at 6:01 am #
      
      This is a common question that I answer here:
      https://machinelearningmastery.com/faq/single-faq/can-you-change-the-code-in-the-tutorial-to-___
      
      Reply
      - TG May 17, 2021 at 1:43 am #
        
        “Perhaps you can try to make the change yourself?”
        
        ????
- Yzhao March 4, 2020 at 5:23 am #
  
  Hello Aynaz Biniyaz,
  
  Did you solve this problem of predicting the pollution without including the pollution as an input?
  
  Reply
Marco August 21, 2018 at 7:56 am #

Hi Jason,
I have also read the other answers you provided and your article about the difference between training, validation and test sets. But it is still not clear to me why, when we fit our model during training, we use “validation_data=(X_test, y_test)”, which is the same test dataset we will use to make the final predictions. I hope you can help me understand this.
Thank you,
Marco

Reply
- Jason Brownlee August 21, 2018 at 2:14 pm #
  
  You do not need to use a validation dataset, it is a choice.
  
  You can learn more about validation datasets here:
  https://machinelearningmastery.com/difference-test-validation-datasets/
  
  Reply
Henry August 21, 2018 at 6:10 pm #

Dear Dr. Jason:
Thanks for sharing. Your example data is made up of weather conditions and pollution, and your goal is to predict the current time’s pollution from the weather conditions and pollution at the previous time step(s). What if the weather conditions were artificial control variables? Could I still use an LSTM? For example, my data consists of a system’s control variables and the system’s performance (IPC, etc.) as a time series; that is, at each interval I change the system’s control variables and measure the instantaneous performance while the emulator is running. My goal is to train a model relating the system’s control variables to the system’s performance. Is it appropriate to use an LSTM for this? Hoping for your answer, thanks.

Reply
- Jason Brownlee August 22, 2018 at 6:08 am #
  
  Sure, try it and see.
  
  Reply
xiaojuan cheng August 21, 2018 at 8:20 pm #

Hi, I have a question. When you normalize the data, you use all of the data, including input and output, and when you invert the scaling, you use all of the training or test data, including input and output. Why not invert just the output? To compute the RMSE we only need pre_y and true_y. If I invert only the output values, not the input values, is that right?

Reply
- Jason Brownlee August 22, 2018 at 6:11 am #
  
  We only need to invert the output to calculate RMSE. We create a larger matrix because the sklearn library requires the data to be the same shape on each call to fit(), transform() and inverse_transform().
  
  Reply
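
To illustrate why both routes give the same numbers, the sketch below inverts a single target column in two ways with plain NumPy: by padding it back into a full-width matrix, and by applying that column's own minimum and maximum directly. The data is made up and the min-max arithmetic is written by hand here as an assumption-level stand-in for the scaler. A fitted scikit-learn MinMaxScaler exposes the per-column values as `data_min_` and `data_max_`.

```python
import numpy as np

# Made-up data: column 0 is the target, columns 1-2 are other inputs.
data = np.array([[10.0, 1.0, 100.0],
                 [20.0, 2.0, 300.0],
                 [40.0, 5.0, 200.0]])
col_min = data.min(axis=0)
col_max = data.max(axis=0)
scaled = (data - col_min) / (col_max - col_min)

# Pretend these are scaled predictions for the target column only.
yhat_scaled = scaled[:, 0]

# Option 1: pad the predictions with the other columns, invert everything,
# then keep column 0 (the approach of rebuilding a full-width matrix).
padded = np.column_stack([yhat_scaled, scaled[:, 1:]])
inv_padded = padded * (col_max - col_min) + col_min
yhat_from_padded = inv_padded[:, 0]

# Option 2: invert the target column directly with its own min and max.
yhat_direct = yhat_scaled * (col_max[0] - col_min[0]) + col_min[0]

print(np.allclose(yhat_from_padded, yhat_direct))  # True
```

Because min-max scaling works column by column, the padding columns never affect the inverted target; they only satisfy the shape requirement.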
Richard August 22, 2018 at 11:37 pm #

Hi Dr Brownlee
I deeply enjoyed this article, and all the other ones.
I have a question about a problem I have. My data covers a timeline of 2 years, with weekly values for 10 variables, e.g., 2017/01/01 var1 = a, var2 = b, var3 = c, etc. All data are numeric. I want to predict all variables for the next 3 months, for example. Is this a problem that LSTM time series forecasting can solve, or is it a survival problem? Thank you very much for your help.

Reply
- Jason Brownlee August 23, 2018 at 6:13 am #
  
  I recommend testing many other methods first.
  
  Follow this process:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  Reply
Miles August 23, 2018 at 3:33 am #

Thanks so much for this Jason! I have a question about seeding the forecast. With the LSTM, it looks like I have to provide a “guess” at the pollution for the forecast to work (e.g., I can’t just give it the inputs from the previous day and get an answer without also providing a “guess” at what the answer might be). This will probably work well for trying to predict the next day. But what if I wanted to forecast every day for the next month where I don’t have a good guess at what the pollution level might be?
Is this basically just a multi-step multivariate time series forecast? And do you have a tutorial for something like this?
Thanks!

Reply
- Jason Brownlee August 23, 2018 at 6:16 am #
  
  No guessing is going on.
  
  You can frame the problem any way you wish.
  
  Nail down the inputs you want to use and the outputs required, then define a model to meet that, then reshape your data into that form.
  
  I have a number of multivariate multistep examples written already and scheduled. I also have some in my new book that should be out in a week or two on deep learning for time series forecasting.
  
  Reply
  - Miles August 23, 2018 at 6:49 am #
    
    So if I wanted to follow this same example (forecasting air pollution), but I didn’t want to use the previous day’s pollution as an input, I could just drop that column from reframed dataframe, correct? e.g. Change
    # frame as supervised learning
    reframed = series_to_supervised(scaled, 1, 1)
    # drop columns we don’t want to predict
    reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)
    
    to
    
    # frame as supervised learning
    reframed = series_to_supervised(scaled, 1, 1)
    # drop columns we don’t want to predict
    reframed.drop(reframed.columns[[0,9,10,11,12,13,14,15]], axis=1, inplace=True)
    
    Dropping the first column drops the previous day’s pollution from the input.
    
    Reply
    - Jason Brownlee August 23, 2018 at 8:04 am #
      
      Yes, sounds good, although I’ve not tested your changes.
      
      Reply
      - Miles August 23, 2018 at 8:38 am #
        
        Fantastic. Going to try it. Will let you know how it turns out.
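
To make the column indices in this thread easier to follow, here is a small sketch of the layout after series_to_supervised(scaled, 1, 1), assuming 8 variables with var1 as pollution and column names in the style var1(t-1) and var1(t). The helper `remaining_after_drop` is hypothetical and only mimics dropping columns by position.

```python
# Assumed layout: 8 variables, 1 lag step, 1 output step (var1 is pollution).
n_vars = 8
columns = [f"var{j}(t-1)" for j in range(1, n_vars + 1)]
columns += [f"var{j}(t)" for j in range(1, n_vars + 1)]

def remaining_after_drop(columns, drop_idx):
    # Mimics dropping columns by position and returns the names that are left.
    drop = set(drop_idx)
    return [name for i, name in enumerate(columns) if i not in drop]

# Tutorial version: keep all lagged inputs plus var1(t) as the target.
kept = remaining_after_drop(columns, [9, 10, 11, 12, 13, 14, 15])
print(kept)

# Variant from the comment above: also drop index 0, the lagged pollution.
kept_no_lag_pollution = remaining_after_drop(columns, [0, 9, 10, 11, 12, 13, 14, 15])
print(kept_no_lag_pollution)
```

Indices 0 to 7 are the lagged inputs and index 8 is pollution at time t, so dropping 9 to 15 leaves pollution as the only output. With index 0 also dropped there are 7 input features instead of 8, so any later step that assumes 8 columns, such as the matrix rebuilt before inverting the scaling, would probably need adjusting as well.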
Chrisa August 25, 2018 at 8:42 pm #

Hi Jason,

Can I apply an LSTM if I want to categorize my input into 4 classes, like the iris problem?

Reply
- Jason Brownlee August 26, 2018 at 6:27 am #
  
  LSTMs are not suitable for non-sequence prediction problems like the iris flower problem.
  
  Learn more here:
  https://machinelearningmastery.com/faq/single-faq/when-should-i-use-an-mlp-cnn-and-rnn
  
  Reply
James August 27, 2018 at 6:36 am #

Hi Jason,

Firstly, I am new to this technology and this has served as a great example, thank you! I have modified the example and built a number of LSTM models that appear to forecast properly based on 1-second data. Two questions:

1. What is the best way to predict in a real-time scenario? I can loop through the real-time data and update a prediction every couple of minutes (i.e. wait until I have 60 rows of features, then perform a prediction, wait for another 60 rows of features, then re-predict, etc.). Would I change the call to series_to_supervised(scaled, 60, 1) to support looking at 60 seconds at a time?

2. I am new and therefore cautious about calling predict with the variable we are trying to predict (the y variable) still in the dataset (yhat = model.predict(test_X)). Can we strip this variable out before calling model.predict (e.g. yhat = model.predict(test_X[:,1:]))? I have tried this but it complains about a shape error. I am probably being overly cautious, but when I predict in a real-time scenario we won’t have the y variable.

Reply
- Jason Brownlee August 27, 2018 at 1:56 pm #
  
  It depends on your domain, e.g. whether there is benefit in fitting one final model, whether a model needs to be updated or whether a new model should be fit. Experiment and see what results in the best skill on your data.
  
  You can model the problem any way you wish.
  
  Reply
Peter Peng August 28, 2018 at 1:10 pm #

Hi Jason. I am applying LSTMs to traffic flow prediction (time series data) and have some questions. First, I use “mse” as the loss function, but the test loss is always lower than the train loss during the whole process, and I get the same result even if I change the dataset. Why is that? Is it because of the loss function, the model, or something else? Second, you suggest that LSTMs cannot be applied to time series prediction; what preprocessing (other than normalization) needs to be done before the features go into the LSTM? Just what you do in this example? In addition, I find that LSTMs can capture the trend of a time series, but they are sometimes weak in accuracy.

Reply
- Jason Brownlee August 29, 2018 at 8:00 am #
  
  I have some notes on having better performance on the test dataset here:
  https://machinelearningmastery.com/faq/single-faq/what-if-model-skill-on-the-test-dataset-is-better-than-the-training-dataset
  
  Try ReLU activation functions with LSTMs. Also, I have found LSTMs work better if the data is differenced to remove trends/seasonality.
  
  Also try CNNs, CNN-LSTMs and ConvLSTMs on time series, I’ve had great success.
  
  Reply
Vishwas Samanth August 29, 2018 at 12:55 am #

Hi Jason,

model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))

In the code you have mentioned LSTM layer with 50 neurons, On what basis are we deciding the number of neurons here?

Reply
- Jason Brownlee August 29, 2018 at 8:13 am #
  
  Trial and error. You can learn more here:
  https://machinelearningmastery.com/faq/single-faq/how-many-layers-and-nodes-do-i-need-in-my-neural-network
  
  Reply
Wendy August 29, 2018 at 10:02 am #

Hi, Jason,

I want to predict daily temperature based on historical data which is measured every 15 minutes.

6/16/07 4:45 1.94 1180 16.7
6/16/07 5:00 1.94 1180 16.7
6/16/07 5:15 1.95 1190 16.7
6/16/07 5:30 1.94 1180 16.6
6/16/07 5:45 1.94 1180 16.6
6/16/07 6:00 1.93 1180 16.6
6/16/07 6:15 1.94 1180 16.6
6/16/07 6:30 1.94 1180 16.5
6/16/07 6:45 1.94 1180 16.5
6/16/07 7:00 1.93 1180 16.5

# specify the number of lag hours
n_hours = 4*24  # is that correct if I want a daily prediction?
n_features = 3

# split into train and test sets
values = reframed.values
n_train_hours = 365 * 24*4*8  # I have 10 years of historical data, so I use 80% as train data

Reply
- Jason Brownlee August 30, 2018 at 6:21 am #
  
  I’m eager to help, but I don’t have the capacity to debug your changes.
  
  Reply
Peter Peng August 30, 2018 at 11:26 am #

Thanks, Jason. You suggest that LSTMs will work better if the data is differenced to remove trends/seasonality. Can you give me some examples or posts about it?

Reply
- Jason Brownlee August 30, 2018 at 4:51 pm #
  
  Yes, I have a number scheduled and I have examples in the new book:
  https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/
  
  You can prepare your own examples also, fit a univariate problem that has trend+seasonality with and without differencing and compare results.
  
  Reply
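
As a small illustration of what differencing means here, the sketch below removes a trend by subtracting the previous observation and then rebuilds the original series. The function names and the toy series are assumptions for illustration, not code from the book or the tutorial.

```python
def difference(series, interval=1):
    """Return series[i] - series[i - interval] for each valid i."""
    return [series[i] - series[i - interval] for i in range(interval, len(series))]

def invert_difference(history_value, diff_value):
    """Undo one differenced value given the observation it was differenced from."""
    return history_value + diff_value

# Made-up series with a steady upward trend.
series = [10, 12, 14, 16, 18]
diffed = difference(series)
print(diffed)  # [2, 2, 2, 2]

# Rebuild the original values from the first observation and the differences.
rebuilt = [series[0]]
for d in diffed:
    rebuilt.append(invert_difference(rebuilt[-1], d))
print(rebuilt)  # [10, 12, 14, 16, 18]
```

A model fit on the differenced values predicts changes rather than levels, so each prediction has to be added back to the last known level before it is compared with the raw observations. A larger `interval`, such as the length of the seasonal cycle, is one way to difference out seasonality.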
Peter Peng August 31, 2018 at 7:42 pm #

Thanks, Jason. I find that the predicted value x(t) is equal to the actual value x(t-1), which means that the LSTM’s predictions lag by one step. Can you give me some suggestions on how to improve or solve this problem?

Reply
- Jason Brownlee September 1, 2018 at 6:19 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
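
A model whose prediction for x(t) equals x(t-1) behaves like a persistence forecast. One way to check whether a model has learned anything beyond that is to compare its RMSE against the persistence baseline on the same data. The sketch below does this with made-up observations and made-up model predictions; none of the numbers come from the tutorial.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up observations and made-up model predictions for t = 1..5.
observed = [10.0, 12.0, 11.0, 13.0, 14.0, 13.0]
model_pred = [10.5, 11.5, 11.5, 13.5, 13.5]

actual = observed[1:]        # values at time t
persistence = observed[:-1]  # value at t-1 used as the forecast for t

persistence_rmse = rmse(actual, persistence)
model_rmse = rmse(actual, model_pred)
print(persistence_rmse, model_rmse)
```

If the model's RMSE is not clearly below the persistence RMSE, the model is adding little beyond copying the previous value, which suggests revisiting the inputs or the framing of the problem.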
Nitin Kanwar September 1, 2018 at 1:24 pm #

Hi Jason,

Exceptional tutorials you have here on this website. I have been following this website for a while now.

I am kinda new to RNNs. I have a few questions/doubts –

1. In the example above, do we predict only for one time step in the future? What if I want to predict multiple time steps into the future? Will this code work or I need to make changes?

2. I read through Andrej Karpathy’s blog “The Unreasonable Effectiveness of Recurrent Neural Networks”. He performs a sampling process where he generates new characters once the RNN has learned. The following excerpt is from the blog –

“At test time, we feed a character into the RNN and get a distribution over what characters are likely to come next. We sample from this distribution, and feed it right back in to get the next letter. Repeat this process and you’re sampling text! Lets now train an RNN on different datasets and see what happens.”

Can we do something similar in this RNN? Like feed it data for one time step and keep feeding the result back to the RNN and predict for multiple time steps? If this is how it is being done in your code then could you please point me to the code section.

Thank you for all your help.

Reply
- Jason Brownlee September 2, 2018 at 5:28 am #
  
  Here is an example for predicting multiple future time steps:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  You can use RNNs as a generative model for time series. Not sure why you would want to though?
  
  Reply
  - Nitin Kanwar September 2, 2018 at 6:20 am #
    
    Hi Jason,
    
    Thanks for your reply.
    
    I am working on predicting stock prices based on historical stock market data available. I would like to predict stock prices for future dates. I plan to use RNNs to learn the features and make predictions. Once the predictions are generated, I want to apply a reinforcement learning algorithm to maximize the future profits. Does that sound feasible? I am new to RNNs and RL so not sure if this is the right path. Please let me know your thoughts.
    
    Thanks.
    
    Reply
    - Jason Brownlee September 3, 2018 at 6:07 am #
      
      I’m not a fan of predicting stocks:
      https://machinelearningmastery.com/faq/single-faq/can-you-help-me-with-machine-learning-for-finance-or-the-stock-market
      
      Nevertheless, you can get a long way with stochastic optimization first, before trying RL methods.
      
      Reply
      - Nitin Kanwar September 3, 2018 at 7:15 am #
        
        Thanks for your reply.
        
        I am a graduate student working on a thesis to study the efficiency of Deep RL algorithms in predicting stock prices. I will really appreciate if you could point me to some good resources.
        
        Thank you for all your help.
      - Jason Brownlee September 3, 2018 at 1:34 pm #
        
        Sorry, I don’t have material on deep RL, I cannot give you good ad hoc advice.
SA September 2, 2018 at 9:48 am #

Hello Jason

I have a time series dataset which includes 30 attributes and the price. I would like to predict the price. All 30 fields are related to the price, and the past price is also an important input.
Any suggestions?

Thanks

Reply
- Jason Brownlee September 3, 2018 at 6:10 am #
  
  Yes, try a suite of different methods and discover what works best for your specific dataset:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  Reply
  - SA September 3, 2018 at 6:59 am #
    
    Thanks Jason
    I have read the article, very comprehensive. Thanks a lot.
    
    Is there any way to convert multiple inputs into one variable that represents all of them? For example, I have 30 attributes which are all related to the prediction. Is there any algorithm that receives multivariate input and converts it to univariate before we make the final prediction?
    
    Reply
    - Jason Brownlee September 3, 2018 at 1:33 pm #
      
      Yes, you can train an autoencoder to compress multiple sequences to a fixed length vector.
      
      I have a post on this topic scheduled.
      
      Reply
      - SA September 4, 2018 at 5:52 am #
        
        Hi Jason
        
        Great. Will look into it and wait for your new article.
        
        You are the ML Wikipedia 🙂
      - Jason Brownlee September 4, 2018 at 6:12 am #
        
        Thanks.
Joe September 2, 2018 at 5:21 pm #

I wonder if this method can be applied to real-time prediction or online learning? Perhaps change batch_size to 1 might make it online?

Reply
- Jason Brownlee September 3, 2018 at 6:11 am #
  
  What do you mean by online?
  
  The model can make predictions from one sample directly.
  
  Reply
Rahul B Raj September 3, 2018 at 4:49 pm #

Hi Jason,
I am relatively new to the topic. According to my understanding of the code, you have forecasted the pollution value for tomorrow providing today’s feature values(temperature, and the like). How can we do the same with forecasted feature values?
Thank You

Reply
- Jason Brownlee September 4, 2018 at 6:03 am #
  
  What do you mean exactly, what are the inputs and outputs that you want?
  
  Reply
  - Rahul B Raj September 4, 2018 at 11:55 am #
    
    Suppose I have trained the data using 3 months features f1 , f2 to predict w. Now I have an external data of f1 and f2 of the day after the trained 3 months. I need to predict the corresponding w for the same.
    
    Reply
    - Jason Brownlee September 4, 2018 at 1:52 pm #
      
      call model.predict()
      
      What problem are you having exactly?
      
      Reply
      - Rahul B Raj September 4, 2018 at 2:30 pm #
        
        According to the model that you have created, the argument in the model.predict() has values in f1, f2 and w right?
        I know the f1 and f2 values of the next timestep. I need to get the corresponding w value.
      - Jason Brownlee September 5, 2018 at 6:26 am #
        
        You can define the inputs and outputs of the model to be anything you wish.
Uday September 5, 2018 at 5:14 pm #

Hi Jason,

Thank you for this tutorial.

I have a question on “how to automatically identify time series data using python”. I want to build a data science workbench, where I need to classify the problem type programmatically by reading the data. We can easily differentiate Regression Vs Classification Vs Clustering. But I am looking at differentiating Time-Series Vs Regression problems.

Need to know your suggestions on how to differentiate the problem type, like, Time-Series Vs Regression programmatically.

Thank You

Uday

Reply
- Jason Brownlee September 6, 2018 at 5:33 am #
  
  If observations are ordered by time, it is a time series.
  
  Reply
keras_tf September 6, 2018 at 4:00 pm #

Hi, why are you using the same data for test and validation? Using the same data for both will not give proper information about the model’s performance on truly unseen values. Or am I missing something here? Thanks.

Reply
- Jason Brownlee September 7, 2018 at 8:02 am #
  
  To simplify the example.
  
  Reply
Akim September 7, 2018 at 3:24 pm #

Hi Jason,

Amazing job! Thank you for sharing. I have one question. I have 3 features and I want to look 20 steps back in time. I read in your other post “Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras” you define that as look_back. Then in my case my input will be 3*20=60?
Thank you.

Regards,
Akim

Reply
- Jason Brownlee September 8, 2018 at 6:01 am #
  
  I would recommend preparing data using this post:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
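For the look-back question above, a minimal sketch of the input shape may help. It assumes a NumPy array with 3 features and a look-back of 20 steps; the helper name `make_windows` is hypothetical and is not part of the tutorial. Each sample keeps time steps and features on separate axes, so the LSTM input is shaped [samples, 20, 3] rather than a flat vector of 60 values.

```python
import numpy as np

def make_windows(data, look_back):
    """Hypothetical helper: slice a 2D series [time, features] into
    overlapping windows shaped [samples, look_back, features]."""
    n_samples = data.shape[0] - look_back
    X = np.stack([data[i:i + look_back] for i in range(n_samples)])
    # assumption: the target is the first column at the step after each window
    y = data[look_back:, 0]
    return X, y

# toy series: 100 time steps, 3 features
series = np.arange(300, dtype="float32").reshape(100, 3)
X, y = make_windows(series, look_back=20)
print(X.shape, y.shape)
```

With this layout, `input_shape=(X.shape[1], X.shape[2])` in the LSTM layer would be `(20, 3)`.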
ammara September 11, 2018 at 3:36 am #

Hi Jason Thank you for the code. I used a random input variable to predict pollution data. I did not change anything in pollution variable.

random_var=(np.random.randint(50, size=(1, 43800))).T
Add random variable as a column in dataset
random_var=dataset.iloc[:,8]

So basically input data is only pollution data and random variable
input_da=pd.concat([dataset.iloc[:,0:1],dataset.iloc[:,8]], axis=1)
dataset = input_da.iloc[:,0:3]
values = dataset.values

Model is predicting well even with random variable. How is that possible?

Reply
- Jason Brownlee September 11, 2018 at 6:32 am #
  
  I expect it did not do well.
  
  Reply
Ricardo September 12, 2018 at 2:50 am #

Hi Jason

thanks for this tutorial ! and the many others you made ! these are great learning tools, very practical !

I see this in the code , and I think there is a look ahead bias:
# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)

and then later a split to train and test:
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]

the usual approach is to 1st split in train and test and then do scaler.fit_transform(train) and scaler.transform(test)…

test data should be treated as unseen…

Reply
- Jason Brownlee September 12, 2018 at 8:15 am #
  
  I have advice on the ordering of transforms here:
  https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
  
  Reply
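The ordering suggested in the comment above (split first, then fit the scaling on the training rows only) can be sketched as follows. This is an illustration under assumptions: the data is a random stand-in, and the min-max arithmetic is written out in NumPy rather than calling `MinMaxScaler`, using the same formula for a (0, 1) feature range.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=(1000, 8)).astype("float32")  # stand-in for the dataset

# split first, so the test rows stay unseen
n_train = 800
train, test = values[:n_train], values[n_train:]

# learn the minimum and range from the training rows only
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min
col_range[col_range == 0] = 1.0  # guard against constant columns

train_scaled = (train - col_min) / col_range
# test values may fall slightly outside [0, 1]; that is expected
test_scaled = (test - col_min) / col_range
```

With scikit-learn the equivalent pattern is `scaler.fit_transform(train)` followed by `scaler.transform(test)`, as the commenter describes.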
Saad September 14, 2018 at 7:19 pm #

Hi Jason,
First of all, thank you for this wonderful blog.
I am actually trying to use your LSTM however, I don’t see how I can do that given my data structure.
I currently have time series for 500 stock returns over 5 years on a monthly basis (60 months total) along with characteristics of these companies (50 features like market capitalization, book-to-market ratio etc…), I want to apply the LSTM to predict one month ahead for all the stocks. So my dependent variable is a 60×500 and features 60x500x50.
Do you think there is a best practice for doing that? Consider that my output is multivariate or univariate and do a loop over my stocks? I am still struggling to build my input data for RNN. For MLP and RF I just did a pooled data by training on 55×500 and testing on 5×500 without really worrying about time series and stocks but it didn’t give good results.
Thank you!

Reply
- Jason Brownlee September 15, 2018 at 6:05 am #
  
  Thanks.
  
  Should not be a problem, although I think predicting stocks is a waste of time (you can’t).
  
  Reply
Brandon September 18, 2018 at 1:14 am #

Thank you for the tutorial.

Question: When training a multi-lag timestep regression problem with LSTM model, does the model need to understand the sequential order of the input variables (e.g., t-3, t-2, t-1), or is it expected to be able to learn the sequence and apply the appropriate weights during the training process?

If the former, can you please explain where in the code this understanding occurs (e.g., when defining the 3D tensor)? I envision an LSTM model that looks back three previous periods (i.e., t-3) to have three separate LSTM cells that are performing the input, forget, and output gate calculations in each cell, but I want to make sure that my expectation lines up with what is actually going on in the Keras model.

Thanks in advance.

Reply
- Jason Brownlee September 18, 2018 at 6:18 am #
  
  The order of time steps in a given sample is the order that the model is shown prior observations.
  
  Reply
Mitch Oldroyd September 20, 2018 at 1:01 am #

Jason,

It would appear that UCI has changed access to (and the content of) their databases. Your link is broken (or rather is met with “you don’t have permission…”).

I was able to locate the public facing database, but it doesn’t include the “Air Pollution” data set any more.

http://mlr.cs.umass.edu/ml/datasets.html

Good luck,

Mitch

Reply
- Jason Brownlee September 20, 2018 at 8:03 am #
  
  Thanks, I have updated the link to my mirror. Here is a direct link:
  https://raw.githubusercontent.com/jbrownlee/Datasets/master/pollution.csv
  
  Reply
Channing September 20, 2018 at 7:52 pm #

Very impressive. But in certain scenarios, I found the prediction is just the pollution of the last hour. More generally speaking, in a “smooth” curve prediction scenario, using the value from the last time step to predict the current value is not a bad idea. 😛

Reply
- Jason Brownlee September 21, 2018 at 6:26 am #
  
  I would encourage you to try a CNN, I would expect it to perform much better on this dataset.
  
  Reply
tmartin September 22, 2018 at 2:13 am #

Hi thanks for this great post, it was very useful.

I was not sure of what you mean here : “Remember that the internal state of the LSTM in Keras is reset at the end of each batch, so an internal state that is a function of a number of days may be helpful (try testing this).”

Could you elaborate on that please ?

Reply
- Jason Brownlee September 22, 2018 at 6:31 am #
  
  I mean changing the model to stateful and controlling when the state is reset based on the properties of the problem may change the performance of the model.
  
  Reply
  - ceng September 24, 2018 at 1:17 am #
    
    Hi thanks for this great post, it was very useful. But，how can I get the real number and the predicted number？ There is no answer.
    
    Reply
    - Jason Brownlee September 24, 2018 at 6:12 am #
      
      Call model.predict() to make a prediction.
      
      Reply
Dazhi September 24, 2018 at 12:27 am #

Hi，Jason. First, I have to say you are a great master. But I don’t know how to predict, you just given the Trained model. How can I get the predicting number?

Reply
- Jason Brownlee September 24, 2018 at 6:12 am #
  
  Call model.predict()
  
  Here’s an example:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
  - Dazhi October 6, 2018 at 12:53 am #
    
    Hi,master. I am coming to trouble you…Look:
    
    # make a prediction
    yhat = model.predict(test_X)
    test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
    …
    
    You have given the predicted value. But, I have multivariate, how to do? Just like your air pollution forecasting, how to use it in real forecasting?
    
    Reply
    - Jason Brownlee October 6, 2018 at 5:47 am #
      
      Good question.
      
      You can use a seq2seq to output multiple time steps and the size of each time step can be the number of features (e.g. multivariate).
      
      Reply
  - Dazhi October 6, 2018 at 1:22 am #
    
    I am a new learner, and I am not smart. I know that the test set is used to evaluate the model. It is only useful when building the model, right?
    
    Well, in the air pollution forecasting example you talked about earlier, you showed how to train the model, but not how to make predictions. Later you pointed to the article: How to Make Predictions with Long Short-Term Memory Models in Keras.
    
    However,I still don’t know how can I input the new data to make predictions? How to type the code? I am confused… How to implement it in the new script? Please help me, thank you very much!
    
    Reply
    - Jason Brownlee October 6, 2018 at 5:48 am #
      
      What is the problem exactly? Which part is confusing?
      
      Reply
      - Dazhi October 6, 2018 at 5:49 pm #
        
        Master, I don’t know how to write the code? The real prediction. multivariate time series forecasting in lstm?
      - Jason Brownlee October 7, 2018 at 7:25 am #
        
        Which part are you stuck on?
      - Dazhi October 6, 2018 at 5:56 pm #
        
        Hi,Master. How to make predictions about “air pollution”(you trained by LSTM) in new data? Can you show me the code to understand?
      - Jason Brownlee October 7, 2018 at 7:25 am #
        
        This tutorial shows you how to make a prediction with an LSTM:
        https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
Xiang September 24, 2018 at 1:38 am #

Hi，master. Can you tell me how to do with the validation set in this example to set up a reliable neural network model?

Reply
- Jason Brownlee September 24, 2018 at 6:13 am #
  
  It is challenging to use a validation set for time series. I need to think about it.
  
  Reply
  - Xiang October 9, 2018 at 12:07 am #
    
    Hi，Jason. I am troubled lately. Cause I have some problems about how to define the networks well. The parameters are quite uneasy to define. Can you show me some guidance?
    
    The code showed above:
    # define model
    model = Sequential()
    model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(Dense(1))
    model.compile(loss='mae', optimizer='adam')
    # fit model
    history = model.fit(train_X, train_y, epochs=50, batch_size=360, validation_data=(test_X, test_y), verbose=2, shuffle=False)
    
    Reply
    - Jason Brownlee October 9, 2018 at 8:44 am #
      
      It is a common question that I answer here:
      https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm
      
      Does that help?
      
      Reply
Marco September 25, 2018 at 7:18 am #

Hi Jason, from the plot of the meteorological data i can notice that temperature, pressure and dew show a seasonality. Is it necessary to remove this seasonality in this case or not? And why?

Reply
- Jason Brownlee September 25, 2018 at 2:44 pm #
  
  I have seen cases where CNNs and even LSTMs can handle trend and seasonality directly. Differencing the data first can simplify the problem.
  
  Reply
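The differencing mentioned in the reply above can be sketched in a few lines. This is an illustration on a synthetic series, not the tutorial's data; the seasonal lag of 24 assumes hourly observations with a daily cycle, and the helper names are hypothetical.

```python
import numpy as np

def difference(series, interval=1):
    """Return series[t] - series[t - interval] for every t >= interval."""
    series = np.asarray(series, dtype="float64")
    return series[interval:] - series[:-interval]

def invert_difference(history_value, diff_value):
    """Add a (predicted) difference back onto the observation one interval earlier."""
    return history_value + diff_value

# synthetic hourly series: linear trend plus a 24-hour cycle
t = np.arange(48)
series = 0.5 * t + 10.0 * np.sin(2 * np.pi * t / 24)

trend_removed = difference(series, interval=1)    # lag-1 differencing
season_removed = difference(series, interval=24)  # seasonal differencing
restored = invert_difference(series[:-24], season_removed)
```

Any forecast made on the differenced series has to be inverted in the same way before it is compared with observations in the original units.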
Felipe Gerolomo October 9, 2018 at 7:41 am #

Hi, I have a doubt: how does your function know that pollution is your output variable? How do you specify that pollution is your output variable?

Reply
- Jason Brownlee October 9, 2018 at 8:48 am #
  
  The model.fit() function requires that we specify inputs (X) and outputs (y).
  
  Reply
Le Van Duc October 9, 2018 at 1:48 pm #

Dear Dr. Jason,
Thank you very much for your great tutorial. I tried your code with the training set changed to 4 years and the validation set to 1 year. The code still runs very fast, with a slightly better RMSE of 25.418.
May I ask: with a multivariate time series LSTM, is each time series in the model trained and predicted independently, or do they share some dependence in the trained weights? Could you clarify this for me or point me to some references?

Thank you very much !

Reply
- Jason Brownlee October 9, 2018 at 3:06 pm #
  
  Depending on your problem you can train the model once and use it to make predictions going forward.
  
  With enough resources, it might be better to re-fit the model as new data is made available.
  
  Reply
  - Le Van Duc October 9, 2018 at 5:37 pm #
    
    Thank you for your quick reply !
    I am considering to apply multivariate LSTM to a spatial-temporal air pollution data set (monitoring data in multiple locations of a city and in time series) to predict new value at multiple locations at some time ahead. Could you please have any suggestions in this ? Is this problem more fit to a CNN + LSTM model ?
    
    Thank you very much for your excellent blogs and your kind helping !
    
    Reply
    - Jason Brownlee October 10, 2018 at 6:03 am #
      
      I would recommend testing a suite of methods in order to discover what works best for your specific dataset.
      
      For spatio-temporal data, a CNN-LSTM and ConvLSTM would be two great models to start with.
      
      Reply
Chris October 10, 2018 at 8:53 am #

Hi Jason,
your tutorial is very helpful.
But I have a problem with the LSTM by training the model with data from the previous time steps and also data of the current time step t (all variables but pollution) to predict the current time step t of the pollution. If I try to do this, I don’t know what kind of shape to give to the LSTM. Of course I always get an error because there is missing the one column of pollution data. Do you have an idea how to fit a model with input t-1(all parameter), t(all but pollution)?

Reply
- Jason Brownlee October 10, 2018 at 2:58 pm #
  
  I have general advice on how to prepare data for LSTMs here that might help:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm
  
  Reply
  - Chris October 11, 2018 at 12:39 am #
    
    Thanks for your quick response, Jason.
    It appears that this problem has not yet been addressed. The LSTM wants the input as [sample, timestep, feature]. But in my case (Input: t-1 of all features, t of all features without pollution; Output: t pollution) it is not possible to reshape the data into the dimensions [sample, timestep, feature] because all samples of timestep t from feature pollution are excluded from the input. I cannot find any way to reshape the data for this prediction problem. Thanks for your help.
    
    Reply
    - Jason Brownlee October 11, 2018 at 7:57 am #
      
      There are many ways to solve this problem. Perhaps the simplest would be to pad the missing pollution from the t time step with zero and perhaps make use of a masking input layer.
      
      Reply
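The zero-padding idea from the reply above can be sketched as a data-framing step. This is one possible reading of the suggestion, not code from the tutorial: the helper `frame_with_known_inputs` is hypothetical, the target is assumed to be column 0, and only one lag step is used to keep the example short.

```python
import numpy as np

def frame_with_known_inputs(data, target_col=0, pad_value=0.0):
    """Hypothetical helper. data is [time, features]; returns
    X shaped [samples, 2, features] and y shaped [samples]."""
    prev_step = data[:-1]                  # all features at t-1
    curr_step = data[1:].copy()            # all features at t
    y = curr_step[:, target_col].copy()    # value to predict
    curr_step[:, target_col] = pad_value   # hide the target from the input
    X = np.stack([prev_step, curr_step], axis=1)
    return X, y

# toy data: 10 time steps, 4 features
data = np.arange(40, dtype="float32").reshape(10, 4) + 1.0
X, y = frame_with_known_inputs(data)
print(X.shape, y.shape)
```

Because both time steps now have the same number of features, the array already has the [samples, timesteps, features] layout and needs no further reshaping.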
Wang October 10, 2018 at 9:47 pm #

Hi, Jason. In this example，which LSTM type you have used？

Reply
- Jason Brownlee October 11, 2018 at 7:55 am #
  
  A vanilla LSTM.
  
  Reply
Tao.J October 12, 2018 at 1:28 am #

Hi, Jason. Thank you for your post !
I have a question that whether the date and time info are used in the LSTM model?
I can’t find where we input the index to the model.
Some data may have time periodicity, so maybe it’s better to input the time info into the model?

Reply
- Jason Brownlee October 12, 2018 at 6:41 am #
  
  No, just the sequence of observations.
  
  You can make the series stationary prior to using the LSTM and likely achieve better performance.
  
  Reply
Z October 14, 2018 at 3:34 am #

Hi Jason, really enlightening tutorial! Thx. I think I found a small problem.
In the one-timestep prediction example you show, I found yhat is not at the same pace as test_y. You see the first four values of yhat are 0.035, 0.032, 0.021, 0.020 while those for from test_y are 0.031, 0.020, 0.019 and 0.018. So it seems that the second to the fourth values in yhat are about the same as the first to the third values in test_y. It seems like the prediction yhat is always one timestep later than it should be. Weird. So if I add the two lines
inv_y = inv_y[:-1]
inv_yhat = inv_yhat[1:]
before calculating RMSE and change nothing else, actually I can get RMSE = 4.234. But if I don’t add those two lines and use your codes literally, I can get RMSE = 26.370 which is similar to yours.

Reply
- Jason Brownlee October 14, 2018 at 6:06 am #
  
  This is called a persistence model and a poor neural net will converge to something like persistence as a worst case.
  
  Indeed, LSTMs often perform poorly for time series forecasting. Instead, I recommend always testing against linear methods (SARIMA/ETS) and compare results to an MLP, CNN and hybrids.
  
  Reply
  - Z October 26, 2018 at 1:47 pm #
    
    Thank you for your reply!
    
    Reply
Bob October 15, 2018 at 2:11 am #

Hi,master Jason. Can I use the wavelet decomposition and reconstruction with LSTM model to make prediction in this sample? If yes, and how can I do it?

Reply
- Jason Brownlee October 15, 2018 at 7:32 am #
  
  Perhaps. Sorry, I don’t have an example.
  
  Reply
Bob October 15, 2018 at 7:04 pm #

Another question， why is there no learning rate?

Reply
- Jason Brownlee October 16, 2018 at 6:35 am #
  
  We use Adam, that adapts the learning rate, more here:
  https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
  
  Reply
Bowen October 17, 2018 at 6:36 am #

Hi Jason, nice work going on right here! I was wondering if you can train lstm with multiple time series data? e.g. using your example, maybe use pollutions data on all cities (Beijing, New York etc. ) and then try to predict the pollutions trends on general earth. I would love to see a tutorial on that. Thanks for everything you do here! Respect!

Reply
- Jason Brownlee October 17, 2018 at 6:56 am #
  
  Yes, I call this multi-site forecasting and I have an example here:
  https://machinelearningmastery.com/how-to-develop-baseline-forecasts-for-multi-site-multivariate-air-pollution-time-series-forecasting/
  
  Reply
Charline October 17, 2018 at 11:40 pm #

Hi Jason,

I read your very good article https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/ . I am actually working on multivariate time series forecasting with LSTM.

I would like to predict total daily demand order Y for the next day based on Y and on the predicted attributes X over the last 10 days AND given the expected X for the next day. So I have Input: t-10 of all features, …, t-1 of all features, t of all features without Y and Output: t of Y.

In my first attempt, I have passed to the model all features X (from t-10 to t) and historical Y (from t-10 to t-1) in order to predict Y(t). However, I have seen that it is not possible to reshape the data into the dimensions [sample, timestep, feature] because all samples of timestep t from feature Y are excluded from the input.

Someone had the same problem than mine and you’d said « There are many ways to solve this problem. Perhaps the simplest would be to pad the missing pollution from the t time step with zero and perhaps make use of a masking input layer. »
I tried to do what you’d proposed for a week. In particular, I have taken Y(t) in my training set and set it to be equal to -1 (and in a second attempt to zero too). Then I applied the Masking function to the model for all -1 values during the training phase. However, the testing results were definitively wrong : to be clear, when I have set Y(t)= -1, the results of the model.predict were negative. So I guess I need to change something after I have trained the model, in order not to mess the testing predictions up.

I have tried to find an answer in these pages :
https://machinelearningmastery.com/handle-missing-timesteps-sequence-prediction-problems-python/
https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
https://machinelearningmastery.com/use-timesteps-lstm-networks-time-series-forecasting/
But I didn’t find any help.

To be more complete, you can find here my code :

# normalize features
scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)

n_days = 10
reframed = series_to_supervised(scaled, n_days, 1)
target_index = reframed.columns.get_loc("var1(t)")

# split into train and test sets
values = reframed.values
n_train_days = 30

train = values[:n_train_days, :]
test = values[n_train_days:, :]

# split into input and outputs
n_features = 13
n_obs = n_days * n_features
train_X, train_y = train, train[:, -n_features]
test_X, test_y = test, test[:, -n_features]

train_X[:,target_index]= -1.
test_X[:,target_index]= -1.

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], n_days+1, n_features))
test_X = test_X.reshape((test_X.shape[0], n_days+1, n_features))

# design network
model = Sequential()
model.add(Masking(mask_value=-1., input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(LSTM(100))
model.add(Dense(1))

# MSE loss function with the stochastic gradient descent (SGD) optimizer
model.compile(loss='mse', optimizer='sgd')

# fit network
history = model.fit(train_X, train_y, epochs=100, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)

# make a prediction : return in 2D
yhat = model.predict(test_X)

Thanks a lot for your time, I really hope you can help me.

Best regards.

Reply
- Jason Brownlee October 18, 2018 at 6:33 am #
  
  If I understand correctly, you want to model a forecast problem by having multivariate input including the series that will be predicted, then make a univariate prediction.
  
  I have an example of exactly this here, in the section titled “Encoder-Decoder LSTM Model With Multivariate Input”
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/
  
  Does that help?
  
  Reply
  - Charline October 18, 2018 at 7:51 pm #
    
    Hi Jason,  
    
    thank you very much for your quick answer. However I think I didn’t explain you very well my problem.   
    
    I guess the part that made you misunderstanding is this :
    
    «I would like to predict total daily demand order Y for the next day based on Y and on the predicted attributes X over the last 10 days AND given the expected X for the next day. So I have Input: t-10 of all features, …, t-1 of all features, t of all features without Y and Output: t of Y. »  
    
    When I say «given the expected X» I don’t mean that I need to predict X : X represents a projected value that has already been given to us. So, let me reformulate it in a better form :
    
     I would like to predict Y(t) based on Y(t-1),…,Y(t-n) AND X(t),X(t-1),…,X(t-n).
    
    Hope this helps 🙂
    
    Thanks again for your time and help.
    
    Reply
    - Jason Brownlee October 19, 2018 at 6:03 am #
      
      You want to predict the next y given past values of x and y.
      
      Sure, test a suite of models and see what works best:
      https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
      
      The example I linked to showed exactly this.
      
      Reply
      - Charline October 19, 2018 at 6:20 pm #
        
        Hi Jason,
        
        “You want to predict the next y given past values of x and y.” : no, I don’t. I want to predict the next y given :
        
        – past values of y,
        – the next value of x (supposing we have it), and
        – past values of x.
        
        Hope this looks clear now 🙂
        
        Thanks again !
      - Jason Brownlee October 20, 2018 at 5:53 am #
        
        I see, thanks for being clear.
        
        I believe you can adapt the example to achieve this. I cannot write the code for you, but what problem are you having in adapting the example exactly?
      - Charline October 22, 2018 at 6:21 pm #
        
        Hi Jason,
        
        I tried to adapt your example but I have seen that it is not possible to reshape the data into the dimensions [sample, timestep, feature] because Y(t) is excluded from the input, whereas X(t) is included. 
        
        In a previous discussion, someone had the same problem than mine and you proposed « There are many ways to solve this problem. Perhaps the simplest would be to pad the missing pollution from the t time step with zero and perhaps make use of a masking input layer. »
        
         I tried to do what you proposed for a week. In particular, I have taken Y(t) in my training set and set it to be equal to -1 (and in a second attempt to zero too). Then I applied the Masking function to the model for all -1 values during the training phase. However, the testing results were definitively wrong : to be clear, when I have set Y(t)= -1, the results of the model.predict were negative. So I guess I need to change something after I have trained the model, in order not to mess the testing predictions up.
        
        Thanks for your help : really hope to find a solution on that 🙂
      - Jason Brownlee October 23, 2018 at 6:23 am #
        
        I have an example of using a Masking layer here:
        https://machinelearningmastery.com/handle-missing-timesteps-sequence-prediction-problems-python/
        
        Perhaps this tutorial will help you understand what we’re trying to achieve by reshaping the data:
        https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
        
        Perhaps one of these other tutorials will help:
        https://machinelearningmastery.com/start-here/#deep_learning_time_series
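Two details may explain the negative predictions reported in this thread, assuming `values` is a plain NumPy array as in the posted code. First, basic slicing returns views, so `train_X` and `train_y` share memory and writing -1 into the input also overwrites the target. Second, the Keras `Masking` layer skips a time step only when every feature at that step equals `mask_value`, so a single -1 feature is passed through as an ordinary input. The sketch below uses toy shapes and NumPy only; it is an illustration, not a fix taken from the tutorial.

```python
import numpy as np

# toy stand-in for reframed.values: 5 rows, 3 time steps of 3 features each
n_features, n_steps = 3, 3
values = np.random.default_rng(0).random((5, n_steps * n_features))
target_index = values.shape[1] - n_features   # column of var1(t)

# as posted: X and y are both views of the same array
X_view, y_view = values, values[:, -n_features]
X_view[:, target_index] = -1.0
print(y_view)   # the target column was overwritten as well

# taking copies first keeps the target intact
values = np.random.default_rng(0).random((5, n_steps * n_features))
y = values[:, -n_features].copy()
X = values.copy()
X[:, target_index] = -1.0
X = X.reshape((X.shape[0], n_steps, n_features))

# rule applied by Keras Masking: a step is kept unless ALL features equal mask_value
mask = np.any(X != -1.0, axis=-1)
print(mask)     # every step is kept, so the -1 reaches the LSTM as a normal value
```

If the target is meant to be hidden at step t, padding it with a value inside the scaled range (for example zero) avoids feeding the network an out-of-range input.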
Gabriel Mouzella Silva October 20, 2018 at 2:02 am #

Hi jason,

I’m facing a problem with a multivariate time series analysis. I was looking into my results, and it seems that the predicted values only replicate the curve, but delayed, so when I try to put it online it doesn’t really predict. Could you please help me? Thanks

Reply
- Jason Brownlee October 20, 2018 at 5:57 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
  
  Reply
  - Gabriel Mouzella Silva October 22, 2018 at 7:41 am #
    
    Hi i’m not sure about my neural network being a persistence model or not, there is any way to measure it so i can be sure?
    
    Reply
    - Jason Brownlee October 22, 2018 at 2:31 pm #
      
      One approach would be to develop a persistence model, evaluate its performance, and only use a neural net if it can outperform the persistence model.
      
      I explain more here:
      https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
      
      Reply
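A persistence baseline of the kind described above can be sketched as follows. The series here is a synthetic random walk and the helper names are hypothetical; the point is only to show how the baseline RMSE is computed so that a model's RMSE on the same rows can be compared against it.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype="float64")
    predicted = np.asarray(predicted, dtype="float64")
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def persistence_forecast(series):
    """Predict each value as the previous observation.
    Returns (actual, predicted) aligned on the same time steps."""
    series = np.asarray(series, dtype="float64")
    return series[1:], series[:-1]

# synthetic random walk standing in for the test series
rng = np.random.default_rng(42)
series = np.cumsum(rng.normal(size=500))

actual, naive = persistence_forecast(series)
baseline_rmse = rmse(actual, naive)
print("persistence RMSE: %.3f" % baseline_rmse)
```

If the neural network's RMSE on the same test rows is not clearly below `baseline_rmse`, the network is probably doing little more than copying the last observation.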
Bob October 22, 2018 at 11:19 pm #

Hi, Jason. I have trouble, can you help me?

---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
in ()
4 # integer coding
5 encoder = LabelEncoder()
----> 6 values[:,4] = encoder.fit_transform(values[:,4])
7 # ensure all data is float
8 values = values.astype('float32')

IndexError: index 4 is out of bounds for axis 1 with size 2

Reply
- Jason Brownlee October 23, 2018 at 6:26 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Bob October 23, 2018 at 12:03 am #

What is LabelEncoder() used for?
values[:,4] = encoder.fit_transform(values[:,4])
Why 4?

Reply
- Jason Brownlee October 23, 2018 at 6:26 am #
  
  A label encoder converts string labels to integers/numbers.
  
  4 in this case refers to column with the index 4. You can learn more about array indexing in Python here:
  https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
  
  Reply
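To make the label-encoding step concrete, here is a small NumPy sketch of the same idea. The wind direction strings are illustrative, and `np.unique` is used in place of `LabelEncoder`; as far as I know both assign integer codes according to the sorted order of the distinct labels.

```python
import numpy as np

# illustrative categorical column, e.g. wind direction stored as text
wind = np.array(["SE", "cv", "NW", "SE", "NE", "cv"])

# classes: sorted distinct labels; codes: integer position of each label in classes
classes, codes = np.unique(wind, return_inverse=True)
print(classes)
print(codes)

# the original strings can be recovered from the codes
decoded = classes[codes]
```

In the tutorial's line `values[:,4] = encoder.fit_transform(values[:,4])`, the index 4 simply selects the fifth column of the array, which is assumed here to be the one holding text labels.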
Bob October 24, 2018 at 12:01 am #

Hi，Jason. I still can’t understand “lag timesteps=5 and 5 timesteps ahead”，What are the meaning of them and what are the differences between them?

Reply
- Jason Brownlee October 24, 2018 at 6:29 am #
  
  It has to do with the inputs and outputs of the model.
  
  Perhaps this will help:
  https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
  
  Reply
Burak Küçükaslan October 24, 2018 at 12:17 am #

Hi Jason,

I want to implement your “Multivariate Time Series Forecasting LSTM Keras” model for forecasting spot electricity price.

So for this purpose I collected the data which I’ll use in the forecasting model.

At the beginning, my first aim is just to get the code running smoothly.

So with this purpose I used limited input data: wind plant electricity production data and electricity consumption data.

But I couldn’t get the code to run smoothly; every attempt produced an error.

If you don’t mind, can you help me with feeding my input data into your forecasting model and modifying your model code parameters?

I uploaded my data file at the link.

https://drive.google.com/file/d/1q0fSAPPVNDDr23o2Z_FgloI2ucmk0EWj/view?usp=sharing

Reply
- Jason Brownlee October 24, 2018 at 6:30 am #
  
  Sorry, I don’t have the capacity to work on your project.
  
  My tutorials and my book will teach you how to work through this type of problem by yourself. Perhaps start here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Bob October 27, 2018 at 12:43 am #

Hi, Jason. In this example, how can I reduce the training time?

Reply
- Jason Brownlee October 27, 2018 at 6:02 am #
  
  Some ideas:
  
  – smaller network
  – faster hardware
  – fewer training examples
  – fewer training epochs.
  – larger batch size
  …
  
  Reply
  - Bob October 27, 2018 at 5:24 pm #
    
    Yes, what you said is quite useful! But I don’t want to sacrifice model performance to reduce the training time. So, how can I do that?
    
    Reply
    - Jason Brownlee October 28, 2018 at 6:08 am #
      
      I don’t understand, sorry.
      
      Reply
Bob October 27, 2018 at 11:10 pm #

rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
mae=mean_absolute_error(inv_y,inv_yhat)

Why is the second line wrong in my model? I am using Anaconda. Can you help me?

Reply
- Jason Brownlee October 28, 2018 at 6:11 am #
  
  Perhaps post your code and problem to stackoverflow so that they can debug it for you?
  
  Reply
Bob October 29, 2018 at 9:52 pm #

Hi, Dr. Jason. Today I ran a small test. I found that if I drop all the other features and leave only the “pollution” feature in your model, the test RMSE and the predicted pollution curve are the same as yours. Why? I can’t figure it out.

Reply
- Jason Brownlee October 30, 2018 at 6:01 am #
  
  Nice work. Perhaps the additional features are not required.
  
  Reply
  - Bob October 30, 2018 at 11:16 pm #
    
    It seems that the model in your example is not really making a multivariate prediction. As I see it, we should not build the model with the output feature as an input. Var1 is the pollution in your model, so it should not be used as an input value. We should use the other features as the inputs and the pollution as the output to make predictions, that’s all.
    
    Reply
MAK October 31, 2018 at 8:55 am #

Hello Jason!
Wonderful blog,
I have a question :
If I want to predict not only the pollution but also other attributes like dew, temp, press (or all the other attributes), what changes do I need to make to the model (and your code) to allow multivariable forecasting?
In addition, will this hurt the model accuracy, and will I need to change the hyperparameters (like the number of epochs, etc.)?
Thanks,
Mak

Reply
- Jason Brownlee October 31, 2018 at 2:54 pm #
  
  Good question.
  
  This requires that you change the data samples to have n variables as input with m time steps, then the target would become a vector of n variables and probably 1 time step.
  
  The model would require n nodes in the output layer.
  
  You can then measure MSE or RMSE for all variables together or for each variable separately.
  
  Compare results to a separate linear model for each variable.
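
A rough sketch of the reframing described above, using NumPy only. The random array is a stand-in for the scaled dataset, and `n_steps`, `n_rows` and the loop are illustrative assumptions rather than the tutorial's own code.

```python
import numpy as np

n_features = 8   # n input variables, as in the pollution dataset
n_steps = 3      # m lag time steps (arbitrary choice for illustration)
n_rows = 20      # length of the toy series

rng = np.random.default_rng(0)
data = rng.random((n_rows, n_features))  # stand-in for the scaled dataset

X, y = [], []
for t in range(n_steps, n_rows):
    X.append(data[t - n_steps:t, :])  # previous m steps, all n variables
    y.append(data[t, :])              # next step, all n variables (vector target)
X, y = np.array(X), np.array(y)

print(X.shape)  # (17, 3, 8) -> [samples, timesteps, features]
print(y.shape)  # (17, 8)    -> one target per variable
# The output layer would then need n nodes, e.g. Dense(n_features) in Keras.
```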
  
  Reply
  - MAK October 31, 2018 at 8:35 pm #
    
    Hi Jason,
    
    So the change needs to be something like:
    ***************************************************
    n_obs = n_hours * n_features
    n_predict_features=2
    train_X, train_y = train[:, :n_obs], train[:, -(n_features-n_predict_features)]
    model = Sequential()
    model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(Dense(n_predict_features)) # need to change this line
    model.compile(loss='mae', optimizer='adam')
    **********************************************************
    Is this the only change I need to make, or am I missing something?
    In addition, do you think I need to increase the number of epochs or change any other hyperparameter?
    
    Thanks,
    
    Reply
    - Jason Brownlee November 1, 2018 at 6:06 am #
      
      Yes, change the framing of the problem and change the model.
      
      Reply
Roh November 1, 2018 at 3:45 am #

Hi Jason,

Thanks for the great article!

I just started working with multivariate time series. I understand the concept of stationarity in a univariate series. How do we handle it for a multivariate series? Do we have to make each input feature stationary individually, along with the output?

Thank you!

Reply
- Jason Brownlee November 1, 2018 at 6:22 am #
  
  Yes, you could try modeling the raw data and then compare results when modeling with a stationary version of each series.
  
  Reply
Jessie November 1, 2018 at 1:13 pm #

I get “ValueError: operands could not be broadcast together with shapes (592095,209) (21,) (592095,209)”, but I have no idea how to fix this problem. I hope that someone can help me. Thanks.

yhat = model.predict(test_X)

test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)

inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat.head()
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]

rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)

Reply
- Jason Brownlee November 1, 2018 at 2:34 pm #
  
  Are you able to confirm that your version of Keras, TensorFlow and Python are up to date?
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Jessie November 1, 2018 at 4:01 pm #
    
    tensorflow:1.11.0
    keras:2.2.4
    python:3.6.6
    
    Reply
    - Jason Brownlee November 2, 2018 at 5:45 am #
      
      Nice work!
      
      Reply
      - Jessie November 2, 2018 at 10:50 pm #
        
        //aqhi(column0)
        dataset = pd.read_csv('data.csv', header=0, index_col=0)
        
        locat = list(dataset.locationCode.unique())
        for i in locat:
        df=dataset.loc[dataset.locationCode == i,:].drop(columns=['locationCode'])
        values = df.values
        # ensure all data is float
        values = values.astype('float32')
        # normalize features
        scaler = MinMaxScaler(feature_range=(0, 1))
        scaled = scaler.fit_transform(values)
        # frame as supervised learning
        '''
        plt.plot(range(dataset.shape[0]),(dataset['aqhi']))
        plt.xticks(range(0,dataset.shape[0],250),dataset['dateTime'].loc[::250],rotation=45)
        plt.xlabel('Date',fontsize=20)
        plt.ylabel('AQHI',fontsize=20)
        plt.show()
        '''
        # convert series to supervised learning
        def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
            # past observations (n_in-1, n_out-1) are used to make forecasting
            # data: Sequence of observations as a list or NumPy array.
            # n_in: Num of lag observations as independent (X). => VALUE(1 - len(data))
            # n_out: Num of observations as dependent (Y). => VALUE(0 - len(data)-1)
            # dropnan: Boolean whether or not to drop rows with NaN values.
            # Returns: Series framed for supervised learning.
        
            n_vars = 1 if type(data) is list else data.shape[1]
            df = DataFrame(data)
            cols = list()
            names = list()
            # input sequence (t-n, ... t-1)
            for i in range(n_in, 0, -1):
                cols.append(df.shift(i))
                # shift function also works on so-called multivariate time series problems
                names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
                # [var1(t-1)....var11(t-1)]
        
            # forecast sequence (t, t+1, ... t+n)
            for i in range(0, n_out):
                cols.append(df.shift(-i))
                # append value to list
                if i == 0:
                    names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
                else:
                    names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
            # put it all together
            agg = concat(cols, axis=1)
            agg.columns = names
            # drop rows with NaN values
            if dropnan:
                agg.dropna(inplace=True)
            return (agg)
        
        reframed = series_to_supervised(scaled, 1, 9)
        print(reframed.head())
        
        ### fit an LSTM on the multivariate input data(split dataset into train and test data sets)
        # split into train and test sets
        values = reframed.values
        hours = 365*24*2
        train = values[:hours, :]
        test = values[hours:, :]
        
        # split into input and outputs
        train_X, train_y = train[:, :-1], train[:, -1]
        test_X, test_y = test[:, :-1], test[:, -1]
        # reshape input to be 3D [samples, timesteps, features]
        train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
        test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
        print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
        
        # design network
        model = Sequential()
        model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
        model.add(Dense(1))
        model.compile(loss='mae', optimizer='adam')
        # fit network
        history = model.fit(train_X, train_y, epochs=50, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)
        # plot history
        pyplot.plot(history.history['loss'], label='train')
        pyplot.plot(history.history['val_loss'], label='test')
        pyplot.legend()
        pyplot.show()
        
        # make a prediction
        yhat = model.predict(test_X)
        test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
        # invert scaling for forecast
        inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
        inv_yhat = scaler.inverse_transform(inv_yhat)
        inv_yhat = inv_yhat[:,0]
        # invert scaling for actual
        test_y = test_y.reshape((len(test_y), 1))
        inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
        inv_y = scaler.inverse_transform(inv_y)
        inv_y = inv_y[:,0]
        # calculate RMSE
        rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
        print('Test RMSE: %.3f' % rmse)
      - Jason Brownlee November 3, 2018 at 7:06 am #
        
        Sorry, I don’t have the capacity to debug your code, I have some suggestions here:
        https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  - Jessie November 2, 2018 at 1:42 am #
    
    But I have no idea why I get this error, even though I have searched for related keywords on StackOverflow.
    
    Reply
Juan B. November 2, 2018 at 4:49 am #

Hi Jason,

I am new to ML and I appreciate your posts. I have a multi-input forecasting problem. I used your code and it works well for predicting values that I already have. My data covers 2004 to 2017 (all inputs) and I want just one output. However, the code predicts, for example, the last 10 observations from 2017, whereas I want to predict the first step of 2018.
Does the code work for that? How can I use it? I understand that this is a request for an unsupervised problem.

Thanks

Reply
- Jason Brownlee November 2, 2018 at 5:59 am #
  
  Fit the model on all available data, then make a prediction for the new data. More here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-make-predictions
  
  What problem are you having exactly?
  
  Reply
  - Juan B. November 7, 2018 at 3:38 am #
    
    Hi Jason , thanks for the fast answer.
    
    The problem is that I need to predict values for the future, for the next time stamp.
    
    I understand that with your code I can predict values that already exist in my available data, but not future values. So the result is a prediction of existing values, evaluated using the RMSE. Am I right?
    
    Reply
    - Jason Brownlee November 7, 2018 at 6:11 am #
      
      Call “model.predict()” to make a prediction beyond the dataset.
      
      I explain this here:
      https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
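
A sketch of how the input for the next, unseen time step could be assembled, assuming a model trained on `n_hours` lag timesteps of `n_features` variables as in the multiple-lag example. The random array stands in for the scaled dataset and the fitted model is not included, so the final call appears only as a comment.

```python
import numpy as np

n_hours, n_features = 3, 8
rng = np.random.default_rng(1)
scaled = rng.random((100, n_features))  # stand-in for the scaled dataset

# The most recent n_hours observations form the input for the next step.
last_window = scaled[-n_hours:, :]
# Keras LSTMs expect [samples, timesteps, features]; here there is one sample.
new_X = last_window.reshape((1, n_hours, n_features))
print(new_X.shape)  # (1, 3, 8)

# With a fitted model this input would be passed as: yhat = model.predict(new_X)
```

The prediction would still be in scaled units and would need the same inverse transform as the test predictions.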
      
      Reply
      - Juan B. November 7, 2018 at 6:37 am #
        
        Jason, but you already did that in your code, didn’t you?
        
        Line 82
        
        # make a prediction
        yhat = model.predict(test_X)
        
        So the predicted result “yhat” is what I need?
      - Jason Brownlee November 7, 2018 at 2:42 pm #
        
        Yes.
jim n November 6, 2018 at 4:39 pm #

Using the latest release (pip3 install tensorflow-gpu) as of this date and tweaking the imports to use tf.keras, model.fit() throws

AttributeError: 'Tensor' object has no attribute 'assign'

The values being passed in are ndarrays.

This is my first Keras endeavor, and I’m afraid all the bug reports and patch requests about this error exceed my grasp of how to remedy the situation.

Reply
- Jason Brownlee November 7, 2018 at 5:59 am #
  
  I developed the code with the standalone Keras library, not tf.keras.
  
  sudo pip install keras
  
  Reply
jessie November 6, 2018 at 5:58 pm #

After using Multivariate Time Series Forecasting with LSTMs to make a prediction, how do I get the date-time of the prediction along with its value?

Reply
- Jason Brownlee November 7, 2018 at 6:00 am #
  
  I show how to make a prediction here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Thomas Lass November 6, 2018 at 11:52 pm #

Please, can I get the MATLAB code for Multivariate Time Series Forecasting with LSTM? This is my email: thomaslass2002@gmail.com

Reply
- Jason Brownlee November 7, 2018 at 6:06 am #
  
  Sorry, I don’t have any MATLAB tutorials.
  
  I explain why here:
  https://machinelearningmastery.com/faq/single-faq/do-you-have-tutorials-in-octave-or-matlab
  
  Reply

Carolyn November 9, 2018 at 6:06 am #

Hi Jason,

Excellent tutorial! I’ve noticed folks asking for how to code a similar model but for multiple outputs. I’ve taken a stab at it below, modifying your multiple lags code.

Changes have the (subtle!) comment CHANGES HERE.

This model predicts the variable ‘pollution’ and the variable ‘dew’.

Problem: I have one RMSE score for each output variable. Is that right? I think not. What should I do instead?

The code:

#load dataset
dataset = read_csv('pollution.csv', header=0, index_col=0)
values = dataset.values

#integer encode wind direction, as it's the only categorical variable.
encoder = LabelEncoder()
values[:,4] = encoder.fit_transform(values[:,4])

#ensure all data are float32 values
values = values.astype('float32')

#normalize input features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)

#frame as supervised learning
n_hours = 3 
n_features = 8 
reframed = series_to_supervised(scaled, n_hours, 1)
values = reframed.values
n_train_hours = 365*24
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]

#CHANGES HERE
#split into input and outputs
n_obs = n_hours * n_features
train_X = train[:, :n_obs]
train_y = train[:, -n_features:(-n_features+2)] #+2 because of indexing madness.
test_X = test[:, :n_obs]
test_y = test[:, -n_features:(-n_features+2)]

train_X = train_X.reshape((train_X.shape[0], n_hours, n_features))
test_X = test_X.reshape((test_X.shape[0], n_hours, n_features))

#CHANGES HERE
#Need to output two values, not one.
#design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(2)) #changed from 1 to 2.
model.compile(loss='mae', optimizer='adam')

#fit network
history = model.fit(train_X, train_y, epochs=4, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)

#make a prediction
y_hat = model.predict(test_X)

#CHANGES HERE
test_X = test_X.reshape((test_X.shape[0], n_hours*n_features))
inv_yhat = concatenate((y_hat, test_X[:,-6:]), axis=1) #changed 7 to 6
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0:2] #changed from 0 to 0:2. Should be first 2 columns that contain the predictions

#CHANGES HERE
#invert scaling for actual
test_y = test_y.reshape((len(test_y),2)) #changed 1 to 2
inv_y = concatenate((test_y, test_X[:,-6:]), axis=1) #changed 7 to 6
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0:2] #changed from 0 to 0:2. Should be first 2 columns that contain the predictions.

#CHANGES HERE
#calculate RMSE - CHANGED to output RMSE for each variable.
rmse_1 = sqrt(mean_squared_error(inv_y[:,0], inv_yhat[:,0])) #RMSE for the first variable (pollution)
rmse_2 = sqrt(mean_squared_error(inv_y[:,1], inv_yhat[:,1])) #RMSE for the second variable (dew)
print('Test RMSE: ', rmse_1, rmse_2)

Jason Brownlee November 9, 2018 at 1:57 pm #

Thanks for sharing.

Reply
- Carolyn November 10, 2018 at 2:01 am #
  
  Hi Jason,
  
  Thanks for the reply. The problem in the code is that there is one RMSE score for each output variable. Is that right? If not, what should I do instead?
  
  Best regards,
  Carolyn
  
  Reply
  - Jason Brownlee November 10, 2018 at 6:09 am #
    
    Yes, you can report RMSE for each lead time or combine RMSE into a single score, or both.
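
A short sketch of both options using NumPy, with made-up actual and predicted values for two output variables. Note that a combined score mixes the units of the variables when it is computed on unscaled data, so it is only one possible summary.

```python
import numpy as np

# Made-up values: 4 samples, 2 output variables (columns).
inv_y = np.array([[10.0, 1.0], [12.0, 2.0], [14.0, 3.0], [16.0, 4.0]])
inv_yhat = np.array([[13.0, 1.0], [9.0, 2.0], [17.0, 5.0], [13.0, 4.0]])

errors = inv_y - inv_yhat
# One RMSE per output variable (averaging over samples only).
rmse_per_variable = np.sqrt(np.mean(errors ** 2, axis=0))
# A single RMSE over all samples and all output variables.
rmse_overall = np.sqrt(np.mean(errors ** 2))

print(rmse_per_variable)  # [3. 1.]
print(rmse_overall)       # sqrt(5), about 2.236
```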
    
    Reply

vedant wankhede November 10, 2018 at 6:42 am #

Hello Sir,
Thank you for this great tutorial!
I would be grateful for some tips for my project.
I have hourly data for weather parameters and solar irradiation.
I want to predict the solar irradiance from those weather parameters (wind velocity, air temperature, relative humidity).
Could you tell me whether this multivariate LSTM model is suitable for my purpose, or should I use another one?
I have already applied a statistical approach using algorithms like random forest, decision trees and multivariate linear regression. However, I want to use neural networks for the same task, as my data is highly nonlinear and time dependent.
Your answer would be greatly helpful. Thank you.

Reply
- Jason Brownlee November 11, 2018 at 5:53 am #
  
  I recommend this process:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  Reply
Bob November 12, 2018 at 3:30 pm #

train_X.shape[1], train_X.shape[2]
I know that “train_X.shape[0]” means the rows and “train_X.shape[1]” means the columns.
But what does “train_X.shape[2]” mean?

Reply
- Jason Brownlee November 13, 2018 at 5:41 am #
  
  It would refer to the third dimension of the array.
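
For a quick illustration with NumPy: the LSTM input in the tutorial is a 3D array laid out as [samples, timesteps, features], so the three entries of `shape` map to those three dimensions. The sizes below are arbitrary.

```python
import numpy as np

n_samples, n_timesteps, n_features = 5, 3, 8
train_X = np.zeros((n_samples, n_timesteps, n_features))

print(train_X.shape[0])  # 5 -> number of samples
print(train_X.shape[1])  # 3 -> timesteps per sample
print(train_X.shape[2])  # 8 -> features per timestep
```

This is why the tutorial passes `input_shape=(train_X.shape[1], train_X.shape[2])`: the LSTM layer is told the timesteps and features, but not the number of samples.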
  
  Reply
Bob November 12, 2018 at 4:23 pm #

Hi, Dr. Jason. I have another question:
If I use a BPNN instead of the LSTM,
my model has 3 input timesteps and 1 output timestep, with 9 features.
I did it like this:
# design network
model = Sequential()
model.add(Dense(100,input_dim=27,kernel_initializer="uniform")) # input_dim=27,
model.add(Activation('sigmoid'))
model.add(Dropout(0.01))
model.add(Dense(100,input_dim=27,kernel_initializer="uniform")) # input_dim=27,
model.add(Activation('sigmoid'))
model.add(Dropout(0.01))
model.add(Dense(100,input_dim=27,kernel_initializer="uniform")) # input_dim=27,
model.add(Activation('sigmoid'))
model.add(Dropout(0.01))
model.compile(loss='mae', optimizer='adam')

But I get the error below. What should I do?
ValueError: Error when checking input: expected dense_5_input to have 2 dimensions, but got array with shape (18041, 3, 9)

Reply
- Jason Brownlee November 13, 2018 at 5:42 am #
  
  It suggests that the expectation of your model and the shape of your data differ.
  
  You could change your model or change your data.
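
One way to change the data, sketched with NumPy: a Dense input layer expects 2D input [samples, features], so the 3D array from the error message can be flattened so that each sample becomes a single vector of 3 * 9 = 27 values. The zero array is only a stand-in with the reported shape.

```python
import numpy as np

X = np.zeros((18041, 3, 9))  # shape reported in the error message
# Merge the timestep and feature dimensions into one vector per sample.
X_flat = X.reshape((X.shape[0], X.shape[1] * X.shape[2]))
print(X_flat.shape)  # (18041, 27)
```

The flattened width matches the `input_dim=27` used in the model above.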
  
  Reply
Richard November 13, 2018 at 12:49 am #

Hello, Jason.
I would now like to do multivariate time series forecasting with a BP neural network. In this example, how can I do it faster?

Reply
- Jason Brownlee November 13, 2018 at 5:46 am #
  
  What do you mean by faster?
  
  Reply
  - Richard November 13, 2018 at 4:12 pm #
    
    I mean, can I use Keras to build a BPNN model?
    
    Reply
    - Jason Brownlee November 14, 2018 at 7:26 am #
      
      By BPNN, do you mean MLP? If so, sure. Start here:
      https://machinelearningmastery.com/how-to-develop-multilayer-perceptron-models-for-time-series-forecasting/
      
      Reply
      - Richard November 15, 2018 at 12:01 am #
        
        I mean, BP neural networks.
      - Jason Brownlee November 15, 2018 at 5:33 am #
        
        Most neural networks have their weights updated using BP == backpropagation, including CNNs, LSTMs and MLPs. It does not comment on the structure/type of the network, only how the weights are updated during training.
Mike Yang November 14, 2018 at 1:27 pm #

Hello Dr Jason,
Thank you for your Great tutorial

Actually, I have a small question.

In the one-timestep prediction example you show, I found yhat is not at the same pace as test_y.

When I plot the last 100 samples as you do.

pyplot.plot(inv_yhat[-100:])
pyplot.plot(inv_y[-100:])
pyplot.show()

It seems like the prediction yhat is always one timestep later than it should be.
If I add the two lines
inv_y = inv_y[:-1]
inv_yhat = inv_yhat[1:]
before calculating RMSE and change nothing else, the RMSE is much smaller and yhat is perfectly in step with test_y.

What’s more, this problem also happened in your other examples such as this one
https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

and this one
https://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/

Can you explain why there is a one-day delay in the result?
Why is it exactly a one-day delay in every example?

Reply
- Jason Brownlee November 14, 2018 at 2:03 pm #
  
  Yes, the model has learned a persistence model, meaning that it cannot do better than the most naive model.
  
  LSTMs are generally poor at time series forecasting (yet everyone wants to know how to use them), I recommend reading this:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  And this:
  https://machinelearningmastery.com/findings-comparing-classical-and-machine-learning-methods-for-time-series-forecasting/
  
  Reply
  - Mike November 15, 2018 at 7:19 am #
    
    Thank you Jason.
    
    Now I am a little confused.
    
    What is a persistence model?
    
    If we plot the result like this
    pyplot.plot(inv_yhat[-100:])
    pyplot.plot(inv_y[-100:])
    pyplot.show()
    
    Why does a persistence model so consistently show exactly a one-day delay in the prediction result?
    
    Can you explain that in detail?
    
    Can we eliminate the delay in the persistence model?
    
    Thanks a lot
    
    Reply
    - Jason Brownlee November 15, 2018 at 11:29 am #
      
      A persistence model uses the input as the output. If the input is the observation yesterday, then the output will have a 1-day delay.
      
      If your model learns a persistence model, you may have to change the configuration of the model or the model itself. I have suggestions here:
      https://machinelearningmastery.com/improve-deep-learning-performance/
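
A small NumPy sketch of a persistence forecast on made-up observations. It shows why the prediction looks delayed by one step, and why shifting the predictions as suggested earlier in the thread makes the error vanish for a pure persistence model. That shift compares each prediction with the value it was copied from, so it is not a valid evaluation.

```python
import numpy as np

obs = np.array([5.0, 7.0, 6.0, 9.0, 12.0, 11.0, 8.0])  # made-up series

# Persistence forecast: the prediction for time t is the observation at t-1.
y_true = obs[1:]
y_pred = obs[:-1]
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# Shift the predictions one step back (same idea as inv_y[:-1], inv_yhat[1:]).
shifted_rmse = np.sqrt(np.mean((y_true[:-1] - y_pred[1:]) ** 2))

print(rmse)          # error of the naive forecast
print(shifted_rmse)  # 0.0, because each prediction is a copy of the previous actual
```

The `rmse` value is a useful baseline: a trained model should score better than it to be considered skillful.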
      
      Reply
- Xiang November 14, 2018 at 11:58 pm #
  
  Well done, Yang. Can we be friends? This is my Q:44706602.
  I am quite interested in multi-step forecasting, and I would be very glad to be in touch with you.
  
  Reply
Masahiro November 14, 2018 at 6:20 pm #

Hi Jason. Thank you for the great posts, as always.

I have tried to predict the difference between current and one-step ahead values instead of one-step ahead value itself.
Is this effective to avoid a persistence model?

Reply
- Jason Brownlee November 15, 2018 at 5:27 am #
  
  Not quite, differencing the data is a good strategy to make it stationary if there is a trend.
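
A minimal NumPy sketch of differencing and of turning a predicted difference back into a forecast in the original units. The series and the predicted difference are invented for illustration.

```python
import numpy as np

series = np.array([10.0, 12.0, 15.0, 19.0, 24.0])  # made-up trending series

# First-order differencing removes the level: diff[i] = series[i+1] - series[i].
diff = np.diff(series)
print(diff)  # [2. 3. 4. 5.]

# Suppose a model predicts the next difference (value invented here).
predicted_diff = 6.0
# Invert by adding the predicted change to the last known observation.
forecast = series[-1] + predicted_diff
print(forecast)  # 30.0

# The whole series can be rebuilt from the first value and the differences.
restored = np.concatenate(([series[0]], series[0] + np.cumsum(diff)))
```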
  
  Reply
Kashyap Maheshwari November 15, 2018 at 7:38 am #

Hey Jason,

I tried carrying out the same procedure as you have shown here, but I am getting the following error

yhat = model.predict(X_test)
X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[2]))
# invert scaling for forecast
inv_yhat = pd.concat((yhat, X_test[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
y_test = y_test.reshape((len(y_test), 1))
inv_y = pd.concat((y_test, X_test[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)
Traceback (most recent call last):

File “”, line 4, in
inv_yhat = pd.concat((yhat, X_test[:, 1:]), axis=1)

File “C:\Users\kashy\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py”, line 225, in concat
copy=copy, sort=sort)

File “C:\Users\kashy\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py”, line 286, in __init__
raise TypeError(msg)

TypeError: cannot concatenate object of type “”; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

Could you tell me where I am making a mistake?

Reply
- Jason Brownlee November 15, 2018 at 11:30 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Kashyap Maheshwari November 16, 2018 at 3:44 am #
    
    Hey,
    I found a workaround for that piece of code and it did work out
    
    # make a prediction
    yhat = model.predict(X_test)
    X_test = X_test.reshape((X_test.shape[0], X_test.shape[2]))
    X_test = scaler.inverse_transform(X_test)
    
    #invert scaling for forecast
    # create empty table with 8 fields
    yhat_inv = np.zeros(shape=(len(yhat), 8))
    # put the predicted values in the right field
    yhat_inv[:,0] = yhat[:,0]
    # inverse transform and then select the right field
    yhat = scaler.inverse_transform(yhat_inv)[:,0]
    
    # invert scaling for actual
    y_test_inv = np.zeros(shape=(len(y_test), 8))
    y_test = y_test.reshape(y_test.shape[0],1)
    y_test_inv[:,0] = y_test[:,0]
    y_test = scaler.inverse_transform(y_test_inv)[:,0]
    
    # calculate RMSE
    from sklearn.metrics import mean_squared_error
    rmse = np.sqrt(mean_squared_error(y_test,yhat))
    print('Test RMSE: %.3f' % rmse)
    
    Reply
    - Jason Brownlee November 16, 2018 at 6:17 am #
      
      Glad to hear it.
      
      Reply
Siddhesh wani November 16, 2018 at 7:40 pm #

Hi jason,

Thanks for the great tutorial. I’m trying a similar kind of modelling, but my application needs iterative predictions. By iterative predictions I mean using the current prediction as an input for the next prediction, and so on. In the example given in the post, you predict for the whole range of X values in one go. My requirement is to use the previous (n) samples to predict the next value (t=1), then combine this predicted value with the previous (n-1) samples to make a new sample of length (n), use this new sample to predict (t=2), and so on. Although my model gives good results when predicting in one go for the available samples, it fails for iterative predictions. Can you share your thoughts about it?

Reply
- Jason Brownlee November 17, 2018 at 5:46 am #
  
  Yes, this is called recursive forecasting. Let me know how you go.
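
A sketch of the recursive loop described in the question, for a univariate series. The function `predict_one_step` is a hypothetical stand-in for a fitted model's one-step prediction (here simply the mean of the window), and `recursive_forecast` is an illustrative helper, not part of the tutorial.

```python
import numpy as np

def predict_one_step(window):
    """Hypothetical stand-in for a fitted model: returns the mean of the window."""
    return float(np.mean(window))

def recursive_forecast(history, n_lag, n_steps):
    """Forecast n_steps ahead, feeding each prediction back in as an input."""
    window = list(history[-n_lag:])
    forecasts = []
    for _ in range(n_steps):
        yhat = predict_one_step(window)
        forecasts.append(yhat)
        # Drop the oldest value and append the new prediction.
        window = window[1:] + [yhat]
    return forecasts

history = [1.0, 2.0, 3.0, 4.0]
print(recursive_forecast(history, n_lag=2, n_steps=3))  # [3.5, 3.75, 3.625]
```

Because each step consumes earlier predictions, errors can compound over the horizon, which may explain good one-shot results alongside poor iterative ones. With multivariate inputs, future values of the other features would also have to be predicted or supplied.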
  
  Reply
Schveta November 17, 2018 at 8:06 am #

Hi Jason,

Thank you so much for an amazing tutorial! I managed to use your techniques on my data set and got forecast results. However, I am getting a validation loss that is slightly lower than the training loss. Why do you think that is the case?

Secondly, all the test data is converted to supervised time series and normalized. How do I convert it back to how it was – unscaled and unsupervised, so that I get rid of the lagging variables and get back the raw data? I want to append unscaled y_inv and yhat to this dataframe and have a collective view of what was the input, what is the real value and what is the predicted value. How can this be obtained?

Reply
- Jason Brownlee November 18, 2018 at 6:34 am #
  
  It may be because the validation dataset is less representative than the training dataset, e.g. it’s easier.
  
  You can perform an inverse transform to get back to original units.
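
A sketch of the inverse transform for predictions of a single column, assuming scikit-learn's MinMaxScaler was fit on all columns as in the tutorial. The raw data and the scaled predictions are made up; padding with zeros is one possible approach, similar to the workaround posted earlier in the thread.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up raw data: 3 features, the first one being the target.
raw = np.array([[10.0, 1.0, 100.0],
                [20.0, 2.0, 200.0],
                [30.0, 3.0, 300.0],
                [40.0, 4.0, 400.0]])
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(raw)

# Pretend these are scaled predictions for the first column only.
yhat_scaled = np.array([0.0, 0.5, 1.0])

# The scaler expects the same number of columns it was fit on, so pad the
# predictions with zeros before inverting, then keep only column 0.
padded = np.zeros((len(yhat_scaled), raw.shape[1]))
padded[:, 0] = yhat_scaled
yhat = scaler.inverse_transform(padded)[:, 0]
print(yhat)  # [10. 25. 40.]
```

The unscaled predictions and actual values can then be placed alongside the original rows they correspond to, for example as extra columns of a DataFrame.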
  
  Reply
Tom November 18, 2018 at 3:42 am #

Hi Jason,
Many thanks for your tutorial. I’ve tried to apply it to a problem related to CPU utilization. I need to forecast the usage of four CPUs (cpu1 cpu2 cpu3 cpu4) in the next iteration based on the present usage and an additional variable (ch), which is in fact the root cause of the CPU utilization.
For some unknown reason, the learning process starts with a huge mean_squared_error:

Epoch 1/200
– 2s – loss: 2157.2555 – val_loss: 1959.0597
Epoch 2/200
– 2s – loss: 1994.9966 – val_loss: 1823.9065

and ends with a much lower value, but it’s still unsatisfying.

Epoch 199/200
– 2s – loss: 154.8171 – val_loss: 126.7922
Epoch 200/200
– 2s – loss: 150.6429 – val_loss: 126.9605

Do you have an idea what is wrong?

The data basically looks like this:

ch cpu1 cpu2 cpu3 cpu4
7 24,02 2 0 0
47 24,19 2 0 0
87 25,25 2 0 0
128 25,98 2 0 0
167 26,5 2 0 0
…
2050 28,02 5,29 2,35 9,42
2093 28,02 5,4 2,35 9,58
2134 28,02 5,51 2,35 9,73
…
6014 30,04 14,69 8,02 32,57
6054 30,04 14,77 8,06 32,81
6094 30,1 14,85 8,08 33,08
…
13818 40,56 32,55 60,71 92,31
13818 40,56 32,58 60,71 92,24
13818 40,52 32,61 60,71 92,13

Reply
- Jason Brownlee November 18, 2018 at 6:47 am #
  
  Sounds like a fun project.
  
  Perhaps scale the data?
  Perhaps start with a linear model per series?
  
  Reply
  - Tom November 20, 2018 at 5:26 am #
    
    >> Sounds like a fun project.
    Indeed, I truly believe that ML can give better results than standard approach.
    Will give you feedback after all
    >> Perhaps scale the data?
    Please advise,
    CPU values are 0-100 -> scale to 0-1?
    ch is 0-20k (maybe 30k or even more); I can’t estimate the max value -> what scaling function can I use here?
    >> Perhaps start with a linear model per series?
    Do you mean another predictor, e.g. linear regression?
    
    Reply
    - Jason Brownlee November 20, 2018 at 6:40 am #
      
      Try normalizing or standardizing the data prior to modeling.
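
As a rough illustration of the two options, assuming the column layout from the question (ch, cpu1 to cpu4): the CPU columns have a known 0-100 range and can be normalized by that range, while ch has no known maximum and can be standardized with a mean and standard deviation instead. The rows are taken from the sample above with decimal commas converted to points. In practice the mean and standard deviation would come from the training rows only.

```python
import numpy as np

# Rows in the order: ch, cpu1, cpu2, cpu3, cpu4
data = np.array([
    [7.0,     24.02,  2.0,   0.0,   0.0],
    [2050.0,  28.02,  5.29,  2.35,  9.42],
    [6014.0,  30.04, 14.69,  8.02, 32.57],
    [13818.0, 40.56, 32.55, 60.71, 92.31],
])

ch, cpu = data[:, :1], data[:, 1:]

# Normalize: the CPU range is known in advance (0 to 100 percent).
cpu_norm = cpu / 100.0

# Standardize: no fixed maximum is needed, only mean and standard deviation.
ch_mean, ch_std = ch.mean(), ch.std()
ch_standardized = (ch - ch_mean) / ch_std

scaled = np.hstack([ch_standardized, cpu_norm])
```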
      
      I meant SARIMA or ETS.
      
      See this process:
      https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
      
      Reply
Juan B. November 19, 2018 at 3:16 pm #

Hi Jason,
Is it possible to use this code in an unsupervised problem?
I want to predict new data.

Thanks

Reply
- Jason Brownlee November 20, 2018 at 6:32 am #
  
  Predicting is a supervised learning problem.
  
  Reply
  - Juan B. November 22, 2018 at 7:17 am #
    
    Yes, it is, but it doesn’t predict future values?
    What I understand is that it predicts known outputs.
    
    Reply
    - Jason Brownlee November 22, 2018 at 2:07 pm #
      
      Forecasting by definition involves predicting unknown values.
      
      Perhaps I don’t understand your question?
      
      Reply
      - Juan B. November 29, 2018 at 2:06 am #
        
        Hi Jason, please help me with two questions about your code:
        
        1. The predicted values are unknown values, but are they found as a function of the test data set? In that case, it means the model is predicting up to the current time step, not the next time steps. Please explain this to me.
        2. How many predictions does the code make? Just one time step? Can I predict more time steps by changing the variable n_train_hours (line 58)?
        
        Thanks
      - Jason Brownlee November 29, 2018 at 7:44 am #
        
        I have examples of different types of LSTMs for time series forecasting here, including multi-step:
        https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
    - ZHOU November 23, 2018 at 8:17 pm #
      
      Hello, I face the same problem as you. Have you solved this?
      
      Reply
      - Jason Brownlee November 24, 2018 at 6:31 am #
        
        Perhaps try another model type, e.g. MLP or CNN?
      - Juan B. November 29, 2018 at 2:17 am #
        
        Hello Zhou, no, I haven’t.
        Please tell me if you manage to solve it.
      - Juan November 29, 2018 at 10:49 am #
        
        Hi Jason,
        
        Does the magnitude of the RMSE (it’s too large) depend on the order of magnitude of my data?
    - ZHOU November 24, 2018 at 12:29 am #
      
      hello, have you worked out this question?
      
      Reply
ZHOU November 23, 2018 at 7:36 pm #

Hello, I have a question. When I plot the curves of yhat and test_y, I find that yhat just follows it (like yhat[i] = y[i]). Can you please explain this?

Reply
- Jason Brownlee November 24, 2018 at 6:30 am #
  
  Sounds like a persistence model:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
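
For reference, a persistence forecast simply repeats the previous observation. The sketch below, on a made-up series, computes its RMSE so that a model's score can be compared against it; a model whose predictions sit one step behind the actual values will score about the same as this baseline.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up series for illustration.
series = np.array([10.0, 12.0, 13.0, 12.0, 15.0, 16.0, 18.0])

actual = series[1:]        # values to be predicted at t
persistence = series[:-1]  # forecast for t is the observation at t-1

baseline_rmse = rmse(actual, persistence)
```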
  
  Reply
Andreas November 23, 2018 at 9:52 pm #

Hi Jason,

I have a simple question about the time series to supervised function. In case I want to use a supervised model for a classification problem (e.g. SGDClassifier), do I have to include the original labels as well in the transformed input data for training and testing? It would look like this in the case of 2 features in my input data and a window size of 2:

x1(t-2) x2(t-2) y(t-2) x1(t-1) x2(t-1) y(t-1) x1(t) x2(t) y(t)

y(t) is the label that I either give in the training stage or predict in the test stage. But do I have to remove y(t-2) and y(t-1) from my transformed input data, or do they have to be included?

Reply
- Jason Brownlee November 24, 2018 at 6:32 am #
  
  This post will help you prepare your data:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
Francisco Rodriguez November 24, 2018 at 10:06 am #

Hello,

Congratulations. Could you guide me and tell me how I can reuse the model to predict a future value, starting from a model that was generated and saved using, for example:

lstm.save('my_modelo.h5')

Now my question is how the model can be used to predict future values with new input. Could you help me, or point me to a post that illustrates how to use an already trained multivariate LSTM and how to run the model on new values?

I hope you can help me,

Greetings from Ecuador

Reply
- Jason Brownlee November 25, 2018 at 6:50 am #
  
  You can load the model and start using it by calling model.predict()
  
  I give more advice on making predictions here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  And here:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
Michael November 24, 2018 at 3:18 pm #

Hello Jason

Thank you for your great posts.
Based on my readings, we need to normalize the data after we have split it into train and test sets. Can you please explain why you have normalized all the data at once? Thank you.

Reply
- Jason Brownlee November 25, 2018 at 6:52 am #
  
  Sure, I explain how here:
  https://machinelearningmastery.com/how-to-scale-data-for-long-short-term-memory-networks-in-python/
  
  Reply
Yi November 25, 2018 at 2:40 am #

Hi Jason,
Thanks for the great article!
In your program, the input X is a one-dimensional vector, which is denoted as 1*8. In the model, input_shape=(train_X.shape[1], train_X.shape[2]), where train_X.shape[2] represents the 8 input characteristics. But what should I do when the input X is a two-dimensional vector? For example, sometimes we may want to organise these 8 input features in a matrix of 2 rows and 4 columns. I hope you can help me.
Thank you for your careful guidance. Best wishes!
Guyi

Reply
- Jason Brownlee November 25, 2018 at 6:59 am #
  
  What do you mean by 2 rows and 4 columns for a single sample?
  
  Reply
  - Yi November 26, 2018 at 10:29 am #
    
    You can think of it as a matrix on a graph. Or in another way, when I want to put a sequence of images into the LSTM model, what should i do?
    
    Reply
    - Jason Brownlee November 26, 2018 at 2:01 pm #
      
      Perhaps try a ConvLSTM2D? I have an example here:
      https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
      
      Reply
      - Yi November 26, 2018 at 5:28 pm #
        
        I found an example of ConvLSTM on this website, but I can’t find an example of ConvLSTM2D. Is the ConvLSTM model suitable for my problem? I am a beginner in deep learning and hope you don’t mind.
      - Jason Brownlee November 27, 2018 at 6:32 am #
        
        The ConvLSTM is implemented using a ConvLSTM2D layer in Keras.
Gazelle November 26, 2018 at 2:47 pm #

Hi Jason,

Thanks for your fruitful tutorials. I wonder if I can use time series to predict multiple variables, just like multi-task learning?

Thanks

Reply
- Jason Brownlee November 27, 2018 at 6:31 am #
  
  Yes, I have some examples here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Luis Mendes November 27, 2018 at 6:06 am #

hi there!! I have a big question!

So, this predicts the next day’s pollution, but I want to predict, for example, 7 days in advance, without knowing the preceding pollution values!

Let’s imagine:

You have data until 2014-12-31, and I want to predict pollution data for January 1, 2, 3, 4 and 5, knowing only the atmospheric data, of course (dew, temp, press, wnd_dir, wnd_spd, snow, rain).

I ask this because I can’t figure out how :/.

Reply
- Jason Brownlee November 27, 2018 at 6:40 am #
  
  Yes, this is multi-step forecasting and I have many examples. You can get started here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Hung Nguyen November 28, 2018 at 3:47 pm #

Hi Jason,
Firstly thanks for all useful tutorials so far.
I have one question regarding the first dimension “sample”. I just don’t get the meaning of converting 2D to 3D data frame here, as “Beijing, China” seems to be the one and only “sample” in the dataset. Am I misunderstanding something?

Reply
- Jason Brownlee November 29, 2018 at 7:36 am #
  
  Perhaps this will help:
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Reply
  - Hung Nguyen November 29, 2018 at 1:02 pm #
    
    Thanks! That did help.
    However after reading a comment below that post I had another confusion.
    “Am I correct to say that in the iris dataset, the timesteps can be 2, 3, 5, 6 – as long as it neatly divides the dataset into equal number of rows (iris has 150 rows).
    And the number of features will be the number of columns (apart from the target column/class)?
    —> The iris dataset is not a sequence classification problem. It does not have time steps, only samples and features.”
    
    But in this PM2.5 dataset you converted all time steps into samples, leaving only one time step. Isn’t it equivalent to a dataset with only samples and features (panel data)? Or is it correct to say panel data is 3D data with 1 time-step?
    
    Reply
    - Jason Brownlee November 29, 2018 at 2:42 pm #
      
      Yes, but this problem has a temporal relationship between observations and the LSTM can harness this relationship.
      
      Iris does not have such a relationship, using an LSTM will cause it to try to learn this relationship, which would be problematic (it does not exist).
      
      Perhaps this post on time series as supervised learning will help:
      https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
      
      Reply
wei November 28, 2018 at 8:34 pm #

Hi, jason
I have one question. After training the model, I use the code yhat = model.predict(test_X) to predict the pollution. The first column of test_X is actually the real pollution, and I want to use the other 7 columns to predict the pollution. Can I fill the first column of test_X with zeros? When I do that, the prediction is wrong. Why?
Thanks!
wei

Reply
- Jason Brownlee November 29, 2018 at 7:39 am #
  
  You can remove the variable from the input.
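
A rough sketch of dropping that variable, assuming the 3D layout [samples, timesteps, features] used in the tutorial with pollution as feature 0; the array is random stand-in data. A model fit on 8 features cannot be reused on 7, so it would need to be redefined and refit with the new input shape. Zero-filling tends to fail because the fitted model learned to rely on real values in that column.

```python
import numpy as np

# Stand-in for test_X: 5 samples, 1 time step, 8 features (pollution first).
rng = np.random.default_rng(0)
test_X = rng.random((5, 1, 8))

# Remove feature 0 (pollution) along the feature axis.
test_X_no_pollution = np.delete(test_X, 0, axis=2)
```

The same deletion has to be applied to the training inputs before the model is refit.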
  
  Reply
Savan Gowda November 29, 2018 at 2:17 am #

Hi Jason,

Thank you very much for this great explanation of LSTMs for Multivariate Time Series. i have one question regarding the input variables that is included. Is it a good idea to include pollution at (t-1) also as an input variable to predict pollution at (t) along with other input parameters, as we already have information about the pollution available, wouldn’t the LSTM be biased and learn only from the behavior of this variable? Looking forward for your answer!

Thanks

Reply
- Jason Brownlee November 29, 2018 at 7:44 am #
  
  Maybe. Perhaps experiment and discover the answer.
  
  Reply
martin November 30, 2018 at 6:02 am #

Hi Jason Thanks for all those tutos, they are very helpful.

I’ve a question for the multivariate time series :
When the target Y is at step T, one uses the features and targets of previous steps T-1, T-2, etc. But one does not use the features of step T.

==> Is it possible to use the features contained at time T ?

Hope my question is clear enough.

Thanks in advance,
Best regards

Reply
- Jason Brownlee November 30, 2018 at 6:34 am #
  
  Sure.
  
  Reply
Luis Mendes December 1, 2018 at 3:22 am #

Hello again.

Well, i have another noob question.

Here:

train_X, train_y = train[:, :n_obs], train[:, -n_features]
test_X, test_y = test[:, :n_obs], test[:, -n_features]

With this, test_X will have 3 * 8 columns, but there are 8 columns left, which are the var(t) values. One of these 8 is the pollution value, so let’s say there are 7 columns left.

Shouldn’t test_X include these 7 columns from var(t), so that the atmospheric data counts toward the prediction of the var(t) pollution?

Many thanks!

Reply
- Luis Mendes December 3, 2018 at 10:27 am #
  
  Don’t know if you saw this post.
  
  But can you check?
  
  Reply
- Jason Brownlee December 3, 2018 at 2:34 pm #
  
  Good question, no, we discard the remaining data, but then use it directly for predicting the subsequent time step.
  
  Perhaps this post will help the framing of the problem as supervised learning:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
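
To make the column slicing from the question concrete, here is a small sketch with n_hours = 3 and n_features = 8, where each cell of a dummy array holds its own column index. The array is illustrative only and stands in for the reframed dataset.

```python
import numpy as np

n_hours, n_features = 3, 8
n_obs = n_hours * n_features  # 24 lagged input columns

# Dummy reframed data: 5 rows, 32 columns, each cell equal to its column index.
row = np.arange((n_hours + 1) * n_features)
train = np.tile(row, (5, 1))

# Same slicing as in the question.
train_X, train_y = train[:, :n_obs], train[:, -n_features]

# The remaining var(t) columns that are not used as inputs.
unused = train[:, -n_features + 1:]
```

Here train_X holds columns 0 to 23 (the three lagged hours), train_y is column 24 (pollution at t), and columns 25 to 31 (the other variables at t) are left out.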
  
  Reply
  - Luis Mendes December 5, 2018 at 5:23 am #
    
    I really tried to figure it out, but I couldn’t :/.
    
    How can I shape the data to contain the atmospheric data of tomorrow, excluding the pollution of tomorrow? Because this changes the whole thing.
    
    I am appending 7 days (atmospheric data + pollution value (8 columns)), and I want to append the atmospheric data for tomorrow (7 columns of data) so the prediction of tomorrow’s pollution can be more accurate.
    
    What am I missing here? :/
    
    Reply
    - Jason Brownlee December 5, 2018 at 6:22 am #
      
      You will have to write some custom code to prepare the data in this way.
      
      Sorry, I don’t have the capacity to write this code for you.
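
One possible way to prepare the data as described, offered as a sketch rather than the tutorial's method. It assumes the reframed layout of n_hours lagged blocks followed by the var(t) block, with pollution as the first variable in each block, and uses a dummy array in place of real data. Because the resulting rows no longer divide evenly into n_hours time steps, the sketch uses a single time step with all columns as features.

```python
import numpy as np

n_hours, n_features = 3, 8
n_obs = n_hours * n_features

# Dummy reframed data: 6 rows, 32 columns, each cell equal to its column index.
framed = np.tile(np.arange((n_hours + 1) * n_features, dtype=float), (6, 1))

# Inputs: all lagged columns plus the 7 non-pollution columns at time t.
X = np.hstack([framed[:, :n_obs], framed[:, -n_features + 1:]])

# Target: pollution at time t.
y = framed[:, -n_features]

# Single time step, 31 features per sample.
X3d = X.reshape((X.shape[0], 1, X.shape[1]))
```

At forecast time the weather values for t must be known or themselves forecast, since they are now model inputs.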
      
      Reply
      - Luis Mendes December 5, 2018 at 8:43 am #
        
        Ok thanks anyway :D.
        
        If I find a solution, I’ll post it here.
Camilo December 1, 2018 at 5:00 am #

Hello Dr. Jason,
If I have an RMSE of 25496.75, is that not a good value?

Reply
- Jason Brownlee December 1, 2018 at 6:54 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/how-to-know-if-a-model-has-good-performance
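
One point worth illustrating is that RMSE is expressed in the units of the target, so its raw size says little on its own. In the made-up example below, multiplying the data by 1000 multiplies the RMSE by 1000 while the error relative to the mean stays the same.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up values for illustration.
actual = np.array([100.0, 120.0, 90.0, 110.0])
predicted = np.array([110.0, 115.0, 95.0, 100.0])

err_small = rmse(actual, predicted)
err_large = rmse(actual * 1000, predicted * 1000)  # same data, larger units

# Error relative to the mean of the target is unaffected by the units.
rel_small = err_small / actual.mean()
rel_large = err_large / (actual * 1000).mean()
```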
  
  Reply
Pierre Laflamme December 3, 2018 at 1:37 am #

Your articles are awesome! For my use as a process engineer, they provide the most useful information I can find. Keep up the excellent work!

Is there a good way to consider a time lag that changes through time in multivariate time series? For example, in a chemical industrial process I work on, the final product may take between 16 hours and 32 hours to get from the beginning to the end of the process (passing through different stages of the process and through different tanks). The time lag will depend on the product flow in the different stages and on the level of the different tanks (we have real-time measurements of these flows and levels). For example, if tanks are full and all stages are slowed down, the time lag will be much longer for a given period.

I would thus like to predict a quality parameter at the end of the process from different process parameters at the beginning of the process considering this variable time lag. Currently, I do so doing weekly rolling averages, but I would like to improve the prediction precision in time.

Do you have an article on this subject?

Reply
- Jason Brownlee December 3, 2018 at 6:51 am #
  
  Sure, you could pad the variable length sequences with 0 values and use a Masking layer to ignore the padded values.
  
  Reply
Christophe December 11, 2018 at 2:25 am #

Hi Jason,

Great article again. Totally love your work.

I am curious to know if you have an idea why all my time series LSTM work is ending up in a network that return the same value for all cases in the dataset (roughly the mean). So instead of predicting (y):

[[-0.01705725]
[ 0.01895695]
[-0.01623851]
[ 0.00772999]
[ 0.00546604]
[-0.01859799]
[-0.00874636]
[-0.01666667]
[ 0.01186441]
[ 0.00201991]
[-0.00290083]
[-0.00986193]]

for example, it would predict (y_hat):

[[0.31817305]
[0.31918538]
[0.3168676 ]
[0.31791273]
[0.31691164]
[0.31631264]
[0.3179203 ]
[0.3183312 ]
[0.3190964 ]
[0.31722257]
[0.3165959 ]
[0.31672308]]

Where the mean of the dataset is 0.317702080498597

So it feels like my model always ends up trying to learn to output the mean. I noticed the same effect with different time series and different LSTM architectures.

Have you had a similar issue in the past? How did you sort out the problem? I tried changing the learning rate, the function, the number of layers, the number of nodes per layer, the “lag” length, etc., but it always goes back to outputting the same value.

Thanks in advance for your answer.

Regards,
Christophe

Reply
- Jason Brownlee December 11, 2018 at 7:49 am #
  
  Yes, it suggests the model has learned a persistence model:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
  
  I recommend following this process:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  Reply
Peter Klein December 11, 2018 at 4:21 am #

Hello,

You say:

The first step is to prepare the pollution dataset for the LSTM.

This involves framing the dataset as a supervised learning problem

How does this make sense in the context of an LSTM? Your input should just be the sequence. There is no need to frame it as a supervised learning problem by considering lags.

Of course your loss function will have to compare the prediction to the realized value, but isn’t the idea behind RNNs that you don’t have to resort to the “trick” of reframing your time series problem as a supervised problem?

Reply
- Jason Brownlee December 11, 2018 at 7:50 am #
  
  No, you still need input and output patterns to fit the model, it just so happens that the input patterns are sequences of observations, rather than single observations.
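
A small sketch of what those input and output patterns look like for a univariate series. The `split_sequence` helper and the toy series are illustrative: each input pattern is a window of n_steps observations and the output is the observation that follows it.

```python
import numpy as np

def split_sequence(sequence, n_steps):
    """Split a 1D sequence into (input window, next value) pairs."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])  # n_steps consecutive observations
        y.append(sequence[i + n_steps])    # the value that follows the window
    return np.array(X), np.array(y)

series = [10, 20, 30, 40, 50, 60]
X, y = split_sequence(series, n_steps=3)

# LSTMs expect [samples, timesteps, features]; here there is 1 feature.
X = X.reshape((X.shape[0], X.shape[1], 1))
```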
  
  Reply
Spyros December 12, 2018 at 2:50 am #

Hello,

I have a univariate time series depicting user activity whose values exhibit diurnal patterns and are strongly dependent on the type of day (workday, weekend, holiday). I want to apply LSTM for forecasting and anomaly detection. Since holidays can happen on a weekday, the series has no clear periodicity. I think of two ways to handle this problem.
1) Split the data into classes and apply univariate LSTM in each class. This requires the use of some classification algorithm to decide how many classes I need to use as it might be sufficient to use a single class for both weekends and holidays.
2) Add an integer variable, encoding the type of day and then perform multivariate LSTM on the resulting 2 variable time series.
Any thoughts on which approach might work better in this case?

Reply
- Jason Brownlee December 12, 2018 at 5:55 am #
  
  I recommend testing a suite of approaches in order to discover what works best for your specific dataset.
  
  Reply
Amin December 14, 2018 at 4:14 am #

Hi Jason

Thanks for great tutorials.

I have a difficulty with a dataset that I am working with and appreciate your feedback very much.

My dataset consists of batches of varied sizes. For example, each batch has 14 to 17 days’ worth of data. Each batch has its own unique conditions, and each day in that batch has multiple inputs and outputs and some dependency on previous days in the same batch.

I would like to train the model with this dataset, and then use that to predict a whole batch. For instance, by defining the input and conditions of the batch, what would be the prediction for output for each day of that batch.

There is also the difficulty that some of the batches are missing information, for example no information for day 5.

I am not sure where to start as data set has varied batch sizes, missing days, also how to predict the whole batch (output values for all days in the batch) rather than just next day, how to shuffle the data without messing up each batch.

Do you have any suggestion to how to solve this problem or where to start?

Reply
- Jason Brownlee December 14, 2018 at 5:34 am #
  
  You can pad all batches to the same length, then use a Masking layer to ignore the padding.
  
  Reply
Alessio December 15, 2018 at 12:10 am #

Hi Dr Jason, can I ask you why did you choose to train the network on a little part of the dataset and test it on a much bigger part? Is that typical of a LSTM structure? In the case of a simple MLP I would have expected the opposite.

Reply
- Jason Brownlee December 15, 2018 at 6:13 am #
  
  No major reason, just to speed up training for the example.
  
  Reply
Benedikt December 15, 2018 at 4:31 am #

Hi Jason,

thank you for this tutorial. One question popped up in my mind while reading it:

Shouldn’t you normalize the data AFTER you split it into training and test set instead of before? As far as I understand it, wouldn’t you give your model information about the test set while using the training set if the normalization is done over the whole data?

A quick search on stack overflow seems to validate my concerns.

Is this a valid concern or am I getting something wrong?

Reply
- Jason Brownlee December 15, 2018 at 6:16 am #
  
  Yes, I simplified the example for brevity.
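
For completeness, a minimal sketch of the leakage-free order: split first, compute the scaling statistics on the training rows only, then apply them to both sets. It uses plain NumPy on made-up data; with scikit-learn's MinMaxScaler the equivalent is to call fit on the training rows and transform on the test rows.

```python
import numpy as np

# Made-up data: 10 rows, 2 columns, increasing over time.
values = np.arange(20, dtype=float).reshape(10, 2)

# Split first, keeping the time order.
n_train = 7
train, test = values[:n_train], values[n_train:]

# Statistics come from the training rows only.
train_min = train.min(axis=0)
train_max = train.max(axis=0)

train_scaled = (train - train_min) / (train_max - train_min)
test_scaled = (test - train_min) / (train_max - train_min)
```

Test values can fall outside [0, 1] when they exceed the range seen in training, as they do here.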
  
  Reply
Abs December 18, 2018 at 11:40 pm #

Hi Jason,

Thank you for all the amazing tutorials. Here is something I can’t seem to grasp.
I have a multivariate time series dataset (30-seconds) where the frequency of observations is varying.
Comparing to your dataset, you split train/test set by multiples of (365 * 24).
In my case, day(24) == one observation. But unlike the fixed length of 24 in your example, mine varies between 190 to 200. How do I split the data for train/test? Do I need to pad each observation (which is dataframe)?

The aim is to implement LSTM to make a prediction for future observation at time t=2 given the first time slot (30-sec) passed. And observation has a unique ID.

Reply
- Jason Brownlee December 19, 2018 at 6:35 am #
  
  Yes, I recommend padding each sample to have the same number of time steps – use trailing zeros. Then use a Masking layer on the input to ignore the zeros.
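
A sketch of the padding step, assuming each observation is a 2D array of shape (time steps, features) with between 190 and 200 time steps; the `pad_trailing` helper and the sample data are hypothetical. In Keras, a `Masking(mask_value=0.0)` layer placed before the LSTM would then skip the padded steps.

```python
import numpy as np

def pad_trailing(samples, n_features, value=0.0):
    """Pad variable-length samples with trailing rows of `value`."""
    max_len = max(len(s) for s in samples)
    out = np.full((len(samples), max_len, n_features), value, dtype="float32")
    for i, s in enumerate(samples):
        out[i, :len(s), :] = s  # real time steps first, padding after
    return out

# Stand-in observations with 190, 200 and 195 time steps of 3 features.
samples = [np.ones((190, 3)), np.ones((200, 3)), np.ones((195, 3))]
X = pad_trailing(samples, n_features=3)
```

One caveat: a genuine time step whose features are all exactly 0 would be masked as well, so a different padding value may be safer for data that can legitimately be all zeros.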
  
  Reply
venky December 20, 2018 at 1:33 am #

Hi Jason,

I am new to this field and am trying to build a demo on the data available in my project. I only got approval to install Anaconda, so I would like to implement this in my Jupyter notebook, which doesn’t have a TensorFlow backend.
How do we use an LSTM with TensorFlow/Keras and build the model?

Reply
- Jason Brownlee December 20, 2018 at 6:29 am #
  
  I show how here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Ya December 20, 2018 at 12:06 pm #

Hi Jason,

Thank you for the great post. I have a question on “how to use known features on time T to forecast target on time T”.

For example, I need to predict sales (target) for some product on time t, given historical sales on t-1, t-2, …. Also the price for the product is taken as co-feature to predict sales . Price is time series as well. Sample data is:

price sales
Day 1: 1.2 100
Day 2: 1.3 90
…
Day t: 1.4 ?

Now I want to use LSTM to predict sales on Day t based on
1) historical price and sales and
2) price on Day t.

If I format this time series problem as supervised learning as below (1 lag):

//////////////////////////////////////////////
var1(t-1) var2(t-1) var1(t) var2(t)
1.2 100 1.3 90
//////////////////////////////////////////////

var1(t-1), var2(t-1), var1(t) should be train_X, and var2(t) should be train_y. But when I reshape the above as input to Keras, I need to put them in the 3D format of [samples, timesteps, features].

Now timesteps = 1, because I am taking 1 lag. But “features” vary depending on which time point we look at:

if it is t-1, “features” = 2 (sales and price)
if it is t, “features” = 1 (price only).

Do you know how I can get around this? I am thinking to create a dummy “sales” on t, but not sure if it is the right way to go.

Can you please shed some light on this? Thank you very much!

Ya

Reply
- Jason Brownlee December 20, 2018 at 2:00 pm #
  
  Yes, I have many examples, you can get started here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Ya December 20, 2018 at 4:53 pm #

Hi Jason,

Thank you very much for the reply and sharing examples with me. I searched it and couldn’t find a particular example that address it.

Do you have the link for an example that “predict a target based on 1) historical target value 2) historical feature value, and 3) current feature values”?

Thanks again!

Ya

Reply
- Jason Brownlee December 21, 2018 at 5:26 am #
  
  A good place to start with a basic model is here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Ya December 20, 2018 at 4:56 pm #

Or if you don’t have the example, maybe you could give some directions on how to solve this kind of problem at a high level (such as creating a dummy column for the feature at time t and using a Masking layer to ignore it)?

thanks

Ya
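
One alternative to a dummy column, sketched here as an assumption rather than a method from the tutorial, is to shift the target so that every time step pairs the price known for day t with the sales observed on day t-1. Each time step then has the same two features and the 3D shape stays regular. The first two days come from the sample above; the remaining values are made up.

```python
import numpy as np

# Days 1..6: price is known for every day, sales only up to the previous day.
price = np.array([1.2, 1.3, 1.4, 1.1, 1.5, 1.3])
sales = np.array([100.0, 90.0, 80.0, 120.0, 70.0, 95.0])

# Row for day t: [price(t), sales(t-1)]; the target is sales(t).
features = np.column_stack([price[1:], sales[:-1]])
targets = sales[1:]

# Window the rows into samples of n_steps consecutive days.
n_steps = 2
X, y = [], []
for end in range(n_steps, len(features) + 1):
    X.append(features[end - n_steps:end])
    y.append(targets[end - 1])  # sales on the last day of the window
X, y = np.array(X), np.array(y)
```

With these values X has shape (4, 2, 2): 4 samples, 2 time steps, 2 features. The first sample uses price 1.3 with sales 100, then price 1.4 with sales 90, to predict sales of 80.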

Reply
mk December 26, 2018 at 1:37 pm #

The LSTM does not appear to be suitable for autoregression type problems. Are there any advantages of the LSTM for solving AR problems?

Are there any posts on MLPs with a large window?

Reply
- Jason Brownlee December 27, 2018 at 5:37 am #
  
  Perhaps multivariate inputs/outputs is one advantage.
  
  Reply
  - mk December 28, 2018 at 11:38 pm #
    
    How do I choose the loss function? loss='mae' or 'mse' is applied to your model in the regression problem.
    I think 'mse' converges faster, but I am not sure it is more accurate.
    
    Reply
    - Jason Brownlee December 29, 2018 at 5:52 am #
      
      Yes, you can specify loss=’mae’
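
To show how the two losses differ, the sketch below computes both by hand on made-up values that include one large error. The squared error lets that single outlier dominate far more than the absolute error does, which is one practical consideration when choosing between them.

```python
import numpy as np

# Made-up values; the last prediction is far off.
actual = np.array([10.0, 12.0, 11.0, 13.0])
predicted = np.array([11.0, 11.0, 12.0, 23.0])

errors = predicted - actual  # 1, -1, 1, 10

mae = float(np.mean(np.abs(errors)))  # mean absolute error
mse = float(np.mean(errors ** 2))     # mean squared error

# Share of the total loss contributed by the single large error.
outlier_share_mae = abs(errors[-1]) / np.sum(np.abs(errors))
outlier_share_mse = errors[-1] ** 2 / np.sum(errors ** 2)
```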
      
      Reply
      - mk December 29, 2018 at 12:30 pm #
        
        “If the coefficients are estimated using the entire dataset prior to splitting into train and test sets, then there is a small leakage of information from the test set to the training dataset. This can result in estimates of model skill that are optimistically biased.” in another post. I note that scaled = scaler.fit_transform(values) is called before splitting into train and test sets. Is there a small leakage of information from the test set to the training dataset?
      - Jason Brownlee December 30, 2018 at 5:35 am #
        
        Yes, that is correct. I often skip over this separation in the interest of brevity in the tutorials.
      - mk January 1, 2019 at 12:07 am #
        
        The Keras author also gives the same example code, and he calls transform(values) before splitting into train and test sets. But he does not give a method for inverting the transform after splitting into train and test sets.
        Many papers do not invert the transform, but give the RMSE directly. I don’t think that’s accurate in real daily life.
        But I note that when we inverse transform the data, it introduces a new error.
      - Jason Brownlee January 1, 2019 at 6:17 am #
        
        Inverting the transform on the predictions is required to return the values to their original scale.
        
        You can choose how to run your project, take my blog posts as suggestions only.
Abhik Jha December 27, 2018 at 9:59 pm #

Hi Jason, another great article.

I was wondering if “Batch Normalization” can be applied in LSTM.

For example, can this be written:

model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(BatchNormalization())

What will the advantages or disadvantages in doing so?

Reply
- Jason Brownlee December 28, 2018 at 5:56 am #
  
  Yes, it can. It can speed up learning.
  
  Reply
Mishra December 28, 2018 at 5:10 am #

Hi

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True): -> when this function is invoked got below error

ipython-input-334-7d369ad51243> in ()
2 def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
3 n_vars = 1 if type(data) is list else data.shape[1]
—-> 4 df = DataFrame(data)
5 cols, names = list(), list()
6 # input sequence (t-n, … t-1)

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
420 dtype=values.dtype, copy=False)
421 else:
–> 422 raise ValueError(‘DataFrame constructor not properly called!’)
423
424 NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!

Reply
- Jason Brownlee December 28, 2018 at 5:59 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Mishra December 29, 2018 at 11:29 pm #

Hi Jason,

I am trying to use an LSTM for a multivariate time series model, and I was able to build the model. But I got an error when I call predict.

# prediction
yhat = model.predict(test_X)

ValueError: Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (3, 6)

Could you please help me on this. I have referred https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me but no luck.

Reply
- Jason Brownlee December 30, 2018 at 5:41 am #
  
  You can get started with LSTMs for a range of time series problems here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
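
The error message indicates a 2D array of shape (3, 6) was passed where the LSTM expects 3D input of [samples, timesteps, features]. A sketch of the reshape, using random stand-in data; which of the two layouts applies is an assumption that depends on how the model was trained.

```python
import numpy as np

# Stand-in for the 2D test_X from the error: 3 samples, 6 columns.
rng = np.random.default_rng(1)
test_X = rng.random((3, 6))

# Option 1: a single time step with 6 features.
one_step = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

# Option 2: 2 time steps of 3 features each, assuming the columns are
# ordered as all features of the older step, then all of the newer step.
two_steps = test_X.reshape((test_X.shape[0], 2, 3))
```

The reshaped array must match the input_shape the model was defined with, and the same reshape must have been applied to the training inputs.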
  
  Reply
Richard Knight December 31, 2018 at 12:53 pm #

Thanks for the very clear and useful article.

If anyone is interested, I’ve ported the example code to R. You can find it at https://github.com/RJHKnight/MultiVariateLSTMWithKeras

Reply
- Jason Brownlee January 1, 2019 at 6:12 am #
  
  Nice work Richard.
  
  Reply
mahmood January 6, 2019 at 8:51 pm #

Hi. Thanks for this great tutorial.

How can we use this model to forecast the next 24 hours values that we don’t have?

I tried to put +24, is that right?

pyplot.plot(inv_yhat[+24:])
pyplot.plot(inv_y[+24:])
pyplot.show()

Reply
- Jason Brownlee January 7, 2019 at 6:30 am #
  
  I have many many many examples of this, perhaps start here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  And here:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  And here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Sabeel January 8, 2019 at 1:43 pm #

Hi Jason

Thank you for this great tutorial.

I have one problem.I tried to execute your code for my understanding but I am getting an error in the following line:

values = values.astype('float32')

ValueError: could not convert string to float: 'NW'

Hope you can help me.

Reply
- Jason Brownlee January 9, 2019 at 8:37 am #
  
  It sounds like there is a string in your data, perhaps double check you followed all of the steps in order.
  
  Reply
  - Adnan ÖNCEVARLIK January 17, 2020 at 7:58 pm #
    
    Hi Jason,
    Really appreciated for your tutorial. But I have same issue like Sabeel, I think,
    
    # integer encode direction
    encoder = LabelEncoder()
    values[:,4] = encoder.fit_transform(values[:,4])
    # ensure all data is float
    values = values.astype(‘float32’)
    
    In this code, column 4 is not the wind direction and we cannot encode the directions. Is that right? (Maybe the dataset has changed.)
    
    Reply
    - Jason Brownlee January 18, 2020 at 8:43 am #
      
      Data in column 4 is wind direction.
      
      Perhaps I don’t understand the problem you’re having?
      
      Reply
    - Lorentz Yeung January 20, 2021 at 8:30 am #
      
      Yes, I think you can just label-encode both columns, 4 and -4.
      Jason is as awesome as always. I’ve bought a few books and read through a few already; they are the best on the market.
      
      Reply
      - Jason Brownlee January 20, 2021 at 8:33 am #
        
        Thanks!
- Rudina December 6, 2021 at 7:51 am #
  
  Try adding this line of code to change column 8 from categorical values to numbers:
  b, values[:, 8] = numpy.unique(values[:, 8], return_inverse=True)
  after line:
  values[:, 4] = encoder.fit_transform(values[:, 4])
  It will solve the problem
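
A minimal sketch of the same idea, generalised so that every column holding strings is integer encoded before the cast to float32. The toy `values` array and the helper name `encode_string_columns` are illustrative assumptions, not part of the tutorial code; `numpy.unique` with `return_inverse=True` is the call suggested above.

```python
import numpy as np

# Toy stand-in for dataset.values: an object array with one string column.
values = np.array([
    [129.0, -16, "SE", 1.79],
    [148.0, -15, "SE", 2.68],
    [159.0, -11, "NW", 3.57],
    [181.0, -7, "cv", 5.36],
], dtype=object)


def encode_string_columns(values):
    """Integer-encode every column that contains strings; return a float32 array."""
    encoded = values.copy()
    for col in range(encoded.shape[1]):
        if any(isinstance(v, str) for v in encoded[:, col]):
            # return_inverse gives, for each row, the position of its label
            # in the sorted array of unique labels.
            _, codes = np.unique(encoded[:, col].astype(str), return_inverse=True)
            encoded[:, col] = codes
    return encoded.astype("float32")


encoded = encode_string_columns(values)
print(encoded.dtype, encoded[:, 2])
```

If the cast still fails after this, printing `dataset.dtypes` (assuming the data was loaded with pandas) is a quick way to see which columns are still strings.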
  
  Reply
Aryorobo January 11, 2019 at 1:46 pm #

Hi Jason,
You mentioned: “Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour”.
If the strategy follows the statement above, what does the input data look like, and how are the features prepared for multi-lag, multi-step prediction? For example, to predict pollution multiple steps ahead (2 days into the future) given the “expected” weather and 7 days of historical pollution.

Reply
- Jason Brownlee January 12, 2019 at 5:35 am #
  
  Perhaps this tutorial will help:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Barkey January 12, 2019 at 2:22 am #

Hi Jason, thanks for all your great stuff!
I have data with both a categorical feature and a numerical feature (2 features).
I need to do some kind of sampling, similar to a language model. In training,
X = [y, other_feature] and y_hat is compared to y_truth.
At test time I will pass y_hat instead of y, meaning
X_test = [y_hat<t-1, other_feature].
The y values are categorical (1, 2, 3, 4) and other_feature is numerical (1-100).
I guess I need to one-hot encode the y values with to_categorical, and my questions are:
1. Do I need to one-hot encode the y that I use as an input at test time?
2. If I do need to encode y, what should I do with other_feature? I will have a vector of length 5 and a separate discrete number (the other feature).
3. At test time (sampling, actually) I guess y_hat will come out as probabilities (I would use a softmax), so I will have to decode it back, which returns to the same question as 1. Am I right?

Thanks!

Reply
- Jason Brownlee January 12, 2019 at 5:44 am #
  
  I don’t follow your questions, sorry. Perhaps start with one question and elaborate a little.
  
  Generally, if you’re unsure whether or not to transform the data, try modeling with and without the transform and use the approach that results in the model that learns faster or has better skill.
  
  Reply
  - Barkey January 12, 2019 at 7:29 am #
    
    Sorry, I’ll start with one question:
    
    When I have data that is both categorical and numerical (2 features), what should I do?
    One-hot encode the categorical feature and concatenate the other (e.g. [1,89] will transform to [0,1,0,0,89])?
    Or encode them both and get 2 one-hot encoded vectors (won’t I lose the importance of the numerical feature?), etc.
    
    Reply
    - Jason Brownlee January 13, 2019 at 5:36 am #
      
      Try modeling the data with multiple different transforms, compare results and use the transform that results in the most skillful model.
      
      E.g. some ideas to try:
      
      – without the var
      – numeric
      – integer encoded
      – one hot encoded
      – learned embedding
      – etc…
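
As an illustration of the "one hot encoded" option from the list above, here is a minimal sketch that one-hot encodes the categorical value and appends the numeric value scaled to [0, 1]. The helper names are hypothetical; the class labels 1 to 4 and the 1 to 100 range come from the question, and scaling the numeric feature is an assumption in line with how the tutorial normalizes its inputs.

```python
def one_hot(label, classes):
    """Return a one-hot list for `label` given the ordered `classes`."""
    return [1.0 if label == c else 0.0 for c in classes]


def min_max(value, lo, hi):
    """Scale `value` from the range [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


def encode_row(category, number, classes=(1, 2, 3, 4), lo=1.0, hi=100.0):
    """One-hot encode the categorical value and append the scaled numeric value."""
    return one_hot(category, classes) + [min_max(number, lo, hi)]


rows = [(1, 89.0), (3, 1.0), (4, 100.0)]
X = [encode_row(category, number) for category, number in rows]
for row in X:
    print(row)
```

Each row then has five values: four for the category and one for the number. Whether this beats an integer encoding or an embedding is something to compare empirically, as suggested above.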
      
      Reply
Mike January 12, 2019 at 7:55 am #

Great article! It’d be useful to see how LSTM compares against other learning algorithms (e.g. ensemble regression tree approaches, MLP). Perhaps some proof of improved performance would help motivate people to try out LSTM.

Reply
- Jason Brownlee January 13, 2019 at 5:37 am #
  
  Yes, I teach how to make such comparisons here:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  More details here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Rajesh January 13, 2019 at 10:06 pm #

Dear Jason,
thank you for answering my other questions from other tutorials. I’ve another, more general question:

Assuming that you wouldn’t want to use your output as input in a multivariate LSTM ( that is, you would want to leave the PM 2.5 feature out of the list of features – you would just use it as the output (train_y/text_y)), would you still difference it?

What is the general consensus on differencing when it comes to categorical data – on the surface it appears to me that it shouldn’t be differenced, but am I missing a logical reason as to why it should? To be more specific in this example, if I LabelEncode the wind direction, would I difference it? If I further OneHotEncode the categorical data after LabelEncoding it, should that be differenced?

Thank you for your patience and again sorry if my questions are trivial or illogical.

Cheers,
Rajesh

Reply
- Rajesh January 13, 2019 at 11:06 pm #
  
  One more question:
  If I were to difference my data, would I do it before or after I resale it? Is there a difference if I interchange this order?
  
  Cheers,
  Rajesh
  
  Reply
  - Rajesh January 13, 2019 at 11:25 pm #
    
    Edit: “text_y” = test_y, and “resale” = rescale
    
    Cheers,
    Rajesh
    
    Reply
  - Jason Brownlee January 14, 2019 at 5:29 am #
    
    The order of transforms is here:
    https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
    
    Reply
- Jason Brownlee January 14, 2019 at 5:28 am #
  
  Categorical data would need to be integer encoded, one hot encoded or use an embedding.
  
  No differencing is performed on categorical data.
  
  Reply
Shweta January 15, 2019 at 3:18 pm #

I applied similar code for my time series data. Is it a good idea to apply cross-validation to such data? How can I apply k-fold cross validation to this problem? Will cross validation improve the results in any way?

Reply
- Jason Brownlee January 16, 2019 at 5:42 am #
  
  Yes, it’s called walk-forward validation:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
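
Unlike k-fold cross-validation, walk-forward validation never shuffles the data: each split trains on observations up to some point in time and tests on the observations that immediately follow. A minimal sketch of the index bookkeeping with an expanding training window; the function name and the sizes are illustrative assumptions, not code from the linked tutorial.

```python
def walk_forward_splits(n_samples, n_min_train, n_test):
    """Yield (train_indices, test_indices) pairs with an expanding training window."""
    start = n_min_train
    while start + n_test <= n_samples:
        # Train on everything before `start`, test on the next n_test samples.
        yield list(range(start)), list(range(start, start + n_test))
        start += n_test


splits = list(walk_forward_splits(n_samples=10, n_min_train=4, n_test=2))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

In practice the model would be refit (or updated) on each training window and scored on the matching test window, and the scores averaged.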
  
  Reply
June, Chung January 16, 2019 at 1:05 pm #

Dear Jason,

Thanks for being so open-minded.

I tried and tested your sample code to understand LSTMs.
To check my understanding, I want to ask this.

In your Multiple Lag Timesteps Example,
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))

Its structure is one input layer with 8 inputs, one hidden layer with 50 units, and one output layer with 1 output,
and they are densely connected.

Is that right?

Reply
- Jason Brownlee January 17, 2019 at 5:21 am #
  
  Yes.
  
  Reply
  - June Chung January 18, 2019 at 1:29 pm #
    
    Thanks for your answer.
    Your lessons and the opinions in them are super helpful to me.
    
    The minimum RMSE I got was 24.3xx.
    I thought it was relatively high, because air pollution values usually range from 0 to 300.
    
    So I changed many factors, e.g. n_train_hours, n_features, n_train_hours, and added more hidden layers and tried other loss functions, optimizers and activation functions.
    But I couldn’t reduce the RMSE.
    
    1.
    What do you think the reason is?
    Is there any further improvement possible?
    
    2.
    I hope to get an RMSE value under 5.
    Do you think it is possible? If so, what do you think the solution would be?
    
    Reply
    - Jason Brownlee January 19, 2019 at 5:33 am #
      
      Yes, there are many ways to improve the model.
      
      I recommend starting with this process:
      https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
      
      Reply
Ying January 18, 2019 at 1:52 am #

Dear Jason,

Thank you for such a clear tutorial about LSTMs. I could understand most of the code above, but I am a total novice. I am confused about the choice of optimizer. Adam may be the best for this example, but if I want to use the method above to predict other time series, how can I find the best optimizer? Do you have any advice or examples for me?

Thanks a lot.
BRs

Reply
- Jason Brownlee January 18, 2019 at 5:44 am #
  
  A good starting point is to use SGD and experiment with different learning rates and momentum values.
  
  Once you’ve tuned the model, see if an automatic method like rmsprop or adam can do better.
  
  Or if you don’t have much time, start with adam/rmsprop.
  
  Reply
Juan B. January 24, 2019 at 7:40 am #

Hi Jason,

I want to know why you chose the default option as the activation function.

Thanks

Reply
- Jason Brownlee January 24, 2019 at 1:20 pm #
  
  It works reasonably well as a starting point.
  
  Reply
Raghav Pangasa January 24, 2019 at 4:47 pm #

Hi Jason,
I loved the tutorial.
When I practised the steps on a project of mine, it got a bit confusing. I have to predict values for certain data for which I do not have the actual values (y). Because of this, I cannot convert the data to a supervised format, and hence it cannot be used as input to the prediction function. I hope my doubt is clear; please help.
Thank you.

Reply
- Jason Brownlee January 25, 2019 at 8:42 am #
  
  If you don’t have outputs, you cannot train a supervised learning model.
  
  Perhaps spend some time defining your problem:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  Reply
Sergio January 25, 2019 at 6:18 am #

Hi Jason,
I tried to first apply the function to transform the DataFrame and thereafter apply scaling, as follows:

# frame as supervised learning
reframed = series_to_supervised(dataset, 1, 1)
# drop columns we don’t want to predict
reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)

# integer encode direction
values= reframed.values
encoder = LabelEncoder()
values[:,4] = encoder.fit_transform(values[:,4])
# ensure all data is float
values = values.astype('float32')
values[:3,:]

# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)

# split into train and test sets
values = reframed.values
n_train_hours = 365 * 24
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]

# split into input and outputs (output in last column/position)
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(f'Train shape: {train_X.shape}', f'Train y length: {train_y.shape}\n')
print(f'Test shape: {test_X.shape}', f'Test y length: {test_y.shape}\n')

……
……

But when I try to inverse transform:

inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat.shape

I get the following error:

ValueError: operands could not be broadcast together with shapes (35039,8) (9,) (35039,8)

I can’t figure out what’s going on.
Could you help?
Thanks.

Reply
- Jason Brownlee January 25, 2019 at 8:47 am #
  
  Both the transform and the inverse must take data with the same dimensions, even if you are only interested in one column.
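
The error above, shapes (35039,8) against (9,), is what happens when a scaler fitted on 9 columns is asked to invert an 8-column array. One way to keep the dimensions consistent is to scale the inputs and the target separately. A minimal sketch, using a tiny hand-rolled scaler (`ColumnMinMax`, a hypothetical stand-in for scikit-learn's `MinMaxScaler`) and random toy data so that the shapes are explicit:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 50, size=(6, 8))    # 8 input columns
y = rng.uniform(0, 300, size=(6, 1))   # 1 output column


class ColumnMinMax:
    """Tiny min-max scaler that remembers one (min, range) pair per column."""

    def fit(self, data):
        self.min_ = data.min(axis=0)
        self.range_ = data.max(axis=0) - self.min_
        return self

    def transform(self, data):
        return (data - self.min_) / self.range_

    def inverse_transform(self, data):
        return data * self.range_ + self.min_


scaler_X = ColumnMinMax().fit(X)   # remembers 8 columns
scaler_y = ColumnMinMax().fit(y)   # remembers 1 column

X_scaled = scaler_X.transform(X)
y_scaled = scaler_y.transform(y)

# Stand-in for model.predict(...): predictions have shape (samples, 1),
# which matches what scaler_y was fitted on, so the inverse works directly.
yhat_scaled = y_scaled.copy()
yhat = scaler_y.inverse_transform(yhat_scaled)
print(yhat.shape)
```

The alternative used in the tutorial is to keep a single scaler and rebuild an array with the original number of columns before calling the inverse.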
  
  Reply
Sergio January 28, 2019 at 9:03 pm #

Hi Jason,
Thanks for your reply.

I finally solved the reported issue by using two separate scalers (scalerX for the predictors and scalerY for the output); I think it is clearer this way.

I have another question regarding how to evaluate the model.

Suppose that I split the whole dataset by year, choosing 2010 for training and 2011 for testing (or I should say validation, eventually applying early stopping), and I follow the outline of your code example.
Thereafter I want to evaluate my model on each of the remaining years (test datasets).
If I am right, I have to:

1. retrieve predictors and output for each dataset (year)

2. use “scalers” already fitted on year 2010 to transform predictors and output (to avoid data leakage)

3. retrieve model’s (no retraining) prediction as:

yhat = model.predict(test_X.reshape(-1,1, num_features), batch_size=batch_size)

4. do scalerY.inverse_transform(yhat) to retrieve output in original scale

5. evaluate metric of performance.

Is what I described above correct? Is there perhaps a better way?

All this is for one-step-ahead forecasting, but what if I want to do multi-step-ahead forecasting (24 h or 24 samples)?

On your page https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/ you describe a different approach for this situation, if I have understood correctly.

Reply
- Jason Brownlee January 29, 2019 at 6:10 am #
  
  It is hard for me to comment on what would be best for your specific project.
  
  Instead, I outline a suite of approaches that you could use in tutorials, and you can select what makes the most sense for your project.
  
  Reply
Jason Koh February 1, 2019 at 4:36 pm #

Hi Jason,

I feel your warm heart. Thanks a lot for the dedication.

I have a question regarding the network design:
# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')

In your design, the number of the timesteps of a sample is 1 but you do not enable stateful=True to keep the states during training and testing. Is LSTM still useful in this way? In other words, is the history of the data embedded inside the cell when you train/test a current sample?

I expected the model to be something like:
model = Sequential()
model.add(LSTM(50, batch_input_shape=(some_batch_size, train_X.shape[1], train_X.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')

Would it make any difference?

Thanks a lot!

Reply
- Jason Brownlee February 2, 2019 at 6:08 am #
  
  State is maintained between samples within a batch (e.g. between internal calls to the reset_state() function).
  
  Reply
  - Jason Koh February 2, 2019 at 11:05 am #
    
    Oh, I thought stateful=True maintained the states of samples within a batch, which is actually not the case after carefully reading the API doc. Thanks a lot for the clarification!
    
    Reply
    - Jason Brownlee February 3, 2019 at 6:13 am #
      
      When stateful is set to true, it means the model will no longer reset states at the end of each batch and instead you are responsible for when the internal states will be reset.
      
      Reply
      - Jason Koh February 3, 2019 at 12:43 pm #
        
        If you learn a model for data with a long history (especially with timesteps=1), why would you want to reset the internal states? In that case, shouldn’t we set stateful=True?
      - Jason Brownlee February 4, 2019 at 5:44 am #
        
        It really depends if the model is capable of learning something useful/predictive across samples.
Sam February 1, 2019 at 4:37 pm #

Hi Jason,
I have a question. In this work you used var1(t-1) in the training dataset to predict var1(t), which is the air pollution. I am working on a similar project, except I don’t want to put var1(t-1) in the training set; I want to predict var1(t) using only the other features I have. Is an LSTM still suitable for this?

Reply
- Jason Brownlee February 2, 2019 at 6:08 am #
  
  I generally recommend testing a suite of different algorithms in order to discover what works best for your specific dataset, for example:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  Reply
Johan Ericson February 4, 2019 at 8:08 pm #

Hi! I really love all of your tutorials, thank you!

However, there’s one thing I wish to do which I cannot find:
I have built my model and trained it on a big data set, and now I would like to use that model to predict tomorrow’s outcome. The two data sets describe the same thing and are structured in the same way. How would I add a row to the dataset with the predicted value for tomorrow?
I have been able to add lines with new dates as my index column, but how do I get the predicted value for tomorrow?

Thank you!

Reply
- Jason Brownlee February 5, 2019 at 8:15 am #
  
  You can call model.predict() with the required input to make a prediction.
  
  Perhaps this tutorial will help:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Does that help?
  
  Reply
Kaiche February 6, 2019 at 2:29 pm #

Hi Jason

Thanks for the good tutorial. I have a question about reshaping the input when calculating the RMSE.

For example, these are the training data input shapes:
train_X = train_X.reshape((train_X.shape[0], n_hours, n_features))

train_Y = train_Y.reshape((train_Y.shape[0], n_hours, n_features))

and these are the test data input shapes:
test_X = train_X.reshape((test_X.shape[0], n_hours, n_features))

test_Y = train_Y.reshape((test_Y.shape[0], n_hours, n_features))

Now, during prediction (evaluating the model), we use the test dataset:
yhat = model.predict(test_X)

I want to know whether it is OK to calculate the RMSE without reshaping:
-rmse = sqrt(mean_squared_error(yhat, test_Y))

Don’t you think reshaping to two dimensions is a poor way to evaluate a model that was trained with a 3D dataset?

e.g. test_X = test_X.reshape((test_X.shape[0], n_hours*n_features))

Reply
- Jason Brownlee February 7, 2019 at 6:35 am #
  
  When calculating the RMSE, you must provide two arrays or lists of scores, actual and predicted.
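
A minimal sketch of that calculation with NumPy. The `rmse` helper and the toy numbers are illustrative assumptions; the point is that both arguments are flattened to the same 1D shape first, because subtracting a (n, 1) array from a (n,) array with plain NumPy would silently broadcast to an (n, n) matrix.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equally sized sets of values."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must contain the same number of values")
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


test_y = np.array([3.0, 5.0, 7.0])        # shape (3,)
yhat = np.array([[2.0], [5.0], [9.0]])    # shape (3, 1), like model.predict output
print(rmse(test_y, yhat))
```

The input to the model stays 3D; only the predictions and the expected values need to line up when scoring.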
  
  Reply
tom February 9, 2019 at 9:06 pm #

How can I improve LSTM performance?
I have already changed the number of neurons, epochs and batch_size, but the accuracy seems too low (20.32%). Is there any way to improve the LSTM model?

Reply
- Jason Brownlee February 10, 2019 at 9:41 am #
  
  Here are some suggestions:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
Partha Pritam Deka February 13, 2019 at 7:28 am #

Jason, this is a really in-depth write-up on using LSTMs for a multivariate time series forecasting problem, thank you.

I understand that you are using the previous data points (the previous hour) of the features to predict the pollution at the next time step (the next hour). Is this something like having lag-1 autocorrelation for all the variables? What if there are lag-2 or higher-lag autocorrelations? In that case we should bring in the lag 2/3 features as well, so the feature set might grow very wide? Now, what if the time series is non-stationary? In that case, should we make the series stationary before creating the AR features? What if there is seasonality, should we deseasonalize first? Should we also model the residuals with autoregression and think of adding the predicted residuals to the final predictions of the original LSTM model, as in ARIMA?

Basically, what I am trying to see is whether we should use the LSTM with an ARIMA mindset: first deseasonalize and stationarize the series, apply the LSTM with the AR features (lags 1, 2, 3, etc.), get the prediction, and then revert the non-stationarity and seasonality. Is this a viable approach for further improving the accuracy, or would it heuristically not help at all, or am I just adding too much unwanted complexity?

Reply
- Jason Brownlee February 13, 2019 at 8:08 am #
  
  Good question.
  
  Maybe. I find that a CNN or MLP can learn a trend/seasonality as well as the residual. Try modeling with and without the trend/seasonality and compare results. Also, try a suite of methods, not just LSTMs.
  
  I have more advice on deep learning for time series here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Matty February 15, 2019 at 7:52 am #

Thank you Jason. I’ve been working with ML for several years now, and still there are many things that I learn from your posts.

Reply
- Jason Brownlee February 15, 2019 at 8:19 am #
  
  Thanks Matty!
  
  Reply
Ravi T February 16, 2019 at 6:08 pm #

Hi Jason – Thanks for this write-up. This dataset was about predicting the weather in China. What if, let’s say, this dataset had another column indicating the country, and we had 2 different countries in the dataset? Does this mean we need to create 2 LSTM models?

Reply
- Jason Brownlee February 17, 2019 at 6:32 am #
  
  Good question, there are many ways to approach the problem. This might help as a start:
  https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
  
  Reply
  - Ravi February 18, 2019 at 10:35 am #
    
    Thanks for the response, Jason. One more question. Do you have a project in one of your books that deals with this scenario?
    
    A suggestion. You may want to consider having a ‘Subscribe’ button so people can subscribe to topics, which can also be used to notify people when their questions are answered.
    
    Reply
    - Jason Brownlee February 18, 2019 at 2:13 pm #
      
      Not directly.
      
      Thanks for the suggestion.
      
      Reply
Hamdi February 18, 2019 at 7:26 am #

Please help me to fix this, thanks

ValueError Traceback (most recent call last)
in ()
42 values[:,4] = encoder.fit_transform(values[:,4])
43 # ensure all data is float
---> 44 values = values.astype('float32')
45 # normalize features
46 scaler = MinMaxScaler(feature_range=(0, 1))

ValueError: could not convert string to float: 'NW'

Reply
- Jason Brownlee February 18, 2019 at 2:12 pm #
  
  It looks like you were trying to encode a string variable.
  
  Perhaps you skipped some steps?
  
  Reply
  - Steve May 24, 2019 at 2:52 am #
    
    He didn’t. I think your code is broken. After encoding, you didn’t replace the string column, so values still has the string column.
    
    Reply
    - Prateek Samuel September 17, 2019 at 7:56 pm #
      
      Steve is right.
      
      @Jason: your code is broken
      
      Reply
      - Jason Brownlee September 18, 2019 at 6:01 am #
        
        Thanks for your feedback, I recommend following this tutorial:
        https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
Shaun February 18, 2019 at 7:37 am #

Hi Jason,

Can you please provide code for multi-target prediction using a single LSTM?

Reply
- Jason Brownlee February 18, 2019 at 2:12 pm #
  
  What do you mean exactly?
  
  Do you mean multi-step forecasts? If so, you can get started here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
jaehyeong an February 19, 2019 at 1:12 pm #

If you do not mind, can I translate this post into Korean and put it on my blog?

Reply
- Jason Brownlee February 19, 2019 at 2:01 pm #
  
  No, please do not translate and re-publish my posts.
  
  I explain more here:
  https://machinelearningmastery.com/faq/single-faq/can-i-translate-your-posts-books-into-another-language
  
  Reply
Vini Lopes February 21, 2019 at 12:05 am #

Hi Jason, excellent class.

I am implementing this procedure to a dataset quite similar, but I have one doubt.

In order to obtain the best LSTM model, in which order do I need to arrange my lagged input features? For example, should rain over the last 3 hours be ordered as rain(t-3), rain(t-2), rain(t-1) when reshaped, or as rain(t-1), rain(t-2), rain(t-3)? My intuition, knowing the structure of an LSTM, says that the first sequence fits the application better, but I really don’t know if it even matters.

Thanks. Best regards!

Reply
- Jason Brownlee February 21, 2019 at 8:12 am #
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
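
For what it is worth, the tutorial's series_to_supervised() function emits lag columns from oldest to newest (t-n, ..., t-1), and a Keras LSTM reads the timestep axis in index order, so the first ordering in the question is the conventional one. A minimal sketch of building windows in that order; `make_windows` is a hypothetical helper, not the tutorial's function.

```python
def make_windows(series, n_lag):
    """Split a sequence into (window, target) pairs.

    Each window holds the n_lag observations before the target,
    ordered oldest first: [x(t-n_lag), ..., x(t-1)].
    """
    pairs = []
    for t in range(n_lag, len(series)):
        pairs.append((list(series[t - n_lag:t]), series[t]))
    return pairs


rain = [0, 1, 2, 3, 4, 5]
pairs = make_windows(rain, n_lag=3)
for window, target in pairs:
    print(window, "->", target)
```

Whatever order is chosen, it needs to be the same for training and prediction.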
  
  Reply
Jack February 21, 2019 at 2:53 am #

Hi Jason,
I need to ask a very basic question. When I print(test_X), I will get data with 8 columns. And when I use
yhat = model.predict(test_X)
print(yhat)
I will get data with one column. So which column or feature am I getting predictions for? And why is it not giving predictions for all the features (columns) we have in test_X?

Reply
- Jason Brownlee February 21, 2019 at 8:14 am #
  
  You pass in an array of one or more “samples” and get a prediction for each sample.
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  And this:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
Kaidelian February 23, 2019 at 12:01 am #

Dear professor Jason,
I am using an LSTM to forecast Power Quality (PQ). When I trained the LSTM, I found a strange problem. My raw data is periodic, with a 24-hour cycle. Because there is a fairly large gap in the values at the junction points, when I evaluated the trained model on the test data, the junction points always had a higher relative error, sometimes even reaching 80%. I have tried to fix it, but I failed, so I hope you can help me.
Thanks. Best regards.

Reply
- Jason Brownlee February 23, 2019 at 6:33 am #
  
  Perhaps try removing the seasonality from the data prior to modeling?
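
One simple way to do this is seasonal differencing: subtract the observation from one full cycle earlier, model the differences, then add the earlier observation back to each forecast. A minimal sketch, assuming the 24-hour cycle mentioned in the question; the helper names and the toy series are illustrative.

```python
def seasonal_difference(series, period):
    """Return series[t] - series[t - period] for every t >= period."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def invert_seasonal_difference(history, diff_value, period):
    """Undo the difference for the next step by adding back the value one period earlier."""
    return diff_value + history[-period]


# Toy series: a repeating 24-hour pattern that rises by 1 each day.
series = [(t % 24) + (t // 24) for t in range(72)]
diffs = seasonal_difference(series, period=24)
print(diffs[:5])

# A forecast of 1 on the differenced scale maps back to the original scale.
next_value = invert_seasonal_difference(series, diff_value=1, period=24)
print(next_value)
```

The model is then trained on `diffs` rather than on the raw values, and each prediction is passed through the inverse before computing errors.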
  
  Reply
Learnd February 24, 2019 at 6:40 am #

Hi Jason,

My problem concerns a time series where I have 15 feature variables (x1, x2, x3, ..., T) and the data is collected at 2-hour intervals. x1, x2 and x3 are the significant features.

I need to forecast the value of T for the next 24 hours. What would be the approach? I am trying a multivariate time series model using an LSTM, but I have no clue how to predict the next 24 hours from the current data. Could you please let me know your approach?

Thanks

Reply
- Jason Brownlee February 24, 2019 at 9:15 am #
  
  I recommend following this process:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  The tutorials here will help:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
dyy February 26, 2019 at 4:11 pm #

Hi Jason,

There’s a part of this where I got IndexError: tuple index out of range when I tested on my dataset.
May I know the meaning of this line:
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
What is the value 2 for?

Reply
- Jason Brownlee February 27, 2019 at 7:24 am #
  
  It refers to the third index of the shape variable, e.g. the size of the third dimension of the test_X array.
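
A minimal sketch of what that line does, using a toy array with made-up sizes. It also shows one likely cause of the "tuple index out of range" error: asking for shape[2] on an array that is already 2D.

```python
import numpy as np

samples, timesteps, features = 5, 1, 8
test_X = np.zeros((samples, timesteps, features))
print(test_X.shape[0], test_X.shape[2])   # number of samples, number of features

# With a single timestep, the 3D array can be flattened to 2D [samples, features].
flat = test_X.reshape((test_X.shape[0], test_X.shape[2]))
print(flat.shape)

# A 2D array has no third dimension, so shape[2] raises IndexError.
try:
    flat.shape[2]
except IndexError as err:
    print("IndexError:", err)
```

Printing `test_X.shape` just before the failing line is a quick way to check how many dimensions the array really has.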
  
  Reply
  - dyy February 27, 2019 at 2:38 pm #
    
    Crystal clear, Jason! Thank you! By the way, if I want to get the training error, I just have to do
    # make a prediction
    yhat = model.predict(train_X)
    right? And then continue the rest of the process with train_X?
    
    Reply
Mike Gardner February 28, 2019 at 4:12 pm #

Thanks so much for this tutorial. It’s amazing. I’m sure you know this but a lot of your pyplots can be simplified using the plot method available on DataFrames. I recreated your first plot below.

https://i.imgur.com/aDDuEPG.png

Reply
- Jason Brownlee March 1, 2019 at 6:14 am #
  
  Thanks, great tip!
  
  Reply
boughrara March 1, 2019 at 1:57 am #

Hello
Thank you very much for your tutorials, which are very interesting.
I wanted to develop an LSTM model for weather forecasting with 7 variables, and I want to predict all 7 variables for several time steps into the future (24 values). At exactly this point I encountered errors at the output layer ‘Dense’: what number of neurons do I have to use (Dense(?))? Could you help me, please?
Thank you

Reply
- Jason Brownlee March 1, 2019 at 6:24 am #
  
  The Dense or fully connected output layer defines the number of outputs when making a prediction.
  
  Perhaps start here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
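
Since the output layer sets the number of values predicted per sample, one common arrangement (an assumption here, not something prescribed above) is to flatten the 24 steps of 7 variables into a single vector of 24 * 7 = 168 outputs and reshape the predictions afterwards. A minimal sketch of the shape bookkeeping with NumPy only; the array contents are placeholders and no model is trained.

```python
import numpy as np

n_samples, n_steps_out, n_vars = 10, 24, 7

# Targets arranged as [samples, future steps, variables].
y = np.arange(n_samples * n_steps_out * n_vars, dtype=float)
y = y.reshape(n_samples, n_steps_out, n_vars)

# A single Dense output layer would need this many units.
n_outputs = n_steps_out * n_vars
y_flat = y.reshape(n_samples, n_outputs)
print(n_outputs, y_flat.shape)

# Stand-in for model.predict(...), then restore the [samples, steps, variables] layout.
yhat_flat = y_flat.copy()
yhat = yhat_flat.reshape(n_samples, n_steps_out, n_vars)
print(yhat.shape)
```

Under this arrangement the training targets passed to fit() would be `y_flat`, and the output layer would be Dense(168).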
  
  Reply
Alexandre March 1, 2019 at 11:34 am #

Hi Jason, thanks for this amazing tutorial, this helped me so much! I still have one problem pending here: I set two features A and B (n_features=2) as input features, and the number of outputs as two also (n_outputs=2).

I want to use a naive model for forecasting the feature A based on B. However, yhat=model.predict(test_X) returns a shape of (test_X.shape[0], 1), while test_X (used for persisting by appending the last value of yhat) expected a shape of (1, n_lag, n_features).

I’ve made a naive model with only one feature and it worked pretty well! But with two features I think I’m missing something.

How do I accomplish the naive model with two features as input? Setting the last Dense layer to units=2 doesn’t work, and I’m confused. Thanks!

Reply
- Jason Brownlee March 1, 2019 at 2:21 pm #
  
  Perhaps this post will help:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Anustup Das March 2, 2019 at 11:10 pm #

Dr.Jason,

Thank you for this great post.
I have multiple time series (180), each with a length of 51. In total I have 180 × 51 observations with 24 features each. I guess I have a multiple multivariate time series problem. How can I apply an LSTM to this data? Any help would be much appreciated.

Reply
- Jason Brownlee March 3, 2019 at 8:02 am #
  
  Perhaps start by reading this:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Then start here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Lohith Desu March 4, 2019 at 10:42 pm #

Hi Jason, I am getting an error like this on line 45. What am I supposed to do now?

—————————————————————————
ValueError Traceback (most recent call last)
in ()
43 values[:,4] = encoder.fit_transform(values[:,4])
44 # ensure all data is float
---> 45 values = values.astype('float')
46 # normalize features
47 scaler = MinMaxScaler(feature_range=(0, 1))

ValueError: could not convert string to float: 'NW'

Reply
- Jason Brownlee March 5, 2019 at 6:39 am #
  
  I believe you might have skipped a step where that column was removed from the dataset.
  
  Reply
Jerry Zhang March 10, 2019 at 12:16 am #

Hi Jason, Thanks for the fabulous tutorial. I have run your multi-step example with fewer hidden neurons and get better RMS errors. For example,
LSTM hidden RMS error
4 24.364
2 24.378
1 24.728

Is that possible?

Reply
- Jason Brownlee March 10, 2019 at 8:17 am #
  
  Well done.
  
  Reply
Ratnesh Kumar Tiwari March 11, 2019 at 1:00 am #

Hi Jason, I have a similar project, but I need to predict the next 3 days instead of 1 day. Please suggest a relevant approach to tackle this challenge. Thanks in advance!

Reply
- Jason Brownlee March 11, 2019 at 6:52 am #
  
  This is called multi-step forecasting, perhaps start here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
Yasir Merchant March 12, 2019 at 2:02 am #

Hi Jason. I have a similar problem that I’m dealing with: time series forecasting for hundreds of SKUs in different cities. In other words, predicting how much of a SKU is likely to be sold (in quantity) given a certain city, week of the year (1 – 52), and temperature (domain experts know a relationship exists between the amount of a certain SKU sold and temperature).

I came across a post on stackexchange (https://stats.stackexchange.com/questions/389291/strategies-for-time-series-forecasting-for-2000-different-products?noredirect=1&lq=1) in which the answerer mentioned that Amazon Forecasting uses an RNN LSTM model to achieve what I’m trying to achieve, which is prediction at the SKU level using just one model to predict multiple time series instead of a separate model for each time series (for different SKUs). And the post is right because, after analyzing their “recipes”, a few of them are RNNs. Simply knowing that Amazon is utilizing the same methodologies reinforces my idea that I’m on the right path. However, my question is this: in the conclusion of this post, you mentioned that LSTMs are not a good idea for autoregression problems. Would my problem be considered an autoregression type of problem? If yes, do you have any strategies I could use to tackle this specific problem, in which I’m trying to forecast at the SKU level and ideally use one model for it?

Thanks!

Reply
- Jason Brownlee March 12, 2019 at 6:58 am #
  
  I would recommend starting here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Also, this may help (replace sites with SKUs):
  https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
  
  Reply
SUBHADEEP March 14, 2019 at 10:11 pm #

Hi Jason,

I’m a new learner. I’m trying to get the accuracy and validation accuracy using the code below:

model = Sequential()
model.add(LSTM(10, input_shape=(train_X.shape[1], train_X.shape[2])))
#model.add(Dropout(0.2))
#model.add(LSTM(30, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1), return_sequences=True)
model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
# fit network
history = model.fit(train_X, train_y, epochs=50, batch_size=120, validation_data=(test_X, test_y), verbose=2, shuffle=False)
# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
print(history.history['acc'])

The loss value is very low (around 0.0136), but in spite of that I’m getting an accuracy of 6.9% and a validation accuracy of 2.3%, which is very low.
So, can you please help with this?

Reply
- Jason Brownlee March 15, 2019 at 5:31 am #
  
  I have suggestions for improving model performance here:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
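A note on the question above: accuracy is a classification metric, so it is not expected to be meaningful for a regression model that predicts real values, however low the loss is. Error metrics such as MAE or RMSE are the usual way to judge a forecast. A minimal sketch of the difference, using made-up numbers (the arrays below are hypothetical and not from the tutorial):

```python
import numpy as np

# Hypothetical scaled targets and predictions, for illustration only.
y_true = np.array([0.10, 0.12, 0.15, 0.11, 0.09])
y_pred = np.array([0.11, 0.12, 0.13, 0.12, 0.10])

# Accuracy counts exact matches, which rarely happen with real-valued outputs.
accuracy = np.mean(y_true == y_pred)

# Error metrics measure how far off the forecasts are.
mae = np.mean(np.abs(y_true - y_pred))
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

print("accuracy: %.2f" % accuracy)
print("MAE: %.4f, RMSE: %.4f" % (mae, rmse))
```

Here the predictions are all close to the targets, yet only one of the five matches exactly, so the "accuracy" is low while the errors are small.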
raks March 19, 2019 at 11:18 pm #

I’m getting this error:

line 50, in
values[:,4] = encoder.fit_transform(values[:,4])

IndexError: index 4 is out of bounds for axis 1 with size 0
How can I resolve it?

Reply
- Jason Brownlee March 20, 2019 at 8:30 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
steven March 20, 2019 at 4:30 am #

How can we get dates on the x-axis while plotting predicted values?

Reply
- Jason Brownlee March 20, 2019 at 8:34 am #
  
  Good question, this might help:
  https://matplotlib.org/gallery/text_labels_and_annotations/date.html
  
  Reply
Alex March 20, 2019 at 5:09 am #

Hi Jason,

I really like your tutorials. However, I have a small doubt that maybe you can help me with. In my dataset I have 2 features and various timesteps. Feature 1 corresponds to the timestamp of feature 2, so my forecasting problem consists of predicting the future values of feature 2.
So far, everything is good. However, now I’d like to use lag timesteps of feature 2, and lag+1 timesteps of feature 1. This way, I can set the timestamp of the prediction for feature 2.

Would you know how to address this issue?
The general problem would be: Can we use different lags for different features?

Thanks!

Reply
- Jason Brownlee March 20, 2019 at 8:36 am #
  
  Perhaps create lags of all variables, then remove the unwanted columns.
  
  This post will help, at least as a starting point:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
  - Alex March 20, 2019 at 7:25 pm #
    
    I believe if I do what you recommend I would be treating the lags as features, and so I would be misusing the LSTM cells; or maybe I didn’t explain myself correctly.
    Here is an example with data from your link. Suppose we have var1(t-1) and var2(t-1) and we want to predict var2(t), then this would be our data structure:
    
    var1(t-1) var2(t-1) var2(t)
    1 0.0 50.0 51
    2 1.0 51.0 52
    3 2.0 52.0 53
    4 3.0 53.0 54
    5 4.0 54.0 55
    
    Nevertheless, now I want to predict var2(t), from var1(t-1), var1(t), and var2(t-1). This would mean that var1 has lag=2 while var2 has lag=1. And as far as I know, keras input_shape is only (n_timesteps, n_features), so we would need to adapt our input matrix to that shape, maybe reshaping it somehow like:
    
    1) Considering var1(t) as a new variable called var3(t-1). This would be like lag = 1 and n_features = 3. Although I’m afraid this will be counterproductive for the RNN as I said before.
    
    var1(t-1) var2(t-1) var3(t-1) var2(t)
    1 0.0 50.0 1.0 51
    2 1.0 51.0 2.0 52
    3 2.0 52.0 3.0 53
    4 3.0 53.0 4.0 54
    5 4.0 54.0 5.0 55
    
    2) Set the lag as long as the longest one, and set Nan or other value that does not naturally appear on the actual dataset. This would be like lag = 2, and n_features = 2. Here the RNN should learn to predict var2_predict(t), although it should also learn to discard var2(t).
    
    var1(t-1) var2(t-1) var1(t) var2(t) var2_predict(t)
    1 0.0 50.0 1.0 -1 52
    2 1.0 51.0 2.0 -1 53
    3 2.0 52.0 3.0 -1 54
    4 3.0 53.0 4.0 -1 55
    5 4.0 54.0 5.0 -1 56
    
    Unfortunately I can not come up with any other idea… hopefully I explained better this time or you could give me a more thorough insight.
    
    Reply
    - Jason Brownlee March 21, 2019 at 8:02 am #
      
      Yes, I think I see.
      
      If you don’t have all time steps for all input variables – as I understand your problem – then two starting options include:
      
      – have all time steps for all input vars and use zero padding with a masking layer
      – frame time steps as features.
      
      Reply
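The "frame time steps as features" option from the reply above could be sketched as follows, using the toy values from Alex's tables. This is only an illustration under stated assumptions: the column names and the choice of one time step with three features are hypothetical, not code from the tutorial.

```python
import pandas as pd

# Toy data matching the example in the comment above.
df = pd.DataFrame({
    "var1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "var2": [50.0, 51.0, 52.0, 53.0, 54.0, 55.0],
})

# Create only the lags that are wanted for each variable.
framed = pd.DataFrame({
    "var1(t-1)": df["var1"].shift(1),
    "var2(t-1)": df["var2"].shift(1),
    "var1(t)": df["var1"],   # extra, more recent observation of var1
    "var2(t)": df["var2"],   # target
}).dropna()

X = framed[["var1(t-1)", "var2(t-1)", "var1(t)"]].values
y = framed["var2(t)"].values

# One time step with three features: [samples, timesteps, features].
X_lstm = X.reshape((X.shape[0], 1, X.shape[1]))
print(X_lstm.shape, y.shape)
```

The other option mentioned in the reply (padding the missing time steps and masking them) keeps two real time steps instead, at the cost of a slightly more involved model definition.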
Andreas March 21, 2019 at 1:45 am #

Hi Jason!

I have two questions regarding this tutorial. I’m a bit confused about how many features are used. I saw your answer to Lg that 7 features are used, but when you run print(reframed.head()) under the “LSTM Data Preparation” section it shows 8 input variables and 1 output variable. Can you explain what I’m missing here?

My other question is about the updated example when you’re using multiple lag timesteps. Why do we not drop the columns for all the other fields like in the original example with one timestep?

Best regards,
Andreas

Reply
- Jason Brownlee March 21, 2019 at 8:19 am #
  
  Hi Andreas, the features/timesteps aspect of LSTMs can be confusing, I think this will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Yes, I have a few more advanced examples, you can get started here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
  - Andreas March 21, 2019 at 4:39 pm #
    
    Thank you for your reply. I’ll check the links you provided.
    
    However, I still wonder in this particular case: How many features are used, 7 or 8? And why are columns dropped in the first example and not in the example with multiple lag timesteps? I mean, we want to predict t1 in both cases so why not drop the columns in both examples?
    
    Thank you for your time,
    Andreas
    
    Reply
    - Jason Brownlee March 22, 2019 at 8:22 am #
      
      One feature is categorical, we drop it for simplicity of modeling.
      
      Reply
mk123qwe March 22, 2019 at 7:24 pm #

Where can I find the persistence model result? The persistence model I tried myself can only reach 80.

Reply
- Jason Brownlee March 23, 2019 at 9:20 am #
  
  This post may help:
  https://machinelearningmastery.com/how-to-grid-search-naive-methods-for-univariate-time-series-forecasting/
  
  Reply
Fan March 27, 2019 at 7:52 am #

Hey, Jason, I have a clarifying question. I think LSTM will automatically decide what previous data will be used, and there will be no need for an LSTM model for multiple lag timesteps. This is also the reason why the model with multiple lag timesteps has a bad performance.

Reply
- Jason Brownlee March 27, 2019 at 9:08 am #
  
  This can be the case.
  
  You can choose to use a dynamic RNN and have the model figure this out, or use a large fixed sized input for efficiency reasons and have the model figure it out – either way.
  
  Reply
Shubha March 27, 2019 at 9:27 pm #

Hi Jason,

Is it always necessary to frame the dataset as a supervised learning problem? Is there an alternative approach where we do not need to frame the dataset as a supervised learning problem? I am trying to implement a solution which has around 50 input features. Even if I try 10 time steps, my input would become huge. Please let me know if there is any alternative approach.

Thanks,
Shubha

Reply
- Jason Brownlee March 28, 2019 at 8:11 am #
  
  Yes, always.
  
  Sometimes, the library will do it for you, in the case of some of the linear models like ARIMA.
  
  You can try modeling less data, try a simpler model, or use a larger/faster machine?
  
  Reply
Dharmendra Sahani March 29, 2019 at 7:01 pm #

Hi Jason,

Your articles are helpful, Thank you so much. Need your help.

My data looks like this

Date Iron Copper Aluminium Zinc Lead
1-Jan-16 345 254 453 542 645
1-Feb-16 346 255 460 575 646
1-Mar-16 347 256 461 576 647
1-Apr-16 348 257 456 545 648
1-May-16 349 583 457 546 649

How do I input this data into an LSTM time series model to predict the price of each material? Please advise.

Thank you

Reply
- Jason Brownlee March 30, 2019 at 6:25 am #
  
  You can learn how to prepare data for LSTMs here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm
  
  Reply
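As a rough illustration for the question above, a table like this can be cut into overlapping windows so that each sample has the shape [timesteps, features] and the target is the next row of prices. The helper name `make_windows` and the window length of 2 are assumptions made for this sketch; in practice the values would normally be scaled first, as in the tutorial.

```python
import numpy as np

# The five monthly rows from the comment above: Iron, Copper, Aluminium, Zinc, Lead.
prices = np.array([
    [345, 254, 453, 542, 645],
    [346, 255, 460, 575, 646],
    [347, 256, 461, 576, 647],
    [348, 257, 456, 545, 648],
    [349, 583, 457, 546, 649],
], dtype="float32")

def make_windows(data, n_steps):
    """Split a 2D [time, features] array into LSTM samples and next-step targets."""
    X, y = [], []
    for i in range(len(data) - n_steps):
        X.append(data[i:i + n_steps])   # n_steps consecutive rows, all features
        y.append(data[i + n_steps])     # the following row is the target
    return np.array(X), np.array(y)

X, y = make_windows(prices, n_steps=2)
print(X.shape, y.shape)
```

With five output columns in `y`, the final Dense layer of a model trained on this data would need five units, one per material.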
Dharmendra Sahani March 30, 2019 at 8:36 pm #

Thanks Jason, really helpful

Reply
- Jason Brownlee March 31, 2019 at 9:28 am #
  
  I’m glad it helped.
  
  Reply
Jerry Pan April 4, 2019 at 12:01 pm #

Hi Jason,

First of all, thanks for your great tutorial.

I wonder whether we can apply this model to future dates that are not included in the testing data.

For example, let’s say

it is now April 3rd, so the testing data only runs until April 3rd,
using air pollution data similar to what you use in this tutorial.

Can I predict the “PM2.5 concentration” in May or June?

What code should I change to predict for the far future?

(PM2.5 concentration in May (future), which is not even in the testing data)

In other words, the shape of the input is different.
Can I use only time (a future date) as input to get the output (PM2.5 concentration)
from this trained LSTM model?

Thank you so much.

Reply
- Jason Brownlee April 4, 2019 at 2:13 pm #
  
  Yes, you can make out of sample forecasts by calling model.predict()
  
  You can learn more here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  You can also make multi-step forecasts, I have examples here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
  - Jerry Pan April 5, 2019 at 7:50 am #
    
    Hi Jason,
    
    I read those articles, but that’s not what I asked.
    
    I mean, if now (April 4th) I want to predict the air pollution, PM2.5 concentration, on May 1st,
    I don’t know any other variable for May 1st.
    
    (For example, I don’t know the future temperature or wind speed on May 1st.)
    
    All I know is the index, which is May 1st,
    and the other columns, like temperature or wind speed in the future, are unknown.
    
    So, what kind of input should I put into “yhat = model.predict( ??? )”?
    The future input for May 1st, X, is actually unknown.
    
    I only know the time index,
    and the “X.shape” is totally different.
    
    Can I still make a prediction for May 1st when all the other variables are unknown?
    
    Or should I use ARIMA to predict the future temperature and wind speed on May 1st,
    and then use these “ARIMA predicted variables” as the input to the LSTM?
    
    Thank you so much.
    
    Reply
    - Jason Brownlee April 5, 2019 at 1:59 pm #
      
      Yes, you can frame the problem any way you want, e.g. you can define what inputs and outputs you want to use for the model, then train it for your use case.
      
      I am encouraging you to prototype a few different solutions or different framings of the problem to see what works best for your specific dataset. I am linking to the posts to help you prepare those prototypes.
      
      Always start with a linear model; often a neural net cannot outperform it.
      
      Reply
    - Techai August 1, 2021 at 3:55 pm #
      
      Hi Jerry,
      
      I’m also having this problem in my use case. Since we don’t know the exact input feature values for the future, how can we predict our output?
      Could you please suggest the solution that worked for you?
      
      Thanks
      Techai
      
      Reply
      - Jason Brownlee August 2, 2021 at 4:52 am #
        
        Design your model to only take as input the data that is available at prediction time.
        
        Or use predictions as input, called the recursive approach to forecasting.
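The recursive approach mentioned in the last reply can be sketched in a few lines. To keep the example self-contained, a least-squares linear model stands in for the LSTM, and the series values are made up; only the feedback loop at the end is the point of the illustration.

```python
import numpy as np

# Hypothetical univariate series, for illustration only.
series = np.array([10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 19.0, 21.0, 22.0, 24.0])
n_lags = 2

# Build lagged inputs: each row holds the n_lags values before the target.
X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
y = series[n_lags:]

# Stand-in model: linear regression with intercept, fit by least squares.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_one(window):
    """One-step forecast from the most recent n_lags values."""
    return float(np.dot(np.append(window, 1.0), coef))

# Recursive forecast: feed each prediction back in as the newest input.
history = list(series[-n_lags:])
forecast = []
for _ in range(5):
    yhat = predict_one(np.array(history[-n_lags:]))
    forecast.append(yhat)
    history.append(yhat)

print(forecast)
```

Because each step consumes earlier predictions, errors can compound as the horizon grows, which is one reason to compare this against a model trained only on inputs that are known at prediction time.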
prince April 4, 2019 at 7:33 pm #

thanks for the tutorial … used it on solar energy prediction and it’s working great … wanted to know how I can modify it to have more than one output

Reply
- Jason Brownlee April 5, 2019 at 6:14 am #
  
  Well done!
  
  I call this multi-step forecasting, and I have some examples here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
Pooja April 5, 2019 at 6:39 pm #

What will the input array look like if I include categorical data? Something like this: [value value…..[0 1]]? How should I model the problem if one of the input features is categorical?

Reply
- Jason Brownlee April 6, 2019 at 6:44 am #
  
  I would recommend using an integer encoding, one hot encoding or an embedding for categorical variables.
  
  Reply
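The integer and one hot encodings suggested above could look like the sketch below. It uses plain NumPy to stay self-contained; the wind-direction labels and temperature values are made up for illustration, and an embedding would instead be learned inside the model from the integer codes.

```python
import numpy as np

# Hypothetical categorical column, for illustration only.
wind_dir = np.array(["SE", "SE", "NW", "cv", "NE", "NW"])

# Integer encoding: each distinct category maps to an integer code.
categories, int_codes = np.unique(wind_dir, return_inverse=True)

# One hot encoding: one binary column per category.
one_hot = np.eye(len(categories))[int_codes]

# A hypothetical numeric feature joined with the one hot columns.
temperature = np.array([[-5.0], [-4.0], [-4.5], [-6.0], [-3.0], [-2.0]])
features = np.hstack([temperature, one_hot])

print(categories)
print(int_codes)
print(features.shape)
```

Each row of `features` is then one observation with five columns, which can be framed and reshaped to [samples, timesteps, features] like any other numeric input.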
Jack April 8, 2019 at 1:51 am #

Hi Jason

Thank you for your great code and articles. I tried to use the code in the article for my study project.

My data set has 6,913 columns, 14 columns, the first column is time data (df[0]), and the format is datetime.

I want to do multivariate single-step prediction, the target field is in column 6 from the left (df[5])

But when I use your code, it always outputs the predicted value for column 2 (df[1]).

How can I modify the code to achieve this?

Reply
- Jack April 8, 2019 at 1:57 am #
  
  correct the mistake:
  Is 6913 rows × 14 columns
  
  By the way, I always get a high RMSE value during the training. Is there any suggestion for improvement?
  
  Reply
  - Jason Brownlee April 8, 2019 at 5:56 am #
    
    Yes, perhaps scale the data prior to modeling.
    
    Reply
- Jason Brownlee April 8, 2019 at 5:56 am #
  
  Perhaps you can prepare your data so that the column you want to predict is at the end of the data frame?
  
  Reply
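Moving the target column to the end of the data frame, as suggested above, might be done like this. The column names are hypothetical stand-ins for the sensor fields described in the question.

```python
import pandas as pd

# Hypothetical sensor frame; "soil_moisture" stands in for the target column.
df = pd.DataFrame({
    "air_temp": [21.0, 22.0, 23.0],
    "soil_moisture": [0.31, 0.30, 0.29],
    "humidity": [55.0, 57.0, 60.0],
})

target = "soil_moisture"
# Reorder so every other column keeps its order and the target comes last.
ordered = [c for c in df.columns if c != target] + [target]
df = df[ordered]

# With the target last, inputs and output split cleanly by position.
values = df.values
X, y = values[:, :-1], values[:, -1]
print(df.columns.tolist())
```

If the data is then framed with the tutorial's series_to_supervised() function, the column-dropping step would presumably need to keep the last (t) column rather than the first one.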
Jack April 8, 2019 at 5:35 pm #

Thank you for your reply.

I have moved the target field to the end of the DataFrame. What should I do next?
Is the code that needs to be modified located inside the series_to_supervised function?

About my data set, it is from a variety of environmental sensors, 1 per hour, from 2018/06/01 to the present, there may be some zero value or missing in the middle, stored in MongoDB.

The goal of the problem is to consider the past 6 hours to predict the soil moisture in the next hour.
(I am also learning time series multivariate multi-step predictions to predict more time in the future, I wonder if there are suggestions for reading?)

Is this parameter correct?
series_to_supervised(scaled, 6, 1)

The test data used in this code has been scaled by MinMaxScaler, but I still get high RMSE values. What else can I do?

My question is a bit long, I hope I can get some suggestions from you, thank you!

Reply
- Jason Brownlee April 9, 2019 at 6:21 am #
  
  Yes, the usage of the function looks good, if the data is hourly.
  
  Perhaps you need to try alternate models and model configurations, my best advice is to start here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Abdullah Kahraman April 9, 2019 at 7:00 am #

We are trying to run this code with a 4-variable-data. One of the variables is the observed wind speed, and other three are output from an atmospheric model (wind speed at different levels). What we have modified are the following lines:

groups = [0, 1, 2, 3, 4]

values[:,3] = encoder.fit_transform(values[:,3])

reframed.drop(reframed.columns[[4,5,6]], axis=1, inplace=True)

n_train_hours = 20 * 72

When we define the groups like above, we have an error (IndexError: index 4 is out of bounds for axis 1 with size 4), and we can not have the last graph plotted. When we have only 0, 1, 2, and 3 in the “groups” line, then we have the graph without errors; but the values in our “dataset” are modified strangely.

Would you think it works fine although the dataset values are modified?

Reply
- Jason Brownlee April 9, 2019 at 2:37 pm #
  
  Sorry, I don’t have the capacity to prepare custom code.
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
Rohan April 9, 2019 at 11:09 am #

Respected sir,

I have a doubt regarding lag.
I am working on a project to use lstm to model rainfall – runoff
My input features (X) are – rainfall, min temperature, max temperature
My output (y) – runoff
total 4 columns of data

But the problem is if I am trying to predict runoff at time step t, the train_X before 3D has input features of time step up to (t-1) only.

For example if I want to predict feature ‘a’ using ‘b’,’c’,’d’ features and if I use lag as 1:
your code gives train_x before 3D as a(t-1), b(t-1), c(t-1), d(t-1) (4 columns)
and train_y as a(t)
I want train_x as a(t-1), b(t-1), c(t-1), d(t-1), b(t),c(t),d(t) (7 columns)
and train_y as a(t)

So, when I ran your model and tested it on the test data, the output looks shifted.
A baseline model predicting at timestep ‘t’ as ‘(t-1)’ performs similarly.
Other algorithms like mlp, xgboost using current time step inputs (7 columns) performed much much better.

So, my question is how can I incorporate current time step (t) input features for predicting at (t).

Thank you.

Reply
- Jason Brownlee April 9, 2019 at 2:40 pm #
  
  This post might help, and you may have to manually curate the resulting array to ensure the data has the desired structure:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
jill April 10, 2019 at 5:51 am #

I’m confused about the X input shape.

In your previous tutorial, you state:

# Samples (one sequence = one sample)
# Timesteps (one timestep = one point of observation in the sample)
# Features (one feature = one observation at a time step)

However, in this tutorial, we are now setting timestep=1 (to fit the model on the first year of data). Doesn’t one year of data represent one sample? Then each sample within that year of data would represent a timestep?

I was expecting the shape to be (1, 8760, 8) instead of (8760, 1, 8).

Reply
- Jason Brownlee April 10, 2019 at 6:15 am #
  
  This will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
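To make the two shapes in the question above concrete, the sketch below reshapes the same placeholder array both ways. Both are valid 3D inputs; they differ in what counts as a sample. The zero-filled array is only a stand-in for the first year of hourly data.

```python
import numpy as np

n_hours, n_features = 8760, 8
data = np.zeros((n_hours, n_features))  # placeholder values, illustration only

# Tutorial framing: 8760 samples, each a sequence of 1 time step with 8 features.
many_samples = data.reshape((n_hours, 1, n_features))

# Alternative framing: 1 sample that is a single sequence of 8760 time steps.
one_sample = data.reshape((1, n_hours, n_features))

print(many_samples.shape, one_sample.shape)
```

The first framing yields 8760 input/output pairs to learn from, while the second yields a single, very long training example.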
Amy April 10, 2019 at 8:30 am #

Hi Jason,

I just wanted to clarify: you use walk-forward validation in this example, right? (Or is it a separate implementation?) I know you mention using walk-forward validation in other LSTM examples (e.g. your power consumption tutorial)… is it the same case with this tutorial?

Thank you so much!

Reply
- Jason Brownlee April 10, 2019 at 1:44 pm #
  
  Yes, I believe I use walk-forward validation for almost all LSTM demonstrations.
  
  Reply
  - André de Sousa Araujo September 6, 2020 at 4:24 am #
    
    Hi Jason,
    First, thank you for this amazing tutorial!
    I don’t understand how you use walk-forward validation here in this experiment. How does model.fit() do this implicitly? Is it a Keras feature when you pass a test subset for validation?
    
    Reply
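For readers following this thread, walk-forward validation in general means forecasting one step, then revealing the true observation before forecasting the next. The sketch below shows that loop with a persistence forecast standing in for a trained model; the series values are made up. By contrast, passing validation_data to fit() in Keras evaluates the model on that fixed held-out set at the end of each epoch.

```python
import numpy as np

# Hypothetical series, for illustration only.
series = np.array([30.0, 32.0, 31.0, 35.0, 36.0, 34.0, 38.0, 40.0])
n_test = 3

history = list(series[:-n_test])
test = series[-n_test:]
predictions = []

for actual in test:
    # Persistence forecast: predict the last observed value.
    # A real model would be fit (or updated) on `history` here.
    yhat = history[-1]
    predictions.append(yhat)
    # Only after forecasting is the true observation added to the history.
    history.append(actual)

rmse = np.sqrt(np.mean((test - np.array(predictions)) ** 2))
print("RMSE: %.3f" % rmse)
```

The persistence score from a loop like this is also a handy baseline to beat before trusting a more complex model.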
Evan April 10, 2019 at 3:55 pm #

The 16 lines of code which plot the “Line Plots of Air Pollution Time Series” can be cut to 5 lines:

from pandas import read_csv
from matplotlib import pyplot
dataset = read_csv('pollution.csv', header=0, index_col=0).drop(['wnd_dir'],axis=1)
dataset.plot(subplots=True)
pyplot.show()

Love your blog!

Reply
- Jason Brownlee April 11, 2019 at 6:28 am #
  
  Thanks Evan!
  
  Reply
jessy April 13, 2019 at 8:57 am #

sir,
I ran the above code and I am getting an error.
OSError Traceback (most recent call last)
in ()
14 print(dataset.head(5))
15 print("||"*40)
---> 16 dataset.to_csv('F:\General dataset\rawpollution.csv')

C:\Users\Tanu\Anaconda3\lib\site-packages\pandas\core\frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal, **kwds)
1342 doublequote=doublequote,
1343 escapechar=escapechar, decimal=decimal)
-> 1344 formatter.save()
1345
1346 if path_or_buf is None:

C:\Users\Tanu\Anaconda3\lib\site-packages\pandas\formats\format.py in save(self)
1524 f = _get_handle(self.path_or_buf, self.mode,
1525 encoding=self.encoding,
-> 1526 compression=self.compression)
1527 close = True
1528

C:\Users\Tanu\Anaconda3\lib\site-packages\pandas\io\common.py in _get_handle(path, mode, encoding, compression)
422 f = open(path, mode, encoding=encoding)
423 else:
--> 424 f = open(path, mode, errors='replace')
425 else:
426 f = open(path, mode)

OSError: [Errno 22] Invalid argument: 'F:\\General dataset\rawpollution.csv'

Reply
- Jason Brownlee April 13, 2019 at 1:46 pm #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
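A likely cause of the error in the traceback above is the Windows path itself: inside a normal Python string, "\r" is read as a carriage return rather than a backslash followed by "r", which matches the path shown in the error message. A small demonstration, assuming the same path as in the comment:

```python
# "\r" inside a normal string literal is a carriage return, not a backslash plus "r".
bad_path = 'F:\\General dataset\rawpollution.csv'

# Raw string: backslashes are kept literally.
raw_path = r'F:\General dataset\rawpollution.csv'

# Forward slashes are also accepted by Python on Windows.
slash_path = 'F:/General dataset/rawpollution.csv'

print(repr(bad_path))
print(repr(raw_path))
print('\r' in bad_path, '\r' in raw_path)
```

Passing either `raw_path` or `slash_path` to to_csv() should avoid the invalid character, provided the folder exists.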
jessy April 16, 2019 at 4:41 pm #

sir,
in the code above you calculated the RMSE value and suggested it was not a good value. What would a good RMSE value be? And why can’t we use MSE for the above problem?

Reply
- Jason Brownlee April 17, 2019 at 6:54 am #
  
  Great question, I answer it here:
  https://machinelearningmastery.com/faq/single-faq/how-to-know-if-a-model-has-good-performance
  
  Reply
Ali April 18, 2019 at 8:35 pm #

Hello Jason,

I want to use LSTM-RNN for a large data with 4.4GB. The first 27 signals I want to use as input and the 28th signal as output. I load all the packages that I need for the network. As backend I use TensorFlow. I have a dataframe shape of 21607359, 28. All NaN-values are removed.
I use the “def series_to_supervised(data, n_in=1, n_out=1, dropnan=True)” function, with n_vars=1. I load the data and normalize the features with “scaler = MinMaxScaler(feature_range=(0, 1))”. After this I use the command “scaled = scaler.fit_transform(values)”. I frame the data as supervised learning. After that I drop all columns I don’t want to predict with the command “reframed.drop(reframed.columns[[1,2,3,4 etc.]”. But they are still shown when I print the result.
The next step is that I split the data into train and test sets:
values = reframed.values
n_timestep = 100
n_train_time = 14260860
train = values[:n_train_time, :]
test = values[n_train_time:, :]
# split into inputs and output
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_Y = test[:, :-1], test[:, -1]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], n_timesteps, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], n_timesteps, test_X.shape[1]))
After that I print it. But an error message appears: ValueError: cannot reshape array of size 2795128560 into shape (14260860, 100, 196)

My questions are:

1.) Why are the inputs still listed although I removed them with reframed.drop(reframed.columns? How can I remove them?

2.) Why does the error message appear? How can I solve this problem?

3.) I want to test different timesteps. How can I do it? With which command?

I searched a lot but couldn’t find anything. I hope you can help me. I’m in a very bad situation now.

Thanks a lot.

Kind regards

Ali

Reply
- Jason Brownlee April 19, 2019 at 6:08 am #
  
  You may need to reshape the data into sequences of about 200-400 time steps.
  
  This post will give you some advice:
  https://machinelearningmastery.com/prepare-univariate-time-series-data-long-short-term-memory-networks/
  
  Also, there’s more help here:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Let me know how you go.
  
  Reply
  - Ali April 24, 2019 at 3:00 pm #
    
    Hello Jason,
    
    thank you for your answer. I am not sure which method is the right for my problem.
    
    I have 27 measured signal values. These signals shall predict one output signal which was also measured. The output signal has values of 0 and 1. 0 is the “healthy” state and 1 is the “unhealthy” state.
    
    The problem is that I do not know the relationship between each input signal. I want to see the order of influence of the input signals to the output signal and want to predict the output signal.
    
    Each signal is a column and the values to each signal are in the rows. I have nearly 22 million rows.
    
    I want to make predictions for example 1 month into the future.
    
    Shall I use multivariate time series with multi-step forecasting or univariate time series with multi-step forecasting? What would you recommend?
    
    Thanks a lot.
    
    Kind regards
    
    Ali
    
    Reply
    - Jason Brownlee April 25, 2019 at 8:05 am #
      
      Perhaps let the models learn any relationship if it exists. Start with something really simple like a RandomForest and then review what features are used/ignored. That would be a great start.
      
      I recommend testing a suite of methods. Start with a naive forecast, then a linear, then explore MLP, CNN, LSTM and hybrids. Discover what works best for your specific problem.
      
      Reply
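On the reshape error quoted earlier in this thread: 14260860 × 196 = 2795128560, so the 2D array appears to have 196 columns, and a target shape of (14260860, 100, 196) would need 100 times more values than it holds. A reshape only rearranges existing values, so the column count has to equal timesteps × features. A small-scale sketch of both cases, with made-up sizes:

```python
import numpy as np

n_samples, n_timesteps, n_features = 6, 4, 3

# A framed 2D array has one column per (time step, feature) pair.
flat = np.arange(n_samples * n_timesteps * n_features, dtype="float32")
flat = flat.reshape((n_samples, n_timesteps * n_features))

# Valid: the column count is split into time steps and features.
X = flat.reshape((n_samples, n_timesteps, n_features))

# Invalid: keeping all columns as features AND adding time steps needs
# n_timesteps times more values than the array holds.
try:
    flat.reshape((n_samples, n_timesteps, flat.shape[1]))
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)

print(X.shape)
```

To try different numbers of time steps, the lag count passed when framing the data would change, and the reshape would then use that same number.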
sinh nguyen phuoc April 19, 2019 at 8:09 pm #

Hello Jason Brownlee,
I work in the field of hydraulics, currently handling flood control on a river. There are 2 hydropower plants on the upstream branches and 1 discharge gauge downstream. At these 3 points, I have a few years of flow data at 15-minute time steps, which I named Q1, Q2 and Q3 (flow data over time).
Based on ideas from your article here:
https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/

I built a model to forecast the downstream flow Q3 based on the Q1, Q2, and Q3 data from the previous 3 days.
The model runs and gives pretty good results.

From here, I wonder whether there is any method to determine the optimal Q1 and Q2 process so that Q3 satisfies a certain condition, in this case that max(Q3) and the flood volume are as low as possible.
Thank you.

Reply
- Jason Brownlee April 20, 2019 at 7:34 am #
  
  Well done.
  
  Good question. My first thought would be to perform a sensitivity analysis to try to understand how the different data/processes impact the model.
  
  Reply
jessy April 23, 2019 at 9:20 pm #

hi jason ,
how can we give multiple inputs to different layers (an LSTM layer and a Dense layer)? I have seen your blog post with two inputs into a Dense layer.

Could you tell me how to process sequence data in the LSTM layer and constant data in the Dense layer, and then concatenate the two?

Reply
- Jason Brownlee April 24, 2019 at 7:59 am #
  
  This might help as a start:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
  - jessy April 24, 2019 at 10:49 am #
    
    thanks, awesome post … In the above link you are using numbers. Can I use a CSV file and process the data into two different layers?
    
    Reply
furkan April 24, 2019 at 2:33 am #

hi Jason,
I’m working on bitcoin price prediction with a multiple-input LSTM. I have some issues. Here is my code:

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import math

from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

from tensorflow.python.framework import ops

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM

data=pd.read_excel("C:\\Users\\user\\Desktop\\spyder veri\\son.xlsx")

fige=plt.figure(figsize=(8,5))

dataset = data.values
dataset = dataset.astype('float32')

scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

train_size = int(len(dataset) * 0.70)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
print(len(train), len(test))

def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

look_back = 1
trainX, trainY = create_dataset(train, look_back=look_back)
testX, testY = create_dataset(test, look_back=look_back)

# trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
# testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1]))
# print((trainX.shape[0], trainX.shape[1]))
# print(“trainY=”,trainY,”\n”)
# print(“trainY.shape[0]=”,trainY.shape[0],”\n”)
# print(“trainX=”,trainX,”\n”)
# print(“testX=”,testX,”\n”)
# print(“testY”,testY,”\n”)
# print(“trainX.shape[0]=”,trainX.shape[0],”\n”)
# print(“trainX.shape[1]=”,trainX.shape[1],”\n”)
# print(“testX.shape[0]=”,testX.shape[0],”\n”)
# print(“testX.shape[1]”,testX.shape[1],”\n”)
# #print(“scaler.inverse_transform([trainY]=”,scaler.inverse_transform([trainY]))
# print(“trainX, (trainX.shape[0], trainX.shape[1])=”,trainX, (trainX.shape[0], trainX.shape[1]),”\n”)
# print(“testX, (testX.shape[0], testX.shape[1])=”,testX, (testX.shape[0], testX.shape[1]))
model = Sequential()
# model.add(LSTM(40, input_shape=(1, look_back)))
# model.add(Dense(1))
# model.compile(loss='mean_squared_error', optimizer='adam')
# model.fit(trainX, trainY, epochs=10000, batch_size=256, verbose=2)
model.add(Dense(40, input_dim=1, activation='relu'))
# model.add(Dense(20, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=1000, verbose=2)

trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform(trainY)
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform(testY)

trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

trainPredictPlot = np.empty_like(dataset)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(trainPredict) + look_back, :] = trainPredict

testPredictPlot = np.empty_like(dataset)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(trainPredict) + (look_back * 2) + 1:len(dataset) - 1, :] = testPredict

plt.plot(data['bitcoin'], label='Actual')
plt.plot(pd.DataFrame(trainPredictPlot, columns=["close"], index=data.index).close, label='Training')
plt.plot(pd.DataFrame(testPredictPlot, columns=["close"], index=data.index).close, label='Testing')
plt.plot('Train Score: %.2f RMSE\n\n' % (trainScore))
plt.plot('\n\nTest Score: %.2f RMSE' % (testScore))
plt.legend(loc='best')
plt.subplots_adjust(left=0.30,wspace=0.90,hspace=0.40)
plt.show()
fige.savefig(‘fig9.png’)

The error is: non-broadcastable output operand with shape (24,1) doesn’t match the broadcast shape (24,3)

Reply
- Jason Brownlee April 24, 2019 at 8:07 am #
  
  Sorry, I don’t have the capacity to debug your code.
  
  Reply
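The error message in the question above (shape (24,1) versus (24,3)) suggests the scaler was fitted on three columns while the predictions being inverted have only one. Two common ways around this are sketched below with plain NumPy min-max scaling standing in for the fitted scaler; the data values are made up, and the padding variant mirrors the idea the tutorial uses before calling inverse_transform().

```python
import numpy as np

# Hypothetical data with 3 columns; column 0 is the series being predicted.
data = np.array([
    [100.0, 1.0, 10.0],
    [150.0, 2.0, 20.0],
    [200.0, 3.0, 30.0],
])

# Min-max scaling fitted on all 3 columns, as a 0-1 scaler would do.
col_min = data.min(axis=0)
col_max = data.max(axis=0)
scaled = (data - col_min) / (col_max - col_min)

# Model output has a single column (here the scaled target is a stand-in).
pred_scaled = scaled[:, [0]]

# Option 1: invert using only the target column's min and max.
pred = pred_scaled * (col_max[0] - col_min[0]) + col_min[0]

# Option 2: pad to 3 columns, invert everything, then keep column 0.
padded = np.hstack([pred_scaled, np.zeros((len(pred_scaled), 2))])
pred_via_padding = (padded * (col_max - col_min) + col_min)[:, [0]]

print(pred.ravel())
```

Fitting a separate scaler on the target column alone is a third option that keeps the inversion to a single call.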
Evan April 24, 2019 at 4:42 pm #

If you don’t use all the features in the general (i.e. last) example, lines 63,64 will be problematic:

train_X, train_y = train[:, :n_obs], train[:, -n_features]
test_X, test_y = test[:, :n_obs], test[:, -n_features]

For instance, if you only want to use the first 2 features, and naively enter n_features = 2 and run the code, your network will effectively be trying to predict var7(t) and var8(t) from

var1(t-3) var2(t-3) var3(t-3) var4(t-3) var5(t-3) var6(t-3)

instead of predicting var1(t) and var2(t) from var1(t-3) var2(t-3), var1(t-2) var2(t-2), var1(t-1) var2(t-1)

which is what people would probably expect.

You can check this by changing n_features = 2 and running the first 69 lines of the last example. Observe that the first row of train_X is equal to the first 6 elements of the first row of reframed, i.e. the var1(t-3) var2(t-3) var3(t-3) var4(t-3) var5(t-3) var6(t-3) elements.

Reply
- Jason Brownlee April 25, 2019 at 8:08 am #
  
  Thanks.
  
  Reply
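One way to avoid the mismatch Evan describes is to select the feature columns before framing, so that the first n_hours * n_features columns really are the lags of the chosen features. The sketch below is a minimal illustration, not the tutorial's code: frame_lags is a hypothetical stand-in for series_to_supervised(), and the data is made up.

```python
import numpy as np
import pandas as pd

def frame_lags(values, n_lags):
    """Hypothetical stand-in for series_to_supervised(): lagged columns, then time t."""
    df = pd.DataFrame(values)
    n_vars = df.shape[1]
    cols, names = [], []
    for lag in range(n_lags, 0, -1):
        cols.append(df.shift(lag))
        names += [f"var{j + 1}(t-{lag})" for j in range(n_vars)]
    cols.append(df)
    names += [f"var{j + 1}(t)" for j in range(n_vars)]
    out = pd.concat(cols, axis=1)
    out.columns = names
    return out.dropna()

n_hours, n_features = 3, 2
values = np.arange(40, dtype=float).reshape(5, 8)  # 5 time steps, 8 variables

# Keep only the first 2 features BEFORE framing.
framed = frame_lags(values[:, :n_features], n_hours)

n_obs = n_hours * n_features
X = framed.to_numpy()[:, :n_obs]   # lags of var1 and var2 only
y = framed["var1(t)"].to_numpy()   # select the target by name, not by position
```

Selecting the target by column name rather than with a negative index makes the intent explicit when the number of features changes.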
Thomas April 25, 2019 at 2:27 am #

Hello,

Thank you for that very interesting article.

I am curious as to why, when I set the test values (here [n_train_hours:, :]) in the CSV to some arbitrary value, the prediction does not work anymore.

If I only keep the dates valid in the test set, and run the prediction, the predicted values have nothing to do with what was predicted if the test values are left untouched.

Shouldn’t the prediction of the test part be the same regardless of the content of the CSV?

Thanks

Reply
- Jason Brownlee April 25, 2019 at 8:24 am #
  
  Not sure I follow sorry. Perhaps there is a fault in your experiment or in the tutorial?
  
  Perhaps these simpler examples might be a better starting point:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
jessy April 25, 2019 at 10:21 am #

Hi Jason,
How do I handle time irregularities in time series data? For example, I have data for 2006, 2007, and 2009, but the 2008 data is missing. Could you suggest an idea?

Reply
- Jason Brownlee April 25, 2019 at 2:39 pm #
  
  I have some suggestions here:
  https://machinelearningmastery.com/handle-missing-timesteps-sequence-prediction-problems-python/
  
  Reply
  - jessy April 26, 2019 at 10:28 am #
    
    thanks…………great
    
    Reply
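As a small illustration of the missing-year case jessy describes, the sketch below makes the gap explicit and then fills it. It is only one option among several; the values are made up, and the flag column is an assumption about how one might tell a model that a value was imputed.

```python
import pandas as pd

# Hypothetical yearly series with 2008 missing.
series = pd.Series([10.0, 12.0, 16.0], index=[2006, 2007, 2009])

# Reindex onto the full range so the gap becomes an explicit NaN.
full = series.reindex(range(2006, 2010))

# Option 1: fill the gap by linear interpolation.
filled = full.interpolate(method="linear")

# Option 2: also keep a flag so a model can see which values were imputed.
frame = pd.DataFrame({"value": filled, "was_missing": full.isna().astype(int)})
```

Whether interpolation is appropriate depends on the data; for some series, carrying the last value forward or dropping the affected samples may be a better fit.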
jessy April 25, 2019 at 10:35 am #

Hi Jason,
In all of your time series problems you use the walk-forward validation method. Is it necessary to use walk-forward validation to validate the model?

Reply
- Jason Brownlee April 25, 2019 at 2:40 pm #
  
  Yes. More details here:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
  
  Reply
  - jessy April 26, 2019 at 10:40 am #
    
    thanks for your reply…….really useful….
    
    Reply
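For readers new to the term, walk-forward validation can be sketched in a few lines: forecast one step, then reveal the true value and move on. The example below is a minimal sketch with a naive persistence forecast standing in for a trained model; the function names and data are hypothetical.

```python
import numpy as np

def walk_forward_rmse(series, n_test, predict):
    """Evaluate one-step forecasts, revealing one true value at a time."""
    history = list(series[:-n_test])
    errors = []
    for actual in series[-n_test:]:
        forecast = predict(history)
        errors.append((forecast - actual) ** 2)
        history.append(actual)  # the true value becomes available afterwards
    return float(np.sqrt(np.mean(errors)))

def persistence(history):
    # Naive baseline: predict that the next value equals the last one seen.
    return history[-1]

series = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
rmse = walk_forward_rmse(series, n_test=3, predict=persistence)
```

The key property is that every forecast uses only data that would have been available at that point in time, which ordinary shuffled cross-validation does not guarantee.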
Bob April 25, 2019 at 3:30 pm #

Hi, Jason. Lately I ran into a big question that has troubled me for a long time. LSTM, XGBoost, and LightGBM are all prediction algorithms, but what are their advantages and disadvantages relative to each other, and when should each be used? I have been pondering this for a month and still do not understand it very well. I hope to get your professional answer here.

Reply
- Jason Brownlee April 26, 2019 at 8:24 am #
  
  The best way to consider the differences across multiple algorithms is by evaluating their performance on your specific problem.
  
  An algorithm is only “good” or “useful” if it makes good predictions for your dataset.
  
  Does that help?
  
  Reply
Owen April 26, 2019 at 8:31 am #

Hi Jason,

First of all, thank you so much for all your posts! I’m picking up Python for ML and your blogs helped me a lot! I have two questions about this tutorial: 1. Is there a specific reason that you picked a batch size of 72? Or is it just an arbitrary number? 2. It looks like you fit transformed all the values including the test data. I thought you should just transform instead of fit transform on the test data. Otherwise, you are assuming you would know about future behavior. Am I missing something?

Reply
- Jason Brownlee April 26, 2019 at 8:42 am #
  
  Not really, it is arbitrary after some trial and error.
  
  Yes, I typically transform all data in one step (data leakage!) for brevity in the tutorials.
  
  Reply
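To make Owen's point concrete, here is a minimal sketch of scaling without leakage: the min and range are learned from the training rows only and then reused on the test rows. It uses plain NumPy and made-up values. With scikit-learn the equivalent pattern would be scaler.fit(train) followed by scaler.transform(test).

```python
import numpy as np

values = np.array([[1.0], [2.0], [3.0], [4.0], [10.0]])  # hypothetical series
train, test = values[:4], values[4:]

# Learn the scaling parameters from the training rows only.
lo = train.min(axis=0)
span = train.max(axis=0) - lo

train_scaled = (train - lo) / span
test_scaled = (test - lo) / span  # may fall outside [0, 1]; that is expected
```

A test value outside [0, 1] simply means the test period contains values the training period never saw, which is exactly the information that fitting on all data would leak.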
Emre April 28, 2019 at 6:32 am #

Hi jason,

Thanks for a good tutorial.

I wonder whether it is possible to plot (show) future data after training.

For example, we have data up to 2019 but want to show 2020 outputs. Is that possible?

If it is possible with an LSTM, could you explain how? Thanks a lot.

Reply
- Jason Brownlee April 28, 2019 at 6:58 am #
  
  Sure, use the model to make a prediction via model.predict() then create a line plot of the result.
  
  Reply
  - Emre April 30, 2019 at 5:36 am #
    
    Thanks for the really quick reply. I called this function, but it gives me the same type of result as before. My question is: for example, we have a pollution value of 126, and after the prediction it gives us only an error rate, not the value of the pollution.
    
    Do we need to apply |Approximate Value − Exact Value| / |Exact Value| = error rate?
    
    From there we could recover the real value, but we trained for 50 epochs, and the prediction also produces 50 epochs. Is that in terms of hours, days, or years?
    
    I’m a bit confused about this point: what is the purpose of these values?
    
    If I am not clear, please let me know. Thank you for sharing your time with us; you’re a really good person and I’m thankful.
    
    Reply
    - Emre April 30, 2019 at 5:54 am #
      
      I think I get it now.
      
      After the model.predict() call we get 50 error rates, and each step is the hour after the previous one.
      
      So, for example, if the first value is 0.83487886:
      
      0.83487886 * (pollution) = predicted pollution for the next hour
      Thank you so much again.
      
      Reply
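A clarification that may help here: in the tutorial the inputs and the target are min-max scaled before training, so model.predict() returns a scaled pollution value for the next hour, not an error rate, and epochs are training passes rather than units of time. Recovering the original units means inverting the scaling. The sketch below shows the arithmetic only; the minimum and maximum are hypothetical placeholders for whatever the fitted scaler learned.

```python
def invert_min_max(scaled_value, data_min, data_max):
    """Undo min-max scaling: map a value in [0, 1] back to the original units."""
    return scaled_value * (data_max - data_min) + data_min

# Hypothetical scaler parameters for the pollution column.
yhat_scaled = 0.5
pollution = invert_min_max(yhat_scaled, data_min=0.0, data_max=200.0)
```

In the tutorial this step is done with scaler.inverse_transform on an array that has the same number of columns the scaler was fit on.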
ranran April 30, 2019 at 3:28 pm #

Hello, would you please tell me whether this experiment is static prediction or dynamic prediction? The results of the experiment I made turned out to be very accurate, so I guess it used all the previous real value predictions — static predictions. Is my guess correct?

Reply
- Jason Brownlee May 1, 2019 at 6:58 am #
  
  What do you mean by static and dynamic exactly?
  
  Reply
  - ranran May 2, 2019 at 1:36 am #
    
    Static prediction refers to the use of the actual values of all previous sequences in the prediction of the next point, while dynamic prediction refers to the use of the real values of the previous training set and the predicted values of the test set in the prediction of the next point. In other words, static prediction is a one-step time series prediction, constantly adding actual values to predict the next point. Thank you very much!!!
    
    Reply
    - Jason Brownlee May 2, 2019 at 8:05 am #
      
      Sounds like a recursive forecasting model, described here:
      https://machinelearningmastery.com/multi-step-time-series-forecasting/
      
      Reply
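The distinction ranran draws can be sketched as follows. The tutorial builds its test inputs from observed lagged values, which corresponds to the one-step ("static") case. The toy predict_next function is a hypothetical stand-in for a trained model.

```python
def predict_next(last_value):
    # Hypothetical one-step model standing in for a trained network.
    return 0.5 * last_value + 1.0

actual = [4.0, 2.0, 6.0, 3.0]

# One-step ("static") forecasts: every prediction starts from a real observation.
one_step = [predict_next(x) for x in actual[:-1]]

# Recursive ("dynamic") forecasts: after the first step, predictions are fed back in.
recursive = []
current = actual[0]
for _ in range(len(actual) - 1):
    current = predict_next(current)
    recursive.append(current)
```

One-step forecasts usually look much more accurate than recursive ones because errors cannot accumulate from one step to the next.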
  - ranran May 2, 2019 at 1:43 am #
    
    I conducted experiments according to your method and found that the prediction accuracy was too accurate. Random factors are also accurately predicted, so I suspect it is a static prediction, using real data to predict one step forward. Because I am a beginner, also hope you can explain some more, thank you!!
    
    Reply
    - Jason Brownlee May 2, 2019 at 8:06 am #
      
      It sounds like your model has overfit the training data, you can learn more about this here:
      https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
      
      Reply
      - ranran May 2, 2019 at 4:33 pm #
        
        Yes, you are right! Thank you very much!!
  - Emin May 2, 2019 at 4:50 am #
    
    I think he means static and dynamic branch prediction that is used in computer architecture to handle control hazards. Has nothing to do with LSTM or any other ANN.
    
    Reply
sourabhxiii May 7, 2019 at 3:17 am #

A small bug exists!

df = DataFrame(data) # this is supposed to be the aggregated DataFrame object

But agg was used in the following code.

agg = concat(cols, axis=1)
agg.columns = names

Reply
- sourabhxiii May 7, 2019 at 3:21 am #
  
  Ignore it please!
  
  Reply
  - Jason Brownlee May 7, 2019 at 6:20 am #
    
    No problem.
    
    Reply
Yao May 10, 2019 at 10:00 pm #

Thanks for your post! I learnt a lot about using LSTM in keras. I have a question about the output dimension. Can I use LSTM to predict a whole sequence rather than a value? For example,
the lag is set to 1 and the output step is also set to 1, can we train lstm as the following:
X=[feature1(t-1),feature2(t-1),feature3(t-1)] and Y = [feature1(t),feature2(t),feature3(t)], I predict Y using X. I have tried this by predicting a 3d curve which consists of (x,y,z), the result is not so good as what I expected..

Reply
- Jason Brownlee May 11, 2019 at 6:15 am #
  
  Yes, either directly or via recursive use of the model.
  
  Perhaps start here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
Chris May 13, 2019 at 4:19 pm #

Dear Jason,
thank you very much for all the posts on your site. Programming on hobby-basis only, I’ve really learnt a lot about ml thanks to you.
Able to combine different examples on your site, I’m running into troubles changing the batch size and implement a multivariate input for this example, even if it looks straight forward to do this, since you are reusing functions from other posts.
Could you please give me a hint where to start?

Reply
- Chris May 13, 2019 at 7:18 pm #
  
  Sorry, wrong post – this is the correct one:
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
- Jason Brownlee May 14, 2019 at 7:39 am #
  
  What batch sizes have you tried and what issues are you getting?
  
  Reply
  - Chris May 15, 2019 at 4:01 pm #
    
    this is what I get if I increase n_batch from 1 to 2:
    ---------------------------------------------------------------------------
    ValueError Traceback (most recent call last)
    in
    168 model = fit_lstm(train, n_lag, n_seq, n_batch, n_epochs, n_neurons)
    169 # make forecasts
    --> 170 forecasts = make_forecasts(model, n_batch, train, test, n_lag, n_seq)
    171 # inverse transform forecasts and test
    172 forecasts = inverse_transform(series, forecasts, scaler, n_test+2)
    
    in make_forecasts(model, n_batch, train, test, n_lag, n_seq)
    99 X, y = test[i, 0:n_lag], test[i, n_lag:]
    100 # make forecast
    --> 101 forecast = forecast_lstm(model, X, n_batch)
    102 # store the forecast
    103 forecasts.append(forecast)
    
    in forecast_lstm(model, X, n_batch)
    89 X = X.reshape(1, 1, len(X))
    90 # make forecast
    ---> 91 forecast = model.predict(X, batch_size=n_batch)
    92 # convert to array
    93 return [x for x in forecast[0, :]]
    
    ~/anaconda3_501/lib/python3.6/site-packages/keras/engine/training.py in predict(self, x, batch_size, verbose, steps)
    1167 batch_size=batch_size,
    1168 verbose=verbose,
    -> 1169 steps=steps)
    1170
    1171 def train_on_batch(self, x, y,
    
    ~/anaconda3_501/lib/python3.6/site-packages/keras/engine/training_arrays.py in predict_loop(model, f, ins, batch_size, verbose, steps)
    300 outs.append(np.zeros(shape, dtype=batch_out.dtype))
    301 for i, batch_out in enumerate(batch_outs):
    --> 302 outs[i][batch_start:batch_end] = batch_out
    303 if verbose == 1:
    304 progbar.update(batch_end)
    
    ValueError: could not broadcast input array from shape (2,3) into shape (1,3)
    
    It might be a stupid simple solution for this, but i can’t figure out where to start.. Sorry to ask..
    
    Reply
    - Jason Brownlee May 16, 2019 at 6:27 am #
      
      I recommend starting here:
      https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
      
      Reply
      - Chris May 17, 2019 at 12:25 am #
        
        Great, exactly what I was searching for!
        Thank you very much!
      - Jason Brownlee May 17, 2019 at 5:56 am #
        
        I’m happy it helped Chris.
Jor May 15, 2019 at 6:42 am #

How would you modify the LSTM if there is forecast available for one of the variables ?

Reply
- Jason Brownlee May 15, 2019 at 8:20 am #
  
  It would be another input series, e.g. another feature.
  
  Reply
MrHou May 15, 2019 at 9:16 pm #

Your post is very helpful to me, thank you very much! I have a problem. We know that the pollution at time t is related not only to the features at time t-1, but also to some features (such as temperature) at the current time. If I try to predict this with more than 1 hour of input time steps (such as 3), my X can no longer be reshaped to fit the LSTM input format. In the example above, 24 X values correspond to one y, so we can reshape X to (3, 8). Now X has 24 + 7 = 31 values, and I don’t know how to reshape it. Please help me answer this, and thank you very much again.

Reply
- Jason Brownlee May 16, 2019 at 6:31 am #
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
  - MrHou May 17, 2019 at 6:48 pm #
    
    I saw the link you sent me. I think I can distinguish between samples, timesteps, and features, but I still don’t know how to answer my question. It may be that I am in some sort of dilemma. Just like the multi-step lag example in the tutorial, if I want to consider the meteorological features at time t, the total number of features becomes 3*8 + 7, then how do I reshape the input data to meet the requirements of the LSTM model. Can you help me answer it, thank you very much again.
    
    Reply
    - Jason Brownlee May 18, 2019 at 7:36 am #
      
      If you have weather data at time t as input for forecasting another variable also at time t, then there are many ways to frame this problem, no single best way.
      
      One approach might be to keep all input series in sync, including lags for the target feature, then use zero padding input for time t for the target feature, and a masking layer to ignore it.
      
      Reply
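The padding idea in the reply above can be sketched as follows: keep every series in step, add time t as a fourth time step, and replace the target's value at time t with a placeholder so it is not given to the model. This is only one possible framing. The data is random, and column 0 being the pollution target is an assumption made for the example.

```python
import numpy as np

n_lags, n_features = 3, 8
rng = np.random.default_rng(1)
data = rng.random((10, n_features))  # hypothetical scaled data, column 0 = pollution

samples, targets = [], []
for t in range(n_lags, len(data)):
    window = data[t - n_lags:t + 1].copy()  # rows t-3 .. t, shape (4, 8)
    targets.append(window[-1, 0])           # pollution at time t is the output
    window[-1, 0] = 0.0                     # hide it from the input
    samples.append(window)

X = np.stack(samples)  # shape (7, 4, 8): samples, time steps, features
y = np.array(targets)  # shape (7,)
```

This gives a regular 3D array instead of an awkward 3*8 + 7 flat vector. Note that a Keras Masking layer skips a time step only when all of its features equal the mask value, so a single placeholder value like this is seen by the network as an ordinary input.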
Sooraj Raj May 16, 2019 at 1:31 pm #

Hello Jason,
I am a PhD student studying time series prediction, and your book Deep Learning for Time Series Forecasting helped me get my first ever model for time series prediction working.
I am now exploring WaveNets. Do you know if a Keras sequential model like the one below will implement a WaveNet architecture?

self.model = Sequential()
self.dilation_rates = [2**i for i in range(8)]

for dilation_rate in self.dilation_rates:
    self.model.add(Conv1D(filters=64, kernel_size=3, padding='causal',
                          dilation_rate=dilation_rate,
                          input_shape=(self.train_x.shape[1], self.train_x.shape[2])))

Reply
- Jason Brownlee May 16, 2019 at 2:26 pm #
  
  Sorry, I don’t have examples of working with wavenets, I hope to cover the topic in the future.
  
  Reply
Sooraj Raj May 16, 2019 at 2:43 pm #

Thank you, Jason.

Reply
- Jason Brownlee May 17, 2019 at 5:49 am #
  
  You’re welcome.
  
  Reply
Samudranil Roy May 17, 2019 at 6:11 pm #

I have 10 time series data files. I am training an LSTM model with 5 of them, validating with 3, and testing with 2. I have used fit_generator from Keras and have written one generator function for both the training and validation dataframes. Unfortunately, during prediction its initial predictions are much higher than the original target.

On the other hand, if I use model.fit for each dataframe, I get comparatively better results. My question: for time series data where each data file is separate (e.g., each contains ratings from 0 hr to 24 hrs), is it the right approach to call fit on each iteration for each of the data files?

for scaled_dataset in training_list:
    reframed_new = series_to_supervised(scaled_dataset, n_in, n_out)
    values = reframed_new.values
    train = values
    # split into input and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    # reshape input to be 3D [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    model.fit(train_X, train_y, epochs=50, batch_size=475, validation_data=None, verbose=1, shuffle=False)

In my code, training_list contains all 5 separate dataframes, so in each iteration I fit the model on one dataframe. Can anyone please tell me whether this is the right approach? Thanks in advance.

Reply
- Jason Brownlee May 18, 2019 at 7:33 am #
  
  Perhaps, as long as you are not training on the future and testing on the past.
  
  Reply
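An alternative to calling fit once per file, sketched here under simple assumptions, is to frame each file separately and then concatenate the resulting samples. Because the windows are built per file, no sample spans the boundary between two recordings. The make_windows helper and the data are hypothetical.

```python
import numpy as np

def make_windows(series, n_lags):
    """Frame one file as (samples, n_lags) inputs and next-step targets."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

# Two hypothetical files with clearly different value ranges.
files = [np.arange(0.0, 6.0), np.arange(100.0, 105.0)]

parts = [make_windows(f, n_lags=2) for f in files]
X = np.concatenate([p[0] for p in parts])
y = np.concatenate([p[1] for p in parts])
```

The combined X and y can then be passed to a single fit call, which avoids the model drifting toward whichever file happened to be trained on last.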
Eric Jin May 19, 2019 at 3:45 am #

Hi Jason,

Thanks a lot for this article! It really helps me a lot. I am wondering if you have any articles or suggestions about 1) how to split train, evaluation, and test sets for time series data and 2) recommended models for multi-target time series regression.

Specifically, I am concerned about using skin elongation to predict human shoulder movements, which are expressed in Euler angles. Therefore, having the machine learning models to understand the dependencies of the three Euler angles is very useful, but I currently don’t know how to do.

I am currently using the beginning 80% of a period of recorded motion as training set and last 20% as testing set and treat three Euler angle outputs as independent variables (which is not ideal). I have tried various models including linear regression, various boosting, MLP, and LSTM. Surprisingly, MLP and LSTM gave me similar if not worse results than linear regression. Any insights on what might be causing this?

Thanks a lot!

Best,
Eric

Reply
- Jason Brownlee May 19, 2019 at 8:06 am #
  
  The split of the data is really dependent on your data and how you intend to use the model.
  
  The goal is to have a test harness that best simulates how you expect to use the model in practice.
  
  LSTMs are generally poor at time series, I recommend testing a range of models, I think this will help:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
  - Eric Jin May 19, 2019 at 11:53 am #
    
    Thanks a lot!
    
    Reply
    - Jason Brownlee May 20, 2019 at 6:25 am #
      
      You’re welcome.
      
      Reply
Sanjay May 21, 2019 at 3:59 am #

Hi Jason,

The article is very informative. I have been going through your different posts. You mentioned an alternate formulation: “Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.” I am currently working on a similar forecasting formulation where I know the values of the independent features for future time periods. I’m getting a little confused with the 3D input and output vectors for that. I have 6 features including the time series itself. Do you have a post which elaborates on this type of formulation?

Reply
- Jason Brownlee May 21, 2019 at 6:40 am #
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
Tinu Tholiyil May 24, 2019 at 7:25 pm #

Hi Jason,

Article is very useful. Thanks.

I have a dataset with 23 features and 183 observations (Day 0, Day 1, … Day 183) for a particular location. Data is available for 1000 locations. The target variable is available only at day 183. Can I use the LSTM output at each time step and feed it as input to the next time step? After training, is it possible to predict the output at the 183rd day if I can provide input for, say, only 10 days?

Reply
- Jason Brownlee May 25, 2019 at 7:46 am #
  
  Perhaps try it and see.
  
  Reply
youcef May 28, 2019 at 2:55 am #

Hello Jason,
Thanks a lot, it was very useful.
I’m new to ML and LSTMs, so sorry if my question seems a little stupid.
How can I print the predicted value of pollution at time t+1?

Reply
- Jason Brownlee May 28, 2019 at 8:20 am #
  
  yhat = model.predict()
  
  This may also help:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
John June 4, 2019 at 6:30 pm #

Hey Jason,

I really like your tutorials, but I have a question though:

My dataset is not as large as the one you use here, although it is larger than the other I’ve seen you using (shampoo), but the prediction I’m trying to make are more complex.
So, overall I’m facing the problem that using the techniques of your LSTM tutorials I’m not being able to predict the proper outcomes.

What happens is that my training loss goes down, however, my validation loss never goes down, it either stays the same or just increases, and I’ve noticed that the predictions are really sensitive to the initialization.

So, I’d like to ask you if you knew why that might be or if you have solutions in mind. Right now I’m splitting my dataset in 75% training and 25% for validations, so would you think that using cross validation techniques would help me out? In such case, have you made any tutorial about it with LSTM networks?

Thank you

Reply
- Jason Brownlee June 5, 2019 at 8:35 am #
  
  I recommend starting with simple linear and naive models, then trying more complex models to confirm they add value. They might not.
  
  Try this framework:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  Cross validation is not appropriate, instead, use walk-forward validation:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
  
  Reply
Dustan June 5, 2019 at 12:38 pm #

Hi Jason,

I have a question about your suggestion for possible alternate formulations of the pollution problem:

* Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.

In the LSTM data preparation for the original problem (with 8 input variables and 1 output variable), series_to_supervised() yields something like what I’ve pasted below. I’m trying to wrap my head around how I would use series_to_supervised() and account for the impact of the current-hour weather variables when predicting the pollution level at time, t. Is it as simple as not dropping the weather-variable columns at time, t? My assumption is that LSTM data preparation for this modified pollution problem is a bit more involved.

var1(t-1) var2(t-1) var3(t-1) var4(t-1) var5(t-1) var6(t-1) \
1 0.129779 0.352941 0.245902 0.527273 0.666667 0.002290
2 0.148893 0.367647 0.245902 0.527273 0.666667 0.003811
3 0.159960 0.426471 0.229508 0.545454 0.666667 0.005332
4 0.182093 0.485294 0.229508 0.563637 0.666667 0.008391
5 0.138833 0.485294 0.229508 0.563637 0.666667 0.009912

var7(t-1) var8(t-1) var1(t)
1 0.000000 0.0 0.148893
2 0.000000 0.0 0.159960
3 0.000000 0.0 0.182093
4 0.037037 0.0 0.138833
5 0.074074 0.0 0.109658

Reply
- Jason Brownlee June 5, 2019 at 2:37 pm #
  
  It may require that you snip out the relevant columns, e.g. some work is required.
  
  Reply
  - Dustan June 11, 2019 at 8:39 am #
    
    My apologies – a quick followup, with better specifics on my part:
    
    I am trying to understand how I would prepare the data [using series_to_supervised()] in order to account for the “expected” weather conditions at the next hour. My initial thought was that the column structure would look as follows:
    
    var1(t-1) var2(t-1) … var7(t-1) var8(t-1) var1(t) var2(t) … var7(t) var8(t),
    
    where var2(t) … var8(t) describe the “expected” weather conditions at time t. However, in this structure, I believe the weather conditions would also be treated as direct outputs, much like the pollution level at time t (which we are trying to predict).
    
    Any additional feedback on the column structure that would represent the “expected” weather conditions at time, t, when the goal is specifically not to predict them (just refine the pollution-level prediction)?
    
    Thank you for your time.
    
    Reply
    - Jason Brownlee June 11, 2019 at 2:25 pm #
      
      I would recommend preparing the data with the required inputs and outputs, and perhaps keeping the predicted column as an input, at least in the output from series_to_supervised(). E.g. pollution values for t may appear as both inputs and outputs in the raw output from series_to_supervised().
      
      You can then curate the input columns and remove the value to be predicted.
      
      Does that help?
      
      Reply
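The column curation described in this thread might look like the sketch below: frame the data with one lag, keep the current-hour weather columns as inputs, and use only var1(t) as the output. The framed data is random and merely mimics the column layout produced by series_to_supervised(scaled, 1, 1), and treating var1 as pollution follows the tutorial's ordering.

```python
import numpy as np
import pandas as pd

n_features = 8
lag_cols = [f"var{j}(t-1)" for j in range(1, n_features + 1)]
now_cols = [f"var{j}(t)" for j in range(1, n_features + 1)]

# Hypothetical framed data with the same column layout as the tutorial's reframed frame.
rng = np.random.default_rng(2)
reframed = pd.DataFrame(rng.random((5, 16)), columns=lag_cols + now_cols)

target_col = "var1(t)"
input_cols = [c for c in reframed.columns if c != target_col]

X = reframed[input_cols].to_numpy()  # 8 lagged values + 7 current weather values
y = reframed[target_col].to_numpy()
X = X.reshape((X.shape[0], 1, X.shape[1]))  # one time step with 15 features
```

Nothing here makes the weather columns outputs; a column is only an output if it ends up in y.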
Samuel Alfred June 7, 2019 at 11:48 am #

Hello Doctor Brownlee. Thanks a lot for these great tutorials. They have been so helpful. I have a question I want to ask.

I have a dataset with a lot of data similar to this one used in this example. I am trying to start simple first before going advanced with my data.

I have a trajectory dataset with three features (x,y,z). I want to predict the three features (x,y,z) for the next step by inputting the previous three timesteps as the input.

The problem I am having now is that during the prediction phase,
yhat = model.predict(test_X) (in your case, yhat.shape = (35039, 1))

The output of this is yhat.shape = (timesteps, 1) but I expect it to be (timesteps, 3) since I want three outputs (x,y,z). Please how do I make this change to show that the network has predicted the new x,y and z at the next timestep.

Thanks for your anticipated response.

Reply
- Jason Brownlee June 7, 2019 at 2:35 pm #
  
  You can predict 3 values by specifying 3 nodes in the output layer of your network and training the model with 3-element y vectors.
  
  Reply
  - Samuel Alfred June 11, 2019 at 6:08 am #
    
    Hello Doctor. I am still not sure how this will work. Can you explain better? Perhaps just specify how this is done briefly.
    
    model = Sequential()
    model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(Dense(1))
    model.compile(loss='mae', optimizer='adam', metrics=['acc'])
    
    yhat = model.predict(test_X)
    
    How do I specify the 3 nodes here and also make a prediction? Sorry for disturbing you and thanks a lot.
    
    Reply
    - Jason Brownlee June 11, 2019 at 8:04 am #
      
      Change the number of nodes in the output layer from 1 to 3:
      
      …
      model.add(Dense(3))
      
      Reply
      - Samuel Alfred June 12, 2019 at 5:10 am #
        
        Thanks Alot.
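A Dense(3) output layer also needs targets with three columns. The sketch below shows one way to frame an (x, y, z) trajectory so that each sample holds the previous three time steps and each target holds the next point. The trajectory values are made up for illustration.

```python
import numpy as np

n_steps, n_features = 3, 3
# Hypothetical trajectory: 6 points with (x, y, z) coordinates.
track = np.arange(18, dtype=float).reshape(6, n_features)

# Each sample is the previous 3 points; each target is the full next point.
X = np.stack([track[i:i + n_steps] for i in range(len(track) - n_steps)])
y = track[n_steps:]
```

With targets of shape (samples, 3), model.predict(test_X) returns an array of shape (samples, 3), one column per coordinate.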
Dave Hiltbrand June 8, 2019 at 4:53 am #

Hi Jason,

I just wanted to confirm I’m setting up my input correctly. I have 50 sites, each with 20 variables that I get a report on every day. So if I’m using daily values as my time step and go back for the last year, my input would look like (50,365,20), correct? Each layer of the tensor would be a 365×20 dataframe for a single site. Thanks.

Reply
- Jason Brownlee June 8, 2019 at 7:04 am #
  
  Seems reasonable, try it and see.
  
  Reply
- Guanta January 24, 2023 at 9:07 am #
  
  How did you do your train/test split? Was it on 50 separate dataframes? If so, how did you feed them back into the LSTM model to make predictions that take into account the time series from the other sites?
  
  I am working on a similar issue in which I have 200 time series of different patient information, i.e. 4 columns for each patient, all occurring at the same time. Each time series is specific to the individual. I could run separate time series for each individual; however, this won’t incorporate information from the other patients.
  
  Run them in one model? How? An LSTM uses one evolving time series sequence for one entity. I have searched high and low on the net for this and no one has a solution for how to actually put it in the model.
  
  Please help
  
  Reply
Mat June 11, 2019 at 12:16 am #

Dear Jason, thank you for your post. Really, really interesting!
In this framework, I am wondering how to teach the model the “panel” structure of your dataset. In other words, how to account for the fact that hour x in month j and day z is also present in year t-1 and year t-2 in the same day and month.
How can the model process this information?

Reply
- Jason Brownlee June 11, 2019 at 7:55 am #
  
  Good question, sorry I don’t have a tutorial on working with panel data. I hope to cover it in the future.
  
  Reply
chiranjeev June 11, 2019 at 5:08 pm #

Why did you drop columns [9,10,11,12,13,14,15]?
Can you explain why we don’t need them and, if these are removed, why not other columns too?

Reply
- Jason Brownlee June 12, 2019 at 7:51 am #
  
  As it states in the code, we are dropping the columns we do not want to predict.
  
  e.g. everything that is not the pollution column for the time step.
  
  Does that help?
  
  Reply
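To see which columns those indices refer to, the sketch below rebuilds the column names of the framed data for 8 variables and 1 lag and applies the same drop. The single row of values is a placeholder.

```python
import pandas as pd

names = [f"var{j}(t-1)" for j in range(1, 9)] + [f"var{j}(t)" for j in range(1, 9)]
reframed = pd.DataFrame([list(range(16))], columns=names)

# Columns 9..15 are var2(t)..var8(t): current-hour values we are not predicting.
drop_idx = [9, 10, 11, 12, 13, 14, 15]
dropped = list(reframed.columns[drop_idx])
kept = reframed.drop(reframed.columns[drop_idx], axis=1)
```

What remains is the 8 lagged inputs plus var1(t), the pollution value being predicted. The lagged columns stay because they are the model's inputs.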
chiranjeev June 11, 2019 at 5:10 pm #

And how did you handle the cbwd column, since its values are words such as SE, NW, cv, etc.?

Reply
- Jason Brownlee June 12, 2019 at 7:51 am #
  
  We dropped that column.
  
  Reply
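For readers comparing against the tutorial code: its data preparation step appears to integer encode the wind direction column with scikit-learn's LabelEncoder before scaling, rather than removing it. A minimal sketch of the same idea using only NumPy is shown below. The sample values are made up, and np.unique is a stand-in for the encoder.

```python
import numpy as np

# Hypothetical wind direction values.
wind = np.array(["SE", "NW", "cv", "SE", "NE"])

# Sorted unique labels and, for each row, the index of its label.
classes, codes = np.unique(wind, return_inverse=True)
```

Integer codes impose an arbitrary ordering on the categories, so one-hot encoding is a common alternative worth trying.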

Alon June 11, 2019 at 11:33 pm #

Hi,

I’m trying to predict 3 features based on the same 3.

My question is regarding the “Evaluate Model” part. As I understand in your example you swapped the pollution feature with your prediction of the same feature.
In my case I would have to swap all 3.

1. Do I understand correctly?
2. Do I need to do this part x3 for every feature?
3. Is there a better way to do so?

Thanks a lot,

Jason Brownlee June 12, 2019 at 8:04 am #

Yes, I believe this tutorial will help as a first step:
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

Alon June 12, 2019 at 6:08 pm #

Thanks,

Did not find a reference to model evaluating in the above article.
Could you maybe describe in general how would you approach this?

Jason Brownlee June 13, 2019 at 6:13 am #

No, this is quite an advanced tutorial. I linked to a simpler model for you to start with for your specific problem.

Alon Lavian June 16, 2019 at 4:13 am #

Thanks again,

I’ve managed to evaluate all features, one by one. Here is the code for others interested:

for i in range(n_features):
  
  # invert scaling for prediction
  yhat_i_feature = yhat[:,i].reshape((len(yhat[:,i]), 1))
  
  # copy so the original test inputs are not overwritten in place
  test_X_replaced = test_X_reshaped.copy()
  test_X_replaced[:,i] = yhat_i_feature[:,0]

  inv_yhat = scaler.inverse_transform(test_X_replaced)
  inv_yhat = inv_yhat[:,i]
  
  # invert scaling for actual
  test_y_i_feature = test_y[:, i].reshape((len(test_y[:, i]), 1))

  test_X_replaced = test_X_reshaped.copy()
  test_X_replaced[:,i] = test_y_i_feature[:,0]

  inv_y = scaler.inverse_transform(test_X_replaced)
  inv_y = inv_y[:,i]
  
  # calculate RMSE for feature i
  rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
  print('Test RMSE: %.3f' % rmse)

Jason Brownlee June 16, 2019 at 7:15 am #

Nice work!

Pan Xiong December 20, 2019 at 6:53 pm #

Hi, Alon

I am interested about your code, can you post complete code, thanks

Reply

Craig.Y June 12, 2019 at 12:39 pm #

Hi Jason,
Thanks a lot for your post. I have learned a lot. If I try to predict a categorical variable using a multivariate time series, how do I build such an LSTM model? For example, what if I want to predict the wind direction for the next hour using the prior 3 hours of pollution, dew, temp, and so on as inputs? I don’t know how to do such a classification using an LSTM. Looking forward to your reply. Thanks again!

Reply
- Jason Brownlee June 12, 2019 at 2:25 pm #
  
  This would be a time series classification problem, I give an example here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
chiranjeev June 12, 2019 at 2:38 pm #

# make a prediction
yhat = model.predict(test_X)
print(yhat)
test_X = test_X.reshape((test_X.shape[0], n_hours*n_features))
# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, -7:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, -7:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)

i am getting this error in this code-
ValueError Traceback (most recent call last)
in
89 # invert scaling for forecast
90 inv_yhat = concatenate((yhat, test_X[:, -7:]), axis=1)
---> 91 inv_yhat = scaler.inverse_transform(inv_yhat)
92 inv_yhat = inv_yhat[:,0]
93 # invert scaling for actual

~/.local/lib/python3.5/site-packages/sklearn/preprocessing/data.py in inverse_transform(self, X)
402 force_all_finite="allow-nan")
403
--> 404 X -= self.min_
405 X /= self.scale_
406 return X

ValueError: operands could not be broadcast together with shapes (35061,8) (11,) (35061,8)

please help me with this

Reply
- Jason Brownlee June 13, 2019 at 6:09 am #
  
  Perhaps double check the shape of your data.
  
  Reply
  - Gaurav Sharma September 28, 2020 at 5:14 am #
    
    I am facing the same issue Dr. Jason please suggest what should I be following! Thank you
    
    Reply
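The traceback above states the mismatch directly: the scaler's min_ has shape (11,), so it was fit on 11 columns, while the array passed to inverse_transform has only 8 (one prediction column plus the 7 columns taken by test_X[:, -7:]). A minimal sketch of one way to avoid hard-coding the 7, assuming the target sits in column 0 and the scaler was fit on n_features columns. The helper name pad_for_inverse is hypothetical, and zero-filled NumPy arrays stand in for real predictions:

```python
import numpy as np

def pad_for_inverse(yhat, test_X_flat, n_features):
    """Rebuild a (samples, n_features) array: prediction in column 0,
    followed by the last n_features - 1 input columns."""
    yhat = yhat.reshape((len(yhat), 1))
    padded = np.concatenate((yhat, test_X_flat[:, -(n_features - 1):]), axis=1)
    # the column count must match what the scaler was fit on
    assert padded.shape[1] == n_features
    return padded

# toy shapes that mirror the traceback: 11 features, 3 lag hours
n_features, n_hours = 11, 3
test_X_flat = np.zeros((5, n_hours * n_features))
yhat = np.zeros((5, 1))
padded = pad_for_inverse(yhat, test_X_flat, n_features)
print(padded.shape)  # (5, 11)
```

With 11 features the slice becomes test_X[:, -10:], which gives the 11 columns the scaler expects.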
BHAVI June 16, 2019 at 2:37 am #

hi jason can you help me to predict multiobservation data in a single instant just like that i mentioned below
time location temp humidity wind speed
t1 new york …………………….
t1 california……………………………..
t1 texas……………………………………..
t1 LA………………………………………..

Reply
- Jason Brownlee June 16, 2019 at 7:14 am #
  
  I think you’re asking about a multivariate forecast.
  
  You can discover some models for this here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
BHAVI June 16, 2019 at 8:44 pm #

Why am I getting this error: KeyError: 'val_loss'

Reply
- Jason Brownlee June 17, 2019 at 8:22 am #
  
  I have not seen that error before, sorry.
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
- nuunuu June 26, 2019 at 2:47 pm #
  
  I fixed the same error by using 'validation_split' instead of 'validation_data'
  
  -before
  history = model.fit(train_X, train_y, epochs=50, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)
  
  -after
  history = model.fit(train_X, train_y, epochs=50, batch_size=72, validation_split = 0.2, verbose=2, shuffle=False)
  
  Reply
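In Keras, history.history is a plain dictionary, and its 'val_loss' key is only present when fit() was given validation_data or validation_split. A defensive sketch for choosing which curves to plot, using hand-written dictionaries in place of a real training run (the helper name curves_to_plot is hypothetical):

```python
def curves_to_plot(history_dict):
    """Return (label, values) pairs for the loss curves that are present."""
    curves = [("train", history_dict["loss"])]
    # 'val_loss' exists only if validation data was supplied to fit()
    if "val_loss" in history_dict:
        curves.append(("test", history_dict["val_loss"]))
    return curves

# stand-ins for history.history with and without validation data
no_val = {"loss": [0.9, 0.5, 0.3]}
with_val = {"loss": [0.9, 0.5, 0.3], "val_loss": [1.0, 0.6, 0.4]}

for label, values in curves_to_plot(with_val):
    print(label, values)
```

Each returned pair can then be passed to pyplot.plot(values, label=label) without risking a KeyError.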
Hrishi June 18, 2019 at 7:50 am #

Hey Jason

I’m fairly new to ML. Can you tell me where I should make the changes if I want to predict the pollution level after 24 hours?

Reply
- Jason Brownlee June 18, 2019 at 2:21 pm #
  
  What do you mean after 24 hours? Do you mean for 24 hours?
  
  Perhaps start here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Jamie June 21, 2019 at 5:13 pm #

Yet another intuitive and amazing article. Thanks!

One question though. I noticed in the following code:

# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)

that you fit the scaler on values, where values is the entire dataset matrix. Is there a reason you do not fit the scaler on the train set, and then transform the test set? In my opinion this should be a relatively quick gain, making the code even better.

Thanks

Reply
- Jason Brownlee June 22, 2019 at 6:34 am #
  
  Yes, brevity. Scaling data in these tutorials always causes confusion.
  
  More recently, I just leave it out.
  
  Reply
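The suggestion in this thread can be sketched as follows: learn the min and max from the training rows only, then apply them to both splits. Plain NumPy stands in for MinMaxScaler here, and the split point of 4 rows is an arbitrary assumption for the toy data:

```python
import numpy as np

values = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
                   [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]])
n_train = 4  # hypothetical split point
train, test = values[:n_train], values[n_train:]

# learn the scaling parameters from the training rows only
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min

train_scaled = (train - col_min) / col_range
# test rows reuse the training parameters, so they may fall outside [0, 1]
test_scaled = (test - col_min) / col_range

print(train_scaled.max(axis=0))
print(test_scaled.max(axis=0))
```

With scikit-learn the same idea would be scaler.fit(train) followed by scaler.transform(test), instead of fit_transform on the full matrix.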
Guhan palanivel June 21, 2019 at 11:41 pm #

I am building a multivariate Time series prediction model using LSTM.
Is it possible to build a model which can forecast for future horizon ?

Reply
- Jason Brownlee June 22, 2019 at 6:44 am #
  
  Yes. I have many tutorials on this.
  
  Start here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
dayi June 22, 2019 at 1:24 am #

Hello, thank you for your post. I have a question: once the model is trained, can I use this code to predict the next 24 hours using the prior 24 hours as input?

Reply
- Jason Brownlee June 22, 2019 at 6:46 am #
  
  Perhaps start with the multi-step forecasting examples here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Luiz Pizano Fonseca June 22, 2019 at 4:23 am #

Dear Jason,

Thank you for such a useful article.
Where it’s written “One-hot encoding wind speed”, shouldn’t it be “One-hot encoding wind direction”?

Reply
- Jason Brownlee June 22, 2019 at 6:48 am #
  
  Thanks, fixed.
  
  Reply
LUIZ PIZANO FONSECA June 22, 2019 at 6:45 am #

Dear Jason,

Could you help me please with “Interestingly, we can see that test loss drops below training loss. The model may be overfitting the training data.”? The criterion I know is that when training loss keeps getting smaller while validation loss starts to get greater, overfitting may have started to happen.

Reply
- Jason Brownlee June 22, 2019 at 6:49 am #
  
  This might help:
  https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
  
  Reply
Rushi June 24, 2019 at 7:39 am #

Hey Jason,

Thanks for excellent article again. Earlier in the post you mentioned that it is possible to ‘predict the pollution for the next hour based on the weather conditions and pollution over the last 24 hours’.

Could you please let me know how I can modify this program to predict the pollution next hour.

Reply
- Jason Brownlee June 24, 2019 at 2:29 pm #
  
  Yes, I have a number of examples of multi-step forecasting with LSTMs, you can get started here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
Guhan palanivel June 24, 2019 at 4:43 pm #

Hi jason can you help me to predict for next 6 time steps in a multivariate problem?

Reply
- Jason Brownlee June 25, 2019 at 6:11 am #
  
  You have a few options, perhaps start here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
Jimmy Zhang June 26, 2019 at 12:42 pm #

Hi Jason!

Really good tutorial. I was able to complete my first LSTM project thanks to your help. Much appreciated.

However when I tried to run
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')

it gave me an error saying Sequential does not have history attribute. Do you know why ?

Reply
- Jason Brownlee June 26, 2019 at 2:21 pm #
  
  Well done!
  
  Perhaps you skipped some lines/code?
  
  Reply
nuunuu June 26, 2019 at 2:50 pm #

I mean I fixed this error that some people might get
KeyError: ‘val_loss’

Reply
Ala June 26, 2019 at 6:44 pm #

Hi Jason. Can you show me how to reshape a time series for a multivariate, multi-step problem so that it looks like supervised learning? I have 3 time series (the input is 3-dimensional and the output is also 3-dimensional) and want to forecast about 10 steps into the future. The series_to_supervised function can do either multivariate or multi-step, but not both. Do you have an example that does both together?

Reply
- Jason Brownlee June 27, 2019 at 7:47 am #
  
  Yes, see this post:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
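One possible framing for a multivariate, multi-step problem is to slice the series into windows where both inputs and outputs keep all features. This is a rough sketch, not the tutorial's function: make_windows is a hypothetical helper, and the sizes (3 series, 5 input steps, 10 output steps) are chosen only for illustration:

```python
import numpy as np

def make_windows(data, n_in, n_out):
    """Slice a (time, features) array into supervised samples.
    X: (samples, n_in, features), y: (samples, n_out, features)."""
    X, y = [], []
    for start in range(len(data) - n_in - n_out + 1):
        X.append(data[start:start + n_in])
        y.append(data[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(y)

# toy data: 20 time steps of 3 parallel series
data = np.arange(60, dtype=float).reshape(20, 3)
X, y = make_windows(data, n_in=5, n_out=10)
print(X.shape, y.shape)  # (6, 5, 3) (6, 10, 3)
```

X already has the (samples, timesteps, features) layout an LSTM expects. Whether y is kept 3D or flattened depends on the output layer chosen for the model.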
Lopa June 27, 2019 at 1:28 am #

Hi Jason,

I have followed your tutorials & these have helped me to a great extent . I am trying to generate forecasts beyond my data points. I have 608 data points & 10 predictors & I want to predict 100 steps into the future & in order to do that I am using the following code:

#future unknown predictions: in this case, test_set doesn’t exist

future_pred_count = 100 #let’s predict 100 new steps

model.reset_states() #always reset states when inputting a new sequence

#first, let set the model’s states (it’s important for it to know the previous trends)
predictions = model.predict(fulldata) #this creates states

#future predictions
future = []
currentStep = predictions[:,-1:,:] #last step from the previous prediction as a 3d array

for i in range(future_pred_count):
    currentStep = model.predict(currentStep) #get the next step
    future.append(currentStep) #store the future steps

#after processing a sequence, reset the states for safety
model.reset_states()

Basically, I am predicting for the entire dataset and trying to use the last step from the previous prediction to forecast ahead. The problem is that the predictions are a 2D array, while in order to use the .predict function I need a 3D array (samples, timesteps, features), and I have 10 features in my model.

Can you please advise how I can achieve this? I am also following your book but could not find an answer to this question.

Reply
- Jason Brownlee June 27, 2019 at 7:57 am #
  
  You can train the model to predict 100 time steps in the future or use the same model recursively.
  
  Perhaps start here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
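The recursive option can be sketched as a loop that puts the timestep axis back on each 2D prediction before feeding it in again. This assumes a model that predicts all 10 features one step ahead. The function fake_predict is a hypothetical stand-in for model.predict so the shapes can be checked without training anything:

```python
import numpy as np

def fake_predict(batch):
    """Hypothetical stand-in for model.predict: takes a 3D array
    (samples, timesteps, features) and returns 2D (samples, features)."""
    return batch[:, -1, :] + 1.0

n_features = 10
future_pred_count = 100

# last known observation, shaped (1 sample, 1 timestep, n_features)
current_step = np.zeros((1, 1, n_features))
future = []
for _ in range(future_pred_count):
    pred = fake_predict(current_step)  # 2D: (1, n_features)
    future.append(pred[0])
    # restore the timestep axis before feeding the prediction back in
    current_step = pred.reshape((1, 1, n_features))

future = np.array(future)
print(future.shape)  # (100, 10)
```

Because each step consumes the previous prediction, errors can accumulate over the 100 steps, so it is worth comparing this against a model trained to output several steps directly.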
Dylan & Erica July 1, 2019 at 11:55 pm #

Hey Jason,

What do you think about using this RNN model for nowcasting? For example using air temperature to nowcast road surface temperature. Perhaps there is another method you would recommend?

Thank you
Dylan & Erica

Reply
- Jason Brownlee July 2, 2019 at 7:33 am #
  
  I always recommend testing a suite of methods in order to discover what works well/best for a specific dataset.
  
  Reply
Liang Zhao July 3, 2019 at 12:44 am #

Thank you so much! This is a fantastic tutorial!

After I run the code, the kernel died after the first epoch:

The following is the results I have got:

Using TensorFlow backend.
(43797, 32)
(8760, 24) 8760 (8760,)
(8760, 3, 8) (8760,) (35037, 3, 8) (35037,)
WARNING:tensorflow:From /Users/nikozhao/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /Users/nikozhao/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 8760 samples, validate on 35037 samples
Epoch 1/50
2019-07-02 15:41:45.848357: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-07-02 15:41:45.848554: I tensorflow/core/common_runtime/process_util.cc:71] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.

Kernel died, restarting

Reply
- Jason Brownlee July 3, 2019 at 8:36 am #
  
  Sorry to hear that, it looks like a problem with your development environment.
  
  Perhaps this tutorial will help:
  https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
  
  Reply
  - Liang Zhao July 4, 2019 at 12:58 am #
    
    Thank you so much! I have solved this problem!
    
    Now, I am wondering how I can make a prediction based on multiple time series.
    
    I have multiple time series. They are actually the taxi pick-ups at a stadium during and after special events. Therefore, the length of each time series is not very long, about 4 hours ( 15 mins lag, 16 points in each time series) and I have about 100 time series in total. ( I can try to find more)
    
    The lengths of those time series are the same, but the starting times are different. ( special events are all basketball games)
    
    I also want to incorporate other time series into each pick-up time series, maybe, weather condition. Then, it becomes a multivariate forecasting problem.
    
    Therefore, I am facing a multiple multivariate time series forecasting. I want to train a model using those time series and forecast pick-ups at the time “t+1” after a special event starts.
    
    I have searched online for a long time, but have not found anything.
    
    Can this be done by using LSTM, if yes, how can I train this model?
    
    Thank you very much!
    
    Liang
    
    Reply
    - Jason Brownlee July 4, 2019 at 7:50 am #
      
      Well done.
      
      Perhaps standard the sequences to start and end at the same time and use zero padding and a masking layer to ignore the padding?
      
      Reply
      - Liang July 25, 2019 at 1:40 am #
        
        Thank you for your reply, but maybe I did not really explain my question clearly.
        
        What if I want to train a model, that learns the pollution during several special events in Beijing, like the Olympic game, the national holiday, etc. And I want to predict what the pollution will be during the next special event.
        
        Assuming those time series of special events have the same length.
        Assuming I want to train over 30 such special events.
        
        Is it a good idea to concatenate those time series together and train a single time series?
        
        I have tried that, but I think a serious flaw is that there is a long time gap between two time periods.
        
        What method do you think can solve this problem?
        
        Thank you very much!
      - Jason Brownlee July 25, 2019 at 7:55 am #
        
        It is a challenging problem. The goal is to find those factors that influence or correlate with the target variable.
        
        The pollution level the day before will be far more relevant than what happened years before.
      - Liang July 25, 2019 at 4:55 pm #
        
        Yes, I agree with you.
        
        Do you suggest any model that can test whether the factor is correlated with the target variable? I know VAR could do it. What else do you suggest?
        
        If I have found one factor and want to make multivariate forecasting, could you give me some suggestion on how to make this forecasting?
      - Jason Brownlee July 26, 2019 at 8:16 am #
        
        chi-squared might be a good test for a factor, if you make the output discrete via binning.
        
        Yes, this process is my best general advice:
        https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
      - Liang Zhao August 1, 2019 at 8:13 am #
        
        Hey Jason,
        
        This is what I have asked one month ago:
        —————————————————————————–
        
        Now, I am wondering how I can make a prediction based on multiple time series.
        
        I have multiple time series. They are actually the taxi pick-ups at one stadium during multiple basketball games.
        
        The length and interval of those time series are the same, but the starting times are different.
        
        I also want to incorporate other time series into each pick-up time series, maybe, the score gaps time series of a basketball match. Then, it becomes a multivariate forecasting problem.
        
        Therefore, I am facing a multiple multivariate time series forecasting. I want to train one model using those time series and forecast pick-ups at the time “t” based on past pick-ups and score gaps.
        
        I have searched online for a long time, but have not found anything.
        
        Can this be done by using LSTM, if yes, how can I train this model?
        
        ————————————————————————-
        
        And you answered:
        
        Perhaps standard the sequences to start and end at the same time and use zero padding and a masking layer to ignore the padding?
        
        ————————————————————————-
        
        I actually did not really understand your reply. What does “standard sequences to start and end at the same time” mean?
        
        In terms of padding data, if I have two matches on Monday and Friday, did you mean I pad all the time stamp between Monday and Friday? or I want to ask: what determines the number of padding?
        
        I would really appreciate it if you could reply. I have been stuck at this point for one month.
      - Jason Brownlee August 1, 2019 at 2:11 pm #
        
        Sorry, I mean “standardize” – as in make the same or fit to a standard in terms of one or more factors, like length, start/end times, time steps, etc.
      - Liang Zhao August 2, 2019 at 6:57 am #
        
        Thanks a lot, but could you please explain more about padding and masking layer?
        
        My problem is I have multiple time series of taxi demand around a stadium. They are all during basketball games, which means if I concatenate them, it is not reasonable to predict pick-ups according to pick-ups several days ago.
        
        But you said in the previous reply: “use zero padding and a masking layer to ignore the padding.”
        
        This makes me think that: can I concatenate all the time series, and pad some data to zero between two games, and use a masking layer?
        
        If it is what you meant earlier, how many points should I pad, does it depend on my sliding window?
      - Jason Brownlee August 2, 2019 at 2:32 pm #
        
        Yes, you can pad with the value 0, and use a Masking input layer that will ignore all observations with that value (or use any value you wish).
        
        I believe there is an example here:
        https://machinelearningmastery.com/handle-missing-timesteps-sequence-prediction-problems-python/
        
        You must choose how to frame the prediction problem, e.g. what are the inputs and outputs. Once defined, you can standardize all “samples” to meet this expectation.
        
        What is the right framing for your data? This is unknown, maybe even unknowable given that we have incomplete information; you must experiment and discover what works well or best.
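A rough sketch of the padding step discussed in this thread, treating each game as its own sample rather than concatenating games into one long series. The helper name pad_sequences_3d and the toy sizes are assumptions. In Keras, a Masking(mask_value=0.0) layer placed before the LSTM would then skip the time steps whose features all equal the pad value:

```python
import numpy as np

def pad_sequences_3d(sequences, pad_value=0.0):
    """Left-pad a list of (steps, features) arrays to a common length."""
    max_len = max(len(s) for s in sequences)
    n_features = sequences[0].shape[1]
    out = np.full((len(sequences), max_len, n_features), pad_value)
    for i, s in enumerate(sequences):
        out[i, max_len - len(s):] = s  # real data sits at the end
    return out

# two toy games with 2 features each, of different lengths
games = [np.ones((16, 2)), np.ones((12, 2))]
batch = pad_sequences_3d(games)
print(batch.shape)  # (2, 16, 2)
```

With this layout the amount of padding depends only on the longest sequence, not on the calendar gap between games. If real observations can be all zeros, a different pad value avoids masking them by accident.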
Rachel July 3, 2019 at 6:52 am #

Thanks for the great article.

I’m working on a problem now that is essentially bullet point #d under LSTM data preparation: “Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.”

In my context, there is a prediction made each day for a value that will occur days in the future. My goal is to use these 4 sequential predictions (as well as additional variables associated with each prediction day) as input for a model to predict the final value.

How would you incorporate a series of past predictions into such a model?

Reply
- Jason Brownlee July 3, 2019 at 8:43 am #
  
  I recommend devising many different framings of the problem (inputs and outputs) and test each to see what works well/best for your specific dataset. Also try a suite of models:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  Reply
nita22 July 5, 2019 at 2:01 am #

Hi, Jason!
I want to draw the line, the code as below:

size = yhat.shape[0]
aa = [x for x in range(size)]
pyplot.plot(aa, inv_y[:size], marker='.', label="actual")
pyplot.plot(aa, inv_yhat[:size], 'r', label="prediction")
pyplot.ylabel('Global_active_power', size=15)
pyplot.xlabel('Time step', size=15)
pyplot.legend(fontsize=15)
pyplot.show()

The whole image url is here: https://imgchr.com/i/ZaiKw6
It looks good. But when I see the detail, I found a problem.
https://imgchr.com/i/Zai0k8
The predicted result lags behind the real result. What’s the problem?

Reply
- Jason Brownlee July 5, 2019 at 8:09 am #
  
  This is very common, see the explanation here:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
  
  Reply
  - nita22 July 5, 2019 at 10:57 am #
    
    As your example in this post, how can we fix this problem? Thx.
    
    Reply
    - Jason Brownlee July 5, 2019 at 11:12 am #
      
      Improve the model or use a different model.
      
      Reply
Bruno Morabito July 7, 2019 at 8:31 pm #

Hi Jason,

Thanks for your post, it was very useful! I am new to RNNs and I am struggling to understand why the past labels (the pollution level) enter train_X (the feature matrix) and not train_y.

You do that in lines 63 and 64 of the code which uses more than one time step.

I was thinking one has to define what the past labels are so that they can be associated with the past features. What am I missing?

Thanks a lot!

Reply
- Jason Brownlee July 8, 2019 at 8:40 am #
  
  You can learn more about time series forecasting framed as supervised learning here:
  https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
  
  Reply
1984 July 9, 2019 at 5:56 pm #

Hi Jason,
Could you please show where I can find the RMSE of 30 mentioned in this sentence: “We can see that the model achieves a respectable RMSE of 26.496, which is lower than an RMSE of 30 found with a persistence model.” It is in the section 【UPDATE】.

Reply
- 1984 July 9, 2019 at 5:57 pm #
  
  Actually it’s a little before the 【Update】sec, not in that sec. I typed it wrong.
  
  Reply
- Jason Brownlee July 10, 2019 at 8:05 am #
  
  I fit a persistence model but did not post the example in the blog post.
  
  Reply
YWINTERN July 9, 2019 at 6:25 pm #

Hi Jason,
I’d like to know what the persistence model you mentioned in this post is; it has an RMSE value of 30. It’s a little above the UPDATE section.
Thank you in advance!

Reply
- Jason Brownlee July 10, 2019 at 8:06 am #
  
  More on what a persistence model is here:
  https://machinelearningmastery.com/how-to-grid-search-naive-methods-for-univariate-time-series-forecasting/
  
  Reply
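A persistence (naive) forecast simply reuses the previous observation as the prediction for the next step. A minimal sketch of how such a baseline RMSE can be computed, using made-up readings rather than the tutorial's dataset, so it will not reproduce the figure of 30 quoted above:

```python
from math import sqrt

def persistence_rmse(series):
    """RMSE of predicting each value with the one before it."""
    errors = [(series[t] - series[t - 1]) ** 2 for t in range(1, len(series))]
    return sqrt(sum(errors) / len(errors))

# toy pollution readings, not the tutorial's data
readings = [129.0, 148.0, 159.0, 181.0, 138.0, 109.0]
print(round(persistence_rmse(readings), 3))
```

A learned model is only worth keeping if its RMSE on the same test period is lower than this baseline.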
KW Cho July 11, 2019 at 6:20 pm #

Hi jason,
Thank you for your informative post.
You’ve used ‘pollution’ as a feature not a target.
Then the model is predicting pollution with the answer.
I think pollution should be used just for target(train_y or test_y)
Isn’t it? please let me know

Thank you!
Cho

Reply
- Jason Brownlee July 12, 2019 at 8:32 am #
  
  It is both a feature and a target – e.g. autoregression.
  
  Reply
  - KW Cho July 13, 2019 at 1:47 pm #
    
    For training, it makes sense.
    But for the prediction (test) input, the pollution column should be deleted, shouldn’t it?
    Predicting pollution from data that already contains the answer doesn’t make sense.
    A good result is then obvious.
    
    Am i wrong?
    
    Cho
    
    Reply
    - Jason Brownlee July 14, 2019 at 8:04 am #
      
      Not in the case of walk-forward validation:
      https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
      
      Reply
    - Kingsley Udeh July 16, 2019 at 4:13 am #
      
      As Cho was suggesting, how do we train with all features including pollution, but predict with prediction column deleted? Of course, not using Walk Forward Validation.
      
      Do we say:
      
      new_X_test = X_test[:,:-1]
      new_test_y = X_test[:,-1]
      
      yhat = model.predict(new_X_test)
      
      We can now compare yhat and new_test_y ?
      
      Reply
      - Jason Brownlee July 16, 2019 at 8:22 am #
        
        You can frame the problem any way you wish.
        
        Choose the input and output columns, prepare the data to meet your framing, define the model to meet the data.
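One such framing, sketched under the assumption that pollution sits in feature column 0 of a 3D input array: drop that column from both the training and the test inputs and keep pollution only as the target. Both splits have to be sliced the same way, because the model's input shape is fixed when it is defined. The array names and sizes below are illustrative only:

```python
import numpy as np

# toy arrays shaped (samples, timesteps, features), pollution in column 0
train_X = np.random.rand(100, 3, 8)
test_X = np.random.rand(20, 3, 8)

# keep only the weather features as inputs, for both splits
train_X_weather = train_X[:, :, 1:]
test_X_weather = test_X[:, :, 1:]

print(train_X_weather.shape, test_X_weather.shape)  # (100, 3, 7) (20, 3, 7)
```

A model trained this way would be defined with input_shape=(3, 7), and its skill can be compared against the autoregressive framing to see how much the lagged pollution values contribute.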
karthik July 11, 2019 at 11:22 pm #

Hi Jason, this is a great article, thanks for writing it. However, I ran this code on my dataset and see that the inverse transform is not actually transforming back to the original units of the “Y” (target) variable. Say my actual Y is in the millions, but the transformed Y is still in the tens.

I am not getting any error, but the transformed values are far too small.

Reply
- Jason Brownlee July 12, 2019 at 8:44 am #
  
  Perhaps check this post:
  https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
  
  Reply
  - karthik July 12, 2019 at 2:51 pm #
    
    Thanks, Jason, for your valuable input. I got that sorted out. But I have another problem. I am now predicting the revenue for the next months, but the prediction is flattening out, and I think this could be because my features are not rich enough. Is that a fair thought? Do you have any thoughts on what the problem could be here?
    
    Also, I have not differenced the data as I will have to preserve the seasonality and trend in predicting it.
    
    Can you please guide me ?
    
    Reply
    - Jason Brownlee July 13, 2019 at 6:51 am #
      
      Perhaps try other models?
      Perhaps try other model configurations?
      Perhaps try adding new features?
      
      Reply
ctrado July 13, 2019 at 2:41 am #

Hey,

this is a very nice article.

But I have difficulty understanding why persistence models are bad.

You have a correlation of nearly 1 at a time lag of 1, so the model fits very closely. Why is this bad?

Thanks.

Reply
- Jason Brownlee July 13, 2019 at 6:58 am #
  
  They are not bad, they are just the simplest thing we can do for time series forecasting.
  
  If a learning model cannot outperform a persistence model, then the learning model does not have any skill.
  
  Reply
Kingsley Udeh July 15, 2019 at 10:11 pm #

Hi Dr. Jason,

I quite understand your excellent tutorial. Due to some related ideas I’m tackling with, I like to ask the following questions for the benefit or input of others.

Can we use the LSTM model you created to predict the next pollution measurement for current time step given other features’ prior time steps minus pollution?

That is, how can we design our input samples such that we train our model with prior time steps of all the features including pollution measurement, and predict only the pollution variable in the current time step given dew, temp, press, wnd-spd,snow, and rain variables as prior time steps.

I have been trying to design the above, but it has given me unstable predictions. I have gone through your book on time series with LSTMs, MLPs, etc., but need more clarification on this problem.

Can anyone point me in the right direction? I will appreciate your help.

Reply
- Jason Brownlee July 16, 2019 at 8:17 am #
  
  Yes, but you may need to adapt the model to a new framing of the problem (e.g. inputs and outputs) and prepare data to meet this new framing.
  
  You have freedom over this framing, perhaps try a few different approaches and see what works best?
  
  Reply
  - Kingsley Udeh July 16, 2019 at 10:57 pm #
    
    Thanks, Dr. Jason.
    
    I have successfully removed the pollution variable from the test data. However, the problem is that the shape of the training features no longer matches the shape of the test features, so the LSTM threw an error due to the different shapes.
    
    Is there a way I can train the model with different shape and predict with a different shape?
    
    Thanks in advance
    
    Reply
    - Jason Brownlee July 17, 2019 at 8:25 am #
      
      What do you mean by different shapes?
      
      Reply
      - Kingsley Udeh August 9, 2019 at 9:33 am #
        
        By different shapes, I meant when using the trained model to predict pollution variable, do not include the pollution variable in the test set. Thus,
        
        Train with all the features including pollution variable, but predict future pollution without providing its values(empty or zeros) in the test set.
        
        Note that pollution variable is the target output or variable
        
        Is this possible? If so, how do I go about it? That is, do I have to change the current code in any way?
        
        Thanks in advance and will be glad to see your response.
      - Jason Brownlee August 9, 2019 at 2:19 pm #
        
        Yes, you can frame the problem any way you wish, then prepare the data to meet your requirements and fit the model. Once fit, you can use the model to make predictions.
        
        You will need to prepare the data manually, you can use an existing function from the tutorial as a starting point and adapt it for your needs.
      - Kingsley Udeh August 10, 2019 at 5:10 am #
        
        Thanks again for your response, Dr. Jason.
        
        This is the way I plan to prepare the test set manually:
        
        Provide all the weather variables and the pollution variable as the test set, but remove all values or time steps from the pollution variables, or assign zeros to it, and then make prediction for the future time step(s) of the pollution variable.
        
        The reason for including the pollution variable as a placeholder in the test data is to maintain the shape used to train the model in the first place, as in the following made-up data sample:
        
        Train set variables:
        pollution  dew  temp  press  wnd-speed  snow  rain
        30         7    -5    2      36         20    89
        
        Test set variables:
        pollution  dew  temp  press  wnd-speed  snow  rain
                   13   2     65     3          23    11
        
        Prediction:
        pollution
        ?
        
        If the system raises a NaN or empty-value error on the test set, then I will assign 0 to the pollution variable at each time step, to mark missing values.
        
        Do you see any potential issue with this? I’m yet to try the above framing.
      - Jason Brownlee August 10, 2019 at 7:24 am #
        
        Data with NaNs must be removed prior to modeling.
Lopa July 18, 2019 at 1:23 am #

Hi Jason,

Firstly, I would like to thank you for responding & helping me resolve my queries. I was trying to implement LSTM for a real life time series problem where given 18 months of data I have to forecast next 12 months.

Although there’s some relief that the data is at daily level enabling me to work with more data points. However, I was finding it difficult to forecast multiple steps ahead in time so I have developed multiple models meaning , I forecast 2 months ahead then added it back to the original data & retrained the model to generate 3 months of forecasts & so on …

This approach has helped me to generate reasonable forecasts.

My question to you is: is this a correct approach?

Reply
- Jason Brownlee July 18, 2019 at 8:31 am #
  
  I recommend testing a range of approaches and discover what works best for your chosen model and specific dataset.
  
  More ideas here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
  - Lopa July 18, 2019 at 12:39 pm #
    
    Thanks. I have gone through all these tutorials and your books and have tried all the approaches mentioned for multi-step forecasting.
    But I still have this question: is it a reasonable approach to predict up to a certain time, use those predictions as inputs to retrain the model, and then forecast a few more steps ahead?
    
    Thanks in advance.
    
    Reply
    - Jason Brownlee July 18, 2019 at 2:13 pm #
      
      It depends on the problem and the model.
      
      I always recommend testing and use the results to guide your choice.
      
      Reply
    - durga July 19, 2019 at 10:54 am #
      
      Hi,
      I don’t think that’s a great idea, as you might just be rolling the errors forward and eventually end up with bad predictions a few time steps down the line.
      Instead, you can use a batch_size of 1, save the model, and retrain the model with actual values.
      
      Reply
      - Jason Brownlee July 19, 2019 at 2:20 pm #
        
        Nice tip.
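The idea of updating with actual values rather than with earlier predictions can be sketched as a walk-forward loop. The fit and predict callables below are hypothetical placeholders (the "model" is just the mean of the last two observations), not an LSTM:

```python
def walk_forward(series, n_train, fit, predict):
    """One-step walk-forward validation: after each prediction the real
    observation (not the prediction) is added to the history."""
    history = list(series[:n_train])
    predictions = []
    for actual in series[n_train:]:
        model = fit(history)
        predictions.append(predict(model, history))
        history.append(actual)  # update with the true value
    return predictions

# hypothetical toy "model": the mean of the last two observations
fit = lambda history: None
predict = lambda model, history: sum(history[-2:]) / 2

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
preds = walk_forward(series, n_train=3, fit=fit, predict=predict)
print(preds)  # [2.5, 3.5, 4.5]
```

Replacing history.append(actual) with the prediction would give the recursive variant, where errors are carried forward from one step to the next.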
pgaiwak July 20, 2019 at 9:25 am #

Hi Jason,

Thanks a lot for your blogs. They are very informative and always give me insight on how to proceed with problems.

I am trying to use LSTM (keras) to predict power consumption of individual houses as a part of a high dimensional analysis. For some reason all the outputs of LSTM have the same value. I am appending the code below ( Most of it is motivated from this blog post). Can you guide me about this?
Thanks in advance:)

CODE

model = Sequential()
model.add(LSTM(units = 100,input_shape=(1, dim_obs)))

model.add(Dense(2))

model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])

history = model.fit(train_x, train_z, epochs=20, batch_size=100, validation_data=(valid_x, valid_z), verbose=2, shuffle=False)
model.summary()

yhat = model.predict(test_x)

Reply
- Jason Brownlee July 20, 2019 at 11:00 am #
  
  I recommend following this process:
  https://machinelearningmastery.com/framework-for-better-deep-learning/
  
  Reply
  - pgaiwak July 27, 2019 at 7:52 am #
    
    Thanks a lot Jason. I did lookup the tutorial, found my error and rectified it. It was very helpful.
    
    I have another question: I am using weather data to predict power consumption. Is it essential to use embedding layer for the weather data before feeding it to the LSTM layer?
    
    Regards
    Paritosh Gaiwak
    
    Reply
    - Jason Brownlee July 28, 2019 at 6:37 am #
      
      Well done, happy to hear that.
      
      I recommend testing with and without it and comparing the performance. Use results to drive model design decisions.
      
      Reply
Jimmy July 25, 2019 at 12:30 pm #

Hi Jason !

Thank you so much for this tutorial !

I have a simple problem that I encountered when I tried to reshape the train_x in my LSTM model. Do I have to set the number of timesteps (in your case, n_hours) to a number that divides the total length of train_x evenly?

Best,
Jimmy

Reply
- Jason Brownlee July 25, 2019 at 2:12 pm #
  
  Yes, that is a good idea, e.g. use hours of day or weeks of year or something.
  
  Reply
Pedro July 28, 2019 at 8:36 am #

Hello Jason.

Thanks a lot for this tutorial, it’s helping me a lot on my undergrad thesis.

I have a question: What if I want to feed the model more than one dataset? How would I adapt the code for that?

Thanks in advance, and keep up the good work! 🙂

Reply
- Jason Brownlee July 29, 2019 at 6:01 am #
  
  What do you mean exactly?
  
  The input for this example is a multivariate time series – e.g. multiple “dataset” or “series” as input.
  
  Reply
  - Pedro July 30, 2019 at 1:56 am #
    
    Sorry, it actually got confusing because I was thinking about the dataset that I have. Let me explain a little further.
    
    I’m working in a problem that I need to predict network traffic for anomaly detection, and my datasets are made from data such as bytes, packets, etc. in one file each, and those contain a whole day (24h) of data.
    Considering that each file contains one single column of data, I merged the files in one thing, so that each column of the resultant would represent a different feature, but that is just for one single day.
    Since I have more than one day of data, what I was thinking of doing was to merge the data sequentially (following days below each other).
    I was wondering if there are any better ways of doing so.
    
    Reply
    - Jason Brownlee July 30, 2019 at 6:20 am #
      
      Sounds good.
      
      If you want a model to learn across days, then you will need to train a model on multiple days of data. A training dataset must be comprised of multiple days in order to achieve this.
      
      You can use a data generator to load one (or a few) day of data at a time if it does not all fit into memory.
      
      Does that help?
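
The data-generator idea above could be sketched roughly as follows. This is only a minimal sketch in plain Python, not code from the tutorial: `load_day` and the day identifiers are hypothetical stand-ins for whatever reads one day's merged file, and each row is assumed to hold the features followed by the target in the last column.

```python
def day_batches(day_ids, load_day, days_per_batch=1):
    """Yield (X, y) for a few days at a time so only those days sit in memory.

    load_day is a hypothetical callable that returns one day's rows,
    each row being [feature_1, ..., feature_n, target].
    """
    for start in range(0, len(day_ids), days_per_batch):
        rows = []
        for day_id in day_ids[start:start + days_per_batch]:
            rows.extend(load_day(day_id))
        X = [row[:-1] for row in rows]   # all columns except the last
        y = [row[-1] for row in rows]    # last column is the target
        yield X, y


# Toy usage: two fake "days" held in a dict instead of files.
fake_days = {
    "day1": [[1.0, 10.0, 0.1], [2.0, 20.0, 0.2]],
    "day2": [[3.0, 30.0, 0.3]],
}
batches = list(day_batches(["day1", "day2"], fake_days.__getitem__))
```

In practice each yielded batch would still need the same scaling and 3D reshaping as the rest of the training data before being passed to the model.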
      
      Reply
      - Pedro July 30, 2019 at 6:40 am #
        
        It does help.
        Thank you very much! 🙂
      - Jason Brownlee July 30, 2019 at 2:07 pm #
        
        Happy to hear that.
      - Pedro July 30, 2019 at 10:58 am #
        
        Actually, I have more questions.
        I was trying out two features, so I put them on the train_y and test_y. Then I guessed that I should also use Dense(2).
        In the evaluation part, because I am predicting for two different values, I did:
        …
        test_y = test_y.reshape((len(test_y), 2))
        …
        
        but, at the end, I got too big of a RMSE: “Test RMSE: 22074.224”
        
        This number means I did something wrong, I figure…
        
        Could you help me?
        
        Thanks in advance.
        
        P.S.: I’m not using the pollution dataset, but my network traffic dataset.
      - Jason Brownlee July 30, 2019 at 2:08 pm #
        
        I would encourage you to estimate the RMSE for each element in the output vector separately.
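
One way to follow this suggestion is to score each output column on its own, so that a column on a large scale does not dominate a single combined number. The sketch below uses plain Python with made-up values; the function name `rmse_per_output` is an illustrative assumption, not part of the tutorial.

```python
from math import sqrt


def rmse_per_output(actual, predicted):
    """Return one RMSE per output column.

    actual and predicted are lists of rows with the same shape,
    e.g. [[y1, y2], [y1, y2], ...].
    """
    n_rows = len(actual)
    n_outputs = len(actual[0])
    scores = []
    for j in range(n_outputs):
        squared = [(actual[i][j] - predicted[i][j]) ** 2 for i in range(n_rows)]
        scores.append(sqrt(sum(squared) / n_rows))
    return scores


# Toy values: the second output is on a much larger scale than the first.
actual = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]]
predicted = [[1.0, 1100.0], [2.0, 1900.0], [3.0, 3100.0]]
scores = rmse_per_output(actual, predicted)
```

Each score is in the units of its own variable, so a large value for one output may simply reflect that variable's scale rather than a mistake.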
  - Pedro July 30, 2019 at 6:36 am #
    
    One more question: If my train_y has more than one column, e.g. if I’m training my model to predict pollution and dew, will I have to tweak anything to use the model.fit() method?
    
    Thanks!
    
    Reply
    - Jason Brownlee July 30, 2019 at 2:06 pm #
      
      No.
      
      Reply
Jigyasa July 30, 2019 at 11:31 pm #

Hi, Jason
I have a question regarding future prediction. For example, here the data has been divided into training and test sets and the test set is predicted. What if I want to predict what comes after the test set? Do you have any idea? If yes, can you give me any suggestions or links to follow?

Thank you so much!

Reply
- Jason Brownlee July 31, 2019 at 6:53 am #
  
  I show how to make a prediction here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
  - Jigyasa July 31, 2019 at 4:43 pm #
    
    And I have one more question: my data consists of time and 2 more data columns. I made the supervised data by removing the date column from my data. If I want to add the date in the final graph so that I can visualize it, how should I do that?
    
    Reply
    - Jason Brownlee August 1, 2019 at 6:45 am #
      
      You can create a line plot in matplotlib and specify the date as the label for the x-axis.
      
      Reply
      - Jigyasa August 1, 2019 at 4:55 pm #
        
        Thank you so much for the help. 🙂
      - Jason Brownlee August 2, 2019 at 6:45 am #
        
        You’re welcome.
Sen July 31, 2019 at 4:45 am #

In this dataset, all the variables look like input variables. Which is the target variable, or is it necessary to keep a target variable at all? I have an idea to forecast a time series for traffic flow. I have data for traffic volume, speed, headway, etc. Could you please suggest in detail how I can develop it?

Reply
- Jason Brownlee July 31, 2019 at 6:57 am #
  
  You can frame the problem any way you wish.
  
  Perhaps explore some of these examples to find an appropriate model:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Amine July 31, 2019 at 10:22 am #

Hello Jason, great post !!

I have a question that has been asked before here. In fact, you said in some comments that we should try different timesteps in the input and see what can give us the best performance.

But what if Timestep=1 is giving the best performance, how can you explain it to people claiming that the LSTM purpose is neglected (BPTT too) in this case, and it’s like a simple feed forward MLP?

Thanks a lot for your posts,

Amine

Reply
- Jason Brownlee July 31, 2019 at 2:06 pm #
  
  If an LSTM with one timestep is performing the best, then I would expect an MLP to outperform it.
  
  Reply
Sen July 31, 2019 at 8:27 pm #

Thank you very much. I have already tried with univariate LSTM and it works nice. I am trying for multivariate LSTM. Your tutorials are absolutely great, very useful. One more question please. How to proceed for prediction with new dataset (unlabelled)?

Reply
- Jason Brownlee August 1, 2019 at 6:48 am #
  
  Fit the model on all data and call:
  
  yhat = model.predict(newX)
  
  Reply
Samuel Alfred August 1, 2019 at 2:21 am #

Hello Dr Jason.
1) LSTM accepts input as (sample, timesteps, features). Most of the examples in your tutorial have used something like (1, 120,2). Please I want to make predictions with something that has a multiple samples like (3,120,2).

Please how do I manipulate this to go into the LSTM ?

2)I want it to be trained in such a way that the LSTM model will receive one sample as input at a time i.e. One sample of (120,2) then feed in the next etc till the training is over.

Please an ideas how this should be done? Thanks.

Reply
- Jason Brownlee August 1, 2019 at 6:55 am #
  
  You can provide any number of samples to the model, no change needed.
  
  Samples are processed one at a time. You can choose to reset the internal state between samples or not; by default, the internal state is reset at the end of each batch. To take control of when state is reset, you can use a stateful LSTM and call reset_states() on demand.
  
  Reply
  - Pavel Komarov July 23, 2021 at 4:46 am #
    
    Resetting states between samples shouldn’t have an effect if states for each sample are indeed kept independent, as indicated here https://stackoverflow.com/a/46331227/2084503.
    
    Reply
Malathi August 1, 2019 at 9:25 pm #

Hi Jason,

Highly informative as usual and saved a lot of my time and effort.
I tried the code given and got the results. I applied to my data set as well.
In this code, the parameters you passed to the series_to_supervised function are (data, 1, 1).

1. I tried for multiple lags for my data set, increased from 30,50,100 and 365 and third
parameter is 1
2. I tried one shot prediction (samples,30,30) predicting var(t+29) leaving all the variables
from var1(t) till var(t+28) here . And also I changed the second and third parameters
values.
3. I got no significant increase in RMSE (only a marginal increase of 0.1 or 0.2). Can you tell me
the reason for that?
4. I conducted these experiments without scaling. I thought I will do the scaling part later.
So my RMSE=np.sqrt(mean_squared_error(test_y, yhat))

Thanks in advance

Reply
- Jason Brownlee August 2, 2019 at 6:47 am #
  
  Nice work. Generally, it is a good idea to tune the model when the structure of the problem is changed.
  
  Reply
Malathi August 2, 2019 at 8:30 pm #

Thanks for the reply Jason

You have mentioned training LSTM on multiple lags(time steps) did not lift model skill
in your updated text. I have the same opinion after conducting all these experiments.
what would be the reason for that?

Thanks,

Reply
- Jason Brownlee August 3, 2019 at 8:01 am #
  
  LSTMs are generally poor at univariate time series and are hard to configure for multivariate cases.
  
  Try CNN or CNN-LSTM hybrids:
  https://machinelearningmastery.com/how-to-develop-convolutional-neural-network-models-for-time-series-forecasting/
  
  Reply
Malathi August 4, 2019 at 11:57 am #

Thank you very much Jason.

I will follow the tutorials.

Reply
- Jason Brownlee August 5, 2019 at 6:44 am #
  
  You’re welcome.
  
  Reply
Adurthi Ashwin Swarup August 5, 2019 at 3:57 pm #

Hi Jason ,
Your blog specifically states that increase in the number of lags does not necessarily affect the performance of LSTM models .

I was intrigued to understand the reasoning behind this statement ?

Is the conclusion an observation or is there a theoretical backing to this ?

Reply
- Jason Brownlee August 6, 2019 at 6:30 am #
  
  Both.
  
  Empirically, the amount of history must be tested.
  
  Theoretically, more history results in vanishing gradients after 200-400 timesteps.
  
  Reply
Adurthi Ashwin Swarup August 6, 2019 at 7:05 pm #

To rephrase your answer – the number of lags has to be empirically determined, especially if one is doing longer predictions.

And having more than 200–400 lags would cause a vanishing gradient problem.

Do you concur ?

Reply
- Jason Brownlee August 7, 2019 at 7:46 am #
  
  Yes, in general.
  
  Reply
kent August 9, 2019 at 3:12 pm #

How can we relate “samples(batch_size)” in input tensor and “batch_size” in model.fit() in keras?

Reply
- kento August 9, 2019 at 3:14 pm #
  
  When these two are different, what is the implication about it?
  
  Reply
  - Jason Brownlee August 10, 2019 at 7:11 am #
    
    The number of samples is the number of rows in your data.
    
    The batch size is the number of samples used in one update to the model.
    
    You can learn more here:
    https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
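
The relationship between the two numbers can be shown with a little arithmetic. This is a minimal sketch with illustrative values; `updates_per_epoch` is a hypothetical helper name, and it assumes the final partial batch is still used for an update.

```python
from math import ceil


def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one pass over the data."""
    return ceil(n_samples / batch_size)


n_samples = 1000  # rows in the training data (illustrative value)
small = updates_per_epoch(n_samples, 32)    # last batch holds only 8 samples
full = updates_per_epoch(n_samples, 1000)   # one update per epoch
```

So the sample count describes the data, while the batch size only controls how those samples are grouped into updates during training.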
    
    Reply
- Jason Brownlee August 10, 2019 at 7:10 am #
  
  I don’t follow, sorry. What do you mean exactly?
  
  Reply
  - Kent August 11, 2019 at 6:46 pm #
    
    Thanks very much for your reply.
    
    In the original document in Keras RNN, the input shape requires “(batch_size, timesteps, input_dim)” it mentions.
    The link is here: https://keras.io/ja/layers/recurrent/
    
    Do “batch_size” in the input shape and “batch_size” inside the fit() function denote different things?
    
    Reply
    - Jason Brownlee August 12, 2019 at 6:35 am #
      
      Batch size is only needed in the input shape if your model is stateful (e.g. stateful=True).
      
      Reply
Amirreza August 12, 2019 at 9:00 pm #

Dear Mr. Brownlee,

Thank you very much for your great example. It was very helpful.
I just have a question because I am rather new to Python:
In my model I am going to predict temperature and volume of water using multivariate LSTM, So, different to your example I will have two outputs. Could you please let me know how can I modify this model to have two outputs?

Thank you

Reply
- Jason Brownlee August 13, 2019 at 6:08 am #
  
  Yes, I give an example of this here, it will provide an excellent starting point for you:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Lopa August 15, 2019 at 12:28 am #

Hi Jason,

I am attempting to build a multivariate LSTM with 2 explanatory variables. I have been able to build a reasonably good model & now I want to forecast for the next 3 months. One of the explanatory variable is an indicator for the holidays but the other one is continuous.

Having said that the train & validation goes well . But when I have to predict for the next 3 months I have to feed in the 2 explanatory variables for the future time frame & since one of them is continuous I am scaling it . But when I attempt to invert scale the values that I see are not consistent with the original variable. I cannot use the same scaler function that I used while developing the model because the array size are different.

Because I develop the model using 3 variables which is the variable I want to predict & the 2 explanatory variables. Can you please help me out ? I have tried looking it in your book as well but could not find something to help me out .

Reply
- Jason Brownlee August 15, 2019 at 8:12 am #
  
  Perhaps try scaling/inverting manually to avoid any issues with array sizes?
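
Manual scaling could look something like the sketch below, assuming min-max scaling to [0, 1] as used in the tutorial. The helper names and numbers are illustrative assumptions; the point is that keeping one (min, max) pair per column avoids any dependence on the width of the array the scaler was originally fit on.

```python
def fit_min_max(values):
    """Record the min and max of one column from the training data."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map values to [0, 1] using stored coefficients."""
    return [(v - lo) / (hi - lo) for v in values]


def invert(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Coefficients come from the training column only (toy numbers).
train_target = [10.0, 20.0, 30.0, 50.0]
lo, hi = fit_min_max(train_target)

# Later, model output for the forecast horizon is still in scaled units.
scaled_forecast = [0.25, 0.5, 1.0]
forecast = invert(scaled_forecast, lo, hi)
```

The same pattern applies to each explanatory variable: scale future inputs with that column's stored coefficients, and invert only the predicted column with its own pair.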
  
  Reply
  - Lopa August 15, 2019 at 6:41 pm #
    
    Thanks Jason. I hope I have been able to explain my problem well. As mentioned previously the problem happens when I am attempting to forecast beyond the size of the entire data set & as you can understand that I need to feed in all the explanatory variables .
    
    Reply
    - Jason Brownlee August 16, 2019 at 7:50 am #
      
      Correct.
      
      Reply
      - Lopa August 16, 2019 at 6:42 pm #
        
        Thanks
      - Jason Brownlee August 17, 2019 at 5:34 am #
        
        No problem.
Leon August 16, 2019 at 12:21 am #

Thank you for sharing

I would like to ask: if I want to divide the training set and the test set at a finer resolution, say minutes, with my own dataset, how do I change this? For example, “n_train_hours = 365*24*60”?

Reply
- Jason Brownlee August 16, 2019 at 7:55 am #
  
  You can adapt the example to fit your data, I cannot write code for you.
  
  Reply
Lopa August 17, 2019 at 12:40 am #

Hi Jason,

Is it possible to have a prediction interval around an LSTM time series forecast? I went through this post of yours, https://machinelearningmastery.com/prediction-intervals-for-machine-learning/, but could not really understand how I can replicate it for an LSTM.

Thanks for your help.

Reply
- Jason Brownlee August 17, 2019 at 5:49 am #
  
  It is possible. I don’t have an example, sorry.
  
  Reply
Eva August 19, 2019 at 7:00 pm #

Hello Dr. Jason,

Very helpful post, as always!
You mentioned about data preparation by making all series stationary with differencing and seasonal adjustment.
But how to prepare a chaotic series?
Also, when do we say the RMSE is low and the model is skillful? Any rule of thumb?
Regards.

Reply
- Jason Brownlee August 20, 2019 at 6:25 am #
  
  I don’t know about chaotic series, are they predictable?
  
  Yes – excellent question, the idea of model performance is relative, e.g. to a naive model:
  https://machinelearningmastery.com/faq/single-faq/how-to-know-if-a-model-has-good-performance
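
Judging performance relative to a naive model might be sketched as below: compute the RMSE of a persistence forecast (predicting each value with the previous one) and compare the model's RMSE against it. The series and the "model" predictions here are made-up toy values.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def persistence_rmse(series):
    """RMSE of predicting each value with the value one step earlier."""
    return rmse(series[1:], series[:-1])


# Toy test series and made-up model predictions for steps 1..4.
test_series = [10.0, 12.0, 11.0, 15.0, 14.0]
model_pred = [11.0, 11.0, 14.0, 14.0]

baseline = persistence_rmse(test_series)
model_score = rmse(test_series[1:], model_pred)
is_skillful = model_score < baseline
```

A model whose RMSE is not below the persistence baseline has no skill on that dataset, however small the number looks in absolute terms.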
  
  Reply
Eva August 19, 2019 at 8:39 pm #

And how to compute RMSE for multi-step univariate output? single-step, multivariate output? multi-step, multivariate output?

Thanks!

Reply
- Jason Brownlee August 20, 2019 at 6:26 am #
  
  You must consider what exactly you want to measure.
  
  E.g. error across all series? all time steps? separate series? separate time steps? etc.
  
  It is up to you.
  
  Reply
Amirreza August 19, 2019 at 9:12 pm #

Thank you very much for your helpful instructions.
Just a question: Here you have used the same data for validation, and prediction. So what percentage of the data would be for validation, and for test?

Thanks

Reply
- Jason Brownlee August 20, 2019 at 6:27 am #
  
  It is problem specific, the test set must be representative of the broader problem.
  
  Reply
  - Amirreza August 20, 2019 at 8:29 pm #
    
    Thank you, but could you please let me know that when you use the same data for both validation and test as in this example, what is the default percentage which is used for validation and test respectively?
    
    Reply
    - Jason Brownlee August 21, 2019 at 6:39 am #
      
      There is no fixed rule. Generally the validation set and test set should be representative of the broader problem.
      
      Reply
Kingsley Udeh August 21, 2019 at 7:55 am #

Hi Dr. Jason,
How do I successfully use fit_transform() on train data and transform() on test data if I’m using walk_forward validation strategy that requires a retrain of the model each time a prediction is made on the test samples?

In my current project, I used fit_transform() on the entire dataset as you did in your tutorial, while at the same time, implemented walk forward validation – model retraining. Is there any kind of information leak or bias in my approach?

Reply
- Jason Brownlee August 21, 2019 at 2:02 pm #
  
  Perhaps re-fit the transform each time the model is prepared?
  
  Perhaps prepare a custom data prep scheme that takes into account domain knowledge?
  
  Reply
Kingsley Udeh August 21, 2019 at 11:16 pm #

What you meant by “Perhaps re-fit the transform each time the model is prepared” is that the test data should be rescaled with fit_transform() each time it’s passed to the model for retraining after the prediction is collected, right?

Can you throw more light on what you meant by the second option: “Perhaps prepare a custom data prep scheme that takes into account domain knowledge?” I did not really get that aspect.

Finally, my project is already completed and I’m wondering if it is worth redoing the rescaling. Like I said earlier, I used fit_transform() on the entire dataset like you did in your tutorial, and had good and reasonable results. What are your thoughts?

Again, thanks in advance.

Reply
- Jason Brownlee August 22, 2019 at 6:29 am #
  
  I was suggesting that perhaps there is benefit in preparing the transform again each time you prepare the model.
  
  I was then suggesting that perhaps you don’t need to refit the transform and that instead you can use domain knowledge to define the scaling coefficients once and re-use them throughout the use of the model. Perhaps that is too advanced for now.
  
  Sorry, I cannot give good comments on your project, I have not seen it and don’t have the capacity to review it.
  
  Reply
  - Kingsley Udeh August 22, 2019 at 11:00 am #
    
    Thanks so much for responding.
    
    This is the way I currently implement the scaling procedures:
    
    1. I divided the entire dataset into training set and test set
    2. I used fit_transform() on the training set
    3. I applied the transform() on the test set
    4. Since I used Walk Forward Validation(WFV) strategy, I fit my model on the training set, and make predictions on the first batch of my transformed test.
    5. Collect the predictions and refit the model on the actual transformed test set, and so forth, until the end of the test set.
    6. Calculated RMSE on the predicted data, and results look great
    
    Final question:
    Is there need for me to use fit_transform() on each batch of the transformed test set before refitting the model on them? This is currently very challenging for me to achieve using WFV.
    
    Reply
    - Jason Brownlee August 22, 2019 at 1:59 pm #
      
      Seems reasonable.
      
      You could refit the transform on the updated training set during each step of the walk-forward validation as new data is added to “train”.
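
Refitting the transform at each walk-forward step could be sketched like this. It is a simplified illustration for a single variable with min-max coefficients; the function names are assumptions, and model fitting and prediction are deliberately left as a comment.

```python
def fit_min_max(values):
    """Return (min, max) of the values seen so far."""
    return min(values), max(values)


def walk_forward_scalers(train, test):
    """Refit the scaling coefficients before each walk-forward step.

    Returns the (min, max) pair that would be in force when each test
    observation is predicted. Model fitting and prediction are left out;
    they would happen where the comment below indicates.
    """
    history = list(train)
    coefficients = []
    for observation in test:
        lo, hi = fit_min_max(history)     # fit on past data only
        coefficients.append((lo, hi))
        # ... scale history with (lo, hi), fit the model, predict ...
        history.append(observation)       # the true value joins "train"
    return coefficients


coeffs = walk_forward_scalers(train=[3.0, 5.0, 4.0], test=[9.0, 1.0, 6.0])
```

Because the coefficients are always computed from observations that precede the one being predicted, no information from the future leaks into the scaling.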
      
      Reply
Ahmad August 22, 2019 at 2:50 am #

Hi
I am using the day number, and the hour of day as inputs to this model. As these values are discrete, I am not sure if I can follow exactly the same approach as you have used or not. Would you please let me know that what should I do to these values to use them in this approach?

Thank you

Reply
- Jason Brownlee August 22, 2019 at 6:32 am #
  
  Typically we discard the date information and model the variable directly.
  
  Reply
ola August 24, 2019 at 12:10 am #

Hi,
I was wondering if you also made an example for this case:

“Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.”

That would be very useful!

Best regards,

Reply
- Kingsley Udeh August 24, 2019 at 7:06 am #
  
  Hi Ola,
  
  I think the current framing of the problem(tutorial) addresses your question.
  
  You are ideally saying predicting the pollution for the next hour given weather conditions for the next hour, also taking into account pollution up to the current hour or lagged values.
  
  Let me know your thoughts.
  
  Reply
- Jason Brownlee August 24, 2019 at 7:53 am #
  
  I believe you could easily adapt the example for this case.
  
  Reply
Ganesh August 28, 2019 at 10:22 pm #

Hi Jason,

The post looks great, but when you actually train it says 15 features (i.e. t-1 and t), which include the pollution (var1(t-1)) as well. How could it show you 8 features in the 3D array, with var1(t-1) also part of the test?

Do we need to include pollution (var1(t-1)) in the train and test?

Reply
- Jason Brownlee August 29, 2019 at 6:10 am #
  
  Sorry, I don’t follow? What do you mean exactly?
  
  Reply
Shanavaz September 4, 2019 at 9:29 am #

Hi Jason, Thanks for the tutorial. I adapted the code to my data. The training and test was good enough. Then i tried to predict for a new data set.
The training and test was done with 14 variables. Then when i try to predict i used a data set with 12 variables, (obviously i do not have the output variables which were earlier present in the training set) When i try to predict, it throws an error stating that it was expecting 14 variables instead of 12 variables. Logically i cannot provide the output variable while predicting also right? if i know those future values why should i even predict…

What am i missing?
I guess i am doing something wrong here…
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

train_X.shape[1] – 14
test_X.shape[1] -12

this is causing the issue when i run yhat = model.predict(test_X)

Please help as it is kind of urgent….

Reply
- Jason Brownlee September 4, 2019 at 1:38 pm #
  
  Yes, you must frame the problem and train the model in the same way that you intend to use it for prediction.
  
  If you only have 12 variables when making a prediction, then the model should be trained to expect 12 variables as input.
  
  Reply
Jason Lee September 5, 2019 at 9:33 pm #

Hi Jason Brownlee,

Please check! Major discovery, I think I found a big problem in your example??

Seems like the result is shifted +1 if you plot and look (and I couldn’t explain why it should shift):

y_tes = pd.DataFrame({'y_test': inv_y, 'y_pred': inv_yhat})
y_tes.plot(figsize=(15,7), xlim=(None,180))

And when you shift it back, the plot looks much better and the RMSE = 4.321964

y_tes['y_pred'] = y_tes['y_pred'].shift(-1)
y_tes.plot(figsize=(15,7), xlim=(None,180))
y_tes.dropna(inplace=True)
np.sqrt(mean_squared_error(y_tes.y_test, y_tes.y_pred))

Reply
- Jason Brownlee September 6, 2019 at 4:58 am #
  
  It suggests the model is poor and has learned a persistence forecast:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
  
  Reply
Jason Lee September 5, 2019 at 9:34 pm #

Oh I saw few people commented on the same thing, let me check

Reply
Peter September 6, 2019 at 6:32 am #

Hello Jason, thank you for the post. I have a univariate problem and my goal is to predict x_t on a combination of consecutive lags and non consecutive lags after that. For example, I want to predict x_t using x_t-1, x_t-2, x_t-3, x_t-24, x_t-168 (the last few hours, yesterday’s same hour, last week’s same hour). In your opinion, how is the best way to represent this data as input? Thanks

Reply
- Jason Brownlee September 6, 2019 at 1:55 pm #
  
  I would encourage you to explore multiple different framings of the problem in order to discover what works well/best for your specific dataset.
  
  This framework may help:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
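
One possible framing of the lag selection described in the question is sketched below: each input row holds only the chosen lagged values. This is an illustrative assumption rather than a recommended design, and `frame_with_lags` is a hypothetical helper.

```python
def frame_with_lags(series, lags):
    """Build (X, y) pairs where each input row holds the chosen lagged values.

    lags is a list of positive offsets, e.g. [1, 2, 3, 24, 168].
    Rows start at the first index where every lag is available.
    """
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in lags])
        y.append(series[t])
    return X, y


# Toy hourly series 0, 1, 2, ... so each value equals its own index.
series = list(range(200))
X, y = frame_with_lags(series, lags=[1, 2, 3, 24, 168])
```

The resulting rows could be fed to an MLP as five flat features, or reshaped so that each lag is treated as a time step for an LSTM; which works better would need to be tested on the specific dataset.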
  
  Reply
Rohith September 10, 2019 at 1:20 am #

Hello Jason,
I want to forecast 7 days ahead. How do I convert the time series to supervised learning and split it into train and test datasets? I need a prediction for 7 days; kindly send me code for this.

Reply
- Jason Brownlee September 10, 2019 at 5:51 am #
  
  I give examples. Perhaps start with these simpler posts:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Sindi September 14, 2019 at 7:17 am #

Hi Jason
Under section ” Define and Fit model ”

model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))

Please explain why you use 1 and 2 for train_X.shape

Reply
- Jason Brownlee September 15, 2019 at 6:13 am #
  
  To specify the number of time steps per sample and the number of features per time step.
  
  For more explanation of these concepts, see this:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
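
To make the indices concrete, the sketch below regroups flat rows into the [samples, timesteps, features] layout using plain nested lists (standing in for the array reshape in the tutorial) and reads off the three dimensions. The helper names and toy data are assumptions for illustration.

```python
def to_3d(rows, n_timesteps, n_features):
    """Regroup flat rows into [samples][timesteps][features]."""
    out = []
    for row in rows:
        assert len(row) == n_timesteps * n_features
        out.append([row[i * n_features:(i + 1) * n_features]
                    for i in range(n_timesteps)])
    return out


def shape(nested):
    """Return (samples, timesteps, features) of a 3D nested list."""
    return len(nested), len(nested[0]), len(nested[0][0])


# Two samples, each a flat row of 3 time steps x 2 features.
flat = [[1, 2, 3, 4, 5, 6],
        [7, 8, 9, 10, 11, 12]]
cube = to_3d(flat, n_timesteps=3, n_features=2)
samples, timesteps, features = shape(cube)
input_shape = (timesteps, features)   # what shape[1] and shape[2] supply
```

Index 0 (the number of samples) is left out of input_shape because the model accepts any number of samples.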
  
  Reply
Sindi September 14, 2019 at 10:04 am #

Hi again Jason, I am running this code using my data, which is in 10-minute intervals instead of 1 hour, and I used 5 features instead of 8.
my code is as follows which shows where I modified using my data:

# specify the number of lag hours
n_hours = 6
n_features = 5
# frame as supervised learning
reframed = series_to_supervised(scaled, n_hours, 1)
print(reframed.shape)

# split into train and test sets
values = reframed.values
n_train_hours = 584*144
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]

# split into input and outputs
n_obs = n_hours * n_features
train_X, train_y = train[:, :n_obs], train[:, -n_features]
test_X, test_y = test[:, :n_obs], test[:, -n_features]
print(train_X.shape, len(train_X), train_y.shape)

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], n_hours, n_features))
test_X = test_X.reshape((test_X.shape[0], n_hours, n_features))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

running the above lines i get the following shapes :
(84096, 30) 84096 (84096,)
(84096, 6, 5) (84096,) (21306, 6, 5) (21306,)

I get error when I run the lines below:

from math import sqrt
# make a prediction
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], n_hours*n_features))

# invert scaling for forecast
inv_yhat = np.concatenate((yhat, test_X[:, -4:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = np.concatenate((test_y, test_X[:, -4:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)

ValueError: Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (21306, 30)

I cant figure out where the error is.

Reply
- Jason Brownlee September 15, 2019 at 6:15 am #
  
  The error suggests that there is a mismatch between your loaded data and the expectations of the model.
  
  You can change the model or change the data. This may help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
  - Abhi February 29, 2020 at 3:31 am #
    
    I used your data and got this error too. If I am not wrong, it is at this line:
    model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
    
    We are adding a 2D array and expecting a 3D shape. Again, I am not sure.
    
    Reply
Nischay September 17, 2019 at 7:44 pm #

Thanks for the code.
But I have a slight problem: the code only works for prediction and not for forecasting future dates given only the 7 features; the values of pollution are not being forecast. How do I forecast the values for pollution given the date and 7 features?

Reply
- Jason Brownlee September 18, 2019 at 6:00 am #
  
  Prediction is forecasting the future.
  
  Perhaps this will help:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Shekhar P September 19, 2019 at 5:08 pm #

Hello Sir, Thanks for such an helpful tutorial.

I used this code above for forecasting Electrical load. In multivariate, I have parameters like: Load, Rainfall, Temp, HetIndex, WindChill, festival Index. But my results with Univariate and multivariate are almost same. Why So? Why my effect of Rainfall not getting incorporated in model?
1) Please guide me for MVInput and
2) Predict the pollution for the next hour as above and given the “expected” weather conditions
for the next hour.

Reply
- Jason Brownlee September 20, 2019 at 5:35 am #
  
  Perhaps the additional variates are not predictive of your target?
  Perhaps you need to tune the model?
  Perhaps you need to try alternate models?
  …
  
  Reply
  - Shekhar P September 21, 2019 at 5:24 pm #
    
    Thanks Sir..Trying to figure out.
    
    Reply
Ragul Kesavan S September 22, 2019 at 1:40 am #

Hello Sir, Thanks for such an helpful tutorial.
I am looking to apply multivariate spatial temporal model to predict pollution parameters at different locations .How should I build my model with RNN and LSTM.

Reply
- Jason Brownlee September 22, 2019 at 9:34 am #
  
  Perhaps try a CNN-LSTM or ConvLSTM?
  
  This is a good place to start:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Vee87 September 23, 2019 at 11:18 pm #

Hello, can you tell how to get the accuracy of the trained LSTM instead of the RMSE value?

Reply
- Jason Brownlee September 24, 2019 at 7:45 am #
  
  You cannot calculate accuracy for regression, learn more here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-calculate-accuracy-for-regression
  
  Reply
  - Vee87 September 24, 2019 at 7:59 pm #
    
    Thank you so much!
    also i referred to the tutorial https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/ on making predictions of the LSTM model we saved.
    
    So according to this example, if I want to predict one step ahead then I should give input data of a previous step. Am I right? Can I adjust the model so that I could make the rest of the parameters (dew point, temperature, pressure, wind direction, wind speed) inputs to the system and the ‘pollution’ the output, which I can predict for a number of days ahead?
    
    # split into input and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    
    can we adjust it through this line?
    
    Reply
    - Jason Brownlee September 25, 2019 at 5:57 am #
      
      The input to your model will be whatever you have defined the model to expect as input.
      
      If you train the model to expect 7 days of input, you must provide 7 days of input to make a one step prediction.
      
      Reply
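
To make the point in the thread above concrete: a regression forecast is scored with an error measure rather than accuracy. The sketch below computes RMSE and MAE with NumPy. The helper names `rmse` and `mae` and the sample values are illustrative assumptions, not part of the tutorial code.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error, in the same units as the target."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# made-up values purely for illustration
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 12.0]
print('RMSE: %.3f' % rmse(actual, predicted))
print('MAE: %.3f' % mae(actual, predicted))
```

Both scores are easiest to interpret when compared against the same scores for a naive forecast on the same data.
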
Sam07 September 24, 2019 at 4:01 pm #

Hi Jason, I am new to time series.
I have a dataframe with columns like storeid, temp, brand, and category, and I want to forecast its sales.
Here category and brand are categorical, so I encoded them as numeric and preprocessed the data. The dates range from Jan to Apr and I want to forecast for May.
In this blog the code is written for hourly data, but my data is daily.

Also, when inverting the scaling of the forecast, how do I change this part of the code? No explanation is given for it (maybe I did not notice it).

inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]

thanks for your help.

Reply
- Jason Brownlee September 25, 2019 at 5:51 am #
  
  Good question, I recommend starting with a linear model and working your way up to more advanced models.
  
  This framework will help:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  See the tutorials here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Also, this will help for multi-product models (replace site with product):
  https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
  
  Reply
Shekhar P September 28, 2019 at 7:04 pm #

Hello Dr.,

Can you direct me for below variation:

I want to predict the pollution (or any dependent variable) for the next hour given the “expected” weather conditions for the next hour.

It would be a great help if you could.

Reply
- Jason Brownlee September 29, 2019 at 6:09 am #
  
  Yes, the expected weather conditions would be another input variable with the other variables.
  
  Does that help?
  
  Reply
  - Shekhar P October 1, 2019 at 8:32 pm #
    
    Thanks For the reply.
    But my Inputs are: load, Temp, Rainfall, HeatIndex.
    
    Now shall I add like: load, Temp, Rainfall, HeatIndex, ExpTemp, ExpRainfall, ExpHeatIndex
    
    But then how would the data preparation work for the historic data?
    I mean, do I have to add expected values of the weather variables for all past days?
    Please elaborate.
    
    Thanks in advance
    
    Reply
    - Jason Brownlee October 2, 2019 at 7:57 am #
      
      Yes, you must train the model in the same way you intend to use it – same inputs.
      
      Reply
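
A minimal sketch of the data preparation discussed in the thread above, assuming pandas and assuming that no archived weather forecasts exist, so the weather actually observed on the next day stands in for the "expected" value in the historic rows. The column names follow the comment; the values and the `load_next` target name are made up.

```python
import pandas as pd

# hypothetical daily data; values are made up for illustration
df = pd.DataFrame({
    'load': [100.0, 110.0, 105.0, 120.0],
    'Temp': [30.0, 31.0, 29.0, 33.0],
    'Rainfall': [0.0, 5.0, 2.0, 0.0],
})

# for history, use next-day observed weather as the "expected" weather (an assumption)
for col in ['Temp', 'Rainfall']:
    df['Exp' + col] = df[col].shift(-1)

# target: the load on the next day
df['load_next'] = df['load'].shift(-1)

# the last row has no next day, so it cannot be used for training
train = df.dropna()
print(train)
```

If real forecasts were stored at the time, using those instead would match how the model is used in practice more closely, since forecast weather is noisier than observed weather.
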
Radhouane Baba October 1, 2019 at 3:14 am #

Hi Jason,

a very simple question:
how can the model know whether tomorrow is a holiday if we feed it an input that does not contain this information?

Should I then shift the features n_output steps backward so that the model can recognize what day tomorrow is?

Otherwise the model cannot know that tomorrow is a holiday or a “special day”!

Thank you so much

Reply
- Jason Brownlee October 1, 2019 at 6:59 am #
  
  If you have additional information, perhaps provide it to the model to see if it improves skill?
  
  Reply
  - Radhouane Baba October 1, 2019 at 10:47 am #
    
    So how should I provide this data?
    
    My idea is, for example, to provide the features shifted up by n_output steps,
    
    so that the model sees them in the input and knows whether tomorrow is a special day or not.
    
    Might it work?
    
    Reply
    - Jason Brownlee October 1, 2019 at 2:18 pm #
      
      Perhaps a boolean variable, e.g. a flag or integer.
      
      Reply
      - Shekhar P October 1, 2019 at 8:51 pm #
        
        Hi Dr. Can we shift those independent variables one day earlier?
        That is, if the holiday is on 25 July, we can flag it one day before, on 24 July, in the data; then the model will change the dependent variable accordingly.
        
        I think this is the right way. Please check and reply.
      - Jason Brownlee October 2, 2019 at 7:58 am #
        
        I see, good question.
        
        You could provide information about the prediction interval as a separate input series, or a separate input to the model. Perhaps try a few framings and see what works best.
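
One way to sketch the flag idea from this thread, assuming pandas and a known holiday calendar: each row gets a 0/1 column saying whether the day being predicted (the next day) is a holiday. The dates, load values, holiday list and the column name `holiday_next_day` are all made up for illustration.

```python
import pandas as pd

# hypothetical daily series; the dates, values and holiday list are made up
index = pd.date_range('2019-07-22', periods=5, freq='D')
df = pd.DataFrame({'load': [100.0, 102.0, 98.0, 70.0, 101.0]}, index=index)
holidays = pd.to_datetime(['2019-07-25'])

# 1 if the day being predicted (the next day) is a holiday, else 0
next_day = df.index + pd.Timedelta(days=1)
df['holiday_next_day'] = next_day.isin(holidays).astype(int)
print(df)
```

Because the flag comes from the calendar rather than from shifting a column, the final row also gets a value, so the same column can be built for the period being forecast.
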
Sam October 2, 2019 at 11:45 am #

Hello Doctor Jason. If you save a model using model.save(), can you use it later just to predict?

I tried it, but my script always starts training the model again (based on the number of epochs set). Is this normal?

I thought it would just predict immediately when given an input. Thanks.

Reply
- Jason Brownlee October 2, 2019 at 2:13 pm #
  
  Yes, you can load it later and use it to predict.
  
  Here’s an example:
  https://machinelearningmastery.com/save-load-keras-deep-learning-models/
  
  Reply
  - zafer kovancı June 7, 2021 at 8:06 am #
    
    Hello Jason, I have applied your code to my LSTM time series prediction model, which is very close to yours. When I try to save the model, it gives this error:
    
    NotImplementedError: Layer ModuleWrapper has arguments in __init__ and therefore must override get_config.
    
    # design network
    regressor = Sequential()
    
    regressor.add(LSTM(units=32, activation='tanh', return_sequences=True, input_shape=(train_X.shape[1], train_X.shape[2])))
    regressor.add(Dropout(0.2))
    
    regressor.add(LSTM(units=32, return_sequences=True, activation='tanh'))
    regressor.add(Dropout(0.2))
    
    regressor.add(Dense(units=1, activation='sigmoid'))
    
    regressor.compile(loss='binary_crossentropy', optimizer='RMSprop', metrics=['accuracy'])
    # fit network
    history = regressor.fit(train_X, train_y, epochs=55, batch_size=4, validation_data=(test_X, test_y), verbose=1, shuffle=True, callbacks=[lr_sched])
    
    Reply
    - Jason Brownlee June 8, 2021 at 7:08 am #
      
      Perhaps you need to update your version of Keras and TensorFlow?
      
      Reply
Williams October 3, 2019 at 11:46 pm #

Hello Doctor Jason. Thanks for this amazing tutorial. Quick question: this tutorial predicts just the next step. Can I make it predict more than one step, for instance the next 4 steps?

If so, what changes do I have to make to the current model? Thanks in advance for your response.

Reply
- Jason Brownlee October 4, 2019 at 5:42 am #
  
  Yes, I have many examples. Perhaps start here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
- zafer kovancı June 7, 2021 at 8:32 am #
  
  Sorry Jason, I had imported the wrong packages. It is resolved now.
  
  Reply
  - Jason Brownlee June 8, 2021 at 7:09 am #
    
    I’m happy to hear that!
    
    Reply
Williams October 10, 2019 at 3:26 am #

Hello Doctor. The original dataset you referred to does not have a ‘Pollution’ column, nor does the one linked on GitHub. How come the ‘Pollution’ column is used in your example? If it was generated, how was that done?

I want to do something similar with my dataset so I want to follow this example closely. Thanks

Reply
- Jason Brownlee October 10, 2019 at 7:02 am #
  
  As mentioned in the tutorial, “pm2.5” is the pollution column.
  
  Reply
Ivan October 11, 2019 at 1:25 am #

Hi, I have a question on your use of the LabelEncoder() on the variable ‘cbwd’ (Combined Wind Direction). What puzzles me is: why label encoding? In this way you are turning ‘cbwd’ into an ordinal variable. Is it realistic to assume so? Why should a given direction have a value “greater than” another direction? Thank you, and thanks also for this tutorial.

Reply
- Jason Brownlee October 11, 2019 at 6:24 am #
  
  I did it for simplicity of the tutorial.
  
  A one hot encoding would be better.
  
  Try it and compare performance. I’m not convinced the variable adds value.
  
  Reply
  - Mustafa Nadeem October 19, 2019 at 4:04 am #
    
    How can we predict in one hot encoding ?
    
    Reply
    - Jason Brownlee October 19, 2019 at 6:50 am #
      
      Perhaps you can summarize the problem you are having exactly?
      
      Reply
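
A small sketch of the one hot encoding suggested in the thread above, assuming pandas. The pollution values are made up, and the wind direction labels are assumed to look like those in the dataset's 'cbwd' column.

```python
import pandas as pd

# a few made-up rows; 'cbwd' holds wind direction labels (assumed values)
df = pd.DataFrame({
    'pollution': [10.0, 20.0, 30.0, 40.0],
    'cbwd': ['SE', 'NW', 'cv', 'NE'],
})

# replace the single categorical column with one 0/1 column per direction
encoded = pd.get_dummies(df, columns=['cbwd'], dtype=float)
print(encoded)
```

With four directions the single wind column becomes four columns, so the number of input features grows by three. Any hard-coded column indices used later, such as the columns dropped after reframing or the slice used when inverting the scaling, would need to be adjusted to the new width.
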
Tom October 11, 2019 at 8:18 pm #

Jason,

First of all thank you for sharing your knowledge through this great website. Like many others I really appreciate your input in various machine learning topics.

About this particular post.
In general I understand what you are doing and with minor difficulties I can follow. Currently I’m working on something that fits very well with the topic you gave as development of this post problem, which is:

“Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.”

I’ve read all of your relevant answers on this question. Yet I still can’t figure out how to correctly prepare my input data for the LSTM.
In my case I have data with 5 columns, where the first 4 columns are the features (Xs) and the 5th column is my result value (Y).
Example below

Timestamp            Power  WG   Res  Cn   Yvalue
2019-10-01 09:00:00  1000   100  23   432  87
2019-10-01 10:00:00  1100   88   22   378  82
2019-10-01 11:00:00  1088   123  15   409  89
2019-10-01 12:00:00  1034   134  17   411  83
2019-10-01 13:00:00  1090   111  14   392  81

My dataset consists of 3 years of historic data with an hourly timestep. I would like to build a model to predict the next 8 hours of Y, given the exact values of all 4 features for those hours. So basically I know what my Power, WG, Res and Cn values are for t, t+1, and so on, and I want to predict the Yvalue.
Now I am stuck on preparing my data, because my dataframe is missing only the Y values for the next 8 hours (which I want to predict). Should I use only the 4 feature columns shifted by 8 hours as input to the LSTM and the Y column as the target?

Any thoughts or comments will be appreciated. I’ve read many posts of yours but can’t figure out the right answer for my problem.

Reply
- Jason Brownlee October 12, 2019 at 6:56 am #
  
  Thanks Tom.
  
  Great question, there’s no best answer.
  
  You can provide all vars up to t as input to predict t+1, that is straightforward. You can provide the t+1 inputs alongside the other inputs, but they will not match up in terms of time steps. Try it anyway and compare results to not including them at all.
  
  Also, you can try a multi-headed LSTM model, one with the vars up to t, and ones with inputs t+1, …, then use a concat layer to combine.
  
  Does that help?
  
  Perhaps I need to write a tutorial on this topic…?
  
  Reply
  - Tom October 12, 2019 at 4:35 pm #
    
    Thanks for your fast response.
    
    In carrying out my problem I will start with this “basic” model where all data up to t will be input. Then I will use my t+1, t+2..,t+8 data as input in:
    
    model.predict(input[t+1..t+8]). I would rather avoid providing t+1 as an input too, because of the difficulty of matching up the correct values.
    
    To be honest, I doubt that I could create multi-headed LSTM model with my current level of experience.
    
    Thank you for your input 🙂
    
    If you decide to write tutorial on this topic I believe that many of your readers will benefit from such a post.
    
    Anyway, your website is quite high in Google search position (on phrase “machine learning”). Hopefully it will reach top 3 someday.
    
    Reply
    - Jason Brownlee October 13, 2019 at 8:28 am #
      
      Sounds like a good start.
      
      Yes, I’ll whip something up and compare a few approaches.
      
      Thanks.
      
      Reply
      - Tom October 15, 2019 at 6:20 pm #
        
        Jason you wrote previously:
        
        “Also, you can try a multi-headed LSTM model, one with the vars up to t, and ones with inputs t+1, …, then use a concat layer to combine.”
        
        Could you elaborate how to set this in model or which tutorial of yours cover this?
        
        One more thing. Correct me if I’m wrong.
        
        Based on your tutorials I prepared an LSTM model. I used all of my data up to t as my inputs (4 features) and the Ys as the target. Of course I divided it into train and test (70/30%).
        
        And now I want to use the last 8 rows of data as input to model.predict(input…). I assume I can use an 8×4 matrix (8 timesteps with 4 features) directly as input and expect an 8×1 output.
        
        Why I ask this question in the first place:
        To prepare the data for train and test I mostly used your code, with the function to_supervised(), which creates a lot of additional columns. However, it seems to me that the last step, prediction, could be achieved without applying this function to the data I want to predict on. I must admit that I realize it is a very basic question, but the more I read about ML the more I feel like I’m on a rollercoaster.
      - Jason Brownlee October 16, 2019 at 8:01 am #
        
        Yes, see examples of multi-input models here:
        https://machinelearningmastery.com/keras-functional-api-deep-learning/
        
        Not sure I follow the question. Perhaps try it and see?
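
For the framing discussed in this thread (history up to t plus known feature values for the hours being forecast), the data preparation could be sketched as below with NumPy. The function name `make_samples`, the 24-hour history length and the random data are assumptions for illustration; only the 4 features and the 8-hour horizon come from the comment.

```python
import numpy as np

def make_samples(features, target, n_history, n_future):
    """Build (history, future covariates, future targets) samples.

    features: array of shape (n_rows, n_features), known for every row
    target:   array of shape (n_rows,)
    """
    hist, fut, y = [], [], []
    for t in range(n_history, len(features) - n_future + 1):
        # past features and past target, rows t-n_history .. t-1
        past = np.column_stack((features[t - n_history:t], target[t - n_history:t]))
        hist.append(past)
        # known feature values for the rows being forecast, t .. t+n_future-1
        fut.append(features[t:t + n_future])
        # the target values to predict over the same rows
        y.append(target[t:t + n_future])
    return np.array(hist), np.array(fut), np.array(y)

# made-up data: 100 hourly rows, 4 features (standing in for Power, WG, Res, Cn)
rng = np.random.default_rng(1)
features = rng.random((100, 4))
target = rng.random(100)
X_hist, X_fut, y = make_samples(features, target, n_history=24, n_future=8)
print(X_hist.shape, X_fut.shape, y.shape)
```

The two input arrays line up sample by sample, so they can feed the two heads of a multi-input model, or `X_fut` can be ignored to get the simpler history-only baseline for comparison.
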
Ali October 15, 2019 at 11:27 pm #

Hi Jason,

I have the following multivariate multi-step demand forecasting problem. I am supposed to forecast the demand (quantity) for products from the assortment. I have data from several warehouses for the last few years. Can you give me any hints regarding the shape of the input?

I would like to start with an LSTM for a single product. Let’s say I have data for the past 3 years for 2 warehouses. I was thinking of using two years for training and one year for testing. As for the forecast, I thought about making a prediction for the next 7 days based on the data from past month. Can you help me with framing of this problem? I am quite lost.

Reply
- Jason Brownlee October 16, 2019 at 8:05 am #
  
  Yes, see this:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Yes, also see this:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
Ali October 16, 2019 at 2:02 am #

One more question. I am supposed to make demand forecasts for different products, but they are all connected to the same variable (quantity). Would you describe this as a multivariate or univariate problem?

Reply
- Jason Brownlee October 16, 2019 at 8:10 am #
  
  This will help you answer the question:
  https://machinelearningmastery.com/taxonomy-of-time-series-forecasting-problems/
  
  Reply
Mustafa Nadeem October 18, 2019 at 11:18 pm #

Great Work Sir.
I have a predictive maintenance problem in which I am predicting the error. It is a classification problem.
I have data with errorID (the target variable) having 18 codes. There are 4 inputs (JobID, EmployeeID, MachineID, Speed). The data is not correlated in any way. I have to predict the future errorID as a time series.
Please suggest an approach, sir.

Reply
- Jason Brownlee October 19, 2019 at 6:39 am #
  
  Perhaps try modeling it as a time series classification task?
  
  The tutorials here might help as a first step:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Behrouz October 21, 2019 at 4:41 pm #

Hi Jason,
Thank you so much for such a great source. It’s wonderful.
May I ask a question about ‘validation’ and ‘test’ in the code?
I noticed that the validation part of the dataset is used for testing later. Does this cause overfitting?
The RMSE that I get is very good, but I believe it is because the test data is used for validation earlier.
Thanks again.
Cheers,
Behrouz

Reply
- Jason Brownlee October 22, 2019 at 5:41 am #
  
  I recommend using a separate validation set.
  
  I use test for validation for brevity.
  
  Reply
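
A sketch of the separate validation set recommended in the reply above, assuming NumPy and a split in time order with no shuffling. The helper name `chronological_split`, the stand-in array and the one-year block sizes are assumptions for illustration.

```python
import numpy as np

def chronological_split(values, n_train, n_val):
    """Split rows in time order into train, validation and test blocks."""
    train = values[:n_train]
    val = values[n_train:n_train + n_val]
    test = values[n_train + n_val:]
    return train, val, test

# stand-in for a reframed hourly dataset: 5 years of rows, 9 columns
values = np.arange(5 * 365 * 24 * 9, dtype=float).reshape(-1, 9)
n_year = 365 * 24
train, val, test = chronological_split(values, n_train=n_year, n_val=n_year)
print(train.shape, val.shape, test.shape)
```

The validation block would then be passed as `validation_data` when fitting, and the test block would be kept aside and used once for the final RMSE.
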
Corey October 25, 2019 at 1:07 am #

This may be a silly question, but I’m failing to understand how this predicts the next value. When I run your code verbatim, the yhat output seems close to the t-1 value of the test data, which was part of the model’s input.

e.g.

t-1 of pollution is 0.0362173, actual 0.0311871 predicted output 0.0346678
next row then
t-1 of pollution is 0.0311871 actual 0.0201207 predicted output 0.0312007

and this trend continues, am i missing something or is the output of the prediction pretty much the same as the input value?

Reply
- Jason Brownlee October 25, 2019 at 6:46 am #
  
  Yes, the model is not great – it learned a persistence model.
  
  Reply
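
To check whether a model has done better than persistence, its error can be compared with the error of the naive forecast yhat(t) = obs(t-1). The sketch below does this with NumPy on the two scaled rows quoted in the comment above; the helper name `rmse` is an assumption, and two rows are far too few to draw a conclusion, so in practice the comparison would be run on the whole test set.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# the two scaled rows quoted in the comment above
prev = np.array([0.0362173, 0.0311871])    # pollution at t-1
actual = np.array([0.0311871, 0.0201207])  # pollution at t
yhat = np.array([0.0346678, 0.0312007])    # model output

model_rmse = rmse(actual, yhat)
persistence_rmse = rmse(actual, prev)  # naive forecast: yhat(t) = obs(t-1)
print('model: %.5f, persistence: %.5f' % (model_rmse, persistence_rmse))
```

If the two scores are close on the full test set, the model has learned little beyond copying the last observation.
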
Felipe October 29, 2019 at 4:54 am #

Hi Jason. Your example is very interesting! Thank you for sharing.

Can you give me a tip?

I used the same example for prediction on my dataset.
I only changed the dataset.

The resulting RMSE is very high! About 50,000.

I have about 870 samples, where 600 samples are used for training and the remainder for validation.

I’ve tested with 1 and 2 variables for input.

What could be wrong? Any idea?

Thanks!

Reply
- Jason Brownlee October 29, 2019 at 5:33 am #
  
  You may need to prepare the dataset for modeling and tune the model to your dataset.
  
  Reply
Felipe October 29, 2019 at 6:09 am #

Hi Jason, thanks for your reply.

I used MinMaxScaler to normalize the features and two LSTM layers (with 100 units each) to create the model.

Could you give me please any other suggestion?

Do you think that a CNN-LSTM Encoder-Decoder model could improve the results?

Thanks.

Reply
- Jason Brownlee October 29, 2019 at 1:46 pm #
  
  I recommend testing a suite of data preparation methods, framings of the problem and different models.
  
  Also, see the tips here for getting the most out of a given model:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
D. James October 29, 2019 at 6:21 am #

Hi Jason. Thanks for this tutorial. I’m trying to do something similar to your multiple lag timesteps example above, except I want to predict pollution in the next hour given past observations as well as the expected weather conditions in the next hour. I’m not sure how to include the future weather conditions as features. At that timestep, there will be (t-1) features because pollution is what we’re trying to predict and is therefore not included as a feature. How would you go about doing this? Thank you!

Reply
- Jason Brownlee October 29, 2019 at 1:47 pm #
  
  The observations for the future could be provided alongside the history inputs or as part of a second input to the model, e.g. a multi-input model:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
  - D. James October 31, 2019 at 2:28 am #
    
    Thank you!
    
    Reply
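
A minimal sketch of the multi-input idea from the reply above, assuming the Keras functional API as shipped in `tensorflow.keras`. One input carries the lagged history, the other carries the expected weather for the next hour, and the two branches are concatenated before the output layer. The layer sizes, the 3-hour lag and the random data are arbitrary assumptions, not the tutorial's configuration.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense, concatenate
from tensorflow.keras.models import Model

n_lag, n_features, n_future = 3, 8, 7  # 7 = weather variables without pollution

# branch 1: lagged history as [timesteps, features]
history_in = Input(shape=(n_lag, n_features))
h = LSTM(16)(history_in)

# branch 2: expected weather for the hour being predicted, as a flat vector
future_in = Input(shape=(n_future,))
f = Dense(8, activation='relu')(future_in)

# merge both branches and predict one value
merged = concatenate([h, f])
out = Dense(1)(merged)
model = Model(inputs=[history_in, future_in], outputs=out)
model.compile(loss='mae', optimizer='adam')

# random stand-in data just to show the expected shapes
X_hist = np.random.rand(32, n_lag, n_features).astype('float32')
X_fut = np.random.rand(32, n_future).astype('float32')
y = np.random.rand(32, 1).astype('float32')
model.fit([X_hist, X_fut], y, epochs=1, batch_size=8, verbose=0)
pred = model.predict([X_hist, X_fut], verbose=0)
print(pred.shape)
```

At prediction time both inputs must be supplied in the same order and with the same shapes used for training.
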
Qizal Ashfaq October 30, 2019 at 8:41 am #

from pandas import read_csv
from datetime import datetime
# load data
def parse(x):
    return datetime.strptime(x, '%Y %m %d %H')
dataset = read_csv('GHI_total.csv', parse_dates=[['year', 'month', 'day', 'hour']], index_col=0, date_parser=parse)
# manually specify column names
dataset.columns = ['temp', 'w.s', 'Hum', 'GHI']
dataset.index.name = 'date'
Can you tell me my mistake? It gives this error:
TypeError: parse() takes 1 positional argument but 4 were given

Reply
- Jason Brownlee October 30, 2019 at 1:56 pm #
  
  I have some suggestions here that might help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Qizal Ashfaq November 2, 2019 at 5:18 am #
    
    I have checked multiple times. It always gives an error on this line:
    dataset = read_csv('GHI_total.csv', parse_dates=[['year', 'month', 'day', 'hour']], index_col=0, date_parser=parse)
    TypeError: parse() takes 1 positional argument but 4 were given
    
    Reply
    - Jason Brownlee November 2, 2019 at 6:53 am #
      
      Perhaps try posting your code and error message to stackoverflow?
      
      Reply
      - Qizal Ashfaq November 6, 2019 at 3:27 am #
        
        This problem is solved.
      - Jason Brownlee November 6, 2019 at 6:45 am #
        
        Happy to hear that.
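
The error message in this thread suggests that `parse` was called with one argument per date column rather than with a single joined string. One way to side-step the callback entirely, sketched here under the assumption that the file has `year`, `month`, `day` and `hour` columns, is to load the file as-is and assemble the timestamps with `pandas.to_datetime`. The in-memory CSV and its values are made up to keep the example self-contained.

```python
import io
import pandas as pd

# a tiny in-memory stand-in for GHI_total.csv; the values are made up
csv_text = """year,month,day,hour,temp,w.s,Hum,GHI
2018,1,1,0,5.0,2.1,80,0
2018,1,1,1,4.5,1.9,82,0
2018,1,1,2,4.0,2.4,85,0
"""
dataset = pd.read_csv(io.StringIO(csv_text))

# assemble one timestamp per row from the separate date columns
dates = pd.to_datetime(dataset[['year', 'month', 'day', 'hour']])
dataset.index = pd.DatetimeIndex(dates)
dataset.index.name = 'date'

# the separate date columns are no longer needed
dataset = dataset.drop(columns=['year', 'month', 'day', 'hour'])
print(dataset.head())
```
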
Jiggy October 31, 2019 at 12:04 am #

Hi Jason,

When I use the inverse transform function, I do not get the original data back. Can you tell me why?

Reply
- Jason Brownlee October 31, 2019 at 5:31 am #
  
  Perhaps there is a bug in your implementation?
  
  Reply
Jiggy October 31, 2019 at 6:31 pm #

No, I have two data columns. One column is transformed back to its original values, but the other is not; instead it produces new values.

Reply
- Jason Brownlee November 1, 2019 at 5:27 am #
  
  Perhaps confirm that the data has the same column order when the transform is fit, applied and inverted?
  
  Reply
Abby November 1, 2019 at 8:58 am #

Hi Jason,

I am getting a broadcast error when doing inverse_transform. The array had a different shape when it was scaled (the raw shape), and after concatenating yhat + test_x[:,1:] the shape is different again. Is that the reason for the following error?

ValueError: operands could not be broadcast together with shapes (719,235) (118,) (719,235)

What should I do in order to match the shapes here?

Thank you,
Abby

Reply
- Jason Brownlee November 1, 2019 at 1:41 pm #
  
  Sorry I don’t have the capacity to modify the tutorial to your needs:
  https://machinelearningmastery.com/faq/single-faq/can-you-change-the-code-in-the-tutorial-to-___
  
  Reply
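
The broadcast error in the comment above says that the array passed to `inverse_transform` has 235 columns while the scaler was fit on 118. A general way around this, sketched below with scikit-learn, is to pad the predictions into an array with exactly as many columns as the scaler was fit on, invert, and keep only the target column. The helper name `invert_target`, the small made-up array and the choice of column 0 as the target are assumptions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# made-up data: 6 rows, 4 columns, with the target in column 0
raw = np.array([[10., 1., 5., 100.],
                [20., 2., 6., 110.],
                [30., 3., 7., 120.],
                [40., 4., 8., 130.],
                [50., 5., 9., 140.],
                [60., 6., 10., 150.]])
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(raw)

def invert_target(scaler, yhat, target_col=0):
    """Invert scaling for one column by padding to the width the scaler was fit on."""
    yhat = np.asarray(yhat, dtype=float).reshape(-1)
    n_cols = scaler.scale_.shape[0]
    padded = np.zeros((len(yhat), n_cols))
    padded[:, target_col] = yhat
    return scaler.inverse_transform(padded)[:, target_col]

yhat = scaled[:, 0]  # pretend these are scaled predictions
print(invert_target(scaler, yhat))
```

The padding values in the other columns do not matter, because min-max scaling treats each column independently and only the target column is kept.
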
HAO LIU November 4, 2019 at 4:00 am #

Hi Jason,

Thanks for sharing; this is impressive.
I have been studying time series prediction, but I have a particular problem.

I have different sets of time series data under different conditions. For example, data_A is the potato growth factor over 100 days at 10°C and data_B is the potato growth factor over 100 days at 15°C,
and likewise data_C (20°C) and data_D (25°C).

I know that I can use a multiple regression method to predict the growth factor at these different temperatures (10°C, 15°C, 20°C, 25°C).

But I want to use these data to predict the growth factor at 30°C, which is outside the temperature range.

Are there any methods or algorithms to predict it?

looking forward to your reply.

Best regards
Hao

Reply
- Jason Brownlee November 4, 2019 at 6:49 am #
  
  Yes, you could fit a model to learn the relationship between temp and growth, then plug in new temperature and see the growth.
  
  Typically a linear model is used so you can interpret the coefficients.
  
  Reply
  - HAO LIU November 4, 2019 at 9:05 pm #
    
    Hi Jason,
    
    thanks for your reply!
    
    Can ANOVA be used?
    
    Reply
    - Jason Brownlee November 5, 2019 at 6:52 am #
      
      To explain the observed variance, perhaps.
      
      Reply
  - HAO LIU November 6, 2019 at 1:28 am #
    
    Hi Jason,
    
    I am sorry that I didn’t explain correctly.
    
    The potato was actually placed in a chamber, so the temperature was held constant. Under this condition, we have a time series of potato growth over 100 days.
    Then we changed the chamber temperature and got another set of data.
    
    So the temperature is a preset variable, and the growth is a time series recorded under this preset condition.
    
    In our problem, we want to predict the growth time series at another specific temperature.
    
    Are there any methods available to predict this? Could you suggest some links about this kind of problem?
    
    Thanks in advance!
    
    Best regards
    Hao
    
    Reply
    - Jason Brownlee November 6, 2019 at 6:40 am #
      
      Good question. Without thinking too hard, I think it is not a prediction problem, it is a modeling problem.
      
      Nevertheless, some ideas:
      
      – Try a multistep time series forecasting problem, forecasting size from an initial size and temperature.
      – Try a regression problem predicting final size given initial size and temperature.
      
      Reply
Qizal Ashfaq November 6, 2019 at 3:25 am #

It is very difficult for beginners to understand this. Kindly explain each and every line, please. I want to understand this code but have failed. Kindly help me, please.

Reply
- Jason Brownlee November 6, 2019 at 6:45 am #
  
  Start with a gentle introduction and progression in complexity here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
  - Qizal Ashfaq November 6, 2019 at 10:24 am #
    
    That’s great. Thank you for helping me.
    
    Reply
    - Jason Brownlee November 6, 2019 at 2:16 pm #
      
      You’re welcome.
      
      Reply
Abhilash November 6, 2019 at 6:37 am #

No problem, I understood what was wrong now. Closely looking at outputs at different steps from your example sample case and from the case that I am working on helped me figure out the reason.

Just for reference to somebody who might have a similar problem- here’s what I was missing

I forgot to modify this line of code based on needs of my data.

reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)

Thanks Jason for the tutorial! It’s been very helpful. Do you have any thoughts or references on the theory of RNNs and LSTM RNNs? Also, which other methods would you suggest for carrying out a comparison?

Reply
- Jason Brownlee November 6, 2019 at 6:50 am #
  
  Happy to hear that!
  
  Theory – perhaps the original papers or the deep learning textbook.
  
  Other methods – yes here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Ismet November 7, 2019 at 12:25 am #

Hi Jason,

first of all thank you for this awesome tutorial.
I have a question regarding an important hyperparameter:

Why did you choose 50 as the number of LSTM units in the Keras LSTM layer? Is there a reason for that specific to your data set, or was it arbitrary?
I tried different numbers of units on my own time series data set. With 1 unit I saw a low, smooth val loss converging towards 0, but with 50 units the curve zigzags.

My data set is a csv file with approx. 24k samples (rows), 7 features and 1 label (columns)

It would be awesome if you could give some suggestions.

Best regards from Germany
Ismet

Reply
- Jason Brownlee November 7, 2019 at 6:43 am #
  
  I chose the config after some trial and error.
  
  There are no good theories on how to choose the number of layers or nodes, see this:
  https://machinelearningmastery.com/faq/single-faq/how-many-layers-and-nodes-do-i-need-in-my-neural-network
  
  Reply
Ehsan November 8, 2019 at 2:42 am #

Hi Jason,

Many thanks for writing this.

Assume that we want to predict the value at the current time (t).
Question: if we use the LSTM to benefit from its memory, then why do we provide multiple points from the past (t-1, t-2) as input? My understanding was that only one step of history (only t-1) would be enough. What am I missing here?

Reply
- Jason Brownlee November 8, 2019 at 6:52 am #
  
  We are using an efficient LSTM that takes a vector of inputs and processes them one at a time internally, rather than processing a vector of one element at a time.
  
  Reply
Ryan November 9, 2019 at 11:18 pm #

I ran your code and got a miserable 3.9% validation accuracy. What’s gone wrong? What alternative models would you suggest for multivariate time series forecasting?

Reply
- Jason Brownlee November 10, 2019 at 8:21 am #
  
  You cannot measure accuracy for time series forecasting, instead you measure error.
  
  More here:
  https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
  
  Reply
Sabbir November 14, 2019 at 9:48 am #

Hi, I have a weather dataset with an input shape of (8016, 8) and an output shape of (8016, 4). I am a new learner. I was wondering how I should reshape the input for an LSTM, as I want every output to look back over the previous two weeks of data, that is 336 timesteps.

Reply
- Jason Brownlee November 14, 2019 at 1:44 pm #
  
  This will explain the basics:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
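
For the shapes given in the comment above, the [samples, timesteps, features] layout could be built as sketched below with NumPy's `sliding_window_view`. The random arrays are stand-ins, and the pairing is an assumption: each output row is matched with the 336 input rows that come before it, not including the current row.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n_rows, n_in, n_out, n_steps = 8016, 8, 4, 336
# made-up stand-ins with the shapes given in the comment above
X = np.random.rand(n_rows, n_in)
y = np.random.rand(n_rows, n_out)

# every window of 336 consecutive rows: shape (n_windows, n_in, n_steps)
windows = sliding_window_view(X, n_steps, axis=0)
# drop the final window (no row follows it) and reorder to [samples, timesteps, features]
X_lstm = windows[:-1].transpose(0, 2, 1)
# each window is paired with the output row that immediately follows it
y_lstm = y[n_steps:]
print(X_lstm.shape, y_lstm.shape)
```

The result is a read-only view onto `X`, which keeps memory use low; `np.ascontiguousarray(X_lstm)` would make an independent copy if one is needed, at the cost of holding all overlapping windows in memory.
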
ptk November 16, 2019 at 1:24 pm #

Hello,

Thank you for the very nice tutorial.

I see this error while running this code. Could you please help me figure out what’s wrong.

ValueError Traceback (most recent call last)
in
4 # invert scaling for forecast
5 inv_yhat = concatenate((yhat, test_X[:, -7:]), axis=1)
----> 6 inv_yhat = scaler.inverse_transform(inv_yhat)
7 inv_yhat = inv_yhat[:,0]
8 # invert scaling for actual

~/anaconda3/envs/anaconds_python3.6_tf2.0/lib/python3.6/site-packages/sklearn/preprocessing/data.py in inverse_transform(self, X)
404 force_all_finite="allow-nan")
405
--> 406 X -= self.min_
407 X /= self.scale_
408 return X

ValueError: operands could not be broadcast together with shapes (32397,8) (12,) (32397,8)

The only thing I have changed in the given code is

dataset['cbwd'] = encoder.fit_transform(values[:,4]) when encoding the wind direction.

Reply
- Jason Brownlee November 17, 2019 at 7:12 am #
  
  Sorry to hear that, perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
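
The traceback above shows that `inverse_transform` subtracts `min_` and divides by `scale_`, and that the scaler was fit on 12 columns while the concatenated array has 8. If only the target needs to be rescaled, the same arithmetic can be applied to a single column without concatenating anything. This is a sketch with scikit-learn; the helper name `invert_column`, the made-up values and the choice of column 0 as the target are assumptions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# made-up data with the target in column 0
raw = np.array([[129., -16., 4.],
                [148., -15., 3.],
                [159., -11., 2.],
                [181., -7., 1.]])
scaler = MinMaxScaler(feature_range=(0, 1)).fit(raw)
scaled = scaler.transform(raw)

def invert_column(scaler, values, col):
    """Apply the inverse min-max arithmetic to one column only."""
    values = np.asarray(values, dtype=float)
    return (values - scaler.min_[col]) / scaler.scale_[col]

# quick check of how many columns the scaler expects
print('scaler was fit on %d columns' % scaler.scale_.shape[0])
print(invert_column(scaler, scaled[:, 0], col=0))
```

Printing the scaler width next to the shape of the array being inverted is a quick way to find where the two stopped matching.
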
Ayhan Akgun November 18, 2019 at 2:36 am #

Hi Jason,

I need a solution to a multivariate time series forecasting problem. The dataset has a True/False campaign indicator. I want to build a model with this variable and observe roughly how the campaign affects sales.

Which models and approach would you recommend for this problem? Also, which material is suitable for solving multivariate problems and analysing the effects of the variables on the forecast?

You have two materials related to these topics, one focused on deep learning and the other on LSTMs. I am not sure how to approach a solution.

Can you give some advice?

Reply
- Jason Brownlee November 18, 2019 at 6:49 am #
  
  I recommend testing a suite of models in order to discover what works best for your dataset.
  
  It sounds like a time series classification problem, perhaps start here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Bob November 18, 2019 at 10:41 pm #

Thanks for your nice tutorial, Dr. Brownlee! I hope to read a post with a case study comparing LSTM, BP neural networks, SVM, Elman neural networks, etc.

Reply
- Jason Brownlee November 19, 2019 at 7:42 am #
  
  Thanks for the suggestion!
  
  Reply
Qizal November 19, 2019 at 7:35 am #

I have 2018 data available for testing and 2015-2017 data for training. By giving the 2018 data for testing, I want to predict 2019 data. Can this model do this? I am new to LSTMs.

Reply
- Jason Brownlee November 19, 2019 at 7:53 am #
  
  Perhaps test the model and evaluate its performance on your dataset, compare to a naive model and a linear model to see if it has skill?
  
  Reply
  - Qizal November 19, 2019 at 10:31 am #
    
    I am trying to understand what this model is predicting. If I give this model 2018 data for testing, does it predict 2019 data? I am talking about the code you wrote above with 1 time step.
    
    Reply
    - Jason Brownlee November 19, 2019 at 1:59 pm #
      
      Perhaps start with the simpler models here:
      https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
      
      Reply
Anjana November 20, 2019 at 10:15 pm #

Hi Jason,

I have two variables, x1 and x2. I want to use lag 2 values of x1 and lag 3 values of x2 for predicting y. Can you please advise how to prepare the input file?

Reply
- Jason Brownlee November 21, 2019 at 6:05 am #
  
  You can use the function described in this tutorial:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
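
A sketch of per-variable lags with pandas `shift`, for the question above. It assumes "lag 2 values of x1" means the two most recent values of x1 and "lag 3 values of x2" means the three most recent values of x2; the series values and the column naming are made up for illustration.

```python
import pandas as pd

# made-up series purely for illustration
df = pd.DataFrame({
    'x1': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'x2': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    'y': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

n_lags = {'x1': 2, 'x2': 3}  # number of lagged values per variable (assumed reading)
cols = {}
for name, lags in n_lags.items():
    # oldest lag first, e.g. x1(t-2) then x1(t-1)
    for lag in range(lags, 0, -1):
        cols['%s(t-%d)' % (name, lag)] = df[name].shift(lag)
cols['y(t)'] = df['y']

# rows without a full set of lags contain NaN and are dropped
supervised = pd.DataFrame(cols).dropna()
print(supervised)
```

Note that a frame like this has a different number of lags per variable, so it does not reshape directly into [samples, timesteps, features]; it could be fed to a dense model as a flat vector, or each variable could go to its own input of a multi-input model.
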
Harish Chidanandappa November 21, 2019 at 2:19 am #

Hi jason,
I am implementing this model for a different time series prediction, of position.
I have no problem up to the test vs plot graph. Later, when I try to predict and do the inverse transform, I get this error: ValueError: operands could not be broadcast together with shapes (48,9) (5,) (48,9).

Could you help me with this?

Reply
- Jason Brownlee November 21, 2019 at 6:09 am #
  
  Perhaps step through the code and adjust the plot section for your dataset as well?
  
  Reply
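
One plausible reading of the shapes in this error, offered as a guess rather than a diagnosis: the scaler was fit on 5 columns, but the array passed to the inverse transform has 9. The sketch below reproduces the same class of error in plain NumPy by emulating the per-column arithmetic of a min-max inverse transform; `inverse_minmax` and the random data are hypothetical stand-ins, not the tutorial's code.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.random((100, 5))              # hypothetical: statistics learned on 5 columns

# Per-column statistics, as a min-max scaler to [0, 1] would learn them.
col_min = train.min(axis=0)               # shape (5,)
col_range = train.max(axis=0) - col_min   # shape (5,)

def inverse_minmax(scaled):
    """Undo x_scaled = (x - min) / range, column by column."""
    return scaled * col_range + col_min

wide = rng.random((48, 9))                # 9 columns, as in the error message
try:
    inverse_minmax(wide)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# With the same number of columns as at fit time, the inverse works.
restored = inverse_minmax(wide[:, :5])
print(restored.shape)
```

If this is the cause, the fix is to build the array for the inverse transform so that it has exactly the columns, in the same order, that the scaler saw when it was fit.
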
david November 21, 2019 at 5:31 pm #

I am trying to fit an LSTM model for sales volume data for multiple markets, and there are 8000 data points. If I take one market, the number of data points comes down to 156. Should I take the smaller dataset and upsample, or go with the bigger one?

Reply
- Jason Brownlee November 22, 2019 at 5:59 am #
  
  Perhaps explore a few different framings/scales and see what works best for your dataset? Also try mixed approaches with different models?
  
  Reply
Jatin November 21, 2019 at 10:09 pm #

I want to use features from current timestep and previous few timesteps for current y. How to do that?

Reply
- Jason Brownlee November 22, 2019 at 6:04 am #
  
  Great question!
  
  Perhaps a multi-input model, one input for the lag obs, one for the current time obs, then the model merges the inputs and feeds to the rest of the model.
  
  Reply
  - Jatin November 25, 2019 at 6:50 pm #
    
    Thanks Jason. Can you please suggest any tutorial for the same.
    
    Reply
    - Jason Brownlee November 26, 2019 at 6:00 am #
      
      Sorry, I don’t have a tutorial on this topic. Perhaps soon.
      
      Reply
Nishant Mathur November 23, 2019 at 5:13 am #

Hi Jason, I am performing a time series analysis with LSTM on hourly air quality data, which has variables like PM2.5, PM10, CO, Temperature, SO2, O3, SO2 and Wind speed.
What I am getting confused with is the kind of test that I need to perform before applying LSTM. Do I need to check both stationarity and seasonality, or just one?

Thank You

Reply
- Jason Brownlee November 23, 2019 at 6:54 am #
  
  Perhaps start by fitting the raw data.
  
  Then see if you can improve model skill with data scaling, and stationary transforms?
  
  Reply
  - Nishant Mathur November 23, 2019 at 2:01 pm #
    
    Thank you for the reply Jason!
    I did as you suggested and I am getting an RMSE of 28.23 for my LSTM model, is it a good RMSE or should I try making my data stationary ?
    
    Reply
    - Jason Brownlee November 24, 2019 at 9:16 am #
      
      Perhaps compare the RMSE to that of a naive model, like a persistence model?
      
      Perhaps try making the data stationary and compare?
      
      Reply
      - Nishant Mathur November 25, 2019 at 10:14 am #
        
        Thanks for your input Jason
        Much appreciated!
      - Jason Brownlee November 25, 2019 at 2:07 pm #
        
        You’re welcome.
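
A minimal sketch of the persistence comparison suggested in this thread: the naive forecast for time t is simply the observation at t-1, and its RMSE is the number a model has to beat. The series values are made up, and the `rmse` and `persistence_rmse` helper names are assumptions for illustration.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def persistence_rmse(series):
    """RMSE of forecasting each value with the value one step earlier."""
    series = np.asarray(series, dtype=float)
    return rmse(series[1:], series[:-1])

# Made-up observations in original (unscaled) units.
series = [10.0, 12.0, 11.0, 15.0, 14.0]
baseline = persistence_rmse(series)
print(round(baseline, 3))
```

An RMSE such as 28.23 only means something relative to this baseline computed on the same test period and in the same units: if persistence scores lower, the LSTM has no skill yet.
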
jessy November 24, 2019 at 12:08 pm #

Hi jason,
You dropped a few columns here. Why? Are these features not suitable for prediction?

# drop columns we don’t want to predict
reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)
print(reframed.head())

Reply
- Jason Brownlee November 25, 2019 at 6:21 am #
  
  As mentioned in the post, the weather variables for the time step we are predicting are removed. We don’t want them as input or output.
  
  This is to meet the chosen framing of the problem, you may choose to frame the problem differently.
  
  Reply
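
To make the column indices in the quoted `drop` call concrete, here is a small sketch. It assumes 8 variables framed with one lag, so 16 columns named var1(t-1) to var8(t-1) followed by var1(t) to var8(t); the single row of placeholder values is hypothetical.

```python
import pandas as pd

n_vars = 8
# Column layout after framing with one lag: inputs at t-1, then values at t.
names = [f"var{j}(t-1)" for j in range(1, n_vars + 1)]
names += [f"var{j}(t)" for j in range(1, n_vars + 1)]

# One row of placeholder values, only so that the frame has the right columns.
reframed = pd.DataFrame([list(range(len(names)))], columns=names)

# Indices 9..15 are var2(t)..var8(t): the weather variables at the predicted step.
kept = reframed.drop(reframed.columns[[9, 10, 11, 12, 13, 14, 15]], axis=1)
print(list(kept.columns))
```

What remains is the eight inputs at t-1 plus var1(t), the pollution value being predicted.
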
Qizal Ashfaq November 30, 2019 at 8:23 pm #

When I invert values after using MinMaxScaler, my values are changed and I am not getting my actual values. Why does this happen?

Reply
- Jason Brownlee December 1, 2019 at 5:42 am #
  
  Perhaps a bug was introduced in your code?
  Perhaps prepare a separate program to confirm your understanding of transform and inverse transform.
  
  This might help:
  https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
  
  Reply
siri December 4, 2019 at 4:48 am #

Hi Jason,

I tried running the file but i keep getting this error at the invert scaling for forecast stage:

cannot concatenate object of type ”; only Series and DataFrame objs are valid

Reply
- Jason Brownlee December 4, 2019 at 5:50 am #
  
  I’m sorry to hear that, I have some suggestions here that might help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Peter Isaac December 6, 2019 at 7:35 am #

Hi Jason,

Many thanks for the very informative tutorial. I had to tweak the Keras import and some of the pandas syntax, probably slight differences between versions (I’m still on V2.7), but everything was good after that.

There is a phase difference of 1 time step between inv_y and inv_yhat (inv_yhat leads inv_y by 1 time step). Before correcting for the phase difference, I get RMSE=26.756, after correcting I get RMSE=6.180. May not need to tune the network after all …

Cheers,
Peter

Reply
- Jason Brownlee December 6, 2019 at 1:38 pm #
  
  Nice work!
  
  Perhaps check that you have not learned a persistence model:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
  
  Reply
Ravi Pandit December 8, 2019 at 6:17 am #

How are you going to forecast future time series values? Which function do we have to use for this?

Reply
- Jason Brownlee December 8, 2019 at 6:18 am #
  
  You can make predictions by calling predict(), this will help:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
kasun udara December 9, 2019 at 2:01 am #

How do I visualize the predicted data graphically?

Reply
- Jason Brownlee December 9, 2019 at 6:53 am #
  
  You can use the matplotlib plot() function and pass in the actual and expected values in separate calls.
  
  Reply
SoumyaRanjan December 13, 2019 at 5:56 pm #

Hi Jason,
Thank you very much for this wonderful blog. I could not find a single material on multivariate time
series forecasting using LSTM on the internet until I found your blog.

Thanks again!!

I have 2 doubts:
1. While reshaping the X_train into a 3D matrix , what does the term “timesteps” mean?
Is it same as the delay we are giving i.e time stamp delay by 1 , i.e (t-1)?

(please see below)
samples=No. of data points
timesteps=???
features=No.of features

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))

2. In the official Keras documentation, the shape of the 3D matrix is defined as follows:

batch_size, timesteps , input_dim
Which is a little different from your code.
What is batch size here?

Could you please reply ASAP ?
Thank you!

Reply
- Jason Brownlee December 14, 2019 at 6:07 am #
  
  Thanks!
  
  More on timesteps and the input shape here:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
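
A small sketch of the reshape quoted in the question, assuming 8 input features and one lagged time step per sample as in the tutorial's first example; the array contents are placeholders. The first axis holds the samples. When the Keras documentation writes batch_size in that position, it refers to however many of those samples are passed through the network together.

```python
import numpy as np

n_samples, n_features = 6, 8
# Placeholder 2D input: one row per sample, one column per lagged feature.
train_X = np.arange(n_samples * n_features, dtype=float).reshape(n_samples, n_features)

# reshape input to be 3D [samples, timesteps, features] with a single time step
train_X_3d = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
print(train_X_3d.shape)

# The single time step holds the same 8 values the 2D row held.
same_values = np.array_equal(train_X_3d[:, 0, :], train_X)
print(same_values)
```

With timesteps equal to 1, each sample is one observation of all features at t-1, which matches the reading of timesteps as the number of lagged observations fed to the model per sample.
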
Soumya Ranjan December 14, 2019 at 9:34 pm #

Thank you for your quick response..
Will go through the link..

Reply
- Jason Brownlee December 15, 2019 at 6:04 am #
  
  You’re welcome.
  
  Reply
SoumyaRanjan December 15, 2019 at 5:11 am #

Hello Jason,
One more doubt.Could you please clarify it?

In the code..

inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]

Why is y_hat concatenated with test_X?
Why can’t we directly inverse transform using scaler.inverse_transform?

Reply
- Jason Brownlee December 15, 2019 at 6:11 am #
  
  The input to the scaler when inverting the transform must have the same shape/same columns in the same order as when the fit on the transform was performed.
  
  Reply
  - Soumya Ranjan December 16, 2019 at 12:00 am #
    
    Thank you!
    
    Reply
    - Jason Brownlee December 16, 2019 at 6:16 am #
      
      You’re welcome.
      
      Reply
      - SoumyaRanjan December 16, 2019 at 2:44 pm #
        
        Instead of concatenating yhat with test_X values, can I create any matrix(may be zero or unit matrix) and concatenate with yhat such that it has same dimension as when transformation was done ?
      - Jason Brownlee December 17, 2019 at 6:28 am #
        
        Sure.
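
A minimal sketch of the zero-padding idea confirmed in this thread, assuming a `MinMaxScaler` that was fit on 8 columns with the target in column 0. The data is random and `yhat` is a stand-in for scaled predictions, not output from the tutorial's model.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
raw = rng.random((50, 8)) * 100.0        # hypothetical dataset, target in column 0

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(raw)

# Stand-in for scaled predictions of the target, shape (5, 1).
yhat = scaled[:5, :1]

# Pad with zeros so the array has the 8 columns the scaler expects.
padding = np.zeros((len(yhat), raw.shape[1] - 1))
padded = np.concatenate((yhat, padding), axis=1)

# Invert all columns, then keep only the target column.
inv_yhat = scaler.inverse_transform(padded)[:, 0]
matches = bool(np.allclose(inv_yhat, raw[:5, 0]))
print(matches)
```

This works because the scaler inverts each column independently using only that column's own minimum and range, so the filler values never affect the target column.
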
sushanth December 15, 2019 at 2:08 pm #

Hi Jason

In the following step & in general, why do we take train_y as only one dimension? Shouldn’t we take more than 1 dimension and try to fit best fitting plane or hyper plane?
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]

Kindly explain.

Reply
- Jason Brownlee December 16, 2019 at 6:10 am #
  
  In this tutorial we have multiple inputs and one output.
  
  You can choose to model this dataset any way you wish.
  
  Reply
SoumyaRanjan December 17, 2019 at 10:51 pm #

Hi Jason,

Thanks for your help as always!

In this example output is ‘pollution’(variable 1)
In the input matrix, we have taken time lags of all other variables {var2(t-1), var3(t-1), …, var8(t-1)} as well as the time lag of the output (var1(t-1)).
And the output is Var1(t).
For training it is fine.
My doubt comes in the testing phase..
For testing we can not use Var1(t-1) as an input because we won’t be knowing it as we will be predicting it.

Or in other words

If we are given a test dataset which has all the variables except the output variable(var1 ) , how to do it?

Awaiting your reply..

Reply
- Jason Brownlee December 18, 2019 at 6:07 am #
  
  The assumption in the framing of the problem is that the input data will be available when making a prediction.
  
  If this is not the case for your problem, change the framing of the problem.
  
  Reply
  - SoumyaRanjan December 19, 2019 at 4:51 pm #
    
    Hi Jason,
    Thank you for the reply.
    
    But that is where I am stuck now.
    If I train the model with one data and I want to predict the output for another data (which has the same features as the training data),how should I proceed without using delayed output in the test data ?
    
    Reply
    - Jason Brownlee December 20, 2019 at 6:40 am #
      
      Perhaps you can adapt the example in the above tutorial for your needs?
      
      Or perhaps start with one of the simpler examples here:
      https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
      
      Reply
      - SoumyaRanjan December 21, 2019 at 10:35 pm #
        
        Ok..Thank you
Jiada December 23, 2019 at 12:34 pm #

Hi Jason, thank you for posting such a great tutorial! I got two questions:

1) Why do you need to do encoder and decoder for # col.5 data?

2) I’m trying to use ‘model.add(Activation(‘softmax’))’ to add activation function for output layer, but this syntax doesn’t work. The error shows ‘Activation’ was not defined’. It is so weird. Do you know how to fix it?

Thanks

Reply
- Jason Brownlee December 24, 2019 at 6:36 am #
  
  You don’t need an encoder-decoder, it is just one approach.
  
  You must import the Activation layer before using it.
  
  Reply
Pranav Jadhav December 24, 2019 at 4:19 am #

Dear Mr.Brownlee,
I used this tutorial to create a model that predicts river streamflow based on the previous day’s rainfall. The code for my LSTM is the same as yours. However, I am getting an RMSE of around 300. What can I do to improve the model?

Reply
- Jason Brownlee December 24, 2019 at 6:45 am #
  
  Well done!
  
  Perhaps some of the suggestions here will help:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Harsh Yadav December 30, 2019 at 5:20 pm #

Hi Jason, I have a doubt on how to formulate my data for the following step mentioned by you:
— Predict the pollution for the next hour as above and given the “expected” weather conditions
for the next hour.

Btw, thanks for such an amazing tutorial.

Thanks & Regards

Reply
- Jason Brownlee December 31, 2019 at 7:26 am #
  
  Thanks.
  
  I may cover that in a future tutorial.
  
  Reply
  - Harsh Yadav January 3, 2020 at 3:06 pm #
    
    Greeting for the new year!
    
    I am really stuck at this problem, it would be great if you can help me out in just preparing the input data for such a case.
    
    Thanks and Regards
    
    Reply
Serap December 31, 2019 at 4:59 am #

Hi Jason, I have some questions:

1. I am not sure how to interpret my MAE result which is 0.039 ? Should I think like my result might have difference from actual values between the range of MAE?

2. Do you suggest me to use MAPE to interpret model accuracy? (I assume MAPE is nothing but percentage display of MAE.)

3. My test MAPE result is 98.4 which seems almost as same as actual values. Could I think this is good model fitting ? Or what do you suggest me to do before saying the model result is good and model’s result is reliable?

4. At the preparation step, I did not check either the series are stationary or autocorrelated. Should I consider those before fitting the model or we do not have to do those for sequence data if we use Neural Network?

Thanks,

Serap.

Reply
- Jason Brownlee December 31, 2019 at 7:37 am #
  
  MAEs are relative. Compare all results to a MAE achieved via a persistence model to see if it has skill.
  
  Use a metric that best captures what is important about the skill of a model to you and stakeholders.
  
  Monitor learning curves to see if the model is overfitting:
  https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
  
  Perhaps try making the series stationary prior to modeling and compare final results.
  
  Reply
  - Serap January 1, 2020 at 6:49 am #
    
    Thank you for replying Jason. I just need to understand, even my test result is higher could we talk about overfitting at that situation ? Because I was wondering that when I get higher accuracy on training data but really lower accuracy at test data then we are able to say it is overfitted. However, in my results, the test accuracy is really high almost 98. Could we say even at that situation it is overfitted ?
    
    Thanks.
    
    Reply
    - Jason Brownlee January 2, 2020 at 6:35 am #
      
      Maybe, but it might not matter.
      
      If you have great skill on the test set, it could be a sign that the test set is too small or not representative.
      
      Reply
      - Serap January 3, 2020 at 6:23 am #
        
        I have only 1092 observations and I split them 80%-20%. I used the “shuffle_buffer” hyperparameter in LSTM, btw. Is there anything you suggest I do?
        
        Thanks,
      - Jason Brownlee January 3, 2020 at 7:36 am #
        
        What is “shuffle_buffer”?
      - Serap January 4, 2020 at 4:40 am #
        
        It is a hyperparameter that shuffles data in TensorFlow.
      - Jason Brownlee January 4, 2020 at 8:40 am #
        
        Sorry, I am not familiar with it.
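
On the question of whether MAPE is just MAE shown as a percentage: it is not quite that, as a small worked example shows. MAE averages absolute errors in the units of the data, while MAPE averages each absolute error divided by its own actual value. The numbers below are made up for illustration.

```python
import numpy as np

# Made-up actual and predicted values in original units.
actual = np.array([100.0, 200.0, 400.0])
predicted = np.array([110.0, 190.0, 360.0])

abs_err = np.abs(actual - predicted)              # 10, 10, 40

# MAE: mean absolute error, in the same units as the data.
mae = float(np.mean(abs_err))

# MAPE: mean of per-point relative errors, expressed as a percentage.
mape = float(np.mean(abs_err / actual) * 100.0)

print(mae)
print(round(mape, 2))
```

Both are error measures, so lower is better. A MAPE of 98.4 would mean forecasts are off by about 98% on average; if the 98.4 in the question is instead 100 minus MAPE, the underlying MAPE would be 1.6. It is also worth checking whether an MAE of 0.039 was computed on scaled values, since it is only comparable to a baseline measured on the same scale.
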
Qizal Ashfaq January 1, 2020 at 7:09 am #

I have run this example, but your code is not returning the pollution values after using scaler.inverse_transform. Can you explain this? All values are totally changed.

Reply
- Jason Brownlee January 2, 2020 at 6:36 am #
  
  Sorry to hear that, this might help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Qizal January 3, 2020 at 10:03 am #

I want to know if i am giving january 2018 data to this model for testing what is this model predicting ? Is this predicting january 2019 data?

Reply
- Jason Brownlee January 4, 2020 at 8:15 am #
  
  It comes down to how you defined the prediction problem and your model, e.g. see this:
  https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
  
  Reply
Pranav Jadhav January 5, 2020 at 4:48 pm #

Dear Dr. Brownlee,

Is it ok to scale (MinMaxScaler) after calling series_to_supervised, or is there a particular reason you did it first?

Thanks

Reply
- Jason Brownlee January 6, 2020 at 7:10 am #
  
  No, it is better to scale the data prior to converting lag observations to features/timesteps.
  
  Reply
Mahen January 7, 2020 at 10:31 am #

Hi Jason, many thanks for the article and it was very useful to understand and experiment with multi variate time series prediction.

I have implemented similar model with my test data and it works perfectly fine with good accuracy.

However, I am kind of stuck with a future requirement.

My input is like this:

Timestamp, f1,f2,f3,f4,f5,f6,f7

Say my target field to predict is f1 which is dependent on fields f2 to f7.
The current model is able to predict f1 at current timestep based on values (f1,f2,f3,f4,f5,f6,f7) from the previous time step.

However, I now need to predict f1 at CURRENT time step based on values (f2,f3,f4,f5,f6,f7) from the CURRENT time step. My input dataset is a real-time streaming application so I will have access to all features at CURRENT time step, and I want to predict f1 so I can compare predicted f1 versus actual f1 that is arriving at the current time based on dependent features

Any suggestions please ?

Reply
- Jason Brownlee January 7, 2020 at 1:48 pm #
  
  Great question. I need to explore this myself in a future tutorial.
  
  Until then:
  
  – Perhaps model the lag time steps as features?
  – Perhaps try dummy/pad the f1 value?
  – Perhaps try alternate models like MLP?
  
  Reply
  - Mahen January 9, 2020 at 1:59 am #
    
    Thanks for your response Jason.
    
    I tested using dummy values for f1 value at the current timestep, this helped to get more accurate results. As this is required to run every minute, I plan to update the predicted f1 value back into the training data so the next minute prediction will use the output which came from model at the previous minute. The end goal is to trigger another standalone process when there is large variance between prediction versus actual for f1 at current minute.
    
    Eventually I don’t want the model to pickup too much history from the predicted f1 values.
    
    Is there a recommended frequency at which we refresh the full Training data from actuals for f1 values ?
    
    On other note, I didn’t quite get the first suggestion of how to model lag time steps as feature. I will try alternate models as well but so far LSTM seems to work with very good accuracy.
    
    Reply
    - Jason Brownlee January 9, 2020 at 7:30 am #
      
      Not really, design some controlled experiments to see the effect of history on model skill for your dataset.
      
      You can play with using lag obs as features or time steps, it’s a design choice – numbers in an array. This might help:
      https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
      
      Reply
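
One way to read the "model the lag time steps as features" suggestion in this thread, sketched under assumptions: each row holds the current values of the other features together with the previous value of f1, and the target is the current f1. Only two of the six other features are shown, and the toy values are hypothetical.

```python
import pandas as pd

# Hypothetical stream: f1 is the target, f2 and f3 stand in for f2..f7.
df = pd.DataFrame({
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [10, 11, 12, 13, 14],
    "f3": [20, 21, 22, 23, 24],
})

# Inputs available at prediction time: current f2, f3 and the previous f1.
inputs = df[["f2", "f3"]].add_suffix("(t)")
inputs.insert(0, "f1(t-1)", df["f1"].shift(1))

# Target is f1 at the current step; the first row has no f1(t-1), so drop it.
framed = pd.concat([inputs, df["f1"].rename("f1(t)")], axis=1).dropna()
print(framed)
```

Each row can then be reshaped to [samples, 1, features] for an LSTM or fed directly to an MLP. Whether f1(t-1) should be the actual or the previously predicted value is a separate design decision, as discussed above.
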
mark mos January 7, 2020 at 5:43 pm #

Just found this site- excited to be here! Trying to advance my understanding of ML.

Reply
- Jason Brownlee January 8, 2020 at 8:20 am #
  
  Thanks, welcome!
  
  Reply
María January 8, 2020 at 9:29 pm #

Hi Dr. Brownlee,
I read https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/ a few days ago, but I don’t understand how batch_size works (whether I set the batch size or don’t specify it). In this tutorial (Air Pollution Forecasting) I set the input data as batch_input_shape=(72, train_X.shape[1], train_X.shape[2]) and I get an error: Incompatible shapes: [72] vs. [48]. I don’t know where 48 comes from.
I also thought it should work because the training is done with batch_size=72.

On the other hand, if I don’t specify batch_size in the predict function, does it use the batch_size used in training?

Can you answer these questions or suggest some reading?

Thank you very much for your time

Reply
- Jason Brownlee January 9, 2020 at 7:25 am #
  
  This might help with the batch size:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-a-batch-and-an-epoch
  
  Reply
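
A likely source of the 48, worked through as arithmetic. This assumes the training set is the first 365 * 24 hourly samples, as in the tutorial's split; with a fixed batch size of 72 the last batch is then a partial one.

```python
# Assumption: the training set is one year of hourly samples.
n_train = 365 * 24
batch_size = 72

# Number of full batches, and the size of the final, partial batch.
full_batches, remainder = divmod(n_train, batch_size)
print(full_batches, remainder)

# Batch sizes between 50 and 100 that divide the training set evenly.
even_sizes = [b for b in range(50, 101) if n_train % b == 0]
print(even_sizes)
```

A model declared with batch_input_shape fixed to 72 expects every batch to contain exactly 72 samples, so a leftover batch of 48 would produce a shape mismatch of this kind. If that is the cause, a batch size that divides the number of samples evenly, or trimming the data to a multiple of the batch size, would avoid it; the same constraint would then apply to any validation or test data passed through the model.
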
Vis January 9, 2020 at 1:43 am #

Hi Jason, find your blogs very useful. Just one question:

Regarding your suggestion of using previous 24 hrs (Predict the pollution for the next hour based on the weather conditions and pollution over the last 24 hours.)

Should I change the arrays to
train_X = train_X.reshape((train_X.shape[0]/24, 24, train_X.shape[1])) ?

Do you have some examples for multivariate time forecasting using more than 1 timesteps, would be interesting to see the accuracy of it?

Thanks,

Reply
- Jason Brownlee January 9, 2020 at 7:28 am #
  
  Thanks!
  
  Yes, perhaps start here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
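
On the reshape proposed in the question: dividing the sample count by 24 would cut the data into non-overlapping days and leave one sample per day. A more common framing is a sliding window, where each sample holds the previous 24 hours and the target is the next hour. The sketch below assumes that framing, with the target in column 0; the `make_windows` helper and the placeholder data are hypothetical. The tutorial's own "Train On Multiple Lag Timesteps Example" section covers the same idea.

```python
import numpy as np

def make_windows(values, n_steps, target_col=0):
    """Slide a window of n_steps rows over values; predict the next row's target."""
    values = np.asarray(values, dtype=float)
    X, y = [], []
    for end in range(n_steps, len(values)):
        X.append(values[end - n_steps:end])   # the n_steps rows before `end`
        y.append(values[end, target_col])     # target at the following step
    return np.array(X), np.array(y)

# Placeholder data: 100 hours of 8 features.
hours, n_features = 100, 8
values = np.arange(hours * n_features, dtype=float).reshape(hours, n_features)

X, y = make_windows(values, n_steps=24)
print(X.shape, y.shape)
```

The result is already in [samples, timesteps, features] form, so no further reshape is needed before fitting.
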
Adeel Khan January 13, 2020 at 9:10 pm #

Thank you, Jason, for such an amazing tutorial. I really found your blogs really useful. I would like to know how can I find the feature contribution score(feature importance) in this time series analysis?

Reply
- Jason Brownlee January 14, 2020 at 7:21 am #
  
  Thanks!
  
  I’m not sure off hand, sorry.
  
  Reply
  - Adeel Khan January 14, 2020 at 4:43 pm #
    
    Thank you, Jason, for the reply. Would you like to suggest any material or link which I can look to get the feature importance? I have followed your blog in which you covered feature importance( https://machinelearningmastery.com/feature-selection-time-series-forecasting-python/). But it has directly used (model.feature_importance). For my project, I want to know which meteorological variable is contributing more in forecasting the pollution.
    
    Thank you
    
    Reply
    - Jason Brownlee January 15, 2020 at 8:18 am #
      
      To find the importance of lag observations on one time series, you can use ACF/PACF plots:
      https://machinelearningmastery.com/how-to-develop-an-autoregression-forecast-model-for-household-electricity-consumption/
      
      To find the importance of different whole series, fit a model for each different series removed and compare the relative results.
      
      Reply
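
A sketch of the leave-one-series-out comparison described in the last reply. To keep it short and runnable it uses a linear least-squares fit as a stand-in for the LSTM, and synthetic data in which the first series matters most and the third not at all; the series names and the `fit_and_score` helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n, split = 500, 400

# Synthetic inputs: the target depends strongly on column 0, weakly on 1, not on 2.
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

def fit_and_score(cols):
    """Fit on the training part using only `cols`; return RMSE on the test part."""
    A_train = np.column_stack([X[:split, cols], np.ones(split)])
    A_test = np.column_stack([X[split:, cols], np.ones(n - split)])
    coef, *_ = np.linalg.lstsq(A_train, y[:split], rcond=None)
    pred = A_test @ coef
    return float(np.sqrt(np.mean((y[split:] - pred) ** 2)))

names = ["series_a", "series_b", "series_c"]
full = fit_and_score([0, 1, 2])

# Importance of a series = how much the error grows when it is left out.
increase = {}
for i, name in enumerate(names):
    kept = [j for j in range(3) if j != i]
    increase[name] = fit_and_score(kept) - full

for name in names:
    print(name, round(increase[name], 3))
```

With an LSTM, each fit would be repeated several times because training is stochastic, and the mean error with and without each variable compared.
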
María January 14, 2020 at 2:44 am #

Hi Jason!

I need an LSTM model to predict heating consumption in 18 different homes. I have other features that can influence like the square meters of the house, type of insulating material, the number of radiators and the temperature. My question is: Can I make a single model for the 18 homes or should I make 18 different models?

If a single model is possible, the input matrix must be of the type (the homes are in the same city so the temperature is the same):

heat_units temperature m2 insulating n_radiators
step1 (v_1,…,v_18)_1 t_1 (s_1,…,s_18)_1 (i_1,…,i_18)_1 (r_1,…,r_18)_1

… … … … …
stepN (v_1,…,v_18)_N t_N (s_1,…,s_18)_N (i_1,…,i_18)_N (r_1,…,r_18)_N

where each cell is a vector and so far I have not seen such examples. Can you give me something?

Reply
- Jason Brownlee January 14, 2020 at 7:25 am #
  
  This might give you ideas:
  https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
  
  Reply
Lorenzo January 14, 2020 at 8:57 pm #

Hi Dr. Brownlee,

I am looking for material on autoencoders for multivariate time series to use for anomaly detection. Would you recommend your book?

BR
Lorenzo

Reply
- Jason Brownlee January 15, 2020 at 8:26 am #
  
  I don’t cover autoencoders in the deep learning for time series book.
  
  Perhaps this will help:
  https://machinelearningmastery.com/lstm-autoencoders/
  
  Reply
María January 15, 2020 at 6:27 pm #

Hi Jason, there are time series forecasting problems where you may have data from multiple sites, I would like to develop one model for all sites.

I’ve never seen a (MLP, LSTM) model like this. Can you give me a reference example? Thank you.

Reply
- Jason Brownlee January 16, 2020 at 6:11 am #
  
  Yes, this will give you some ideas:
  https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
  
  Reply
Bryan January 28, 2020 at 8:30 pm #

Hi Jason, your posts have always been my references to study applications of deep learning and this time series prediction is really insightful.
I wonder if we can predict the “pollution” attribute based on model you created before for the upcoming days, like 7 days ahead or two weeks ahead.. is it possible?

Reply
- Jason Brownlee January 29, 2020 at 6:34 am #
  
  Yes, I give many examples, start here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
  - Bryan January 29, 2020 at 1:36 pm #
    
    Thank you very much!
    
    Reply
    - Jason Brownlee January 29, 2020 at 1:47 pm #
      
      You’re welcome.
      
      Reply
Bryan January 29, 2020 at 4:46 am #

Hi Jason
How can we use that multivariate model to predict only upcoming pollution value for 1, 2, 3 or even 24 hours ahead?

Reply
- Jason Brownlee January 29, 2020 at 6:48 am #
  
  See this:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
  - Bryan January 29, 2020 at 1:37 pm #
    
    Thank you very much!
    
    Reply
sumunthra January 30, 2020 at 2:25 am #

Hi Jason, I want to input a collection of X-coordinate data and Y-coordinate data, and jointly train a multivariate CNN to get classification results based on the combination of X and Y. Please suggest how to proceed. In summary, how do I use a multivariate CNN for classification?

Reply
- Jason Brownlee January 30, 2020 at 6:55 am #
  
  Perhaps start with this tutorial and adapt it for your dataset:
  https://machinelearningmastery.com/how-to-develop-convolutional-neural-network-models-for-time-series-forecasting/
  
  Reply
JooYeon January 30, 2020 at 3:13 pm #

I’ve been reading through series of your articles and got help from them as I’m a newbie.
But now my head is kind of messed up. I’m wondering whether LSTM can be used for multiple parallel time series or not.

To make a prediction, you used test_X values in this article, like this”yhat = model.predict(test_X)”
Based on this prediction, we can calculate RMSE or see the plot to check if this model is okay to use.
But if I want to forecast future values whose X values are not inside the data set, how can I forecast yhat values? Because “model.predict(…..)” will be empty.
Should I use other models only to predict X values and then come back to LSTM to predict y values?
Or is there other options to forecast in this case?

Thank you in advance

Reply
- Jason Brownlee January 31, 2020 at 7:37 am #
  
  Yes it can.
  
  Correct.
  
  No, model.predict() takes the inputs required to make the prediction. If your model predicts 7 days based on the prior 30, then you provide the prior 30 as input.
  
  Reply
  - JooYeon January 31, 2020 at 5:46 pm #
    
    Thank you so much, I appreciate your help!
    
    Reply
    - Jason Brownlee February 1, 2020 at 5:50 am #
      
      You’re welcome.
      
      Reply
  - JooYeon January 31, 2020 at 5:57 pm #
    
    One last question, then what parameters should I use in model.predict()?
    Will it be “steps”?
    
    Reply
    - Jason Brownlee February 1, 2020 at 5:50 am #
      
      predict() only requires one argument, which is the input needed to make a prediction. E.g. an array with [samples,timesteps,features] for the predictions to make.
      
      Reply
      - JooYeon February 3, 2020 at 11:15 am #
        
        Thank you so much!
      - Jason Brownlee February 3, 2020 at 1:52 pm #
        
        You’re welcome.
Jeremy February 2, 2020 at 4:03 am #

Hi. Great website.

Do you have best practices for including static data in a multi-step parallel LSTM? Ex, adding demographics to individual shopping or medical claims TS.

Reply
- Jason Brownlee February 2, 2020 at 6:27 am #
  
  Yes, a multi-input model where static data is fed as a separate input, see this:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
Rob February 5, 2020 at 3:02 pm #

Hi Jason, it’s a wonderful post! But I am a little confused in “test_X[:, -7:] “below

inv_yhat = concatenate((yhat, test_X[:, -7:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, -7:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]

I understand why we need to concat the other 7 features. Maybe it’s about the inverse transform.
My question is: can we use a different 7 features? I mean, in your post you use the 7 features of (t-1), but can we use the 7 features of (t-2) or (t-3) or even (t)?

I am looking forward to your reply

Reply
- Jason Brownlee February 6, 2020 at 8:17 am #
  
  Thanks.
  
  You can use all zeros, or whatever. We only care about the inverse transform of the target.
  
  Reply
Shekhar P February 5, 2020 at 5:08 pm #

Hi Doctor, I have one question here. In the line plots above, I can see that the variables Dew, temperature and pressure are correlated with each other. Still, you are using them in the model. Doesn’t that introduce a multicollinearity problem here? Ideally, collinear variables should not be included in the model. Please explain this problem.

Reply
- Jason Brownlee February 6, 2020 at 8:19 am #
  
  Yes, perhaps try removing one of them and evaluate the effect on the model skill.
  
  Reply
shan February 6, 2020 at 6:39 pm #

Hi Jason, I tried your code and it worked fine with my own data set.

I wanted to test something of my own, hence I tried a simple plain vanilla RNN.
But I am having shape issues with the dense layers. Can you suggest where am I doing it wrong?
Error:

Error when checking target: expected dense_2 to have shape (2,) but got array with shape (1,)

Code:

#X_train.shape = (7141, 1)
#y_train.shape = (7141, 1)
model = Sequential()
model.add(Dense(5, activation='relu'))
model.add(Dense(2))
model.compile(loss='mean_absolute_error', optimizer='adam')
history = model.fit(X_train, y_train, epochs=10, batch_size=64, verbose=1, shuffle=False)

Reply
- Jason Brownlee February 7, 2020 at 8:11 am #
  
  Well done!
  
  Your output expects 2 features per sample. Ensure your data has this or change the model.
  
  Reply
Maaz February 16, 2020 at 1:03 am #

Hi Dr Jason
When i am fitting the network i get the following warning
C:\Users\******\Anaconda3\envs\pytorch\lib\site-packages\keras\backend\tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
Its due to this warning the code starts to accumulate the memory and ultimately crashes without training the required number of epochs i am using the following versions of tensor flow and keras
Name: tensorflow
Version: 1.14.0
Name: Theano
Version: 1.0.4
Name: Keras
Version: 2.3.1
Can you please help me in this regard to make the code stable.Thanks

Reply
- Jason Brownlee February 16, 2020 at 6:08 am #
  
  Sorry to hear that, your versions look good. Perhaps try running from the command line:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-run-a-script-from-the-command-line
  
  Reply
bunty sahoo February 18, 2020 at 6:03 pm #

Thanks for the post. I have a silly question to ask. My dataset has monthly observations and i want to predict for next month(future).

What should be the time steps value here. I searched a lot but unable to find answer.

Reply
- Jason Brownlee February 19, 2020 at 7:59 am #
  
  The input will be whatever you configured your model to take as input. E.g. if the model takes in 7 days to predict 7 days, then the input will be the last 7 days.
  
  Reply
Zhiyuan Yu February 19, 2020 at 1:20 pm #

Hi Jason,

I would like to know if I want to predict two responses, is it possible to predict two responses at one time instead of doing it one by one?

Best,
Zhiyuan

Reply
- Jason Brownlee February 19, 2020 at 1:33 pm #
  
  Yes, see this:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
  - Zhiyuan Yu February 19, 2020 at 1:36 pm #
    
    Thanks for your help!
    
    Reply
    - Jason Brownlee February 20, 2020 at 6:05 am #
      
      You’re welcome.
      
      Reply
Raanan February 19, 2020 at 9:13 pm #

Hi,

Thank you for the post, it was very educating.

A question:

I have a dataset which I think resembles the one in the post, but I’m not sure.

I have two time series with non-matching timestamps.
For example, in the pollution problem, suppose we had separate measurements of pollution at different timestamps in one dataframe, and in another dataframe the other parameters (temp, pressure) measured at different times than the pollution measurements.

How can we then predict the pollution?

Reply
- Jason Brownlee February 20, 2020 at 6:14 am #
  
  This will help:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-handle-discontiguous-time-series-data
  
  Reply
Raanan February 20, 2020 at 9:07 am #

Thank you.
I will try.
Can you elaborate more on “Ignore the discontiguous nature of the problem and model the data as-is”? How can we do it?

Reply
- Jason Brownlee February 20, 2020 at 11:27 am #
  
  Yes, feed the raw data to the model directly with the discontinuities present.
  
  Reply
Raanan February 21, 2020 at 1:27 am #

But how do we then join the tables, and on what field? And what will be the LSTM inputs?

Reply
- Jason Brownlee February 21, 2020 at 8:25 am #
  
  Can you please elaborate on your question? What are you referring to exactly?
  
  Reply
Raanan February 21, 2020 at 10:05 am #

Ok.

For example, if one dataset is pollution and the other is the other parameters (pressure, temp),
but the measuring times of the pollution and of the pressure + temp are not exactly the same,
and we want to create an LSTM, like in the post, then what are the inputs to the LSTM?

history = model.fit(train_X, train_y, epochs=50, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)

What would train_X be composed of here?

Reply
- Jason Brownlee February 21, 2020 at 12:00 pm #
  
  It is whatever you want, such as past observations of the input variables.
  
  This will help in thinking about LSTM input:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
  - Raanan February 22, 2020 at 6:27 am #
    
    Watched the url.
    
    I think my main problem is that the 2nd dataset does not have y, so how can I fit the model?
    I am thinking of finding the nearest y in time (in this case, just the nearest row of pollution) in the 1st dataset, but is that a good approach?
    
    Reply
    - Jason Brownlee February 22, 2020 at 6:38 am #
      
      I don’t understand your description.
      
      Perhaps start with a strong definition of your task:
      https://machinelearningmastery.com/taxonomy-of-time-series-forecasting-problems/
      
      Reply
Lopa February 21, 2020 at 10:52 pm #

Hi Jason,

What is the best way to implement an LSTM if there are multiple cities that are not correlated but have similar trends? Building separate models will be time consuming, but if we still want forecasts for each city, what is the best possible option?

Reply
- Jason Brownlee February 22, 2020 at 6:27 am #
  
  We cannot know what model/architecture/config will work best for a problem.
  
  The best we can do is use controlled experiments and DISCOVER what works best for a given problem. Get creative!
  
  Reply
  - Lopa February 22, 2020 at 7:08 am #
    
    Thanks Jason !
    
    I have 36 months of daily data for different cities & the monthly patterns are pretty much the same across different years. The volume peaks up during summer months (June,July & August) & then comes down in September.
    
    So, when you talk about controlled experiment what are the options that can be tried/tested in order to find that the LSTM model is capable of remembering the monthly trends which will be useful in generating future forecasts.
    
    Would be great if you can help.
    
    Reply
    - Jason Brownlee February 23, 2020 at 7:18 am #
      
      Choice of data, framing of problem, data preparation, model architecture, model training, etc.
      
      Reply
- Raanan February 23, 2020 at 3:49 pm #
  
  Hi,
  
  Here’s dataset example:
  
  file1: pollution measurements
  
  12:21 35
  12:56 39
  13:31 37
  
  file2: air pressure, temp, humidity, dewp, ls measurements.
  
  12:19 452 96 51 69 70
  12:43 398 56 48 25 12
  13:14 490 72 25 15 90
  13:27 400 88 26 15 80
  
  and the need is to predict pollution for the next 5 measurements
  
  so how can you best use the file2 data?
  
  Reply
  - Jason Brownlee February 24, 2020 at 7:38 am #
    
    I recommend testing a number of different framings of the problem and different models in order to discover what works best for your data.
    
    Reply
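For readers with the same two-file layout, one framing worth testing is to align each pollution reading with the most recent weather reading before it. The sketch below uses `pandas.merge_asof` on the values quoted in the comment above. The calendar date and the column names are assumptions made for illustration, and the backward-looking join is one possible choice, not the tutorial's method.

```python
import pandas as pd

# Values copied from the comment above; the date and column names are assumed.
pollution = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 12:21", "2020-01-01 12:56",
                            "2020-01-01 13:31"]),
    "pollution": [35, 39, 37],
})
weather = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 12:19", "2020-01-01 12:43",
                            "2020-01-01 13:14", "2020-01-01 13:27"]),
    "pressure": [452, 398, 490, 400],
    "temp": [96, 56, 72, 88],
    "humidity": [51, 48, 25, 26],
    "dewp": [69, 25, 15, 15],
    "ls": [70, 12, 90, 80],
})

# For each pollution reading, attach the latest weather row at or before it.
# Both frames must be sorted on the join key.
aligned = pd.merge_asof(
    pollution.sort_values("time"),
    weather.sort_values("time"),
    on="time",
    direction="backward",
)
print(aligned)
```

The result has one row per pollution measurement with the weather columns alongside, which can then be framed as supervised learning in the same way as the single-file dataset. Resampling both series to a common interval is another framing to compare against.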
Mas February 22, 2020 at 3:32 am #

I can’t understand whether you are predicting just one feature or all of them.
Where precisely do you select which one to predict, and where do I have to change the code if I want to predict another one?

Reply
- Jason Brownlee February 22, 2020 at 6:32 am #
  
  One. Pollution. Perhaps re-read the tutorial?
  
  Reply
  - mas February 25, 2020 at 12:56 am #
    
    My English is probably not so good, but what I want to ask is: can I modify this script to predict more than one variable? Where do I have to change the code? Thank you.
    
    Reply
    - Jason Brownlee February 25, 2020 at 7:48 am #
      
      Yes, I give examples here:
      https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
      
      Reply
      - mas February 26, 2020 at 1:14 am #
        
        Thank you so much!
        Can I also ask you why my val and test error have this strange behaviour:
        
        loss: 0.0034 – val_loss: 0.0042
        loss: 0.0024 – val_loss: 0.0038
        loss: 0.0024 – val_loss: 0.0037
        loss: 0.0024 – val_loss: 0.0036
        loss: 0.0024 – val_loss: 0.0035
        loss: 0.0022 – val_loss: 0.0030
        loss: 0.0020 – val_loss: 0.0028
        loss: 0.0019 – val_loss: 0.0030
        loss: 0.0018 – val_loss: 0.0030
        loss: 0.0018 – val_loss: 0.0031
        loss: 0.0017 – val_loss: 0.0032
        loss: 0.0017 – val_loss: 0.0032
        loss: 0.0016 – val_loss: 0.0027
        loss: 0.0016 – val_loss: 0.0029
        loss: 0.0015 – val_loss: 0.0030
        loss: 0.0014 – val_loss: 0.0030
        loss: 0.0014 – val_loss: 0.0029
        loss: 0.0013 – val_loss: 0.0029
        loss: 0.0012 – val_loss: 0.0030
        loss: 0.0012 – val_loss: 0.0029
        loss: 0.0011 – val_loss: 0.0022
        loss: 0.0010 – val_loss: 0.0018
        loss: 9.2146e-04 – val_loss: 0.0012
        loss: 8.8040e-04 – val_loss: 0.0011
        
        Same architecture as in your article above, just a different dataset.
        Afterwards I plot the result and get a very nice prediction; I calculate MAPE and get 5%.
        Can you help me?
      - Jason Brownlee February 26, 2020 at 8:24 am #
        
        You can use the tutorials here to diagnose issues with your model:
        https://machinelearningmastery.com/start-here/#better
Lopa February 22, 2020 at 5:15 am #

I have gone through your books but couldn’t find any relevant example.

Reply
- Jason Brownlee February 22, 2020 at 6:34 am #
  
  Of what exactly?
  
  Reply
- Lopa February 22, 2020 at 7:14 am #
  
  Sorry, this is irrelevant; I couldn’t delete it.
  
  Reply
Joe K February 22, 2020 at 5:51 am #

Hi Jason. First of all, your tutorials are the best – they have helped me tremendously! Really dumb question though – how does the LSTM know that ‘pollution’ is the value I am trying to predict as opposed to any of the other features? The network returns 1 value but I don’t see where we tell it which one we are predicting. Sorry if the answer is obvious!

Reply
- Jason Brownlee February 22, 2020 at 6:37 am #
  
  Thanks Joe.
  
  We define our samples so that the target the model maps to is pollution. It makes a prediction, and the error between the output and the pollution is used to correct the model.
  
  Reply
mah February 23, 2020 at 7:20 am #

Hi Jason,
Thanks for your perfect tutorial. I am using it on my own dataset and I get good results until the train and validation steps. In the test set, I actually have a question and would be thankful if you can help me.
In the data set you are using the variable you are predicting for the (t-1) is the first column for the input data, so in the evaluation step, you concatenate the target to the test set (as the first column) in order to rescale it to the actual values. I do not know how I can do this when my target value is the 6th column of my input matrix (for (t-1)).

Reply
- Jason Brownlee February 23, 2020 at 7:35 am #
  
  This may help you with numpy array indexes:
  https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
  
  Reply
  - mah February 23, 2020 at 8:46 am #
    
    Thanks for your response. I went through the website you recommended but actually it did not help. Cause I’m looking for a way to add y-hat to the 6th column of the test_x. Using the concat function will add y-hat to the first column.
    
    Reply
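For readers whose target sits somewhere other than the first column, here is a minimal sketch of one way to invert the scaling without relying on the column order of `concatenate`. It assumes the scaler is a scikit-learn `MinMaxScaler` fitted on all columns, as in the tutorial. The helper name `invert_target` and the demo data are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def invert_target(scaler, scaled_target, target_col):
    """Invert MinMaxScaler scaling for a single column.

    Builds a placeholder array with as many columns as the scaler was
    fitted on, writes the scaled values into target_col, inverts the
    transform, and reads that column back. MinMaxScaler scales each
    column independently, so the placeholder zeros do not affect it.
    """
    scaled_target = np.asarray(scaled_target).reshape(-1)
    n_features = scaler.scale_.shape[0]
    placeholder = np.zeros((len(scaled_target), n_features))
    placeholder[:, target_col] = scaled_target
    return scaler.inverse_transform(placeholder)[:, target_col]

# Demo on made-up data with 8 columns; the target is index 5 (the 6th column).
rng = np.random.default_rng(0)
raw = rng.uniform(0, 100, size=(20, 8))
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(raw)

restored = invert_target(scaler, scaled[:, 5], target_col=5)
print(np.allclose(restored, raw[:, 5]))
```

In the tutorial's evaluation step, the same helper could be called once with `yhat` and once with `test_y` to get both back in the original units.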
Stephen February 25, 2020 at 12:18 pm #

Hi Jason,

What if the output is more than one number?

Reply
- Jason Brownlee February 25, 2020 at 1:45 pm #
  
  Sorry, I don’t understand your question. Can you please elaborate?
  
  Reply
Mary Jasmine.E February 26, 2020 at 9:11 pm #

In pollution.csv there is a column called pollution. How did you calculate pollution from raw.csv?

Reply
- Jason Brownlee February 27, 2020 at 5:46 am #
  
  I did not calculate it. It was provided in the file.
  
  Reply
Roy February 28, 2020 at 2:06 am #

Very nice!
Can I ask which architecture you would use to solve a sequence classification problem with 30 time steps of 6 variables?

Reply
- Jason Brownlee February 28, 2020 at 6:17 am #
  
  I recommend testing a suite of model types and configurations for a given model in order to discover what works best for your specific dataset.
  
  See this:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  Reply
Carlos February 28, 2020 at 8:16 am #

Hi Jason,

I understand the code and that we got the RMSE in the scale of the target. My doubt is about “yhat”: is this value like a probability? E.g. the first value is 0.03533247; is it like 3% of pollution?

“# make a prediction
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))

# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]

# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]

# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)”

Reply
- Jason Brownlee February 28, 2020 at 1:24 pm #
  
  No, it is a time series forecasting problem – we are predicting a numerical value.
  
  You can learn the difference between regression and classification here:
  https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
  
  Reply
Carlos February 29, 2020 at 12:14 am #

Thanks a lot! I understand that value is Pollution Level.

That post taught me to convert a regression problem to classification.

CRACK!!!!

Reply
- Jason Brownlee February 29, 2020 at 7:15 am #
  
  You’re welcome.
  
  Reply
Rachana March 2, 2020 at 4:27 pm #

Hi Jason,

Thanks for the great tutorials. I didn’t understand what columns 9 to 15 indicate in the code below.

reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)

Reply
- Jason Brownlee March 3, 2020 at 5:55 am #
  
  They are the variables at time t that we don’t want to predict (everything except pollution).
  
  Reply
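To make the column indexes concrete, the sketch below rebuilds only the column names of the reframed dataset for 8 variables with one lag step, in the order the tutorial's `series_to_supervised()` output shows them, and prints what indexes 9 to 15 refer to. The single row of zeros is placeholder data.

```python
import pandas as pd

n_vars = 8
# Column order: all variables at t-1, then all variables at t
columns = (
    [f"var{j}(t-1)" for j in range(1, n_vars + 1)]
    + [f"var{j}(t)" for j in range(1, n_vars + 1)]
)
reframed = pd.DataFrame([[0.0] * len(columns)], columns=columns)

# Indexes 0-7 are the lagged inputs, index 8 is var1(t) (pollution),
# and indexes 9-15 are the other variables at time t.
dropped = list(reframed.columns[[9, 10, 11, 12, 13, 14, 15]])
reframed = reframed.drop(columns=dropped)

print(dropped)
print(list(reframed.columns))
```

After the drop, the last remaining column is `var1(t)`, which is why the tutorial can take the final column as the target.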
raaj March 3, 2020 at 9:33 pm #

Hi Jason,
Thanks for the tutorial. I am actually stuck at something. I was trying to tweak this code to use the forecast features as well. Let’s say I have values for ‘dewpoint’ etc. at the current time and I have previous weather features as well as ‘pollution’ values. What I want to do is predict the current value using both current and previous values.
It would be great if you could help me out here. I have arranged the dataframe in such a way that I take the current ‘pollution’ value as Y and the current plus previous (window) predictors as X. But unfortunately I am getting stuck at the normalisation step. I will be happy to share the code via e-mail.

Reply
- Jason Brownlee March 4, 2020 at 5:55 am #
  
  Maybe start by working with the raw data, get that working, then adapt to include scaling.
  
  Reply
Dhaval Varia March 4, 2020 at 10:41 pm #

Dear Sir,
I have following scenario :

I want to predict the value of air pollution for all the above columns by giving 2 inputs: Location [not given here, but assume we have modified the dataset and kept a location_id] and DateTime.

How to do this?

Reply
- Jason Brownlee March 5, 2020 at 6:36 am #
  
  You must prepare your data and develop a model, like any ml problem.
  
  The tutorials here will help you to get started:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Shekhar P March 13, 2020 at 6:50 pm #

Hi Doctor,

You have used the test dataset for validation, and then you predict on the same test dataset. But the prediction should actually be on an unknown set; in my case, for tomorrow.
Please see my case: I am training on one set and validating on the next set, which is the test set. Now please tell me, how should I predict for tomorrow? That is, how should I provide the input?

See below my example:

n_train_hours = 52799
train = values[:n_train_hours, :] # Training set
test = values[n_train_hours:62399, :] # Testing Set

Now I want a set on which I will predict, like below:

Utestx = values[62399:, :]

But this should be totally independent of and different from the previous ones. How shall I provide it?

My inputs are:

date load A B WtdRainFall Temp HeatInd WindChill
6/1/2018 0:0 2577.92 1 0 0 34.4 36.1 34.4

Now how can I form tomorrow’s set on which I will predict?
You can also give me the basic answer on how to choose the training set, the test set, and then the set on which I will predict. So what will be the set on which I predict?
Thanks in advance.

Reply
- Jason Brownlee March 14, 2020 at 8:09 am #
  
  You can design the test harness any way you like. I generally recommend using walk-forward validation and it is the approach used in the 100s of time series tutorials on this site and in my books.
  
  You can learn more about walk forward validation here:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
  
  You can learn more about making a forecast with an LSTM here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
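As a rough illustration of walk-forward validation, the sketch below forecasts one step at a time and only reveals each true value after the forecast for it has been made. The function names and the toy series are hypothetical, and `mean_of_last_three` is a stand-in for a call to a fitted model's `predict()`.

```python
import numpy as np

def walk_forward_validation(series, n_test, forecast_fn):
    """One-step walk-forward validation over the last n_test observations."""
    history = list(series[:-n_test])
    predictions = []
    for actual in series[-n_test:]:
        predictions.append(forecast_fn(history))  # forecast from data seen so far
        history.append(actual)                    # then reveal the true value
    errors = np.asarray(predictions) - np.asarray(series[-n_test:])
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    return predictions, rmse

def mean_of_last_three(history):
    # Placeholder model; in practice this would wrap model.predict()
    return float(np.mean(history[-3:]))

series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
preds, rmse = walk_forward_validation(series, n_test=3,
                                      forecast_fn=mean_of_last_three)
print(preds, round(rmse, 3))

# A forecast for the next, unseen step uses all available history as input.
next_forecast = mean_of_last_three(series)
print(next_forecast)
```

The last two lines show the "tomorrow" case from the question: once the harness has been used to choose a model, the input for a real forecast is simply the most recent observations, with no held-out set involved.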
Ujjwal March 15, 2020 at 7:06 pm #

Hi Sir,

I am stuck at model prediction, getting this error:
Error when checking input: expected lstm_2_input to have 3 dimensions, but got array with shape (35039, 8)

Reply
- Jason Brownlee March 16, 2020 at 5:54 am #
  
  Sorry to hear that, this might help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
- Kristin Fore July 31, 2020 at 5:07 am #
  
  Got similar error from yhat = model.predict(test_X) as below
  
  “expected lstm_1_input to have 3 dimensions, but got array with shape (35039, 8)”
  
  Why is the input 3-dimensional?
  
  Reply
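Keras LSTM layers take input shaped as [samples, timesteps, features], so a 2D array of shape (35039, 8) has to be reshaped before calling `model.predict()`. One possible cause of this error, stated here as a guess, is that the tutorial reshapes `test_X` back to 2D after predicting (for the inverse scaling), so running the prediction line a second time passes in the 2D version. A minimal sketch of the reshape, using a zero array as a stand-in for the real data:

```python
import numpy as np

# Stand-in for a 2D test set; the shape is taken from the error message
test_X = np.zeros((35039, 8))

# Reshape to [samples, timesteps, features] with one time step per sample
test_X_3d = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(test_X_3d.shape)  # (35039, 1, 8)
```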
Peter Gandy March 15, 2020 at 9:51 pm #

Good Morning,

I am following your code and I am trying to model Delinquency Rate. I have 7 features, plus Delinquency makes 8. When I define and fit the model I get 15 features instead of 8. I am trying to figure out what I missed or overlooked. Would you mind providing some insight?

Reply
- Jason Brownlee March 16, 2020 at 5:55 am #
  
  Sorry, I don’t know about the specifics of your project.
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
  - Peter Gandy March 16, 2020 at 12:58 pm #
    
    Jason,
    
    Thank you sir..
    
    It was a helpful read.
    
    I have another question. I am having an issue with inverting the scaling for the forecast. I have reviewed my code and read other papers you have published, but cannot figure out where I am going wrong.
    
    Could you please provide some insight?
    
    ValueError: Unexpectedly found an instance of type . Expected a symbolic tensor instance.
    
    Reply
    - Jason Brownlee March 16, 2020 at 1:31 pm #
      
      Yes, see this:
      https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
      
      Reply
      - Peter Gandy March 16, 2020 at 5:47 pm #
        
        Awesome! I am perusing it now. Specifically, I am getting errors with inv_yhat = concatenate((yhat, x_test[:, 1:]), axis=1).
        
        Peter
Chris March 18, 2020 at 1:31 pm #

Hi Jason,

First off, huge thanks for all of these articles which you have written! They have been extremely helpful in learning a ton about the practical applications of time series forecasting and LSTMs.

Something I’ve been wondering about with regards to the conversion of the time series data to a supervised learning problem is specifically the association of the features given with each set of prior observations for a multivariate data set.

Take for example the air pollution data which you used in this article. So there are 8 features per observation and say for example we are using the observations from t-1, t-2, and t-3 to construct our input vectors instead of just t-1. I’m not too sure what word to use for it, but when we assemble this set of 24 features as input to a single output how does the model “know” that say the “Temperature” observation from t-3 is associated to the “Air pollution” observation from t-3 instead of the “Air pollution” observation from t-1.

I guess I am making the assumption that there is valuable information to be learned for the model in knowing that the features from t-1 are coupled together(meaning the observations from t-1 caused the air pollution at t-1), as with t-2, and t-3. Is there an assumption here that all features are independent from one another (even though that might not be the case) and is there something that can be done to perhaps maintain these associations? Or is there some deeper work being done during the training of the model that is identifying associations between input variables which is in turn taken into account when the model is built and subsequently used?

Forgive me if this is a naive question, I am largely inexperienced with time series problems.

Thanks again for all the work and time you’ve put into these articles!

Reply
- Jason Brownlee March 19, 2020 at 6:19 am #
  
  You’re welcome Chris.
  
  Great question.
  
  Our job is to frame the problem so that the model has enough information to make a prediction. E.g. given the output we want to predict under the conditions we want to predict it, what inputs are most useful/needed. This always requires a little experimentation.
  
  This is the general problem of supervised learning – selecting/preparing/engineering inputs for the target. The model simply learns a function to map the inputs to the output.
  
  So how does it know – well all it knows is that there are inputs and an output and it sees many examples and learns a statistical relationship.
  
  Now, with LSTMs we have time steps and features, e.g. more structure, so we can clearly demarcate separate parallel input time series (features) with multiple lag observations (time steps) for each case (sample) and see if different numbers/types of features and different lengths of time steps result in better or worse models.
  
  Does that help?
  
  Reply
  - Chris March 19, 2020 at 4:25 pm #
    
    Yes that does help. So the learning of the statistical relationships occurs during the training of the model and it uses the many examples which it is given to determine what those relationships are as opposed to necessarily needing to know which feature is associated with which prior time step.
    
    So then would it be safe to say that after the training of an LSTM model it has a good idea of how “significant” each of the features is in determining the output? Say we have one input feature which is largely noise and doesn’t provide valuable information in predicting the output and another feature which is very strongly correlated with the output. So does the model know to weigh the noise feature lightly and the other feature heavily when making its predictions then? If this is the case, is there an actual way to measure these weights once the model is built to see which features have a larger impact on predicted test outputs?
    
    Reply
    - Jason Brownlee March 20, 2020 at 8:41 am #
      
      Yes.
      
      Yes, it learns how to best use the inputs. If the model is well configured/trained/etc.
      
      Yes, the training data must capture the salient properties of the data to be expected in operations for the model to learn what to expect – like noise.
      
      Probably. Feature importance from neural nets is not something I’ve studied, sorry.
      
      Reply
      - Chris March 22, 2020 at 5:16 am #
        
        Understood. Thanks a lot for the responses Jason. I’m looking forward to putting these models into practice!
      - Jason Brownlee March 22, 2020 at 6:59 am #
        
        Good luck!
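On the question of how features stay associated with their time step: in the multiple-lag example the flat row of 24 values is reshaped into [samples, timesteps, features], so the 8 features observed at the same lag end up in the same time-step slice. A small sketch with made-up numbers, assuming the flat row is ordered as t-3, then t-2, then t-1:

```python
import numpy as np

n_hours, n_features = 3, 8
# One made-up sample: 24 values ordered as t-3 (8 features), t-2 (8), t-1 (8)
flat = np.arange(n_hours * n_features, dtype=float).reshape(1, -1)

# Reshape to [samples, timesteps, features]
sample_3d = flat.reshape((flat.shape[0], n_hours, n_features))

print(sample_3d[0, 0])  # the 8 features observed together at t-3
print(sample_3d[0, 2])  # the 8 features observed together at t-1
```

The LSTM then reads one slice per time step, which is the extra structure the reply above refers to.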
Rik Ganju March 20, 2020 at 1:24 pm #

Dear Jason. I’d like to contrast LSTM with Linear Regression. In Linear Regression a regression line is created with some slope using the training set, and that line (or model) is used as such against a test set. Everything about the regression line created in training is invariant, or completely unchanged by the test set; it is just used to make predictions.

Is this analogy also true of the LSTM model created during training? Is it applied in some unvarying way to the test set? Is there something static that emerges as a model after training, that I could imagine as a regression equation? Or is it something dynamic that could be changed dramatically by the test set itself?

Reply
- Jason Brownlee March 21, 2020 at 8:16 am #
  
  Not quite. The LSTM will preserve state across samples, introducing a dependency that influences the prediction. Any comparison must be carefully choreographed in terms of the test harness.
  
  Reply
Otmane March 21, 2020 at 6:28 am #

Dear Jason, thanks for this enriching tutorial, however, is it possible to explain how we can realise the following :

– Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.

or in general, how can we adapt LSTM to predict next value of t+n, given the “expected” t+n values + historical data, (for example weather condition at t+n )

Reply
- Jason Brownlee March 21, 2020 at 8:28 am #
  
  You might need to have two inputs for the model, one for past observations and one for expected observations of other variables.
  
  Learn about multi-input models here:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
  - Otmane March 21, 2020 at 9:56 am #
    
    thanks a lot for your quick reply , and I appreciate your help ^^
    
    Reply
    - Jason Brownlee March 22, 2020 at 6:46 am #
      
      You’re welcome.
      
      Reply
Shih Chunchiang March 24, 2020 at 5:07 pm #

Dear Jason, your example is very useful to me.
A further question:
How can I implement the following:
Given any specific time (date), the model outputs the prediction for that date.

Reply
- Jason Brownlee March 25, 2020 at 6:27 am #
  
  If you are using an LSTM, then you will need to write additional code around the usage of the model that is date-aware. E.g. simple software engineering.
  
  Reply
  - Shih Chunchiang March 25, 2020 at 5:02 pm #
    
    Thank you Jason,
    Another question:
    
    In your multivariate time series forecasting LSTM model, how can I make a rolling prediction? That is, predict t from t-1, predict t+1 from t, predict t+2 from t+1, and so on.
    We know that in your model the prediction contains only one variable (pollution). If a rolling prediction is carried out, how should the other variables be set?
    
    Reply
    - Jason Brownlee March 26, 2020 at 7:50 am #
      
      Good question. See the examples here:
      https://machinelearningmastery.com/start-here/#deep_learning_time_series
      
      You can call model predict with the new inputs as they become available. You might choose to update the model as new data becomes available.
      
      Reply
  - Shih Chunchiang April 8, 2020 at 5:10 pm #
    
    Hello Jason:
    Could the date itself be used as one of the related variables in my multivariate time series forecasting LSTM model, so that
    given any specific time (date), the model outputs the prediction for that date?
    
    Another question is:
    We know that rolling prediction far into the future gradually accumulates error, which degrades the forecast results.
    Are there any strategies to restrain the error?
    
    Reply
    - Jason Brownlee April 9, 2020 at 7:56 am #
      
      No, LSTMs work with contiguous inputs, no dates or times. If you want to work with dates/times, you must write custom code to handle these cases around the model – e.g. an engineering question.
      
      Yes, don’t predict far into the future 🙂
      
      Reply
      - Shih Chunchiang April 9, 2020 at 11:26 am #
        
        Thanks Jason:
        
        Do you have a tutorial which is about the Keras TCN model for timeseries forecast problem?
      - Jason Brownlee April 9, 2020 at 1:13 pm #
        
        What is the “Keras TCN model”?
      - Shih Chunchiang April 9, 2020 at 1:28 pm #
        
        I mean the Temporal Convolutional Network (TCN) in keras.
        Some posts on the Internet claim that TCN is more effective than LSTM for long time series forecasting problems.
      - Jason Brownlee April 10, 2020 at 8:19 am #
        
        Thanks,
        
        Perhaps I can cover it in the future.
Anna March 25, 2020 at 10:04 pm #

Hi, I was wondering why you include ‘var1(t-1)’ in your x-variables. This variable is probably highly correlated with the variable you want to predict ‘var1(t)’, because it’s just the t-1 version. Doesn’t this cause unfair predictions?

Reply
- Jason Brownlee March 26, 2020 at 7:54 am #
  
  It is called an autoregression model where lag obs are used as input.
  
  In machine learning we call it a sliding window:
  https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
  
  Reply
Arjun Bhojani March 26, 2020 at 6:35 am #

Hi sir,
I was using your method for predicting stock prices, but when training the model I am getting a loss of zero.
How can I solve this problem? I have to predict the stock price from previous prices.

Reply
- Jason Brownlee March 26, 2020 at 8:07 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-help-me-with-machine-learning-for-finance-or-the-stock-market
  
  Reply
fturmo March 26, 2020 at 9:30 pm #

In the last part of the tutorial, when predicting the pollution value focusing on multiple previous days, why aren’t you dropping the columns corresponding to the weather conditions of the current day? I mean where it says: “Also note, we no longer explicitly drop the columns from all of the other fields at ob(t).”

Reply
- Jason Brownlee March 27, 2020 at 6:12 am #
  
  I don’t recall offhand, sorry.
  
  Reply
Ferran March 27, 2020 at 12:12 am #

Thanks for the tutorial, it has been very helpful!

Is there any way to improve the accuracy of the model? I’ve applied your model to my data obtaining RMSE=12. The range of my output is between 20-80, so obtaining an RMSE 12 is too large. How could it be reduced?

Reply
- Jason Brownlee March 27, 2020 at 6:14 am #
  
  You’re welcome.
  
  Yes, the tutorials here are focused on improving the performance of deep learning models:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
mohammad March 29, 2020 at 10:16 am #

Thanks for your nice tutorial
I have a question. In all the LSTM docs I have seen, there is an assumption that each time step has only one sample. What about the case where every time step has many different samples with the same features?

Reply
- Jason Brownlee March 30, 2020 at 5:30 am #
  
  I think your idea of samples/timesteps/features is confused.
  
  See this:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
James March 30, 2020 at 7:16 pm #

Thanks! Good stuff! — James

Reply
- James March 30, 2020 at 7:27 pm #
  
  Now that my multivariate time series forecasting with multiple lag inputs code is up and running, is there anything I could see in the way of a prediction? Where do I go from here?
  
  Reply
  - Jason Brownlee March 31, 2020 at 8:04 am #
    
    You can make a prediction by calling model.predict() with the input elements of each sample.
    
    Perhaps this will help:
    https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
    
    Reply
- Jason Brownlee March 31, 2020 at 8:00 am #
  
  Thanks.
  
  Reply
Alex April 1, 2020 at 5:05 pm #

How to find the important features in multivariate time series?

Thanks

Reply
- Jason Brownlee April 2, 2020 at 5:43 am #
  
  Good question, I don’t have tutorials on feature selection for time series – I hope to cover the topic in the future.
  
  Reply
  - Alex April 2, 2020 at 12:03 pm #
    
    Very looking forward! 🙂
    
    Reply
    - Jason Brownlee April 2, 2020 at 1:30 pm #
      
      Thanks.
      
      Reply
      - Alejandra Baena Restrepo August 8, 2023 at 4:04 am #
        
        Hi Jason,
        
        I hope this message finds you well. I wanted to inquire whether you have developed tutorials on feature selection for time series data, perhaps in this codebase?
        
        Thank you,
        Aleja
      - James Carmichael August 8, 2023 at 10:20 am #
        
        Hi Alejandra…The following resource may be of interest to you:
        
        https://machinelearningmastery.com/feature-selection-time-series-forecasting-python/
  - Alejandra Baena Restrepo August 13, 2023 at 10:11 am #
    
    Hi James,
    
    I want to express my gratitude for your response. After careful consideration, I believe that employing Recursive Feature Elimination (RFE) with a RandomRegressor could be an option, but I’m unsure when to apply it. Should I use RFE before using the ‘series_to_supervised’ function on the original 8-feature data, or after applying the function to the data with almost 24 features (n_hours = 3 * n_features = 8)?
    
    Reply
    - James Carmichael August 13, 2023 at 10:20 am #
      
      Hi Alejandra…You are very welcome! The following resources provide best practices of applying RFE.
      
      https://machinelearningmastery.com/rfe-feature-selection-in-python/
      
      https://www.analyticsvidhya.com/blog/2023/05/recursive-feature-elimination/
      
      Reply
Jeremy April 4, 2020 at 8:02 pm #

Hello Jason,

Thank you for the tutorial, it is very helpful. I have a question: do you know what the unit of the pollution is? Is it a concentration of carbon dioxide or something like that? Another question: the prediction is not significantly better than a model that estimates the pollution value at t by the pollution value at t-1, so what is the benefit of using an LSTM here?
sorry for my english, thank again for the .

Reply
- Jason Brownlee April 5, 2020 at 5:43 am #
  
  Yes, PM2.5
  https://en.wikipedia.org/wiki/Particulates
  
  Reply
Abdel April 6, 2020 at 7:45 pm #

Hey Jason, first I want to thank you for all your impressive tutorials.
I also want to know if you have any other tutorial on predicting beyond the train and test datasets.
Thank you.

Reply
- Jason Brownlee April 7, 2020 at 5:44 am #
  
  Thanks.
  
  Yes, and you can also adapt the dataset to make predictions directly.
  
  Start here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  And here:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
Kiyani April 7, 2020 at 1:05 am #

Hi Dr. Jason,

I am having one question regarding the multistep ahead prediction but not using LSTM.

Actually, I am using a single-layer feed-forward neural network (SLFN) to predict the next 1, 2, and 3 samples ahead in a signal with a sampling frequency of 10 Hz. I have a big CONFUSION about training and testing.

How should I train the model to predict those samples ahead on, for example, 70% of the data, and use the rest for testing?

%% Things have been tried so far:

moving_window_length = 5;
single_sample_ahead = 6;

Question # 01: (Training Phase) Samples 1 to 5 are taken to predict the sixth one (single step), then samples 2 to 6 to predict the seventh one. Is this the right approach?

Question # 02: If the procedure in question # 01 is correct, can I then take samples 1 to 5 to predict the 7th or 8th sample, etc. (multi-step ahead), in training?

Question # 03: (Testing Phase) If the above two assumptions are correct, how can I show in testing that my model is predicting 2 or 3 samples ahead (multi-step ahead prediction)?

I am waiting for your kind reply.

Reply
- Jason Brownlee April 7, 2020 at 5:53 am #
  
  You would use walk-forward validation:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
  
  Reply
  - Kiyani April 7, 2020 at 10:33 am #
    
    Thank you very much for answering my query very quickly.
    
    Reply
    - Jason Brownlee April 7, 2020 at 1:29 pm #
      
      You’re welcome.
      
      Reply
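A minimal sketch of the sliding-window framing described in questions 1 and 2 of this thread, assuming a 1-D signal held in a NumPy array. The function name `make_windows` and its `window` and `horizon` parameters are illustrative and not from the tutorial; `horizon=1` gives the single-step case and `horizon=2` targets the sample two steps past the window.

```python
import numpy as np

def make_windows(signal, window=5, horizon=1):
    """Frame a 1-D signal as (inputs, target) pairs.

    Each input row holds `window` consecutive samples; the target is the
    sample `horizon` steps after the end of that window.
    """
    signal = np.asarray(signal, dtype=float)
    n_rows = len(signal) - window - horizon + 1
    X = np.array([signal[i:i + window] for i in range(n_rows)])
    y = np.array([signal[i + window + horizon - 1] for i in range(n_rows)])
    return X, y

signal = np.arange(1, 21)  # stand-in for the 10 Hz samples: 1, 2, ..., 20

# Question 1: samples 1-5 predict the 6th, samples 2-6 predict the 7th, ...
X1, y1 = make_windows(signal, window=5, horizon=1)
# Question 2: samples 1-5 predict the 7th (two steps ahead)
X2, y2 = make_windows(signal, window=5, horizon=2)

# chronological 70/30 split: a time series is not shuffled before splitting
split = int(len(X1) * 0.7)
train_X, test_X = X1[:split], X1[split:]
train_y, test_y = y1[:split], y1[split:]
```

Because the split is chronological, every test window comes after the training windows, which is the ordering walk-forward validation relies on.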
Eddie April 7, 2020 at 2:27 am #

Awesome tutorial Jason. I really appreciate what you have done here. I am just about through the tutorial but I’m stuck at one step that I can’t quite understand. Right before performing the inverse transform, you concatenated yhat with test_X, starting from the second column:

inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)

Is this because the transform was originally done on the dataset where the pollution variable was the first column? I’m guessing the shape of the array needs to match the original in order for the inverse transform to be performed.

Thanks!

Reply
- Jason Brownlee April 7, 2020 at 5:56 am #
  
  Yes, the transform expects a fixed number of columns, and we have to match that. We are only interested in one column; the rest can be rubbish if needed.
  
  Reply
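The column-matching point in this thread can be illustrated with a small sketch. It assumes 8 columns with pollution in column 0 and uses hand-written min-max arithmetic in NumPy as a stand-in for the scaler object, so the helper names `transform` and `inverse_transform` here are local functions, not library calls.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(0, 100, size=(6, 8))  # 8 columns, pollution in column 0

# min-max scaling fitted on all 8 columns (stand-in for the fitted scaler)
col_min, col_max = data.min(axis=0), data.max(axis=0)

def transform(a):
    return (a - col_min) / (col_max - col_min)

def inverse_transform(a):
    return a * (col_max - col_min) + col_min

scaled = transform(data)
test_X = scaled          # pretend these are the scaled test inputs, shape (6, 8)
yhat = scaled[:, :1]     # pretend prediction for column 0, shape (6, 1)

# pad yhat with 7 other columns so the array is 8 wide again
inv_yhat = np.concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = inverse_transform(inv_yhat)[:, 0]  # keep only the pollution column

# the padding values do not matter: zeros give the same column 0
filler = np.zeros((len(yhat), 7))
same = inverse_transform(np.concatenate((yhat, filler), axis=1))[:, 0]
```

Each column is rescaled independently, which is why only the width of the padded array matters and not what the other seven columns contain.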
Miao April 7, 2020 at 6:51 pm #

Hello Jason. I have 100 groups of data, and each group of data is continuous and varies over time. But there are discontinuities between the groups. Can I use LSTM? Looking forward to your reply!

Reply
- Jason Brownlee April 8, 2020 at 7:49 am #
  
  Perhaps. Run some tests and compare results to other methods.
  
  Reply
Josh April 8, 2020 at 5:30 am #

Will this book be updated for TensorFlow 2?

Reply
- Jason Brownlee April 8, 2020 at 7:59 am #
  
  All books use Keras 2.3 running on top of TensorFlow 2.
  
  Reply
Shih Chunchiang April 9, 2020 at 3:28 pm #

Dear Jason:
Deep learning algorithms such as LSTM are only good at nowcasting or short-term forecasting, and not suitable for medium- and long-term forecasting. Do you think so?

Reply
- Jason Brownlee April 10, 2020 at 8:21 am #
  
  No. Probably not good at any time series forecasting, but great at other domains, like NLP.
  
  Reply
mimi April 10, 2020 at 6:35 am #

Hello,
I have almost the same problem as you. My model is a classification model with an output value of 0 or 1, and when I run it my results are:
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('RMSE Test: %.3f' % rmse)
RMSE Test: 0.090
and
scores = model.evaluate(test_X, test_y, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))
Accuracy: 99.19%

Is my model good ???

Reply
Miao April 11, 2020 at 6:48 pm #

Hello Jason.
In this case: model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
How can I get the cell state? How can I get the state of the forget gate, input gate and output gate?
It confuses me.

Reply
- Jason Brownlee April 12, 2020 at 6:17 am #
  
  You can retrieve it, but why do you want it exactly?
  
  Reply
mimi April 14, 2020 at 5:26 am #

Thanks for this example, I have a question in this example we just predict pollution or pollution and observations?

Reply
- Jason Brownlee April 14, 2020 at 6:30 am #
  
  We are predicting pollution.
  
  Reply
  - mimi April 14, 2020 at 7:13 am #
    
    We don’t predict the values of DEWP, TEMP, PRES, cbwd, Iws (cumulated wind speed), Is, Ir?
    
    What if I want to know which input influences the output (pollution) most: is it TEMP, PRES, Is, etc.?
    
    Reply
    - Jason Brownlee April 14, 2020 at 10:36 am #
      
      In this tutorial we let the model discover what is relevant to predicting the pollution for the next time step.
      
      Reply
mimi April 15, 2020 at 2:16 am #

we want to predict pollution at time t, we take into account the values of observations at time t?

Reply
- Jason Brownlee April 15, 2020 at 8:01 am #
  
  In this tutorial, we take them as expected obs, but you can remove them or use real obs. You can frame the problem any way you wish.
  
  Reply
Lean April 15, 2020 at 7:07 am #

Thank you very much. I am starting to learn deep learning and I would like to know if it is possible to calculate Feature Importance for each hour?

Reply
- Jason Brownlee April 15, 2020 at 8:03 am #
  
  Perhaps. I have not done such a thing – some experimentation may be required.
  
  Reply
mimi April 15, 2020 at 9:05 am #

In my case I have real observations at time t and I want to predict the pollution at time t. What exactly should I change in your program? Thank you.

Reply
- Jason Brownlee April 15, 2020 at 1:19 pm #
  
  Sorry, I don’t have the capacity to customize the example for you.
  
  If you are finding the tutorial too advanced, I recommend starting with some of the simpler examples here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Nima Akram April 16, 2020 at 7:13 am #

Hi Jason, this is a great article, you’re a great man for sharing this. I do have one suggestion though. When trying to turn the time-series data into a supervised learning problem, wouldn’t it be easier to just shift the target variable back a step as opposed to lag each of the features? So just do df[target].shift(-1)?

Reply
- Jason Brownlee April 16, 2020 at 1:18 pm #
  
  Thanks!
  
  I believe this will help:
  https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
  
  Reply
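For a single lag, the two framings discussed in this thread produce the same rows, which a short pandas sketch can show. The toy column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "pollution": [10, 20, 30, 40, 50],
    "temp": [1, 2, 3, 4, 5],
})

# Option A: pull the next value of the target up onto the current row
a = df.copy()
a["target"] = a["pollution"].shift(-1)
a = a.dropna()

# Option B: push the features down one row (lag them) and keep the target in place
b = df.shift(1).add_suffix("(t-1)")
b["target"] = df["pollution"]
b = b.dropna()
```

Both frames pair the features at one time step with the pollution value at the next. Lagging the features generalises more directly when several prior time steps are wanted as inputs, since each extra lag is just another shifted copy of the columns.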
AZI April 17, 2020 at 10:01 am #

Hi Jason,

Your tutorials are really helpful. I have also studied your book “Long Short-Term Memory
Networks With Python”. I have a project where in addition to multistep output, I have multi-step input as well.
I have seen all your tutorials for cases with multiple inputs and multiple parallel inputs, but I have found no example where the input is also multistep.

I am struggling with reshaping such data where the input is multistep (100-step forecasts at every timestep). So one time series, for example, has shape 26000 x 100, and I have 200 such multistep input series. Any help on how to proceed will be highly appreciated. Thanks

Reply
- AZI April 17, 2020 at 10:15 am #
  
  Hi Jason,
  
  If you have a book where multistep inputs are used to predict multistep outputs, kindly let me know.
  
  Reply
  - Jason Brownlee April 17, 2020 at 1:30 pm #
    
    Yes, the example in this tutorial can be used as a starting point:
    https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
    
    Reply
- Jason Brownlee April 17, 2020 at 1:30 pm #
  
  Thanks!
  
  Great question, shaping data for LSTMs can be very challenging.
  
  I created this to help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
Senthilkumar Radhakrishnan April 17, 2020 at 11:18 pm #

Hi Dr.Jason,

You have been doing a great work, guiding all those who need help, Keep going

How can I forecast multiple time series?

Let’s say I have to forecast sales of all my branches in different locations. Is it possible to model and forecast them all in the same model, or do we have to use a different model for each store in each location?

Also, what if we have some external factors for each branch, such as delivery charge, how busy the branch location is, and so on?

I got this from the Walmart problem on Kaggle …

Can you share your knowledge on this? If you do, I will be very grateful.

Thank you ,

Reply
- Jason Brownlee April 18, 2020 at 5:59 am #
  
  Thanks.
  
  Call model.predict() to make a forecast, more help here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  And here:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
  - Senthilkumar Radhakrishnan April 20, 2020 at 5:51 pm #
    
    Thank you for your reply Jason,
    
    I did not mean LSTM as the solution. I am asking how we could solve this kind of problem.
    How should multiple time series be handled?
    
    Thank you.
    
    Reply
    - Jason Brownlee April 21, 2020 at 5:49 am #
      
      Good question, this will give you ideas:
      https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
      
      Reply
      - Senthilkumar Radhakrishnan April 21, 2020 at 2:05 pm #
        
        Thanks for making it clear .. Continue your work beyond limits.
      - Jason Brownlee April 22, 2020 at 5:47 am #
        
        You’re very welcome.
mimi April 20, 2020 at 11:36 am #

Hi jason
I already asked you this question and I have looked everywhere, but I can’t find the solution. Please can you help me? I’m sorry for the inconvenience.

In my case I have real observations (temp, press, etc.) at time t and I want to predict the pollution at time t. What exactly should I change in your program?

Thank you

Reply
- Jason Brownlee April 20, 2020 at 1:21 pm #
  
  Sorry, I don’t have the capacity to prepare a customized version of the tutorial for you.
  
  One approach might be to have a multiple input model with one input being the sequence of obs from the past and the other input the input for the current time step. This will help:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
sarrita April 21, 2020 at 5:54 am #

Hi Dr.Jason,It is a great work,
I have two questions:
1- the data, should we leave them in chronological order or we can mix the lines (if we have an output that takes the same value for a long time)
2- to code a simple RNN or GRU model, we just replace the word LSTM with RNN and GRU?

Reply
- Jason Brownlee April 21, 2020 at 6:07 am #
  
  Thanks!
  
  For time series, the order of samples/observations matters.
  
  Yes.
  
  Reply
sarrita April 21, 2020 at 7:22 am #

Thank you Jason,
one last question, does an LSTM model with timestep = 1 become a Simple RNN?

Reply
- Jason Brownlee April 21, 2020 at 7:45 am #
  
  Not quite. It almost becomes an MLP, although it shares state across samples in a batch.
  
  Reply
Othmane April 21, 2020 at 9:45 am #

Hi jason,

Good job!!!

I want to ask you :when we set timestep = 1 it means that the model will just remember the previous state?

Reply
- Jason Brownlee April 21, 2020 at 11:44 am #
  
  Thanks!
  
  Regardless of the time steps, the model preserves states across samples in a batch.
  
  Reply
Othmane April 21, 2020 at 10:37 pm #

For an RNN we also use a timestep, what exactly is this time step?Thanks

Reply
- Jason Brownlee April 22, 2020 at 5:56 am #
  
  Good question, I answer it here:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
FATI April 23, 2020 at 8:00 am #

Hi Jason,

In my example, when I take timestep=1 it’s OK: val-loss=13%, val-accuracy=95%.
But with timestep=10, val-loss=90%. Why? Thank you

Reply
- Jason Brownlee April 23, 2020 at 1:33 pm #
  
  No idea. We don’t have good enough theories of neural nets to answer “why” questions. This is why we run experiments.
  
  Reply
Othmane April 23, 2020 at 10:16 am #

Hi jason ,

For a classification problem that does not depend on chronological time, how can we apply an LSTM?

Reply
- Jason Brownlee April 23, 2020 at 1:35 pm #
  
  It is not an appropriate model unless you have a sequence. See this:
  https://machinelearningmastery.com/when-to-use-mlp-cnn-and-rnn-neural-networks/
  
  Reply
Sunil Kumar April 24, 2020 at 12:11 am #

Hi Jason,

Its a great post.

Could you help me with how to forecast future values using a multivariate LSTM?

As it is multivariate and we need all the features used in the model for the future dates, I am confused about how to achieve this.

Reply
- Jason Brownlee April 24, 2020 at 5:45 am #
  
  Yes, call predict()
  
  More here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  And here:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
  - Sunil Kumar April 24, 2020 at 4:57 pm #
    
    Hi Jason,
    My question is how to predict the future Time Series values.
    I don’t have the future values of all the features that i have used, without them how can I use the predict()???
    
    Reply
    - Jason Brownlee April 25, 2020 at 6:40 am #
      
      You don’t need future values to make a prediction, you are predicting future values.
      
      The input to the predict() function are the values that you have available, e.g. the last observations in the sequence in order to predict beyond the sequence.
      
      Reply
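The last reply in this thread says the input to predict() is the most recent observations that are available. A minimal sketch of shaping them follows, assuming the scaled history is a 2-D NumPy array with the oldest row first; `n_hours`, `n_features` and the `history` array are illustrative, and the model itself is not defined here.

```python
import numpy as np

n_hours, n_features = 3, 8

# stand-in for the scaled observations, oldest row first
history = np.arange(10 * n_features, dtype=float).reshape(10, n_features)

# the most recent n_hours rows become one sample: [samples, timesteps, features]
last_window = history[-n_hours:].reshape((1, n_hours, n_features))

# yhat = model.predict(last_window)  # model: a fitted Keras LSTM, not defined here
```

No future feature values are involved: the window ends at the last known observation and the model output is the forecast for the step after it.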
adam April 24, 2020 at 1:16 am #

Hi Jason,
It is a very good tutorial. I just have a question concerning the activation function: it is not mentioned in your model.

Reply
- Jason Brownlee April 24, 2020 at 5:48 am #
  
  We are using the default activation functions for the lstm, sigmoid and tanh, and a linear activation for the output layer.
  
  Reply
julie April 24, 2020 at 6:21 am #

Hello Jason,

Can the sequence be the values of the model’s inputs if the number of inputs is greater than 1?

Reply
- Jason Brownlee April 24, 2020 at 8:01 am #
  
  Sorry, I don’t follow, can you please restate or elaborate your question?
  
  Reply
julie April 24, 2020 at 10:10 am #

For a classification problem with multiple inputs, where the result Y depends on these inputs, are the inputs the sequence?

Reply
- Jason Brownlee April 24, 2020 at 11:04 am #
  
  Yes, an RNN takes a sequence as an input for each sample.
  
  Reply
Sam April 26, 2020 at 5:36 am #

Hello Jason,
I ran your model with the provided code. When I plot test_y against the predicted yhat, I see a prediction that is one step ahead (at least it seems so). I can’t explain this behaviour. I included two images for your consideration.
1. When I plot normally, as below: https://ibb.co/VTWq9Yn
pyplot.plot(yhat[:100], label='Pred')
pyplot.plot(test_y[:100], label='True', alpha=0.7)

2. When I plot with yhat moved one step ahead, as below: https://ibb.co/6XsyMRQ
pyplot.plot(yhat[1:101], label='Pred')  # why?
pyplot.plot(test_y[:100], label='True', alpha=0.7)

is there any explanation?

Reply
- Jason Brownlee April 26, 2020 at 6:22 am #
  
  Yes this is common and suggests the model has no skill:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
  
  Reply
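One way to check whether a forecast that trails the true series has any skill is to compare its error with a persistence forecast, where the value at t-1 is used as the prediction for t. A minimal sketch with made-up pollution values; the `rmse` helper is defined locally for illustration.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

obs = np.array([30.0, 32.0, 35.0, 33.0, 40.0, 42.0, 41.0])  # made-up values

actual = obs[1:]        # value at time t
persistence = obs[:-1]  # naive forecast: the value at time t-1

baseline_rmse = rmse(actual, persistence)

# model_rmse = rmse(inv_y, inv_yhat)  # from the tutorial's evaluation step
# the model shows skill only if model_rmse is clearly below baseline_rmse
```

If the model's RMSE on the test set is about the same as the baseline, the network has most likely learned to copy its last input.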
Ming April 27, 2020 at 12:46 am #

With your approach, if I want to predict the next 10 values, must I predict them one by one?

Reply
- Jason Brownlee April 27, 2020 at 5:36 am #
  
  You can if you want. Or, you can define a model that predicts 10 timesteps directly, see this:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
  - Ming April 27, 2020 at 12:42 pm #
    
    Sorry for the confusing explanation! I mean that the reframed data has the shape (var1(t-1) var2(t-1) var3(t-1) var4(t-1) var5(t-1) var6(t-1) var7(t-1) var8(t-1) var1(t)), where var1(t-1) is the last pollution value. But we don’t know it, since it is what we want to predict, so how can I get the last pollution value? That is why I asked in my last comment whether I must predict the next 10 values one by one: each predicted pollution value would serve as var1(t-1) for the next prediction, and so on.
    
    Reply
    - Jason Brownlee April 28, 2020 at 6:38 am #
      
      There are many ways to predict 10 time steps ahead, and you must discover the best way for your model and dataset.
      
      Here are 4 possible approaches:
      https://machinelearningmastery.com/multi-step-time-series-forecasting/
      
      Reply
      - Ming April 28, 2020 at 11:36 am #
        
        thank you very much ! jason
      - Jason Brownlee April 28, 2020 at 1:23 pm #
        
        You’re welcome.
Mohamed Nedal April 28, 2020 at 12:32 pm #

Dear Dr. Jason,
Thank you for your fantastic explanation. I have a question please.

I’m trying to use this code with another dataset, but it doesn’t predict the variable that should be predicted. I have no idea how to fix it. How can I send you the dataset?

Another question is how can I modify this code to work with a different number of features or inputs, say 10 inputs, and predict one variable?

Thank you and I look forward to hearing from you.

Reply
- Jason Brownlee April 28, 2020 at 1:24 pm #
  
  You’re welcome.
  
  Sorry, I don’t have the capacity to review/debug your code example. Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  This will help regarding how to understand lstm inputs:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
amirreza April 29, 2020 at 6:57 pm #

Thank you for your great post. May I ask what I can do if I need the outputs of a neural network to be integers? Is it acceptable to just apply a round function to the output array, or should the network itself be able to provide integers? My training data labels are integers, but the network still does not predict integers.

Reply
- Jason Brownlee April 30, 2020 at 6:40 am #
  
  Perhaps scale the data first, then convert the predicted numbers back to integers.
  
  Alternately, use a one hot encoding for your integers.
  
  Reply
khandu May 4, 2020 at 10:40 pm #

X_test.shape
(3592, 7, 4)

# make a prediction
yhat = model.predict(X_test)
X_test = X_test.reshape((X_test.shape[0], X_test.shape[2]))
X_test = scaler.inverse_transform(X_test)

ValueError: cannot reshape array of size 100576 into shape (3592,4)

I am stuck with the above error. Can anyone help me please…Thank You

Reply
- Jason Brownlee May 5, 2020 at 6:29 am #
  
  Sorry to hear that, perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
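The reshape error in this thread comes down to element counts: an array of shape (3592, 7, 4) holds 100576 values and cannot fit into (3592, 4). A sketch of two possible ways around it, using a zero array of the same shape as a stand-in; which one is right depends on how the scaler was fitted, and the 4-column scaler mentioned in the comments is an assumption.

```python
import numpy as np

X_test = np.zeros((3592, 7, 4))  # same shape as in the comment
print(X_test.size)               # 3592 * 7 * 4 = 100576

# (3592, 4) only has room for 14368 values, hence the ValueError.

# Option 1: flatten every timestep into the column axis
flat = X_test.reshape((X_test.shape[0], X_test.shape[1] * X_test.shape[2]))

# Option 2: keep only the most recent timestep of each sample
# (4 columns wide, assuming the scaler was fitted on 4 columns)
last_step = X_test[:, -1, :]
```

With more than one timestep, `X_test.shape[1] * X_test.shape[2]` is the width to reshape to, not `X_test.shape[2]` alone.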
Ay May 5, 2020 at 3:48 pm #

Hi Jason, thanks for your great effort.
If we provide the future weather parameters (from the weather forecast) as input, will this improve the accuracy of the pollution predictions? if yes, I would appreciate it if you give me some hints to write the code. Thanks

Reply
- Jason Brownlee May 6, 2020 at 6:21 am #
  
  You have to run the experiment to discover the answer.
  
  Reply
Tinto Raj May 7, 2020 at 5:41 pm #

Hi Jason,
I am a beginner in machine learning. I am making a model which contains 10 parameters. The input of the model is 10 parameters with 8 lagged timesteps, so X contains 80 columns. The output is the 10th parameter over 8 lead timesteps, i.e. y contains 8 columns. How can I inverse transform the predicted values?

Reply
- Jason Brownlee May 8, 2020 at 6:25 am #
  
  The inverse transform on the predictions can be done manually or can be done using the same object that prepared the transform. The input to the scaler object must have the same shape.
  
  Reply
Tinto Raj May 7, 2020 at 6:05 pm #

Hi Jason,
Why are we concatenating test x with yhat before inverse transforming?

Reply
- Jason Brownlee May 8, 2020 at 6:26 am #
  
  To ensure the input to the scaler has the same shape as when we transformed the data – a requirement.
  
  Reply
  - Tinto Raj May 9, 2020 at 5:22 pm #
    
    But in the multi-lag timestep example, we convert to supervised after scaling. So how can test X concatenated with test y have the same shape?
    During scaling the data has only 8 columns, while test X and test y together have 24 + 8 columns, which is what we use for inverse scaling. How, then, are the shapes the same?
    
    Reply
    - Jason Brownlee May 10, 2020 at 5:57 am #
      
      Perhaps confirm your assumptions.
      
      Reply
      - Tinto Raj May 10, 2020 at 4:15 pm #
        
        So what I am saying isn’t correct? Then could you clarify the scenario?
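On the shape question in this thread: as far as I can tell, the multi-lag example does not pass all 24 lagged columns to the scaler. It flattens test_X to 2-D and concatenates yhat with only the last 7 columns, which brings the width back to 8. A sketch of that bookkeeping with random stand-in arrays; the names `flat_X` and `inv_input` are illustrative.

```python
import numpy as np

n_hours, n_features = 3, 8
n_samples = 5
rng = np.random.default_rng(1)

test_X = rng.random((n_samples, n_hours, n_features))  # stand-in for scaled 3-D test inputs
yhat = rng.random((n_samples, 1))                     # stand-in for model.predict(test_X)

flat_X = test_X.reshape((n_samples, n_hours * n_features))  # (5, 24)

# 1 predicted column + the last 7 feature columns = 8,
# the width the scaler was fitted on
inv_input = np.concatenate((yhat, flat_X[:, -(n_features - 1):]), axis=1)
```

Only column 0 of the inverse-transformed result is kept afterwards, so the seven padding columns serve purely to satisfy the scaler's expected width.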
Onur May 11, 2020 at 6:15 am #

Hi Jason ,

I applied your sample for my dataset but I get the following error in the prediction section.
can you help ?

ValueError: operands could not be broadcast together with shapes (218,3) (2,) (218,3)

Reply
- Jason Brownlee May 11, 2020 at 1:32 pm #
  
  I’m sorry to hear that.
  
  The error suggests that the shape of your data does not match your model.
  
  You can change the shape of the data to match the expectations of the model or change the model to match the shape of your data.
  
  Reply
  - Onur June 1, 2020 at 2:41 am #
    
    Hi Jason ,
    
    How can I solve this problem?
    
    Can you help me?
    
    Reply
    - Jason Brownlee June 1, 2020 at 6:26 am #
      
      Yes, my previous answer suggested what to do.
      
      If you are new to numpy arrays, perhaps start here:
      https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
      
      Reply
      - Onur June 5, 2020 at 6:15 pm #
        
        Thanks John.
        I solved the problem. There was an error due to the code below.
        
        # drop columns we don’t want to predict
        reframed.drop(reframed.columns[[5,6,7]], axis=1, inplace=True)
      - Jason Brownlee June 6, 2020 at 7:47 am #
        
        Happy to hear that you fixed your issue.
Michael Hopwood May 11, 2020 at 5:07 pm #

Hi Jason,

It seems that when utilizing multiple features, you disregard the parameters pattern through time. This is because with multiple parameters, the “sequence” (normally a sequence of values of one parameter through time) becomes a “sequence of parameter values”.

You describe the shape of the input data as (samples, timesteps, features) when normally LSTMs have shape (batch_size, time_steps, seq_len). I worry that this application does not consider “pattern through time” but “pattern across parameters”.

Could you comment on this?

Thanks!

Reply
- Jason Brownlee May 12, 2020 at 6:38 am #
  
  It is normal to feed multiple “features” into an LSTM. It is unusual to have a separate layer for each feature.
  
  Perhaps this will help to understand features:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
Alaa May 13, 2020 at 3:58 am #

Dear Jason ,
From your code I understand that you are doing a one-step forecast. That means that, given features at lag t-1, you predict your target at t.

My question is: during the test, is there walk-forward validation? If the model predicts one step ahead (for example t), does it use that prediction of t or the real value of t to predict t+1?
Thank you.

Reply
- Jason Brownlee May 13, 2020 at 6:43 am #
  
  Yes.
  
  Reply
  - Alaa May 13, 2020 at 8:02 am #
    
    I didn’t get you ! Are you using the predicted or the real value to forecast one step ahead?
    
    Reply
    - Jason Brownlee May 13, 2020 at 1:21 pm #
      
      We are making a prediction then comparing the predicted value to the expected value.
      
      You can learn more about walk-forward validation here:
      https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
      
      Reply
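A minimal sketch of one-step walk-forward validation, to make the predicted-versus-real distinction in this thread concrete. The function name `walk_forward` and the `predict_fn` callback are hypothetical; a persistence forecast stands in for a fitted model.

```python
import numpy as np

def walk_forward(series, n_test, predict_fn):
    """One-step walk-forward validation.

    predict_fn(history) returns a forecast for the next step. After each
    forecast, the real observation is appended to the history.
    """
    series = list(series)
    history = series[:-n_test]
    predictions = []
    for actual in series[-n_test:]:
        predictions.append(predict_fn(history))
        history.append(actual)  # the true value, not the prediction
    return np.array(predictions)

series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
preds = walk_forward(series, n_test=3, predict_fn=lambda h: h[-1])
```

In the tutorial the test inputs are built from observed lagged values, so each one-step prediction is likewise conditioned on real observations at t-1 and not on earlier predictions.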
Jules May 14, 2020 at 10:34 am #

Hi Jason.
Very interesting code, and useful as well!
I’m working on LSTM with supermarket data in order to forecast sales.
Is there a way I can train an LSTM with n products instead of just one product at a time? Or what kind of strategy do you suggest for that problem?
Regards

Reply
- Jason Brownlee May 14, 2020 at 1:28 pm #
  
  Thanks.
  
  Yes, this will give you ideas, replace “sites” with “products”:
  https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
  
  Reply
Ashish Shrestha May 16, 2020 at 4:06 pm #

Hi, thank you for the wonderful post. I have a question. Will the shape of the train and test sets change after converting the time series into a supervised learning problem? I have 599 records in the test set, but after converting it the input has 587 rows. The shape of the train set also changes. Is this expected, or am I going wrong?

Thank you

Reply
- Jason Brownlee May 17, 2020 at 6:29 am #
  
  Yes. Learn about how to reshape data for LSTMs here:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
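The row loss described in this thread is expected: every lag shifts in a row of NaN values at the start of the frame, and those rows are dropped. A small pandas sketch of the count follows; `supervised_rows` is a hypothetical helper, and the choice of 12 lags is an assumption that happens to reproduce the 599 to 587 figure.

```python
import pandas as pd

def supervised_rows(n_rows, n_in, n_out=1):
    """Rows left after shifting by n_in lags and n_out leads and dropping NaNs."""
    s = pd.Series(range(n_rows), dtype=float)
    cols = [s.shift(i) for i in range(n_in, 0, -1)]  # t-n_in ... t-1
    cols += [s.shift(-i) for i in range(n_out)]      # t ... t+n_out-1
    return len(pd.concat(cols, axis=1).dropna())

print(supervised_rows(599, n_in=12))  # 587
```

In general the frame loses `n_in + n_out - 1` rows, so a one-lag, one-step framing loses just a single row.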
Celine May 17, 2020 at 8:00 am #

Hi Jason! Thanks a lot for your tutorials. I have another question related to this post:

You mentioned this is a possibility as well:

Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.

This is exactly what I need to do. Could you describe how you would do that? I don’t really know how to process/transform that “expected” information to an input

Reply
- Jason Brownlee May 18, 2020 at 6:05 am #
  
  Yes, try a multi-input model with an LSTM input for the historical data and a vector input for expected conditions.
  
  This will help:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
Vaibhav May 20, 2020 at 12:51 pm #

Hi Jason,

Your tutorials are great!

I am looking for a way to convert samples of data into high resolution signals.

Like for example, I take out my motorcycle from home to office and then back every day, and record certain parameters of the ride at 1 Hz frequency. I have a lot of this kind of data multiple rides. I want to train a model that can use this data to redraw a whole ride if given only certain snapshots (at say per 10 min frequency) of data from a new but similar ride.

Can I train an LSTM to take 2 samples 10 mins apart, and predicts points between them?

Reply
- Jason Brownlee May 20, 2020 at 1:36 pm #
  
  You can resample the data directly; I don’t think a model is required:
  https://machinelearningmastery.com/resample-interpolate-time-series-data-python/
  
  Reply
  - Vaibhav May 28, 2020 at 5:58 pm #
    
    Thanks Jason. I had a look at different applications for resampling and it does not seem to fit for my purpose.
    
    What I am looking for is similar to the work published in the link below, just not for sound files but for ride profiles. Sort of like reconstructing a ride profile using sample data and previous known full ride profiles.
    
    Any advice in that direction would be very helpful. Thanks a ton.
    
    https://kuleshov.github.io/audio-super-res/
    
    Reply
    - Jason Brownlee May 29, 2020 at 6:26 am #
      
      Perhaps contact the author of the post directly?
      
      Reply
Carlos May 20, 2020 at 8:16 pm #

Hi Jason,

thanks a lot for this fruitful tutorial!

I’m wondering if it’s possible to have binary variables in our multivariable LSTM time series problem in addition to the others.

Reply
- Jason Brownlee May 21, 2020 at 6:16 am #
  
  Yes.
  
  Reply
Aditi May 23, 2020 at 10:53 pm #

Can you tell me how to predict the values for future dates (on a new set of dates) for multivariate time series forecasting in LSTM ?

Reply
- Jason Brownlee May 24, 2020 at 6:08 am #
  
  Yes, see this:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  And this:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
Amir May 24, 2020 at 6:45 pm #

Hi Jason,

I have plenty of sensors sending data to the things network. I want to develop a time series prediction model that takes these data, do predictions and publish results. I want this model to be online, so it can store data, train itself every day and do predictions for the next day. Can I do something like that as a web application?

I have seen that IoT platforms like AWS can do it with Python, but for me as a student they are expensive 🙂 I want to use something free.

Reply
- Jason Brownlee May 25, 2020 at 5:46 am #
  
  I don’t see why not.
  
  Perhaps write custom code and use a custom server.
  
  Reply
Florian May 26, 2020 at 1:40 am #

Dear Mr. Jason Brownlee,

I have understood how to predict a value y out of an existing dataset with multivariate input X. But if I have a time series from t-100 to t, how can I forecast y(t+10) without having X(t+10)? Is it possible with LSTM?
I “built” a scenario with a machine which needs maintenance regularly every 100 hours. When the load is above a specific level it needs to be maintained earlier. Also, if some vibrations are measured, the maintenance time will be earlier. I produced test data with a periodic usage time and all relevant data. My model hits the right point. But I don’t know how to “look into the future”.

Thank you very much

best regards
Florian

Reply
- Florian May 26, 2020 at 1:43 am #
  
  Forgot to say: I set the time until maintenance back to 100 hours after having a value below 0. This is the point I want to predict, and this works well in the past.
  
  Reply
- Jason Brownlee May 26, 2020 at 6:27 am #
  
  Yes, you can frame the prediction problem any way you wish based on the inputs you have at prediction time and the outputs you need at prediction time. However, the model may or may not give good predictions.
  
  Reply
Vu Nguyen May 26, 2020 at 4:39 am #

Hi Jason,

Thank you for taking your time and effort to put together an excellent tutorial as always. I personally learned a lot from you.

I have to deal with a similar problem as air pollution, except I have another dimension “Subsurface Depth”. I have sensor data along the depth and time. From sensor data, I can extract engineer features so it would be a multivariate time series problem. So, my objective is train my model to detect anomalous events along the depth and time.

Would you give me your advice on how to deal with this problem? I would really appreciate your help.

Reply
- Jason Brownlee May 26, 2020 at 6:31 am #
  
  You’re welcome.
  
  I recommend testing a suite of different models and data preparation methods to discover what works best for your dataset.
  
  Reply
  - Vu Q Nguyen May 26, 2020 at 12:23 pm #
    
    Jason,
    
    I intended to use LSTM autoencoder to deal with my problem because I have built a sparse autoencoder to deal with a similar problem without dealing with time series. So, it makes sense for me to continue with LSTM autoencoder and/or different statistical approaches to deal with the time series.
    
    I just have a hard time preparing the time-series matrix for my problem. It’s similar to air pollution in a sense if I only look at my data at specific depth of sensor deployment. However, I have more than 18,000 sensors installed from surface to subsurface, so my data is tremendously bigger than air pollution data. Do you think it’s still applicable to use LSTM, and if it is, how do I set up the time series matrix?
    
    Should I set up my dataframe like this if I still want to use LSTM: date time as the index, the depths as the columns, and the sensor measurements as the values?
    
    Thanks again Jason.
    
    Reply
    - Jason Brownlee May 26, 2020 at 1:21 pm #
      
      Perhaps prepare as separate feature arrays and combine using dstack or equivalent.
      
      You may need to experiment with some contrived examples until you get your desired effect.
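
A minimal sketch of the dstack suggestion, assuming each feature is already held as its own 2D array shaped [samples, timesteps]. The array names and sizes are made up for illustration:

```python
import numpy as np

# Hypothetical example: two features, each already framed as [samples, timesteps].
n_samples, n_timesteps = 4, 3
feature_a = np.arange(n_samples * n_timesteps, dtype=float).reshape(n_samples, n_timesteps)
feature_b = feature_a * 10.0

# dstack joins the 2D arrays along a new third axis,
# giving the [samples, timesteps, features] layout an LSTM expects.
X = np.dstack((feature_a, feature_b))
print(X.shape)  # (4, 3, 2)
```

With one array per depth, the same call would give one feature per depth, which is why trying it on a small contrived case first is worthwhile.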
      
      Reply
Rouzbeh May 27, 2020 at 3:14 am #

Hello Jason,
Thanks for your incredible tutorial.
Suppose after this implementation, we wanna compare this LSTM with SVM (as an example).
I use the train_X, train_y, test_X, test_y which we made before reshaping to 3D [samples, features]
I cannot rescale the output of SVM to original values by the scaler we made for LSTM. I got ValueError: Expected 2D array, got 1D array instead:

In other words, how can I do this process for the output of SVM:
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]

Reply
- Jason Brownlee May 27, 2020 at 8:01 am #
  
  Compare based on error in prediction on the same dataset with the original scale.
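
One likely cause of the shape error (an assumption, since the full traceback is not shown) is that scikit-learn regressors return predictions as a 1D array, whereas the Keras model returns a single column. A rough sketch with stand-in random arrays instead of a fitted SVM:

```python
import numpy as np

# Stand-in values: 5 samples, 8 scaled columns (target first).
test_X = np.random.rand(5, 8)   # already 2D, so no reshape from 3D is needed
yhat = np.random.rand(5)        # 1D, like the output of a scikit-learn regressor

# Make the predictions a single column so they can be joined with the other columns.
yhat = yhat.reshape((len(yhat), 1))
inv_yhat = np.concatenate((yhat, test_X[:, 1:]), axis=1)
print(inv_yhat.shape)  # (5, 8)
```

After this step, inv_yhat has the same number of columns the scaler was fit on, so scaler.inverse_transform(inv_yhat) can be applied as before.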
  
  Reply
Giselle May 28, 2020 at 12:30 pm #

Hi Jason,

Thank you for the tutorial 😀

I have a question concerning the feature we’re trying to predict: Pollution. In the first code I could see that we’re predicting the Pollution since we dropped all the columns at (t) except the first one, which is the Pollution. However, I couldn’t understand it in the second code where you used the past 3 hours to predict the Pollution value of the next hour. Could you please explain that to me?

Reply
- Jason Brownlee May 28, 2020 at 1:26 pm #
  
  Sure, what is the problem exactly?
  
  Reply
  - Giselle May 28, 2020 at 1:52 pm #
    
    It was easy to notice that the output is Pollution at (t) in the first code since you dropped the unnecessary columns but in the second code it is not.
    I couldn’t see in which part of the code it is noted that the output is the feature Pollution.
    
    Otherwise, If I would like to predict (t) and (t+1) what should I do ?
    
    Thank you
    
    Reply
    - Jason Brownlee May 29, 2020 at 6:19 am #
      
      If you are finding the example challenging, perhaps start with the simpler examples here and adapt them for your needs:
      https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
      
      Reply
      - Giselle June 3, 2020 at 3:31 am #
        
        Thank you
      - Jason Brownlee June 3, 2020 at 8:03 am #
        
        You’re welcome.
Samrat May 28, 2020 at 6:36 pm #

Hello.
Can you please explain the meaning of the parameter ‘batch_size’ and ‘verbose’ in the following line of code :

history = model.fit(train_X, train_y, epochs=50, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)

Thank You.

Reply
- Jason Brownlee May 29, 2020 at 6:28 am #
  
  Batch size defines the number of samples used to estimate the gradient before weights are updated and state is reset:
  https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
  
  Verbose controls what is displayed to the console during training, in this case a one line summary for each epoch:
  https://machinelearningmastery.com/faq/single-faq/what-does-verbose-mean-in-keras
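
To make the effect of batch_size concrete, here is a small arithmetic sketch. It assumes 8,760 training samples (one year of hourly observations); substitute the length of your own train_X:

```python
import math

n_samples = 8760   # assumed: one year of hourly observations
batch_size = 72
epochs = 50

# One weight update happens per batch, so this is the number of updates per epoch.
updates_per_epoch = math.ceil(n_samples / batch_size)
# The final batch holds whatever samples are left over.
last_batch_size = n_samples - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, last_batch_size, total_updates)  # 122 48 6100
```

A smaller batch_size means more, noisier updates per epoch; a larger one means fewer, smoother updates.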
  
  Reply
Robert May 29, 2020 at 6:38 pm #

Hi Jason,

I have a question to be confirmed or denied.

When I read all this plus the code, is it true that with those approaches I can only predict the very next data point? I assume that because I see only one neuron as output.

That would mean, in reality, to predict the 2nd data point, I would have to use the 1st (predicted) data point for lag calculation, which then probably won’t work so well.

Can you advise me: Is there any kind of neural network that performs OK at predicting multiple steps ahead (seq2seq probably?) AND allows the use of external features, be it as multivariate inputs or somehow differently?

I saw a video from Uber where they used a seq2seq approach which then somehow fed into an MLP that was combined with external features, but there was very little information about it.

I would be happy for any advice.

Reply
- Jason Brownlee May 30, 2020 at 5:55 am #
  
  Yes. By design. You can change the model to model any framing of the problem you wish.
  
  I have tens of examples you can see on the blog for multi-step forecasting, perhaps start here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Or here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Subhas May 30, 2020 at 1:48 am #

Hi,

Thanks for the awesome tutorial.
In this tutorial we are forecasting only one time step ahead in the future, but how can I extend it to forecast multiple time steps into the future using the predicted results?

Reply
- Jason Brownlee May 30, 2020 at 6:07 am #
  
  See this:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
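
As a rough illustration of feeding predictions back in as inputs (the recursive strategy), here is a univariate sketch. The predict_one_step function is a hypothetical stand-in for a call to model.predict(), not part of the tutorial:

```python
import numpy as np

def predict_one_step(window):
    # Hypothetical stand-in for model.predict(): forecasts the mean of the window.
    return float(np.mean(window))

def recursive_forecast(history, n_lags, n_steps):
    # Start from the last n_lags known observations.
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(n_steps):
        yhat = predict_one_step(window)
        forecasts.append(yhat)
        # Drop the oldest value and append the prediction as the newest "observation".
        window = window[1:] + [yhat]
    return forecasts

history = [1.0, 2.0, 3.0, 4.0]
forecasts = recursive_forecast(history, n_lags=2, n_steps=3)
print(forecasts)  # [3.5, 3.75, 3.625]
```

In a multivariate setting the other input variables at the future steps are also needed (known, forecast, or assumed), and errors tend to accumulate as predictions are reused.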
  
  Reply
ikram May 31, 2020 at 10:52 am #

Thanks for the awesome tutorial.
Please, I have a question:
How can I apply LSTM if I have just the years in the date label?

Reply
- Jason Brownlee May 31, 2020 at 1:23 pm #
  
  You’re welcome.
  
  The year/date is removed from the data, the model only learns from the observations.
  
  Reply
  - ikram June 1, 2020 at 11:33 am #
    
    thank you for your reply
    
    No, I mean my own dataset contains just years in the date label, not year, month, hours …
    How do I split this dataset in order to apply LSTM?
    
    Reply
    - Jason Brownlee June 1, 2020 at 1:42 pm #
      
      Start here:
      https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
      
      Then here:
      https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
      
      And here:
      https://machinelearningmastery.com/start-here/#deep_learning_time_series
      
      Reply
Diego June 2, 2020 at 3:31 am #

Hi Jason,
Thank you so much for your contribution, your posts are awesome!
I am a newbie to DL.
Abstract question.
I adapted this script to predict the VIX Index.
Loss function comparison looks great.
I plot actual vs predicted (inv_yhat vs inv_y): their volatility is totally different and the numbers (both series) do not match the original prices.
They seem to be in a different scale.
I am stuck.
What do you think could be happening?
Thank you so much for your time.
Best,

Reply
- Jason Brownlee June 2, 2020 at 6:21 am #
  
  Perhaps try data preparation prior to modeling, such as data scaling.
  
  Reply
Giselle June 3, 2020 at 4:08 am #

Hi Jason!
As I can notice, two features were deleted by the end of the code. At the beginning inv_yhat has 8 features; by the end of the code it has only 6 features. Did I miss something?
Thank you

Reply
- Jason Brownlee June 3, 2020 at 8:03 am #
  
  Yes, this is described in the data preparation section. Perhaps re-read that section.
  
  Reply
  - Giselle June 6, 2020 at 12:38 pm #
    
    Thank you. I confused it with the multivariate and multi-step code I’m working on.
    I was a little bit confused about the shapes and I want to know if they are alright.
    
    So, I used: n_out=6,
    and I have: test_X.shape = (50, 48, 8) where: n_hours = 48, and n_features = 8.
    
    I used the “inverse_transform()” function to get inv_y and inv_yhat.
    
    When I calculated the shapes of inv_y and inv_yhat, I got this:
    
    inv_y.shape = (50, 6)
    inv_yhat.shape = (50, 6)
    
    Does it make sense? Is it correct?
    
    Reply
    - Jason Brownlee June 6, 2020 at 1:28 pm #
      
      This may help in understanding the shapes:
      https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
      
      When using a sklearn transform object, the input data must have the same shape when calling transform() and inverse_transform(). If this is a pain, you can use some custom code to do the same thing.
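
A minimal sketch of what such custom code could look like, assuming min-max scaling to [0, 1] with the target in column 0. The data and the helper names scale and invert_column are made up for illustration:

```python
import numpy as np

# Made-up training data: 4 rows, 3 columns, target in column 0.
train = np.array([[10.0, 1.0, 100.0],
                  [20.0, 2.0, 200.0],
                  [30.0, 3.0, 300.0],
                  [50.0, 5.0, 500.0]])

# Per-column minimum and range, as a min-max scaler to [0, 1] would learn them.
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min

def scale(data):
    # Scale every column of a full-width array to [0, 1].
    return (data - col_min) / col_range

def invert_column(scaled_values, col):
    # Invert a single column without rebuilding the full-width array.
    return scaled_values * col_range[col] + col_min[col]

scaled_target = scale(train)[:, 0]
restored = invert_column(scaled_target, 0)
print(restored)  # [10. 20. 30. 50.]
```

Because invert_column works element by element, it can be applied directly to an array of multi-step forecasts such as one shaped (50, 6), as long as every value belongs to the same original column.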
      
      Reply
Firas Obeid June 5, 2020 at 1:09 am #

Should we convert the negative pollution predictions to zero before calculating our metric? Or in general, if our dependent variable can’t realistically be negative, should we convert all negative model predictions to zero before evaluating our model on our test set, or would that be violating model evaluation? Thank you!

Reply
- Jason Brownlee June 5, 2020 at 8:16 am #
  
  Sure. This is the idea of correcting or transforming raw output from the model.
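
For example, a small sketch of this correction applied to forecasts that have already been converted back to the original units (the values are made up):

```python
import numpy as np

inv_yhat = np.array([12.5, -3.2, 0.0, 48.1, -0.4])  # made-up forecasts in original units

# Replace anything below zero with zero; there is no upper limit.
inv_yhat_clipped = np.clip(inv_yhat, 0.0, None)
print(inv_yhat_clipped)  # negatives become 0
```

The same correction would then be applied to any new forecasts, so the evaluation reflects how the model would actually be used.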
  
  Reply
Chris June 8, 2020 at 8:28 am #

Hi Jason, one question on the reshaping of data into its 3D format [samples, timesteps, features] in order to feed it into the LSTM model. Is it necessary for the number of features to be the same in each time step? What if I am predicting a feature at t and I have some other observations at t, but not all of the information I have in t-1, t-2, etc.?

For example, in my specific use case, say I am trying to predict the number of points a player will score in a given sports match at time t. In timesteps t-3, t-2, t-1 I have all normal statistics that the player accumulated along with features which measure the strength of their opponent at that timestep. For time t, the strength of the opponent is known ahead of the match where the points are accumulated and so I am wondering if there is any way to use that data as input as well. If I were to reshape in this fashion it would create a case where at times t-1, t-2, and t-3 would have, say, 8 features but time t would only have 2, and I do not think that would be a valid input.

One thought I had in terms of handling this, would be to shift all of those “opponent strength” features back to the prior timestep so that all information which was available could be used as input and the number of features would be consistent through each observation. The only thing with this is that those measurements would really be “associated” with the timestep that comes next in the data and I am not sure if that would have a negative impact on the resulting model. Would this be a reasonable approach to take?

Again, as many others have said, thank you for all of the articles you have written, they have been such a phenomenal source of learning for me.

Reply
- Jason Brownlee June 8, 2020 at 1:19 pm #
  
  Yes and no – you can pad the missing time steps with zeros and use a masking layer to skip over them.
  
  Or you can use a dynamic rnn, that is slower to train/use but can take inputs of any length.
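
A rough sketch of the padding step, assuming sequences that are shorter than (or equal to) the target length and padding added at the start. The helper name pad_sequence is made up for illustration:

```python
import numpy as np

def pad_sequence(seq, n_timesteps, pad_value=0.0):
    # seq has shape [timesteps, features]; pad at the start so the latest steps stay last.
    # Assumes seq has at most n_timesteps rows.
    seq = np.asarray(seq, dtype=float)
    n_missing = n_timesteps - seq.shape[0]
    padding = np.full((n_missing, seq.shape[1]), pad_value)
    return np.vstack((padding, seq))

short = [[1.0, 2.0], [3.0, 4.0]]               # only 2 time steps available
full = [[5.0, 6.0], [7.0, 8.0], [9.0, 1.0]]    # all 3 time steps available
X = np.stack([pad_sequence(short, 3), pad_sequence(full, 3)])
print(X.shape)  # (2, 3, 2)
```

A Keras Masking layer with mask_value=0.0 skips a time step only when all of its features equal the mask value, so this approach suits whole missing time steps rather than a few missing features within a step.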
  
  Reply
Sandipan Banerjee June 10, 2020 at 4:14 am #

Hi @Jason

Thanks for the tutorial. I have one question: How can I update the model (both in terms of data prep and lstm model creation) if I want to use:

Data from time step 1 for predicting time-step 2
Data from time step 1 and 2 for predicting time-step 3
Data from time step 1,2, and 3 for predicting time-step 4
…
Data from time-step 1…(n-1) for predicting time-step n

Thanks,

Reply
- Jason Brownlee June 10, 2020 at 6:21 am #
  
  You can call fit() on the trained model with any data you like at any time to update it.
  
  Reply
Rakesh Allampally June 12, 2020 at 8:30 pm #

Hi, I have a doubt about how to create an LSTM with multiple input features for each time step (e.g. temp, pressure, humidity, specific humidity), considering all these features are interdependent, when I want to predict multiple output features (temperature and pressure, so only 2 output features).

So basically at each time step my input data will have 4 columns/features, and I want to predict an output of 2 columns/features.

How do I create such a model?

When I went through a few papers, they said an LSTM takes n input features at each time step and predicts only 1 output feature.

Some papers have used a structural-LSTM architecture to get more than 1 feature as output. Could you throw some light on it?
Thanks in advance 🙂

Reply
- Jason Brownlee June 13, 2020 at 5:59 am #
  
  Perhaps start with one of the simpler examples here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Rakesh Allampally June 12, 2020 at 11:26 pm #

Hi Jason, I have a doubt about how to create an LSTM with multiple input features for each time step (e.g. temp, pressure, humidity, specific humidity), considering all these features are interdependent, when I want to predict multiple output features (temperature and pressure, so only 2 output features).

So basically at each time step my input data will have 4 columns/features, and I want to predict an output of 2 columns/features.

How do I create such a model?

When I went through a few papers, they said an LSTM takes n input features at each time step and predicts only 1 output feature.

Some papers have used a structural-LSTM architecture to get more than 1 feature as output. Could you throw some light on it?
Thanks in advance 🙂

Reply
- Jason Brownlee June 13, 2020 at 6:05 am #
  
  See this tutorial:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Fayyaz ALI June 18, 2020 at 5:10 am #

Hello Jason,

Thanks for this super article. I don’t know if the question has been answered before, but is it possible to modify some things in your code to take also into account weather at step t to predict pollution at time t.

I would like to have a model p(t) = f(p(t-1),p(t-2),w(t),w(t-1),w(t-2))

Thanks in advance

Reply
- Jason Brownlee June 18, 2020 at 6:31 am #
  
  Yes, you can add this information.
  
  Reply
  - Fayyaz ALI June 18, 2020 at 8:01 am #
    
    I can’t see how since for each estimation it seems to me LSTM will need n_hours (=3) values of any variable, I would like to predict p(t) with p(t-2) and p(t-1). I thought of putting a false 0 but I’m afraid that I’m doing a mistake.
    
    Do you know how I could manage this?
    
    Reply
    - Jason Brownlee June 18, 2020 at 1:18 pm #
      
      You can use a multi-input model and have the t observations as a second input, see this for multi input models:
      https://machinelearningmastery.com/keras-functional-api-deep-learning/
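
As a sketch of the data preparation for such a two-input framing, assuming one pollution series p and one weather series w (both made up here) and two lag steps:

```python
import numpy as np

# Made-up series: pollution p and one weather variable w, 6 time steps.
p = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
w = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
n_lags = 2

X_seq, X_now, y = [], [], []
for t in range(n_lags, len(p)):
    # Lagged input: rows are t-2 and t-1, columns are (p, w).
    X_seq.append(np.column_stack((p[t - n_lags:t], w[t - n_lags:t])))
    # Second input: weather at the time step being predicted.
    X_now.append([w[t]])
    y.append(p[t])

X_seq, X_now, y = np.array(X_seq), np.array(X_now), np.array(y)
print(X_seq.shape, X_now.shape, y.shape)  # (4, 2, 2) (4, 1) (4,)
```

In the functional API, X_seq would feed an LSTM branch and X_now a separate input branch, with the two merged before the output layer. This avoids having to invent a placeholder value for p(t).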
      
      Reply
      - Fayyaz ALI June 19, 2020 at 10:03 am #
        
        Thanks a lot!
      - Jason Brownlee June 19, 2020 at 1:11 pm #
        
        You’re welcome.
Sara June 18, 2020 at 8:02 pm #

Hi Jason,

Is it possible to use PCA for dimensionality reduction in air pollution forecasting?

Reply
- Jason Brownlee June 19, 2020 at 6:11 am #
  
  Perhaps, I don’t have an example of using PCA for time series, sorry.
  
  Reply
Shekhar P June 18, 2020 at 8:33 pm #

Hello Sir,
I am running the above model with somewhat similar multivariate data input. I have 7 features in total. I have 96 values (one for every 15-minute interval) for each day. I want multi-step forecasting of 96 steps (I mean I want the next day’s prediction). I prepared the data accordingly. See my model code where I took 96 as my n_steps_out.
My n_step_in = 1 (time lag).

My data shape is as:
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
(62111, 1, 7) (62111,) (9600, 1, 7) (9600,)
My Input set for prediction is :print(Utestx_X.shape)
(1, 1, 7)
I am giving one row of input to model and trying to get 96 time steps ahead of it.

model = Sequential()
model.add(LSTM(100, return_sequences=True, activation='relu', input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(LSTM(100, return_sequences=False, activation='relu'))
model.add(Dense(96))
model.compile(loss='mae', optimizer='adam')

So I wrote 96 in Dense(96) layer.
But after that when I run the line of fitting the model.

history = model.fit(train_X, train_y, epochs=10, batch_size=96, validation_data=(test_X, test_y), verbose=2, shuffle=False)

I get below error.

ValueError: Error when checking target: expected dense_1 to have shape (96,) but got array with shape (1,)

I have followed all the steps for this variation that you gave in your book, Deep Learning for Time Series Forecasting, chapter 9 on multi-step forecasting.

Could you guide me with the error?

Reply
- Jason Brownlee June 19, 2020 at 6:12 am #
  
  Well done!
  
  The error suggests the data does not match what is expected by the model. You can change the shape of the data to match the model or change the model to match the shape of the data.
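
In this case the Dense(96) layer outputs 96 values per sample, while train_y holds a single value per sample. A rough sketch of framing the targets as multi-step vectors follows, using n_out=3 instead of 96 to keep the output small. The helper name make_multistep_samples is made up for illustration:

```python
import numpy as np

def make_multistep_samples(data, target_col, n_in, n_out):
    # data has shape [time, features].
    # Returns X shaped [samples, n_in, features] and y shaped [samples, n_out].
    X, y = [], []
    for start in range(len(data) - n_in - n_out + 1):
        end_in = start + n_in
        X.append(data[start:end_in, :])
        y.append(data[end_in:end_in + n_out, target_col])
    return np.array(X), np.array(y)

data = np.arange(20, dtype=float).reshape(10, 2)   # 10 time steps, 2 features
X, y = make_multistep_samples(data, target_col=0, n_in=1, n_out=3)
print(X.shape, y.shape)  # (7, 1, 2) (7, 3)
```

With n_out=96 the targets would have shape [samples, 96], which matches a Dense(96) output layer.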
  
  Reply
Johnny Liu June 19, 2020 at 1:48 pm #

We can use var1[t-1] to test and train in this example as var1 “pollution” is already known in this example.

var1[t-1] is regarded as one of the features (inputs) for the LSTM.

However, how can we prepare the input X in real prediction? var1[t-1] is unknown in actual prediction. Output y is var1[t].

Assume “lstm_model.h5” is generated based on the above code.

Reply
- Jason Brownlee June 20, 2020 at 6:05 am #
  
  The above model is an example of real prediction. E.g. train on history and predict the future. We step through the future – a test dataset – to evaluate the model. This is called walk forward validation:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
  
  Reply
  - Johnny Liu June 20, 2020 at 2:18 pm #
    
    Thanks for response.
    But I am not going to train and test it anymore.
    I have saved the model and created a “lstm_model.h5” based on the above example (Air Pollution Forecasting)
    
    Just like what you did in this post: https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/#comment-540145
    
    But I am still confused about giving the input to the loaded model.
    
    What would you do if you are going to make predictions based on the model generated through the above example using the method on the above link.
    
    I have done the following successfully:
    [script for test, train and save model]
    1. train & test the model
    2. After training and testing, save the model
    [script for load & make prediction]
    3. load the model in another script
    
    The following is what I feel confused:
    4. prepare input “X” for the model to make prediction.
    
    # load model from single file
    model = load_model('lstm_model.h5')
    # make predictions
    yhat = model.predict(X, verbose=0)
    
    The new data (pollution.csv) is the input file. We have to scale the data like the code in this post and give the same number of inputs to the model.
    
    Pollution[t-1], DEWP[t-1], TEMP[t-1], PRES[t-1], cbwd[t-1], Iws[t-1], Is[t-1] and Ir[t-1] are the inputs needed by the LSTM model.
    Pollution[t] is the output which is going to be predicted by the LSTM model.
    
    However, the new data (pollution.csv) is not the data for training and testing. We do not have the data for Pollution at the beginning. It is a blank column in the csv file.
    
    In training and testing, you are inserting the known value of all Pollution[t-1] as one of the input for the model. However, if you are going to make a prediction on pollution with new data using a trained and tested model, what would you insert and how would you insert?
    
    The value of all rows of Pollution[t-1] is missing and our model does not allow us to ignore this input, as it is trained based on this input format. We have to give the same number of inputs to the model.
    
    My question is:
    “If you are going to make a prediction based on the above pollution example and this website “https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/#comment-540145”, how would you prepare the input “X” for the model?”
    
    # load model from single file
    model = load_model('lstm_model.h5')
    # make predictions
    yhat = model.predict(X, verbose=0) #This is the X that I don’t know how to prepare it.
    
    Reply
    - Jason Brownlee June 21, 2020 at 6:17 am #
      
      Generally, after you choose the model/model config, you must train a final model on all available data.
      https://machinelearningmastery.com/train-final-machine-learning-model/
      
      You must prepare new data using the same scaler objects used to prepare training data, you may need to save them as well:
      https://machinelearningmastery.com/how-to-save-and-load-models-and-data-preparation-in-scikit-learn-for-later-use/
      
      You can then make ad hoc predictions with one or more samples prepared using the shape expected by the model (specified via the input_shape argument):
      https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
      
      And here:
      https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
      
      If the model expects observations from t-1 for t, then you can retrieve this information from the last observation in the training dataset, e.g. you can construct an appropriate input sample for the prediction you want to make.
      
      Perhaps review this to understand LSTM input shape:
      https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
      
      Then review how we prepared the data and what exactly we chose as timesteps and features for each sample.
      
      I hope that gives you some ideas. If it is still challenging, start with these simpler examples:
      https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
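
A minimal sketch of preparing a single input sample, assuming the per-column minimum and maximum from the training data were saved. Only 3 features are used for brevity and all numbers are made up:

```python
import numpy as np

# Assumed to have been saved at training time (made-up numbers, 3 features for brevity).
train_min = np.array([0.0, -20.0, 990.0])
train_max = np.array([500.0, 30.0, 1040.0])

# Last known observation at t-1, in original units: pollution, temperature, pressure.
last_obs = np.array([125.0, 5.0, 1015.0])

# Scale with the training statistics, not with statistics of the new data.
scaled = (last_obs - train_min) / (train_max - train_min)

# Reshape to [samples, timesteps, features] = [1, 1, n_features].
X = scaled.reshape((1, 1, len(scaled)))
print(X.shape)  # (1, 1, 3)
```

With a saved scikit-learn scaler, the equivalent step is calling transform() on the already fitted object rather than fit_transform() on the new data, which is why a single new row can still be scaled.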
      
      Reply
      - Johnny Liu June 23, 2020 at 6:34 pm #
        
        Thanks for response!
        I have tried to save the above model, load it and make prediction.
        However, the PM2.5 concentration is missing in the new data as it is based on the prior predicted output.
        
        Even if I have 1 sample of PM2.5 concentration at the beginning, it is also impossible to make a prediction, as scaling between 0 and 1 fails when there is only 1 value of PM2.5 in the new data.
        
        For feedback, I made a for loop and make a prediction for 1 instance in every loop so that I can use the current output as the next input.
      - Jason Brownlee June 24, 2020 at 6:27 am #
        
        Not sure how I can help further, you may need to debug it yourself:
        https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
      - Johnny Liu June 23, 2020 at 6:46 pm #
        
        Moreover, I don’t understand why can we use the actual PM2.5 concentration as the testing input directly before the prediction?
        
        Shouldn’t the testing input be the prior predicted output instead of prior actual output? Shouldn’t testing part simulate the real prediction and compare the predicted output and actual output? If we use the prior actual output as the input, would it be inaccurate?
        
        We would not have the output in real application.
      - Jason Brownlee June 24, 2020 at 6:28 am #
        
        You can frame the prediction problem (inputs and outputs) any way you wish, then fit and evaluate a model on that framing.
    - Sanraj July 8, 2021 at 5:22 pm #
      
      Hi Johnny,
      
      I was curious if you managed to figure out how to feedback the output of the previous time step into the input for the next time step using a loop?
      
      Reply
Gopi June 22, 2020 at 12:10 am #

Hi Jason ,

I am following your article for multivariate forecasting using LSTM. I am forecasting the next time step, and in my case it has three input and three output features. Can you give some reference or any article which you have already written?

Reply
- Jason Brownlee June 22, 2020 at 6:15 am #
  
  This will help you understand how to prepare your data:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
Shekhar P June 24, 2020 at 10:36 pm #

Hello Sir,
Do I need to check the stationarity of the time series in this case also? Do trend and seasonality need to be considered separately here, or are those taken care of implicitly?

Reply
- Jason Brownlee June 25, 2020 at 6:18 am #
  
  It depends on the dataset and choice of model. Perhaps try differencing and see if it makes a difference to performance.
  
  Yes, generally it is a good idea to first seasonally difference, then remove trend.
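
A small sketch of that order of operations on a made-up series with a period-4 seasonal pattern and a linear trend. The helper name difference is an assumption for illustration:

```python
import numpy as np

def difference(series, interval=1):
    # Subtract the observation `interval` steps earlier from each observation.
    series = np.asarray(series, dtype=float)
    return series[interval:] - series[:-interval]

# Made-up series: a period-4 seasonal pattern plus a linear trend of +1 per step.
season = np.array([0.0, 5.0, 2.0, -3.0])
series = np.tile(season, 3) + np.arange(12)

seasonal_diff = difference(series, interval=4)   # removes the repeating pattern
print(seasonal_diff)  # every value is 4.0, the trend accumulated over one season
trend_diff = difference(seasonal_diff, interval=1)
print(trend_diff)     # all zeros
```

Forecasts made on differenced data need the differencing inverted (adding back the earlier observations) before they are compared with the original series.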
  
  Reply
Shilpa June 27, 2020 at 3:06 am #

Hello Sir,

I am trying to execute the code you have provided. However, at the beginning itself, it is giving error.
I am trying parser code where year, month, day and hour are being converted as date. It is giving the following value error.

ValueError: ‘year’ is not in list

How can I correct this error?
Thanks.

Reply
- Jason Brownlee June 27, 2020 at 5:35 am #
  
  I’m sorry to hear that, this may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Shilpa June 27, 2020 at 6:08 pm #
    
    Thank you, Sir.
    
    Regards,
    Shilpa
    
    Reply
    - Jason Brownlee June 28, 2020 at 5:44 am #
      
      You’re welcome.
      
      Reply
Priya June 27, 2020 at 5:03 pm #

Thank you for the great tutorial, It is really helpful and multiple lag improves the result than single lag for my problem. I have one doubt here that in multiple lag time-step example, for inverse scaling you are taking concatenation of yhat with last 7 columns of test_X. It means you are taking one time lag variables in concatenation. My question is, can we take two time step lag variables rather than one time lag, because our ultimate aim is to make 8 columns vector for inverse scaling here. If not then please explain why?

Reply
- Jason Brownlee June 28, 2020 at 5:44 am #
  
  You’re welcome.
  
  We must provide data to the transform both for the transform and the inverse.
  
  We do concat the target with the other field to invert the scaling, but we discard all of the other values and only focus on the target variable after the transform is inverted. The columns do not interact.
  
  Reply
  - Priya July 2, 2020 at 7:12 pm #
    
    Thank you for your response. I have one more doubt that for my time-series forecasting problem, I have applied all the necessary data pre-processing steps for example- missing data points, outliers removal and trend or seasonality correction. But still for 657 testing dataset, I am getting RMSE around 50 with LSTM model. Can you suggest me some other things that I can apply to improve it? One of the reason for high RMSE can be the bad quality of data. right?
    To get the RMSE in %, should I divide 50 by sqrt of 657? If I do this then I get 1.95 means 195%. And I think it is not an acceptable error. So sir, please guide me.
    
    Reply
    - Jason Brownlee July 3, 2020 at 6:13 am #
      
      You’re welcome.
      
      Yes, some of the suggestions here may help:
      https://machinelearningmastery.com/start-here/#better
      
      And here:
      https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
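
On expressing RMSE as a percentage: RMSE is already averaged over the test samples, so dividing it again by the square root of the number of samples is not needed. One common convention (an assumption here, not something from the tutorial) is to divide the RMSE by the mean of the observed values, sketched with made-up numbers:

```python
import math

def rmse(actual, predicted):
    # Root mean squared error, in the same units as the data.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100.0, 200.0, 300.0, 400.0]     # made-up observations
predicted = [110.0, 190.0, 310.0, 390.0]  # made-up forecasts

error = rmse(actual, predicted)
mean_actual = sum(actual) / len(actual)
nrmse_percent = 100.0 * error / mean_actual
print(error, nrmse_percent)  # 10.0 4.0
```

Whether an RMSE of 50 is large therefore depends on the typical magnitude of the target variable, and comparing against a naive persistence forecast on the same test set is another way to judge it.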
      
      Reply
maozhihao July 1, 2020 at 12:26 pm #

Hello Jason, I now have 246 time series of different lengths, and each time series can be considered as a sample. But I don’t know how to input time series samples of different lengths into an LSTM. Can you give some reference or any article which you have already written?

Reply
- Jason Brownlee July 1, 2020 at 1:24 pm #
  
  Good question, see this:
  https://machinelearningmastery.com/data-preparation-variable-length-input-sequences-sequence-prediction/
  
  Reply
Leandro July 2, 2020 at 2:44 am #

Thank you very much.
I was removing and putting variables and analyzing the RMSE variation. The thing is, I don’t see it changing.

As far as I understood, the variable “pollution” is also used as a predictor. There is the possibility of removing it and seeing how the RMSE varies.

Reply
- Jason Brownlee July 2, 2020 at 6:27 am #
  
  Yes we predict pollution, and use lag obs as input.
  
  Yes, you can remove it as an input to the model and compare performance.
  
  Reply
  - Leandro July 3, 2020 at 12:13 am #
    
    Thank you. I wanted to see how I can do it?
    
    I was modifying “reframed.drop (reframed.columns)” but I get the following error message: “operands could not be broadcast together with shapes”. I understand the message but I don’t know how you can eliminate the variable “pollution” in another way.
    
    Reply
    - Leandro July 3, 2020 at 4:17 am #
      
      Is the following modification correct?
      
      #train_X, train_y = train [:,: -1], train [:, -1]
      
      to
      
      train_X, train_y = train [:, 1: -1], train [:, -1]
      
      and then
      
      inv_yhat = concatenate ((yhat, test_X [:, 1:]), axis = 1)
      
      to
      
      inv_yhat = concatenate ((yhat, test_X [:, 1:], yhat), axis = 1)
      
      and the same with inv_y
      
      Reply
      - Jason Brownlee July 3, 2020 at 6:26 am #
        
        Perhaps try it and see? I don’t have the capacity to debug code for you, sorry.
    - Jason Brownlee July 3, 2020 at 6:19 am #
      
      I’m eager to help, but I don’t have the capacity to prepare code for you.
      
      This is an advanced tutorial. If pandas data prep is challenging for you, perhaps start with some simpler tutorials here:
      https://machinelearningmastery.com/start-here/
      
      Reply
Robert July 3, 2020 at 12:39 am #

Hey Jason, thanks for all your guides, they are very helpful. Do you have any tips on irregular time-series forecasting from multiple data sources?

What I’ve tried for now is resampling data-points and aggregating the data, however both methods are not ideal.

I’m working with 3 databases all collecting different parameters at different time-points, there is no regularity and data points across databases are linked by a unique ID.

Reply
- Jason Brownlee July 3, 2020 at 6:21 am #
  
  Yes, this may give you some ideas:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-handle-discontiguous-time-series-data
  
  Reply
Aishwarya Sarkar July 3, 2020 at 8:26 am #

Hi Jason,

Thanks a lot for this tutorial. I have one question though – is it possible to include target day’s features in the prediction as well? In my problem statement, I have time step=7, each having 3 features – var1, var2, var3, and I am trying to predict var3 for the 8th day (t) using historical data of var1, var2, var3 from t-7 to t-1, is it possible to use var1 and var2 of the t (8th day) into the whole training to predict the value of var3 for the same day? My var3 is heavily dependent on var1 and var2.

Thanks,
Aishwarya

Reply
- Jason Brownlee July 3, 2020 at 2:22 pm #
  
  Yes, you might need to use a multi-input model, one for the sequence input and one for the static input.
  
  This will help:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
Dwyane July 3, 2020 at 1:23 pm #

Hi Jason,
Your article is great and helped me a lot. I have a follow-up question: after training is complete, how do I call the model to make real-time predictions? I really hope to hear from you. Thanks.

Reply
- Jason Brownlee July 3, 2020 at 2:24 pm #
  
  Thanks.
  
  Call model.predict()
  
  Learn more here:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
Firas Obeid July 8, 2020 at 5:37 am #

Can we not scale our y label and leave it like it is or is it a must to also scale it?

Reply
- Jason Brownlee July 8, 2020 at 6:38 am #
  
  You can invert the transform prior to calculating a metric as we do in the tutorial.
  
  If this is a new idea, see this:
  https://machinelearningmastery.com/how-to-transform-target-variables-for-regression-with-scikit-learn/
  
  Reply
  - Firas Obeid July 8, 2020 at 7:56 am #
    
    Absolutely amazing tutorials !
    Thank you.
    
    Reply
    - Jason Brownlee July 8, 2020 at 1:42 pm #
      
      Thanks.
      
      Reply
Hoda July 8, 2020 at 4:24 pm #

Thank you, Mr. Brownlee, for this great article.
I have a question.
I have some entities, each of which has a multivariate time series for some parameters.
You can think of it as a matrix whose columns are the parameters and whose rows are the timestamps at which the parameters were recorded.
I need a one-dimensional embedding vector for every entity.
I ran this tutorial and, at the end, connected the encoder LSTM as the output layer,
but the output is a matrix again.
How can I get a one-dimensional vector as the output of the encoder?
I would be very thankful if you could guide me on this problem.

Reply
- Hoda July 8, 2020 at 6:43 pm #
  
  In other words, suppose you have some sensors that record multivariate data over time.
  What is the best approach for embedding this data into a fixed-length vector?
  
  Reply
  - Jason Brownlee July 9, 2020 at 6:39 am #
    
    There are no best approaches. Try a suite of methods and discover what works best for your specific dataset.
    
    Reply
- Jason Brownlee July 9, 2020 at 6:36 am #
  
  You would use an LSTM autoencoder:
  https://machinelearningmastery.com/lstm-autoencoders/
  
  Reply
Deepak Joshi July 13, 2020 at 5:54 pm #

Hi Jason,

Thanks for the great work!

One thing I am having trouble understanding is how you specify which feature needs to be predicted. You are passing 8 features in this example. Is the model predicting all 8 features?

Thanks

Reply
- Jason Brownlee July 14, 2020 at 6:16 am #
  
  You must start with a strong definition of the problem you are modeling to know what to predict, see this framework:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  Reply
Deepak Joshi July 13, 2020 at 5:57 pm #

Also, I have features like user name and country, which are mostly static. Even if I encode them, for the same input, let’s say [0,1], I’ll get the same output for different time series dates.
How do we solve this issue?

Thanks

Reply
- Jason Brownlee July 14, 2020 at 6:17 am #
  
  Perhaps you need more information/variables to help predict your target.
  
  Reply
Ben July 14, 2020 at 5:29 am #

I get that we are dropping the columns we do not want to predict. I notice that there are 24 columns (v1(t-3) … v8(t)). Why exactly 9, 10, 11, 12, 13, 14, 15? Can’t we drop 17, 18, 19, 20, 21, 22, 23, 24?
reframed = series_to_supervised(scaled, n_hours, 1)
reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)

Reply
- Jason Brownlee July 14, 2020 at 6:31 am #
  
  You can frame the problem any way you wish.
  
  Reply
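
For readers with the same question about column indexes, here is a minimal sketch. It assumes 8 features and that series_to_supervised() lays out columns one lag block at a time, oldest lag first, followed by the block for time t. The helpers `column_names` and `columns_to_drop` are hypothetical and are not part of the tutorial code.

```python
def column_names(n_features, n_lags):
    """List column names in the assumed order: oldest lag first, then time t."""
    names = []
    for lag in range(n_lags, 0, -1):
        names += ['var%d(t-%d)' % (j + 1, lag) for j in range(n_features)]
    names += ['var%d(t)' % (j + 1) for j in range(n_features)]
    return names


def columns_to_drop(n_features, n_lags, target=0):
    """Indexes of the time-t columns other than the target variable."""
    first_t = n_features * n_lags  # index of var1(t)
    return [first_t + j for j in range(n_features) if j != target]


print(columns_to_drop(8, 1))  # one lag step
print(columns_to_drop(8, 3))  # three lag steps
```

Under these assumptions, one lag step gives 16 columns and the indexes 9 to 15 used in the tutorial. Three lag steps give 32 columns, with var1(t) at index 24, so the columns to drop would be 25 to 31 instead.
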
Benjamin Hong July 14, 2020 at 6:22 am #

Do you know what the time index of inv_y is?

Reply
- Jason Brownlee July 14, 2020 at 6:32 am #
  
  Sorry, I don’t understand your question, can you please elaborate?
  
  Reply
Thony July 14, 2020 at 6:26 am #

Hi Jason, is there a rule of thumb to set your validation data for hyper parameters tuning?
Thanks

Reply
- Jason Brownlee July 14, 2020 at 6:32 am #
  
  33% is a rule of thumb. Find what is appropriate for your specific dataset.
  
  Reply
Victor July 15, 2020 at 12:47 pm #

Hi Jason, a question regarding the post. After fitting the model, when you predict on the test set, is the model updated after each new observation it sees or does the model remain the same after the fitting procedure on the train set?

Reply
- Jason Brownlee July 15, 2020 at 2:00 pm #
  
  You can choose to update the model with new data or not.
  
  In this case we don’t update the model.
  
  You can see more examples here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Oleksii July 15, 2020 at 2:26 pm #

Thank you for great post!
Could you help me understand how to transform the data if we have multiple multivariate time series of different lengths?
For example, suppose we had a pollution dataset from 1000 points in one city with unaligned time ranges, meaning data from one point covers Jan.-Nov. 2018, another Jul.-Dec. 2018, another Sep. 2018 - Jun. 2019, etc.
(Do not take seasonality into account, just the different lengths.)
So I’m stuck on how to feed the data and correctly train a single model for such a case.

Reply
- Jason Brownlee July 16, 2020 at 6:27 am #
  
  Yes, one approach is to zero pad all time series to the same length and use a masking layer to ignore the padded values.
  
  This will help with padding:
  https://machinelearningmastery.com/data-preparation-variable-length-input-sequences-sequence-prediction/
  
  This will help with masking:
  https://machinelearningmastery.com/handle-missing-timesteps-sequence-prediction-problems-python/
  
  Reply
  - Oleksii July 17, 2020 at 1:20 am #
    
    Thanks, Jason!
    I have already performed the padding, and the results are surprisingly not so bad.
    
    Reply
    - Jason Brownlee July 17, 2020 at 6:22 am #
      
      Nice work!
      
      Reply
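
The zero-padding idea from this thread could be sketched as follows. This is only an illustration under stated assumptions: each series is a NumPy array shaped [timesteps, features], padding is added at the start of each series, and `pad_series` is a hypothetical helper rather than anything from the tutorial.

```python
import numpy as np


def pad_series(series_list, pad_value=0.0):
    """Pre-pad each [timesteps, features] array to the length of the longest one.

    Returns the padded 3D array and a boolean mask marking the real timesteps.
    """
    max_len = max(s.shape[0] for s in series_list)
    n_features = series_list[0].shape[1]
    padded = np.full((len(series_list), max_len, n_features), pad_value, dtype=float)
    mask = np.zeros((len(series_list), max_len), dtype=bool)
    for i, s in enumerate(series_list):
        offset = max_len - s.shape[0]  # number of padded timesteps at the start
        padded[i, offset:, :] = s
        mask[i, offset:] = True
    return padded, mask


# Toy data: two series of different length, two features each
a = np.ones((3, 2))
b = np.ones((5, 2))
X, mask = pad_series([a, b])
print(X.shape)  # (2, 5, 2)
print(mask[0])
```

In Keras, a `Masking(mask_value=0.0)` layer placed before the LSTM skips timesteps whose features all equal the mask value. One caveat: a genuine observation that happens to be all zeros after scaling would be skipped as well, so a pad value that cannot occur in the real data may be a safer choice.
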
Victor July 16, 2020 at 3:33 am #

Hi Jason, a question: since LSTM has memory, isn’t it by construction using multistep time lag? In other words, in your second part about the multistep time lag features, isn’t this construction redundant?

Reply
- Jason Brownlee July 16, 2020 at 6:45 am #
  
  Yes, across samples and timesteps.
  
  It is more computationally efficient to use time steps this way. You can change it to be across samples if you like and make it stateful:
  https://machinelearningmastery.com/stateful-stateless-lstm-time-series-forecasting-python/
  
  Perhaps start here:
  https://machinelearningmastery.com/start-here/#lstm
  
  Reply
Cristian July 16, 2020 at 8:41 am #

Great work and tutorials Jason!

I have a large dataset with 500 consumers and consumption readings every 15 min for 3 months. How can I group the consumers in order to create consumption patterns (which code or library)? I am working on a Python project to detect electricity theft, and any comments or suggestions are very important to me, as I’m a beginner in programming.

Thanks for everything!

Reply
- Jason Brownlee July 16, 2020 at 1:50 pm #
  
  Perhaps time series clustering. Sorry, I don’t have tutorials on this topic.
  
  Reply
  - Cristian July 19, 2020 at 8:09 am #
    
    Do you have tutorials on PCA, in order to reduce a large amount of data?
    
    Thanks for your reply!
    
    Reply
    - Jason Brownlee July 19, 2020 at 1:37 pm #
      
      Yes many (use the blog search), perhaps start here:
      https://machinelearningmastery.com/principal-components-analysis-for-dimensionality-reduction-in-python/
      
      Reply
Kamoga Hassan July 16, 2020 at 7:06 pm #

Greetings Jason Brownlee, I love how you make your tutorials so easy to follow and make them much easier to understand so much about machine learning ..
I have a challenging task, I do a time-dependent experiment, my experiment follows 10 tests with each test recorded every minute for 66minutes.
In Excel, the 10 tests show similar repetitive trends, over 66 minutes.
I have read about date-time, where periods are considered for 24hrs or even a year, how can I manipulate mine for a period of 66 minutes?
Will be grateful if I can have your email to forward to you a sample of my data.

Reply
- Jason Brownlee July 17, 2020 at 6:13 am #
  
  Thanks!
  
  Models like neural nets are not concerned about the interval, you should be able to model the data directly.
  
  Reply
  - Hassan Kamoga July 19, 2020 at 11:00 am #
    
    Can I share a bit of my data .. you get to see my challenge. kamogahsn@gmail.com
    
    Reply
    - Jason Brownlee July 19, 2020 at 1:42 pm #
      
      Sorry, I don’t have the capacity to review/code data.
      
      Reply
LiuXiangfei July 18, 2020 at 7:49 pm #

Hello, Jason.
Have you used PM2.5 data for Multi-step Time Series Forecasting with Long Short-Term Memory Networks in Python?

Reply
- Jason Brownlee July 19, 2020 at 6:29 am #
  
  This will help you to get started for multi-step forecasting:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
Qizal Ashfaq July 21, 2020 at 1:42 am #

What do you mean by forecast at the current hour? Does it mean the hour at which data is available? I am confused because you are taking the previous hours’ data and predicting the next hour, so should it not be called hour-ahead prediction?

Reply
- Jason Brownlee July 21, 2020 at 6:05 am #
  
  We are predicting the next hour after the data used as input. It is a forecast.
  
  Perhaps this will help:
  https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
  
  Reply
Qizal Ashfaq July 22, 2020 at 2:39 am #

Yes, my point is cleared up, thanks. How can we use this code for 24-hour-ahead prediction? Where should changes be made?

Reply
- Jason Brownlee July 22, 2020 at 5:43 am #
  
  You can use the model recursively, one model for each time step, or design a model to make 24 hour predictions directly:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
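
As a rough illustration of the third option, framing the data so that one model predicts all 24 hours directly, the sketch below builds input windows and 24-value targets. The helper `make_direct_samples`, the 48-hour input window, and the toy data are assumptions for illustration only.

```python
import numpy as np


def make_direct_samples(data, n_in, n_out, target_col=0):
    """Frame a [rows, features] array for direct multi-step forecasting.

    X holds n_in past rows of all features.
    y holds the next n_out values of the target column.
    """
    X, y = [], []
    for start in range(len(data) - n_in - n_out + 1):
        end_in = start + n_in
        X.append(data[start:end_in, :])
        y.append(data[end_in:end_in + n_out, target_col])
    return np.array(X), np.array(y)


# Toy data: 100 hourly rows with 2 features, target in column 0
data = np.arange(200, dtype=float).reshape(100, 2)
X, y = make_direct_samples(data, n_in=48, n_out=24)
print(X.shape, y.shape)
```

With targets shaped like this, the output layer of the network would need 24 nodes (for example `Dense(24)`) instead of one.
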
Qizal Ashfaq July 22, 2020 at 2:45 am #

How can I use this code to predict beyond the data? It only trains and tests on the given data. How do I predict the value for the next hour, for which no pollution data is given?

Reply
- Jason Brownlee July 22, 2020 at 5:43 am #
  
  Fit the model on all data and call model.predict() to make a prediction out of sample.
  
  Reply
  - Qizal Ashfaq July 23, 2020 at 2:51 am #
    
    I have predicted one hour beyond the data. Now how should I use this for the next hour’s prediction? Should I use that predicted value for the next hour’s prediction, and what should I use for the other input variables? I am using 3 previous time steps, given below. The first value is the one I want to predict and the last six are my other dependent variables.
    X=[[0,12.7,1.1,90, 0, 0,71],[0,12.1,2.1,93,0,0,41],[0,11.7,2.3,93,0,0,39]]
    And ypredicted=0.2465
    Now, in the second prediction, I can only replace one value. What should I use for the other values?
    
    Reply
    - Jason Brownlee July 23, 2020 at 6:22 am #
      
      If you are using the model recursively, then the subsequent prediction would use the output of the last prediction as input.
      
      Alternately, you can frame the problem/train the model to make multi-step prediction directly:
      https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
      
      Reply
  - Qizal Ashfaq July 23, 2020 at 3:51 am #
    
    Also, is scaling/normalization necessary for the new data I pass to model.predict(), or does only the training data need scaling?
    
    Reply
    - Jason Brownlee July 23, 2020 at 6:23 am #
      
      All input data must be prepared in an identical manner. This includes training data, test data, validation data and new data.
      
      Reply
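
The recursive strategy discussed in this thread could look roughly like the sketch below. It is not the tutorial's code: `recursive_forecast` and `predict_fn` are hypothetical names, the window is assumed to be already scaled with the scaler fit on the training data, and values for the non-target variables at each future step have to come from somewhere else (their own forecasts, known future values, or simply repeating the last observation).

```python
import numpy as np


def recursive_forecast(predict_fn, window, future_exog, target_col=0):
    """Forecast len(future_exog) steps by feeding each prediction back as input.

    predict_fn: hypothetical callable mapping a [timesteps, features] window to one value
    window: last observed (already scaled) rows, shape [timesteps, features]
    future_exog: assumed values of the non-target features for each predicted step,
                 shape [steps, features - 1]
    """
    window = np.array(window, dtype=float)
    predictions = []
    for exog in future_exog:
        yhat = float(predict_fn(window))
        predictions.append(yhat)
        # Build the row for the step just predicted: prediction plus assumed other features
        new_row = np.insert(np.asarray(exog, dtype=float), target_col, yhat)
        # Slide the window forward by one step
        window = np.vstack([window[1:], new_row])
    return predictions


# Toy demonstration: the "model" is a stand-in that averages the target column
demo_window = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
demo_exog = np.array([[40.0], [50.0]])  # assumed future values of the second feature
demo_preds = recursive_forecast(lambda w: w[:, 0].mean(), demo_window, demo_exog)
print(demo_preds)
```

With a Keras model, `predict_fn` could wrap `model.predict()` after reshaping the window to [1, timesteps, features]. Errors tend to accumulate with this approach because each step builds on earlier predictions.
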
Ben Hong July 24, 2020 at 9:10 am #

Hi, thanks a lot for your work on LSTMs. I really appreciate it. However, there is some code that I don’t understand.
Here is the code:
inv_yhat = concatenate((yhat, test_X[:, -7:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:, 0]

When we invert scaling for yhat, why do we use -7 especially? I get that we are trying to concatenate the yhat and the last seven features of the data, but why do we do that?

Thank You

Reply
- Jason Brownlee July 24, 2020 at 10:37 am #
  
  We are only interested in inverting the target, but the transform requires the same columns when inverting as when transforming. Therefore we are adding the target with other input vars for the inverse operation.
  
  Reply
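
A small sketch may make the padding step clearer. It uses plain NumPy to mimic what a (0, 1) min-max scaler stores, one minimum and one range per column, and shows that padding yhat with the other columns only serves to match the expected column count. The toy data and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.uniform(0, 100, size=(50, 8))  # toy data: 8 columns, target in column 0

# What a (0, 1) min-max scaler learns: one minimum and one range per column
data_min = raw.min(axis=0)
data_range = raw.max(axis=0) - data_min
scaled = (raw - data_min) / data_range

yhat = scaled[:, :1]        # stand-in for model.predict(), shape [rows, 1]
other_cols = scaled[:, 1:]  # plays the role of test_X[:, -7:]

# Route 1: rebuild all 8 columns, apply the full inverse, keep column 0
padded = np.concatenate((yhat, other_cols), axis=1)
inv_full = (padded * data_range + data_min)[:, 0]

# Route 2: invert the target column alone using only its own statistics
inv_direct = yhat[:, 0] * data_range[0] + data_min[0]

print(np.allclose(inv_full, inv_direct))
```

Because min-max scaling works column by column, the values placed in the other seven columns do not affect the inverted target. They only need to be present so the array has the shape the scaler was fit on.
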
Shilpa July 27, 2020 at 3:52 am #

I am trying the given code as it is. However, it is giving me an error of index 4 is out of bound for axis 1 with size 0 at the code line ” values[:,4] = encoder.fit_transform(values[:,4])”

Sir, can you tell me where I am making mistake?

Reply
- Jason Brownlee July 27, 2020 at 5:51 am #
  
  Sorry to hear that, this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Johnny Liu July 29, 2020 at 5:04 pm #

Hi Jason,
I know that LSTM and RNN is used for predicting a curve with pattern.

Does it mean LSTM is not suitable for predicting a logarithmic decay curve? It is because a logarithmic decay curve will not repeat the previous pattern, it will keep dropping increasingly faster.

Is it better to use ANN to predict a logarithmic decay curve instead of LSTM?

Reply
- Jason Brownlee July 30, 2020 at 6:19 am #
  
  If you know a curve is log, use a log function directly. No need for a more complex model.
  
  Reply
  - Johnny Liu July 30, 2020 at 11:40 am #
    
    The curve acts like a log function, but it is not actually a log function. It is totally different from a log function.
    
    The curve will only drop and drop more quickly depending on several inputs.
    
    Most importantly, I do not have the equation for the relationship between the input and output. It is absolutely not as simple as log function. I will never know how much should it drop. I only know that it must drop faster than previous time steps.
    
    It is a real application for predicting the asphalt stiffness according to the environment parameter and the previous stiffness.
    
    In this case, is RNN suitable for this application? RNN is used for predicting the repeated pattern in the future according to the same pattern appeared in the previous time step. Can RNN predict a decay curve in the above application? There is no repeated pattern in a decay curve.
    
    Reply
    - Jason Brownlee July 30, 2020 at 1:46 pm #
      
      My gut says no, but perhaps try it and also try to make the data stationary and try a suite of models in order to discover what works best.
      
      Reply
      - Johnny Liu July 30, 2020 at 3:16 pm #
        
        Thanks for your reply.
        I have already tried it but in vain.
        There is no problem in the training.
        However, when it comes to unknown new data, the prediction always drops from the maximum to the minimum, no matter what the time range and inputs are.
        That should not happen. The end of the curve should depend on the inputs. It can stop at a point close to the starting point when the time range of the dataset is short.
        
        Thank you for your answer again. I have been confused about this point for a month. I cannot search anything about decay curve and RNN and I doubted of the feasibility of using RNN for this application.
        
        The problem is solved now. I decide to give up using RNN and concentrate on ANN. Thank you.
      - Jason Brownlee July 31, 2020 at 6:14 am #
        
        Perhaps explore alternate models.
Eduardo July 31, 2020 at 7:46 am #

Hi Jason,

I plotted actual vs prediction and this appears to simply be predicting y(t+1) = y(t).

Any idea to address this issue?

Great post!

Regards,

Reply
- Jason Brownlee July 31, 2020 at 1:38 pm #
  
  Yes, you can try alternate model configuration, alternate learning configuration, alternate models, alternate data preparation, etc.
  
  Reply
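
One way to check whether a model is doing more than copying the last observation is to compare its error with a persistence baseline, where each value is forecast with the one before it. The sketch below is an illustration with toy numbers; `rmse` and `persistence_rmse` are hypothetical helpers.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def persistence_rmse(actual):
    """RMSE of forecasting each value with the previous observation."""
    actual = np.asarray(actual, dtype=float)
    return rmse(actual[1:], actual[:-1])


# Toy values standing in for inv_y (actual) and inv_yhat (model forecast)
toy_actual = [30.0, 32.0, 31.0, 35.0, 36.0]
toy_forecast = [29.0, 31.5, 31.5, 33.0, 36.5]
print('Model RMSE: %.3f' % rmse(toy_actual, toy_forecast))
print('Persistence RMSE: %.3f' % persistence_rmse(toy_actual))
```

If the model RMSE is not clearly below the persistence RMSE on the same test period, the model has probably learned little beyond y(t+1) = y(t).
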
Rajesh Maddu August 6, 2020 at 5:59 am #

Hi Jason,

Great Post.

In my data set:

X ->Air temperature Values; Y->Water Temperature values; objective is predict the Water temp.

After frame as supervised learning –
var1(t-1) var2(t-1) var1(t)
1 0.752294 0.891892 0.788991
2 0.788991 0.864865 0.779817
3 0.779817 0.864865 0.816514
4 0.816514 0.918919 0.770642
5 0.770642 0.864865 0.807339

Here var1: Water Temp & var2 – air temp

After prediction I am getting a high RMSE value (say 4.5, which is not acceptable). Am I missing something here? How can I improve the RMSE value?

# make a prediction
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
r2score = r2_score(inv_y, inv_yhat)
print('Test RMSE: %.3f' % rmse)
print('Test R2: %.3f' % r2score)

Reply
- Jason Brownlee August 6, 2020 at 6:20 am #
  
  Some of the suggestions here will help:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Rajesh Maddu August 10, 2020 at 11:37 pm #

Hi Jason,

Can we couple wavelets as pre-processing step to LSTM for better model accuracy? Any sample code for Wavelets?

Reply
- Jason Brownlee August 11, 2020 at 6:33 am #
  
  Perhaps.
  
  I don’t have examples sorry.
  
  Reply
hengheng August 17, 2020 at 7:38 pm #

Hi Jason,
Is the above example a direct prediction strategy or a recursive prediction strategy?

Reply
- Jason Brownlee August 18, 2020 at 6:03 am #
  
  Neither, it is a one step forecast.
  
  Reply
Tobias August 20, 2020 at 9:36 am #

Dear Jason,

splendid code and explanation, as always ;-).

Of course, there is a subsequent question ^^’

My data set consists of 13500 stations. Each one delivered once a year in 18 years values for 16 features. I.e., the shape of the data set is (objects, timesteps, features): (13500,18,16).

One of the features is the target feature, i.e. y=(13500,18,1), X=(13500,18,15)

The dataset is train-test split and scaled, and the stations are shuffled, e.g. station 4 is in place 444, but their internal 18-year time series data remains untouched.

The LSTM-NN is trained on X_train/y_train (12000,18,15)/(12000,18,1) and shall predict the target value time series for all the test stations based on X_test (1500,18,15).

How would you realize such a “Multi object and Multi variate input, Multi object and single output” task, especially regarding Data Feed-In and LSTM/Mixed-LSTM-Network constellation?

Best regards,
Tobias

Reply
- Jason Brownlee August 20, 2020 at 1:36 pm #
  
  Thanks!
  
  Great summary. It looks like you are predicting one output for each input time step for each variable.
  
  The model would have f nodes in the output layer, one for each feature, and an encoder-decoder could be used to output each time step.
  
  I think the examples in this tutorial will help to get you started:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
  - Tobias August 21, 2020 at 5:32 am #
    
    Thanks for the swift reply!
    
    I guess my case is like
    
    https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
    
    Multiple Parallel Series
    
    Just that I would have
    
    a) 13500 parallel series (13500 stations)
    b) each of the series has 15 features
    
    Wouldn’t that end up to be 4D: Number of Samples, Number of Time Steps, Number of Stations (13500), Number of Features(15)? How could one deal with this?
    
    Reply
    - Jason Brownlee August 21, 2020 at 6:37 am #
      
      You can combine data across stations, e.g. learn across stations.
      
      Or have one model per station, perhaps insane, but I don’t know what kind of resources you have access to. See this:
      https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
      
      Reply
      - Tobias August 21, 2020 at 7:49 am #
        
        It would be one model for all the sites. So learning across stations. And it’s exactly the very essence of the problem I face. How must the data be formated, that this works out? Is it possible to do this with sequences?
        
        Definitely not one model per station :D
Valentin Mayr August 25, 2020 at 12:25 am #

Hi Jason,
Thank you for sharing your insights! I was able to build an LSTM model to predict a time series based on independent but somewhat correlated factors.
I would like to analyze the impact of two of these factors on the dependent variable. I tried PCA, but the result does not really tell me about the contribution to the dependent variable.
Is there any method you would recommend to evaluate the impact of collinear independent variables on a dependent variable?
Thank you again!
Valentin

Reply
- Jason Brownlee August 25, 2020 at 6:42 am #
  
  You’re welcome.
  
  I’m not sure off the cuff, sorry.
  
  Reply
sergio August 29, 2020 at 8:58 pm #

Hi Jason,
it is a super helpful tutorial!
I was able to apply the LSTM technique to a multivariate time series (in CSV format) including VoIP traffic along with several features, and the results are interesting.

I was also trying to perform a comparison with MLP.
I’ve tried to follow a similar tutorial provided by you on this topic (https://machinelearningmastery.com/how-to-develop-multilayer-perceptron-models-for-time-series-forecasting/) but a different coding structure has been used, e.g.:

– split_sequence function has been used for MLP and not the series_to_supervised used for LSTM here
– no normalizing feature step in MLP as for LSTM
– no inverse transform in MLP as in LSTM
– no clear distinction between train and test in the MLP example

Since i’m not so familiar with python libraries, is there an MLP-based example looking similar in structure to the LSTM one you proposed in this post?

Thanks in advance,
Sergio

Reply
- Jason Brownlee August 30, 2020 at 6:38 am #
  
  Thanks. Well done!
  
  Yes, this will get you started:
  https://machinelearningmastery.com/how-to-develop-multilayer-perceptron-models-for-time-series-forecasting/
  
  Or you can replace the LSTM with an MLP.
  
  Reply
Ming September 1, 2020 at 6:03 pm #

hi,Jason:
I have a question. Before training, I normalized my data using MinMaxScaler, and after training I saved my model to a file. In another application I will use this model file to predict, so
the first step is to load the model file and the second is to input data, but the data must be normalized. How can I normalize the data for prediction?

Reply
- Jason Brownlee September 2, 2020 at 6:25 am #
  
  You can save the minmaxscaler object as well, then load it and use it to prepare new data.
  
  Reply
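
With scikit-learn, the fitted scaler object itself can be serialized and loaded again in the prediction script. The sketch below shows the same idea with the standard library only, by storing the per-column minimum and maximum learned from the training data. The helpers `fit_minmax` and `transform` and the file name `scaler_params.json` are hypothetical.

```python
import json
import os
import tempfile


def fit_minmax(rows):
    """Learn per-column min and max from training rows (a list of lists)."""
    columns = list(zip(*rows))
    return {'min': [min(c) for c in columns], 'max': [max(c) for c in columns]}


def transform(rows, params):
    """Scale rows to [0, 1] using previously learned min and max values."""
    scaled = []
    for row in rows:
        scaled.append([(v - lo) / (hi - lo) if hi > lo else 0.0
                       for v, lo, hi in zip(row, params['min'], params['max'])])
    return scaled


# Training script: fit on the training data and save the parameters with the model
train = [[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]]
params = fit_minmax(train)
path = os.path.join(tempfile.gettempdir(), 'scaler_params.json')
with open(path, 'w') as f:
    json.dump(params, f)

# Prediction script: load the parameters and scale new data in the same way
with open(path) as f:
    loaded = json.load(f)
new_scaled = transform([[25.0, 2.0]], loaded)
print(new_scaled)  # [[0.75, 0.25]]
```

The important point is that new data is scaled with the statistics learned from the training data, never with statistics recomputed from the new data.
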
Edrin September 3, 2020 at 2:21 am #

Hi Jason,

Thanks for the great example.

I have a question about the prediction step in this example. Here we are validating the model on the test dataset, for which we have all of the input variables.
However, in the realistic scenario of trying to predict the pollution for the next X days in the future, we don’t know the values of the input variables at t-1 needed to predict t.
Is there any LSTM setup, such as one with multiple steps, that can help to achieve this?

Reply
- Jason Brownlee September 3, 2020 at 6:09 am #
  
  We evaluate the model using walk-forward validation, which estimates the capability of the model when making predictions on data not seen during training.
  
  You can learn more about this approach to model evaluation here:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
  
  Reply
André de Sousa Araujo September 6, 2020 at 4:00 am #

Hi Jason,

In this experiment, Have you used walk-forward validation? So, this subset (test_X, test_y) was used in the training step or just to validation?

I couldn’t understand how you have used walk-forward validation (with unseen data during the training) and at the same time another subset to validate.

Follow the expert’s advice: Which subset you consider in this experiment train, test, and validation?

Reply
- Jason Brownlee September 6, 2020 at 6:09 am #
  
  We do, but the model is fixed, so we don’t need to enumerate each time step manually.
  
  It would be better to use the approach listed here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/
  
  Reply
  - André de Sousa Araujo September 8, 2020 at 12:51 pm #
    
    Thanks for your quick answer.
    
    Sorry, I asked two questions and I can’t tell which one you answered.
    
    Have you used walk-forward validation (in this experiment)? We, do. => Is this answer correct?
    
    “To speed up the training of the model for this demonstration, we will only fit the model on the first year of data, then evaluate it on the remaining 4 years of data.”
    
    => So, in this experiment, you fit with 1-year data, so, for each epoch you get 72 hours (batch size) train the model and predict the next hour 73th? train more 144 and predict the 145th until finish one-year data (or 50 epochs in this case)… Is this?
    
    And used the model to evaluate which part of 4 years? The entire subset?
    
    Please, Can you explain better how was walk forward using your example? I got very confused because is implicit on keras…
    
    Reply
    - Jason Brownlee September 8, 2020 at 1:42 pm #
      
      Sorry.
      
      The code does not step through walk-forward validation for each prediction in the test set. Instead, we fit the model on the entire training dataset and predict the test set directly with a static model. This is functionally equivalent to walk-forward validation with a static model fit once prior to validation, i.e. less code, simpler to explain, and fast to execute.
      
      Yes, the model is fit on one year and predicts the remaining years. This is very aggressive and was done to keep execution time down.
      
      I hope that helps.
      
      Reply
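
The equivalence described in this thread can be illustrated with a toy static model. In the sketch below, the linear `predict` function and its weights are stand-ins for a network fit once on the training data, and each test row is assumed to carry its own lag inputs.

```python
import numpy as np

weights = np.array([0.5, -0.2, 0.1])  # stand-in for a model fit once on the training set


def predict(X):
    """Static model: rows in, one prediction per row out."""
    return X @ weights


rng = np.random.default_rng(1)
test_X = rng.normal(size=(10, 3))

# Walk forward through the test set one step at a time, never refitting
walk_forward = np.array([predict(test_X[i:i + 1])[0] for i in range(len(test_X))])

# Predict the whole test set in a single call
batch = predict(test_X)

print(np.allclose(walk_forward, batch))
```

The two routes agree only because the model is never updated and no state is carried between samples. If the model were refit or updated as new observations arrived, an explicit walk-forward loop would be needed.
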
zhao September 7, 2020 at 12:38 pm #

Hi Jason,
After saving the H5 model of this model, I collect real-time data in another script to call the H5 model. I found that this real-time data needs to be normalized. How do I need to normalize the real-time data in another script with the previous data?

Reply
- Jason Brownlee September 7, 2020 at 1:21 pm #
  
  You can save the scaler objects as well, this post explains how:
  https://machinelearningmastery.com/how-to-save-and-load-models-and-data-preparation-in-scikit-learn-for-later-use/
  
  Reply
André de Sousa Araujo September 9, 2020 at 12:15 am #

Hi Jason,

Do you think it makes sense to normalize to [+0.2, +0.8] to help the sigmoid function (inside an LSTM cell), because the extreme values of 0 and +1 correspond to the sigmoid function at infinity and are never reached?

Thanks,

André

Reply
- Jason Brownlee September 9, 2020 at 6:52 am #
  
  No, but run a test and find out for your model + dataset + test harness.
  
  Reply
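
For anyone who wants to run that test, scaling to a custom range is a small change to the usual min-max formula. The helper `scale_to_range` below is a hypothetical NumPy sketch that assumes no column is constant. scikit-learn's MinMaxScaler offers the same thing through its `feature_range` argument.

```python
import numpy as np


def scale_to_range(values, low, high):
    """Min-max scale each column of values into [low, high].

    Assumes every column has at least two distinct values.
    """
    values = np.asarray(values, dtype=float)
    v_min = values.min(axis=0)
    v_max = values.max(axis=0)
    return low + (values - v_min) * (high - low) / (v_max - v_min)


x = np.array([[0.0], [5.0], [10.0]])
x_scaled = scale_to_range(x, 0.2, 0.8)
print(x_scaled.ravel())
```

Comparing test RMSE for models trained on data scaled to [0, 1] and to [0.2, 0.8], over repeated runs, would show whether the narrower range makes a difference for a given dataset.
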
Michelle September 11, 2020 at 10:20 pm #

Hi Jason, thanks for the article.
In multivariate LSTM analysis, can we remove the target from previous time steps as input feature please?

Reply
- Jason Brownlee September 12, 2020 at 6:14 am #
  
  You can frame the problem or configure the model any way you like.
  
  Reply
Andreas September 17, 2020 at 11:46 pm #

Hello, is there an example with a simple neural network that uses all prior data of a timeseries to predict next time step?

Reply
- Jason Brownlee September 18, 2020 at 6:49 am #
  
  Yes, you can find many examples on the blog.
  
  Perhaps start here:
  https://machinelearningmastery.com/how-to-develop-multilayer-perceptron-models-for-time-series-forecasting/
  
  Reply
Rafael Dias September 20, 2020 at 2:13 pm #

Hi Jason,

Thanks for all this content.

I have a binary classification multivariate time series project related to the financial market, where I perform several measurements on a pair of stocks in order to trade them in a long and short fashion.

I think LSTM is a nice modeling tool for such a problem, but I am trying to understand whether traditional modeling tools could work too.

Are there any other candidates as far as modeling tools go? What about more conventional ones, like Random Forest or gradient boosting? Do time series really mess them up?

Thank you.

Reply
- Jason Brownlee September 21, 2020 at 8:04 am #
  
  You’re welcome.
  
  Good question, yes, the suite of standard machine learning models can be used for your problem. I recommend testing a suite of different framings of the problem, as well as different data preparation/models/configs in order to discover what works best for your specific dataset.
  
  This will provide a good starting point for testing standard ml algorithms for time series:
  https://machinelearningmastery.com/xgboost-for-time-series-forecasting/
  
  Reply
Rajesh Maddu September 21, 2020 at 3:03 pm #

Hi,

We have a daily time series dataset of 5478 data points (split 4383 training and 1094 testing) and fit the LSTM RNN model with reference to your post. It is working fine and got good performance (r2score: 0.954; rmsescore: 0.528).

When I changed the daily dataset to a monthly dataset, there are 181 data points (split 144 training and 36 testing) and I fit the LSTM model. I observed that the model gives bad results (r2score: 0.363; rmsescore: 1.794).

For both cases, I have used the below code to fit the model. Do I need to change any settings in the below code Or Am I missing anything here?

model = Sequential()

model.add(LSTM(50,input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='adam')

history = model.fit(train_X, train_y, epochs=50, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)

Reply
- Jason Brownlee September 22, 2020 at 6:40 am #
  
  Well done!
  
  It would be a good idea to tune the configuration of the model for each dataset, including data preparation, model architecture, and learning hyperparameters.
  
  This may be a good place to get started:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Pritha September 21, 2020 at 9:49 pm #

hi,

I am new to LSTM.
How can I predict the intermediate samples by changing the time interval in an LSTM?
I have data for every 15 minutes, but I want to predict the data for every 5 minutes.

can anyone please help me

Thanks in advance

Reply
- Jason Brownlee September 22, 2020 at 6:46 am #
  
  Prepare input and output samples in the format you require, then train a model on that data.
  
  Reply
Goksu Avdan September 23, 2020 at 9:20 am #

Hi thanks for the tutorial.
I am trying to solve a multi-input problem that predicts a single output. However, my input is not going to be included in the input dataset. Basically, I will predict the “Z” target value at time step (t+1) by using the “X” and “Y” input features at time step (t). In detail, my dataset consists of 120 trials and each trial has 101 time steps. So, let’s say I would like to train my model on 100 trials and then test and validate my model on 10 trials each. Could you please give me some advice about this problem and point me in some direction?
I hope you can help me with that.
Have a great day!

Reply
- Goksu Avdan September 23, 2020 at 9:25 am #
  
  Sorry for the correction. In my second sentence, I would like to say that “my OUTPUT is not going to be included in the input dataset.”
  Also, I would like to predict the whole trial! That means I will predict all 101 time steps one by one and compare the results for each of them using the correlation coefficient and RMSE.
  Thanks.
  
  Reply
- Jason Brownlee September 23, 2020 at 1:42 pm #
  
  That sounds like a great project.
  
  Generally, I’d recommend testing a suite of linear, ml and neural net models in order to discover what works best for your dataset:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  Reply
Ron September 23, 2020 at 12:36 pm #

Hi Jason,

I have been following your article to build my own LSTM binary forecasting network. My dataset is simplified as follows: time_stamp, class, f1, f2, f3 where class can be 0 or 1. I want to classify the next instance based on the features and class of the current instance. So my network will then have the input as:

class(t-1), f1(t-1), f2(t-1), f3(t-1)

and my output is class(t). So this means my output Dense layer will be Dense(1, activation='sigmoid')

finally my loss function will have to be “binary_crossentropy”.

May I know if the above modification to your code is correct?

Do I need to use “from keras.metrics import binary_accuracy” in place of the “metrics=['accuracy']” part?

Thank you

Reply
- Jason Brownlee September 23, 2020 at 1:47 pm #
  
  Sounds like a good start, perhaps try it and see.
  
  No, the accuracy metric is well understood by the Keras API.
  
  Reply
Andreas September 25, 2020 at 3:34 am #

Hello Jason,

I am trying to figure out if you are using walk-forward validation in this example. I can see that this question was asked many times in the past. I am confused because I think you answered this question with different answers. More specifically, on August 31, 2017 at 6:25 am you said that this is not a walk-forward validation, and on April 10, 2019 at 1:44 pm you said the opposite. Am I seeing something wrong?

Thank you

Reply
- Jason Brownlee September 25, 2020 at 6:38 am #
  
  Technically, no.
  
  Reply
Shatha September 28, 2020 at 12:27 am #

How can I remove the seasonality of the dataset?

Reply
- Jason Brownlee September 28, 2020 at 6:20 am #
  
  Using seasonal differencing / seasonal adjustment:
  https://machinelearningmastery.com/remove-trends-seasonality-difference-transform-python/
  
  Reply
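
As a rough illustration of the seasonal differencing mentioned in the reply above, the sketch below subtracts from each observation the value one season earlier and then reverses the transform. The helper names, the season length of 4, and the toy series are assumptions made for this example only; they are not taken from the linked tutorial.

```python
import numpy as np

def seasonal_difference(series, period):
    """Subtract the value one season earlier: d[i] = x[i + period] - x[i]."""
    series = np.asarray(series, dtype=float)
    return series[period:] - series[:-period]

def invert_seasonal_difference(first_season, diffed, period):
    """Rebuild the original series from its first season and the differences."""
    restored = list(np.asarray(first_season, dtype=float))
    for i, d in enumerate(diffed):
        # restored[i] is the value one season before position i + period
        restored.append(restored[i] + d)
    return np.array(restored)

# Hypothetical series: a linear trend plus a repeating pattern of length 4
period = 4
t = np.arange(16)
pattern = np.array([0.0, 3.0, 1.0, -2.0])
series = 0.5 * t + np.tile(pattern, 4)

diffed = seasonal_difference(series, period)
restored = invert_seasonal_difference(series[:period], diffed, period)
print(diffed)
```

The first `period` observations have no earlier season to subtract, so they need to be kept aside if forecasts are later mapped back to the original scale.
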
Mirko September 28, 2020 at 1:54 am #

Hi!
First, thank you for this article. It helped me a lot in understanding how Keras framework operates. Thank you for that part.

I have one remark, though.
The model trains, yes, but it doesn’t forecast anything, as it just learns to copy the previous hour’s pollution. This gives the model the best MSE, so it’s obvious it will do it. It would do even better if no additional features were given (they just confuse it). This is why you see no improvement when extending the number of previous steps (it only needs the last value to copy).
Of course, you can say: try other configurations and see for yourself, but this is a tutorial and you promised we’ll learn “How to make a forecast”. This is not the case.

I see how many people (in comments) believe this is what it pretends to be (learn how to make a forecast with an LSTM), but it is not fair to leave this unexplained in the introduction.

Sorry, but it is misleading and you should correct it.

Regards,
Mirko

Reply
- Jason Brownlee September 28, 2020 at 6:22 am #
  
  You can make a forecast by calling model.predict(). We do this as part of evaluating the model.
  
  Also see this:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  And this:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
  - Dmitry October 24, 2020 at 6:19 pm #
    
    Jason, let me clarify Mirko’s point. They are talking about real forecasting, not just DNN inference. The article’s introduction should have a clear “disclaimer” that this is just an example of how to work with an LSTM, and that actual real-world forecasting is a far more complex problem that requires decent domain knowledge as well as a plethora of data at hand. Examples of solving such problems deserve a special series of articles!
    
    Reply
    - Jason Brownlee October 25, 2020 at 6:59 am #
      
      Such as those listed here:
      https://machinelearningmastery.com/start-here/#deep_learning_time_series
      
      And this book:
      https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/
      
      Reply
NamelessGhoul September 28, 2020 at 8:08 pm #

In imports you should use tensorflow.keras instead of keras:

E.g. change:

from keras.models import Sequential

To:

from tensorflow.keras.models import Sequential

Reply
- Jason Brownlee September 29, 2020 at 5:35 am #
  
  No, the example uses the standalone version of the Keras library.
  
  Reply
Michelle October 8, 2020 at 2:17 am #

Hi Jason, thanks for the article.
In such a setup, using the target variable from the previous time step as a feature will almost always give a prediction that is not bad because, in the worst case, the prediction for this time step can directly take the value from the previous time step. That is why, with such a setup, we often see that the prediction curve is slightly shifted from the ground truth curve.
I would say it makes more sense to do a multivariate analysis without using the target variable as a feature. It is much more challenging to set up such an LSTM architecture for sequence-to-sequence prediction.
Do you also have a post on this, please?

Many thanks.

Reply
- Jason Brownlee October 8, 2020 at 8:34 am #
  
  Yes, this is called a persistence forecast and all models must be compared to it to see if they have skill.
  
  Sure, you can structure the prediction problem any way you wish based on the requirements of your problem.
  
  These models will help you to get started:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
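
To make the persistence comparison above concrete, here is a minimal sketch that scores a "copy the previous value" forecast with RMSE. The function names and the toy observations are assumptions for illustration; a model only shows skill if its error on the same test data is lower than this baseline.

```python
import numpy as np

def persistence_forecast(series):
    """Predict each value as the previous observed value."""
    series = np.asarray(series, dtype=float)
    predictions = series[:-1]  # value at t-1 used as the forecast for t
    actuals = series[1:]       # value actually observed at t
    return predictions, actuals

def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical hourly observations of the target variable
observed = np.array([10.0, 12.0, 11.0, 15.0, 14.0])

pred, actual = persistence_forecast(observed)
baseline_rmse = rmse(actual, pred)
print('Persistence RMSE: %.3f' % baseline_rmse)
```
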
Bruce October 15, 2020 at 5:49 pm #

Hi Jason, thanks for the article.
I have a question after reading this article. After training, what should I do if I need to deploy the model to a Linux server for retraining? Looking forward to your answer

Reply
- Jason Brownlee October 16, 2020 at 5:51 am #
  
  You’re welcome.
  
  What do you mean retraining?
  
  Reply
Valdemar Sousa October 16, 2020 at 8:41 pm #

hi jason, i am working on a project that deals with infrastructure alarms and i want to develop a ML model capable of predicting the next alarms (time series problem).
Specifically, my data is a stream of alert data, where at each time stamp, information such as the alert monitoring system, the location of the problem etc. are stored in the alert. These fields are all categorical variables.
I am still undecided as to which time series machine learning model to use. Will you be able to give some hint of the “best” models for these problems, or any article of yours that has a similar problem?

Reply
- Jason Brownlee October 17, 2020 at 6:02 am #
  
  Good question, I recommend testing a suite of algorithms and discover what works best for your specific dataset.
  
  Reply
samavia October 20, 2020 at 1:42 am #

thanks… i need an lstm code for GDP data to predict 10 years GDP… kindly send me code

Reply
- Jason Brownlee October 20, 2020 at 6:26 am #
  
  Perhaps start here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Salman Shahid October 20, 2020 at 9:14 pm #

”
First, the “pollution.csv” dataset is loaded. The wind speed feature is label encoded (integer encoded). This could further be one-hot encoded in the future if you are interested in exploring it.
”
It is not the wind speed feature that you are label encoding. It is the wind direction feature that you are label encoding.

Reply
- Jason Brownlee October 21, 2020 at 6:39 am #
  
  Thanks. Fixed.
  
  Reply
Mike Pang October 22, 2020 at 3:00 pm #

Hi Jason, I have few questions for these lines of code here :

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

if I would like to make predictions for multi-step (taking past 5 values to predict 5 future values), which means I will have to change to :

train_X = train_X.reshape((train_X.shape[0], 5, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 5, test_X.shape[1]))

is this correct method ?

or should I follow this tutorial instead
https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/

Reply
- Jason Brownlee October 23, 2020 at 6:02 am #
  
  You can make a prediction by calling the predict() function with the relevant input data.
  
  More details here:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  And here:
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  Reply
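
On the reshape question above: when each row holds several lagged time steps, the last dimension of the 3D array is the number of features per time step, not the full row width. The sketch below uses made-up sizes (5 lag steps, 8 features, 100 samples) and a hypothetical array named train_X_flat to show the shape arithmetic. It covers the input side only; predicting 5 future values also requires framing the targets accordingly.

```python
import numpy as np

n_steps, n_features = 5, 8
n_samples = 100

# Hypothetical flattened inputs: each row holds n_steps * n_features lagged
# values, ordered as all features at t-5, then all features at t-4, and so on.
train_X_flat = np.arange(n_samples * n_steps * n_features, dtype=float)
train_X_flat = train_X_flat.reshape(n_samples, n_steps * n_features)

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X_flat.reshape((train_X_flat.shape[0], n_steps, n_features))
print(train_X.shape)

# Using the full row width (40) as the feature count does not fit the data
wrong_shape_fails = False
try:
    train_X_flat.reshape((train_X_flat.shape[0], n_steps, train_X_flat.shape[1]))
except ValueError:
    wrong_shape_fails = True
print(wrong_shape_fails)
```
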
Dmitry October 24, 2020 at 3:33 am #

First, thank you so much for the job done. I’m a software engineer growing into the Deep Learning so your articles are very helpful to kick-in.
However, being an engineer by nature, I’m a little bit confused. Why is everyone used to think that this particular pollution forecasting problem is solvable at all with the data provided? It’s definitely not an AR problem. Weather data is most likely secondary. The most relevant features would have been transportation traffic and factories load. Even indirect data such as an electricity consumption might be helpful.
I played with a toy DNN and expectedly observed how the model is unable to converge once the important data is eliminated from the input.

Reply
- Jason Brownlee October 24, 2020 at 7:09 am #
  
  Agreed. Take it as a demo for the method.
  
  Reply
pouyan October 26, 2020 at 2:56 am #

Hi Jason, thanks for your complete tutorial. I have one question: when we want to predict next n values we have to set n future values as label or target. in the architecture of the LSTM model how can we set multi output? I know that there is a possibility in keras to set multi output for my model but dont know how. Can you guide me on this topic please. thanks in advance.

Reply
- Jason Brownlee October 26, 2020 at 6:51 am #
  
  See examples here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Jephter Kapika Pelekamoyo October 26, 2020 at 8:04 pm #

Am getting this error:
ValueError: operands could not be broadcast together with shapes (35061,8) (11,) (35061,8)
when I run the code above.

Reply
- Jason Brownlee October 27, 2020 at 6:43 am #
  
  Sorry to hear that, these tips may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Alex October 27, 2020 at 3:02 am #

Really great tutorial! I am familiar with Python but very new to machine learning and have been reading through and practicing the material in your books and online. One question I have, though, is what the actual predicted output looks like. Here we have trained the model, but the goal is to predict the pollution at a future time. When we call model.predict(), how do we interpret the results? Basically, where/what is the predicted value at a future time?

I have already referenced

https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/

Reply
- Jason Brownlee October 27, 2020 at 6:48 am #
  
  Thanks.
  
  A prediction requires an input sample and the output of the model is related directly to the input sample.
  
  Perhaps I don’t understand your question?
  
  Reply
  - Alex October 27, 2020 at 12:49 pm #
    
    My apologies, I realize that was a bit vague. What I am asking is based on the model: when we pass some input values as (x) into model.predict(x) and invert the scale, the value we are looking at is a predicted pollution value for the next 1-hour time stamp. Say, for instance, we wanted to predict every 30 minutes? Could we simply update the CSV training data to have time stamps at every minute?
    
    Reply
    - Jason Brownlee October 27, 2020 at 1:01 pm #
      
      Yes, you can frame the prediction problem any way you like in terms of inputs and outputs.
      
      Reply
Hale November 3, 2020 at 12:34 pm #

Say we wanted to use this to do a multivariate binary classification prediction. Would it be as simple as changing the loss function from mae to binary crossentropy. Assuming that our target variable was binary?

Reply
- Jason Brownlee November 3, 2020 at 1:42 pm #
  
  Yes, here is an example:
  https://machinelearningmastery.com/how-to-develop-rnn-models-for-human-activity-recognition-time-series-classification/
  
  Reply
Dyka November 5, 2020 at 7:53 pm #

Hey Jason,

many many thanks for this incredibly useful example!

Your tutorials are awesome.

I have a request, please.
Could you write a post on predicting the next n (n ∈ ℕ) values of a feature based on the previous m timestamps of multiple input variables?
In this post you did something similar, except that you used the previous m timestamps of multiple variables to predict the next (single) value of the pollution.
So what I am requesting is something like: use the 10 previous time steps of multiple features (pollution, dew, temp, press, wnd_dir, wnd_spd, snow, rain) to predict the next 4 values of the pollution.

Thank you in advance.

Reply
- Jason Brownlee November 6, 2020 at 5:54 am #
  
  You’re welcome.
  
  Perhaps start here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
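
The framing requested above (10 previous time steps of 8 features in, the next 4 pollution values out) can be sketched as a windowing step. This is only an illustration under stated assumptions: make_windows is a hypothetical helper, the data is random, and the pollution column is assumed to sit at index 0.

```python
import numpy as np

def make_windows(data, n_in, n_out, target_col=0):
    """Frame a 2D array [time, features] as (X, y) for multi-step forecasting.

    X has shape [samples, n_in, features]; y has shape [samples, n_out] and
    holds the next n_out values of the target column.
    """
    data = np.asarray(data, dtype=float)
    X, y = [], []
    for start in range(len(data) - n_in - n_out + 1):
        end = start + n_in
        X.append(data[start:end, :])                 # past n_in steps, all features
        y.append(data[end:end + n_out, target_col])  # next n_out target values
    return np.array(X), np.array(y)

# Hypothetical data: 100 hourly rows of 8 features, pollution in column 0
rng = np.random.default_rng(1)
data = rng.random((100, 8))

X, y = make_windows(data, n_in=10, n_out=4)
print(X.shape, y.shape)
```

With this framing, X already has the [samples, timesteps, features] layout an LSTM layer expects, and a model would typically end in an output layer with one unit per forecast step (4 here).
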
innnne November 6, 2020 at 2:02 pm #

Excuse me, I’ve got a question about two scenarios that vary a bit from what is covered in this blog post. What changes should we make if:
1. case 1: we want to deal with forward lag timesteps and also backward lag timesteps (e.g.: a case (row) in which each sample contains 3 hours backwards and 1 hour forwards, with 4 features )
2. case 2: we want to deal with forward lag timesteps and also backward lag timesteps , but this time a little bit more complicated: the forward ones as well as the current time only have 3 among the 4 features which the backward lag timesteps do.
(e.g.:
[var1(t-3),var2(t-3),var3(t-3),var4(t-3),var1(t-2),var2(t-2),var3(t-2),var4(t-2),var1(t-1),var2(t-1),var3(t-1),var4(t-1),var1(t),var2(t),var3(t),var1(t+1),var2(t+1),var3(t+1)]
)

Reply
- Jason Brownlee November 7, 2020 at 6:26 am #
  
  One solution would be to have a multi-input model, one head of the model for the lag obs, and another for the future obs.
  
  This will get you started with multi-input models:
  https://machinelearningmastery.com/keras-functional-api-deep-learning/
  
  Reply
innnne November 9, 2020 at 7:55 pm #

Many thanks！ I’ll have a look！

Reply
- Jason Brownlee November 10, 2020 at 6:39 am #
  
  You’re welcome.
  
  Reply
David November 13, 2020 at 10:39 am #

Hello Jason, great work. Do you have any tutorial on using multiple time series to forecast multiple time series?
e.g. use 4 ts as input and 2 ts as output

Reply
- Jason Brownlee November 13, 2020 at 12:46 pm #
  
  Yes many examples, start here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Jimmy November 20, 2020 at 4:08 pm #

Hey Jason, thanks a lot for this post.
I am having trouble finalizing the model: getting it to predict over the whole dataset and comparing the predictions to the actual data, especially since several rows are removed because of NaN values and the output doesn’t have a datetime index. Can you provide an example of finalizing the model here?

Reply
- Jason Brownlee November 21, 2020 at 6:38 am #
  
  Perhaps this will help:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Behnaz November 21, 2020 at 7:55 am #

Hi Jason, thank you for your great website. I’ve learned so much from your posts. These days I’m working on predicting the stock market with covid data. I’m going to do an analysis like you did in this post. My variables are the total number of active cases and deaths. I did the windowing part, but I have a doubt. In this post you include the previous value of pollution besides other factors like wind, etc. But I am wondering whether I have to exclude the stock price for previous days from the features after windowing or not.
Would you please help me figure out whether I should keep the price for previous days or remove it?

Reply
- Jason Brownlee November 21, 2020 at 1:03 pm #
  
  Thanks!
  
  Generally the stock market cannot be predicted:
  https://machinelearningmastery.com/faq/single-faq/can-you-help-me-with-machine-learning-for-finance-or-the-stock-market
  
  Reply
  - ching chong November 30, 2020 at 5:32 pm #
    
    kindly sir share your email address .
    
    Reply
    - Jason Brownlee December 1, 2020 at 6:16 am #
      
      You can contact me directly here:
      https://machinelearningmastery.com/contact/
      
      Reply
ching chong November 30, 2020 at 5:29 pm #

sir please tell me which commands of neural network are used for de facto one day ahead forecast in R?

Reply
- Jason Brownlee December 1, 2020 at 6:16 am #
  
  Sorry, I don’t have any examples of deep learning in R.
  
  Reply
Tomás December 2, 2020 at 3:04 am #

Hi Jason, thanks a lot for this tutorial!

I am trying to understand a simple question. If your goal is to predict pm2.5, why would you feed your model with multiple features?

I am developing a similar project, and I have already performed some feature analysis with PCA, correlation matrices, etc. I found the best features and used them as input features for my project, along with the feature I want to predict (such as pm2.5 in this case). After testing, I can conclude that the model performs better if I use just one feature as input and not multiple. So in this case, why would you feed your model with multiple features if you already have past measures of the exact variable you want to predict?

Thanks again for your work! You’ve helped me a lot

Reply
- Jason Brownlee December 2, 2020 at 7:49 am #
  
  The assumption is that the other features help to predict the target in some way, either directly or in aggregate.
  
  Reply
  - Tomás December 2, 2020 at 11:43 pm #
    
    Thanks a lot!
    
    Reply
    - Jason Brownlee December 3, 2020 at 8:19 am #
      
      You’re welcome.
      
      Reply
Rajesh Maddu December 7, 2020 at 3:48 pm #

Hi,

Can you please explain data assimilation from a machine learning perspective? Nowadays, everyone is saying “Data assimilation offers an opportunity to blend the two approaches, hence providing a useful alternative framework for combining theory-based and data-based approaches”.

I have an LSTM ML model for my prediction problem. I have XX numerical model (theory-based) also.

Can you please explain how to combine these two and get a new framework?

Reply
- Jason Brownlee December 8, 2020 at 7:39 am #
  
  What is “Data Assimilation”? I have never heard the term.
  
  Reply
  - Rajesh Maddu December 11, 2020 at 7:24 am #
    
    Ensemble Kalman Filter is a Data assimilation method.
    
    Do you have any code samples on this topic?
    
    Reply
    - Jason Brownlee December 11, 2020 at 7:42 am #
      
      I do not.
      
      Reply
Sven December 12, 2020 at 7:59 am #

Hello Jason,
thank you for the great tutorials and examples. I really enjoy it and build my own LSTM multivariate models with your code as base. My models work with Keras 2.2.4. But if I program several loops there is a memory leakage. All hints from the internet do not help to free memory. After some loops the memory has an overflow.
I updated to Keras 2.4.3: no more memory overflow, but completely different result for my predictions. Do you have a hint what has changed between Keras 2.2.4 and 2.4.3 that has effect on the predictions?
Thank you, best regards
Sven

Reply
- Jason Brownlee December 12, 2020 at 1:23 pm #
  
  Sorry, I don’t think Keras has memory leaks.
  
  Do you mean, you run out of main memory? If so:
  
  Perhaps try progressive loading.
  Perhaps try an AWS EC2 instance.
  Perhaps try a smaller model.
  Perhaps try less training data.
  
  I hope that helps.
  
  Reply
ATW December 18, 2020 at 8:57 pm #

How are you supposed to make it work if you want multiple inputs and outputs specified in the series_to_supervised method? It doesn’t work because the scaler.fit_transform method is called before shaping the data to the amount of i/o. Also, when I try multi-input (50) and univariate output (1) and then fit it to this data.shape( , 50, 1), the model’s predicted values are all zero.

Reply
- Jason Brownlee December 19, 2020 at 6:16 am #
  
  The function will handle multiple inputs and outputs directly.
  
  Any scaling of variables should probably be performed prior to transforming the series to supervised learning.
  
  Reply
  - ATW December 21, 2020 at 11:28 pm #
    
    I don’t think you understand.
    
    You specify scaled = scaler.fit_transform(values) before you call the series_to_supervised() method. Let’s say your dataset has 4 features and you specify 10 as the number of steps in that method; that would make the dataset effectively (0, 40, 1).
    But after predicting you have to invert the scaling, and it expects the shape (4, 1), so it doesn’t work.
    How do we solve that, to make this project accept multiple previous time steps and perhaps future time steps as well?
    
    Also, when I run the project in the normal state of the features it works and I get a good predicted output, but for some reason amidst the reshaping and inversing the 1 predicted timestep is appended to the last tuple instead of making a new one. How does that work?
    
    Reply
    - Jason Brownlee December 22, 2020 at 6:47 am #
      
      The scaler object must take data in the same format when transforming or inverse transforming. If you scale all inputs and outputs together and you are only interested in inverse transforming the target, you can pad the other columns with nonsense and focus on the result for the target column.
      
      Perhaps this will help you with data preparation:
      https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
      
      Reply
      - ATW December 22, 2020 at 9:40 pm #
        
        What I meant is that in the code above you specify the shape for the normalization before you change the actual shape of the data. If you want to use the initial data specified in the fit_transform method, then it works. But if you specify that you want to predict by taking more t- or t+ steps into consideration, then that shape changes AFTER fitting, the prediction is off, and moreover you can’t transform it back.
        I’ve tried reshaping the data before normalizing and before feeding it to the model, but the predictions are off nonetheless. I’m not sure the model can predict a 1x t+50 based on 4x t-50 features.
        Do you think making a single step recursive method that feeds and retrains the model would work better rather than going at it this way?
      - Jason Brownlee December 23, 2020 at 5:35 am #
        
        Sorry, I don’t understand the problem you’re having with data preparation. Perhaps I’m not the best person to help you with it.
        
        Regarding the best model configuration for your dataset – I recommend testing many different framings of the problem, different models and different model configurations in order to discover what works best for your dataset.
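
On the inverse-scaling point raised earlier in this thread (pad the other columns and keep only the target column), the sketch below shows the idea with a hand-written min-max scaling in NumPy. The 4-column toy data, the target in column 0, and the function names are assumptions for illustration. With a fitted scikit-learn MinMaxScaler, the corresponding call would be scaler.inverse_transform(padded).

```python
import numpy as np

# Hypothetical raw data: 6 rows of 4 features, with the target in column 0
values = np.array([
    [10.0, 1.0, 100.0, 0.0],
    [20.0, 2.0, 110.0, 1.0],
    [30.0, 3.0, 120.0, 0.0],
    [40.0, 4.0, 130.0, 1.0],
    [50.0, 5.0, 140.0, 0.0],
    [60.0, 6.0, 150.0, 1.0],
])

# Min-max scaling "fit" on all 4 raw columns, before any lagging
col_min = values.min(axis=0)
col_max = values.max(axis=0)

def scale(x):
    """Map each of the 4 columns to the range [0, 1]."""
    return (x - col_min) / (col_max - col_min)

def inverse_scale(x):
    """Undo scale(); expects the same 4-column layout."""
    return x * (col_max - col_min) + col_min

scaled = scale(values)

# Hypothetical scaled predictions for the target column only, shape [n, 1]
yhat_scaled = np.array([[0.0], [0.5], [1.0]])

# Pad to the 4-column layout the scaling was fit on; filler values are ignored
padded = np.zeros((len(yhat_scaled), values.shape[1]))
padded[:, 0] = yhat_scaled[:, 0]

# Invert all columns, then keep only the target column
yhat = inverse_scale(padded)[:, 0]
print(yhat)
```

The number of lag steps used to build the model inputs does not matter here, because the padded array always has one column per original feature.
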
Kevin December 20, 2020 at 9:35 am #

Hi, thanks Jason for a wonderful post.

I have a question: if we want to use 6 timesteps (backward 3 + forward 3) for 8 features, how should we do it?

Reply
- Jason Brownlee December 20, 2020 at 10:47 am #
  
  This will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
ling December 21, 2020 at 9:11 pm #

Hi Jason,

I am confused about the output prediction results. If I want to predict a period of time (a continuous period of data results), how should I set the output parameters? Is it by modifying the step size?

predictions = model.predict(X, verbose=0)

Reply
- Jason Brownlee December 22, 2020 at 6:44 am #
  
  I think you’re referring to a multi-step forecast, if so start here:
  https://machinelearningmastery.com/faq/single-faq/how-do-you-use-lstms-for-multi-step-time-series-forecasting
  
  Reply
ling December 22, 2020 at 12:19 pm #

Thank you very much for your reply, I will try to use multi-step prediction to get the result. In addition, I would like to ask you, the longer the prediction time, the greater the error in the results obtained. Is there a good way to determine the relationship between the accuracy of the prediction result and the length of the prediction?

Reply
- Jason Brownlee December 22, 2020 at 1:38 pm #
  
  Yes, use a robust test harness and calculate the average error for each forecasted lead time over many samples.
  
  Reply
  - ling December 22, 2020 at 5:36 pm #
    
    Hi Jason, thanks a lot! The first prediction result using the LSTM model has come out, and it is still very different from the actual result. At present, I am trying to train multiple times and average the different prediction results, or use other methods, to minimize the error between the final prediction result and the monitoring result. I would like to ask you what other good ways there are to improve the accuracy of the prediction (currently the data in my experiment is two months of hourly data). Do I need to increase the amount of data?
    
    Reply
    - Jason Brownlee December 23, 2020 at 5:30 am #
      
      The suggestions here will help you to improve the performance of your neural network model:
      https://machinelearningmastery.com/start-here/#better
      
      Reply
      - ling December 23, 2020 at 12:08 pm #
        
        Thanks a lot! I hope I can ask you more questions about machine learning. Good luck!
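
The suggestion earlier in this thread, to calculate the average error for each forecasted lead time over many samples, can be sketched as follows. The array shapes and the toy numbers are assumptions: each row is one forecast sample and each column is one lead time (t+1, t+2, t+3).

```python
import numpy as np

def rmse_per_lead_time(actual, predicted):
    """RMSE for each forecast horizon.

    actual, predicted: arrays of shape [samples, n_lead_times].
    Returns an array of length n_lead_times.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Average the squared error over samples (axis 0), one score per column
    return np.sqrt(np.mean((actual - predicted) ** 2, axis=0))

# Hypothetical multi-step forecasts: 4 samples, 3 lead times
actual = np.array([[10.0, 11.0, 12.0],
                   [20.0, 21.0, 22.0],
                   [30.0, 31.0, 32.0],
                   [40.0, 41.0, 42.0]])
# Errors of 1, -2 and 3 added at lead times t+1, t+2 and t+3 respectively
predicted = actual + np.array([1.0, -2.0, 3.0])

scores = rmse_per_lead_time(actual, predicted)
for step, score in enumerate(scores, start=1):
    print('t+%d RMSE: %.3f' % (step, score))
```

Plotting these scores against the lead time gives a direct view of how quickly the error grows with the length of the forecast.
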
Balki January 5, 2021 at 11:40 am #

Hi Jason,

Thanks for the post. The scaling

scaled = scaler.fit_transform(values)

takes place on the entire dataset before it is split up into Train and Test datasets. Shouldn’t we use the scaler parameters obtained from the Train dataset to scale the Test dataset?

Thanks

Reply
- Jason Brownlee January 5, 2021 at 1:31 pm #
  
  Yes, ideally. I chose to scale all data up front to keep the tutorial simple and focused on the technique.
  
  Reply
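
A minimal sketch of the stricter approach discussed above, where the scaling parameters come from the training rows only and are then reused on the test rows. The toy data, the split point, and the hand-written min-max function are assumptions for illustration; the same pattern applies to a scikit-learn scaler by calling fit on the training data and transform on both sets.

```python
import numpy as np

# Hypothetical dataset: 5 rows of 2 features, split 4 train / 1 test
values = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0],
                   [4.0, 40.0],
                   [5.0, 60.0]])
n_train = 4
train, test = values[:n_train], values[n_train:]

# Learn the scaling parameters from the training rows only
train_min = train.min(axis=0)
train_max = train.max(axis=0)

def scale(x):
    """Min-max scale using the training minimum and maximum."""
    return (x - train_min) / (train_max - train_min)

train_scaled = scale(train)
test_scaled = scale(test)  # may fall outside [0, 1]
print(test_scaled)
```

Test values larger than anything seen in training map above 1, which is expected and reflects what the model would face on new data.
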
Kiran January 12, 2021 at 3:41 pm #

Hey Jason! Thanks for the wonderful tutorial. I was just wondering if you could explain how a dense layer functions in a LSTM code.

Reply
- Jason Brownlee January 13, 2021 at 6:10 am #
  
  You’re welcome.
  
  The dense layer interprets the features extracted by the LSTM layers and makes a prediction.
  
  Reply
Valdemar Sousa January 18, 2021 at 11:06 pm #

Hello Jason, your work is fantastic. I bought the time series book and I think it’s excellent.
I have a question. My problem is based on predicting the number of alarms. These alarms occur in different regions; we can say that they occur in different places, all with different behavior. I have about 4000 different places and I want to train the LSTM model to forecast alarms for each location. How would you do that? Use the same LSTM model and add the “local” feature? Making a model for each region is unthinkable in this case.

Reply
- Jason Brownlee January 19, 2021 at 6:37 am #
  
  Thanks!
  
  Good question, this will give you some ideas:
  https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
  
  Reply
  - Valdemar Sousa January 25, 2021 at 11:15 pm #
    
    thanks for the reply jason.
    my idea was to take some sites, and create a ‘for’ cycle where each site dataframe goes through ‘model.fit’, so I could train different sites.
    Does this approach seem correct? If I pass several dataframes through model.fit, does it train on all of them, or does it simply train on the last site that passes?
    
    Reply
    - Jason Brownlee January 26, 2021 at 5:55 am #
      
      Perhaps try it and see if it is effective on your dataset with your chosen model/config.
      
      There is no general best approach, only the approach that works well for your project.
      
      Reply
Harvey Benjamin Smith January 28, 2021 at 12:14 pm #

I’m using this exact framework on a different multivariate dataset and it works fine up until the end when making the predictions. I trained the model fine but then on the line

yhat = model.predict(test_X)

I get error:

ValueError: Input 0 of layer sequential_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 8]

The dimensions of the data is the same as in your example

print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

(2774, 1, 8) (2774,) (694, 1, 8) (694,)

Reply
- Jason Brownlee January 28, 2021 at 1:40 pm #
  
  Perhaps the shape of the data does not match the expectations of the model.
  
  You can change the model or the shape of the data.
  
  Reply
  - Harvey Benjamin Smith January 29, 2021 at 12:19 pm #
    
    Thank you sir. It’s Strange I restarted the notebook and it worked. But now I’m not sure how to use this model. Where does it give the actual prediction for the next time step, the future, the next day?? Thanks
    
    Reply
    - Jason Brownlee January 29, 2021 at 1:30 pm #
      
      The example demonstrates how to evaluate the model.
      
      If you choose this model over others, you can fit a final model on all data and then call model.predict() to make predictions.
      
      More on final models here:
      https://machinelearningmastery.com/train-final-machine-learning-model/
      
      More on making predictions here:
      https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
      
      More on the basics of LSTMs for time series forecasting here:
      https://machinelearningmastery.com/start-here/#deep_learning_time_series
      
      I hope that helps.
      
      Reply
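
One possible cause of the "expected ndim=3, found ndim=2" error in this thread, offered only as an assumption since the thread does not confirm it, is that test_X had already been reshaped back to 2D by a later cell before predict() was run again. A small guard such as the hypothetical ensure_3d helper below restores the [samples, timesteps, features] layout before prediction.

```python
import numpy as np

def ensure_3d(X, n_timesteps=1):
    """Return X as [samples, timesteps, features].

    A 3D array is returned unchanged; a 2D array of shape
    [samples, n_timesteps * n_features] is reshaped.
    """
    X = np.asarray(X)
    if X.ndim == 3:
        return X
    if X.ndim == 2:
        if X.shape[1] % n_timesteps != 0:
            raise ValueError('row width is not a multiple of n_timesteps')
        n_features = X.shape[1] // n_timesteps
        return X.reshape((X.shape[0], n_timesteps, n_features))
    raise ValueError('expected a 2D or 3D array, got ndim=%d' % X.ndim)

# Hypothetical 2D test inputs: 694 samples with 8 features each
test_X_2d = np.zeros((694, 8))
test_X = ensure_3d(test_X_2d, n_timesteps=1)
print(test_X.shape)
```
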
Mike February 2, 2021 at 10:14 am #

Hi Jason, as always thanks for your job.
Taking a look at this code, I think there’s a logic mistake. Let me try to explain:

Let’s say I got 3 features:
“a” as temperature. “b” as pressure. “c” as humidity.
I want to predict the feature c at time (t) by providing a(t-1) and b(t-1).
When it comes to NaN values, you just suggest to remove the affected rows.

Since they are time-correlated, I don’t think it’s the best approach…
Example:
DAY | a | b | c
2000-01-01 | 20 | 10 | 0.54
2000-01-02 | 23 | 12 | 0.52
2000-01-03 | 22 | 8 | 0.48
2000-01-04 | 20 | 8 | 0.47
2000-01-05 | 24 | 12 | 0.49

Let’s say the row for 2000-01-03 has NaN as its “b” feature.
According to what you said, the new dataset looks like:

2000-01-01 | 20 | 10 | 0.54
2000-01-02 | 23 | 12 | 0.52
2000-01-04 | 20 | 8 | 0.47
2000-01-05 | 24 | 12 | 0.49

The row has been removed.
When the LSTM learns, it will actually understand that row number 2 leads to row number 3.
So temperature: 23, pressure: 12 and humidity: 0.52 will forecast a humidity of 0.47.
Which is a mistake, because that row should not predict anything, since the row for 2000-01-03 has been removed.
Isn’t that a mistake?

Thank you!

Mike

Reply
- Jason Brownlee February 2, 2021 at 1:22 pm #
  
  This is called a sliding window, and is just one approach to transforming one or multiple time series into a supervised learning problem.
  
  You can learn more here:
  https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
  
  Reply
  - Mike February 2, 2021 at 7:29 pm #
    
    Thank you, I have read that article, but it just shows the sliding window method.
    It doesn’t explain how to handle NaN values in the middle of the dataset. Instead, it just says you need to remove the first and last rows according to your sliding window method (or lag choice).
    I was wondering: if I have multiple missing values within the dataset, should I always remove all the affected rows?
    Example:
    
    DAY | a | b | c
    
    2000-01-01 | 20 | 10 | 0.54
    2000-01-02 | 23 | 12 | 0.52
    2000-01-03 | 22 | 8 | 0.48
    2000-01-04 | NaN | 5 | 0.47
    2000-01-05 | 28 | 11 | 0.49
    2000-01-06 | 22 | 15 | 0.45
    2000-01-07 | 25 | 18 | 0.43
    2000-01-08 | 29 | 14 | 0.45
    2000-01-09 | 21 | 17 | 0.42
    2000-01-10 | 22 | 13 | 0.41
    
    Using “a(t-1)”, “b(t-1)”, c(t-1) to predict “c(t)”
    
    Should the dataset be:
    
    [NaN, NaN, NaN] -> [0.54] (needs to be removed)
    [20, 10, 0.54] -> [0.52]
    [23, 12, 0.52] -> [0.48]
    [22, 8, 0.48] -> [0.47]
    [NaN, 5, 0.47] -> [0.49] (needs to be removed)
    [28, 11, 0.49] -> [0.45]
    and so on…
    
    Is that approach correct when it comes to sliding window with lag=1?
    
    Reply
    - Jason Brownlee February 3, 2021 at 6:17 am #
      
      If you have missing data in your time series dataset, you have many options, such as:
      
      – remove those observations/rows/features
      – impute (statistical, knn, etc.)
      – persist prior value
      – masking input layer
      – etc.
      
      Perhaps try a few approaches and see what works well/best
      
      I have a ton of tutorials on this topic, perhaps try the search box at the top of the page.
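
Two of the options above can be sketched on Mike's example. This is only an illustration with NumPy, assuming a lag of 1 and `c` as the target: framing first and then dropping incomplete pairs keeps every remaining pair built from truly adjacent days, while persisting the prior value fills the gap before framing.

```python
import numpy as np

# Mike's example: columns are a, b, c for ten consecutive days.
data = np.array([
    [20, 10, 0.54],
    [23, 12, 0.52],
    [22, 8, 0.48],
    [np.nan, 5, 0.47],
    [28, 11, 0.49],
    [22, 15, 0.45],
    [25, 18, 0.43],
    [29, 14, 0.45],
    [21, 17, 0.42],
    [22, 13, 0.41],
])

# Option 1: frame first (lag=1), then drop incomplete pairs.
# Each pair comes from adjacent days, so removing a pair never makes
# day t-2 look like the predecessor of day t.
X = data[:-1]      # a(t-1), b(t-1), c(t-1)
y = data[1:, 2]    # c(t)
keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
X_clean, y_clean = X[keep], y[keep]

# Option 2: persist the prior value (forward fill), then frame as usual.
filled = data.copy()
for i in range(1, len(filled)):
    missing = np.isnan(filled[i])
    filled[i, missing] = filled[i - 1, missing]

print(X_clean.shape, y_clean.shape)
print(filled[3])
```

Dropping rows from the raw table before framing is what creates the false adjacency described in the original question; dropping framed pairs does not.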
      
      Reply
Varun February 10, 2021 at 9:13 pm #

Hi Jason,

Thanks for the brilliant post. I have a question about removing trends and seasonality. At what step do we remove them and add them back?

My understanding is that you detrend/deseasonalize first, do feature engineering, run the walk-forward model, evaluate, forecast, and then invert the detrending/deseasonalizing. I am not sure if that’s the right way to do it. What do you think?

-Varun

Reply
- Jason Brownlee February 11, 2021 at 5:54 am #
  
  You’re welcome.
  
  Good question, see this tutorial:
  https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
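
As a small illustration of the order of operations (transform before framing and modeling, invert on the forecast), here is a sketch using first-order differencing to remove a trend. The series is made up and the "model" is a stand-in that repeats the last observed difference.

```python
import numpy as np

series = np.array([10.0, 12.0, 15.0, 19.0, 24.0])

# Remove the trend with first-order differencing before modeling.
diff = series[1:] - series[:-1]

# Stand-in for a model forecast made on the differenced scale
# (here: repeat the last observed difference).
diff_forecast = diff[-1]

# Invert the transform: add the forecast difference to the last
# observation in the original scale.
forecast = series[-1] + diff_forecast
print(forecast)
```

If several transforms are stacked (for example differencing, then scaling), they would be inverted in the reverse order on the forecast.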
  
  Reply
Hassan February 18, 2021 at 5:26 pm #

Hi Jason,

you scaled the data first before splitting it into test and training sets. Wouldn’t it make more sense to split it first, fit the scaler to the training data and then apply the scaler to the test data? This way there won’t be any information leakage.

Regards,
Hassan

Reply
- Jason Brownlee February 19, 2021 at 5:56 am #
  
  It was to keep the example simple.
  
  Ideally you fit the scaler on the training set only, then apply it to train and new data to avoid data leakage:
  https://machinelearningmastery.com/data-preparation-without-data-leakage/
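
A minimal sketch of the leakage-free order, using a hand-written min-max transform on made-up values rather than the tutorial's scaler object: split chronologically, take the scaling parameters from the training rows only, then apply them to both parts.

```python
import numpy as np

values = np.array([[1.0], [2.0], [3.0], [4.0], [10.0]])

# Chronological split first: no shuffling for time series.
n_train = 4
train, test = values[:n_train], values[n_train:]

# "Fit" the min-max scaler on the training rows only.
train_min = train.min(axis=0)
train_max = train.max(axis=0)

def scale(x):
    # Apply the training-set parameters to any data.
    return (x - train_min) / (train_max - train_min)

train_scaled = scale(train)
test_scaled = scale(test)

# Test values may fall outside [0, 1]; that is expected, because the
# scaler never saw them.
print(train_scaled.ravel(), test_scaled.ravel())
```

With scikit-learn's `MinMaxScaler`, the same idea is `fit()` on the training data followed by `transform()` on the test data.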
  
  Reply
Rogelio Triviño González February 20, 2021 at 11:08 pm #

Hi,
Plotting the prediction (inv_yhat) against inv_y, I detected a lag between the two series.
This fixes the lag and decreases the RMSE from 26 to 5:

inv_yhat = np.append(inv_yhat[:,0], 0)
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = np.append(0, inv_y[:,0])
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)
pyplot.plot(inv_y, label='inv_y')
pyplot.plot(inv_yhat, label='inv_yhat')
pyplot.legend()
pyplot.show()

Reply
- Jason Brownlee February 21, 2021 at 6:13 am #
  
  The lag is a sign of poor performance:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
  
  Shifting the plot is cheating.
  
  Reply
  - Rogelio Triviño González February 28, 2021 at 9:00 pm #
    
    You are totally right. I discovered it myself and came here to fix my comment, but it seems I can’t.
    
    Reply
    - Jason Brownlee March 1, 2021 at 5:34 am #
      
      I’m happy to hear that you’re making progress!
      
      Reply
William Xu February 25, 2021 at 1:16 pm #

Hi Jason,
Thanks for your great post. I have a scenario with two highly related time series. For example, in this post we have a Beijing air pollution sequence with multiple variables. Suppose I have another sequence, such as a nearby city’s (say Shanghai’s) air pollution data, with similar variables. How should I handle this case and predict both cities’ future pollution?
I suppose there are two approaches. First, treat them as two separate problems and estimate the two models independently, which looks very naive and does not fully utilize the data. Second, estimate the two targets with one model, which seems very convincing, but how can we implement it?

Another question: I learned that in many DL models the ‘learning rate’ is a very important hyper-parameter to tune, but there is no such parameter in your LSTM example. Is there any special reason for that?

Thanks and regards.

Reply
- Jason Brownlee February 26, 2021 at 4:54 am #
  
  Good question, this will give you some ideas:
  https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
  
  Yes, you can tune learning rate for LSTMs just like any other deep learning model.
  
  Reply
  - William Xu February 26, 2021 at 12:47 pm #
    
    thanks for your reply. I’ll have a try.
    
    Reply
William Xu February 26, 2021 at 10:23 pm #

Hi Dr. Jason,
I found a time shift phenomenon in the final results. Run the code below after your code to show the shift problem:

pyplot.plot(inv_y[:100], label='real')
pyplot.plot(inv_yhat[:100], label='predict')
pyplot.legend()
pyplot.show()

It shows that the prediction always lags one step behind the real value. I tried to find the reason but have reached no conclusion yet. Could you please tell me whether this is expected?

Thanks for your time.

Reply
- Jason Brownlee February 27, 2021 at 6:03 am #
  
  It is a sign of a bad prediction, not a bug in plotting, you can learn more here:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
  
  Reply
William Xu February 26, 2021 at 10:52 pm #

By the way, if I just use the pollution at the previous time step to predict the current pollution, the RMSE is 26.56, almost the same as the LSTM result. Should I conclude that the model used in the post is almost useless?

Y_original = (dataset['pollution'].values)[n_train_hours+1:]
Y_predict = (dataset['pollution'].values)[n_train_hours:-1]
sqrt(mean_squared_error(Y_original, Y_predict))

Thanks for your time.

Reply
- Jason Brownlee February 27, 2021 at 6:03 am #
  
  Perhaps not well tuned, it’s just a worked example.
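
The comparison in the question is a persistence (naive) baseline: the forecast for time t is simply the observation at t-1. A sketch on made-up readings shows how its RMSE is computed; a model is generally only considered skillful if it beats this number on the same test period.

```python
import numpy as np

# Made-up hourly pollution readings.
pollution = np.array([50.0, 55.0, 53.0, 60.0, 58.0, 65.0])

# Persistence forecast: the prediction for hour t is the value at t-1.
actual = pollution[1:]
persistence = pollution[:-1]

rmse = np.sqrt(np.mean((actual - persistence) ** 2))
print("Persistence RMSE: %.3f" % rmse)
```

A forecast that looks like the actual series shifted by one step is usually a model that has learned little more than this baseline.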
  
  Reply
Rea March 4, 2021 at 9:24 pm #

Good evening Dr. Jason,

Congratulations on your great work. I have a question about the code above. In the first example, when we use the previous hour to predict the next, we drop the columns we don’t want to predict.
reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)
print(reframed.head())

But in the second code why don’t we drop the columns we don’t want to predict?

Thank you in advance…

Reply
- Jason Brownlee March 5, 2021 at 5:34 am #
  
  Thanks!
  
  We drop the columns we don’t want to predict so we keep the columns we do want to predict.
  
  Perhaps I don’t understand your question?
  
  Reply
Rea March 7, 2021 at 11:36 am #

I’m sorry, I may not have expressed it well. What I want to say is that in the second code example,
“Train On Multiple Lag Timesteps Example”:
n_hours = 3
n_features = 8
# frame as supervised learning
reframed = series_to_supervised(scaled, n_hours, 1)

these lines are missing:

# drop columns we don’t want to predict
reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)
print(reframed.head())

Why don’t we drop the columns we don’t want to predict now?

Reply
- Jason Brownlee March 8, 2021 at 4:39 am #
  
  Because we are loading the version of the dataset that we saved earlier “pollution.csv” where the dataset has already been prepared, not the raw dataset.
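
Another way to see why an explicit drop may be unnecessary with multiple lags: if inputs and output are selected by column position, the unused columns at time t are simply never selected. The sketch below assumes the framed array holds `n_hours * n_features` lag columns followed by `n_features` columns for time t, with pollution first; the array itself is synthetic.

```python
import numpy as np

n_hours = 3
n_features = 8
n_samples = 5

# Synthetic stand-in for the framed data: 3 lagged hours of 8 features,
# followed by the 8 features at time t (pollution assumed first).
n_cols = n_hours * n_features + n_features
framed = np.arange(n_samples * n_cols, dtype=float).reshape(n_samples, n_cols)

# Inputs: all lagged columns. Output: only pollution at time t.
n_obs = n_hours * n_features
X = framed[:, :n_obs]
y = framed[:, -n_features]

# Reshape inputs to [samples, timesteps, features] for an LSTM.
X = X.reshape((X.shape[0], n_hours, n_features))
print(X.shape, y.shape)
```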
  
  Reply
Anshuka Anshuka March 8, 2021 at 5:49 pm #

Hi Jason,

I am a bit confused with this part of the code.

# drop columns we don’t want to predict
reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)
print(reframed.head())

I don’t understand which columns we are dropping, as the transformed dataset does not have columns 9-15 to begin with?

Reply
- Jason Brownlee March 9, 2021 at 5:15 am #
  
  It removes the columns that we do not want to predict from the transformed dataset.
  
  Perhaps start with this simpler tutorial on how to prepare data for modeling:
  https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
  
  And this:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
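
To make the column positions concrete, here is a sketch of the layout after a one-lag, one-step framing of 8 variables. It assumes pollution is the first variable and that columns are named `varN(t-1)` and `varN(t)`; the names are generated by hand for illustration.

```python
n_features = 8

# Column names for a one-lag, one-step framing:
# first every variable at t-1, then every variable at t.
columns = ["var%d(t-1)" % (j + 1) for j in range(n_features)]
columns += ["var%d(t)" % (j + 1) for j in range(n_features)]

# Positions 9..15 are var2(t)..var8(t): the non-pollution variables
# at the current time step.
to_drop = [9, 10, 11, 12, 13, 14, 15]
dropped = [columns[i] for i in to_drop]
kept = [c for i, c in enumerate(columns) if i not in to_drop]

print(dropped)
print(kept)
```

The framed dataset has 16 columns (8 inputs at t-1 plus 8 values at t), so positions 9 to 15 do exist; what remains after the drop is the 8 inputs plus `var1(t)`, the pollution value to predict.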
  
  Reply
engimp March 10, 2021 at 7:15 am #

Hi Jason, your books and blog posts are wonderful. Would you be so kind as to extend the example code to predict not only air pollution, but air pollution, temperature and pressure at the same time? Thank you very much, kind regards, engimp, Berlin

Reply
- Jason Brownlee March 10, 2021 at 2:00 pm #
  
  Thanks!
  
  Good suggestion, thanks.
  
  The examples here might help as a first step:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Furqan Nasir March 19, 2021 at 1:59 pm #

Hi Jason

I have a question: in the multivariate example, you predict one feature with the help of multiple features.

Can we predict multiple features on the basis of their previous values?

For example data-set is like

Date N1 N2 N3 N4 N5 RB XB
01/02/2020 20 14 17 37 64 24 0

now can we predict N1,N2,N3,N4,N5,RB,XB all of them on the basis of their previous values ?
If yes how ?

Reply
- Jason Brownlee March 20, 2021 at 5:16 am #
  
  Yes, you can see an example in this tutorial:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Pratik Gehlot April 1, 2021 at 7:56 pm #

How does it actually work? Why haven’t you split the dataset into X = [all features] and y = [target] variable? How does the model know that I need to predict pollution?

Reply
- Jason Brownlee April 2, 2021 at 5:37 am #
  
  We defined the problem explicitly – e.g. we prepared the X and y data based on the inputs we wanted to use and the output we wanted to predict. The model just learned how to map examples of input sequences to examples of the output.
  
  Reply
Rouzbeh April 12, 2021 at 3:46 am #

Hello,

I wanted to know what we should do if, for instance, we need to predict at time t+m (instead of t+1).
All I found was how to predict t+1.

Thanks,

Reply
- Jason Brownlee April 12, 2021 at 5:11 am #
  
  You can define the data for the model any way you like; the model will learn the problem the way you frame it. So, start with the framing of the problem you want to solve.
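
One possible framing for a t+m target, sketched with made-up data: pair the observation at time t with the target value m steps later. The column order and the horizon `m` are assumptions for illustration.

```python
import numpy as np

# Made-up data: 6 time steps, 2 features; the target is column 0.
data = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
    [5.0, 50.0],
    [6.0, 60.0],
])

m = 3  # forecast horizon: predict the target m steps after the input

# Pair the observation at time t with the target at time t+m.
X = data[:-m]
y = data[m:, 0]

for features, target in zip(X, y):
    print(features, "->", target)
```

The last m observations have no target yet, so they produce no training pairs; they are the inputs for the actual future forecasts.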
  
  Reply
  - Rouzbeh April 12, 2021 at 10:40 am #
    
    Thanks for the answer. Is it possible to refer me to an example? Thank you
    
    Reply
    - Jason Brownlee April 13, 2021 at 6:02 am #
      
      Perhaps start here:
      https://machinelearningmastery.com/start-here/#deep_learning_time_series
      
      Reply
Juan Moreno April 17, 2021 at 8:40 pm #

Hi Jason,
thanks for this great post. One question, perhaps raised before: you preprocess the data before splitting into train and test. Isn’t that incorrect? Doesn’t this bring a “data leak” into the model?
Thanks again

Reply
- Jason Brownlee April 18, 2021 at 5:54 am #
  
  It is. I often do this in tutorials to focus attention on the model and make the code easier to read.
  
  See this for good practices for avoiding data leakage:
  https://machinelearningmastery.com/data-preparation-without-data-leakage/
  
  Reply
Sasmitoh Rahmad Riady April 19, 2021 at 2:16 am #

Dear Jason Brownlee

Is the Multivariate Time Series for LSTM tutorial already using teacher forcing?

If so, where is the teacher forcing?

Please explain.

Reply
- Jason Brownlee April 19, 2021 at 5:53 am #
  
  Yes. I use teacher forcing in almost all cases.
  
  Reply
  - He Zheng May 4, 2023 at 12:22 am #
    
    How do we do this withOUT teacher forcing (because we don’t have the data) I tried dropping the first column (target/pollution at t-1), obviously the prediction is very bad, but I cannot be sure if the model simply takes the next column (originally column 1, but now column 0 in place of target t-1 column that was dropped) and use that as feedback? How do I make the model not use teacher forcing feedback?
    Thank you
    
    Reply
    - James Carmichael May 4, 2023 at 6:33 am #
      
      Hi He Zheng…Time series forecasting problems are typically reframed as “supervised learning” problems as opposed to unsupervised learning.
      
      The following resource explains this concept:
      
      https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
      
      Having said that however, the following discussion may provide some additional insight:
      
      https://www.quora.com/What-unsupervised-machine-learning-techniques-can-I-use-for-time-series-forecasting-Data-is-2D-date-and-value
      
      Reply
Sasmitoh Rahmad Riady April 19, 2021 at 7:15 am #

Thank you very much for the response,

Sorry, can you show me which part of the code is using teacher forcing?

Your feedback really helps me explore your tutorial

Reply
Tariq April 20, 2021 at 2:48 am #

Hello,

Thank you for this tutorial. I have a question: I am working on a forecasting project using just a vanilla LSTM, and I want to compare the forecasting errors of the univariate and multivariate cases. I expected the multivariate forecast to be more accurate than the univariate one, but I got nearly the same results (not 100% identical). If that is the case, what do you think the problem could be? Do the variables I use in the multivariate forecast contain errors, or is it something else?

Thanks for your answer

Reply
- Jason Brownlee April 20, 2021 at 6:01 am #
  
  I do not recommend using accuracy for regression problems:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-calculate-accuracy-for-regression
  
  Generally, these tips will help improve the performance of your model:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Tariq April 20, 2021 at 10:55 am #

Yeah, I use RMSE and MAE, but I’d like to know if it’s normal to have a lower RMSE in the univariate case than in the multivariate case.

Reply
- Jason Brownlee April 21, 2021 at 5:51 am #
  
  It really depends on the specific data and model.
  
  Reply
Abraham Rodarte April 22, 2021 at 9:02 pm #

Hello, thank you very much for all the information.
I would like to know how I could predict 3 features from a dataset, since using the same code returns an error in the shapes.
In my example, 6 features are input and 3 are predicted.

Reply
- Jason Brownlee April 23, 2021 at 5:02 am #
  
  You’re welcome.
  
  Perhaps these simpler examples will help you get started:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Arya April 25, 2021 at 4:38 am #

Hi Jason, your posts are amazing!.
in this topic you mentioned a case:
“Predict the pollution for the next hour based on the weather conditions and pollution over the last 24 hours.”
My question is that have you covered this method in your books or posts?
In case we are at time t and want to predict n future values, can we use an LSTM?

Reply
- Jason Brownlee April 25, 2021 at 5:18 am #
  
  Thanks.
  
  I don’t have a tutorial on exactly that, but the tutorials here will help to get you started:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
  - Vedant May 4, 2021 at 10:06 pm #
    
    Hello sir
    Your posts are amazing and really helpful.
    I am trying to build an LSTM for a multivariate time series problem. I use 30 past time steps and try to forecast the next 15 and 30 minutes, but the model just repeats the values at time t as the forecast.
    Please tell me what I need to improve.
    
    Reply
    - Jason Brownlee May 5, 2021 at 6:11 am #
      
      Thanks!
      
      Perhaps try alternate data preparation, alternate models, and alternate model configurations in order to discover what works well or best for your dataset.
      
      Reply
Evan Prianto May 3, 2021 at 12:26 am #

I tried to check the scores of the result with this code:

def print_scores(test, predictions):
    mfe = stat.mean(test-predictions)
    mad = mean_absolute_error(test, predictions)
    ts = sum(test-predictions)/mad
    rmse = sqrt(mean_squared_error(test, predictions))
    mape = mean_absolute_percentage_error(test, predictions)
    print('Test MFE: %.3f' % mfe)
    print('Test MAD: %.3f' % mad)
    print('Test TS: %.3f' % ts)
    print('Test RMSE: %.3f' % rmse)
    print('Test MAPE: %.3f' % mape)

and then I call this function by this code

import statistics as stat
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from math import sqrt
from sklearn.metrics import mean_absolute_percentage_error
print_scores(inv_y, inv_yhat)

the result of MAPE is not good.

Test MFE: 0.843
Test MAD: 13.566
Test TS: 1088.583
Test RMSE: 26.727
Test MAPE: 1832701736779776.000

By the way, why does this happen?

Reply
- Jason Brownlee May 3, 2021 at 4:57 am #
  
  I don’t know sorry.
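
One possible explanation, assuming the actual values (`inv_y`) contain zeros: MAPE divides each error by the actual value, so a single zero produces an enormous term. The `mape` helper below is a hand-written sketch, not library code; it guards the division with a tiny epsilon, which to my understanding is also how scikit-learn avoids a division-by-zero error.

```python
import numpy as np

def mape(actual, predicted, eps=np.finfo(np.float64).eps):
    # Guard the division with a tiny epsilon instead of failing on zeros.
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(actual - predicted) / np.maximum(np.abs(actual), eps))

actual = np.array([0.0, 50.0, 100.0])
predicted = np.array([5.0, 55.0, 90.0])

# The single zero in `actual` dominates the result.
print("MAPE with zeros: %.3e" % mape(actual, predicted))

# Excluding zero actuals gives an interpretable value.
mask = actual != 0
print("MAPE without zeros: %.3f" % mape(actual[mask], predicted[mask]))
```

If zeros are legitimate observations, a scale-dependent metric such as MAE or RMSE is usually easier to interpret than MAPE.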
  
  Reply
Fidelis C. OBODOEZE May 5, 2021 at 1:16 am #

Dear Jason,

thanks for your wonderful tutorials.

I ran the complete code in spyder and jupyter Notebook and I received the following ERROR message, nevertheless all the previous codes ran and produced good results:

File “C:\Users\HP\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py”, line 848, in __array__
” a NumPy call, which is not supported”.format(self.name))

NotImplementedError: Cannot convert a symbolic Tensor (lstm/strided_slice:0) to a numpy array. This error may indicate that you’re trying to pass a Tensor to a NumPy call, which is not supported

Reply
- Jason Brownlee May 5, 2021 at 6:13 am #
  
  You’re welcome.
  
  I recommend not using a notebook or IDE:
  https://machinelearningmastery.com/faq/single-faq/why-dont-use-or-recommend-notebooks
  
  Reply
Daniel May 13, 2021 at 10:20 pm #

Hello, Thank you so much for this material.
One question, can this model be applied to forecast the temperature for the next 24 hours having enough data?
Thank you.
-Daniel

Reply
- Jason Brownlee May 14, 2021 at 6:25 am #
  
  Sure, you will have to modify the model to either make 24h predictions and re-train it, or use the model recursively.
  
  See this:
  https://machinelearningmastery.com/multi-step-time-series-forecasting/
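
The recursive option can be sketched as follows. `one_step_model` is a hypothetical stand-in for a fitted one-step model (it just averages its input window), and the history values are made up; the point is only the feedback loop.

```python
import numpy as np

def one_step_model(window):
    # Hypothetical stand-in for a fitted one-step model:
    # predicts the mean of the input window.
    return float(np.mean(window))

def recursive_forecast(history, n_steps, n_lags):
    # Feed each prediction back in as an input for the next step.
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(n_steps):
        yhat = one_step_model(window)
        forecasts.append(yhat)
        window = window[1:] + [yhat]
    return forecasts

history = [10.0, 12.0, 14.0]
forecasts = recursive_forecast(history, n_steps=24, n_lags=3)
print(len(forecasts), forecasts[:3])
```

Errors accumulate in this scheme because later steps depend on earlier predictions, which is one reason to also try a model that outputs all 24 steps directly.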
  
  Reply
Joshua May 14, 2021 at 8:01 am #

Hello Jason, thanks for this amazing post.

I have a question about it. I have a dataset that is very similar to this example. I am planning to use the Keras Functional API and feed the model with 2 dataframes. My first dataframe includes temperature, humidity, etc., and I prepared a second dataframe with t-4, t-3, t-2, t-1 and t using just the previous air pollution data. I also want to predict air pollution. After training, I will predict the test dataset one step at a time, and I will use the current prediction as an input to the next prediction.

So the previous t-3 is now t-4, t-2 is now t-3, … and the current prediction is now t-1 for the next prediction.

Is it a good idea? Actually, I have already done it and the results are very good, but I am suspicious because I am using air pollution to predict air pollution, whereas in the example you used other features to predict air pollution.

Thank you!

Reply
- Jason Brownlee May 15, 2021 at 6:25 am #
  
  Not sure I follow.
  
  Generally, if the model is only using data that is reasonably available at prediction time to make predictions (e.g. is not cheating/leaking data), and the model gives a good result, then go for it.
  
  Reply
Mariana Costa May 21, 2021 at 12:42 am #

Hello. Thank you very much for this. I’d like to ask about validation set… when you use the test set to validate and then also to predict, that probably won’t generalize, right? What about splitting train into train/validation? Even when using walk forward validation?
What I’m really asking is, does it bias the performance of the predicted data if we use that same data to validate when training? Or I shouldn’t worry much about it?
Thanks in advance

Reply
- Jason Brownlee May 21, 2021 at 6:01 am #
  
  You’re welcome.
  
  Using validation sets with time series and walk forward validation is challenging, perhaps intractable. I don’t do it.
  
  Reply
  - Mariana Costa May 21, 2021 at 6:28 pm #
    
    Could you send some reference to that, if you have?
    
    Reply
    - Jason Brownlee May 22, 2021 at 5:32 am #
      
      No, it is from experience.
      
      Reply
E A June 1, 2021 at 10:51 pm #

Hi Jason,
I’m looking for some help with a model similar to this one but instead of one sensing station something like 1000 and the time samples are once a month for 5 years.

What would you suggest be the appropriate approach to train this model?
Can you direct me to an article that have done such things?

Reply
- Jason Brownlee June 2, 2021 at 5:43 am #
  
  I recommend evaluating a suite of data preparation, models and model configurations in order to discover what works well or best for your dataset.
  
  Reply
Elmer June 2, 2021 at 12:00 pm #

Hi Jason,

First of all thanks for the series wonderful machine learning model tutorials.

And I have a few questions related to the multivariate and multi-step LSTM model, hope you could point me to the right direction as I am so struggling with the current issue.

I have successfully modified the air pollution model for my dataset, feeding 5 input variables to the LSTM model and getting 1 output. I understand that I am using 5 variables to predict one of the variables. Now I want to use the 5 variables to predict all 5 variables at the next timestamp, so I removed the data frame column drop line in the code, changed the training data and its labels to the correct size (which is 5), and also changed the Dense() layer to 5. However, the output is not what I expected.

Because the 5 input variables are related to each other, each output variable should be predicted from all 5 input variables. Is what I am doing right? I looked at your other tutorials on multivariate, multi-output LSTM models, but there each output variable is predicted from only one input variable, which means the input variables are not related to each other, so I couldn’t proceed with it.

Any help will be really appreciated, thanks!!!

Reply
- Jason Brownlee June 3, 2021 at 5:27 am #
  
  You’re welcome.
  
  This sounds like a multivariate input and multivariate output, the example here will get you started:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
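
As a sketch of the data side of a multivariate-input, multivariate-output framing (random stand-in data, 5 variables, one lag): every sample's input holds all 5 variables at t-1 and its output holds all 5 variables at t.

```python
import numpy as np

# Made-up data: 6 time steps of 5 related variables.
rng = np.random.default_rng(0)
data = rng.random((6, 5))

# Inputs: all 5 variables at t-1. Outputs: all 5 variables at t.
X = data[:-1]
y = data[1:]

# LSTM input is [samples, timesteps, features]; here one timestep.
X = X.reshape((X.shape[0], 1, X.shape[1]))

# The output layer would then need one unit per predicted variable,
# e.g. Dense(5) in Keras, so every output sees all 5 inputs.
n_outputs = y.shape[1]
print(X.shape, y.shape, n_outputs)
```

When inverting the scaling for such a model, the full 5-column prediction can be passed to the scaler directly, since it already has one column per scaled variable.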
  
  Reply
Carolina June 2, 2021 at 8:29 pm #

Hello! Thank you for all your posts and explanations, makes everything easier.

I tried to implement your example in my context, but I do not understand the following code, is it possible to explain why we reshape to 0 and 2 and not 0 and 1 in here?

test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))

Also, is it really necessary to convert to supervised learning? Could we use the original, already preprocessed data frame to train and fit?

Thank you!

Reply
- Jason Brownlee June 3, 2021 at 5:35 am #
  
  This will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Yes, data must be transformed into samples with input and output components.
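
A small sketch of what that reshape does, using a zero-filled stand-in for `test_X` with 4 samples, 1 timestep and 8 features (the sizes are made up):

```python
import numpy as np

# Stand-in for test_X after it was shaped for the LSTM:
# 4 samples, 1 timestep, 8 features.
test_X = np.zeros((4, 1, 8))

# shape[0] is the number of samples and shape[2] the number of features;
# shape[1] is the single timestep, which is dropped here.
flat = test_X.reshape((test_X.shape[0], test_X.shape[2]))
print(test_X.shape, "->", flat.shape)
```

Using `shape[0]` and `shape[1]` instead would ask for a (4, 1) array, which cannot hold the 32 values and would raise an error.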
  
  Reply
Jishan Ahmed June 6, 2021 at 3:14 pm #

Nice examples! Do you have any work on Multivariate time series classification? Most of the examples I have seen in the literature did not consider the class imbalance. I wanted to use time series classification models to analyze the highly imbalanced Backblaze Hard Drive Data. Each day in the Backblaze data center, they take a snapshot of each operational hard drive. The daily snapshot of one drive is one record or row of data. SMART features are associated with the hard drive failure. SMART features corresponds to the temperature Celsius (TC), reallocated sector count (RSC), power-on-hours (POH), and the spin-up time (SUT) of hard disk. If any one of these attributes triggers i.e. exceeds certain threshold values, the drive is considered a failure. In the failure column of the datasets, 0 represents healthy drive, and 1 means failed drive. I wanted to classify failure or healthy disk based these SMART features as well as timestamps. I appreciate your suggestions! Thanks!

Reply
- Jason Brownlee June 7, 2021 at 5:18 am #
  
  Thanks.
  
  You can use a cost-sensitive version of the model for class imbalance:
  https://machinelearningmastery.com/cost-sensitive-neural-network-for-imbalanced-classification/
  
  Reply
  - Jishan Ahmed June 8, 2021 at 12:06 am #
    
    Cost-sensitive version of the NN model doesn’t integrate the time series, but my data has time stamps!
    
    Reply
    - Jason Brownlee June 8, 2021 at 7:17 am #
      
      Perhaps try it and compare results, it will impact the loss function during training.
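
A sketch of how such class weights could be computed for an imbalanced failure label, using made-up labels and an inverse-frequency heuristic (one common choice, not the only one). The resulting dictionary is the kind of value Keras accepts through the `class_weight` argument of `model.fit()`, regardless of whether the network is an LSTM.

```python
import numpy as np

# Made-up labels: 0 = healthy drive, 1 = failed drive.
labels = np.array([0] * 95 + [1] * 5)

# Weight each class inversely to its frequency so that errors on the
# rare class cost more during training.
classes, counts = np.unique(labels, return_counts=True)
weights = len(labels) / (len(classes) * counts)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weight)
```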
      
      Reply
llinet June 7, 2021 at 10:28 pm #

Hi:
I have a question: do you use the same data to test the model that you use to validate it?

Reply
- Jason Brownlee June 8, 2021 at 7:16 am #
  
  No, we use walk forward validation:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
  
  Reply
Ankit Sekseria June 14, 2021 at 11:22 pm #

Hi,

Thank you for all the material Jason.

I have a few questions regarding the scaling of the data and testing the model.

I see in the above codes you have scaled the entire data and then split the data into train and test.
According to my limited knowledge I believe the test data is something which is a real world data and should not be altered. But here we are actually scaling it based on the means and standard deviation of the entire data.

Shouldn’t we just scale the training data and then use means and standard deviation we get after scaling the data to transform the test data.?

Reply
- Jason Brownlee June 15, 2021 at 6:06 am #
  
  Yes, I scaled all data to keep the example simple. We do not want to do that in practice because of data leakage.
  
  This may help:
  https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
  
  Reply
Geollan June 16, 2021 at 1:35 pm #

I have a question, in the example, you want to predict pollution, but train_X also contain the pollution. It does do a great job to predict the test_X. But if we want to predict the future and I don’t have the pollution value, I think it can not work

Reply
- Jason Brownlee June 17, 2021 at 6:12 am #
  
  Yes, it means you need to frame the problem based on the input data you will have at prediction time.
  
  Reply
Irene June 18, 2021 at 5:59 pm #

(first of all, I’m not that fluent in English..
so, if my expressions are awkward, please excuse.. )

Thank you very much for your wonderful article. All your posts are very helpful for me, a beginner at neural networks.

I have a question about this post.

I’m trying to forecast a multivariate time series.
After following this post, my results (a price forecast) are so accurate
that I wonder whether I did it right.
My process are follows..
> Dataset has 85 features(Xs) and 1 y(which I like to predict)
and I like to predict “y(t), y(t+1), …, y(t+365)”
1) convert dataset as “series_to_supervised(scaled, 1, 1)”
and remove columns 85Xs for time t (like you mentioned)
2) split into train/validation/test set with portion 60/20/20 (here, size of test is 366 for my case)
3) run “model.fit(train_X, train_y, epochs=epochs, batch_size=batch_size, validation_data=(valid_X, valid_y), verbose=2, shuffle=False, callbacks=[earlystopping, model_check])”
4) predict with “model.predict(test_X)”

My intention is to predict the next 366 values of y with no information about the Xs for that time period (after t).
I think that, because I removed the 85 Xs at time t, there is no information about the Xs after t.
But the prediction results are so accurate that I suspect my theory (that I didn’t use information about the Xs after t) could be wrong.

Is there a misunderstanding in my reasoning?

I hope you understand my question,
and I would appreciate it if you don’t mind my long question.

Thank you

Reply
- Jason Brownlee June 19, 2021 at 5:50 am #
  
  It is impossible to say what process is best or what algorithm/config will work well or best.
  
  I recommend that you start with a robust test harness for evaluating models on your problem, then evaluate a suite of methods and discover what works well or best.
  
  Generally, early stopping is not compatible with evaluating time series forecasting models using walk-forward validation.
  
  Reply
Sam V July 3, 2021 at 4:52 am #

Hi Jason, Thanks for the Wonderful tutorial. sorry for my lack of understanding – I’m a newbie: I have a similar dataset for 3 years hourly data with carbon flux (like pollution here) and other 6 columns including temp, moisture etc. I would like to use the full 3 year data for training and preparing the model which I plan to use for predicting the future 1-2 years. I can then compare that with incoming experimental data. How do I tweak the code and go about this? Thanks in advance.

Reply
- Jason Brownlee July 3, 2021 at 6:11 am #
  
  You’re welcome.
  
  Perhaps these models will help you to get started:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  And these tutorials on model tuning:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Fahmid Shibib July 10, 2021 at 1:41 pm #

Hi Jason,
Thanks for the tutorial. I am a newbie here, so I was wondering how I would get the prediction for the next hour from this script as a discrete value that I can use.
The output seems to be a graph.

Also, I am trying to create a bid estimator as my project. I want to train a model based on previous bids. However, each bid also depends on certain features. Will this bid estimating system work with the same concept of your example here?

I ask because the features for the bidding system does not depend on its previous values. It depends on what the customer wants which I will be providing as an input. The bid estimator should then use my inputs and use a trained model to give me an estimate.

If your example is not a good match with what I want to achieve, what topics should I look for to achieve this goal?

Thanks!

Reply
- Jason Brownlee July 11, 2021 at 5:38 am #
  
  You can call model.predict() to make a prediction, this will help:
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  I recommend evaluating a suite of models in order to discover what works well or best for your specific dataset.
  
  Also, perhaps this will help:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  And this:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
Giovanni July 11, 2021 at 11:41 pm #

Hi, Jason!

Thanks for the tutorial!

I tried to adapt this to my datasets, but it looks like my predictions are much smoother than they should be. The LSTM prediction does not hit the peaks that exist in the original dataset. Do you have any idea what I can do to improve the model?

Reply
- Jason Brownlee July 12, 2021 at 5:49 am #
  
  Perhaps some of these suggestions will help:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Alex July 16, 2021 at 1:18 pm #

Hi, did you float the date column? I’m getting a bit of an error. I keep getting either “TypeError: float() argument must be a string or a number, not ‘Timestamp'” or “could not convert string to float for ”

Any advice on how to fix this?

Reply
- Jason Brownlee July 17, 2021 at 5:18 am #
  
  Typically the date/time column is removed from the data as part of data preparation.
  
  Reply
  - Alex July 22, 2021 at 1:05 pm #
    
    Thanks, I realized I skipped over the line where the date is removed and indexed instead
    
    Reply
    - Jason Brownlee July 23, 2021 at 5:45 am #
      
      No problem.
      
      Reply
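
A minimal sketch of the fix Alex describes, assuming the raw file has a column named date (a hypothetical name, with made-up values here): parsing that column and using it as the index keeps timestamps out of the feature matrix, so the float conversion no longer sees them.

```python
import io
import pandas as pd

# Tiny stand-in for a raw file with a date column (values are made up).
raw = io.StringIO(
    "date,pollution,temp\n"
    "2010-01-02 00:00:00,129.0,-4.0\n"
    "2010-01-02 01:00:00,148.0,-4.0\n"
    "2010-01-02 02:00:00,159.0,-5.0\n"
)

# Parse the date column and use it as the index, so it is not a feature.
df = pd.read_csv(raw, parse_dates=["date"], index_col="date")

# Only numeric feature columns remain in .values, so the cast succeeds.
values = df.values.astype("float32")
print(values.shape)  # (3, 2)
```
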
Alex July 22, 2021 at 1:02 pm #

Say I wanted to predict to 2 weeks out, how would I edit the modeling section to predict more than an hour out?

You are predicting one hour, so is that the 1 in reframed = series_to_supervised(scaled, n_hours, 1)?

The data I am hoping to apply some of these methods to has 5 lags in a day and we want to predict 2 weeks out. Would I just specify 70?

Reply
- Jason Brownlee July 23, 2021 at 5:45 am #
  
  This is called a multi-step forecast, there are many ways to achieve this. Perhaps start here:
  https://machinelearningmastery.com/multi-step-time-series-forecasting/
  
  Reply
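
One common way to frame a multi-step forecast is the direct strategy, where each sample's target is a vector of the next n_out values. The sketch below uses a hypothetical helper, make_windows, rather than the tutorial's series_to_supervised, and made-up data; the value 70 corresponds to 5 observations a day for 14 days, as in the question above.

```python
import numpy as np

def make_windows(data, n_in, n_out, target_col=0):
    """Frame a (time, features) array as supervised samples.

    X has shape (samples, n_in, features); y has shape (samples, n_out)
    and holds the next n_out values of the target column.
    """
    X, y = [], []
    for start in range(len(data) - n_in - n_out + 1):
        end = start + n_in
        X.append(data[start:end, :])
        y.append(data[end:end + n_out, target_col])
    return np.array(X), np.array(y)

# 200 time steps, 3 features (made-up data).
data = np.arange(600, dtype="float32").reshape(200, 3)

# 10 past steps in, 70 future steps out.
X, y = make_windows(data, n_in=10, n_out=70)
print(X.shape, y.shape)  # (121, 10, 3) (121, 70)
```

A model trained on this framing would need an output layer with n_out units. Whether 70 steps can be forecast well from a given dataset is something to test rather than assume.
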
Benny August 3, 2021 at 5:12 am #

Hi Jason,

Loving this tutorial so far. I do have a question though:
I understand you are predicting just for pollution. Where exactly in the model section is that specified? I know you create 1 neuron for the output, but when building the model which argument specifies that this will be the pollution output and all other features are inputs?

Reply
- Jason Brownlee August 4, 2021 at 5:10 am #
  
  It is specified in the data provided to the model during training.
  
  Reply
  - Benny August 4, 2021 at 1:23 pm #
    
    Thank you! I just read back and saw I’ve been specifying to remove all varN(t) when I need to keep var1(t) for the output. I should be getting the same results as you now.
    
    Reply
    - Jason Brownlee August 5, 2021 at 5:14 am #
      
      Happy to hear that.
      
      Reply
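
To illustrate Jason's point that the target is defined by the data rather than by a model argument, here is a small sketch with three hypothetical variables and one lag. The column layout is an assumption modeled on the var1(t-1) ... varN(t) naming used in this thread, and the data is random.

```python
import numpy as np

n_features = 3  # hypothetical: var1..var3

# Stand-in for reframed.values with one lag. Columns are:
# var1(t-1), var2(t-1), var3(t-1), var1(t), var2(t), var3(t)
rng = np.random.default_rng(0)
reframed = rng.random((5, 2 * n_features))

target = 0  # 0 selects var1(t); 2 would select var3(t) instead

# Inputs: every lagged column. Output: only the chosen variable at time t.
X = reframed[:, :n_features]
y = reframed[:, n_features + target]

# The LSTM expects [samples, timesteps, features].
X = X.reshape((X.shape[0], 1, n_features))
print(X.shape, y.shape)  # (5, 1, 3) (5,)
```
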
Eduardo August 18, 2021 at 7:57 am #

Hi Jason,

Nice tutorial.

I mean, this appears to simply be predicting y(t+1) = y(t).
Why do not just take the actual pollution and try to predict it?

Reply
- Adrian Tam August 18, 2021 at 11:56 am #
  
  Why not use the actual pollution to predict it? Because pollution depends on many factors. Rain or not, windy or not, temperature, etc. can all change the pollution index. Hence the LSTM network is there to figure out the relationship among these.
  
  Hope this can help you better understand.
  
  Reply
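
One way to check whether a model is doing more than predicting y(t+1) = y(t) is to compare its RMSE with a persistence baseline. The sketch below computes that baseline on made-up observations; the rmse helper is written out here for illustration and is not taken from the tutorial.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up hourly observations.
obs = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 21.0])

# Persistence forecast: the prediction for hour t is the observation at t-1.
persistence = obs[:-1]
actual = obs[1:]

baseline = rmse(actual, persistence)
print(round(baseline, 3))  # 2.793
```

If a model's test RMSE is close to this baseline, it may have learned little beyond copying the previous value.
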
LinhDo August 23, 2021 at 3:38 pm #

Hi Jason,

Very helpful tutorial.

How can I predict the rest 7 variables using the same inputs as the examples? I mean, other than pollution, I also want prediction for the other 7 variables as well. How can I do that?

Reply
- Adrian Tam August 24, 2021 at 8:22 am #
  
  Surely you can. Neural networks, LSTM included, can be modified to output not only a single value but a vector of values. In that case you can predict many variables at once. But at the same time, you increase the complexity of the problem and you may want a bigger network (because you now have more states to remember in the LSTM), and with a bigger network, you may also need more data to train it to an acceptable accuracy. So it is best to experiment before drawing conclusions.
  
  Reply
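
On the data side, predicting all variables mainly means keeping every var(t) column as a target instead of only the first. A minimal sketch, assuming one lag, 8 features, and random stand-in data:

```python
import numpy as np

n_features = 8

# Stand-in for reframed.values with one lag:
# 8 lagged input columns followed by 8 columns at time t.
rng = np.random.default_rng(1)
reframed = rng.random((100, 2 * n_features))

# Inputs in [samples, timesteps, features] form.
X = reframed[:, :n_features].reshape((-1, 1, n_features))

# Keep all var(t) columns as targets, one column per predicted variable.
y = reframed[:, n_features:]

print(X.shape, y.shape)  # (100, 1, 8) (100, 8)
```

The output layer would then need one unit per target, for example Dense(8) in place of Dense(1); that change is an assumption about how the model would be adapted, not something shown in the tutorial.
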
Don September 9, 2021 at 5:01 am #

Hi Jason,

Just say I have 5 x variables that help predict a y variable and these are all ordered by time. If I wish to use LSTM to train this model, what changes would I have to make to the example here? E.g. train model on N datapoints, then try to predict the N+1 y variable using the N+1 (5 x variables).

Thanks,

Don

Reply
- Adrian Tam September 9, 2021 at 5:07 am #
  
  Yes, that sounds correct.
  
  Reply
Peter Steele September 13, 2021 at 7:27 pm #

As Songbin Xu pointed out, you’re calculating RMSE incorrectly. You are comparing the datapoint for time t to the prediction for time t+1, which results in a much higher RMSE, because the result is almost always going to be wrong.

rmse = sqrt(mean_squared_error(inv_y[:-1], inv_yhat[1:]))

This will give the correct RMSE.

You have not corrected this error, despite the “update”.

Reply
JJ September 19, 2021 at 11:28 am #

Hi Jason, great article. I am confused about the last part on prediction. To predict, say 14 days into the future, wouldn’t I need to apply a loop to predict based on previous day data? Which means if I predict day 1, I will take the last data point in the available dataset, then to predict day 2, I will take the predicted day 1 value as the input to predict and so on. In this example, I do not see this other than calling a predict function which I don’t think is right.

Reply
- Adrian Tam September 20, 2021 at 2:29 pm #
  
  Indeed you’re right. That is the common way to do prediction deep into the future.
  
  Reply
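
The loop JJ describes can be sketched as below. toy_model is a hypothetical stand-in for a trained model's predict call, and holding the non-target features at their last observed values is an assumption made only to keep the example short; in practice those features would need their own forecasts or known future values.

```python
import numpy as np

def toy_model(window):
    """Hypothetical stand-in for model.predict: mean of the target column."""
    return float(window[:, 0].mean())

def recursive_forecast(predict_fn, history, n_in, n_steps):
    """Forecast n_steps ahead by feeding each prediction back as input.

    history is (time, features) with the target in column 0. The other
    features are unknown in the future, so this sketch simply repeats
    their last observed values, which is a strong assumption.
    """
    window = history[-n_in:].copy()
    forecasts = []
    for _ in range(n_steps):
        yhat = predict_fn(window)
        forecasts.append(yhat)
        next_row = window[-1].copy()
        next_row[0] = yhat                          # predicted target
        window = np.vstack([window[1:], next_row])  # slide the window
    return np.array(forecasts)

history = np.array([[1.0, 5.0], [2.0, 6.0], [3.0, 7.0], [4.0, 8.0]])
forecast = recursive_forecast(toy_model, history, n_in=2, n_steps=3)
print(forecast)  # values: 3.5, 3.75, 3.625
```

Errors tend to accumulate with this approach because each step consumes earlier predictions, which is one reason it is often compared against a direct multi-output model.
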
JJ September 21, 2021 at 11:44 am #

How can I do this? Is there an article that clearly show how to do it for multivariate? Thanks!

Reply
- Adrian Tam September 23, 2021 at 2:57 am #
  
  The example here is multivariate. Do you see something not answering your question?
  
  Reply
SAMUEL October 10, 2021 at 2:51 am #

Hello Jason. Which software do you use for your articles? I like how you embed the code with the text. I mean, how do you put the code in here with different viewing options.

Reply
- Adrian Tam October 13, 2021 at 5:56 am #
  
  Try this out: https://wordpress.org/plugins/urvanov-syntax-highlighter/
  
  Reply
edward October 18, 2021 at 10:29 pm #

Hello Dr.Jason,
I am using your code for some research. How do I split the data into train, test and validation sets if I want to use the same method as you have done? Thank you.

Reply
- Adrian Tam October 20, 2021 at 9:46 am #
  
  Easiest way is to prepare the data into a big matrix, then run train_test_split() function from scikit-learn.
  
  Reply
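
Note that train_test_split() shuffles rows by default, which mixes past and future observations in a time series. A hedged alternative, assuming the rows are already in time order, is to slice by position. The 60/20/20 proportions and the data below are arbitrary.

```python
import numpy as np

# Made-up supervised dataset: 1000 rows in time order, 9 columns.
values = np.arange(9000, dtype="float32").reshape(1000, 9)

n = len(values)
n_train = int(n * 0.6)
n_val = int(n * 0.2)

# Slice in time order so validation and test data come after training data.
train = values[:n_train]
val = values[n_train:n_train + n_val]
test = values[n_train + n_val:]

print(train.shape, val.shape, test.shape)  # (600, 9) (200, 9) (200, 9)
```
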
edward October 20, 2021 at 9:01 pm #

Thank you very much, I was thinking there is another method similar to the above. I will do that.

Reply
Madelaine November 11, 2021 at 9:15 pm #

Hello Sir,

I’m trying to use your method on other research. My data is similar to yours. The dataset has 13 columns. After running the ‘series_to_supervised’ function, I got 26 columns.

var1(t-1) var2(t-1) var3(t-1)… var13(t-1) var1(t) var2(t) var3(t) …var13(t)

The data of each varX(t-1) are the same as the varX(t), and I can’t find the output variable. Do you know the reason?

Thanks

Reply
- Adrian Tam November 14, 2021 at 2:01 pm #
  
  The default n_in=1 and n_out=1 mean your inputs are varX(t-1) and your outputs are varX(t). If they are the same, perhaps that is simply what your data looks like?
  
  Reply
Guilherme Carvalho December 10, 2021 at 6:12 am #

Hello Sir,

I’m trying to use your method on other research. But I’m encountering an error when performing scaling.

ERROR:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 inv_yhat = scaler.inverse_transform(inv_yhat)

/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_data.py in inverse_transform(self, X)
527 )
528
--> 529 X -= self.min_
530 X /= self.scale_
531 return X

ValueError: operands could not be broadcast together with shapes (1561,11) (6,) (1561,11)

Reply
- Adrian Tam December 10, 2021 at 7:29 am #
  
  Whenever you see this shape error, you should check the input data shape and the input layer’s specified shape. They must match to work.
  
  Reply
- Akanksha February 21, 2022 at 2:17 am #
  
  Hi. Did you get the solution?
  
  Reply
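
In the traceback above, the array passed to inverse_transform has 11 columns while the scaler was fit on 6, so the two cannot be broadcast together. A minimal sketch of one way around this, assuming the prediction is for the first scaled column and using made-up data: pad the prediction with the remaining scaled columns so its width matches the scaler, invert, then keep only the target column.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Scaler fit on 6 columns, as in the traceback above (data is made up).
rng = np.random.default_rng(0)
data = rng.random((50, 6)) * 100.0
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(data)

# Scaled predictions for column 0 only, shape (n, 1).
# Here the true scaled values stand in for model output.
yhat = scaled[:10, :1]

# Number of columns the scaler was fit on.
n_cols = scaler.scale_.shape[0]

# Pad with the other scaled columns so the width matches the scaler.
padded = np.concatenate((yhat, scaled[:10, 1:n_cols]), axis=1)

# Invert the scaling, then keep only the target column.
inv_yhat = scaler.inverse_transform(padded)[:, 0]
print(padded.shape, inv_yhat.shape)  # (10, 6) (10,)
```
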
Jeff G January 3, 2022 at 5:39 am #

Hi Jason, thank you for this post. For the multivariate case I had one question regarding interactions between variables at each time step. For example, if forecasting the performance of a player in a future sports game based on their last 10 games, but they have played 9 of those last 10 games at their ‘Home’ venue (which will slightly inflate that player’s statistics in those time steps). Could we simply feed the model a 0,1 indicator for home/away to solve this? I am picturing the yhat(t), yhat(t+1), yhat(t+2), … predictions at each step incorporating this indicator to calibrate the other statistics (e.g., having 12 shots & being at home results in a similar expectation to 10 shots & being away at any given lagged time step). Thanks!

Reply
- James Carmichael January 3, 2022 at 8:33 am #
  
  Hi Jeff…You may want to consider two models, one for home and one for away and simply let the LSTM learn the unique features from each.
  
  Regards,
  
  Reply
Jeff Goeree January 5, 2022 at 6:19 am #

Thanks James. I think I was hoping for something more general, which could also extend to something like a season change, or team change. In the case of a season change, there is a long break in time in-between time steps and a player may have improved/declined based on their age and off-season routine. Wondering if LSTM could handle this natively or if the data would need to be engineered beforehand.

Reply
- James Carmichael January 5, 2022 at 6:49 am #
  
  Hi Jeff…LSTMs would be a great option if the data is truly a time-series. If there are small gaps in the data you may want to use ARIMA, CNN or LSTM to predict the missing data in between the contiguous time periods.
  
  Regards,
  
  Reply
Nicolás January 5, 2022 at 8:37 pm #

I have an important conceptual doubt.

If I want to predict the output at instant t, and I enter as inputs N variables of previous instant (t-1) (as in this example), will the LSTM take into account information from instants prior to (t-1)? I understand that since it is a LSTM it has a long term memory and takes into account past information of the whole time series, although I may be wrong.

Reply
- James Carmichael January 6, 2022 at 10:50 am #
  
  Hi Nicolas…If I understand your question correctly, the answer would be yes. LSTMs are designed to learn directly from past time series data. The following may be of interest to you:
  
  https://machinelearningmastery.com/get-the-most-out-of-lstms/
  
  Reply
sina jry January 7, 2022 at 12:24 am #

hi dear dr.jason … I have a categorical item based time series dataset for a market.
The output variable is the sales of the item and the purpose of the problem is to forecast the amount of items needed for the next 30 days. Which model do you suggest to solve this problem? It would be nice if you could recommend any related article.

Reply
- James Carmichael January 7, 2022 at 6:17 am #
  
  Hi Sina…My recommendation would be to try SARIMA, CNN and LSTM models and compare the performance. Sometimes “newer” deep learning models do not perform better than “classical” methods such as SARIMA. The following may be of interest to you:
  
  https://machinelearningmastery.com/sarima-for-time-series-forecasting-in-python/
  
  Reply
Augusto January 17, 2022 at 11:39 am #

Hi Jason,

Thanks for the great work.

I just could not realize why n_features is said to be 8 but when concatenating to invert the data after prediction it is used the index -7. Can you help me on that, please?

Thanks!

Reply
- James Carmichael February 27, 2022 at 12:27 pm #
  
  Hi Augusto…Please clarify your question so that I may better assist you. What specific code listing are you referencing?
  
  Reply
Igor Popov January 26, 2022 at 8:31 am #

Hi Jason Brownlee, this is a beautiful tutorial, thank you very much!! I have enjoyed going through it line by line.
Will you please tell me the following. After model training I would like to predict next time step using just a few previous time steps. For instance if I want to use only one previous time step for prediction using
y = model.predict_step(test_X[-1].reshape(1, 8))
I get the error:
Input 0 of layer “sequential” is incompatible with the layer: expected shape=(None, 1, 8), found shape=(1, 8)
I don’t understand what is the first dimension. The predict method accepts the array test_X, which has shape (35039, 8), i.e. it does not have three dimensions too.

I know C++ well but have just a bare experience with Python, so sorry if it is a trivial question. I can’t figure out how to fix it.

Reply
- James Carmichael January 26, 2022 at 10:56 am #
  
  Hi Igor…Thank you for your feedback and kind words!
  
  I am confident that your understanding will be greatly enhanced with the following material (especially the Part and Lessons below):
  
  https://machinelearningmastery.com/lstms-with-python/
  
  Part I. Foundations
  Lesson 01: What are LSTMs.
  Lesson 02: How to Train LSTMs.
  Lesson 03: How to Prepare Data for LSTMs.
  Lesson 04: How to Develop LSTMs in Keras.
  Lesson 05: Models for Sequence Prediction.
  
  Reply
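
On the shape error above: the expected shape (None, 1, 8) reads as [samples, timesteps, features], where None means any number of samples. A single row therefore needs to be reshaped to three dimensions before it is passed to the model. The sketch below uses random stand-in data and no trained model.

```python
import numpy as np

# Stand-in for the flattened 2D test inputs: 100 samples, 8 features.
test_X = np.random.default_rng(0).random((100, 8))

# One sample, one time step, eight features: [samples, timesteps, features].
last = test_X[-1].reshape((1, 1, 8))
print(last.shape)  # (1, 1, 8)

# With a fitted Keras model this would be passed as model.predict(last),
# and the forecast could be read out as a plain number with float(yhat[0, 0]).
```

The resulting number would still be in scaled units, so it needs the same inverse transform as the test predictions before it is interpreted.
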
Igor Popov January 27, 2022 at 12:33 pm #

Thank you James for the answer. The book looks good.

Reply
- James Carmichael January 27, 2022 at 12:46 pm #
  
  You are very welcome, Igor!
  
  Reply
Luca February 10, 2022 at 7:12 pm #

Hi Jason, congratulation for this article.

I just wanted to ask you why you scaled the whole dataset before splitting it into train and test sets. In fact, I have learned that it would be best practice to split the data set first, and then apply the MinMaxScaler() method separately on the two sets (fit_transform() on the training set and transform() on the test set). This is done to avoid any bias, since we theoretically should not know the values in the test set when we train the training set.

Can you please let me know if this is correct and if modifications to your data pre-processing are needed before building any model?

Thanks a lot!

Reply
- James Carmichael February 11, 2022 at 8:32 am #
  
  Hi Luca…You are correct in your understanding. In general it is recommended to follow the procedure you mentioned. I would recommend that you actually try the approach both ways and compare the results of the model in its ability to make predictions for data never seen by the network during training.
  
  Reply
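
The procedure Luca describes can be sketched as follows, assuming the rows are in time order and using made-up data: the scaler learns its minimum and maximum from the training rows only, and the same fitted scaler is then applied to the test rows.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data in time order; later rows have larger values.
values = np.linspace(0.0, 100.0, 200).reshape(100, 2)

n_train = 80
train, test = values[:n_train], values[n_train:]

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)  # learn min/max from train only
test_scaled = scaler.transform(test)        # reuse the training min/max

# Test values may fall outside [0, 1] because their range was never seen.
print(test_scaled.max() > 1.0)  # True
```

Values outside the [0, 1] range in the test set are expected with this approach and reflect what the model would face on genuinely new data.
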
Andrzej February 23, 2022 at 11:33 pm #

Dear Jason,

thank you for your exceptional ability explaining complex matters in simple way.

I have launched a real-world project based on your books. The main idea is choosing the best method among many (incl. LSTM) in validation state and applying it for each single multivariate time series. The resulting forecasts happened to beat ARIMA and ES for my dataset by a higher margin than the best methods in the M5 competition did.

The only problem is computational time. I have 40 thousand time series for tests. A decent CPU takes an entire week to compute them, running in parallel on all 16 cores. Using a GPU with the same code makes performance even worse. Now I need to compute 4 million time series on the first day of each month. So, the project obviously does not scale.

If I understand correctly, the GPU-optimised code could compute these 40 thousand time series by feeding data as tensors. I was advised that a GPU may allegedly compute those 40 thousand almost as fast as a single CPU core does with a single time series, provided there is sufficient GPU memory. However, I failed to find any example of how exactly data should be transformed and fed into the methods in code like the above. Could you please tell me whether it is really possible to get such a huge performance increase in the mentioned way, and if so, give some links to simple examples (if possible, explained by you as a really talented lecturer)?

Reply
- James Carmichael February 24, 2022 at 12:56 pm #
  
  Hi Andrzej…While I cannot speak to your specific project, I can offer an introduction to the use of GPUs that may prove beneficial:
  
  https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/
  
  Reply
Hager February 24, 2022 at 1:42 am #

pollution is target
Why do you use pollution as an input feature?

Reply
- James Carmichael February 24, 2022 at 12:48 pm #
  
  Hi Hager…Time series forecasting uses historical data to forecast future values. It does not use separate “train” and “test” datasets as multilayer perceptron models do.
  
  Reply
  - Hager February 25, 2022 at 3:08 am #
    
    I need to remove pollution from train and test data please help me
    
    Reply
Jeff Wang March 8, 2022 at 12:43 pm #

Hi James,

Thanks for the awesome tutorial.

I have a question regarding the scaling process. You first MinMaxScale the entire dataset and then split the scaled dataset into train and test data. Isn’t this going to result in the out-of-sample data affecting the scaling of in-sample data, thus creating look-ahead bias when fitting the model and predicting using the model?

I know this is just an illustrative example, but would love to hear your take on this, and what would happen if we split in-sample and out-of-sample before scaling them separately.

Thanks

Reply
- James Carmichael March 9, 2022 at 5:56 am #
  
  Hi Jeff…You are correct. A more detailed discussion that confirms your understanding is found here:
  
  https://datascience.stackexchange.com/questions/54908/data-normalization-before-or-after-train-test-split
  
  Reply
Bobby Jones March 11, 2022 at 6:13 pm #

Hi James,

Thanks so much for this tutorial! Really helped me understand how an LSTM works.

One question about the validation set…What was the reason for using the test set in there? I thought that would introduce bias and maybe cause the actual model to overfit?

Would it be okay to set aside another independent chunk of the training set to use for validation instead?

Reply
- James Carmichael March 12, 2022 at 2:40 pm #
  
  Hi Bobby…You are correct. The test set was used just for illustration, however there should be a Training set, a Test set and a Validation set that represents data never seen by the model.
  
  Reply
Priyadarshan March 14, 2022 at 10:33 pm #

Hi James, this is a great post. Now it explains how to make multivariate series prediction for pollution. What should I change in the code in order to make it predict for temperature instead of pollution?

Reply
- James Carmichael March 15, 2022 at 1:41 pm #
  
  Hi Priyadarshan…You would need to adjust the code section that creates “pollution.csv” so that you have a transformed dataset called something like
  “temperature.csv”.
  
  Reply
  - Wintotally March 19, 2022 at 1:09 am #
    
    Hi, James. Thanks for this tutorial!
    I want to know would we still be able to get a prediction data if we didn’t have a test set?
    
    Reply
    - James Carmichael March 20, 2022 at 7:20 am #
      
      Hi Wintotally…It is never recommended to skip the test dataset:
      
      https://www.thedatamba.com/post/why-you-need-to-test-the-tests-in-machine-learning
      
      Reply
  - Priaydarshan S March 21, 2022 at 6:48 pm #
    
    Thank you so much for clearing that. What should I do to predict for all variables at once instead of only pollution?
    
    Reply
    - James Carmichael March 22, 2022 at 11:51 am #
      
      Hi Priaydarshan…You are very welcome! My recommendation is investigate other features of multivariate time series forecasting:
      
      https://www.analyticsvidhya.com/blog/2020/10/multivariate-multi-step-time-series-forecasting-using-stacked-lstm-sequence-to-sequence-autoencoder-in-tensorflow-2-0-keras/
      
      Reply
wintotally March 23, 2022 at 1:48 pm #

Thanks for your answer. I also want to know what should I do to adjust the training loss and the forecast loss? They fluctuate a lot.

Reply
Rajesh Maddu April 2, 2022 at 5:14 am #

Hi Jason,

In my monthly data set:

X ->Air temperature Values; Y->Water Temperature values; the objective is to predict the monthly Water temp. Here we have used one month lag variables are also input variables.

After frame as supervised learning –
var1(t-1) var2(t-1) var1(t) var2(t)

Here var2: Water Temp & var1 – air temp

We have prepared a model with time steps=1 i.e., sequence length=1

Questions:
1. Are we underusing the capabilities of LSTMs by using time steps = 1?

2. With an LSTM sequence longer than 1 month, the LSTM could learn to remember past values of air and/or water temperature without needing to be passed those variables explicitly. Is this correct? are lag variables not required?

Reply
Habib April 17, 2022 at 2:22 am #

Hi
Thank you so much for the informative code. I am learning a lot.

I am having some trouble plotting the original and predicted curves. When I plot them, the original curve is different from inv_y.

Could you please let me know what might be the reason and how to fix it? Thank you in advance.

Reply
- James Carmichael April 17, 2022 at 7:56 am #
  
  Hi Habib…Are you using the code listings provided in the tutorial? Also, how are the curves different?
  
  Reply
Furkan April 20, 2022 at 9:13 pm #

I want to reach the forecast values for the 12-month or 36-month future data. Then I want to plot graphs of actual and predicted series with these values.
I would be pleased if you could help me.
Thanks in advance.

Reply
- James Carmichael April 21, 2022 at 9:02 am #
  
  Hi Furkan…The following may be of interest to you:
  
  https://stackoverflow.com/questions/65156850/how-to-change-the-forecast-horizon-in-lstm-model
  
  Reply
  - Furkan April 25, 2022 at 11:40 pm #
    
    Hi again,
    
    Thanks for the reply. Could you explain how to implement these codes for multivariate time series, especially for Mr. Brownlee’s codes?
    
    Best wishes.
    
    Reply
Hayat April 21, 2022 at 1:31 am #

Hi James,
Thank you for the effort spent in presenting this tutorial.
Can you give some guide on how to apply the alternate formulation you mentioned above (Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour).
I have multivariate time series data like the one presented, I want to divide the data into training and testing (without shuffling) so I can fit the model on the training set and consequently predict the output on the test data. At the end, I will plot the predicted series and actual series to visualize the difference.
I have tried using your code but later realized you did not use the formulation that fits my goal.

Thanks

Reply
Ktze April 28, 2022 at 1:39 pm #

Hello!

I have a doubt about multivariate lstm.

How to make continuous predictions about the future when multiple inputs correspond to a single output?

Suppose there exist features A and B of length n, and set the sliding window to 2. Using A and B as feature inputs, predict feature A. Then when the model is trained, I can construct a 2×2 sample matrix using the [n-1,n] periods of feature A and the [n-1,n] periods of feature B, and predict the n+1 periods of output A.

But how do I continue to predict the n+2 periods of A?

For feature A, its length becomes n+1 and I can slide to [n,n+1], while for feature B, its length is still n and I cannot slide to [n, n+1], in other words B’s future n+1 periods are still unknown to me and I cannot construct a new 2×2 sample matrix to input into the model to predict A’s n+2 period results.

Are there some problems with multiple inputs corresponding to a single output?

Does this mean I need to go in to predict feature B alone?

Thanks!

Reply
- James Carmichael April 29, 2022 at 10:31 am #
  
  Hi Ktze…The following tutorial may help clarify:
  
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/
  
  Reply
Esperanza May 5, 2022 at 8:41 am #

Hi Jason,

I am reproducing your code with other data (daily values). But when I am trying to use the inverse transformation (to transform to actual values) I get an error. It says:

ValueError: operands could not be broadcast together with shapes (6029,3) (6,) (6029,3)

Do you have any idea on how to solve it?

Thank you 🙂

Reply
- James Carmichael May 6, 2022 at 6:57 am #
  
  Hi Esperanza…Curious if you typed the code in or used copy and paste?
  
  Reply
mina May 10, 2022 at 10:09 pm #

Hello,

Thanks for your great guide.
This guide answers a lot of my questions about the LSTM. However, when it comes to multivariate LSTM, how does the network know the length of the historical data? Suppose we prepare data in this order: var_1_(t-3), var_2_(t-3), var_1_(t-2), var_2_(t-2), var_1_(t-1), var_2_(t-1). After transforming the data into a NumPy array, the labels are removed, so how does the network know that every 2 columns of the data represent one time step?

Reply
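
The column labels are indeed gone once the data is a NumPy array; the grouping into time steps comes from the reshape to [samples, timesteps, features] that is applied before the data is passed to the LSTM. A small sketch with made-up numbers, assuming the columns are ordered oldest step first as in the question:

```python
import numpy as np

n_steps, n_features = 3, 2

# One flat row per sample, ordered oldest step first:
# var1(t-3), var2(t-3), var1(t-2), var2(t-2), var1(t-1), var2(t-1)
flat = np.array([
    [1, 10, 2, 20, 3, 30],
    [2, 20, 3, 30, 4, 40],
], dtype="float32")

# The reshape groups each consecutive pair of columns into one time step.
X = flat.reshape((flat.shape[0], n_steps, n_features))

# For the first sample: rows are time steps, columns are var1 and var2.
print(X.shape)  # (2, 3, 2)
print(X[0])
```

This only works if the flat columns are grouped by time step; if they were grouped by variable instead, the same reshape would silently mix the variables.
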
Dirk May 11, 2022 at 3:54 am #

Great tutorial! I have a more general question on LSTM models: let’s say in 1000 people I have feature X measured at 4 timepoints (X1, X2, X3, and X4), and I want to predict some outcome Y measured at time point 5, can I still use LSTM then?

If not, what would be the correct machine learning model for this? I could of course train SVMs, Random Forest, NNs or whatever simply using X1 through X4 as features and Y as the outcome but this would not take into account the time dependency of X (i.e. the nestedness/multi-level-ness of the data). Hope you can help! Best, Dirk

Reply
- James Carmichael May 13, 2022 at 1:21 am #
  
  Hi Dirk…I see no issue with continuing with an LSTM model. Have you implemented the LSTM model for your application yet?
  
  Reply
Pitty May 16, 2022 at 2:47 am #

Hi Jason,

I want to merge the predicted data with the original data into a new CSV file. But I found that the prediction data of the merged files at time T was actually at time T-1. So I have to shift my forecast up by one unit. And the last predicted number will therefore change to NA. I wonder why the raw data and forecast data do not correspond one to one. In this case, the raw data is “pollution.csv”.

Best wishes

Reply
- James Carmichael May 16, 2022 at 9:12 am #
  
  Hi Pitty…The following may be of interest:
  
  https://stackoverflow.com/questions/48034625/keras-lstm-predicted-timeseries-squashed-and-shifted
  
  Reply
  - Pitty May 18, 2022 at 11:00 am #
    
    Hi Jason,
    another question is as follows:
    inv_yhat = scaler.inverse_transform(inv_yhat)
    ValueError: operands could not be broadcast together with shapes (8760,8) (9,) (8760,8)
    
    In this case, I use the difference between the PM2.5 values of the two moments as the predicted value. And the order of data normalization and series_to_supervised is exchanged.
    
    Best wishes
    
    Reply
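
On the first question in this thread: framing the series with n_in lag steps drops the first n_in rows, so predictions line up with the original timestamps only after shifting by that offset. The sketch below assumes one lag, made-up values, and a hypothetical column name predicted; when only the test portion is predicted, the number of training rows has to be added to the offset as well.

```python
import numpy as np
import pandas as pd

# Made-up hourly series standing in for the original data.
index = pd.date_range("2014-01-01", periods=6, freq=pd.Timedelta(hours=1))
original = pd.DataFrame(
    {"pollution": [10.0, 12.0, 15.0, 14.0, 18.0, 21.0]}, index=index
)

n_in = 1  # number of lag steps used as input

# Framing with n_in lags drops the first n_in rows, so the k-th prediction
# belongs to the (k + n_in)-th timestamp of the original data.
yhat = np.array([11.0, 13.0, 14.5, 16.0, 20.0])  # pretend model output
predicted = pd.Series(yhat, index=original.index[n_in:], name="predicted")

# Join on the timestamp index; the first n_in rows have no prediction.
merged = original.join(predicted)
print(merged)
```
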
Moiz Qureshi May 18, 2022 at 5:14 am #

Hi James,

The way you explain stuff is mind blowing. I was practicing with this model and I’m getting promising results. I was wondering if passing the validation set to the fit function carries any risk of over fitting, when compared to running evaluate method separately.

Also, if I wanted to feed 1 window (most recent data) to this model for live prediction, and at the same time use actual data to keep updating the model, how should I set that up?

Reply
- James Carmichael May 18, 2022 at 11:49 am #
  
  Hi Moiz…Thank you for the feedback! There should always be a complete separation of the validation set from the training process to avoid over fitting. The following should help add clarity:
  
  https://machinelearningmastery.com/difference-test-validation-datasets/
  
  https://machinelearningmastery.com/training-validation-test-split-and-cross-validation-done-right/
  
  Reply
Magnum Quest May 19, 2022 at 7:25 am #

Hi James,

I really enjoyed learning from your tutorial. I had a question, though. You used a prediction model which includes the target variable as part of the input features. I have a separate Y variable that I do not want to include as a feature. How would I go about shaping the data for the LSTM? I’m having difficulty using the reshape function.

Reply
- James Carmichael May 20, 2022 at 11:30 pm #
  
  Hi Magnum…Thank you for the feedback and support! You may find the following of interest:
  
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Reply
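
A minimal sketch of shaping the data when the target is kept out of the inputs, assuming 5 feature columns, a separate target array, and a window of 3 time steps (all hypothetical, with random data). Each sample is a window of feature rows, and the label is the target at the step that follows the window.

```python
import numpy as np

n_steps = 3

# Made-up data: 5 input features and a separate target, in time order.
rng = np.random.default_rng(0)
features = rng.random((50, 5))  # inputs only; the target is not a column here
target = rng.random(50)

# Each sample is the previous n_steps rows of features; the label is the
# target at the step that follows the window.
X = np.array([features[i - n_steps:i] for i in range(n_steps, len(features))])
y = target[n_steps:]

print(X.shape, y.shape)  # (47, 3, 5) (47,)
```

Building the windows this way yields the [samples, timesteps, features] array directly, so no further reshape is needed.
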
Reza May 20, 2022 at 12:18 am #

Hello Jason, I wonder why you did not drop the columns in the larger window. please explain it.
thank you very much.

Reply
- James Carmichael May 20, 2022 at 11:14 pm #
  
  Hi Reza…the example is for illustration only and this step could have been performed.
  
  Reply
Juan May 23, 2022 at 8:24 am #

Hi Jason, Good post!

How can I develop an LSTM for multiple datasets?

Thank you very much.

Reply
- James Carmichael May 23, 2022 at 10:42 am #
  
  Hi Juan…Please elaborate on the goals of your model so that we may better assist you.
  
  Reply
  - Juan May 23, 2022 at 11:56 pm #
    
    I have multiple datasets (each dataset is an array of mxn) and my output is a vector (mx1).
    I want to use all the data for training and choose the best answer out of all for prediction.
    
    Reply
Yby June 3, 2022 at 1:48 am #

Hi Jason, excellent post.

Could this example be converted to an anomaly detection problem, instead of a regression/prediction one?

The reason is I would be interested in using LSTM for anomaly detection in a multivariate time-series application (with moderate series number, 20 or so, and relative large window size).

Would autoencoders be a better option? I don’t think typical methods like isolation forests, DBSCAN, LOF, k-means… would do the job in this case, would they? All examples I’ve seen use single row samples and few columns, don’t deal with time-series windowing, and complex anomalies (just merely detecting outliers).

thanks in advance for your advice.

Reply
- James Carmichael June 3, 2022 at 9:16 am #
  
  Hi Yby…the following is a great resource for LSTMs used for anomaly detection:
  
  https://medium.datadriveninvestor.com/lstm-neural-networks-for-anomaly-detection-4328cb9b6e27
  
  Reply
Rohit July 2, 2022 at 11:44 pm #

Hi,
Can you please explain how to forecast the future in multivariate time series data? And share some good resources to learn.

Reply
- James Carmichael July 3, 2022 at 1:10 pm #
  
  Hi Rohit…What are some specific goals for your models? Knowing this will enable us to better assist you.
  
  The following resource is a great starting point:
  
  https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/
  
  Reply
C Yang July 6, 2022 at 6:04 pm #

I found an error when the line “scaler.inverse_transform(inv_y)” executed, and found that some people have the same situation as mine. Finally, I realized that the 4 columns [‘year’, ‘month’, ‘day’, ‘hour’] need to be deleted from the dataset first.
By the way, that’s why the column at index 4 needs to be encoded --> line values[:,4] = encoder.fit_transform(values[:,4]). I appreciate James Carmichael’s post; I learned a lot from it.

Reply
- James Carmichael July 7, 2022 at 6:44 am #
  
  Thank you for the feedback Yang!
  
  Reply
nada July 13, 2022 at 6:33 pm #

Hello James, I just want to know how to predict with the same model, but instead of predicting every 1 hour I want to predict every 15 minutes.

Reply
- James Carmichael July 14, 2022 at 10:44 am #
  
  Hi Nada…Your source data would have to be input with data points representing values for every 15 minutes.
  
  Reply
Hadyan July 16, 2022 at 5:27 am #

Hello Jason!

Good work! By the way, how do you generate prediction without X_value? I want to use the model to forecast something in the future that I don’t have any data from

Reply
- James Carmichael July 16, 2022 at 7:15 am #
  
  Hi Hadyan…We are not aware of a way to make predictions on data that does not have any values in the past. Perhaps you could elaborate on what you are trying to accomplish. Time series forecasting algorithms determine the “autocorrelation” of an input data set to make future predictions. I apologize if I am misunderstanding your question.
  
  Reply
Hadyan July 16, 2022 at 6:33 am #

Sorry for not being clear on this. Let me give an example.

Let’s say I have data from January 2020 to July 2022, and want to predict the value from August to October 2022, how can it be achieved?

Thank you

Reply
- James Carmichael July 16, 2022 at 7:11 am #
  
  Hi Hadyan…You will want to adjust the forecast horizon.
  
  https://towardsdatascience.com/how-long-should-the-forecast-horizon-be-2f24a6005b89
  
  Reply
  - Hadyan July 18, 2022 at 10:59 pm #
    
    Hello James,
    
    Thank you very much for the response. But with the code shown in this example, I can only predict one timestep ahead. How can I structure the data so that it can predict values three months ahead, given that my last data is from July 2022 and I want to predict August to October 2022?
    
    Reply
Ciaran July 20, 2022 at 9:34 pm #

Thank you for this fantastic resource, and your wider project of making this subject matter understandable. I am finding it a huge help! I am stuck with a problem that I can’t seem to get my head around…

My context – I am using past visitor data along with weather data, aiming to better predict visitor numbers in future. I am trying to make a prediction 3 days ahead. I want to use past visitor + weather data, alongside forecast weather data, to make this 3 day ahead prediction. If I align the weather with the visitor data, then it seems I must cut the future (unknown) visitor data out of my inputs, creating some non rectangular input. I imagine having an input like this:

| |Rain|Sun|Wind|Visitors|
|:---:|:---:|:---:|:---:|:---:|
|t+3|R+3|S+3|W+3|Null|
|t+2|R+2|S+2|W+2|Null|
|t+1|R+1|S+1|W+1|Null|
|t-0|R-0|S-0|W-0|today's visitor numbers|
|t-1|R-1|S-1|W-1|V-1|
|t-2|R-2|S-2|W-2|V-2|
|t-3|R-3|S-3|W-3|V-3|

I am really intellectually stuck on this point.

Reply
- James Carmichael July 21, 2022 at 10:57 am #
  
  Hi Ciaran…Please clarify any questions you may have regarding the tutorial content so that we may better assist you.
  
  Reply
  - Ciaran July 29, 2022 at 10:07 pm #
    
    I want to feed in multivariate data with columns (number of visitors yesterday, temp yesterday, rain yesterday etc), and I want to feed in forecast weather without the actual number of future visitors, to predict visitor number 3 days from now. This makes the data not rectangular since I will have null values for the number of visitors today & future.
    
    Can you suggest how I might shape my data to include all this data?
    
    Reply
    - James Carmichael July 30, 2022 at 10:03 am #
      
      Hi Ciaran…The following resource may prove beneficial:
      
      https://machinelearningmastery.com/handle-missing-timesteps-sequence-prediction-problems-python/
      
      Reply
      - Ciaran August 1, 2022 at 6:08 pm #
        
        Oh that is great! Thank you very much for your help James!
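
For readers with the same question about known future inputs (forecast weather alongside past visitor counts), one possible framing is to keep the input rectangular by flattening each sample into a single vector: the past rows carry weather plus visitors, and the future rows carry weather only. The sketch below is an illustration under assumptions, not the tutorial's method. The helper name `build_samples` and the toy arrays are hypothetical, and during training the observed weather stands in for the forecast.

```python
import numpy as np

def build_samples(weather, visitors, n_past, horizon):
    """Pair past weather+visitors and known future weather with visitors `horizon` steps ahead."""
    X, y = [], []
    for t in range(n_past - 1, len(visitors) - horizon):
        # Past block: n_past rows of [rain, sun, wind, visitors], ending at time t.
        past = np.column_stack((weather[t - n_past + 1:t + 1],
                                visitors[t - n_past + 1:t + 1]))
        # Future block: weather only for t+1 .. t+horizon (no visitor values needed).
        future_weather = weather[t + 1:t + horizon + 1]
        X.append(np.concatenate((past.ravel(), future_weather.ravel())))
        y.append(visitors[t + horizon])
    return np.array(X), np.array(y)

rng = np.random.default_rng(7)
weather = rng.uniform(size=(30, 3))   # hypothetical rain, sun, wind per day
visitors = rng.uniform(size=30)       # hypothetical daily visitor counts
X, y = build_samples(weather, visitors, n_past=4, horizon=3)
print(X.shape, y.shape)
```

Each row of `X` could then be reshaped to [samples, 1, features] for an LSTM, or the past and future blocks could be fed to separate inputs of a two-branch model; which works better would need to be tested on the actual data.
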
Ehsan Afshar July 22, 2022 at 3:45 am #

Thanks for a wonderful explanation. Could I ask you to explain how to predict the next n unseen days for multivariate LSTM models?

Reply
- James Carmichael July 22, 2022 at 8:09 am #
  
  Hi Ehsan…The following discussion may be of interest to you:
  
  https://stackoverflow.com/questions/65156850/how-to-change-the-forecast-horizon-in-lstm-model
  
  Reply
Martina July 22, 2022 at 8:15 pm #

Is it correct to scale the test set used in validation with the same scaler of the training set?

Reply
- James Carmichael July 23, 2022 at 11:58 am #
  
  Hi Martina…You may find the following of interest:
  
  https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/
  
  Reply
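
On the scaling question above, a common practice is to fit the scaler on the training rows only and then apply that same fitted scaler to the test rows, so that no information from the test period leaks into the scaling parameters. A minimal sketch, assuming scikit-learn's `MinMaxScaler` and a hypothetical toy array:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 8 rows in time order, 2 features.
values = np.arange(16, dtype="float32").reshape(8, 2)
train, test = values[:6], values[6:]

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)  # learn min/max from the training rows only
test_scaled = scaler.transform(test)        # reuse the same min/max on the test rows

# Test values may fall outside [0, 1] because they were not seen during fitting.
restored = scaler.inverse_transform(test_scaled)
print(test_scaled)
```

The same fitted `scaler` object is also the one to use when inverting predictions back to the original units.
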
Julian July 24, 2022 at 1:09 am #

Hi Jason,

I have a problem which comprises the following:

I have 30 companies.

For each company I have 40 periods (from 2011 to 2020 quarterly)

Then I have 39 variables/columns (Financial metrics)
1 dependent variable: ESG score (between 0 and 100)

My question is:

If I have 40 rows for each company going from 2011-Q1 to 2020-Q2

Can I stack the 30 companies one below the other?

What procedure should I use for this? I would have a total of 40×30 rows repeating 30 times the time variable.

It is multivariate timeseries but I can’t find what method to follow if I’m stacking time x times (30 in my case).

Hope you understand and can help us with this. I am willing to buy a book where this is explained!

Thanks in advance!

Best regards,

Julian

Reply
- James Carmichael July 24, 2022 at 9:35 am #
  
  Hi Julian…I would highly recommend the following resource as a starting point:
  
  https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/
  
  Reply
Saubhagya August 18, 2022 at 11:35 pm #

How can I add walk-forward validation in multivariate time series analysis using LSTM?

Reply
- James Carmichael August 19, 2022 at 7:28 am #
  
  Hi Saubhagya…The following resource may be of interest:
  
  https://stats.stackexchange.com/questions/564407/how-does-walk-forward-work-with-lstm
  
  Reply
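
For readers looking for the mechanics of walk-forward validation, the idea can be sketched as a generator of index splits in which the training window grows and the test block always lies strictly after it. The helper `walk_forward_splits` below is a hypothetical name for illustration; a model would be re-fit (or updated) on each training slice and scored on the matching test slice.

```python
import numpy as np

def walk_forward_splits(n_samples, n_test, min_train):
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    start = min_train
    while start + n_test <= n_samples:
        # Train on everything before `start`, test on the next n_test samples.
        yield np.arange(0, start), np.arange(start, start + n_test)
        start += n_test

splits = list(walk_forward_splits(n_samples=10, n_test=2, min_train=4))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx.tolist())
```

Re-fitting an LSTM at every step can be slow, so in practice the test block size is a trade-off between realism and training cost.
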
Lu September 5, 2022 at 11:53 pm #

Hi Jason
I am trying to build an multi-input, multi-output LSTM network. The difference to the networks from tutorials is that in addition to the time, other values from the future are known. These values should be taken into account. For a better understanding I have created a small table here.
| Timestep | y-pos | x-pos | vy-velo | vx-velo | ay-accel | ax-accel | ey-error | ex-error |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| t-5 | 1 | 3 | 1 | 1 | 0 | 0 | 0.58 | 0.07 |
| t-4 | 2 | 4 | 1 | 1 | 1 | 0 | 1.21 | 0.53 |
| t-3 | 3 | 5 | 2 | 1 | 0 | 0 | 0.91 | 0.63 |
| t-2 | 5 | 6 | 2 | 1 | -3 | 0 | -2.91 | 0.507 |
| t-1 | 7 | 7 | -1 | 1 | 4 | 0 | 4.71 | 0.616 |
| t | 6 | 8 | 3 | 1 | -2 | 1 | -1.144 | 1.09 |
| t+1 | 9 | 9 | 1 | 2 | -5 | 0 | | |
| t+2 | 10 | 11 | -4 | 2 | 6 | -3 | | |
| t+3 | 6 | 12 | 2 | -1 | 1 | 2 | | |

A known trajectory is considered, with planned speed and acceleration. Now I want to predict the position error. Unfortunately, the values for the planned trajectory, with planned speed and acceleration (t+1 to t+3) are not taken into account. Is there a way to include these values in the forecast ?

Reply
- James Carmichael September 6, 2022 at 7:28 am #
  
  Hi Lu…The following discussion may prove beneficial:
  
  https://stackoverflow.com/questions/70361179/how-to-include-future-values-in-a-time-series-prediction-of-a-rnn-in-keras
  
  Reply
Nick September 6, 2022 at 12:14 am #

Hi James

Thanks for the fantastic post – really interesting what you’ve done here. I’m probably going mad, but when I print out inv_y & inv_yhat variables at the end of the script after they’ve been inverted, I get values much lower than the air pollution figure that is being used for the predictions? I’m trying to get the figures back to normal after they’ve been normalized to decimal point figures so that I can add the forecast on the end of the dataframe as a new column.

See below code:

# invert scaling for forecast
inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)
print("inverted scaling for forecast - step 1:")
print(inv_yhat)

inv_yhat = scaler.inverse_transform(inv_yhat)
print("inverted scaling for forecast - step 2:")
print(inv_yhat)
inv_yhat = inv_yhat[:,0]
print("inverted scaling for forecast - step 3:")
print(inv_yhat)
df_output = dataset[:35039]
df_output['Forecast'] = inv_yhat

# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
print("inverted scaling for actual - step 1:")
print(test_y)
inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
print("inverted scaling for actual - step 2:")
print(inv_y)
inv_y = scaler.inverse_transform(inv_y)
print("inverted scaling for actual - step 3:")
print(inv_y)
inv_y = inv_y[:,0]
print("inverted scaling for actual - step 4:")
print(inv_y)
df_output['Actual'] = inv_y
df_output.to_csv('LSTM_Forecast.csv')
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)

Reply
- James Carmichael September 6, 2022 at 7:25 am #
  
  Hi Nick…The following resource may be of interest:
  
  https://towardsdatascience.com/understand-data-normalization-in-machine-learning-8ff3062101f0
  
  Reply
Chiru September 9, 2022 at 4:16 pm #

Hi James..

I am impressed with your work and posts. You are amazing.
My question is: can we apply an LSTM to an ordinary regression problem where there is no time series data?

Reply
- James Carmichael September 10, 2022 at 7:39 am #
  
  Hi Chiru…You are very welcome! We appreciate the feedback! LSTMs are ideal for time series data as opposed to establishing a functional mapping (regression). Having said that, there is no doubt research into possible application to many other tasks.
  
  Do you have a particular regression type of application you can describe? That will allow us to help determine a suitable selection of model type.
  
  Reply
Chiru September 10, 2022 at 2:15 pm #

Thank you James..

I have a data set with 70 features. Let us say with 1000 samples. It is a size of 1000*70. Most of the samples are non-zero values where as few are zero values. Only one label with a few zero values and more non-zero values.
I have modeled the same problem with a multilayer perceptron and a CNN. Now I would like to work with LSTMs and GANs.
Can you give me some insights which will really help me in doing my work?
Thank you…

Reply
- James Carmichael September 11, 2022 at 2:19 am #
  
  You are very welcome Chiru! The following resources are great starting points:
  
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/
  
  https://machinelearningmastery.com/what-are-generative-adversarial-networks-gans/
  
  Reply
JOJO October 18, 2022 at 8:20 pm #

Excellent work! But I want to know how to predict future data. We do not actually have the future test_x data.
For example, I want to predict the pm2.5 in 2022-10-19——2022-11-19.

Reply
- James Carmichael October 19, 2022 at 6:58 am #
  
  Hi JOJO…The following may be of interest to you:
  
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
OTB October 29, 2022 at 11:33 pm #

Hi. Thanks for the tutorial. I have a question. Please share your comments.

Consider typical LSTM model for time series problem. If i want to train the model with different datasets, what should I do? I must create one model and train it with 120 different datasets but same size, same time steps, same features. Model must consider all of those datasets to predict afterwards.

Consider the typical LSTM structure below:
model_seq = Sequential()
model_seq.add(InputLayer((5,4)))
model_seq.add(LSTM(64))
model_seq.add(Dense(8,"relu"))
model_seq.add(Dense(1,"linear"))

And compiling like below:
opti=rp(learning_rate=0.0001)
opti2=Adam(learning_rate=0.0001)
model_seq.compile(loss="mse", optimizer=opti,metrics="mae")
model_seq.fit(x1,y1,epochs=5, batch_size=16, verbose=1)

My problem is I don’t want to train with only x1-y1. I also need to train the same model with x2-y2,x3-y3 etc. At the end, I need one model that understood all of 120 datasets behavior and it must be able to predict another x-y data. Is it possible? Your comments will be very important because I couldn’t do it for very long time.

When I try to fit multiple times, model only consider last fitting. Because all time series starts with 0 and ends at different values.

Reply
- James Carmichael October 30, 2022 at 5:58 am #
  
  Hi OTB…In this case I would recommend investigation of sequence to sequence models.
  
  https://towardsdatascience.com/day-1-2-attention-seq2seq-models-65df3f49e263
  
  Reply
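
Besides sequence-to-sequence models, one simple option that may be worth trying for the many-datasets question above is to window each dataset separately and then pool the windows along the sample axis, so that a single call to fit sees all of them. This is only a sketch under assumptions: the random arrays below are hypothetical stand-ins for x1..x3 and y1..y3, already shaped [samples, 5, 4] to match the InputLayer((5,4)) in the question.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-ins for (x1, y1), (x2, y2), (x3, y3): each dataset is
# windowed on its own, so no window ever spans two different series.
datasets = [(rng.normal(size=(n, 5, 4)), rng.normal(size=(n, 1)))
            for n in (30, 45, 25)]

# Pool the windows from all datasets along the sample axis.
X_all = np.concatenate([x for x, _ in datasets], axis=0)
y_all = np.concatenate([y for _, y in datasets], axis=0)

# Shuffle whole windows so each batch mixes datasets; the time order
# inside each window is left untouched.
order = rng.permutation(len(X_all))
X_all, y_all = X_all[order], y_all[order]
print(X_all.shape, y_all.shape)
```

A call such as model_seq.fit(X_all, y_all, epochs=5, batch_size=16) would then train one model on every dataset in each epoch, rather than on whichever dataset was fitted most recently.
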
Rahat November 20, 2022 at 12:58 am #

Hi James

While I am trying to evaluate the model, getting below error

transpose expects a vector of size 2. But input(1) is a vector of size 3
[[{{node transpose}}]]
[[sequential_12/lstm_12/PartitionedCall]] [Op:__inference_predict_function_1065417]

Note that the number of features in my dataset is the same as in this example, but the train and test set sizes are different. Also, I am trying to predict one parameter as output.

Reply
- James Carmichael November 20, 2022 at 11:45 am #
  
  Hi Rahat…The following resource may be of interest:
  
  https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
  
  Reply
Afron November 30, 2022 at 3:24 am #

Hi Jason

I changed the value of the real PO2 in the test data, but the value of the predictions changed as well.

Does the LSTM model allow to use the real value of PO2 in test data?

I am confused , because I think the model should use the training data only to predict the PO2 in the test data

not test data itself.

Reply
- James Carmichael November 30, 2022 at 8:56 am #
  
  Hi Afron…The following resource may add clarity related to this topic:
  
  https://machinelearningmastery.com/moving-average-smoothing-for-time-series-forecasting-python/
  
  Reply
Joe December 3, 2022 at 4:59 am #

Hey Jason, a few people complained about a “ValueError: could not convert string to float: ‘NW’” error.

Most likely they didn’t rename the original pollution.csv file to raw.csv before running the preprocessing code that converts it back to pollution.csv. To make things clearer and less error-prone, maybe consider renaming the original pollution.csv file to pollution_raw.csv or something similar.

Reply
- James Carmichael December 3, 2022 at 8:15 am #
  
  This is great advice Joe! We appreciate the feedback and suggestion!
  
  Reply
Ivan Arrubla December 18, 2022 at 3:39 am #

Hi Jason. I have the same error as others.

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
C:\Users\IZIDAR~1\AppData\Local\Temp/ipykernel_16112/1830588263.py in
37 values[:,4] = encoder.fit_transform(values[:,4])
38 # ensure all data is float
---> 39 values = values.astype('float32')
40 # normalize features
41 scaler = MinMaxScaler(feature_range=(0, 1))

ValueError: could not convert string to float: '1 4'

I’ve followed the instructions about how to solve the error, but it appears again.
Can you help me?
Thanks

Reply
- James Carmichael December 18, 2022 at 10:18 am #
  
  Hi Ivan…Have you tried your code in Google Colab? Also, did you type the code listing or copy and paste it?
  
  Reply
Nic December 29, 2022 at 2:58 am #

Hi Jason,

I appreciate your thorough explanation. I was successful in running your code using the dataset you provided. However, I would like to repeat the LSTM model (for multivariate input data) say five times and then comparing the average outcome. Could you explain how the code can be extended for this purpose please?

Thanks a lot

Reply
- James Carmichael December 29, 2022 at 8:44 am #
  
  Hi Nic…You are very welcome! The following resource may be of interest to you:
  
  https://machinelearningmastery.com/repeated-k-fold-cross-validation-with-python/
  
  Reply
Nic December 29, 2022 at 7:28 pm #

Hi James,

Thanks for your reply. I will have a look at the resource which you have indicated. I have another question with regards to the feature/variable selection in an LSTM model. Could you kindly indicate some resources which would help in determining how to best choose the number of variables to be considered as inputs for an LSTM model please?

Thanks a lot

Reply
Guanta January 19, 2023 at 2:46 am #

Hi all, I am trying to find the solution to a similar problem and I wonder if you can help.

I have panel data on 200 different stocks. Each stock belongs to one of 12 sectors, which are encoded 1-12. For each stock there are 8 different pieces of information such as price, market capitalisation, volume, and so forth. I then have a column of future stock prices on which to train the model.

Would this mean I need to train 200 different models? How would you go about this problem if you were given this dataset?

Sorry if this is a daft question. I am new to ML.

Reply
Arnold January 21, 2023 at 1:22 pm #

Hi Jason, massive fan of your work throughout the years.
Keeping it short as I assume you have hundreds of messages a day!

If one has a dataset on 400 patients’ health through time.
X variables are: Patient ID, Age Group (Binary i.e OLD 1 and Young 2), Distance walked during the day, Amount of calories eaten that day.
Y variable to be predicted is: Amount of non-fatal heart attacks.

My idea was that one could run 400 different LSTM time series models on each individual to predict the amount of non-fatal heart attacks.

My question is! These results would gain no information from the other predictions, is there a way you know of linking this information?

For example, if one was to train a model on an OLD patient, is there any way that the model can learn that OLD patients have tended to have more non-fatal heart attacks in the other regressions so the model incorporates more non-fatal heart attacks to this old patients predictions?

Maybe I am thinking about it wrong, please help!

Reply
Arnold January 22, 2023 at 9:46 am #

Hi all/anyone I am wondering if anyone can help, hypothetically speaking:

If one has a dataset on 400 individuals through time.

X variables are: person ID, age group (Binary i.e OLD 1 and Young 2), average calories eaten in a day, the average amount of cigarettes smoked in a day, and the average amount of dentist appointments in a year.

Y variable to be predicted is: the number of teeth in the mouth of each patient.

My idea was that one could run 400 different LSTM time series models on each individual to predict the number of teeth in that individual’s mouth.

My question is! These predictions would not have gained any information from the other predictions, or the data from the other persons. Is there a way you know of linking this information?

For example, if one was to train a model on an OLD patient, is there any way that the model can learn that OLD patients have tended to have less teeth in their mouths in the other models/data, so the model incorporates ‘less teeth in the mouth’ to this old patients predictions?

Or maybe I am not thinking about this correctly?

Reply
Travis February 26, 2023 at 1:15 am #

Hi Dr. Carmichael,

Really appreciative of all of your blog posts- takes a very complex issue and boils it down to something I can understand with a measly bachelors engineering degree and not a doctorate in mathematics (like most other posts)! I am relatively new to coding, and while I follow the logic behind all the steps and purpose of everything, I have a more technical coding question:

In the “make a prediction section” after inverting the yhat and y datasets (see the specific lines below, bracketed by ‘–> inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1) inv_y = concatenate((test_y, test_X[:, 1:]), axis=1) <–
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]

Reply
- James Carmichael February 26, 2023 at 10:44 am #
  
  Hi Travis…You are very welcome! Please elaborate on your question so that we may better assist you.
  
  Reply
Jason February 27, 2023 at 4:47 am #

Hi Jason,

I am building similar LSTM model, but wanting to use several features to predict Bitcoin close price instead, not sure if this is mentioned but I am struggling with trying to inverse transform my outcome. To provide more context, here’s a snippet of my code:

###
# scaling my input data
scaler = MinMaxScaler()

features = df.iloc[:, 1:].values.reshape(-1, 6)

scaled_features = scaler.fit_transform(features)

# Checking scaled features shape
scaled_features.shape
(4608, 6)

# Build sequences of data to feed into model

SEQ_LEN = 100

def to_sequences(data, seq_len):
    d = []

    for index in range(len(data) - seq_len):
        d.append(data[index: index + seq_len])

    return np.array(d)

def preprocess(features, seq_len, train_split):

    data = to_sequences(features, seq_len)

    num_train = int(train_split * data.shape[0])

    X_train = data[:num_train, :-1, :-1]
    y_train = data[:num_train, -1, -1].reshape(-1, 1)

    X_test = data[num_train:, :-1, :-1]
    y_test = data[num_train:, -1, -1].reshape(-1, 1)

    return X_train, y_train, X_test, y_test

X_train, y_train, X_test, y_test = preprocess(scaled_features, SEQ_LEN, train_split = 0.90)

print(X_train.shape, y_train.shape)
(4057, 99, 5) (4057, 1)
print(X_test.shape, y_test.shape)
(451, 99, 5) (451, 1)

## Build model
# Will not paste the code for my model as I successfully fit and trained my model
# But the error comes in when I tried to inverse transform the prediction made by the model

y_hat = model.predict(X_test)

y_test_inverse = scaler.inverse_transform(y_test)
y_hat_inverse = scaler.inverse_transform(y_hat)

plt.title('Bitcoin price prediction')
plt.xlabel('Time [days]')
plt.ylabel('Price')
plt.legend(loc='best')

plt.show();

ValueError: non-broadcastable output operand with shape (451,1) doesn’t match the broadcast shape (451,6)

From my understanding it seems like I tried to inverse_transform my prediction that has a different shape from the scaler that is used to fit_transform on my input data, but I don’t know how to overcome this. Can you please give me some hints on this ?

Reply
- James Carmichael February 27, 2023 at 9:22 am #
  
  Hi Jason…The following resources may be of interest:
  
  https://python.hotexamples.com/examples/sklearn.preprocessing/MinMaxScaler/inverse_transform/python-minmaxscaler-inverse_transform-method-examples.html
  
  https://itecnote.com/tecnote/python-how-to-use-inverse_transform-in-minmaxscaler-for-a-column-in-a-matrix/
  
  Reply
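
The ValueError in the question above arises because the scaler was fitted on 6 columns while the prediction has only 1. One possible workaround, sketched here under assumptions, is to place the single predicted column into a zero-filled array with the scaler's full width, invert that, and keep only the column of interest. The helper name `invert_single_column` and the random data are hypothetical; the target is assumed to be the last column, as in the question's slicing.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def invert_single_column(scaler, column_values, column_index):
    """Undo MinMaxScaler scaling for one column of a multi-column scaler."""
    column_values = np.asarray(column_values).reshape(-1)
    n_features = scaler.scale_.shape[0]          # width the scaler was fitted on
    padded = np.zeros((len(column_values), n_features))
    padded[:, column_index] = column_values      # other columns are dummy zeros
    return scaler.inverse_transform(padded)[:, column_index]

rng = np.random.default_rng(0)
features = rng.uniform(10.0, 500.0, size=(50, 6))  # hypothetical 6-column input
scaler = MinMaxScaler()
scaled = scaler.fit_transform(features)

y_hat = scaled[:5, -1].reshape(-1, 1)              # stand-in for model.predict output
y_hat_inverse = invert_single_column(scaler, y_hat, column_index=-1)
print(y_hat_inverse)
```

This works because MinMaxScaler scales each column independently, so the dummy zeros in the other columns do not affect the inverted target column. Fitting a second, separate scaler on the target column alone would be an alternative.
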
Mory March 6, 2023 at 12:58 am #

I have new measurements without output >>>> how can i predict y with new measurements.

Reply
- James Carmichael March 6, 2023 at 11:29 am #
  
  Hi Mory…New measurements would also need to be reshaped into a time series so that the LSTM model can make predictions with them.
  
  Reply
Amory March 19, 2023 at 9:48 am #

Hi Jason,

Thank you for a cool example. I am working on a similar problem where I have 7 variables of interest at time t, and trying to predict a binary variable y at some time in the future, say t+7. I want to include lagged values of the 7 variables going back 40 time measurements. This means I have 7*40 + 7 variables or as you call it “features”.

My issue is figuring out what the proper dimensions for reshaping my data so I can pass it into keras API. My guess as of now is to have my dimensions be (samples = len(dataframe), timesteps = 1, and features = 7*40+7).

Is my intuition correct? This seems to contradict your code above but I don’t understand the intuition for why.

Reply
- James Carmichael March 20, 2023 at 10:21 am #
  
  Hi Amory…Have you executed your code? That may allow us to better assist you should your results not be correct.
  
  Reply
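
On the reshaping question above: Keras LSTM layers take 3D input of the form [samples, timesteps, features], and both layouts Amory describes fit that form. The sketch below contrasts them with a hypothetical flattened array; it assumes each row is ordered oldest-first, with the 7 variables grouped together within each time step.

```python
import numpy as np

n_samples, n_lags, n_vars = 100, 40, 7
n_steps = n_lags + 1  # 40 lagged observations plus time t

# Hypothetical flattened frame: each row is [vars(t-40), ..., vars(t-1), vars(t)].
flat = np.arange(n_samples * n_steps * n_vars, dtype="float32")
flat = flat.reshape(n_samples, n_steps * n_vars)

# Option 1: a single timestep holding all 7*40+7 = 287 values as features.
as_single_step = flat.reshape((n_samples, 1, n_steps * n_vars))

# Option 2: 41 timesteps of 7 features each, so the LSTM steps through time.
as_sequence = flat.reshape((n_samples, n_steps, n_vars))

print(as_single_step.shape, as_sequence.shape)
```

With option 1 the recurrence has nothing to iterate over, so the layer behaves much like a dense layer on 287 inputs; option 2 lets the LSTM process the lags as a sequence. Which one predicts better is an empirical question.
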
peter March 23, 2023 at 3:03 am #

Sorry, I wonder

1. Why does this code use var1(t) as the training target?

2. Why are var1(t-n), var1(t-3), var1(t-2), var1(t-1) used as inputs for training?

3. Why does testing use the past rather than the future for prediction (test_X is used to predict, and test_X is past data)?

So why is it called forecasting the future?

Thank you for clearing up my doubts.

Reply
- James Carmichael March 23, 2023 at 10:47 am #
  
  Hi Peter…Past values are considered in order to make future predictions. Here is another view that may prove insightful:
  
  https://towardsdatascience.com/time-series-forecasting-with-recurrent-neural-networks-74674e289816
  
  Reply
peter March 23, 2023 at 10:02 pm #

Thank you for the reply.

I wonder why 7 is used for the inverse transform. Why not use n_features?

# specify the number of lag hours
n_hours = 3
n_features = 8 <—————————————————– This feature
# frame as supervised learning
reframed = series_to_supervised(scaled, n_hours, 1)

inv_yhat = np.concatenate((yhat, test_X[:, -7:]), axis=1) <—– Why use -7 ?, why not used n_features
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]

Thank you ..

Reply
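
A likely reading of the snippet above, assuming the scaler was fitted on all 8 columns: inverse_transform needs a row of exactly 8 values, yhat supplies the first one (the pollution column), and so 7 more are borrowed from test_X. In other words, the 7 is n_features - 1 rather than n_features. The sketch below uses hypothetical random data in place of the tutorial's arrays to show the same step without the hard-coded number.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

n_hours, n_features = 3, 8
rng = np.random.default_rng(1)
raw = rng.uniform(0.0, 100.0, size=(20, n_features))  # hypothetical 8-column data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(raw)

# Stand-ins for the tutorial arrays: test_X flattened to [rows, n_hours * n_features],
# and yhat as a single predicted column (here simply the true scaled target).
test_X = np.tile(scaled, (1, n_hours))
yhat = scaled[:, :1]

# yhat fills column 0; the remaining n_features - 1 columns come from test_X.
inv_yhat = np.concatenate((yhat, test_X[:, -(n_features - 1):]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)[:, 0]
print(inv_yhat[:3])
```

Using test_X[:, -n_features:] instead would produce 9 columns after concatenation, and inverse_transform would reject the array because it no longer matches the 8 columns the scaler was fitted on.
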
peter March 23, 2023 at 10:55 pm #

Another question:
if I want to forecast the next 5 days,
how do I configure the function series_to_supervised(n_in=1, n_out=1)?

n_in=?, n_out=?

Given days 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 –> I need to forecast the values for days 11, 12, 13, 14, 15 in the future

Reply
- James Carmichael March 24, 2023 at 6:08 am #
  
  Hi Peter…There are more detailed examples provided in the following resource:
  
  https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/
  
  Reply
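
To make the 10-days-in, 5-days-out framing above concrete, here is a small sketch of the windowing idea for a single series. It uses a hypothetical helper, `frame_windows`, rather than the tutorial's series_to_supervised, but the roles of n_in and n_out are meant to be analogous: n_in past values form the input and the following n_out values form the target.

```python
import numpy as np

def frame_windows(series, n_in, n_out):
    """Split a 1-D series into (inputs, targets): n_in past values -> n_out future values."""
    series = np.asarray(series, dtype="float32")
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(y)

days = np.arange(1, 16)  # days 1..15
X, y = frame_windows(days, n_in=10, n_out=5)
print(X.tolist())
print(y.tolist())
```

A model trained on such pairs would need an output layer with n_out units (for example Dense(5)) so that it emits all 5 future values at once. To forecast days that have not happened yet, the most recent 10 observed values are passed to the trained model as a single input window.
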
roy March 24, 2023 at 12:21 am #

Hi professor, I have question

1. I do not understand

Now we have output is yhat.shape : (8476, 1)

8476 = number of test set (assume now we have daily dataset)

assume dataset is

if i need show result for forecasting next 10 days future, Where is this value in yhat ?
because yhat is length 8476 not 10

2. How set n_in=1, n_out=1 in series_to_supervised(n_in=1, n_out=1) for forecasting next 10 days (need predict future value not past), If I try set n_out=1 , does that show the forecast for the next 5 days ?

Thank for answer

Reply
- James Carmichael March 24, 2023 at 6:07 am #
  
  Hi Roy…Most of these questions are addressed in more detail in the following resource:
  
  https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/
  
  Reply
kk May 3, 2023 at 11:28 pm #

Hi Jason,

These are great tutorials and I was able to run on my sample data. One quick question: what changes are required (series to supervised learning, train and test sets, network,…) within the “Train On Multiple Lag Timesteps Example” if a sample dataset has pollution data for several cities (name of cities being one feature)? Any suggestion is much appreciated.

Reply
- James Carmichael May 4, 2023 at 6:36 am #
  
  Hi kk…The following resources may add clarity:
  
  https://machinelearningmastery.com/how-to-develop-autoregressive-forecasting-models-for-multi-step-air-pollution-time-series-forecasting/
  
  Reply
John May 18, 2023 at 8:32 pm #

Hi,
I have a similar dataset, but mine covers 13 months with measurements every 15 minutes of SO2, NO2, NO, NOx, PM10, PM2.5, temperature, wind speed, wind direction (in degrees), humidity, pressure and solar radiation. Before resampling my dataset hourly, I tried some approaches such as ARIMA and SARIMAX following your books (which were a lot of help to me). Could you tell me whether trying those approaches is a good choice? When I started looking at deep learning models, I found (also in your books) that LSTM is the best option to check out.

However, I do not know when a transformation such as the MinMaxScaler is needed. Moreover, I used the code on this page as a base and I do not understand how the MinMaxScaler works: when I print the forecasted and observed values after applying the inverse of the MinMaxScaler, I do not get values on the same scale as I had initially. (For example, I have 68 micrograms/m3 for O3 as the first value of the test set; I apply the MinMaxScaler, forecast it, and then I get 8 as observed and 8 as forecasted.) Why am I not getting the 68 micrograms/m3?
Could you help me please? Thanks in advance!!!

Reply
- James Carmichael May 19, 2023 at 6:03 am #
  
  Hi John…The following resource provides an introduction to the standard and min-max scaler:
  
  https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/
  
  Reply
John May 19, 2023 at 4:28 pm #

Thanks!!
Could you tell me whether applying ARIMA, SARIMAX and LSTM is right for that time series problem?

Reply
Nat May 31, 2023 at 4:43 pm #

Hi Jason, I’m not sure about one thing.
If I get it right, this model uses multiple variables in one time step to predict the pollution value in that same time step? Do I understand it correctly?

So it just uses the LSTM model to receive these multiple variables?
And how can I use multi-step and multivariate data as input?

Thank you in advance.

Reply
- James Carmichael June 1, 2023 at 5:21 am #
  
  Hi Nat…The following resource may help clarify:
  
  https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
  
  Reply
Saad Alsamraee July 7, 2023 at 2:46 pm #

Hi Jason,
I have tried to find a tutorial for time series forecasting using LSTM that includes a prediction after testing the model. Could you please go one step further after testing the model and make a prediction for the future, like six months or one year ahead? It would be very helpful.

Many thanks,
Saad

Reply
- James Carmichael July 8, 2023 at 11:21 am #
  
  Hi Saad…The following resource will provide many examples and insights into time series forecasting with deep learning models.
  
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Buddi August 5, 2023 at 2:02 am #

Hi Jason,

Your tutorials were very helpful. I need some support from you regarding following issue.

I had a data set including rainfall and river flow. I used both rainfall and river flow to train the model to predict the river flow. Now I need to predict the river flow for future rainfall estimates using same trained model where river discharge data is not available.

Could you please help me in this regard.

Thank you.

Reply
- James Carmichael August 5, 2023 at 9:47 am #
  
  Hi Buddi…You are very welcome! Once trained, you will simply provide your new dataset as input to your trained model.
  
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Jerron August 5, 2023 at 7:28 pm #

How to ‘Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour’?

Reply
- James Carmichael August 6, 2023 at 9:33 am #
  
  Hi Jerron…The following resource may be of interest:
  
  https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
  
  Reply
Jerron August 5, 2023 at 8:00 pm #

How to ‘Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour’? I have seen many people ask the same question over the past 6 years. Let me try to make the question clearer:
In this article, we only use the past 1 hour as input. If we set n_in=2 when we call series_to_supervised, we can expand the input to 2 hours of history. After we have trained the model, we can use it to predict the pollution in the next hour given 2 hours of input with the call
yhat = model.predict(test_X)
Now suppose we want to predict the pollution using not only the 2 hours of historical input, but also the “expected” weather conditions for the next hour. How do we add such extra input? Will the model stay the same, with the weather conditions of 3 hours somehow squeezed in as input? Or do we use the last hour and the expected future hour? If the former, will there be a misalignment? When we train, we have 1 am and 2 am as input and the output at 3 am, but now we have 2 am and 3 am as input and still want the output at 3 am.

Reply
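One possible framing for this question, offered as a sketch and not as the tutorial's method: keep the lagged observations of every variable as history, and append the weather columns at time t (the "expected" conditions) as extra inputs, with pollution at time t as the label. Training and prediction rows then share the same layout, so there is no misalignment as long as pollution at t is never an input. The helper `frame_with_future_exog` and the toy column names are hypothetical.

```python
import numpy as np
import pandas as pd

def frame_with_future_exog(df, target, exog, n_lags=2):
    """Inputs: all variables at t-n_lags..t-1 plus exog columns at t.
    Label: the target column at t."""
    parts = []
    for lag in range(n_lags, 0, -1):  # oldest lag first
        shifted = df.shift(lag)
        shifted.columns = [f"{c}(t-{lag})" for c in df.columns]
        parts.append(shifted)
    future = df[exog].copy()
    future.columns = [f"{c}(t)" for c in exog]
    parts.append(future)
    X = pd.concat(parts, axis=1)
    y = df[target].rename(f"{target}(t)")
    keep = X.notna().all(axis=1)  # drop rows without a full history
    return X[keep], y[keep]

# Toy data: in practice the weather at t would come from a weather forecast.
df = pd.DataFrame({
    "pollution": [10.0, 12.0, 15.0, 11.0, 9.0],
    "temp": [1.0, 2.0, 3.0, 4.0, 5.0],
    "wind": [5.0, 4.0, 3.0, 2.0, 1.0],
})
X, y = frame_with_future_exog(df, target="pollution", exog=["temp", "wind"], n_lags=2)
print(X.columns.tolist())
print(X.shape, y.shape)
```

The result is a flat 2D table. To feed an LSTM you would either pad the time-t step (for example, repeating the last known pollution value) so every timestep has the same number of features, or pass the time-t weather through a separate model input; both are design choices this sketch leaves open.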
Jaroslav hook August 16, 2023 at 3:43 am #

I think there is a serious bug in the code. You are predicting pollution data (values[:,8]) using the pollution data itself (values[:,0]). Your first and last columns in values ARE THE SAME. If you exclude column[0] from the input, the prediction will be different.

Reply
- James Carmichael August 16, 2023 at 8:50 am #
  
  Hi Jaroslav…The following resource will clarify your queries and doubts regarding LSTMs and CNNs utilized for time-series forecasting.
  
  https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/
  
  Reply
Rana LOUBANI September 26, 2023 at 9:56 pm #

Hi James,

Thank you for this tutorial, so interesting.

I’m trying to train an LSTM model using multivariate time series data.
I need to predict the value of y at t, using multiple lags of multiple variables X.
So my question is: if I need to use 2 lags of each variable x, do I form my input matrix like this:

[
[ [var1 (t-1)], [var2 (t-1)],
[var1 (t-2)], [var2 (t-2)] ],

[ [var1 (t-1)], [var2 (t-1)],
[var1 (t-2)], [var2 (t-2)] ],
...
]

or like this:

[
[ [var1 (t-2)], [var2 (t-2)],
[var1 (t-1)], [var2 (t-1)] ],

[ [var1 (t-2)], [var2 (t-2)],
[var1 (t-1)], [var2 (t-1)] ],
...
]

Thank you.

Reply
- James Carmichael September 27, 2023 at 7:54 am #
  
  Hi Rana…You are very welcome! The following resources may be of interest:
  
  https://towardsdatascience.com/multivariate-timeseries-forecast-with-lead-and-lag-timesteps-using-lstm-1a34915f08a
  
  https://www.analyticsvidhya.com/blog/2018/09/multivariate-time-series-guide-forecasting-modeling-python-codes/
  
  Reply
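On the ordering question: a Keras LSTM reads the timestep axis in order, so the conventional layout is chronological, oldest lag first (the second layout above), with input shape (samples, timesteps, features). A minimal sketch with NumPy follows; `make_windows` and the toy arrays are hypothetical names used only for illustration.

```python
import numpy as np

def make_windows(features, target, n_lags):
    """Return X with shape (samples, n_lags, n_features), oldest step first,
    and y holding the target value at time t."""
    X, y = [], []
    for t in range(n_lags, len(features)):
        X.append(features[t - n_lags:t])  # rows t-n_lags .. t-1, in time order
        y.append(target[t])
    return np.array(X), np.array(y)

# Two variables (columns) observed at four time steps.
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
target = np.array([100.0, 200.0, 300.0, 400.0])

X, y = make_windows(features, target, n_lags=2)
print(X.shape)  # (samples, timesteps, features)
print(X[0])     # first row is t-2, second row is t-1
```

Whichever order is chosen, it must be the same at training and prediction time.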
Candice October 5, 2023 at 11:03 pm #

HI Jason,
Thank you for such a great tutorial. I have a ‘first principles’ question to ask, if I have many data points for my training dataset, is it necessary to have a long lookback as well? In my dataset, the performance gets worse when I add more timesteps to my lookback.
Thanks.

Reply
- James Carmichael October 6, 2023 at 9:13 am #
  
  Hi Candice…You are very welcome! Your understanding is correct! The lookback should be adjusted based upon acceptable accuracy. I would suggest investigating model performance as a function of lookback and consider it a hyperparameter to be optimised.
  
  Reply
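Treating the lookback as a hyperparameter can be sketched as a simple loop: build the lagged inputs for each candidate value, fit on the earlier part of the series, and score on the later part. To keep the example fast and self-contained, a linear least-squares model stands in for the LSTM; that substitution, the synthetic series, and the helper names are all assumptions made for illustration.

```python
import numpy as np

def lag_matrix(series, lookback):
    """Rows of `lookback` consecutive past values, and the value that follows each."""
    X = np.array([series[t - lookback:t] for t in range(lookback, len(series))])
    y = series[lookback:]
    return X, y

def validation_rmse(series, lookback, train_frac=0.7):
    """Fit a linear stand-in model on the first part, score on the rest."""
    X, y = lag_matrix(series, lookback)
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column
    split = int(len(X) * train_frac)          # chronological split, no shuffling
    coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    pred = X[split:] @ coef
    return float(np.sqrt(np.mean((pred - y[split:]) ** 2)))

# Synthetic hourly-style series with a 24-step cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(400)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(400)

scores = {lb: validation_rmse(series, lb) for lb in (1, 2, 4, 8, 24)}
best = min(scores, key=scores.get)
print(scores)
print("best lookback:", best)
```

Note that the validation rows differ slightly between lookbacks because longer windows discard more initial samples; for a strict comparison, score every candidate on the same final block of time steps.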
Chris November 1, 2023 at 8:24 pm #

What if I have one year of data like this and want to make hourly predictions based on the same hour of the previous day? My data depends not on the previous hour but on the same hour of the previous day.

Reply
- James Carmichael November 2, 2023 at 10:47 am #
  
  Hi Chris…In this case you would reshape the data to be consistent with the time steps needed for your prediction.
  
  Reply
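For an hourly series, "the same hour on previous days" corresponds to lags of 24, 48, 72, ... steps. The sketch below builds those lagged columns with pandas `shift`; the helper `same_hour_lags`, the column names, and the toy series are hypothetical.

```python
import numpy as np
import pandas as pd

def same_hour_lags(series, n_days):
    """Columns hold the value at the same hour 1..n_days days earlier (oldest first)."""
    cols = {f"t-{24 * d}h": series.shift(24 * d) for d in range(n_days, 0, -1)}
    frame = pd.DataFrame(cols)
    frame["target"] = series
    return frame.dropna()  # drop rows without a full history

# Five days of toy hourly data.
index = pd.date_range("2023-01-01", periods=24 * 5, freq=pd.Timedelta(hours=1))
series = pd.Series(np.arange(24 * 5, dtype=float), index=index)

frame = same_hour_lags(series, n_days=3)
# Shape (samples, timesteps, features) as expected by an LSTM layer.
X = frame.drop(columns="target").to_numpy().reshape(-1, 3, 1)
y = frame["target"].to_numpy()
print(frame.head(1))
print(X.shape, y.shape)
```

Increasing `n_days` extends the history to the same hour over previous weeks or months, at the cost of losing that many days of samples at the start of the series.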
Chris November 1, 2023 at 8:25 pm #

Not just the previous day, but based on the same hour of the days in the previous months.

Reply
Chris November 3, 2023 at 6:32 pm #

Can you help me with this? Suppose the data is hourly data for the previous year and I want to make hourly predictions for the next day or week. How would that work?

Reply
Rohit Shorya January 12, 2024 at 6:11 pm #

Hi Jason,
Thank you for such a great tutorial.

I just want to know: what if I want to predict more than one feature at a time? Suppose I have a data frame with 11 features and I want to predict 6 of them. Can we do that with this approach or not? And is it advisable to do so, or should I model each feature individually?

Reply
- James Carmichael January 13, 2024 at 7:50 am #
  
  Hi Rohit…You are very welcome! It may be more beneficial to train a model to predict each feature in this case. Let us know how things are working once you build your models.
  
  Reply
Rohit Shorya January 17, 2024 at 6:28 pm #

Here you have called fit_transform on the complete data. Is it okay to do so? It seems we have exposed the test data to the scaler. In some of your other blogs, only the train data was fit_transformed, and the test data was only transformed, not fit.

Reply
- Mark January 31, 2024 at 4:58 pm #
  
  Same doubt. I think only the train part should be fit_transformed and the test part should only be transformed. Please clarify this.
  
  Reply
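The usual way to avoid this kind of leakage is to fit the scaler on the training rows only and reuse its fitted minimum and maximum for the test rows. A minimal sketch with scikit-learn's `MinMaxScaler` follows; the toy array and the split point `n_train` are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
    [4.0, 400.0],
    [5.0, 500.0],
    [6.0, 900.0],
])

# Chronological split first, then scale.
n_train = 4
train, test = values[:n_train], values[n_train:]

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)  # learns min/max from train only
test_scaled = scaler.transform(test)        # reuses the train min/max

# Scaled test values may fall outside [0, 1]; that is expected.
print(test_scaled)

# Predictions can be mapped back to original units with the same scaler.
restored = scaler.inverse_transform(test_scaled)
```

If the data is reframed into lagged columns before scaling, the same rule applies: split by time, fit on the training portion, and only transform the rest.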
polad February 18, 2024 at 8:49 am #

is it possible to reframe it first and then scale it?

Reply
- James Carmichael February 18, 2024 at 10:42 am #
  
  Hi Polad…The following resource may be of interest to you:
  
  https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/
  
  Reply
shadow_x March 4, 2024 at 6:02 pm #

Hello Dr. Jason, are there any examples of using LSTM to implement multi-step forecasting for multiple time steps?

Reply
- James Carmichael March 5, 2024 at 10:35 am #
  
  Hi shadow_x…The following resource is a great place to start:
  
  https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/
  
  Reply
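For multi-step forecasting, the data framing changes so that each sample's label is a vector of the next several target values. The sketch below shows only that framing step with NumPy; `multi_step_windows` and the toy data are hypothetical. One common option (not shown) is to end the network with a dense layer that has `n_out` units so it emits all steps at once.

```python
import numpy as np

def multi_step_windows(data, n_in, n_out, target_col=0):
    """X: (samples, n_in, n_features) of past steps.
    y: (samples, n_out) holding the next n_out values of the target column."""
    X, y = [], []
    for t in range(n_in, len(data) - n_out + 1):
        X.append(data[t - n_in:t])
        y.append(data[t:t + n_out, target_col])
    return np.array(X), np.array(y)

# Ten time steps of two variables.
data = np.arange(20, dtype=float).reshape(10, 2)

X, y = multi_step_windows(data, n_in=3, n_out=2)
print(X.shape, y.shape)
print(y[0])  # target values at t and t+1 for the first sample
```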
Rohit March 7, 2024 at 5:47 pm #

Please tell me: is it okay to fit_transform all the values, including both train and test?
# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)

Reply
kt April 11, 2024 at 2:17 am #

hi Jason,

I tried to run the following code: model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
and was met with the following error: AttributeError: module 'keras.src.backend' has no attribute 'Variable'

I have keras 3.2.0 and tensorflow 2.16.1 installed, and my keras.json looks like this:
{
"floatx": "float32",
"epsilon": 1e-7,
"backend": "tensorflow",
"image_data_format": "channels_last"
}

I tried searching online for ways to debug the above attribute error, and was only able to find 1 similar post on Stack Overflow about this https://stackoverflow.com/questions/78173150/attributeerror-module-keras-src-backend-has-no-attribute-variable-with-drop; it seems like I need to change my backend configuration to ‘jax’. However, to my understanding the backend config used here is tensorflow? is there any other way I can curb this attribute error?

thank you.

Reply
- James Carmichael April 11, 2024 at 7:51 am #
  
  Hi Kt…The error you’re encountering seems to suggest an unusual problem with your TensorFlow and Keras setup. Typically, the TensorFlow Keras API (tensorflow.keras) should not be attempting to call anything under keras.src.backend. This kind of error might indicate that there’s an issue with mixed imports between standalone Keras and TensorFlow’s integrated Keras.
  
  Reply
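When a mixed install is suspected, a first step is to confirm which versions are actually present in the active environment. The sketch below uses only the standard library and does not import either package; `installed_versions` is a hypothetical helper, and it reports versions only, it does not diagnose the error itself.

```python
from importlib import metadata

def installed_versions(packages=("tensorflow", "keras")):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

print(installed_versions())
```

After checking the versions, it usually helps to use one import style consistently throughout a script (either the `tensorflow.keras` modules or the standalone `keras` modules) and to restart the interpreter or notebook kernel after changing packages.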
kt April 12, 2024 at 12:48 am #

hi James, does this mean that I need to uninstall the keras package? and I only need to have the tensorflow package installed? I tried to uninstall keras but am still facing the same error..

Reply
New_Learner April 12, 2024 at 1:02 am #

Hello Jason and James,

Firstly, thanks for such a great tutorial!

I have a different type of question. I would like to work on multivariate time series forecasting, however, I have two different data frames and two different timeframes. For example, the first data frame contains 4 features, with 2880 values for each feature, and it has values hourly. The second data frame contains 1 feature for 720 values, and it is daily data. My aim is to use both of these data frames to make a prediction of one of the features of the first data frame by using LSTM. How can I do that? If you could help me, I would appreciate it.

Best 🙂

Reply
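One way to combine an hourly frame with a daily one is to align the daily values onto the hourly index, repeating each day's value for every hour of that day, and then join the columns. The sketch below assumes both frames have datetime indexes covering the same period; the column names and toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Three days of hourly data and the matching three daily values.
hourly_index = pd.date_range("2024-01-01", periods=72, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame({"f1": np.arange(72, dtype=float)}, index=hourly_index)

daily_index = pd.date_range("2024-01-01", periods=3, freq=pd.Timedelta(days=1))
daily = pd.DataFrame({"daily_feature": [10.0, 20.0, 30.0]}, index=daily_index)

# Forward-fill each daily value across the hours of that day.
daily_on_hourly = daily.reindex(hourly.index, method="ffill")
combined = hourly.join(daily_on_hourly)

print(combined.iloc[[0, 23, 24, 71]])
```

If the daily value is only known at the end of the day, shift the daily frame by one day before aligning so that each hour sees only information that was available at that time. The combined frame can then be scaled and reframed like any other multivariate dataset.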
kt April 12, 2024 at 1:03 am #

oh nevermind, it works for me now, thanks so much for your help Jason 🙂

Reply
