Feature Selection for Time Series Forecasting with Python

By Jason Brownlee on September 16, 2020 in Time Series

The use of machine learning methods on time series data requires feature engineering.

A univariate time series dataset consists only of a sequence of observations. These must be transformed into input and output features in order to use supervised learning algorithms.

The problem is that there is little limit to the type and number of features you can engineer for a time series problem. Classical time series analysis tools like the correlogram can help with evaluating lag variables, but do not directly help when selecting other types of features, such as those derived from the timestamps (year, month or day) and moving statistics, like a moving average.
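As a rough illustration of the feature types mentioned above, the sketch below builds timestamp, lag, and moving-average features with Pandas. The series is synthetic, and the column names (lag_1, rolling_mean_3) are assumptions chosen for illustration rather than anything from the tutorial dataset.

```python
import numpy as np
import pandas as pd

# synthetic monthly series standing in for a real dataset (assumed values)
index = pd.date_range("2000-01-01", periods=24, freq="MS")
series = pd.Series(np.arange(24, dtype=float), index=index, name="value")

features = pd.DataFrame(index=series.index)
# features derived from the timestamp
features["year"] = series.index.year
features["month"] = series.index.month
# lag feature: the previous observation
features["lag_1"] = series.shift(1)
# moving average of the previous 3 observations; shifting first keeps the
# current value (the prediction target) out of the window
features["rolling_mean_3"] = series.shift(1).rolling(window=3).mean()
features["target"] = series

print(features.head(6))
```

Shifting before taking the rolling mean is a deliberate choice here: it means every input feature in a row is built only from observations that came before the target in that row.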

In this tutorial, you will discover how you can use the machine learning tools of feature importance and feature selection when working with time series data.

After completing this tutorial, you will know:

How to create and interpret a correlogram of lagged observations.
How to calculate and interpret feature importance scores for time series features.
How to perform feature selection on time series input variables.


Let’s get started.

Updated Apr/2019: Updated the link to dataset.
Updated Jun/2019: Fixed indenting.
Updated Aug/2019: Updated data loading to use new API.
Updated Sep/2020: Updated code to match changes to the API.

Tutorial Overview

This tutorial is broken down into the following 6 steps:

Monthly Car Sales Dataset: describes the dataset we will be working with.
Make Stationary: describes how to make the dataset stationary for analysis and forecasting.
Autocorrelation Plot: describes how to create a correlogram of the time series data.
Time Series to Supervised Learning: describes how to reframe the series as lagged input features and an output variable.
Feature Importance of Lag Variables: describes how to calculate and review feature importance scores for time series data.
Feature Selection of Lag Variables: describes how to calculate and review feature selection results for time series data.

Let’s start off by looking at a standard time series dataset.

Monthly Car Sales Dataset

In this tutorial, we will use the Monthly Car Sales dataset.

This dataset describes the number of car sales in Quebec, Canada between 1960 and 1968.

The units are a count of the number of sales and there are 108 observations. The source data is credited to Abraham and Ledolter (1983).

Download the dataset

Download the dataset and save it into your current working directory with the filename “car-sales.csv”. Note, you may need to delete the footer information from the file.

The code below loads the dataset as a Pandas Series object.

# line plot of time series
from pandas import read_csv
from matplotlib import pyplot
# load dataset, parse the Month column as dates and reduce the single column to a Series
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# display first few rows
print(series.head(5))
# line plot of dataset
series.plot()
pyplot.show()

Running the example prints the first 5 rows of data.

Month
1960-01-01 6550
1960-02-01 8728
1960-03-01 12026
1960-04-01 14395
1960-05-01 14587
Name: Sales, dtype: int64

A line plot of the data is also provided.

Monthly Car Sales Dataset Line Plot

Make Stationary

We can see a clear seasonality and increasing trend in the data.

The trend and seasonality are fixed components that can be added to any prediction we make. They are useful, but need to be removed in order to explore any other systematic signals that can help make predictions.

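A time series whose statistical properties, such as its mean and variance, do not change over time is called stationary. Removing the trend and seasonality moves the series toward this.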

To remove the seasonality, we can take the seasonal difference, resulting in a so-called seasonally adjusted time series.

The period of the seasonality appears to be one year (12 months). The code below calculates the seasonally adjusted time series and saves it to the file “seasonally_adjusted.csv”.

# seasonally adjust the time series
from pandas import read_csv
from matplotlib import pyplot
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0)
# seasonal difference
differenced = series.diff(12)
# trim off the first year of empty data
differenced = differenced[12:]
# save differenced dataset to file
differenced.to_csv('seasonally_adjusted.csv', index=False)
# plot differenced dataset
differenced.plot()
pyplot.show()

Because the first 12 months of data have no prior data to be differenced against, they must be discarded.
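To make the mechanics concrete, here is a small sketch of seasonal differencing and its inverse on synthetic data. The series (a linear trend plus a 12-month sine cycle) is an assumption made up for illustration; it is not the car sales data.

```python
import numpy as np
import pandas as pd

# synthetic monthly data with a linear trend plus a yearly cycle (assumed)
index = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
series = pd.Series(100 + 2.0 * t + 10 * np.sin(2 * np.pi * t / 12), index=index)

period = 12
# seasonal difference: value(t) - value(t - 12); the first 12 rows are NaN
differenced = series.diff(period).dropna()

# invert: add back the observation from one season earlier
restored = differenced + series.shift(period).dropna()

print(len(series), len(differenced))
print(differenced.round(6).unique())
```

In this toy case the yearly cycle cancels and the linear trend collapses to a constant (12 months times a slope of 2), which is why the differenced values are all the same. Real data will leave residual structure behind, and a forecast made on the differenced scale can be mapped back by adding the observation from 12 months earlier.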

The stationary data is stored in “seasonally_adjusted.csv”. A line plot of the differenced data is created.

Seasonally Differenced Monthly Car Sales Dataset Line Plot

The plot suggests that the seasonality and trend information was removed by differencing.

Autocorrelation Plot

Traditionally, time series features are selected based on their correlation with the output variable.

For lagged observations this correlation is called autocorrelation, and it is usually reviewed with an autocorrelation plot, also called a correlogram. The plot shows the correlation of each lagged observation with the current observation and whether or not that correlation is statistically significant.

For example, the code below plots the correlogram for all lag variables in the seasonally adjusted Monthly Car Sales dataset.

from pandas import read_csv
from statsmodels.graphics.tsaplots import plot_acf
from matplotlib import pyplot
series = read_csv('seasonally_adjusted.csv', header=0)
plot_acf(series)
pyplot.show()

Running the example creates a correlogram, or Autocorrelation Function (ACF) plot, of the data.

The plot shows lag values along the x-axis and correlation on the y-axis between -1 and 1 for negatively and positively correlated lags respectively.

The dots that fall outside the blue shaded area indicate statistically significant correlations. The correlation of 1 for the lag value of 0 reflects the perfect positive correlation of an observation with itself.
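For intuition about what the plot is reporting, the sketch below computes the sample autocorrelation by hand with NumPy and compares it to a simple significance bound. The autocorrelation() helper and the synthetic series are assumptions for illustration, and the bound of 1.96 / sqrt(n) is a common large-sample approximation; the shaded band drawn by plot_acf may be calculated differently.

```python
import numpy as np

def autocorrelation(values, lag):
    """Sample autocorrelation at a given lag (the usual ACF estimator)."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    if lag == 0:
        return 1.0
    return float(np.sum(x[lag:] * x[:-lag]) / np.sum(x * x))

rng = np.random.default_rng(1)
n = 200
noise = rng.normal(size=n)
# synthetic series: each value is 0.8 times the previous one plus noise
series = np.zeros(n)
for i in range(1, n):
    series[i] = 0.8 * series[i - 1] + noise[i]

# approximate 95% bound for the autocorrelation of pure noise
bound = 1.96 / np.sqrt(n)
for lag in range(0, 6):
    r = autocorrelation(series, lag)
    flag = "significant" if abs(r) > bound else "not significant"
    print(f"lag {lag}: r={r:.3f} ({flag})")
```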

The plot shows significant lag values at 1, 2, 12, and 17 months.

Correlogram of the Monthly Car Sales Dataset

This analysis provides a good baseline for comparison.

Time Series to Supervised Learning

We can convert the univariate Monthly Car Sales dataset into a supervised learning problem by taking lagged observations (e.g. t-1) as inputs and using the current observation (t) as the output variable.

We can do this in Pandas using the shift() function to create new columns of shifted observations.

The example below creates a new dataset with 12 months of lag values as inputs to predict the current observation.

The shift of 12 months means that the first 12 rows of data are unusable as they contain NaN values.

from pandas import read_csv
from pandas import DataFrame
# load dataset
series = read_csv('seasonally_adjusted.csv', header=0)
# reframe as supervised learning
dataframe = DataFrame()
for i in range(12,0,-1):
	dataframe['t-'+str(i)] = series.shift(i).values[:,0]
dataframe['t'] = series.values[:,0]
print(dataframe.head(13))
# drop the first 12 rows, which contain NaN values
dataframe = dataframe[12:]
# save to new file
dataframe.to_csv('lags_12months_features.csv', index=False)

Running the example prints the first 13 rows of data showing the unusable first 12 rows and the usable 13th row.

             t-12   t-11   t-10    t-9     t-8     t-7     t-6     t-5  \
1961-01-01    NaN    NaN    NaN    NaN     NaN     NaN     NaN     NaN
1961-02-01    NaN    NaN    NaN    NaN     NaN     NaN     NaN     NaN
1961-03-01    NaN    NaN    NaN    NaN     NaN     NaN     NaN     NaN
1961-04-01    NaN    NaN    NaN    NaN     NaN     NaN     NaN     NaN
1961-05-01    NaN    NaN    NaN    NaN     NaN     NaN     NaN     NaN
1961-06-01    NaN    NaN    NaN    NaN     NaN     NaN     NaN   687.0
1961-07-01    NaN    NaN    NaN    NaN     NaN     NaN   687.0   646.0
1961-08-01    NaN    NaN    NaN    NaN     NaN   687.0   646.0  -189.0
1961-09-01    NaN    NaN    NaN    NaN   687.0   646.0  -189.0  -611.0
1961-10-01    NaN    NaN    NaN  687.0   646.0  -189.0  -611.0  1339.0
1961-11-01    NaN    NaN  687.0  646.0  -189.0  -611.0  1339.0    30.0
1961-12-01    NaN  687.0  646.0 -189.0  -611.0  1339.0    30.0  1645.0
1962-01-01  687.0  646.0 -189.0 -611.0  1339.0    30.0  1645.0  -276.0

               t-4     t-3     t-2     t-1       t
1961-01-01     NaN     NaN     NaN     NaN   687.0
1961-02-01     NaN     NaN     NaN   687.0   646.0
1961-03-01     NaN     NaN   687.0   646.0  -189.0
1961-04-01     NaN   687.0   646.0  -189.0  -611.0
1961-05-01   687.0   646.0  -189.0  -611.0  1339.0
1961-06-01   646.0  -189.0  -611.0  1339.0    30.0
1961-07-01  -189.0  -611.0  1339.0    30.0  1645.0
1961-08-01  -611.0  1339.0    30.0  1645.0  -276.0
1961-09-01  1339.0    30.0  1645.0  -276.0   561.0
1961-10-01    30.0  1645.0  -276.0   561.0   470.0
1961-11-01  1645.0  -276.0   561.0   470.0  3395.0
1961-12-01  -276.0   561.0   470.0  3395.0   360.0
1962-01-01   561.0   470.0  3395.0   360.0  3440.0

The first 12 rows are removed from the new dataset and results are saved in the file “lags_12months_features.csv“.

This process can be repeated with an arbitrary number of time steps, such as 6 months or 24 months, and I would recommend experimenting.
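If you want to experiment with different numbers of lags, it can help to wrap the reframing in a small function. The sketch below is one way to do that; make_lag_features() is a hypothetical helper written for this illustration, and it is shown on a toy sequence rather than the car sales data.

```python
import pandas as pd

def make_lag_features(values, n_lags):
    """Return a frame with columns t-n_lags ... t-1 as inputs and t as output."""
    series = pd.Series(values, dtype=float).reset_index(drop=True)
    columns = {f"t-{i}": series.shift(i) for i in range(n_lags, 0, -1)}
    columns["t"] = series
    frame = pd.DataFrame(columns)
    # rows with any NaN lag (the first n_lags rows) cannot be used for training
    return frame.dropna().reset_index(drop=True)

data = list(range(10))
for n_lags in (3, 6):
    frame = make_lag_features(data, n_lags)
    print(n_lags, frame.shape)
```

Each extra lag costs one row of training data, so on a short series there is a trade-off between how far back the model can look and how many examples are left to learn from.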

Feature Importance of Lag Variables

Ensembles of decision trees, like bagged trees, random forest, and extra trees, can be used to calculate a feature importance score.

This is common in machine learning to estimate the relative usefulness of input features when developing predictive models.

We can use feature importance to help estimate the relative importance of contrived input features for time series forecasting.

This is important because we can contrive not only the lag observation features above, but also features based on the timestamp of observations, rolling statistics, and much more. Feature importance is one method to help sort out what might be more useful when modeling.

The example below loads the supervised learning view of the dataset created in the previous section, fits a random forest model (RandomForestRegressor), and summarizes the relative feature importance scores for each of the 12 lag observations.

A large-ish number of trees is used to ensure the scores are somewhat stable. Additionally, the random number seed is initialized to ensure that the same result is achieved each time the code is run.

from pandas import read_csv
from sklearn.ensemble import RandomForestRegressor
from matplotlib import pyplot
# load data
dataframe = read_csv('lags_12months_features.csv', header=0)
array = dataframe.values
# split into input and output
X = array[:,0:-1]
y = array[:,-1]
# fit random forest model
model = RandomForestRegressor(n_estimators=500, random_state=1)
model.fit(X, y)
# show importance scores
print(model.feature_importances_)
# plot importance scores
names = dataframe.columns.values[0:-1]
ticks = [i for i in range(len(names))]
pyplot.bar(ticks, model.feature_importances_)
pyplot.xticks(ticks, names)
pyplot.show()

Running the example first prints the importance scores of the lagged observations.

[ 0.21642244  0.06271259  0.05662302  0.05543768  0.07155573  0.08478599
  0.07699371  0.05366735  0.1033234   0.04897883  0.1066669   0.06283236]

The scores are then plotted as a bar graph.

The plot shows the high relative importance of the observation at t-12 and, to a lesser degree, the importance of observations at t-2 and t-4.

It is interesting to note a difference with the outcome from the correlogram above.

Bar Graph of Feature Importance Scores on the Monthly Car Sales Dataset

This process can be repeated with different methods that can calculate importance scores, such as gradient boosting, extra trees, and bagged decision trees.
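As a sketch of what that comparison might look like, the example below fits three tree ensembles from scikit-learn and reports the top-scoring feature for each. The data is a synthetic stand-in for a lag matrix (random inputs where only two columns drive the output), so it is an assumption for illustration and the scores say nothing about the car sales dataset.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)

rng = np.random.default_rng(1)
n_rows, n_lags = 300, 6
# synthetic stand-in for a lag matrix: columns are t-6 ... t-1 (assumed data)
X = rng.normal(size=(n_rows, n_lags))
names = [f"t-{i}" for i in range(n_lags, 0, -1)]
# the output depends strongly on t-6 (column 0) and weakly on t-2 (column 4)
y = 3.0 * X[:, 0] + 1.0 * X[:, 4] + rng.normal(scale=0.1, size=n_rows)

models = {
    "random forest": RandomForestRegressor(n_estimators=100, random_state=1),
    "extra trees": ExtraTreesRegressor(n_estimators=100, random_state=1),
    "gradient boosting": GradientBoostingRegressor(random_state=1),
}
top_features = {}
for label, model in models.items():
    model.fit(X, y)
    scores = model.feature_importances_
    top_features[label] = names[int(np.argmax(scores))]
    print(label, dict(zip(names, np.round(scores, 3))))
```

When several methods agree on the leading features, that is some reassurance that the ranking is not an artifact of one algorithm. Where they disagree, it is worth testing both candidate feature sets in an actual forecasting model.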

Feature Selection of Lag Variables

We can also use feature selection to automatically identify and select those input features that are most predictive.

A popular method for feature selection is called Recursive Feature Elimination (RFE).

RFE works by creating predictive models, weighting features, and pruning those with the smallest weights, then repeating the process until a desired number of features are left.
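The loop described above can be sketched by hand in a few lines. This is a simplified illustration that removes one feature per round using random forest importance scores; recursive_elimination() is a hypothetical helper, the data is synthetic, and it is not the scikit-learn implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def recursive_elimination(X, y, names, n_keep):
    """Drop the least important feature one at a time until n_keep remain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        model = RandomForestRegressor(n_estimators=50, random_state=1)
        model.fit(X[:, remaining], y)
        weakest = int(np.argmin(model.feature_importances_))
        # remove the lowest-scoring feature, then refit on what is left
        del remaining[weakest]
    return [names[i] for i in remaining]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
names = ["t-5", "t-4", "t-3", "t-2", "t-1"]
# only t-5 (column 0) and t-2 (column 3) drive the output in this toy data
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

selected = recursive_elimination(X, y, names, n_keep=2)
print(selected)
```

Refitting after every removal is what distinguishes this from simply taking the top scores from a single model: the importance of the remaining features is re-estimated each time a feature is dropped.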

The example below uses RFE with a random forest predictive model and sets the desired number of input features to 4.

from pandas import read_csv
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor
from matplotlib import pyplot
# load dataset
dataframe = read_csv('lags_12months_features.csv', header=0)
# separate into input and output variables
array = dataframe.values
X = array[:,0:-1]
y = array[:,-1]
# perform feature selection
rfe = RFE(RandomForestRegressor(n_estimators=500, random_state=1), n_features_to_select=4)
fit = rfe.fit(X, y)
# report selected features
print('Selected Features:')
names = dataframe.columns.values[0:-1]
for i in range(len(fit.support_)):
	if fit.support_[i]:
		print(names[i])
# plot feature rank
names = dataframe.columns.values[0:-1]
ticks = [i for i in range(len(names))]
pyplot.bar(ticks, fit.ranking_)
pyplot.xticks(ticks, names)
pyplot.show()

Running the example prints the names of the 4 selected features.

Unsurprisingly, the results largely match the features that showed a high importance in the previous section.

Selected Features:
t-12
t-6
t-4
t-2

A bar graph is also created showing the feature selection rank (smaller is better) for each input feature.
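To see how the rank relates to the selected features, the sketch below runs RFE on a small synthetic problem and prints both. It swaps in LinearRegression to keep the example fast, and the data is made up for illustration; in scikit-learn's RFE the selected features all receive rank 1, and features eliminated earlier get larger ranks.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
names = np.array([f"t-{i}" for i in range(6, 0, -1)])
# only t-6 (column 0) and t-1 (column 5) drive the output in this toy data
y = 4.0 * X[:, 0] + 2.0 * X[:, 5] + rng.normal(scale=0.1, size=200)

rfe = RFE(LinearRegression(), n_features_to_select=2)
rfe.fit(X, y)

print("selected:", list(names[rfe.support_]))
# list the features from best rank to worst
for name, rank in sorted(zip(names, rfe.ranking_), key=lambda pair: pair[1]):
    print(name, rank)
```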

Bar Graph of Feature Selection Rank on the Monthly Car Sales Dataset

This process can be repeated with a different number of features to select, such as more than 4, and with models other than random forest.
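One way to avoid fixing the number of features by hand is to let cross-validation choose it. The sketch below uses scikit-learn's RFECV with a TimeSeriesSplit so that each fold is evaluated on rows that come after its training rows. The linear model, the scoring metric, and the synthetic data are all assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(1)
X = rng.normal(size=(240, 6))
names = np.array([f"t-{i}" for i in range(6, 0, -1)])
# only t-6 (column 0) and t-1 (column 5) drive the output in this toy data
y = 4.0 * X[:, 0] + 2.0 * X[:, 5] + rng.normal(scale=0.5, size=240)

selector = RFECV(
    LinearRegression(),
    step=1,
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_mean_squared_error",
)
selector.fit(X, y)

print("number of features chosen:", selector.n_features_)
print("selected:", list(names[selector.support_]))
```

The chosen subset may include a few uninformative features if they happen to improve the cross-validated score slightly, so it is worth treating the result as a suggestion to test rather than a final answer.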

Summary

In this tutorial, you discovered how to use the tools of applied machine learning to help select features from time series data when forecasting.

Specifically, you learned:

How to interpret a correlogram for highly correlated lagged observations.
How to calculate and review feature importance scores in time series data.
How to use feature selection to identify the most relevant input variables in time series data.

Do you have any questions about feature selection with time series data?
Ask your questions in the comments and I will do my best to answer.

107 Responses to Feature Selection for Time Series Forecasting with Python

Andrewcz March 29, 2017 at 5:33 pm #

Hi Jason, big fan! I was wondering if you are going to do a series on multivariate array time series forecasting.
Many thanks,
Best,
Andrew

- Jason Brownlee March 30, 2017 at 8:48 am #
  
  Yes, I hope to cover this soon Andrew.
  
  - Tiwalade Usman September 3, 2021 at 9:50 pm #
    
    Please have you done this? Feature Selection for multivariate Time Series Forecasting
    
    - Jason Brownlee September 4, 2021 at 5:21 am #
      
      I don’t have a tutorial on this topic, sorry.
      
Benson Dube April 2, 2017 at 6:13 am #

Hello Jason,

Many thanks for this blog. I will be so Interested to see how the multivariate Time Series Forecast is dealt with.

Keep up the good works,

Best Regards

Ben

- Jason Brownlee April 2, 2017 at 6:33 am #
  
  Thanks Ben, I hope to cover multivariate time series soon.
  
  - Emmanuel June 30, 2019 at 11:11 pm #
    
    Hello, great work, Looking Forward to this…do you have an estimate of how soon
    
    - Jason Brownlee July 1, 2019 at 6:34 am #
      
      Yes, I have many examples here:
      https://machinelearningmastery.com/start-here/#deep_learning_time_series
      
Kélian April 13, 2017 at 2:05 am #

Hello Jason,

I wondered about your choice to keep only the last 12 lags for the feature importance and feature selection study.

Because I understand the correlogram showed you should push the study to lag 17 (the correlogram showed that lags 1, 2, 12, and 17 are correlated with the current state).

Am I right?

Thanks for your work!

- Jason Brownlee April 13, 2017 at 10:07 am #
  
  Yes, I kept it short for brevity.
  
Mehrdad May 26, 2017 at 5:18 am #

The output of these lines
‘plot_acf(series)’
‘pyplot.show()’
is not like yours. It just shows a straight line.
Could you please check it?
Thanks

- Merlin June 1, 2017 at 8:58 pm #
  
  Yeah, the plot_acf thing is not working properly.
  
  - Jason Brownlee June 2, 2017 at 12:57 pm #
    
    What problem do you see exactly?
    
    What version of statsmodels are you using?
    
- Jason Brownlee June 2, 2017 at 11:50 am #
  
  I can confirm the example, please check that you have all of the code and the same source data.
  
  - porter October 27, 2017 at 4:11 am #
    
    I had a similar issue. It is due to the footer if you do not delete in the data set or drop the last row in the series after import.
    
Ralph Li June 30, 2017 at 6:09 pm #

Hello Jason!

Can you recommend some references about recursive feature selection and random forest on feature selection for time series?

Thanks!

- Jason Brownlee July 1, 2017 at 6:29 am #
  
  No. My best advice: try it, get results and use them in developing better models.
  
Saurav Sharma July 27, 2017 at 2:38 am #

Hi Jason!

I am still unable to understand the importance of lag variable?

Is lag applied to a feature variable to find correlation with the target variable?

Thanks!

- Jason Brownlee July 27, 2017 at 8:11 am #
  
  A lag is a past observation, an observation at a prior time step.
  
  We can use these as input features to learning models. So abstractly we can predict today based on what happened yesterday.
  
  Yesterday’s ob is a lag variable.
  
  Does that help?
  
Mert August 26, 2017 at 6:43 pm #

Dear Jason,
I am trying to run your code above with X size of (358,168) and test y (358,24), and having error “ValueError: bad input shape (358, 24)”. I would like to find the most relevant 12 features from 168 features in X(358,168) depending on 24 output of y(358,24)

My y matrix has 24 output instead of 1. What might be the reason for the error?

X = array[:,0:168]
y = array[:,168:192]
rfe = RFE(RandomForestRegressor(n_estimators=500, random_state=1), 12)
fit = rfe.fit(X, y)

- Jason Brownlee August 27, 2017 at 5:48 am #
  
  That might be too many output variables, most algorithms expect a single output variable in sklearn.
  
  I can’t think of any that support multiple, but I could be wrong.
  
  You might like to explore a neural network model instead?
  
  - Mert August 28, 2017 at 10:49 am #
    
    Thanks for your comment Jason.
    Actually, what I would like to do is determining the most relevant feature with RFE, then training a neural network model with this features. Do you think it is a reasonable approach?
    For the multiple output error, I will run RFE for each output instead of 24 one by one.
    
    - Jason Brownlee August 29, 2017 at 5:00 pm #
      
      You could try it and it would make sense if there is one highly predictive feature, but I would encourage you to test many configurations.
      
Orry October 9, 2017 at 9:59 pm #

Thanks for the great tutorial.

I was wondering if you could explain the logic of why ACF might show some lags as statistically significant, while feature selection might show totally different lags as having predictive power.

- Jason Brownlee October 10, 2017 at 7:44 am #
  
  Different methods operate under different assumptions and, in turn, produce differing results. This is to be expected.
  
lingxiao November 15, 2017 at 1:10 pm #

hello Jason,

Thank you for the post loved it!

I’m a bit confused about the following:

“This is important because we can contrive not only the lag observation features above, but also features based on the timestamp of observations, rolling statistics, and much more.”

Would it make sense for me to add “month” to the set of features(“X”) if I have removed the seasonality from the time series already? Also, about the “much more” part, does stationarity still mean anything if I add extra features to “X”?

If it is not a problem, why do we require the data to be stationary in the first place?
If it is a problem, how do we make sure that the data is still stationary after we add extra features to “X”?

- Jason Brownlee November 16, 2017 at 10:25 am #
  
  Yes, but you can also explore non-linear methods that offer more flexibility when it comes to stationarity requirements.
  
Ali November 17, 2017 at 8:16 pm #

Great tutorial! I have moderate experience with time series data. I am into detecting the most important features for a time series financial data for a binary classification task. And I have about 400 features (many of them highly correlated after I make the data stationary). How could I apply the method you show above? Getting the let’s say 10 previous days for each feature? Or do you have other suggestions?

Thanks in advance!

- Jason Brownlee November 18, 2017 at 10:15 am #
  
  I would recommend exploring a suite of approaches and see what features result in the best model skill.
  
Francisco January 11, 2018 at 2:45 am #

Hi Jason,

This is great! How would you go about feature selection for time series using LSTM/keras. In that case, there won’t be a need to deconstruct the time series into the different lag variables from t to t-12.

I’m currently working on a time series problem with multiple predictors. I need to know which predictors are important. Is the process the same as what you would do here or can I use a randomforest’s importance feature?

Thanks!

- Jason Brownlee January 11, 2018 at 5:53 am #
  
  Good question.
  
  There may be specialised methods, but I’m not across them right now – perhaps do a little research.
  
  I’d suggest grid searching models across different subsets of features to see what is important/results in better model skill. Basically an RFE approach.
  
MLbeginner96 March 25, 2018 at 12:43 am #

Hi Jason,

I’m assuming we can extend this feature importance and selection beyond lag variables:
– “temporal/seasonal features” such as hour of day, month of year, etc.
– external variables that depend on the problem
– rolling features such as min, max and mean of value of temperature in this case over past n days for example

Essentially, with the features you provided in the link below, we can then perform feature importance and selection. Would you agree?

https://machinelearningmastery.com/basic-feature-engineering-time-series-data-python/

- Jason Brownlee March 25, 2018 at 6:32 am #
  
  Sure. I don’t have a lot of material on multivariate time series though, I hope to cover it more in the future.
  
  - MLbeginner96 April 3, 2018 at 9:37 pm #
    
    Am i right in saying the process of feature selection/importance/etc occurs AFTER fitting the model to the training data?
    
    - Jason Brownlee April 4, 2018 at 6:12 am #
      
      Features should be chosen prior to fitting a model.
      
      Note though that the process of working through is iterative. Lots of looping back to prior steps.
      
otw June 14, 2018 at 5:38 am #

the observations in your training data are not iid. Do you think it is ok for your model?

- Jason Brownlee June 14, 2018 at 6:13 am #
  
  Making the series stationary removes the time dependence.
  
Leonildo August 21, 2018 at 8:25 am #

RandomForestRegressor does bootstrap. Would not this be data leakage considering that the example is a time series?

- Jason Brownlee August 21, 2018 at 2:15 pm #
  
  How so?
  
  - Leonildo August 22, 2018 at 12:34 am #
    
    Here’s a better explanation:
    
    https://stats.stackexchange.com/questions/25706/how-do-you-do-bootstrapping-with-time-series-data
    
Vishal August 31, 2018 at 1:00 am #

Hi, Jason

I am using RandomForest for forecasting rainfall variable. I have around 15 predictors with 50 years data. When I am predicting rainfall values based on the predictors (variables), I am getting very low values as compared to original rainfalls. I mean, I am totally missing extreme values. Please suggest.

Regards,
Vishu

- Jason Brownlee August 31, 2018 at 8:14 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/machine-learning-performance-improvement-cheat-sheet/
  
zb September 4, 2018 at 5:31 pm #

Hi Jason,

Thanks for the blog. I learned a lot thanks to you.

I’m looking for a method of selecting variables for time series like the RFE. But after reading this new post (https://machinelearningmastery.com/how-to-predict-whether-eyes-are-open-or-closed-using-brain-waves/), I have doubts about whether it is possible to apply a method that uses bootstrap.

I think that when using RFE, the evaluation of the models does not respect the temporal ordering of the observations, as it happens in your post about how to predict whether eyes are open or closed and that uses the future information for the selection of variables. What do you think? Thanks!!

Regards

- Jason Brownlee September 5, 2018 at 6:29 am #
  
  It is a challenge. You could try classical feature selection methods, like RFE and correlation, knowing there is bias, then build models from the suggestions and compare the performance to using all features.
  
Hamza September 15, 2018 at 5:06 am #

Hi Jason,
Many thanks for this blog.

I use Simple Linear Regression in Sklearn.
I have this error (could not convert to float: ‘(TOP (S (S (NP *’)

I think it’s necessary to encode categorical data!

But my dataset is for natural language processing (data from conll-2012).
Should I use another algorithm that accepts string variables, or is there another solution?

- Jason Brownlee September 15, 2018 at 6:19 am #
  
  I explain how to work with text data here:
  https://machinelearningmastery.com/start-here/#nlp
  
  - Hamza September 15, 2018 at 9:32 am #
    
    Thank you 🙂
    
Hossein October 18, 2018 at 4:35 am #

Hi Jason,

Thank you for your great tutorials.

Unfortunately, I got a problem running the code. The result of the code on my computer is exactly the same as yours till Autocorrelation Plot. At Autocorrelation Plot, my result just shows a straight line at zero.
The next, there is an error as follows.

runfile(‘C:/Users/Hossein/.spyder-py3/temp.py’, wdir=’C:/Users/Hossein/.spyder-py3′)
Traceback (most recent call last):

File “C:\Users\Hossein\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py”, line 2862, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)

File “”, line 1, in
runfile(‘C:/Users/Hossein/.spyder-py3/temp.py’, wdir=’C:/Users/Hossein/.spyder-py3′)

File “C:\Users\Hossein\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 705, in runfile
execfile(filename, namespace)

File “C:\Users\Hossein\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 102, in execfile
exec(compile(f.read(), filename, ‘exec’), namespace)

File “C:/Users/Hossein/.spyder-py3/temp.py”, line 48
dataframe[‘t-‘+str(i)] = series.shift(i)
^
IndentationError: expected an indented block

- Jason Brownlee October 18, 2018 at 6:38 am #
  
  Looks like you did not copy the code with the indenting, here’s how to copy code:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-copy-code-from-a-tutorial
  
  Also, I recommend running code from the command line:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-run-a-script-from-the-command-line
  
  Reply
  - Hossein October 18, 2018 at 8:47 am #
    
    Thank you for your prompt response.
    
    Unfortunately, I still have the same problem. I even tried your code on https://repl.it and it showed the same error.
    
    dataframe[‘t-‘+str(i)] = series.shift(i)
    ^
    IndentationError: expected an indented block
    
    Reply
    - Jason Brownlee October 18, 2018 at 2:32 pm #
      
      Perhaps try copy-pasting the code again and indenting it manually in your text editor?
      
      Reply
      - miguel dias June 5, 2019 at 8:47 pm #
        
        Looks to me that when you yourself pasted the code it did not properly indent, because the box doesn’t show any kind of indentation (might also be a problem to do with the website or the browser, do you see any indentation on the code box?).
        
        thanks for the tutorial, good stuff
      - Jason Brownlee June 6, 2019 at 6:27 am #
        
        You’re right, I have added indenting to the example.
        
        Sorry about that.
- Hossein October 18, 2018 at 2:15 pm #
  
  I figured it out, finally.
  
  The autocorrelation plot doesn’t show because there are two “nan” values at the end of the series.
  Add “series=series[1:-2]” after the following line, which reads the data:
  series = Series.from_csv('seasonally_adjusted.csv', header=None)
  
  Another comment, regarding the error in the Time Series to Supervised Learning section:
  
  The line after the “for” statement needs to be indented, as follows:
  
  for i in range(12,0,-1):
      dataframe['t-'+str(i)] = series.shift(i)
  
  Reply
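
The two fixes described in this thread (removing trailing NaN values and indenting the loop body) can be sketched as follows. This is a minimal illustration, not the tutorial's code: the synthetic values stand in for the seasonally adjusted series, and dropna() is used in place of slicing by position, which is an assumption about what is safest when the number of missing values is not known.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the seasonally adjusted series (assumption),
# ending with two NaN values like the ones described above.
values = np.concatenate([np.arange(30, dtype=float), [np.nan, np.nan]])
series = pd.Series(values)

# Drop missing observations instead of slicing by position.
series = series.dropna()

# Build the lag columns; note that the loop body is indented.
dataframe = pd.DataFrame(index=series.index)
for i in range(12, 0, -1):
    dataframe['t-' + str(i)] = series.shift(i)
dataframe['t'] = series

# Shifting introduces NaNs in the first 12 rows, so remove those rows
# before passing the frame to a model.
dataframe = dataframe.dropna()
print(dataframe.shape)
```

Dropping the NaN rows after shifting also avoids the "Input contains NaN" error that scikit-learn raises when a model is fit on incomplete rows.
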
hk November 24, 2018 at 11:54 pm #

The code doesn’t work. I get “Length of values does not match length of index” when creating the dataframe with the shifted columns. I don’t know how you could produce the results with this code.

Reply
- Jason Brownlee November 25, 2018 at 6:56 am #
  
  Sorry to hear that, I have some suggestions for you here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Sabine January 8, 2019 at 9:52 pm #

Hi Jason,
I’m struggling a bit to understand the feature importance and selection results. Specifically, how is it possible for lag t-12 to have such a high impact in predicting the time series after the 12-month seasonality was removed in the earlier differencing step?

Reply
- Jason Brownlee January 9, 2019 at 8:43 am #
  
  Perhaps the seasonal correction did not remove all of the seasonal structure.
  
  Reply
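
One way to check whether seasonal structure survives the correction is to measure the lag-12 autocorrelation of the differenced series. The sketch below uses a synthetic monthly series (an assumption, not the car sales data) whose seasonal swing grows over time; in that situation a single seasonal difference leaves a seasonal pattern behind.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (assumption): linear trend plus a seasonal
# cycle whose amplitude grows over time, plus noise.
rng = np.random.default_rng(1)
t = np.arange(120)
amplitude = 1.0 + 0.05 * t
seasonal = amplitude * 10.0 * np.sin(2 * np.pi * t / 12)
series = pd.Series(0.5 * t + seasonal + rng.normal(0.0, 1.0, size=t.size))

# Seasonal differencing: subtract the value from 12 months earlier.
seasonal_diff = series.diff(12).dropna()

# Correlation between the differenced series and itself 12 steps back.
r12 = seasonal_diff.autocorr(lag=12)
print('lag-12 autocorrelation after seasonal differencing: %.3f' % r12)
```

A value far from zero suggests that t-12 still carries information, which would be consistent with a high importance score for that lag.
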
Rima January 9, 2019 at 9:57 pm #

Hello Jason,
Thanks for the article!
The time series I have is daily data of 4 years and 10 months.
I am actually implementing SARIMAX for my time series data and I am including several exogenous variables.
I actually did the feature selection you explained above on the exogenous variables and also on 10 lags.
I included in my exogenous variables the mean of my time series over the year (so 364 values, where each value represents the mean over 4 years).
The feature selection method above gave 0.9 importance for the mean_values and very low values for the other exogenous variables and lags. On the other hand, the SARIMAX I implemented didn’t improve my RMSE (relative to the RMSE obtained if the predicted value is the mean value).

So, to summarize, my model does not perform any better than the mean. What should I do in your opinion?

Thank you!

Reply
- Jason Brownlee January 10, 2019 at 7:50 am #
  
  I would encourage you to only include exog variables if they lift the skill of the model.
  
  Reply
Yaqian January 28, 2019 at 3:52 pm #

Hi Jason, thanks for the useful article! Is there a tutorial explaining how to select features for multivariate time series forecasting?

Reply
- Jason Brownlee January 29, 2019 at 6:08 am #
  
  I don’t have a tutorial on this topic.
  
  Reply
  - Kaushal February 25, 2019 at 7:24 pm #
    
    Hi Jason, Nice article,
    
    I would like to ask a general question about using time series models like ARIMA or ARIMAX. When we take the first or second difference of the time series to remove trend and seasonality, do we still have to pass the trend or seasonality order to a model like arimax( ts, order=(p,d,q), seasonality=c(P,D,Q) )? Or, if we pass the order (trend and seasonality), do we not have to difference the input time series?
    
    Reply
    - Jason Brownlee February 26, 2019 at 6:16 am #
      
      If you are using an ARIMA or SARIMA model, you can let the model difference the series for you using the appropriate order parameters.
      
      Reply
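
To make the trade-off concrete, the sketch below shows by hand the differencing that the order terms d=1 and D=1 (with a seasonal period of 12) correspond to. It uses pandas only and a synthetic series, both assumptions made for illustration; it does not fit a model. The practical point is to difference once: either manually like this and fit with d=0 and D=0, or pass the raw series and let the model apply the orders.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with trend and seasonality (assumption).
rng = np.random.default_rng(0)
t = np.arange(60)
series = pd.Series(2.0 * t + 5.0 * np.sin(2 * np.pi * t / 12)
                   + rng.normal(size=t.size))

# Manual equivalent of D=1 with period 12 followed by d=1.
seasonal_then_first = series.diff(12).diff(1).dropna()

# The two differencing operations commute, so the order does not matter.
first_then_seasonal = series.diff(1).diff(12).dropna()
print(np.allclose(seasonal_then_first.values, first_then_seasonal.values))

# 13 observations are lost: 12 to the seasonal difference, 1 to the first.
print(len(series), len(seasonal_then_first))
```
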
yoonji April 5, 2019 at 1:27 pm #

Hello. Jason
The same code was copied and executed, but an error appeared. How do I handle it?

my error : Input contains NaN, infinity or a value too large for dtype(‘float32’).

Reply
- Jason Brownlee April 5, 2019 at 2:01 pm #
  
  I’m sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
yoonji April 9, 2019 at 2:51 pm #

Hello, Jason.
I’m getting a lot of help through your blog.
I have a question about feature selection.

What can I do if I want to use the selected lag variables as input variables for an LSTM model?
I am really wondering how I can use the selected features for an LSTM.

I’ll be waiting for the reply.

Thank you!

Reply
- Jason Brownlee April 10, 2019 at 6:08 am #
  
  You can use them, I’m not sure I understand the problem you’re having?
  
  Perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
  - yoonji April 10, 2019 at 12:19 pm #
    
    Thank you for your reply.
    Looking at the above results, t-12, t-6, t-4 and t-2 were selected as variables.
    So, these 4 features should be used as variables in the LSTM model, right?
    Or do you mean that when I make the dataset using the “series_to_supervised” function, I should enter the number (12, 6, 4 or 2) for the “n-in” and “n-out” arguments?
    
    Or do I just input the variables “t-12”, “t-6”, “t-4” and “t-2” to the model?
    
    I don’t understand how I can use the selected features as variables.
    
    Thank you.
    
    Reply
    - Jason Brownlee April 10, 2019 at 1:46 pm #
      
      I see. Generally, we would not select lag obs as the time steps for the LSTM. Instead, we would provide all of the time steps and let the LSTM learn what to use to make good predictions.
      
      Reply
      - yoonji April 10, 2019 at 2:53 pm #
        
        Oh, I get it.
        So… why did we select features?
        What is the purpose of feature selection?
        You mean, if I use an LSTM model, feature selection isn’t necessary?
        
        I really appreciate your advice.
      - Jason Brownlee April 11, 2019 at 6:28 am #
        
        It can be useful for linear models, and when developing static ML models (not LSTM).
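
For the LSTM case discussed in this thread, providing all of the time steps means shaping the lagged observations as [samples, timesteps, features]. The sketch below shows only that data preparation with NumPy; the synthetic series and the window length of 12 are assumptions, and no network is defined or trained.

```python
import numpy as np

# Synthetic observations standing in for the series (assumption).
series = np.arange(20, dtype=float)
n_steps = 12  # number of prior time steps given to the model (assumption)

# Each sample is a window of n_steps consecutive observations;
# the target is the observation that follows the window.
X, y = [], []
for i in range(n_steps, len(series)):
    X.append(series[i - n_steps:i])
    y.append(series[i])
X = np.array(X)
y = np.array(y)

# Reshape to [samples, timesteps, features] with one feature per step.
X = X.reshape((X.shape[0], n_steps, 1))
print(X.shape, y.shape)
```
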
Anusha April 24, 2019 at 10:01 pm #

Hi Jason, thank you for your wonderful article. Could you please show us how to do the same with a multivariate time series? Would looping the above code over the n features help?

Reply
- Jason Brownlee April 25, 2019 at 8:13 am #
  
  Thanks for the suggestion.
  
  Sorry, I don’t have an example of feature selection for multivariate time series.
  
  Reply
Aditya Mahajan May 29, 2019 at 5:19 pm #

a) So if we want to use
t-12
t-6
t-4
t-2 as features, then we should use the following:
model=ARIMA(endog=y(t),exog=[y(t-12),y(t-6),y(t-4),y(t-2)])

b) And if we want to use (t-2),(t-1) as features, will model_1 and model_2 give the same results?
model_1=ARIMA(endog=y(t),exog=[y(t-1),y(t-2)])
model_2=ARIMA(endog=y(t),order=(2,0,0))

Reply
ahbon December 18, 2019 at 6:04 pm #

Thank you for sharing. How can we sort the feature importances and show the importance ratio?

Reply
- Jason Brownlee December 19, 2019 at 6:25 am #
  
  Sure.
  
  Reply
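
A minimal sketch of sorting the importance scores, assuming a scikit-learn RandomForestRegressor fit on a lag frame. The synthetic series and the column names are assumptions for illustration. The scores in feature_importances_ sum to 1, so multiplying by 100 gives each lag's share as a percentage.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic seasonal series standing in for the real data (assumption).
rng = np.random.default_rng(7)
t = np.arange(200)
series = pd.Series(10.0 * np.sin(2 * np.pi * t / 12)
                   + rng.normal(0.0, 1.0, size=t.size))

# Lag frame with columns t-12 ... t-1 and the target t.
frame = pd.DataFrame({'t-%d' % i: series.shift(i) for i in range(12, 0, -1)})
frame['t'] = series
frame = frame.dropna()
X, y = frame.drop(columns='t'), frame['t']

model = RandomForestRegressor(n_estimators=100, random_state=1)
model.fit(X, y)

# Pair each score with its column name and sort from most to least important.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print((importances * 100).round(1))
```
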
Camilla jensen March 24, 2020 at 7:48 am #

Dear Jason

I found many of your articles and camps inspiring.

I was just wondering: when working with multivariate random forest forecasting for time series, e.g. I want 12 lags of each predictor and the lags of the output variable as input variables in my model to predict the outcome. After training my model, I should do feature selection. My thought is to use the variable importance plot/table with the %IncMSE value from the random forest to select the most important variables. But my question is:
Can I just choose e.g. lags 2 and 5 from predictor x1 and lags 1, 2 and 10 from x2, and not the whole set of lags for each variable?

I hope my question makes sense and that you have time to answer me.

Reply
- Jason Brownlee March 24, 2020 at 8:01 am #
  
  It might be easier to include all of the lag obs and let the random forest decide what to use and what to ignore.
  
  Reply
  - Camilla Jensen March 24, 2020 at 7:00 pm #
    
    Okay thank you very much – Do you know any good literature about this decision/trade-off?
    
    Reply
    - Jason Brownlee March 25, 2020 at 6:29 am #
      
      Not really, I recommend running the experiment and comparing the results. A paper will not tell you to do that.
      
      Reply
Carina Jeppesen March 30, 2020 at 4:49 am #

Dear

So in this case you would only include t-12, t-6, t-4 and t-2 as predictors and not include all the lags from t-1 to t-12? From some of your other posts, I understood that it is best to include all lags and then let the Random Forest decide what to use and what not to. Or is RFE what you mean by that?

And do you have a link to R code for this ?

Thanks

Reply
- Jason Brownlee March 30, 2020 at 5:39 am #
  
  Generally, I recommend including the lags into an advanced model and let it choose what is useful.
  
  See here for R code:
  https://machinelearningmastery.com/books-on-time-series-forecasting-with-r/
  
  Reply
  - Carina Jeppesen March 30, 2020 at 6:58 am #
    
    Ja okay – THANK YOU
    
    When you say advanced Random Forest models, is that like the one under the subtitle ‘extend caret’ in the link, where you make the randomForest prediction after training?
    
    https://machinelearningmastery.com/tune-machine-learning-algorithms-in-r/
    
    Reply
    - Jason Brownlee March 30, 2020 at 7:53 am #
      
      This post shows how random forest works from scratch:
      https://machinelearningmastery.com/implement-random-forest-scratch-python/
      
      Reply
rose May 5, 2020 at 2:51 pm #

Hi, Jason.

This article helps me so much. It is a great article. However, I got “Cannot set a frame with no defined index and a value that cannot be converted to a Series” whenever I try to do the shift in the Time Series to Supervised Learning section. I’m new to Python; I hope you can clarify this and help me. Thank you!

Reply
- Jason Brownlee May 6, 2020 at 6:21 am #
  
  Sorry to hear that, this might help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Natasha September 16, 2020 at 3:34 am #
    
    Yes I did everything right and yet I am getting the same error! Hope to hear from you soon!
    
    ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
    
    Reply
    - Jason Brownlee September 16, 2020 at 6:39 am #
      
      Thanks, I have updated the examples for the changes to the API.
      
      Let me know how you go.
      
      Reply
      - Natasha September 16, 2020 at 1:20 pm #
        
        Thank you so much! Working perfectly now!
      - Jason Brownlee September 17, 2020 at 6:40 am #
        
        You’re welcome, thanks for your patience!
Rani June 8, 2020 at 11:39 pm #

hi jason,

Can you please help me predict next month’s value using this model, given that the model is trained on lag features?

thanks in advance

Reply
- Jason Brownlee June 9, 2020 at 6:02 am #
  
  Call the predict() function in order to make a prediction:
  https://machinelearningmastery.com/make-predictions-scikit-learn/
  
  Reply
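
As a rough sketch of what calling predict() looks like for a model trained on lag features: the inputs for the next month are simply the most recent observations, arranged in the same column order used for training. The synthetic series, the 12-lag layout and the random forest settings below are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic monthly series standing in for the real data (assumption).
rng = np.random.default_rng(3)
t = np.arange(120)
series = pd.Series(10.0 * np.sin(2 * np.pi * t / 12)
                   + rng.normal(0.0, 1.0, size=t.size))

# Training frame with columns t-12 ... t-1 and the target t.
n_lags = 12
frame = pd.DataFrame({'t-%d' % i: series.shift(i)
                      for i in range(n_lags, 0, -1)})
frame['t'] = series
frame = frame.dropna()
X, y = frame.drop(columns='t'), frame['t']

model = RandomForestRegressor(n_estimators=100, random_state=1)
model.fit(X, y)

# Inputs for the next, unseen month: the last n_lags observations,
# oldest first, which matches the column order t-12 ... t-1.
next_row = pd.DataFrame([series.values[-n_lags:]], columns=X.columns)
forecast = model.predict(next_row)[0]
print('forecast for next month: %.3f' % forecast)
```

To forecast further ahead, one option is to append each forecast to the history and repeat the step, keeping in mind that errors can accumulate.
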
Oliver Smith December 16, 2020 at 8:11 am #

Hi, I am new to ML and am working on a forecasting problem. For the features, I am using lags (1-7) and an isweekend feature. The management team has less expertise in ML, so they are asking why I am using only the last 7 days to predict rather than all the past data. Please help me understand this and give a prompt answer.

Reply
- Jason Brownlee December 16, 2020 at 1:36 pm #
  
  It’s a good question.
  
  I recommend testing different amounts of history in order to discover what works best for your specific dataset and model.
  
  Reply
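
Testing different amounts of history can be sketched as a loop over the number of lags, scoring each option with time-ordered cross-validation. Everything specific here is an assumption for illustration: the synthetic daily series with a weekly cycle, the candidate lag counts, the random forest settings and the helper name make_lag_frame.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic daily series with a weekly cycle (assumption).
rng = np.random.default_rng(5)
t = np.arange(300)
series = pd.Series(5.0 * np.sin(2 * np.pi * t / 7)
                   + rng.normal(0.0, 1.0, size=t.size))


def make_lag_frame(series, n_lags):
    """Hypothetical helper: columns t-n_lags ... t-1 plus the target t."""
    frame = pd.DataFrame({'t-%d' % i: series.shift(i)
                          for i in range(n_lags, 0, -1)})
    frame['t'] = series
    return frame.dropna()


candidates = [1, 3, 7, 14]
# Build the widest frame once so every candidate is scored on the same rows.
full = make_lag_frame(series, max(candidates))
y = full['t']

results = {}
for n_lags in candidates:
    columns = ['t-%d' % i for i in range(n_lags, 0, -1)]
    model = RandomForestRegressor(n_estimators=50, random_state=1)
    # TimeSeriesSplit always trains on the past and tests on the future.
    scores = cross_val_score(model, full[columns], y,
                             cv=TimeSeriesSplit(n_splits=5),
                             scoring='neg_root_mean_squared_error')
    results[n_lags] = -scores.mean()

for n_lags, rmse in results.items():
    print('lags=%d RMSE=%.3f' % (n_lags, rmse))
```

The lag count with the lowest error on held-out data is a defensible choice to report, since it is backed by an experiment rather than a rule of thumb.
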
Luigi January 8, 2021 at 10:00 pm #

Hi Jason,
thanks for the post. about RFE.
Since the number of features to keep is not always known in advance, would it make sense to use GridSearchCV with a list of values for the number of features, in order to optimise based on a scoring metric?

Also, if the dataset is imbalanced, are you aware whether RFE can correct some of the difficulty (bias) RandomForest has with imbalanced datasets?

Thanks
Luigi

Reply
- Jason Brownlee January 9, 2021 at 6:42 am #
  
  You’re welcome.
  
  Yes, or use RFECV directly.
  
  RFE will be fine as long as you use an appropriate metric for choosing the features.
  
  Reply
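
A minimal sketch of the RFECV option, which chooses the number of features by cross-validated score instead of requiring it up front. The synthetic series, the small forest, the scoring metric and the use of TimeSeriesSplit for the folds are assumptions chosen to keep the example quick and time-ordered.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import TimeSeriesSplit

# Synthetic seasonal series standing in for the real data (assumption).
rng = np.random.default_rng(9)
t = np.arange(150)
series = pd.Series(10.0 * np.sin(2 * np.pi * t / 12)
                   + rng.normal(0.0, 1.0, size=t.size))

# Lag frame with columns t-12 ... t-1 and the target t.
frame = pd.DataFrame({'t-%d' % i: series.shift(i) for i in range(12, 0, -1)})
frame['t'] = series
frame = frame.dropna()
X, y = frame.drop(columns='t'), frame['t']

# Recursive feature elimination with the number of features chosen by
# cross-validation; the folds respect time order.
selector = RFECV(
    estimator=RandomForestRegressor(n_estimators=30, random_state=1),
    step=1,
    cv=TimeSeriesSplit(n_splits=3),
    scoring='neg_mean_squared_error',
)
selector.fit(X, y)

selected = list(X.columns[selector.support_])
print('number of features kept:', selector.n_features_)
print('selected lags:', selected)
```
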
Deniz June 11, 2021 at 8:43 am #

Hi Jason thanks for nice work,

I have a question for you; I would really appreciate it if you could answer it.
I have multivariate time series data that contains coffee prices and tea prices at a weekly frequency, and I have added lagged versions of each variable. After applying the steps you explain for feature selection of lag variables, I found the most relevant lagged features for coffee to be coffee_t_1, coffee_t_2, coffee_t_3 and coffee_t_4, along with tea_t_1, tea_t_2 and some date_time_features.

In the next step, I would like to forecast coffee prices for the next couple of weeks using a random forest. I’m planning to use the features coffee_t_1, coffee_t_2, coffee_t_3, coffee_t_4, tea, tea_t_1 and tea_t_2. Is this approach valid for time series forecasting? Is giving lagged feature variables for forecasting a kind of cheating?

Thanks
Deniz

Reply
- Jason Brownlee June 12, 2021 at 5:22 am #
  
  You’re welcome.
  
  As long as the input to the model contains only data available at prediction time (nothing from the future), it should be fine.
  
  Reply
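
The rule about using only data available at prediction time can be sketched with pandas shift(), which guarantees that every input row comes from earlier weeks. The prices below are synthetic and the column names follow the comment above; both are assumptions. Note that the unlagged tea column is left out, because this week's tea price is not known when this week's coffee price is being forecast.

```python
import numpy as np
import pandas as pd

# Synthetic weekly prices (assumption).
rng = np.random.default_rng(11)
n = 60
index = pd.date_range('2020-01-05', periods=n, freq='W')
data = pd.DataFrame({
    'coffee': 100.0 + rng.normal(0.0, 1.0, size=n).cumsum(),
    'tea': 50.0 + rng.normal(0.0, 1.0, size=n).cumsum(),
}, index=index)

features = pd.DataFrame(index=data.index)
# Lagged values are known before the week being predicted.
for lag in range(1, 5):
    features['coffee_t_%d' % lag] = data['coffee'].shift(lag)
for lag in range(1, 3):
    features['tea_t_%d' % lag] = data['tea'].shift(lag)
# Calendar features are known in advance, so they are safe to include.
features['month'] = data.index.month

# Attach the target and drop the rows made incomplete by shifting.
frame = features.assign(target=data['coffee']).dropna()
print(frame.columns.tolist())
print(frame.shape)
```

When forecasting more than one week ahead, a lag of 1 is only available for the first step; later steps need either lags at least as long as the horizon or forecasts fed back in as inputs.
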
Sanket September 19, 2021 at 1:05 am #

Not getting why lag value 1 has low feature importance

Reply
- Adrian Tam September 19, 2021 at 6:11 am #
  
  That’s what the calculation tells. It is specific to this particular input data.
  
  Reply
Charles February 15, 2022 at 6:19 am #

Hello, thank you for the amazing tutorial as always. After the Time Series is changed to Supervised Learning, would ARMAX be suitable? Could we view the (T – X) features as exogenous? Or after the data is changed from univariate to Supervised Learning could you also just use Linear Regression?

Reply
- James Carmichael February 15, 2022 at 12:59 pm #
  
  Hi Charles…I would recommend that you start with ARIMA models and its variants. If such models satisfy your performance criterion, you may not need to move on to deep learning models such as CNN and LSTMs, however it would be beneficial to also try those model types for comparison if you have time.
  
  Reply
Carlos Abdalad February 22, 2022 at 9:38 am #

Dear @Jason Brownlee, thanks for the article. It was very inspiring…
But I was wondering if there is a way to select the optimal lag of a feature? I mean the lag that is most “correlated” with the target, or that has the most predictive power to help understand the target’s variations (a “leading indicator”, as economists call it). Or do we have to manually create all the lagged features that we think make sense, and then analyse a scatterplot of lagged feature vs. target for a reasonable number of lags, or even a modified ACF/PACF-like graph correlating the lagged feature with the (non-lagged) target?

I refuse to accept that only the domain expert would be able to indicate, from experience, what the best lag of each feature would be for use as a new “predictive” feature.

Thanks in advance
Carlos Abdalad

Reply
- James Carmichael February 26, 2022 at 12:47 pm #
  
  Hi Carlos…I would recommend investigating Bayesian Optimization:
  
  https://machinelearningmastery.com/what-is-bayesian-optimization/
  
  Reply
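
The ACF-like check described in the question above, correlating a lagged feature with the unlagged target, can be sketched in a few lines of pandas. This is a simpler alternative to the optimization approach suggested in the reply, not an implementation of it. The data is synthetic and built so that the target depends on the feature three steps earlier, which is an assumption made so the result can be interpreted.

```python
import numpy as np
import pandas as pd

# Synthetic feature and target (assumption): the target follows the
# feature with a delay of 3 steps, plus noise.
rng = np.random.default_rng(42)
n = 300
x = pd.Series(rng.normal(size=n))
y = 0.8 * x.shift(3) + pd.Series(rng.normal(0.0, 0.5, size=n))

# Correlate the target with the feature shifted by each candidate lag.
# Series.corr ignores the rows where either value is missing.
correlations = {lag: y.corr(x.shift(lag)) for lag in range(0, 13)}

# Pick the lag with the largest absolute correlation.
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
for lag, r in correlations.items():
    print('lag=%2d corr=%+.3f' % (lag, r))
print('best lag:', best_lag)
```

Pearson correlation only captures linear relationships, so a lag chosen this way is best treated as a candidate to confirm with a model evaluated on held-out data.
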
vahid April 10, 2022 at 6:27 am #

Hello, thank you for the awesome tutorial.
I have an equation like this (combining multivariate time series and cross-sectional data):

Y_t = X_q + X_{p,t-1} + Y_{t-1}

Expanded:

Y_t = (X_1 + X_2 + X_3 + …) + ((X_{1,t-1} + X_{1,t-2} + X_{1,t-3} + …) + (X_{2,t-1} + X_{2,t-2} + X_{2,t-3} + …) + …) + (Y_{t-1} + Y_{t-2} + …)

Now I have 2 questions:

For feature selection, can we use Lasso or Ridge? If not, which model can we use instead of Lasso or Ridge for this equation?

For prediction, which algorithm and model can we use for this equation?

Thanks a lot

Reply
- James Carmichael April 10, 2022 at 7:38 am #
  
  Hi Vahid…The following may be of interest:
  
  https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/
  
  Reply

