Feature Selection for Time Series Forecasting with Python

Last Updated on September 16, 2020

The use of machine learning methods on time series data requires feature engineering.

A univariate time series dataset comprises only a sequence of observations. These must be transformed into input and output features in order to use supervised learning algorithms.

The problem is that there is little limit to the type and number of features you can engineer for a time series problem. Classical time series analysis tools like the correlogram can help with evaluating lag variables, but do not directly help when selecting other types of features, such as those derived from the timestamps (year, month or day) and moving statistics, like a moving average.

In this tutorial, you will discover how you can use the machine learning tools of feature importance and feature selection when working with time series data.

After completing this tutorial, you will know:

  • How to create and interpret a correlogram of lagged observations.
  • How to calculate and interpret feature importance scores for time series features.
  • How to perform feature selection on time series input variables.

Let’s get started.

  • Updated Apr/2019: Updated the link to dataset.
  • Updated Jun/2019: Fixed indenting.
  • Updated Aug/2019: Updated data loading to use new API.
  • Updated Sep/2020: Updated code to match changes to the API.

Tutorial Overview

This tutorial is broken down into the following 5 steps:

  1. Monthly Car Sales Dataset: Describes the dataset we will be working with.
  2. Make Stationary: Describes how to make the dataset stationary for analysis and forecasting.
  3. Autocorrelation Plot: Describes how to create a correlogram of the time series data.
  4. Feature Importance of Lag Variables: Describes how to calculate and review feature importance scores for time series data.
  5. Feature Selection of Lag Variables: Describes how to calculate and review feature selection results for time series data.

Let’s start off by looking at a standard time series dataset.

Monthly Car Sales Dataset

In this tutorial, we will use the Monthly Car Sales dataset.

This dataset describes the number of car sales in Quebec, Canada between 1960 and 1968.

The units are a count of the number of sales and there are 108 observations. The source data is credited to Abraham and Ledolter (1983).

Download the dataset and save it into your current working directory with the filename “car-sales.csv“. Note, you may need to delete the footer information from the file.

The code below loads the dataset as a Pandas Series object.
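A minimal sketch of the loading step, assuming the file has a header row, dates in the first column, and sales counts in the second. The helper name load_series, the column names, and the sample rows are placeholders for illustration only (the numbers are made up, not taken from the real dataset); in practice you would pass "car-sales.csv" to the helper.

```python
from io import StringIO

import pandas as pd


def load_series(source):
    """Read a two-column CSV (date, value) and return it as a pandas Series."""
    frame = pd.read_csv(source, header=0, index_col=0, parse_dates=True)
    # Squeeze the single remaining column into a Series.
    return frame.squeeze("columns")


# Placeholder rows standing in for car-sales.csv; the values are made up.
sample_csv = StringIO(
    '"Month","Sales"\n'
    '"1960-01",100\n'
    '"1960-02",120\n'
    '"1960-03",150\n'
    '"1960-04",170\n'
    '"1960-05",160\n'
    '"1960-06",140\n'
)

series = load_series(sample_csv)
# Print the first 5 rows.
print(series.head())
```

If the downloaded file still contains footer text, the last rows will not parse as numbers, so it is worth checking series.tail() as well.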

Running the example prints the first 5 rows of data.

A line plot of the data is also provided.

Monthly Car Sales Dataset Line Plot

Make Stationary

We can see a clear seasonality and increasing trend in the data.

The trend and seasonality are fixed components that can be added to any prediction we make. They are useful, but need to be removed in order to explore any other systematic signals that can help make predictions.

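A time series whose statistical properties, such as its mean and variance, do not change over time is called stationary. Removing the trend and seasonality is a common way to move a series toward stationarity.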

To remove the seasonality, we can take the seasonal difference, resulting in a so-called seasonally adjusted time series.

The period of the seasonality appears to be one year (12 months). The code below calculates the seasonally adjusted time series and saves it to the file “seasonally-adjusted.csv“.

Because the first 12 months of data have no prior data to be differenced against, they must be discarded.
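A minimal sketch of seasonal differencing, assuming a monthly series with a 12-month period. The series here is a synthetic stand-in with a linear trend and a yearly cycle, not the car sales data, and the constants used to build it are arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the monthly series: trend + yearly cycle + noise.
rng = np.random.default_rng(1)
index = pd.date_range("1960-01-01", periods=108, freq="MS")
months = np.arange(108)
values = (
    10000
    + 80 * months
    + 3000 * np.sin(2 * np.pi * months / 12)
    + rng.normal(0, 500, 108)
)
series = pd.Series(values, index=index)

# Seasonal difference: subtract the observation from 12 months earlier.
# The first 12 values have nothing to be differenced against, so drop them.
differenced = series.diff(12).dropna()

print(len(series), len(differenced))
print(differenced.head())

# To keep the result for later steps (file name taken from the text):
# differenced.to_csv("seasonally-adjusted.csv", header=False)
```

Because the yearly cycle repeats exactly in this synthetic series, the differenced values hover around the yearly trend increment; real data will rarely be this clean.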

The stationary data is stored in “seasonally-adjusted.csv“. A line plot of the differenced data is created.

Seasonally Differenced Monthly Car Sales Dataset Line Plot

The plot suggests that the seasonality and trend information was removed by differencing.

Autocorrelation Plot

Traditionally, time series features are selected based on their correlation with the output variable.

For lagged observations of the same series, this correlation is called autocorrelation, and a plot of it is called an autocorrelation plot or correlogram. The plot shows the correlation of each lagged observation with the current observation and whether or not that correlation is statistically significant.

For example, the code below plots the correlogram for all lag variables in the Monthly Car Sales dataset.
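Readers in the comments refer to statsmodels' plot_acf for this step. As a dependency-light sketch of what such a plot contains, the code below computes the autocorrelation coefficients directly with NumPy and flags lags outside an approximate 95% white-noise bound of 1.96/sqrt(n). The series is a synthetic stand-in in which each value depends on the value 12 steps earlier; the function names and constants are assumptions for illustration, and the simple bound is not necessarily the interval a plotting library would draw.

```python
import numpy as np


def make_seasonal_series(n=240, burn_in=120, seed=1):
    """Synthetic stand-in: each value depends on the value 12 steps earlier."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(n + burn_in):
        previous = x[t - 12] if t >= 12 else 0.0
        x[t] = 0.8 * previous + noise[t]
    return x[burn_in:]


def autocorrelation(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    denominator = np.sum(x * x)
    return np.array(
        [np.sum(x[lag:] * x[: len(x) - lag]) / denominator for lag in range(max_lag + 1)]
    )


series = make_seasonal_series()
acf = autocorrelation(series, max_lag=24)

# Approximate 95% bound under the assumption of white noise.
bound = 1.96 / np.sqrt(len(series))
significant = [lag for lag in range(1, len(acf)) if abs(acf[lag]) > bound]

for lag, value in enumerate(acf):
    print(f"lag {lag:2d}: {value:+.3f}")
print("lags outside the bound:", significant)
```

With statsmodels and matplotlib installed, plot_acf(series) from statsmodels.graphics.tsaplots draws the same coefficients as a chart.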

Running the example creates a correlogram, or Autocorrelation Function (ACF) plot, of the data.

The plot shows lag values along the x-axis and correlation on the y-axis between -1 and 1 for negatively and positively correlated lags respectively.

Dots that fall outside the blue area, above or below it, indicate statistically significant correlations. The correlation of 1 at lag 0 is simply the perfect positive correlation of each observation with itself.

The plot shows significant lag values at 1, 2, 12, and 17 months.

Correlogram of the Monthly Car Sales Dataset

This analysis provides a good baseline for comparison.

Time Series to Supervised Learning

We can convert the univariate Monthly Car Sales dataset into a supervised learning problem by taking the lag observation (e.g. t-1) as inputs and using the current observation (t) as the output variable.

We can do this in Pandas using the shift function to create new columns of shifted observations.

The example below creates a new time series with 12 months of lag values to predict the current observation.

The shift of 12 months means that the first 12 rows of data are unusable as they contain NaN values.
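A minimal sketch of the lag construction, using the 't-' + str(i) column naming that readers quote in the comments. The helper name make_lag_frame is hypothetical, and the input is a simple counting series (0, 1, 2, ...) standing in for the seasonally adjusted data so that the shifted columns are easy to check by eye.

```python
import pandas as pd


def make_lag_frame(series, n_lags=12):
    """Build columns t-n_lags ... t-1 from shifted copies, plus the target t."""
    columns = {f"t-{i}": series.shift(i) for i in range(n_lags, 0, -1)}
    frame = pd.DataFrame(columns)
    frame["t"] = series
    return frame


# Counting series standing in for the seasonally adjusted data.
index = pd.date_range("1961-01-01", periods=96, freq="MS")
series = pd.Series(range(96), index=index, dtype=float)

frame = make_lag_frame(series, n_lags=12)
# The first 12 rows contain NaN in at least one lag column.
print(frame.head(13))

# Keep only the complete rows.
supervised = frame.iloc[12:]
print(supervised.shape)

# To keep the result for later steps (file name taken from the text):
# supervised.to_csv("lags_12months_features.csv", index=False)
```

Changing n_lags to 6 or 24 gives the alternative window sizes; the number of unusable leading rows changes with it.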

Running the example prints the first 13 rows of data showing the unusable first 12 rows and the usable 13th row.

The first 12 rows are removed from the new dataset and results are saved in the file “lags_12months_features.csv“.

This process can be repeated with an arbitrary number of time steps, such as 6 months or 24 months, and I would recommend experimenting.

Feature Importance of Lag Variables

Ensembles of decision trees, like bagged trees, random forest, and extra trees, can be used to calculate a feature importance score.

This is common in machine learning to estimate the relative usefulness of input features when developing predictive models.

We can use feature importance to help to estimate the relative importance of contrived input features for time series forecasting.

This is important because we can contrive not only the lag observation features above, but also features based on the timestamp of observations, rolling statistics, and much more. Feature importance is one method to help sort out what might be more useful when modeling.
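As a rough illustration of those other feature types, the sketch below derives month and year from the timestamp and a 3-month moving average from past observations. The column names and the window size are assumptions, and the counting series is a placeholder. The moving average is computed on the shifted series so that each row only uses values that would have been known before time t.

```python
import numpy as np
import pandas as pd

# Placeholder monthly series: 0, 1, 2, ... so the features are easy to check.
index = pd.date_range("1960-01-01", periods=36, freq="MS")
series = pd.Series(np.arange(36, dtype=float), index=index)

features = pd.DataFrame(index=series.index)
# Timestamp-derived features.
features["month"] = series.index.month
features["year"] = series.index.year
# Lag feature.
features["t-1"] = series.shift(1)
# Rolling statistic over the 3 observations before t (shift first, then roll).
features["rolling_mean_3"] = series.shift(1).rolling(window=3).mean()
# Output variable.
features["t"] = series

# Drop the leading rows that lack a full window of history.
features = features.dropna()
print(features.head())
```

Columns built this way can be scored alongside the lag columns with the same importance and selection tools.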

The example below loads the supervised learning view of the dataset created in the previous section, fits a random forest model (RandomForestRegressor), and summarizes the relative feature importance scores for each of the 12 lag observations.

A large-ish number of trees is used to ensure the scores are somewhat stable. Additionally, the random number seed is initialized to ensure that the same result is achieved each time the code is run.
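A minimal sketch of the importance calculation with scikit-learn's RandomForestRegressor. Instead of loading the saved lag file, it builds 12 lag columns from a synthetic stand-in series in which each value depends on the value 12 steps earlier, so the expected ranking is known in advance. The settings n_estimators=500 and random_state=1 mirror a snippet quoted in the comments and are otherwise assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def make_seasonal_series(n=300, burn_in=120, seed=1):
    """Synthetic stand-in: each value depends on the value 12 steps earlier."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(n + burn_in):
        previous = x[t - 12] if t >= 12 else 0.0
        x[t] = 0.8 * previous + noise[t]
    return pd.Series(x[burn_in:])


series = make_seasonal_series()

# Supervised view: 12 lag columns as inputs, the current value as output.
frame = pd.DataFrame({f"t-{i}": series.shift(i) for i in range(12, 0, -1)})
frame["t"] = series
frame = frame.dropna()
X = frame.drop(columns="t")
y = frame["t"]

# Many trees for more stable scores; fixed seed for repeatable results.
model = RandomForestRegressor(n_estimators=500, random_state=1)
model.fit(X, y)

# Pair each score with its column name and list the largest first.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

The scores are relative and sum to 1, so they are best read as a ranking rather than as absolute effect sizes. Calling importances.plot(kind="bar") with matplotlib installed would give a bar graph of the same values.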

Running the example first prints the importance scores of the lagged observations.

The scores are then plotted as a bar graph.

The plot shows the high relative importance of the observation at t-12 and, to a lesser degree, the importance of observations at t-2 and t-4.

It is interesting to note a difference with the outcome from the correlogram above.

Bar Graph of Feature Importance Scores on the Monthly Car Sales Dataset

This process can be repeated with different methods that can calculate importance scores, such as gradient boosting, extra trees, and bagged decision trees.
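As a sketch of that repetition, the code below compares importance scores from scikit-learn's ExtraTreesRegressor and GradientBoostingRegressor side by side. The data is again a synthetic stand-in series driven by its value 12 steps earlier, and the model settings are assumptions chosen only to keep the example quick.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor


def make_seasonal_series(n=300, burn_in=120, seed=1):
    """Synthetic stand-in: each value depends on the value 12 steps earlier."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(n + burn_in):
        previous = x[t - 12] if t >= 12 else 0.0
        x[t] = 0.8 * previous + noise[t]
    return pd.Series(x[burn_in:])


series = make_seasonal_series()
frame = pd.DataFrame({f"t-{i}": series.shift(i) for i in range(12, 0, -1)})
frame["t"] = series
frame = frame.dropna()
X = frame.drop(columns="t")
y = frame["t"]

models = {
    "extra_trees": ExtraTreesRegressor(n_estimators=200, random_state=1),
    "gradient_boosting": GradientBoostingRegressor(random_state=1),
}

# One column of importance scores per model, one row per lag feature.
scores = pd.DataFrame(
    {name: model.fit(X, y).feature_importances_ for name, model in models.items()},
    index=X.columns,
)
print(scores.round(3))
```

Where the methods agree on a feature, that is some evidence the feature is useful; where they disagree, it is worth testing both candidate sets in a model.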

Feature Selection of Lag Variables

We can also use feature selection to automatically identify and select those input features that are most predictive.

A popular method for feature selection is called Recursive Feature Elimination (RFE).

RFE works by creating predictive models, weighting features, and pruning those with the smallest weights, then repeating the process until a desired number of features are left.

The example below uses RFE with a random forest predictive model and sets the desired number of input features to 4.
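A minimal sketch of RFE with a random forest, using scikit-learn's RFE class and keeping 4 features. The input is a synthetic stand-in series driven by its value 12 steps earlier rather than the saved lag file, and the forest is kept small (100 trees, an assumption) because RFE refits the model once per eliminated feature.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE


def make_seasonal_series(n=300, burn_in=120, seed=1):
    """Synthetic stand-in: each value depends on the value 12 steps earlier."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(n + burn_in):
        previous = x[t - 12] if t >= 12 else 0.0
        x[t] = 0.8 * previous + noise[t]
    return pd.Series(x[burn_in:])


series = make_seasonal_series()
frame = pd.DataFrame({f"t-{i}": series.shift(i) for i in range(12, 0, -1)})
frame["t"] = series
frame = frame.dropna()
X = frame.drop(columns="t")
y = frame["t"]

# Fit, drop the least important feature, and repeat until 4 remain.
estimator = RandomForestRegressor(n_estimators=100, random_state=1)
rfe = RFE(estimator, n_features_to_select=4)
rfe.fit(X, y)

selected = list(X.columns[rfe.support_])
print("Selected features:", selected)

# Rank 1 marks the selected features; larger ranks were eliminated earlier.
ranking = pd.Series(rfe.ranking_, index=X.columns)
print(ranking.sort_values())
```

Note that n_features_to_select is passed by keyword here; older snippets that pass it positionally may not run on recent scikit-learn releases.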

Running the example prints the names of the 4 selected features.

Unsurprisingly, the results match features that showed a high importance in the previous section.

A bar graph is also created showing the feature selection rank (smaller is better) for each input feature.

Bar Graph of Feature Selection Rank on the Monthly Car Sales Dataset

This process can be repeated with different numbers of features to select more than 4 and different models other than random forest.


In this tutorial, you discovered how to use the tools of applied machine learning to help select features from time series data when forecasting.

Specifically, you learned:

  • How to interpret a correlogram for highly correlated lagged observations.
  • How to calculate and review feature importance scores in time series data.
  • How to use feature selection to identify the most relevant input variables in time series data.

Do you have any questions about feature selection with time series data?
Ask your questions in the comments and I will do my best to answer.

107 Responses to Feature Selection for Time Series Forecasting with Python

  1. Avatar
    Andrewcz March 29, 2017 at 5:33 pm #

    Hi Jason, big fan! I was wondering if you are going to do a series on multivariate array time series forecasting.
    Many thanks,

    • Avatar
      Jason Brownlee March 30, 2017 at 8:48 am #

      Yes, I hope to cover this soon Andrew.

      • Avatar
        Tiwalade Usman September 3, 2021 at 9:50 pm #

        Please have you done this? Feature Selection for multivariate Time Series Forecasting

        • Avatar
          Jason Brownlee September 4, 2021 at 5:21 am #

          I don’t have a tutorial on this topic, sorry.

  2. Avatar
    Benson Dube April 2, 2017 at 6:13 am #

    Hello Jason,

    Many thanks for this blog. I would be very interested to see how multivariate time series forecasting is dealt with.

    Keep up the good work,

    Best Regards


  3. Avatar
    Kélian April 13, 2017 at 2:05 am #

    Hello Jason,

    I wondered about your choice to keep only the last 12 lags for the feature importance and feature selection study.

    As I understand it, the correlogram suggests you should extend the study to lag 17 (the correlogram showed that lags 1, 2, 12, and 17 are correlated with the current state).

    Am I right?

    Thanks for your work!

  4. Avatar
    Mehrdad May 26, 2017 at 5:18 am #

    The output of these lines
    is not like yours. It just shows a straight line.
    Could you please check it?

    • Avatar
      Merlin June 1, 2017 at 8:58 pm #

      Yeah, the plot_acf thing is not working properly.

      • Avatar
        Jason Brownlee June 2, 2017 at 12:57 pm #

        What problem do you see exactly?

        What version of statsmodels are you using?

    • Avatar
      Jason Brownlee June 2, 2017 at 11:50 am #

      I can confirm the example, please check that you have all of the code and the same source data.

      • Avatar
        porter October 27, 2017 at 4:11 am #

        I had a similar issue. It is due to the footer if you do not delete in the data set or drop the last row in the series after import.

  5. Avatar
    Ralph Li June 30, 2017 at 6:09 pm #

    Hello Jason!

    Can you recommend some references about recursive feature selection and random forest on feature selection for time series?


    • Avatar
      Jason Brownlee July 1, 2017 at 6:29 am #

      No. My best advice: try it, get results and use them in developing better models.

  6. Avatar
    Saurav Sharma July 27, 2017 at 2:38 am #

    Hi Jason!

    I am still unable to understand the importance of lag variable?

    Is lag applied to a feature variable to find correlation with the target variable?


    • Avatar
      Jason Brownlee July 27, 2017 at 8:11 am #

      A lag is a past observation, an observation at a prior time step.

      We can use these as input features to learning models. So abstractly we can predict today based on what happened yesterday.

      Yesterday’s ob is a lag variable.

      Does that help?

  7. Avatar
    Mert August 26, 2017 at 6:43 pm #

    Dear Jason,
    I am trying to run your code above with X size of (358,168) and test y (358,24), and having error “ValueError: bad input shape (358, 24)”. I would like to find the most relevant 12 features from 168 features in X(358,168) depending on 24 output of y(358,24)

    My y matrix has 24 output instead of 1. What might be the reason for the error?

    X = array[:,0:168]
    y = array[:,168:192]
    rfe = RFE(RandomForestRegressor(n_estimators=500, random_state=1), 12)
    fit = rfe.fit(X, y)

    • Avatar
      Jason Brownlee August 27, 2017 at 5:48 am #

      That might be too many output variables, most algorithms expect a single output variable in sklearn.

      I can’t think of any that support multiple, but I could be wrong.

      You might like to explore a neural network model instead?

      • Avatar
        Mert August 28, 2017 at 10:49 am #

        Thanks for your comment Jason.
        Actually, what I would like to do is determine the most relevant features with RFE, then train a neural network model with these features. Do you think it is a reasonable approach?
        For the multiple output error, I will run RFE for each of the 24 outputs one by one instead.

        • Avatar
          Jason Brownlee August 29, 2017 at 5:00 pm #

          You could try it and it would make sense if there is one highly predictive feature, but I would encourage you to test many configurations.

  8. Avatar
    Orry October 9, 2017 at 9:59 pm #

    Thanks for the great tutorial.

    I was wondering if you could explain the logic of why ACF might show some lags as statistically significant, while feature selection might show totally different lags as having predictive power.

    • Avatar
      Jason Brownlee October 10, 2017 at 7:44 am #

      Different methods operate under different assumptions and, in turn, produce differing results. This is to be expected.

  9. Avatar
    lingxiao November 15, 2017 at 1:10 pm #

    hello Jason,

    Thank you for the post loved it!

    I’m a bit confused about the following:

    “This is important because we can contrive not only the lag observation features above, but also features based on the timestamp of observations, rolling statistics, and much more.”

    Would it make sense for me to add “month” to the set of features(“X”) if I have removed the seasonality from the time series already? Also, about the “much more” part, does stationarity still mean anything if I add extra features to “X”?

    If it is not a problem, why do we require the data to be stationary in the first place?
    If it is a problem, how do we make sure that the data is still stationary after we add extra features to “X”?

    • Avatar
      Jason Brownlee November 16, 2017 at 10:25 am #

      Yes, but you can also explore non-linear methods that offer more flexibility when it comes to stationarity requirements.

  10. Avatar
    Ali November 17, 2017 at 8:16 pm #

    Great tutorial! I have moderate experience with time series data. I am into detecting the most important features for a time series financial data for a binary classification task. And I have about 400 features (many of them highly correlated after I make the data stationary). How could I apply the method you show above? Getting the let’s say 10 previous days for each feature? Or do you have other suggestions?

    Thanks in advance!

    • Avatar
      Jason Brownlee November 18, 2017 at 10:15 am #

      I would recommend exploring a suite of approaches and see what features result in the best model skill.

  11. Avatar
    Francisco January 11, 2018 at 2:45 am #

    Hi Jason,

    This is great! How would you go about feature selection for time series using LSTM/keras. In that case, there won’t be a need to deconstruct the time series into the different lag variables from t to t-12.

    I’m currently working on a time series problem with multiple predictors. I need to know which predictors are important. Is the process the same as what you would do here or can I use a randomforest’s importance feature?


    • Avatar
      Jason Brownlee January 11, 2018 at 5:53 am #

      Good question.

      There may be specialised methods, but I’m not across them right now – perhaps do a little research.

      I’d suggest grid searching models across different subsets of features to see what is important/results in better model skill. Basically an RFE approach.

  12. Avatar
    MLbeginner96 March 25, 2018 at 12:43 am #

    Hi Jason,

    I’m assuming we can extend this feature importance and selection beyond lag variables:
    – “temporal/seasonal features” such as hour of day, month of year, etc.
    – external variables that depend on the problem
    – rolling features such as min, max and mean of value of temperature in this case over past n days for example

    Essentially, with the features you provided in the link below, we can then perform feature importance and selection. Would you agree?


    • Avatar
      Jason Brownlee March 25, 2018 at 6:32 am #

      Sure. I don’t have a lot of material on multivariate time series though, I hope to cover it more in the future.

      • Avatar
        MLbeginner96 April 3, 2018 at 9:37 pm #

        Am i right in saying the process of feature selection/importance/etc occurs AFTER fitting the model to the training data?

        • Avatar
          Jason Brownlee April 4, 2018 at 6:12 am #

          Features should be chosen prior to fitting a model.

          Note though that the process of working through is iterative. Lots of looping back to prior steps.

  13. Avatar
    otw June 14, 2018 at 5:38 am #

    the observations in your training data are not iid. Do you think it is ok for your model?

    • Avatar
      Jason Brownlee June 14, 2018 at 6:13 am #

      Making the series stationary removes the time dependence.

  14. Avatar
    Leonildo August 21, 2018 at 8:25 am #

    RandomForestRegressor does bootstrap. Would not this be data leakage considering that the example is a time series?

  15. Avatar
    Vishal August 31, 2018 at 1:00 am #

    Hi, Jason

    I am using RandomForest for forecasting rainfall variable. I have around 15 predictors with 50 years data. When I am predicting rainfall values based on the predictors (variables), I am getting very low values as compared to original rainfalls. I mean, I am totally missing extreme values. Please suggest.


  16. Avatar
    zb September 4, 2018 at 5:31 pm #

    Hi Jason,

    Thanks for the blog. I learned a lot thanks to you.

    I’m looking for a method of selecting variables for time series like the RFE. But after reading this new post (https://machinelearningmastery.mystagingwebsite.com/how-to-predict-whether-eyes-are-open-or-closed-using-brain-waves/), I have doubts about whether it is possible to apply a method that uses bootstrap.

    I think that when using RFE, the evaluation of the models does not respect the temporal ordering of the observations, as it happens in your post about how to predict whether eyes are open or closed and that uses the future information for the selection of variables. What do you think? Thanks!!


    • Avatar
      Jason Brownlee September 5, 2018 at 6:29 am #

      It is a challenge. You could try classical feature selection methods, like RFE and correlation, knowing there is bias, then build models from the suggestions and compare the performance to using all features.

  17. Avatar
    Hamza September 15, 2018 at 5:06 am #

    Hi Jason,
    Many thanks for this blog.

    I use Simple Linear Regression in Sklearn.
    I have this error (could not convert to float: ‘(TOP (S (S (NP *’)

    I think it’s necessary to encode categorical data!

    But my dataset is for natural language processing (data from conll-2012).
    Should I use another algorithm that accepts string variables, or is there another solution?

  18. Avatar
    Hossein October 18, 2018 at 4:35 am #

    Hi Jason,

    Thank you for your great tutorials.

    Unfortunately, I got a problem running the code. The result of the code on my computer is exactly the same as yours till Autocorrelation Plot. At Autocorrelation Plot, my result just shows a straight line at zero.
    The next, there is an error as follows.

    runfile(‘C:/Users/Hossein/.spyder-py3/temp.py’, wdir=’C:/Users/Hossein/.spyder-py3′)
    Traceback (most recent call last):

    File “C:\Users\Hossein\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py”, line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

    File “”, line 1, in
    runfile(‘C:/Users/Hossein/.spyder-py3/temp.py’, wdir=’C:/Users/Hossein/.spyder-py3′)

    File “C:\Users\Hossein\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 705, in runfile
    execfile(filename, namespace)

    File “C:\Users\Hossein\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 102, in execfile
    exec(compile(f.read(), filename, ‘exec’), namespace)

    File “C:/Users/Hossein/.spyder-py3/temp.py”, line 48
    dataframe['t-'+str(i)] = series.shift(i)
    IndentationError: expected an indented block

    • Avatar
      Jason Brownlee October 18, 2018 at 6:38 am #

      Looks like you did not copy the code with the indenting, here’s how to copy code:

      Also, I recommend running code from the command line:

      • Avatar
        Hossein October 18, 2018 at 8:47 am #

        Thank you for your prompt response.

        Unfortunately, I still have the same problem. Even I tried your code on https://repl.it and it showed the same error.

        dataframe['t-'+str(i)] = series.shift(i)
        IndentationError: expected an indented block

        • Avatar
          Jason Brownlee October 18, 2018 at 2:32 pm #

          Perhaps try copy-pasting the code again and indenting it manually in your text editor?

          • Avatar
            miguel dias June 5, 2019 at 8:47 pm #

            Looks to me that when you yourself pasted the code it did not properly indent, because the box doesn’t show any kind of indentation (might also be a problem to do with the website or the browser, do you see any indentation on the code box?).

            thanks for the tutorial, good stuff

          • Avatar
            Jason Brownlee June 6, 2019 at 6:27 am #

            You’re right, I have added indenting to the example.

            Sorry about that.

    • Avatar
      Hossein October 18, 2018 at 2:15 pm #

      I figured it out, finally.

      The autocorrelation plot doesn’t show since there are two “nan”s at the end of series.
      add “series=series[1:-2]” after reading the following line.
      series = Series.from_csv('seasonally_adjusted.csv', header=None)

      Another comment regarding the error in Time Series to Supervised Learning.

      the body of the “for” loop needs to be indented, as follows:

      for i in range(12,0,-1):
          dataframe['t-'+str(i)] = series.shift(i)

  19. Avatar
    hk November 24, 2018 at 11:54 pm #

    The code doesn’t work. I get “length of values does not match length of index” when creating the dataframe with the shifted columns. I don’t know how you could produce the results with this code.

  20. Avatar
    Sabine January 8, 2019 at 9:52 pm #

    Hi Jason,
    I’m struggling a bit to understand the feature importance and selection results. Specifically, how is it possible for lag t-12 to have such a high impact in predicting the time series after having removed the seasonality of 12 month in the differencing step before?

    • Avatar
      Jason Brownlee January 9, 2019 at 8:43 am #

      Perhaps the seasonal correction did not remove all of the seasonal structure.

  21. Avatar
    Rima January 9, 2019 at 9:57 pm #

    Hello Jason,
    Thanks for the article!
    The time series I have is daily data of 4 years and 10 months.
    I am actually implementing SARIMAX for my time series data and I am including several exogenous variables.
    I actually did the feature selection you explained above on the exogenous variables and also on 10 lags.
    I included in my exogenous variables the mean of my time series over the year (so 364 value where each value represents the mean over 4 years).
    The feature selection method above gave 0.9 importance for the mean_values and very low values for other exogenous variables and lags. and on the other hand the SARIMAX I implemented also didn’t enhance my RMSE (relatively to the RMSE obtained if the predicted value is the mean value).

    So to summarize, my model does not perform any better than the mean. What should I do in your opinion?

    Thank you!

    • Avatar
      Jason Brownlee January 10, 2019 at 7:50 am #

      I would encourage you to only include exog variables if they lift the skill of the model.

  22. Avatar
    Yaqian January 28, 2019 at 3:52 pm #

    Hi Jason, thanks for the useful article! Is there a tutorial that explains how to select features for multivariate time series forecasting?

    • Avatar
      Jason Brownlee January 29, 2019 at 6:08 am #

      I don’t have a tutorial on this topic.

      • Avatar
        Kaushal February 25, 2019 at 7:24 pm #

        Hi Jason, Nice article,

        I would like to ask one general question regarding using Time series model like Arima or Arimax. When we do first or second difference of the time series data to remove trend and seasonality from that time series, do we have to pass trend or seasonality order in model like arimax ( ts, order= (p,d,q) , seasoanlity= c(P,D,Q) )? or if we pass order (trend & seasonality ) we don’t have to take difference of input time series ?

        • Avatar
          Jason Brownlee February 26, 2019 at 6:16 am #

          If you are using an ARIMA or SARIMA model, you can let the model difference the series for you using the appropriate order parameters.

  23. Avatar
    yoonji April 5, 2019 at 1:27 pm #

    Hello. Jason
    The same code was copied and executed, but an error appeared. How do I handle it?

    my error : Input contains NaN, infinity or a value too large for dtype(‘float32’).

  24. Avatar
    yoonji April 9, 2019 at 2:51 pm #

    Hello, Jason.
    I’m getting a lot of help through your blog.
    I have a question about feature selection.

    What can I do if I want to use the selected lag variables as input variables for the LSTM model?
    I am really wondering how I can use the selected features for an LSTM.

    I’ll be waiting for the reply.

    Thank you!

    • Avatar
      Jason Brownlee April 10, 2019 at 6:08 am #

      You can use them, I’m not sure I understand the problem you’re having?

      Perhaps this will help:

      • Avatar
        yoonji April 10, 2019 at 12:19 pm #

        Thank you for your reply.
        Looking at the above results, t-12, t-6, t-4, t-2 were selected as a variable.
        So, this 4 feature should be used as variables in the LSTM model. right?
        or you mean, When I make the dataset using “series_to_supervised” function, Should I enter the number(12 or 6 or 4 or 2) to factor “n-in” and “n-out”

        or I just input variable “t-12″, t-6”, “t-4″,”t-2” to model?

        I don’t understand how can I utilize selected feature as a variable.

        Thank you.

        • Avatar
          Jason Brownlee April 10, 2019 at 1:46 pm #

          I see. Generally we would not select lag obs as the time steps for the LSTM, instead we would provide all of the time steps and let the LSTM learn what to use to make good predictions.

          • Avatar
            yoonji April 10, 2019 at 2:53 pm #

            Oh, I get it.
            So… Why did we select features?
            What is the purpose of feature selection?
            You mean, if I use an LSTM model, it isn’t necessary to select features?

            Thank you very much for your advice.

          • Avatar
            Jason Brownlee April 11, 2019 at 6:28 am #

            It can be useful for linear models, and when developing static ML models (not LSTM).

  25. Avatar
    Anusha April 24, 2019 at 10:01 pm #

    Hi jason, thank you for your wonderful article. Could you please give us how you can do the same with multivariate time series? Does looping the above code for n number of features help?

    • Avatar
      Jason Brownlee April 25, 2019 at 8:13 am #

      Thanks for the suggestion.

      Sorry, I don’t have an example of feature selection for multivariate time series.

  26. Avatar
    Aditya Mahajan May 29, 2019 at 5:19 pm #

    a)So if we want to use
    t-2 as features then we should use following.

    b)And if we want we want to use (t-2),(t-1) as features then will model_1,model_2 will give same results?.

  27. Avatar
    ahbon December 18, 2019 at 6:04 pm #

    Thank you for sharing. How can we sort features importance and show the important ratio?

  28. Avatar
    Camilla jensen March 24, 2020 at 7:48 am #

    Dear Jason

    I found many of your articles and Camps inspiring.

    I was just wondering – when working with multivariate random forest forecasting with time series – e.g. I want 12 lags of each predictor and the lags of the output variable as input variables in my model to predict the outcome. After training my model, I should do feature selection – here my thought is to use the variable importance plot/table with the value %IncMSE for the random forest forecast to select the most important variables. But my question is:
    Can I just choose e.g. lags 2 and 5 from predictor x1 and lags 1, 2 and 10 from x2, and not the whole set of lags for each variable?

    Hope my question make sense and you have time to answer me.

    • Avatar
      Jason Brownlee March 24, 2020 at 8:01 am #

      It might be easier to include all of the lag obs and let the random forest decide what to use and what to ignore.

      • Avatar
        Camilla Jensen March 24, 2020 at 7:00 pm #

        Okay thank you very much – Do you know any good literature about this decision/trade-off?

        • Avatar
          Jason Brownlee March 25, 2020 at 6:29 am #

          Not really, I recommend running the experiment and comparing the results. A paper will not tell you to do that.

  29. Avatar
    Carina Jeppesen March 30, 2020 at 4:49 am #


    So in this case you would only include t-12, t-6, t-4 and t-2 as predictors and not include all the lags from t-1 to t-12? From some of your other posts, I have understood that the optimal approach is to include all lags and then let the random forest decide what to use and what not to. Or is it this RFE you mean by that?

    And do you have a link to R code for this ?


  30. Avatar
    rose May 5, 2020 at 2:51 pm #

    Hi, Jason.

    This article helps me so much; it is a great article. However, I get “Cannot set a frame with no defined index and a value that cannot be converted to a Series” whenever I try to do the shift in the Time Series to Supervised Learning section. I’m new to Python and hope you can clarify this and help me. Thank you!

  31. Avatar
    Rani June 8, 2020 at 11:39 pm #

    Hi Jason,

    Can you please help me predict next month’s value using this model, given that the model is trained on lag features?

    Thanks in advance.

  32. Avatar
    Oliver Smith December 16, 2020 at 8:11 am #

    Hi, I am new to ML and am working on a forecasting problem. For the features I am using lags (1-7) and an isweekend feature. The management team has less expertise in ML, so they are asking why I am using only the last 7 days to predict rather than all the past data. Please help me understand this and give a prompt answer.

    • Avatar
      Jason Brownlee December 16, 2020 at 1:36 pm #

      It’s a good question.

      I recommend testing different amounts of history in order to discover what works best for your specific dataset and model.
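
      One way to run that comparison is sketched below, assuming scikit-learn's TimeSeriesSplit and a synthetic daily series with a weekly cycle. The candidate window sizes, the model settings and the data are illustrative assumptions. Every candidate is scored on the same rows so the errors are comparable.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic daily series with a weekly cycle (illustrative data only)
rng = np.random.default_rng(7)
t = np.arange(400)
series = pd.Series(5 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 1, t.size))

# Build the largest lag window once so every candidate uses identical rows
max_lags = 28
frame = pd.DataFrame({f"t-{k}": series.shift(k) for k in range(1, max_lags + 1)})
frame["t"] = series
frame = frame.dropna()
y = frame["t"]

results = {}
for n_lags in [1, 3, 7, 14, 28]:
    X = frame[[f"t-{k}" for k in range(1, n_lags + 1)]]
    model = RandomForestRegressor(n_estimators=50, random_state=7)
    # TimeSeriesSplit keeps each test fold later in time than its training fold
    scores = cross_val_score(
        model, X, y,
        cv=TimeSeriesSplit(n_splits=5),
        scoring="neg_mean_absolute_error",
    )
    results[n_lags] = -scores.mean()  # flip the sign back to a plain MAE

for n_lags, mae in results.items():
    print(f"lags={n_lags:2d}  MAE={mae:.3f}")
```

      The amount of history with the lowest error on held-out, later-in-time data is the one to prefer, which gives a concrete answer to the "why only 7 days" question for a given dataset.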

  33. Avatar
    Luigi January 8, 2021 at 10:00 pm #

    Hi Jason,
    thanks for the post. A question about RFE:
    Since the number of features to keep is not always known in advance, would it make sense to use GridSearchCV with a list of values for the number of features, in order to optimise based on a scoring metric?

    Also, if the dataset is imbalanced, are you aware of whether RFE can partly correct the difficulty (bias) RandomForest has in dealing with imbalanced datasets?


    • Avatar
      Jason Brownlee January 9, 2021 at 6:42 am #

      You’re welcome.

      Yes, or use RFECV directly.

      RFE will be fine as long as you use an appropriate metric for choosing the features.
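
      A rough sketch of the RFECV route, assuming scikit-learn's RFECV with a random forest on lag features of a synthetic monthly series. The data, the fold count and the choice of mean squared error as the metric are assumptions for illustration. The scoring argument is where a more appropriate metric would be supplied for an imbalanced classification problem.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import TimeSeriesSplit

# Synthetic monthly series with a 12-step seasonal pattern (illustrative only)
rng = np.random.default_rng(3)
t = np.arange(200)
series = pd.Series(10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size))

# Lag features t-1 .. t-12 as inputs, the current value as output
frame = pd.DataFrame({f"t-{k}": series.shift(k) for k in range(1, 13)})
frame["t"] = series
frame = frame.dropna()
X, y = frame.drop(columns="t"), frame["t"]

# RFECV removes one feature per step and uses cross-validation
# to pick how many features to keep
selector = RFECV(
    estimator=RandomForestRegressor(n_estimators=30, random_state=3),
    step=1,
    min_features_to_select=1,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_squared_error",
)
selector.fit(X, y)

selected = list(X.columns[selector.support_])
print("Number of features chosen:", selector.n_features_)
print("Selected lags:", selected)
```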

  34. Avatar
    Deniz June 11, 2021 at 8:43 am #

    Hi Jason, thanks for the nice work.

    I have a question for you; I would really appreciate an answer.
    I have multivariate time series data that contains coffee prices and tea prices at weekly frequency, and I have added lagged versions of each variable. After applying the steps you explain for feature selection of lag variables, I found the most relevant lagged features for coffee to be coffee_t_1, coffee_t_2, coffee_t_3 and coffee_t_4, along with tea_t_1, tea_t_2 and some date_time_features.

    As the next step, I would like to forecast coffee prices for the next couple of weeks using a random forest. I’m planning to use the features coffee_t_1, coffee_t_2, coffee_t_3, coffee_t_4, tea, tea_t_1 and tea_t_2. Is this approach valid for time series forecasting? Is using lagged feature variables for forecasting a kind of cheating?


    • Avatar
      Jason Brownlee June 12, 2021 at 5:22 am #

      You’re welcome.

      As long as the input to the model contains only data available at prediction time (nothing from the future), it should be fine.
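
      A small sketch of what that looks like in practice, using pandas shift() on made-up weekly prices (the values, dates and two-lag window are assumptions for illustration). Each row's inputs come only from earlier weeks. An unlagged column such as the same-week tea price would only be a valid input if it is already known when the coffee forecast is made.

```python
import pandas as pd

# Made-up weekly prices, for illustration only
data = pd.DataFrame(
    {
        "coffee": [100.0, 102.0, 101.0, 105.0, 107.0, 106.0],
        "tea": [50.0, 51.0, 49.0, 52.0, 53.0, 54.0],
    },
    index=pd.date_range("2021-01-03", periods=6, freq="W"),
)

features = pd.DataFrame(index=data.index)
for k in (1, 2):
    # shift(k) moves values down k rows, so each row sees only earlier weeks
    features[f"coffee_t_{k}"] = data["coffee"].shift(k)
    features[f"tea_t_{k}"] = data["tea"].shift(k)

# The target is the current week's coffee price
features["target"] = data["coffee"]

# The first two rows lack a full set of lags and are dropped
features = features.dropna()
print(features)
```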

  35. Avatar
    Sanket September 19, 2021 at 1:05 am #

    I don’t understand why lag value 1 has low feature importance.

    • Avatar
      Adrian Tam September 19, 2021 at 6:11 am #

      That’s what the calculation tells us. It is specific to this particular input data.

  36. Avatar
    Charles February 15, 2022 at 6:19 am #

    Hello, thank you for the amazing tutorial as always. After the Time Series is changed to Supervised Learning, would ARMAX be suitable? Could we view the (T – X) features as exogenous? Or after the data is changed from univariate to Supervised Learning could you also just use Linear Regression?

    • Avatar
      James Carmichael February 15, 2022 at 12:59 pm #

      Hi Charles…I would recommend that you start with ARIMA models and their variants. If such models satisfy your performance criterion, you may not need to move on to deep learning models such as CNNs and LSTMs; however, it would be beneficial to also try those model types for comparison if you have time.

  37. Avatar
    Carlos Abdalad February 22, 2022 at 9:38 am #

    Dear @Jason Brownlee, thanks for the article. It was very inspiring…
    But I was wondering if there is a way to select the optimal lag of a feature? I mean the lag that is most “correlated” with the target, or that has the most predictive power to help understand variations in the target (a “leading indicator”, as economists used to call it). Or do we have to manually create all the lagged features that we think make sense, and then analyse a scatterplot of lagged feature vs. target for a reasonable number of lags, or even a modified ACF/PACF-like graph correlating the lagged feature with the (non-lagged) target?

    I refuse to accept that only a domain expert would be able to indicate, from their own experience, the best lag of each feature to use as a new “predictive” feature.

    Thanks in advance
    Carlos Abdalad

  38. Avatar
    vahid April 10, 2022 at 6:27 am #

    Hello, thank you for the awesome tutorial.
    I have an equation like this (combining multivariate time series and cross-sectional data):

    Yt = Xq + Xp(t-1) + Y(t-1)

    Yt = (X1 + X2 + X3 + ⋯) + ((X1(t-1) + X1(t-2) + X1(t-3) + ⋯) + (X2(t-1) + X2(t-2) + X2(t-3) + ⋯) + ⋯) + (Y(t-1) + Y(t-2) + ⋯)

    Now I have 2 questions:

    For feature selection, can we use Lasso or Ridge? If not, which model can we use instead of Lasso or Ridge with this equation?

    For prediction, which algorithm and model can we use for this equation?

    Thanks a lot.
