The post How to Model Volatility with ARCH and GARCH for Time Series Forecasting in Python appeared first on Machine Learning Mastery.

]]>The ARCH or Autoregressive Conditional Heteroskedasticity method provides a way to model a change in variance in a time series that is time dependent, such as increasing or decreasing volatility. An extension of this approach named GARCH or Generalized Autoregressive Conditional Heteroskedasticity allows the method to support changes in the time dependent volatility, such as increasing and decreasing volatility in the same series.

In this tutorial, you will discover the ARCH and GARCH models for predicting the variance of a time series.

After completing this tutorial, you will know:

- The problem with variance in a time series and the need for ARCH and GARCH models.
- How to configure ARCH and GARCH models.
- How to implement ARCH and GARCH models in Python.

Discover how to prepare and visualize time series data and develop autoregressive forecasting models in my new book, with 28 step-by-step tutorials, and full python code.

Let’s get started.

This tutorial is divided into five parts; they are:

- Problem with Variance
- What Is an ARCH Model?
- What Is a GARCH Model?
- How to Configure ARCH and GARCH Models
- ARCH and GARCH Models in Python

Autoregressive models can be developed for univariate time series data that is stationary (AR), has a trend (ARIMA), and has a seasonal component (SARIMA).

One aspect of a univariate time series that these autoregressive models do not model is a change in the variance over time.

Classically, a time series with modest changes in variance can sometimes be adjusted using a power transform, such as by taking the Log or using a Box-Cox transform.

There are some time series where the variance changes consistently over time. In the context of a time series in the financial domain, this would be called increasing and decreasing volatility.

In time series where the variance is increasing in a systematic way, such as an increasing trend, this property of the series is called heteroskedasticity. It’s a fancy word from statistics that means changing or unequal variance across the series.

If the change in variance can be correlated over time, then it can be modeled using an autoregressive process, such as ARCH.

Autoregressive Conditional Heteroskedasticity, or ARCH, is a method that explicitly models the change in variance over time in a time series.

Specifically, an ARCH method models the variance at a time step as a function of the residual errors from a mean process (e.g. a zero mean).

The ARCH process introduced by Engle (1982) explicitly recognizes the difference between the unconditional and the conditional variance allowing the latter to change over time as a function of past errors.

— Generalized autoregressive conditional heteroskedasticity, 1986.

A lag parameter must be specified to define the number of prior residual errors to include in the model. Using the notation of the GARCH model (discussed later), we can refer to this parameter as “*q*“. Originally, this parameter was called “*p*“, and is also called “*p*” in the arch Python package used later in this tutorial.

**q**: The number of lag squared residual errors to include in the ARCH model.

A generally accepted notation for an ARCH model is to specify the ARCH() function with the q parameter ARCH(q); for example, ARCH(1) would be a first order ARCH model.

The approach expects the series is stationary, other than the change in variance, meaning it does not have a trend or seasonal component. An ARCH model is used to predict the variance at future time steps.

[ARCH] are mean zero, serially uncorrelated processes with nonconstant variances conditional on the past, but constant unconditional variances. For such processes, the recent past gives information about the one-period forecast variance.

In practice, this can be used to model the expected variance on the residuals after another autoregressive model has been used, such as an ARMA or similar.

The model should only be applied to a prewhitened residual series {e_t} that is uncorrelated and contains no trends or seasonal changes, such as might be obtained after fitting a satisfactory SARIMA model.

— Page 148, Introductory Time Series with R, 2009.

Generalized Autoregressive Conditional Heteroskedasticity, or GARCH, is an extension of the ARCH model that incorporates a moving average component together with the autoregressive component.

Specifically, the model includes lag variance terms (e.g. the observations if modeling the white noise residual errors of another process), together with lag residual errors from a mean process.

The introduction of a moving average component allows the model to both model the conditional change in variance over time as well as changes in the time-dependent variance. Examples include conditional increases and decreases in variance.

As such, the model introduces a new parameter “p” that describes the number of lag variance terms:

**p**: The number of lag variances to include in the GARCH model.**q**: The number of lag residual errors to include in the GARCH model.

A generally accepted notation for a GARCH model is to specify the GARCH() function with the *p* and *q* parameters GARCH(p, q); for example GARCH(1, 1) would be a first order GARCH model.

A GARCH model subsumes ARCH models, where a GARCH(0, q) is equivalent to an ARCH(q) model.

For p = 0 the process reduces to the ARCH(q) process, and for p = q = 0 E(t) is simply white noise. In the ARCH(q) process the conditional variance is specified as a linear function of past sample variances only, whereas the GARCH(p, q) process allows lagged conditional variances to enter as well. This corresponds to some sort of adaptive learning mechanism.

— Generalized autoregressive conditional heteroskedasticity, 1986.

As with ARCH, GARCH predicts the future variance and expects that the series is stationary, other than the change in variance, meaning it does not have a trend or seasonal component.

The configuration for an ARCH model is best understood in the context of ACF and PACF plots of the variance of the time series.

This can be achieved by subtracting the mean from each observation in the series and squaring the result, or just squaring the observation if you’re already working with white noise residuals from another model.

If a correlogram appears to be white noise […], then volatility ca be detected by looking at the correlogram of the squared values since the squared values are equivalent to the variance (provided the series is adjusted to have a mean of zero).

— Pages 146-147, Introductory Time Series with R, 2009.

The ACF and PACF plots can then be interpreted to estimate values for p and q, in a similar way as is done for the ARMA model.

For more information on how to do this, see the post:

In this section, we will look at how we can develop ARCH and GARCH models in Python using the arch library.

First, let’s prepare a dataset we can use for these examples.

We can create a dataset with a controlled model of variance.

The simplest case would be a series of random noise where the mean is zero and the variance starts at 0.0 and steadily increases.

We can achieve this in Python using the gauss() function that generates a Gaussian random number with the specified mean and standard deviation.

# create dataset data = [gauss(0, i*0.01) for i in range(1,100+1)]

We can plot the dataset to get an idea of how the linear change in variance looks. The complete example is listed below.

# create a simple white noise with increasing variance from random import gauss from random import seed from matplotlib import pyplot # seed pseudorandom number generator seed(1) # create dataset data = [gauss(0, i*0.01) for i in range(0,100)] # plot pyplot.plot(data) pyplot.show()

Running the example creates and plots the dataset. We can see the clear change in variance over the course of the series.

We know there is an autocorrelation in the variance of the contrived dataset.

Nevertheless, we can look at an autocorrelation plot to confirm this expectation. The complete example is listed below.

# check correlations of squared observations from random import gauss from random import seed from matplotlib import pyplot from statsmodels.graphics.tsaplots import plot_acf # seed pseudorandom number generator seed(1) # create dataset data = [gauss(0, i*0.01) for i in range(0,100)] # square the dataset squared_data = [x**2 for x in data] # create acf plot plot_acf(squared_data) pyplot.show()

Running the example creates an autocorrelation plot of the squared observations. We see significant positive correlation in variance out to perhaps 15 lag time steps.

This might make a reasonable value for the parameter in the ARCH model.

Developing an ARCH model involves three steps:

- Define the model
- Fit the model
- Make a forecast.

Before fitting and forecasting, we can split the dataset into a train and test set so that we can fit the model on the train and evaluate its performance on the test set.

# split into train/test n_test = 10 train, test = data[:-n_test], data[-n_test:]

A model can be defined by calling the arch_model() function. We can specify a model for the mean of the series: in this case *mean=’Zero’* is an appropriate model. We can then specify the model for the variance: in this case *vol=’ARCH’*. We can also specify the lag parameter for the ARCH model: in this case *p=15*.

Note, in the arch library, the names of *p* and *q* parameters for ARCH/GARCH have been reversed.

# define model model = arch_model(train, mean='Zero', vol='ARCH', p=15)

The model can be fit on the data by calling the fit() function. There are many options on this function, although the defaults are good enough for getting started. This will return a fit model.

# fit model model_fit = model.fit()

Finally, we can make a prediction by calling the forecast() function on the fit model. We can specify the horizon for the forecast.

In this case, we will predict the variance for the last 10 time steps of the dataset, and withhold them from the training of the model.

# forecast the test set yhat = model_fit.forecast(horizon=n_test)

We can tie all of this together; the complete example is listed below.

# example of ARCH model from random import gauss from random import seed from matplotlib import pyplot from arch import arch_model # seed pseudorandom number generator seed(1) # create dataset data = [gauss(0, i*0.01) for i in range(0,100)] # split into train/test n_test = 10 train, test = data[:-n_test], data[-n_test:] # define model model = arch_model(train, mean='Zero', vol='ARCH', p=15) # fit model model_fit = model.fit() # forecast the test set yhat = model_fit.forecast(horizon=n_test) # plot the actual variance var = [i*0.01 for i in range(0,100)] pyplot.plot(var[-n_test:]) # plot forecast variance pyplot.plot(yhat.variance.values[-1, :]) pyplot.show()

Running the example defines and fits the model then predicts the variance for the last 10 time steps of the dataset.

A line plot is created comparing the series of expected variance to the predicted variance. Although the model was not tuned, the predicted variance looks reasonable.

We can fit a GARCH model just as easily using the arch library.

The *arch_model()* function can specify a GARCH instead of ARCH model vol=’GARCH’ as well as the lag parameters for both.

# define model model = arch_model(train, mean='Zero', vol='GARCH', p=15, q=15)

The dataset may not be a good fit for a GARCH model given the linearly increasing variance, nevertheless, the complete example is listed below.

# example of ARCH model from random import gauss from random import seed from matplotlib import pyplot from arch import arch_model # seed pseudorandom number generator seed(1) # create dataset data = [gauss(0, i*0.01) for i in range(0,100)] # split into train/test n_test = 10 train, test = data[:-n_test], data[-n_test:] # define model model = arch_model(train, mean='Zero', vol='GARCH', p=15, q=15) # fit model model_fit = model.fit() # forecast the test set yhat = model_fit.forecast(horizon=n_test) # plot the actual variance var = [i*0.01 for i in range(0,100)] pyplot.plot(var[-n_test:]) # plot forecast variance pyplot.plot(yhat.variance.values[-1, :]) pyplot.show()

A plot of the expected and predicted variance is listed below.

This section provides more resources on the topic if you are looking to go deeper.

- Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation, 1982.
- Generalized autoregressive conditional heteroskedasticity, 1986.
- Chapter 7, Non-stationary Models, Introductory Time Series with R, 2009.

- Autoregressive conditional heteroskedasticity on Wikipedia
- Heteroscedasticity on Wikipedia
- What is the difference between GARCH and ARCH?

In this tutorial, you discovered the ARCH and GARCH models for predicting the variance of a time series.

Specifically, you learned:

- The problem with variance in a time series and the need for ARCH and GARCH models.
- How to configure ARCH and GARCH models.
- How to implement ARCH and GARCH models in Python.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Model Volatility with ARCH and GARCH for Time Series Forecasting in Python appeared first on Machine Learning Mastery.

]]>The post A Gentle Introduction to Exponential Smoothing for Time Series Forecasting in Python appeared first on Machine Learning Mastery.

]]>It is a powerful forecasting method that may be used as an alternative to the popular Box-Jenkins ARIMA family of methods.

In this tutorial, you will discover the exponential smoothing method for univariate time series forecasting.

After completing this tutorial, you will know:

- What exponential smoothing is and how it is different from other forecasting methods.
- The three main types of exponential smoothing and how to configure them.
- How to implement exponential smoothing in Python.

Discover how to prepare and visualize time series data and develop autoregressive forecasting models in my new book, with 28 step-by-step tutorials, and full python code.

Let’s get started.

This tutorial is divided into 4 parts; they are:

- What Is Exponential Smoothing?
- Types of Exponential Smoothing
- How to Configure Exponential Smoothing
- Exponential Smoothing in Python

Exponential smoothing is a time series forecasting method for univariate data.

Time series methods like the Box-Jenkins ARIMA family of methods develop a model where the prediction is a weighted linear sum of recent past observations or lags.

Exponential smoothing forecasting methods are similar in that a prediction is a weighted sum of past observations, but the model explicitly uses an exponentially decreasing weight for past observations.

Specifically, past observations are weighted with a geometrically decreasing ratio.

Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation the higher the associated weight.

— Page 171, Forecasting: principles and practice, 2013.

Exponential smoothing methods may be considered as peers and an alternative to the popular Box-Jenkins ARIMA class of methods for time series forecasting.

Collectively, the methods are sometimes referred to as ETS models, referring to the explicit modeling of Error, Trend and Seasonality.

There are three main types of exponential smoothing time series forecasting methods.

A simple method that assumes no systematic structure, an extension that explicitly handles trends, and the most advanced approach that add support for seasonality.

Single Exponential Smoothing, SES for short, also called Simple Exponential Smoothing, is a time series forecasting method for univariate data without a trend or seasonality.

It requires a single parameter, called *alpha* (*a*), also called the smoothing factor or smoothing coefficient.

This parameter controls the rate at which the influence of the observations at prior time steps decay exponentially. Alpha is often set to a value between 0 and 1. Large values mean that the model pays attention mainly to the most recent past observations, whereas smaller values mean more of the history is taken into account when making a prediction.

A value close to 1 indicates fast learning (that is, only the most recent values influence the forecasts), whereas a value close to 0 indicates slow learning (past observations have a large influence on forecasts).

— Page 89, Practical Time Series Forecasting with R, 2016.

Hyperparameters:

**Alpha**: Smoothing factor for the level.

Double Exponential Smoothing is an extension to Exponential Smoothing that explicitly adds support for trends in the univariate time series.

In addition to the *alpha* parameter for controlling smoothing factor for the level, an additional smoothing factor is added to control the decay of the influence of the change in trend called *beta* (*b*).

The method supports trends that change in different ways: an additive and a multiplicative, depending on whether the trend is linear or exponential respectively.

Double Exponential Smoothing with an additive trend is classically referred to as Holt’s linear trend model, named for the developer of the method Charles Holt.

**Additive Trend**: Double Exponential Smoothing with a linear trend.**Multiplicative Trend**: Double Exponential Smoothing with an exponential trend.

For longer range (multi-step) forecasts, the trend may continue on unrealistically. As such, it can be useful to dampen the trend over time.

Dampening means reducing the size of the trend over future time steps down to a straight line (no trend).

The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indecently into the future. Even more extreme are the forecasts generated by the exponential trend method […] Motivated by this observation […] introduced a parameter that “dampens” the trend to a flat line some time in the future.

— Page 183, Forecasting: principles and practice, 2013.

As with modeling the trend itself, we can use the same principles in dampening the trend, specifically additively or multiplicatively for a linear or exponential dampening effect. A damping coefficient *Phi* (*p*) is used to control the rate of dampening.

**Additive Dampening**: Dampen a trend linearly.**Multiplicative Dampening**: Dampen the trend exponentially.

Hyperparameters:

**Alpha**: Smoothing factor for the level.**Beta**: Smoothing factor for the trend.**Trend Type**: Additive or multiplicative.**Dampen Type**: Additive or multiplicative.**Phi**: Damping coefficient.

Triple Exponential Smoothing is an extension of Exponential Smoothing that explicitly adds support for seasonality to the univariate time series.

This method is sometimes called Holt-Winters Exponential Smoothing, named for two contributors to the method: Charles Holt and Peter Winters.

In addition to the alpha and beta smoothing factors, a new parameter is added called *gamma* (*g*) that controls the influence on the seasonal component.

As with the trend, the seasonality may be modeled as either an additive or multiplicative process for a linear or exponential change in the seasonality.

**Additive Seasonality**: Triple Exponential Smoothing with a linear seasonality.**Multiplicative Seasonality**: Triple Exponential Smoothing with an exponential seasonality.

Triple exponential smoothing is the most advanced variation of exponential smoothing and through configuration, it can also develop double and single exponential smoothing models.

Being an adaptive method, Holt-Winter’s exponential smoothing allows the level, trend and seasonality patterns to change over time.

— Page 95, Practical Time Series Forecasting with R, 2016.

Additionally, to ensure that the seasonality is modeled correctly, the number of time steps in a seasonal period (*Period*) must be specified. For example, if the series was monthly data and the seasonal period repeated each year, then the Period=12.

Hyperparameters:

**Alpha**: Smoothing factor for the level.**Beta**: Smoothing factor for the trend.**Gamma**: Smoothing factor for the seasonality.**Trend Type**: Additive or multiplicative.**Dampen Type**: Additive or multiplicative.**Phi**: Damping coefficient.**Seasonality Type**: Additive or multiplicative.**Period**: Time steps in seasonal period.

All of the model hyperparameters can be specified explicitly.

This can be challenging for experts and beginners alike.

Instead, it is common to use numerical optimization to search for and fund the smoothing coefficients (*alpha*, *beta*, *gamma*, and *phi*) for the model that result in the lowest error.

[…] a more robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data. […] the unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the SSE [sum of the squared errors].

— Page 177, Forecasting: principles and practice, 2013.

The parameters that specify the type of change in the trend and seasonality, such as weather they are additive or multiplicative and whether they should be dampened, must be specified explicitly.

This section looks at how to implement exponential smoothing in Python.

The implementations of Exponential Smoothing in Python are provided in the Statsmodels Python library.

The implementations are based on the description of the method in Rob Hyndman and George Athanasopoulos’ excellent book “Forecasting: Principles and Practice,” 2013 and their R implementations in their “forecast” package.

Single Exponential Smoothing or simple smoothing can be implemented in Python via the SimpleExpSmoothing Statsmodels class.

First, an instance of the *SimpleExpSmoothing* class must be instantiated and passed the training data. The *fit()* function is then called providing the fit configuration, specifically the *alpha* value called *smoothing_level*. If this is not provided or set to *None*, the model will automatically optimize the value.

This *fit()* function returns an instance of the *HoltWintersResults* class that contains the learned coefficients. The *forecast()* or the *predict()* function on the result object can be called to make a forecast.

For example:

# single exponential smoothing ... from statsmodels.tsa.holtwinters import SimpleExpSmoothing # prepare data data = ... # create class model = SimpleExpSmoothing(data) # fit model model_fit = model.fit(...) # make prediction yhat = model_fit.predict(...)

Single, Double and Triple Exponential Smoothing can be implemented in Python using the ExponentialSmoothing Statsmodels class.

First, an instance of the ExponentialSmoothing class must be instantiated, specifying both the training data and some configuration for the model.

Specifically, you must specify the following configuration parameters:

**trend**: The type of trend component, as either “*add*” for additive or “*mul*” for multiplicative. Modeling the trend can be disabled by setting it to None.**damped**: Whether or not the trend component should be damped, either*True*or*False*.**seasonal**: The type of seasonal component, as either “*add*” for additive or “*mul*” for multiplicative. Modeling the seasonal component can be disabled by setting it to None.**seasonal_periods**: The number of time steps in a seasonal period, e.g. 12 for 12 months in a yearly seasonal structure (more here).

The model can then be fit on the training data by calling the *fit()* function.

This function allows you to either specify the smoothing coefficients of the exponential smoothing model or have them optimized. By default, they are optimized (e.g. *optimized=True*). These coefficients include:

**smoothing_level**(*alpha*): the smoothing coefficient for the level.**smoothing_slope**(*beta*): the smoothing coefficient for the trend.**smoothing_seasonal**(*gamma*): the smoothing coefficient for the seasonal component.**damping_slope**(*phi*): the coefficient for the damped trend.

Additionally, the fit function can perform basic data preparation prior to modeling; specifically:

**use_boxcox**: Whether or not to perform a power transform of the series (True/False) or specify the lambda for the transform.

The *fit()* function will return an instance of the *HoltWintersResults* class that contains the learned coefficients. The *forecast()* or the *predict()* function on the result object can be called to make a forecast.

# double or triple exponential smoothing ... from statsmodels.tsa.holtwinters import ExponentialSmoothing # prepare data data = ... # create class model = ExponentialSmoothing(data, ...) # fit model model_fit = model.fit(...) # make prediction yhat = model_fit.predict(...)

This section provides more resources on the topic if you are looking to go deeper.

- Chapter 7 Exponential smoothing, Forecasting: principles and practice, 2013.
- Section 6.4. Introduction to Time Series Analysis, Engineering Statistics Handbook, 2012.
- Practical Time Series Forecasting with R, 2016.

- Statsmodels Time Series analysis tsa
- statsmodels.tsa.holtwinters.SimpleExpSmoothing API
- statsmodels.tsa.holtwinters.ExponentialSmoothing API
- statsmodels.tsa.holtwinters.HoltWintersResults API
- forecast: Forecasting Functions for Time Series and Linear Models R package

In this tutorial, you discovered the exponential smoothing method for univariate time series forecasting.

Specifically, you learned:

- What exponential smoothing is and how it is different from other forecast methods.
- The three main types of exponential smoothing and how to configure them.
- How to implement exponential smoothing in Python.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post A Gentle Introduction to Exponential Smoothing for Time Series Forecasting in Python appeared first on Machine Learning Mastery.

]]>The post A Gentle Introduction to SARIMA for Time Series Forecasting in Python appeared first on Machine Learning Mastery.

]]>Although the method can handle data with a trend, it does not support time series with a seasonal component.

An extension to ARIMA that supports the direct modeling of the seasonal component of the series is called SARIMA.

In this tutorial, you will discover the Seasonal Autoregressive Integrated Moving Average, or SARIMA, method for time series forecasting with univariate data containing trends and seasonality.

After completing this tutorial, you will know:

- The limitations of ARIMA when it comes to seasonal data.
- The SARIMA extension of ARIMA that explicitly models the seasonal element in univariate data.
- How to implement the SARIMA method in Python using the Statsmodels library.

Discover how to prepare and visualize time series data and develop autoregressive forecasting models in my new book, with 28 step-by-step tutorials, and full python code.

Let’s get started.

**Update**: For help using and grid searching SARIMA hyperparameters, see this post:

This tutorial is divided into four parts; they are:

- What’s Wrong with ARIMA
- What Is SARIMA?
- How to Configure SARIMA
- How to use SARIMA in Python

Autoregressive Integrated Moving Average, or ARIMA, is a forecasting method for univariate time series data.

As its name suggests, it supports both an autoregressive and moving average elements. The integrated element refers to differencing allowing the method to support time series data with a trend.

A problem with ARIMA is that it does not support seasonal data. That is a time series with a repeating cycle.

ARIMA expects data that is either not seasonal or has the seasonal component removed, e.g. seasonally adjusted via methods such as seasonal differencing.

For more on ARIMA, see the post:

An alternative is to use SARIMA.

Seasonal Autoregressive Integrated Moving Average, SARIMA or Seasonal ARIMA, is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component.

It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.

A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA […] The seasonal part of the model consists of terms that are very similar to the non-seasonal components of the model, but they involve backshifts of the seasonal period.

— Page 242, Forecasting: principles and practice, 2013.

Configuring a SARIMA requires selecting hyperparameters for both the trend and seasonal elements of the series.

There are three trend elements that require configuration.

They are the same as the ARIMA model; specifically:

**p**: Trend autoregression order.**d**: Trend difference order.**q**: Trend moving average order.

There are four seasonal elements that are not part of ARIMA that must be configured; they are:

**P**: Seasonal autoregressive order.**D**: Seasonal difference order.**Q**: Seasonal moving average order.**m**: The number of time steps for a single seasonal period.

Together, the notation for an SARIMA model is specified as:

SARIMA(p,d,q)(P,D,Q)m

Where the specifically chosen hyperparameters for a model are specified; for example:

SARIMA(3,1,0)(1,1,0)12

Importantly, the *m* parameter influences the *P*, *D*, and *Q* parameters. For example, an m of 12 for monthly data suggests a yearly seasonal cycle.

A *P*=1 would make use of the first seasonally offset observation in the model, e.g. t-(m*1) or t-12. A *P*=2, would use the last two seasonally offset observations t-(m * 1), t-(m * 2).

Similarly, a *D* of 1 would calculate a first order seasonal difference and a *Q*=1 would use a first order errors in the model (e.g. moving average).

A seasonal ARIMA model uses differencing at a lag equal to the number of seasons (s) to remove additive seasonal effects. As with lag 1 differencing to remove a trend, the lag s differencing introduces a moving average term. The seasonal ARIMA model includes autoregressive and moving average terms at lag s.

— Page 142, Introductory Time Series with R, 2009.

The trend elements can be chosen through careful analysis of ACF and PACF plots looking at the correlations of recent time steps (e.g. 1, 2, 3).

Similarly, ACF and PACF plots can be analyzed to specify values for the seasonal model by looking at correlation at seasonal lag time steps.

For more on interpreting ACF/PACF plots, see the post:

Seasonal ARIMA models can potentially have a large number of parameters and combinations of terms. Therefore, it is appropriate to try out a wide range of models when fitting to data and choose a best fitting model using an appropriate criterion …

— Pages 143-144, Introductory Time Series with R, 2009.

Alternately, a grid search can be used across the trend and seasonal hyperparameters.

For more on grid searching SARIMA parameters, see the post:

The SARIMA time series forecasting method is supported in Python via the Statsmodels library.

To use SARIMA there are three steps, they are:

- Define the model.
- Fit the defined model.
- Make a prediction with the fit model.

Let’s look at each step in turn.

An instance of the SARIMAX class can be created by providing the training data and a host of model configuration parameters.

# specify training data data = ... # define model model = SARIMAX(data, ...)

The implementation is called SARIMAX instead of SARIMA because the “X” addition to the method name means that the implementation also supports exogenous variables.

These are parallel time series variates that are not modeled directly via AR, I, or MA processes, but are made available as a weighted input to the model.

Exogenous variables are optional can be specified via the “*exog*” argument.

# specify training data data = ... # specify additional data other_data = ... # define model model = SARIMAX(data, exog=other_data, ...)

The trend and seasonal hyperparameters are specified as 3 and 4 element tuples respectively to the “*order*” and “*seasonal_order*” arguments.

These elements must be specified.

# specify training data data = ... # define model configuration my_order = (1, 1, 1) my_seasonal_order = (1, 1, 1, 12) # define model model = SARIMAX(data, order=my_order, seasonal_order=my_seasonal_order, ...)

These are the main configuration elements.

There are other fine tuning parameters you may want to configure. Learn more in the full API:

Once the model is created, it can be fit on the training data.

The model is fit by calling the fit() function.

Fitting the model returns an instance of the *SARIMAXResults* class. This object contains the details of the fit, such as the data and coefficients, as well as functions that can be used to make use of the model.

# specify training data data = ... # define model model = SARIMAX(data, order=..., seasonal_order=...) # fit model model_fit = model.fit()

Many elements of the fitting process can be configured, and it is worth reading the API to review these options once you are comfortable with the implementation.

Once fit, the model can be used to make a forecast.

A forecast can be made by calling the *forecast()* or the *predict()* functions on the *SARIMAXResults* object returned from calling fit.

The forecast() function takes a single parameter that specifies the number of out of sample time steps to forecast, or assumes a one step forecast if no arguments are provided.

# specify training data data = ... # define model model = SARIMAX(data, order=..., seasonal_order=...) # fit model model_fit = model.fit() # one step forecast yhat = model_fit.forecast()

The *predict()* function requires a start and end date or index to be specified.

Additionally, if exogenous variables were provided when defining the model, they too must be provided for the forecast period to the *predict()* function.

# specify training data data = ... # define model model = SARIMAX(data, order=..., seasonal_order=...) # fit model model_fit = model.fit() # one step forecast yhat = model_fit.predict(start=len(data), end=len(data))

This section provides more resources on the topic if you are looking to go deeper.

- How to Grid Search SARIMA Model Hyperparameters for Time Series Forecasting
- How to Create an ARIMA Model for Time Series Forecasting with Python
- How to Grid Search ARIMA Model Hyperparameters with Python
- A Gentle Introduction to Autocorrelation and Partial Autocorrelation

- Chapter 8 ARIMA models, Forecasting: principles and practice, 2013.
- Chapter 7, Non-stationary Models, Introductory Time Series with R, 2009.

- Statsmodels Time Series Analysis by State Space Methods
- statsmodels.tsa.statespace.sarimax.SARIMAX API
- statsmodels.tsa.statespace.sarimax.SARIMAXResults API
- Statsmodels SARIMAX Notebook

In this tutorial, you discovered the Seasonal Autoregressive Integrated Moving Average, or SARIMA, method for time series forecasting with univariate data containing trends and seasonality.

Specifically, you learned:

- The limitations of ARIMA when it comes to seasonal data.
- The SARIMA extension of ARIMA that explicitly models the seasonal element in univariate data.
- How to implement the SARIMA method in Python using the Statsmodels library.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post A Gentle Introduction to SARIMA for Time Series Forecasting in Python appeared first on Machine Learning Mastery.

]]>The post 11 Classical Time Series Forecasting Methods in Python (Cheat Sheet) appeared first on Machine Learning Mastery.

]]>Before exploring machine learning methods for time series, it is a good idea to ensure you have exhausted classical linear time series forecasting methods. Classical time series forecasting methods may be focused on linear relationships, nevertheless, they are sophisticated and perform well on a wide range of problems, assuming that your data is suitably prepared and the method is well configured.

In this post, will you will discover a suite of classical methods for time series forecasting that you can test on your forecasting problem prior to exploring to machine learning methods.

The post is structured as a cheat sheet to give you just enough information on each method to get started with a working code example and where to look to get more information on the method.

All code examples are in Python and use the Statsmodels library. The APIs for this library can be tricky for beginners (trust me!), so having a working code example as a starting point will greatly accelerate your progress.

This is a large post; you may want to bookmark it.

Let’s get started.

**Updated Apr/2020**: Changed AR to AutoReg due to API change.

This cheat sheet demonstrates 11 different classical time series forecasting methods; they are:

- Autoregression (AR)
- Moving Average (MA)
- Autoregressive Moving Average (ARMA)
- Autoregressive Integrated Moving Average (ARIMA)
- Seasonal Autoregressive Integrated Moving-Average (SARIMA)
- Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
- Vector Autoregression (VAR)
- Vector Autoregression Moving-Average (VARMA)
- Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)
- Simple Exponential Smoothing (SES)
- Holt Winter’s Exponential Smoothing (HWES)

Did I miss your favorite classical time series forecasting method?

Let me know in the comments below.

Each method is presented in a consistent manner.

This includes:

**Description**. A short and precise description of the technique.**Python Code**. A short working example of fitting the model and making a prediction in Python.**More Information**. References for the API and the algorithm.

Each code example is demonstrated on a simple contrived dataset that may or may not be appropriate for the method. Replace the contrived dataset with your data in order to test the method.

Remember: each method will require tuning to your specific problem. In many cases, I have examples of how to configure and even grid search parameters on the blog already, try the search function.

If you find this cheat sheet useful, please let me know in the comments below.

The autoregression (AR) method models the next step in the sequence as a linear function of the observations at prior time steps.

The notation for the model involves specifying the order of the model p as a parameter to the AR function, e.g. AR(p). For example, AR(1) is a first-order autoregression model.

The method is suitable for univariate time series without trend and seasonal components.

# AR example from statsmodels.tsa.ar_model import AutoReg from random import random # contrived dataset data = [x + random() for x in range(1, 100)] # fit model model = AutoReg(data, lags=1) model_fit = model.fit() # make prediction yhat = model_fit.predict(len(data), len(data)) print(yhat)

- statsmodels.tsa.ar_model.AutoReg API
- statsmodels.tsa.ar_model.AutoRegResults API
- Autoregressive model on Wikipedia

The moving average (MA) method models the next step in the sequence as a linear function of the residual errors from a mean process at prior time steps.

A moving average model is different from calculating the moving average of the time series.

The notation for the model involves specifying the order of the model q as a parameter to the MA function, e.g. MA(q). For example, MA(1) is a first-order moving average model.

The method is suitable for univariate time series without trend and seasonal components.

We can use the ARMA class to create an MA model and setting a zeroth-order AR model. We must specify the order of the MA model in the order argument.

# MA example from statsmodels.tsa.arima_model import ARMA from random import random # contrived dataset data = [x + random() for x in range(1, 100)] # fit model model = ARMA(data, order=(0, 1)) model_fit = model.fit(disp=False) # make prediction yhat = model_fit.predict(len(data), len(data)) print(yhat)

- statsmodels.tsa.arima_model.ARMA API
- statsmodels.tsa.arima_model.ARMAResults API
- Moving-average model on Wikipedia

The Autoregressive Moving Average (ARMA) method models the next step in the sequence as a linear function of the observations and resiudal errors at prior time steps.

It combines both Autoregression (AR) and Moving Average (MA) models.

The notation for the model involves specifying the order for the AR(p) and MA(q) models as parameters to an ARMA function, e.g. ARMA(p, q). An ARIMA model can be used to develop AR or MA models.

The method is suitable for univariate time series without trend and seasonal components.

# ARMA example from statsmodels.tsa.arima_model import ARMA from random import random # contrived dataset data = [random() for x in range(1, 100)] # fit model model = ARMA(data, order=(2, 1)) model_fit = model.fit(disp=False) # make prediction yhat = model_fit.predict(len(data), len(data)) print(yhat)

- statsmodels.tsa.arima_model.ARMA API
- statsmodels.tsa.arima_model.ARMAResults API
- Autoregressive–moving-average model on Wikipedia

The Autoregressive Integrated Moving Average (ARIMA) method models the next step in the sequence as a linear function of the differenced observations and residual errors at prior time steps.

It combines both Autoregression (AR) and Moving Average (MA) models as well as a differencing pre-processing step of the sequence to make the sequence stationary, called integration (I).

The notation for the model involves specifying the order for the AR(p), I(d), and MA(q) models as parameters to an ARIMA function, e.g. ARIMA(p, d, q). An ARIMA model can also be used to develop AR, MA, and ARMA models.

The method is suitable for univariate time series with trend and without seasonal components.

# ARIMA example from statsmodels.tsa.arima_model import ARIMA from random import random # contrived dataset data = [x + random() for x in range(1, 100)] # fit model model = ARIMA(data, order=(1, 1, 1)) model_fit = model.fit(disp=False) # make prediction yhat = model_fit.predict(len(data), len(data), typ='levels') print(yhat)

- statsmodels.tsa.arima_model.ARIMA API
- statsmodels.tsa.arima_model.ARIMAResults API
- Autoregressive integrated moving average on Wikipedia

The Seasonal Autoregressive Integrated Moving Average (SARIMA) method models the next step in the sequence as a linear function of the differenced observations, errors, differenced seasonal observations, and seasonal errors at prior time steps.

It combines the ARIMA model with the ability to perform the same autoregression, differencing, and moving average modeling at the seasonal level.

The notation for the model involves specifying the order for the AR(p), I(d), and MA(q) models as parameters to an ARIMA function and AR(P), I(D), MA(Q) and m parameters at the seasonal level, e.g. SARIMA(p, d, q)(P, D, Q)m where “m” is the number of time steps in each season (the seasonal period). A SARIMA model can be used to develop AR, MA, ARMA and ARIMA models.

The method is suitable for univariate time series with trend and/or seasonal components.

# SARIMA example from statsmodels.tsa.statespace.sarimax import SARIMAX from random import random # contrived dataset data = [x + random() for x in range(1, 100)] # fit model model = SARIMAX(data, order=(1, 1, 1), seasonal_order=(1, 1, 1, 1)) model_fit = model.fit(disp=False) # make prediction yhat = model_fit.predict(len(data), len(data)) print(yhat)

- statsmodels.tsa.statespace.sarimax.SARIMAX API
- statsmodels.tsa.statespace.sarimax.SARIMAXResults API
- Autoregressive integrated moving average on Wikipedia

The Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX) is an extension of the SARIMA model that also includes the modeling of exogenous variables.

Exogenous variables are also called covariates and can be thought of as parallel input sequences that have observations at the same time steps as the original series. The primary series may be referred to as endogenous data to contrast it from the exogenous sequence(s). The observations for exogenous variables are included in the model directly at each time step and are not modeled in the same way as the primary endogenous sequence (e.g. as an AR, MA, etc. process).

The SARIMAX method can also be used to model the subsumed models with exogenous variables, such as ARX, MAX, ARMAX, and ARIMAX.

The method is suitable for univariate time series with trend and/or seasonal components and exogenous variables.

# SARIMAX example from statsmodels.tsa.statespace.sarimax import SARIMAX from random import random # contrived dataset data1 = [x + random() for x in range(1, 100)] data2 = [x + random() for x in range(101, 200)] # fit model model = SARIMAX(data1, exog=data2, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0)) model_fit = model.fit(disp=False) # make prediction exog2 = [200 + random()] yhat = model_fit.predict(len(data1), len(data1), exog=[exog2]) print(yhat)

- statsmodels.tsa.statespace.sarimax.SARIMAX API
- statsmodels.tsa.statespace.sarimax.SARIMAXResults API
- Autoregressive integrated moving average on Wikipedia

The Vector Autoregression (VAR) method models the next step in each time series using an AR model. It is the generalization of AR to multiple parallel time series, e.g. multivariate time series.

The notation for the model involves specifying the order for the AR(p) model as parameters to a VAR function, e.g. VAR(p).

The method is suitable for multivariate time series without trend and seasonal components.

# VAR example from statsmodels.tsa.vector_ar.var_model import VAR from random import random # contrived dataset with dependency data = list() for i in range(100): v1 = i + random() v2 = v1 + random() row = [v1, v2] data.append(row) # fit model model = VAR(data) model_fit = model.fit() # make prediction yhat = model_fit.forecast(model_fit.y, steps=1) print(yhat)

- statsmodels.tsa.vector_ar.var_model.VAR API
- statsmodels.tsa.vector_ar.var_model.VARResults API
- Vector autoregression on Wikipedia

The Vector Autoregression Moving-Average (VARMA) method models the next step in each time series using an ARMA model. It is the generalization of ARMA to multiple parallel time series, e.g. multivariate time series.

The notation for the model involves specifying the order for the AR(p) and MA(q) models as parameters to a VARMA function, e.g. VARMA(p, q). A VARMA model can also be used to develop VAR or VMA models.

The method is suitable for multivariate time series without trend and seasonal components.

# VARMA example from statsmodels.tsa.statespace.varmax import VARMAX from random import random # contrived dataset with dependency data = list() for i in range(100): v1 = random() v2 = v1 + random() row = [v1, v2] data.append(row) # fit model model = VARMAX(data, order=(1, 1)) model_fit = model.fit(disp=False) # make prediction yhat = model_fit.forecast() print(yhat)

- statsmodels.tsa.statespace.varmax.VARMAX API
- statsmodels.tsa.statespace.varmax.VARMAXResults
- Vector autoregression on Wikipedia

The Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX) is an extension of the VARMA model that also includes the modeling of exogenous variables. It is a multivariate version of the ARMAX method.

Exogenous variables are also called covariates and can be thought of as parallel input sequences that have observations at the same time steps as the original series. The primary series(es) are referred to as endogenous data to contrast it from the exogenous sequence(s). The observations for exogenous variables are included in the model directly at each time step and are not modeled in the same way as the primary endogenous sequence (e.g. as an AR, MA, etc. process).

The VARMAX method can also be used to model the subsumed models with exogenous variables, such as VARX and VMAX.

The method is suitable for multivariate time series without trend and seasonal components with exogenous variables.

# VARMAX example from statsmodels.tsa.statespace.varmax import VARMAX from random import random # contrived dataset with dependency data = list() for i in range(100): v1 = random() v2 = v1 + random() row = [v1, v2] data.append(row) data_exog = [x + random() for x in range(100)] # fit model model = VARMAX(data, exog=data_exog, order=(1, 1)) model_fit = model.fit(disp=False) # make prediction data_exog2 = [[100]] yhat = model_fit.forecast(exog=data_exog2) print(yhat)

- statsmodels.tsa.statespace.varmax.VARMAX API
- statsmodels.tsa.statespace.varmax.VARMAXResults
- Vector autoregression on Wikipedia

The Simple Exponential Smoothing (SES) method models the next time step as an exponentially weighted linear function of observations at prior time steps.

The method is suitable for univariate time series without trend and seasonal components.

# SES example from statsmodels.tsa.holtwinters import SimpleExpSmoothing from random import random # contrived dataset data = [x + random() for x in range(1, 100)] # fit model model = SimpleExpSmoothing(data) model_fit = model.fit() # make prediction yhat = model_fit.predict(len(data), len(data)) print(yhat)

- statsmodels.tsa.holtwinters.SimpleExpSmoothing API
- statsmodels.tsa.holtwinters.HoltWintersResults API
- Exponential smoothing on Wikipedia

The Holt Winter’s Exponential Smoothing (HWES) also called the Triple Exponential Smoothing method models the next time step as an exponentially weighted linear function of observations at prior time steps, taking trends and seasonality into account.

The method is suitable for univariate time series with trend and/or seasonal components.

# HWES example from statsmodels.tsa.holtwinters import ExponentialSmoothing from random import random # contrived dataset data = [x + random() for x in range(1, 100)] # fit model model = ExponentialSmoothing(data) model_fit = model.fit() # make prediction yhat = model_fit.predict(len(data), len(data)) print(yhat)

- statsmodels.tsa.holtwinters.ExponentialSmoothing API
- statsmodels.tsa.holtwinters.HoltWintersResults API
- Exponential smoothing on Wikipedia

This section provides more resources on the topic if you are looking to go deeper.

In this post, you discovered a suite of classical time series forecasting methods that you can test and tune on your time series dataset.

Did I miss your favorite classical time series forecasting method?

Let me know in the comments below.

Did you try any of these methods on your dataset?

Let me know about your findings in the comments.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post 11 Classical Time Series Forecasting Methods in Python (Cheat Sheet) appeared first on Machine Learning Mastery.

]]>The post A Standard Multivariate, Multi-Step, and Multi-Site Time Series Forecasting Problem appeared first on Machine Learning Mastery.

]]>In this post, you will discover a standardized yet complex time series forecasting problem that has these properties, but is small and sufficiently well understood that it can be used to explore and better understand methods for developing forecasting models on challenging datasets.

After reading this post, you will know:

- The competition and motivation for addressing the air-quality dataset.
- An overview of the defined prediction problem and the data challenges it covers.
- A description of the free data files that you can download and start working with immediately.

Let’s get started.

The dataset was used as the center of a Kaggle competition.

Specifically, a 24-hour hackathon hosted by Data Science London and Data Science Global as part of a Big Data Week event, two organizations that don’t seem to exist now, 6 years later.

The competition involved a multi-thousand-dollar cash prize, and the dataset was provided by the Cook County, Illinois local government, suggesting all locations mentioned in the dataset are in that locality.

The motivation for the challenge is to develop a better model for predicting air quality, taken from the competition description:

The EPA’s Air Quality Index is used daily by people suffering from asthma and other respiratory diseases to avoid dangerous levels of outdoor air pollutants, which can trigger attacks. According to the World Health Organisation there are now estimated to be 235 million people suffering from asthma. Globally, it is now the most common chronic disease among children, with incidence in the US doubling since 1980.

The competition description suggests that winning models could be used as the basis for a new air-quality prediction system, although it is not clear if any models were ever transitioned for this purpose.

The competition was won by a Kaggle employee, Ben Hamner, who presumably did not collect the prize given the conflict of interest. Ben described his winning approach in the blog post titled “Chucking everything into a Random Forest: Ben Hamner on Winning The Air Quality Prediction Hackathon” and provided his code on GitHub.

There is also a good discussion of solutions and related code in this forum post titled “General approaches to partitioning the models?“.

The data describes a multi-step forecasting problem given a multivariate time series across multiple sites or physical locations.

Given multiple weather measurements over time, predict a sequence of air quality measurements at specific future time intervals across multiple physical locations.

It is a challenging time series forecasting problem that has a lot of the qualities of real-world forecasting:

**Incomplete data**. Not all weather and air quality measures are available for all locations.**Missing data**. Not all available measures have a complete history.**Multivariate inputs**: The model inputs for each forecast are comprised of multiple weather observations.**Multi-step outputs**: The model outputs are a discontiguous sequence of forecasted air quality measures.**Multi-site outputs**: The mode must output a multi-step forecast for multiple physical sites.

The dataset is available for free from the Kaggle website.

You must create an account and sign-in with Kaggle before you can get access to download the dataset.

The dataset can be downloaded from here:

There are 4 files of interest that you must download separately; they are:

This file contains a list of site locations marked by unique identifiers and their precise location on Earth measured by longitude and latitude.

All coordinates appear to be relatively close in the North-Western Hemisphere, e.g. America.

Below is a sample of the file.

"SITE_ID","LATITUDE","LONGITUDE" 1,41.6709918952829,-87.7324568962847 32,41.755832412403,-87.545349670582 50,41.7075695897648,-87.5685738570845 57,41.9128621248178,-87.7227234452095 64,41.7907868783739,-87.6016464917605 ...

This file has the same format as *SiteLocations.csv* and appears to list all of the same locations as that file with some additional locations.

As the filename suggests, it is just an updated version of the list of sites.

Below is a sample of the file.

"SITE_ID","LATITUDE","LONGITUDE" 1,41.6709918952829,-87.7324568962847 14,41.834243,-87.6238 22,41.6871654376343,-87.5393154841479 32,41.755832412403,-87.545349670582 50,41.7075695897648,-87.5685738570845 ...

This file contains the training data for modeling.

The data is presented in an unnormalized manner. Each row of data contains one set of meteorological measurements for one hour across multiple locations as well as the targets or outcomes for each location for that hour.

The measures include:

- Time information, including the block of time, the index within the contiguous block of time, the average month, day of the week, and hour of the day.
- Wind measurements such as direction and speed.
- Temperature measurements such as minimum and maximum ambient temperature.
- Pressure measurements such as minimum and maximum barometric pressure.

The target variables are a collection of different air quality or pollution measures at different physical locations.

Not all locations have all weather measurements and not all locations are concerned with all target measures. Further, for those recorded variables, there are missing values marked as NA.

Below is a sample of the file.

"rowID","chunkID","position_within_chunk","month_most_common","weekday","hour","Solar.radiation_64","WindDirection..Resultant_1","WindDirection..Resultant_1018","WindSpeed..Resultant_1","WindSpeed..Resultant_1018","Ambient.Max.Temperature_14","Ambient.Max.Temperature_22","Ambient.Max.Temperature_50","Ambient.Max.Temperature_52","Ambient.Max.Temperature_57","Ambient.Max.Temperature_76","Ambient.Max.Temperature_2001","Ambient.Max.Temperature_3301","Ambient.Max.Temperature_6005","Ambient.Min.Temperature_14","Ambient.Min.Temperature_22","Ambient.Min.Temperature_50","Ambient.Min.Temperature_52","Ambient.Min.Temperature_57","Ambient.Min.Temperature_76","Ambient.Min.Temperature_2001","Ambient.Min.Temperature_3301","Ambient.Min.Temperature_6005","Sample.Baro.Pressure_14","Sample.Baro.Pressure_22","Sample.Baro.Pressure_50","Sample.Baro.Pressure_52","Sample.Baro.Pressure_57","Sample.Baro.Pressure_76","Sample.Baro.Pressure_2001","Sample.Baro.Pressure_3301","Sample.Baro.Pressure_6005","Sample.Max.Baro.Pressure_14","Sample.Max.Baro.Pressure_22","Sample.Max.Baro.Pressure_50","Sample.Max.Baro.Pressure_52","Sample.Max.Baro.Pressure_57","Sample.Max.Baro.Pressure_76","Sample.Max.Baro.Pressure_2001","Sample.Max.Baro.Pressure_3301","Sample.Max.Baro.Pressure_6005","Sample.Min.Baro.Pressure_14","Sample.Min.Baro.Pressure_22","Sample.Min.Baro.Pressure_50","Sample.Min.Baro.Pressure_52","Sample.Min.Baro.Pressure_57","Sample.Min.Baro.Pressure_76","Sample.Min.Baro.Pressure_2001","Sample.Min.Baro.Pressure_3301","Sample.Min.Baro.Pressure_6005","target_1_57","target_10_4002","target_10_8003","target_11_1","target_11_32","target_11_50","target_11_64","target_11_1003","target_11_1601","target_11_4002","target_11_8003","target_14_4002","target_14_8003","target_15_57","target_2_57","target_3_1","target_3_50","target_3_57","target_3_1601","target_3_4002","target_3_6006","target_4_1","target_4_50","target_4_57","target_4_1018","target_4_1601","target_4_2001","target_4_4002","target_4_4101","target_4_6006","target_4_8003","target_5_6006","target_7_57","target_8_57","target_8_4002","target_8_6004","target_8_8003","target_9_4002","target_9_8003" 1,1,1,10,"Saturday",21,0.01,117,187,0.3,0.3,NA,NA,NA,14.9,NA,NA,NA,NA,NA,NA,NA,NA,5.8,NA,NA,NA,NA,NA,NA,NA,NA,747,NA,NA,NA,NA,NA,NA,NA,NA,750,NA,NA,NA,NA,NA,NA,NA,NA,743,NA,NA,NA,NA,NA,2.67923294292042,6.1816228132982,NA,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,NA,2.38965627997991,NA,5.56815355612325,0.690015329704154,NA,NA,NA,NA,NA,NA,2.84349016287551,0.0920223353681394,1.69321097077376,0.368089341472558,0.184044670736279,0.368089341472558,0.276067006104418,0.892616653070952,1.74842437199465,NA,NA,5.1306307034019,1.34160578423204,2.13879182993514,3.01375212399952,NA,5.67928016629218,NA 2,1,2,10,"Saturday",22,0.01,231,202,0.5,0.6,NA,NA,NA,14.9,NA,NA,NA,NA,NA,NA,NA,NA,5.8,NA,NA,NA,NA,NA,NA,NA,NA,747,NA,NA,NA,NA,NA,NA,NA,NA,750,NA,NA,NA,NA,NA,NA,NA,NA,743,NA,NA,NA,NA,NA,2.67923294292042,8.47583334194495,NA,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,NA,1.99138023331659,NA,5.56815355612325,0.923259948195698,NA,NA,NA,NA,NA,NA,3.1011527019063,0.0920223353681394,1.94167127626774,0.368089341472558,0.184044670736279,0.368089341472558,0.368089341472558,1.73922213845783,2.14412041407765,NA,NA,5.1306307034019,1.19577906855465,2.72209869264472,3.88871241806389,NA,7.42675098668978,NA 3,1,3,10,"Saturday",23,0.01,247,227,0.5,1.5,NA,NA,NA,14.9,NA,NA,NA,NA,NA,NA,NA,NA,5.8,NA,NA,NA,NA,NA,NA,NA,NA,747,NA,NA,NA,NA,NA,NA,NA,NA,750,NA,NA,NA,NA,NA,NA,NA,NA,743,NA,NA,NA,NA,NA,2.67923294292042,8.92192983362627,NA,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,NA,1.7524146053186,NA,5.56815355612325,0.680296803933673,NA,NA,NA,NA,NA,NA,3.06434376775904,0.0920223353681394,2.52141198908702,0.460111676840697,0.184044670736279,0.368089341472558,0.368089341472558,1.7852333061419,1.93246904273093,NA,NA,5.13639545700122,1.40965825154816,3.11096993445111,3.88871241806389,NA,7.68373198968942,NA 4,1,4,10,"Sunday",0,0.01,219,218,0.2,1.2,NA,NA,NA,14,NA,NA,NA,NA,NA,NA,NA,NA,4.8,NA,NA,NA,NA,NA,NA,NA,NA,751,NA,NA,NA,NA,NA,NA,NA,NA,754,NA,NA,NA,NA,NA,NA,NA,NA,748,NA,NA,NA,NA,NA,2.67923294292042,5.09824561921501,NA,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,NA,2.38965627997991,NA,5.6776192223642,0.612267123540305,NA,NA,NA,NA,NA,NA,3.21157950434806,0.184044670736279,2.374176252498,0.460111676840697,0.184044670736279,0.368089341472558,0.276067006104418,1.86805340797323,2.08890701285676,NA,NA,5.21710200739181,1.47771071886428,2.04157401948354,3.20818774490271,NA,4.83124285639335,NA 5,1,5,10,"Sunday",1,0.01,2,216,0.2,0.3,NA,NA,NA,14,NA,NA,NA,NA,NA,NA,NA,NA,4.8,NA,NA,NA,NA,NA,NA,NA,NA,751,NA,NA,NA,NA,NA,NA,NA,NA,754,NA,NA,NA,NA,NA,NA,NA,NA,748,NA,NA,NA,NA,NA,2.67923294292042,4.87519737337435,NA,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,NA,2.31000107064725,NA,5.6776192223642,0.694874592589394,NA,NA,NA,NA,NA,NA,3.67169118118876,0.184044670736279,2.46619858786614,0.460111676840697,0.184044670736279,0.368089341472558,0.276067006104418,1.70241320431058,2.60423209091834,NA,NA,5.21710200739181,1.45826715677396,2.13879182993514,3.4998411762575,NA,4.62565805399363,NA ...

This file contains a sample of the submission for the prediction problem.

Each row specifies the prediction for each target measure across all target locations for a given hour in a chunk of contiguous time.

Below is a sample of the file.

"rowID","chunkID","position_within_chunk","hour","month_most_common","target_1_57","target_10_4002","target_10_8003","target_11_1","target_11_32","target_11_50","target_11_64","target_11_1003","target_11_1601","target_11_4002","target_11_8003","target_14_4002","target_14_8003","target_15_57","target_2_57","target_3_1","target_3_50","target_3_57","target_3_1601","target_3_4002","target_3_6006","target_4_1","target_4_50","target_4_57","target_4_1018","target_4_1601","target_4_2001","target_4_4002","target_4_4101","target_4_6006","target_4_8003","target_5_6006","target_7_57","target_8_57","target_8_4002","target_8_6004","target_8_8003","target_9_4002","target_9_8003" 193,1,193,21,10,0,0,-1e+06,0,0,0,0,0,0,0,-1e+06,0,-1e+06,0,0,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,0,0,0,0,0,0,0,0,0,-1e+06,-1e+06,0,0,0,0,-1e+06,0,-1e+06 194,1,194,22,10,0,0,-1e+06,0,0,0,0,0,0,0,-1e+06,0,-1e+06,0,0,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,0,0,0,0,0,0,0,0,0,-1e+06,-1e+06,0,0,0,0,-1e+06,0,-1e+06 195,1,195,23,10,0,0,-1e+06,0,0,0,0,0,0,0,-1e+06,0,-1e+06,0,0,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,0,0,0,0,0,0,0,0,0,-1e+06,-1e+06,0,0,0,0,-1e+06,0,-1e+06 196,1,196,0,10,0,0,-1e+06,0,0,0,0,0,0,0,-1e+06,0,-1e+06,0,0,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,0,0,0,0,0,0,0,0,0,-1e+06,-1e+06,0,0,0,0,-1e+06,0,-1e+06 197,1,197,1,10,0,0,-1e+06,0,0,0,0,0,0,0,-1e+06,0,-1e+06,0,0,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,0,0,0,0,0,0,0,0,0,-1e+06,-1e+06,0,0,0,0,-1e+06,0,-1e+06 ...

A large part of the challenge of this prediction problem is the vast number of ways that the problem can be framed for modeling.

This is challenging because it is not clear which framing may be the best for this specific modeling problem.

For example, below are some questions to provoke thought about how the problem could be framed.

- Is it better to impute or ignore missing observations?
- Is it better to feed in a time series of weather observations or only the observations for the current hour?
- Is it better to use weather observations from one or multiple source locations for a forecast?
- Is it better to have one model for each location or one mode for all locations?
- Is it better to have one model for each forecast time or one for all forecast times?

This section provides more resources on the topic if you are looking to go deeper.

- EMC Data Science Global Hackathon (Air Quality Prediction)
- Download Dataset
- Chucking everything into a Random Forest: Ben Hamner on Winning The Air Quality Prediction Hackathon
- Winning Code for the EMC Data Science Global Hackathon (Air Quality Prediction)
- General approaches to partitioning the models?

In this post, you discovered the Kaggle air-quality dataset that provides a standard dataset for complex time series forecasting.

Specifically, you learned:

- The competition and motivation for addressing the air-quality dataset.
- An overview of the defined prediction problem and the data challenges it covers.
- A description of the free data files that can download and start working with immediately.

Have you worked on this dataset, or do you intend to?

Share your experiences in the comments below.

The post A Standard Multivariate, Multi-Step, and Multi-Site Time Series Forecasting Problem appeared first on Machine Learning Mastery.

]]>The post How to Convert a Time Series to a Supervised Learning Problem in Python appeared first on Machine Learning Mastery.

]]>Before machine learning can be used, time series forecasting problems must be re-framed as supervised learning problems. From a sequence to pairs of input and output sequences.

In this tutorial, you will discover how to transform univariate and multivariate time series forecasting problems into supervised learning problems for use with machine learning algorithms.

After completing this tutorial, you will know:

- How to develop a function to transform a time series dataset into a supervised learning dataset.
- How to transform univariate time series data for machine learning.
- How to transform multivariate time series data for machine learning.

Let’s get started.

Before we get started, let’s take a moment to better understand the form of time series and supervised learning data.

A time series is a sequence of numbers that are ordered by a time index. This can be thought of as a list or column of ordered values.

For example:

0 1 2 3 4 5 6 7 8 9

A supervised learning problem is comprised of input patterns (*X*) and output patterns (*y*), such that an algorithm can learn how to predict the output patterns from the input patterns.

For example:

X, y 1 2 2, 3 3, 4 4, 5 5, 6 6, 7 7, 8 8, 9

For more on this topic, see the post:

A key function to help transform time series data into a supervised learning problem is the Pandas shift() function.

Given a DataFrame, the *shift()* function can be used to create copies of columns that are pushed forward (rows of NaN values added to the front) or pulled back (rows of NaN values added to the end).

This is the behavior required to create columns of lag observations as well as columns of forecast observations for a time series dataset in a supervised learning format.

Let’s look at some examples of the shift function in action.

We can define a mock time series dataset as a sequence of 10 numbers, in this case a single column in a DataFrame as follows:

from pandas import DataFrame df = DataFrame() df['t'] = [x for x in range(10)] print(df)

Running the example prints the time series data with the row indices for each observation.

t 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9

We can shift all the observations down by one time step by inserting one new row at the top. Because the new row has no data, we can use NaN to represent “no data”.

The shift function can do this for us and we can insert this shifted column next to our original series.

from pandas import DataFrame df = DataFrame() df['t'] = [x for x in range(10)] df['t-1'] = df['t'].shift(1) print(df)

Running the example gives us two columns in the dataset. The first with the original observations and a new shifted column.

We can see that shifting the series forward one time step gives us a primitive supervised learning problem, although with *X* and *y* in the wrong order. Ignore the column of row labels. The first row would have to be discarded because of the NaN value. The second row shows the input value of 0.0 in the second column (input or *X*) and the value of 1 in the first column (output or *y*).

t t-1 0 0 NaN 1 1 0.0 2 2 1.0 3 3 2.0 4 4 3.0 5 5 4.0 6 6 5.0 7 7 6.0 8 8 7.0 9 9 8.0

We can see that if we can repeat this process with shifts of 2, 3, and more, how we could create long input sequences (*X*) that can be used to forecast an output value (*y*).

The shift operator can also accept a negative integer value. This has the effect of pulling the observations up by inserting new rows at the end. Below is an example:

from pandas import DataFrame df = DataFrame() df['t'] = [x for x in range(10)] df['t+1'] = df['t'].shift(-1) print(df)

Running the example shows a new column with a NaN value as the last value.

We can see that the forecast column can be taken as an input (*X*) and the second as an output value (*y*). That is the input value of 0 can be used to forecast the output value of 1.

t t+1 0 0 1.0 1 1 2.0 2 2 3.0 3 3 4.0 4 4 5.0 5 5 6.0 6 6 7.0 7 7 8.0 8 8 9.0 9 9 NaN

Technically, in time series forecasting terminology the current time (*t*) and future times (*t+1*, *t+n*) are forecast times and past observations (*t-1*, *t-n*) are used to make forecasts.

We can see how positive and negative shifts can be used to create a new DataFrame from a time series with sequences of input and output patterns for a supervised learning problem.

This permits not only classical *X -> y* prediction, but also *X -> Y* where both input and output can be sequences.

Further, the shift function also works on so-called multivariate time series problems. That is where instead of having one set of observations for a time series, we have multiple (e.g. temperature and pressure). All variates in the time series can be shifted forward or backward to create multivariate input and output sequences. We will explore this more later in the tutorial.

We can use the *shift()* function in Pandas to automatically create new framings of time series problems given the desired length of input and output sequences.

This would be a useful tool as it would allow us to explore different framings of a time series problem with machine learning algorithms to see which might result in better performing models.

In this section, we will define a new Python function named *series_to_supervised()* that takes a univariate or multivariate time series and frames it as a supervised learning dataset.

The function takes four arguments:

**data**: Sequence of observations as a list or 2D NumPy array. Required.**n_in**: Number of lag observations as input (*X*). Values may be between [1..len(data)] Optional. Defaults to 1.**n_out**: Number of observations as output (*y*). Values may be between [0..len(data)-1]. Optional. Defaults to 1.**dropnan**: Boolean whether or not to drop rows with NaN values. Optional. Defaults to True.

The function returns a single value:

**return**: Pandas DataFrame of series framed for supervised learning.

The new dataset is constructed as a DataFrame, with each column suitably named both by variable number and time step. This allows you to design a variety of different time step sequence type forecasting problems from a given univariate or multivariate time series.

Once the DataFrame is returned, you can decide how to split the rows of the returned DataFrame into *X* and *y* components for supervised learning any way you wish.

The function is defined with default parameters so that if you call it with just your data, it will construct a DataFrame with *t-1* as *X* and *t* as *y*.

The function is confirmed to be compatible with Python 2 and Python 3.

The complete function is listed below, including function comments.

from pandas import DataFrame from pandas import concat def series_to_supervised(data, n_in=1, n_out=1, dropnan=True): """ Frame a time series as a supervised learning dataset. Arguments: data: Sequence of observations as a list or NumPy array. n_in: Number of lag observations as input (X). n_out: Number of observations as output (y). dropnan: Boolean whether or not to drop rows with NaN values. Returns: Pandas DataFrame of series framed for supervised learning. """ n_vars = 1 if type(data) is list else data.shape[1] df = DataFrame(data) cols, names = list(), list() # input sequence (t-n, ... t-1) for i in range(n_in, 0, -1): cols.append(df.shift(i)) names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)] # forecast sequence (t, t+1, ... t+n) for i in range(0, n_out): cols.append(df.shift(-i)) if i == 0: names += [('var%d(t)' % (j+1)) for j in range(n_vars)] else: names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)] # put it all together agg = concat(cols, axis=1) agg.columns = names # drop rows with NaN values if dropnan: agg.dropna(inplace=True) return agg

Can you see obvious ways to make the function more robust or more readable?

Please let me know in the comments below.

Now that we have the whole function, we can explore how it may be used.

It is standard practice in time series forecasting to use lagged observations (e.g. t-1) as input variables to forecast the current time step (t).

This is called one-step forecasting.

The example below demonstrates a one lag time step (t-1) to predict the current time step (t).

from pandas import DataFrame from pandas import concat def series_to_supervised(data, n_in=1, n_out=1, dropnan=True): """ Frame a time series as a supervised learning dataset. Arguments: data: Sequence of observations as a list or NumPy array. n_in: Number of lag observations as input (X). n_out: Number of observations as output (y). dropnan: Boolean whether or not to drop rows with NaN values. Returns: Pandas DataFrame of series framed for supervised learning. """ n_vars = 1 if type(data) is list else data.shape[1] df = DataFrame(data) cols, names = list(), list() # input sequence (t-n, ... t-1) for i in range(n_in, 0, -1): cols.append(df.shift(i)) names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)] # forecast sequence (t, t+1, ... t+n) for i in range(0, n_out): cols.append(df.shift(-i)) if i == 0: names += [('var%d(t)' % (j+1)) for j in range(n_vars)] else: names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)] # put it all together agg = concat(cols, axis=1) agg.columns = names # drop rows with NaN values if dropnan: agg.dropna(inplace=True) return agg values = [x for x in range(10)] data = series_to_supervised(values) print(data)

Running the example prints the output of the reframed time series.

var1(t-1) var1(t) 1 0.0 1 2 1.0 2 3 2.0 3 4 3.0 4 5 4.0 5 6 5.0 6 7 6.0 7 8 7.0 8 9 8.0 9

We can see that the observations are named “*var1*” and that the input observation is suitably named (t-1) and the output time step is named (t).

We can also see that rows with NaN values have been automatically removed from the DataFrame.

We can repeat this example with an arbitrary number length input sequence, such as 3. This can be done by specifying the length of the input sequence as an argument; for example:

data = series_to_supervised(values, 3)

The complete example is listed below.

from pandas import DataFrame from pandas import concat def series_to_supervised(data, n_in=1, n_out=1, dropnan=True): """ Frame a time series as a supervised learning dataset. Arguments: data: Sequence of observations as a list or NumPy array. n_in: Number of lag observations as input (X). n_out: Number of observations as output (y). dropnan: Boolean whether or not to drop rows with NaN values. Returns: Pandas DataFrame of series framed for supervised learning. """ n_vars = 1 if type(data) is list else data.shape[1] df = DataFrame(data) cols, names = list(), list() # input sequence (t-n, ... t-1) for i in range(n_in, 0, -1): cols.append(df.shift(i)) names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)] # forecast sequence (t, t+1, ... t+n) for i in range(0, n_out): cols.append(df.shift(-i)) if i == 0: names += [('var%d(t)' % (j+1)) for j in range(n_vars)] else: names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)] # put it all together agg = concat(cols, axis=1) agg.columns = names # drop rows with NaN values if dropnan: agg.dropna(inplace=True) return agg values = [x for x in range(10)] data = series_to_supervised(values, 3) print(data)

Again, running the example prints the reframed series. We can see that the input sequence is in the correct left-to-right order with the output variable to be predicted on the far right.

var1(t-3) var1(t-2) var1(t-1) var1(t) 3 0.0 1.0 2.0 3 4 1.0 2.0 3.0 4 5 2.0 3.0 4.0 5 6 3.0 4.0 5.0 6 7 4.0 5.0 6.0 7 8 5.0 6.0 7.0 8 9 6.0 7.0 8.0 9

A different type of forecasting problem is using past observations to forecast a sequence of future observations.

This may be called sequence forecasting or multi-step forecasting.

We can frame a time series for sequence forecasting by specifying another argument. For example, we could frame a forecast problem with an input sequence of 2 past observations to forecast 2 future observations as follows:

data = series_to_supervised(values, 2, 2)

The complete example is listed below:

from pandas import DataFrame from pandas import concat def series_to_supervised(data, n_in=1, n_out=1, dropnan=True): """ Frame a time series as a supervised learning dataset. Arguments: data: Sequence of observations as a list or NumPy array. n_in: Number of lag observations as input (X). n_out: Number of observations as output (y). dropnan: Boolean whether or not to drop rows with NaN values. Returns: Pandas DataFrame of series framed for supervised learning. """ n_vars = 1 if type(data) is list else data.shape[1] df = DataFrame(data) cols, names = list(), list() # input sequence (t-n, ... t-1) for i in range(n_in, 0, -1): cols.append(df.shift(i)) names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)] # forecast sequence (t, t+1, ... t+n) for i in range(0, n_out): cols.append(df.shift(-i)) if i == 0: names += [('var%d(t)' % (j+1)) for j in range(n_vars)] else: names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)] # put it all together agg = concat(cols, axis=1) agg.columns = names # drop rows with NaN values if dropnan: agg.dropna(inplace=True) return agg values = [x for x in range(10)] data = series_to_supervised(values, 2, 2) print(data)

Running the example shows the differentiation of input (t-n) and output (t+n) variables with the current observation (t) considered an output.

var1(t-2) var1(t-1) var1(t) var1(t+1) 2 0.0 1.0 2 3.0 3 1.0 2.0 3 4.0 4 2.0 3.0 4 5.0 5 3.0 4.0 5 6.0 6 4.0 5.0 6 7.0 7 5.0 6.0 7 8.0 8 6.0 7.0 8 9.0

Another important type of time series is called multivariate time series.

This is where we may have observations of multiple different measures and an interest in forecasting one or more of them.

For example, we may have two sets of time series observations obs1 and obs2 and we wish to forecast one or both of these.

We can call *series_to_supervised()* in exactly the same way.

For example:

from pandas import DataFrame from pandas import concat def series_to_supervised(data, n_in=1, n_out=1, dropnan=True): """ Frame a time series as a supervised learning dataset. Arguments: data: Sequence of observations as a list or NumPy array. n_in: Number of lag observations as input (X). n_out: Number of observations as output (y). dropnan: Boolean whether or not to drop rows with NaN values. Returns: Pandas DataFrame of series framed for supervised learning. """ n_vars = 1 if type(data) is list else data.shape[1] df = DataFrame(data) cols, names = list(), list() # input sequence (t-n, ... t-1) for i in range(n_in, 0, -1): cols.append(df.shift(i)) names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)] # forecast sequence (t, t+1, ... t+n) for i in range(0, n_out): cols.append(df.shift(-i)) if i == 0: names += [('var%d(t)' % (j+1)) for j in range(n_vars)] else: names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)] # put it all together agg = concat(cols, axis=1) agg.columns = names # drop rows with NaN values if dropnan: agg.dropna(inplace=True) return agg raw = DataFrame() raw['ob1'] = [x for x in range(10)] raw['ob2'] = [x for x in range(50, 60)] values = raw.values data = series_to_supervised(values) print(data)

Running the example prints the new framing of the data, showing an input pattern with one time step for both variables and an output pattern of one time step for both variables.

Again, depending on the specifics of the problem, the division of columns into *X* and *Y* components can be chosen arbitrarily, such as if the current observation of *var1* was also provided as input and only *var2* was to be predicted.

var1(t-1) var2(t-1) var1(t) var2(t) 1 0.0 50.0 1 51 2 1.0 51.0 2 52 3 2.0 52.0 3 53 4 3.0 53.0 4 54 5 4.0 54.0 5 55 6 5.0 55.0 6 56 7 6.0 56.0 7 57 8 7.0 57.0 8 58 9 8.0 58.0 9 59

You can see how this may be easily used for sequence forecasting with multivariate time series by specifying the length of the input and output sequences as above.

For example, below is an example of a reframing with 1 time step as input and 2 time steps as forecast sequence.

from pandas import DataFrame from pandas import concat def series_to_supervised(data, n_in=1, n_out=1, dropnan=True): """ Frame a time series as a supervised learning dataset. Arguments: data: Sequence of observations as a list or NumPy array. n_in: Number of lag observations as input (X). n_out: Number of observations as output (y). dropnan: Boolean whether or not to drop rows with NaN values. Returns: Pandas DataFrame of series framed for supervised learning. """ n_vars = 1 if type(data) is list else data.shape[1] df = DataFrame(data) cols, names = list(), list() # input sequence (t-n, ... t-1) for i in range(n_in, 0, -1): cols.append(df.shift(i)) names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)] # forecast sequence (t, t+1, ... t+n) for i in range(0, n_out): cols.append(df.shift(-i)) if i == 0: names += [('var%d(t)' % (j+1)) for j in range(n_vars)] else: names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)] # put it all together agg = concat(cols, axis=1) agg.columns = names # drop rows with NaN values if dropnan: agg.dropna(inplace=True) return agg raw = DataFrame() raw['ob1'] = [x for x in range(10)] raw['ob2'] = [x for x in range(50, 60)] values = raw.values data = series_to_supervised(values, 1, 2) print(data)

Running the example shows the large reframed DataFrame.

var1(t-1) var2(t-1) var1(t) var2(t) var1(t+1) var2(t+1) 1 0.0 50.0 1 51 2.0 52.0 2 1.0 51.0 2 52 3.0 53.0 3 2.0 52.0 3 53 4.0 54.0 4 3.0 53.0 4 54 5.0 55.0 5 4.0 54.0 5 55 6.0 56.0 6 5.0 55.0 6 56 7.0 57.0 7 6.0 56.0 7 57 8.0 58.0 8 7.0 57.0 8 58 9.0 59.0

Experiment with your own dataset and try multiple different framings to see what works best.

In this tutorial, you discovered how to reframe time series datasets as supervised learning problems with Python.

Specifically, you learned:

- About the Pandas
*shift()*function and how it can be used to automatically define supervised learning datasets from time series data. - How to reframe a univariate time series into one-step and multi-step supervised learning problems.
- How to reframe multivariate time series into one-step and multi-step supervised learning problems.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Convert a Time Series to a Supervised Learning Problem in Python appeared first on Machine Learning Mastery.

]]>The post Seasonal Persistence Forecasting With Python appeared first on Machine Learning Mastery.

]]>A better first-cut forecast on time series data with a seasonal component is to persist the observation for the same time in the previous season. This is called seasonal persistence.

In this tutorial, you will discover how to implement seasonal persistence for time series forecasting in Python.

After completing this tutorial, you will know:

- How to use point observations from prior seasons for a persistence forecast.
- How to use mean observations across a sliding window of prior seasons for a persistence forecast.
- How to apply and evaluate seasonal persistence on monthly and daily time series data.

Let’s get started.

**Updated Apr/2019**: Updated the links to datasets.**Updated Aug/2019**: Updated data loading to use new API.

It is critical to have a useful first-cut forecast on time series problems to provide a lower-bound on skill before moving on to more sophisticated methods.

This is to ensure we are not wasting time on models or datasets that are not predictive.

It is common to use a persistence or a naive forecast as a first-cut forecast model when time series forecasting.

This does not make sense with time series data that has an obvious seasonal component. A better first cut model for seasonal data is to use the observation at the same time in the previous seasonal cycle as the prediction.

We can call this “*seasonal persistence*” and it is a simple model that can result in an effective first cut model.

One step better is to use a simple function of the last few observations at the same time in previous seasonal cycles. For example, the mean of the observations. This can often provide a small additional benefit.

In this tutorial, we will demonstrate this simple seasonal persistence forecasting method for providing a lower bound on forecast skill on three different real-world time series datasets.

In this tutorial, we will use a sliding window seasonal persistence model to make forecasts.

Within a sliding window, observations at the same time in previous one-year seasons will be collected and the mean of those observations can be used as the persisted forecast.

Different window sizes can be evaluated to find a combination that minimizes error.

As an example, if the data is monthly and the month to be predicted is February, then with a window of size 1 (*w=1*) the observation last February will be used to make the forecast.

A window of size 2 (*w=2*) would involve taking observations for the last two Februaries to be averaged and used as a forecast.

An alternate interpretation might seek to use point observations from prior years (e.g. t-12, t-24, etc. for monthly data) rather than taking the mean of the cumulative point observations. Perhaps try both methods on your dataset and see what works best as a good starting point model.

Take my free 7-day email course and discover how to get started (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

It is important to evaluate time series forecasting models consistently.

In this section, we will define how we will evaluate forecast models in this tutorial.

First, we will hold the last two years of data back and evaluate forecasts on this data. This works for both monthly and daily data we will look at.

We will use a walk-forward validation to evaluate model performance. This means that each time step in the test dataset will be enumerated, a model constructed on historical data, and the forecast compared to the expected value. The observation will then be added to the training dataset and the process repeated.

Walk-forward validation is a realistic way to evaluate time series forecast models as one would expect models to be updated as new observations are made available.

Finally, forecasts will be evaluated using root mean squared error, or RMSE. The benefit of RMSE is that it penalizes large errors and the scores are in the same units as the forecast values (car sales per month).

In summary, the test harness involves:

- The last 2 years of data used as a test set.
- Walk-forward validation for model evaluation.
- Root mean squared error used to report model skill.

The Monthly Car Sales dataset describes the number of car sales in Quebec, Canada between 1960 and 1968.

The units are a count of the number of sales and there are 108 observations. The source of the data is credited to Abraham and Ledolter (1983).

Download the dataset and save it into your current working directory with the filename “*car-sales.csv*“. Note, you may need to delete the footer information from the file.

The code below loads the dataset as a Pandas Series object.

# line plot of time series from pandas import read_csv from matplotlib import pyplot # load dataset series = read_csv('car-sales.csv', header=0, index_col=0) # display first few rows print(series.head(5)) # line plot of dataset series.plot() pyplot.show()

Running the example prints the first 5 rows of data.

Month 1960-01-01 6550 1960-02-01 8728 1960-03-01 12026 1960-04-01 14395 1960-05-01 14587 Name: Sales, dtype: int64

A line plot of the data is also provided. We can see both a yearly seasonal component and an increasing trend.

The prior 24 months of data will be held back as test data. We will investigate seasonal persistence with a sliding window from 1 to 5 years.

The complete example is listed below.

from pandas import read_csv from sklearn.metrics import mean_squared_error from math import sqrt from numpy import mean from matplotlib import pyplot # load data series = read_csv('car-sales.csv', header=0, index_col=0) # prepare data X = series.values train, test = X[0:-24], X[-24:] # evaluate mean of different number of years years = [1, 2, 3, 4, 5] scores = list() for year in years: # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # collect obs obs = list() for y in range(1, year+1): obs.append(history[-(y*12)]) # make prediction yhat = mean(obs) predictions.append(yhat) # observation history.append(test[i]) # report performance rmse = sqrt(mean_squared_error(test, predictions)) scores.append(rmse) print('Years=%d, RMSE: %.3f' % (year, rmse)) pyplot.plot(years, scores) pyplot.show()

Running the example prints the year number and the RMSE for the mean observation from the sliding window of observations at the same month in prior years.

The results suggest that taking the average from the last three years is a good starting model with an RMSE of 1803.630 car sales.

Years=1, RMSE: 1997.732 Years=2, RMSE: 1914.911 Years=3, RMSE: 1803.630 Years=4, RMSE: 2099.481 Years=5, RMSE: 2522.235

A plot of the relationship of sliding window size to model error is created.

The plot nicely shows the improvement with the sliding window size to 3 years, then the rapid increase in error from that point.

The Monthly Writing Paper Sales dataset describes the number of specialty writing paper sales.

The units are a type of count of the number of sales and there are 147 months of observations (just over 12 years). The counts are fractional, suggesting the data may in fact be in the units of hundreds of thousands of sales. The source of the data is credited to Makridakis and Wheelwright (1989).

Download the dataset and save it into your current working directory with the filename “*writing-paper-sales.csv*“. Note, you may need to delete the footer information from the file.

The date-time stamps only contain the year number and month. Therefore, a custom date-time parsing function is required to load the data and base the data in an arbitrary year. The year 1900 was chosen as the starting point, which should not affect this case study.

The example below loads the Monthly Writing Paper Sales dataset as a Pandas Series.

# load and plot dataset from pandas import read_csv from pandas import datetime from matplotlib import pyplot # load dataset def parser(x): if len(x) == 4: return datetime.strptime('190'+x, '%Y-%m') return datetime.strptime('19'+x, '%Y-%m') series = read_csv('writing-paper-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser) # summarize first few rows print(series.head()) # line plot series.plot() pyplot.show()

Running the example prints the first 5 rows of the loaded dataset.

Month 1901-01-01 1359.795 1901-02-01 1278.564 1901-03-01 1508.327 1901-04-01 1419.710 1901-05-01 1440.510

A line plot of the loaded dataset is then created. We can see the yearly seasonal component and an increasing trend.

As in the previous example, we can hold back the last 24 months of observations as a test dataset. Because we have much more data, we will try sliding window sizes from 1 year to 10 years.

The complete example is listed below.

from pandas import read_csv from pandas import datetime from sklearn.metrics import mean_squared_error from math import sqrt from numpy import mean from matplotlib import pyplot # load dataset def parser(x): if len(x) == 4: return datetime.strptime('190'+x, '%Y-%m') return datetime.strptime('19'+x, '%Y-%m') series = read_csv('writing-paper-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser) # prepare data X = series.values train, test = X[0:-24], X[-24:] # evaluate mean of different number of years years = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] scores = list() for year in years: # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # collect obs obs = list() for y in range(1, year+1): obs.append(history[-(y*12)]) # make prediction yhat = mean(obs) predictions.append(yhat) # observation history.append(test[i]) # report performance rmse = sqrt(mean_squared_error(test, predictions)) scores.append(rmse) print('Years=%d, RMSE: %.3f' % (year, rmse)) pyplot.plot(years, scores) pyplot.show()

Running the example prints the size of the sliding window and the resulting seasonal persistence model error.

The results suggest that a window size of 5 years is optimal, with an RMSE of 554.660 monthly writing paper sales.

Years=1, RMSE: 606.089 Years=2, RMSE: 557.653 Years=3, RMSE: 555.777 Years=4, RMSE: 544.251 Years=5, RMSE: 540.317 Years=6, RMSE: 554.660 Years=7, RMSE: 569.032 Years=8, RMSE: 581.405 Years=9, RMSE: 602.279 Years=10, RMSE: 624.756

The relationship between window size and error is graphed on a line plot showing a similar trend in error to the previous scenario. Error drops to an inflection point (in this case 5 years) before increasing again.

The Daily Maximum Melbourne Temperatures dataset describes the daily temperatures in the city Melbourne, Australia from 1981 to 1990.

The units are in degrees Celsius and there 3,650 observations, or 10 years of data. The source of the data is credited to the Australian Bureau of Meteorology.

Download the dataset and save it into your current working directory with the filename “*max-daily-temps.csv*“. Note, you may need to delete the footer information from the file.

The example below demonstrates loading the dataset as a Pandas Series.

# line plot of time series from pandas import read_csv from matplotlib import pyplot # load dataset series = read_csv('max-daily-temps.csv', header=0, index_col=0) # display first few rows print(series.head(5)) # line plot of dataset series.plot() pyplot.show()

Running the example prints the first 5 rows of data.

Date 1981-01-01 38.1 1981-01-02 32.4 1981-01-03 34.5 1981-01-04 20.7 1981-01-05 21.5

A line plot is also created. We can see we have a lot more observations than the previous two scenarios and that there is a clear seasonal trend in the data.

Because the data is daily, we need to specify the years in the test data as a function of 365 days rather than 12 months.

This ignores leap years, which is a complication that could, or even should, be addressed in your own project.

The complete example of seasonal persistence is listed below.

from pandas import read_csv from sklearn.metrics import mean_squared_error from math import sqrt from numpy import mean from matplotlib import pyplot # load data series = read_csv('max-daily-temps.csv', header=0, index_col=0) # prepare data X = series.values train, test = X[0:-(2*365)], X[-(2*365):] # evaluate mean of different number of years years = [1, 2, 3, 4, 5, 6, 7, 8] scores = list() for year in years: # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # collect obs obs = list() for y in range(1, year+1): obs.append(history[-(y*365)]) # make prediction yhat = mean(obs) predictions.append(yhat) # observation history.append(test[i]) # report performance rmse = sqrt(mean_squared_error(test, predictions)) scores.append(rmse) print('Years=%d, RMSE: %.3f' % (year, rmse)) pyplot.plot(years, scores) pyplot.show()

Running the example prints the size of the sliding window and the corresponding model error.

Unlike the previous two cases, we can see a trend where the skill continues to improve as the window size is increased.

The best result is a sliding window of all 8 years of historical data with an RMSE of 4.271.

Years=1, RMSE: 5.950 Years=2, RMSE: 5.083 Years=3, RMSE: 4.664 Years=4, RMSE: 4.539 Years=5, RMSE: 4.448 Years=6, RMSE: 4.358 Years=7, RMSE: 4.371 Years=8, RMSE: 4.271

The plot of sliding window size to model error makes this trend apparent.

It suggests that getting more historical data for this problem might be useful if an optimal model turns out to be a function of the observations on the same day in prior years.

We might do just as well if the observations were averaged from the same week or month in previous seasons, and this might prove a fruitful experiment.

In this tutorial, you discovered seasonal persistence for time series forecasting.

You learned:

- How to use point observations from prior seasons for a persistence forecast.
- How to use a mean of a sliding window across multiple prior seasons for a persistence forecast.
- How to apply seasonal persistence to daily and monthly time series data.

Do you have any questions about persistence with seasonal data?

Ask your questions in the comments and I will do my best to answer.

The post Seasonal Persistence Forecasting With Python appeared first on Machine Learning Mastery.

]]>The post How to Tune ARIMA Parameters in Python appeared first on Machine Learning Mastery.

]]>In this tutorial, we take a look at a few key parameters (other than the order parameter) that you may be curious about.

Specifically, after completing this tutorial, you will know:

- How to suppress noisy output from the underlying mathematical libraries when fitting an ARIMA model.
- The effect of enabling or disabling a trend term in your ARIMA model.
- The influence of using different mathematical solvers to fit coefficients to your training data.

Note, if you are interested in tuning the order parameter, see the post:

Let’s get started.

**Updated Apr/2019**: Updated the link to dataset.

This dataset describes the monthly number of sales of shampoo over a 3 year period.

The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).

The example below loads and creates a plot of the loaded dataset.

# load and plot dataset from pandas import read_csv from pandas import datetime from matplotlib import pyplot # load dataset def parser(x): return datetime.strptime('190'+x, '%Y-%m') series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser) # summarize first few rows print(series.head()) # line plot series.plot() pyplot.show()

Running the example loads the dataset as a Pandas Series and prints the first 5 rows.

Month 1901-01-01 266.0 1901-02-01 145.9 1901-03-01 183.1 1901-04-01 119.3 1901-05-01 180.3 Name: Sales, dtype: float64

A line plot of the series is then created showing a clear increasing trend.

It is important to evaluate time series forecasting models consistently.

In this section, we will define how we will evaluate the three forecast models in this tutorial.

First, we will hold the last one year of data back and evaluate forecasts on this data. Given the data is monthly, this means that the last 12 observations will be used as test data.

We will use a walk-forward validation method to evaluate model performance. This means that each time step in the test dataset will be enumerated, a model constructed on history data, and the forecast compared to the expected value. The observation will then be added to the training dataset and the process repeated.

Walk-forward validation is a realistic way to evaluate time series forecast models as one would expect models to be updated as new observations are made available.

Finally, forecasts will be evaluated using root mean squared error, or RMSE. The benefit of RMSE is that it penalizes large errors and the scores are in the same units as the forecast values (car sales per month).

An ARIMA(4,1,0) forecast model will be used as the baseline to explore the additional parameters of the model. This may not be the optimal model for the problem, but is generally skillful against some other hand tested configurations.

In summary, the test harness involves:

- The last 2 years of data used a test set.
- Walk-forward validation for model evaluation.
- Root mean squared error used to report model skill.
- An ARIMA(4,1,0) model will be used as a baseline.

The complete example is listed below.

from pandas import read_csv from pandas import datetime from matplotlib import pyplot from statsmodels.tsa.arima_model import ARIMA from sklearn.metrics import mean_squared_error from math import sqrt # load dataset def parser(x): return datetime.strptime('190'+x, '%Y-%m') series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser) # split into train and test sets X = series.values train, test = X[0:-12], X[-12:] history = [x for x in train] predictions = list() # walk-forward validation for t in range(len(test)): # fit model model = ARIMA(history, order=(4,1,0)) model_fit = model.fit() # one step forecast yhat = model_fit.forecast()[0] # store forecast and ob predictions.append(yhat) history.append(test[t]) # evaluate forecasts rmse = sqrt(mean_squared_error(test, predictions)) print('Test RMSE: %.3f' % rmse) # plot forecasts against actual outcomes pyplot.plot(test) pyplot.plot(predictions, color='red') pyplot.show()

Running the example spews a lot of convergence information and finishes with an RMSE score of 84.832 monthly shampoo sales.

... Tit = total number of iterations Tnf = total number of function evaluations Tnint = total number of segments explored during Cauchy searches Skip = number of BFGS updates skipped Nact = number of active bounds at final generalized Cauchy point Projg = norm of the final projected gradient F = final function value * * * N Tit Tnf Tnint Skip Nact Projg F 5 15 20 1 0 0 8.882D-08 5.597D+00 F = 5.5972342395324288 CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH Cauchy time 0.000E+00 seconds. Subspace minimization time 0.000E+00 seconds. Line search time 0.000E+00 seconds. Total User time 0.000E+00 seconds. Test RMSE: 84.832

A plot of the forecast vs the actual observations in the test harness is created to give some context for the model we are working with.

Now let’s dive into some of the other ARIMA parameters.

The first parameter we will look at is the *disp* parameter.

This is described as follows:

If True, convergence information is printed. For the default l_bfgs_b solver, disp controls the frequency of the output during the iterations. disp < 0 means no output in this case.

By default, this parameter is set to 1, which shows output.

We are dealing with this first because it is critical in removing all of the convergence output when evaluating the ARIMA model using walk-forward validation.

Setting it to *False* turns off all of this noise.

The complete example is listed below.

from pandas import read_csv from pandas import datetime from matplotlib import pyplot from statsmodels.tsa.arima_model import ARIMA from sklearn.metrics import mean_squared_error from math import sqrt # load dataset def parser(x): return datetime.strptime('190'+x, '%Y-%m') series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser) # split into train and test sets X = series.values size = int(len(X) * 0.66) train, test = X[0:size], X[size:len(X)] history = [x for x in train] predictions = list() # walk-forward validation for t in range(len(test)): # fit model model = ARIMA(history, order=(4,1,0)) model_fit = model.fit(disp=False) # one step forecast yhat = model_fit.forecast()[0] # store forecast and ob predictions.append(yhat) history.append(test[t]) # evaluate forecasts rmse = sqrt(mean_squared_error(test, predictions)) print('Test RMSE: %.3f' % rmse)

Running this example not only produces a cleaner output, but also is much faster to execute.

Test RMSE: 81.545

We will leave *disp=False* on all following examples.

This parameter controls whether or not to perform a transform on AR parameters.

Specifically, it is described as:

Whether or not to transform the parameters to ensure stationarity. Uses the transformation suggested in Jones (1980). If False, no checking for stationarity or invertibility is done.

By default, *transparams* is set to *True*, meaning this transform is performed.

This parameter is also used on the R version of the ARIMA implementation (see docs) and I expect this is why it is here in statsmodels.

The statsmodels doco is weak on this, but you can learn more about the transform in the paper:

The example below demonstrates turning this parameter off.

from pandas import read_csv from pandas import datetime from matplotlib import pyplot from statsmodels.tsa.arima_model import ARIMA from sklearn.metrics import mean_squared_error from math import sqrt # load dataset def parser(x): return datetime.strptime('190'+x, '%Y-%m') series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser) # split into train and test sets X = series.values size = int(len(X) * 0.66) train, test = X[0:size], X[size:len(X)] history = [x for x in train] predictions = list() # walk-forward validation for t in range(len(test)): # fit model model = ARIMA(history, order=(4,1,0)) model_fit = model.fit(disp=False, transparams=False) # one step forecast yhat = model_fit.forecast()[0] # store forecast and ob predictions.append(yhat) history.append(test[t]) # evaluate forecasts rmse = sqrt(mean_squared_error(test, predictions)) print('Test RMSE: %.3f' % rmse)

Running this example results in more convergence warnings from the solver.

The RMSE of the model with *transparams* turned off also results in slightly worse results on this dataset.

Experiment with this parameter on and off on your dataset and confirm it results in a benefit.

... .../site-packages/statsmodels/base/model.py:496: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals "Check mle_retvals", ConvergenceWarning) Test RMSE: 81.778

The *trend* parameter adds an additional constant term to the model. Think of it like a bias or intercept term.

It is described as:

Whether to include a constant or not. ‘c’ includes constant, ‘nc’ no constant.

By default, a trend term is enabled with *trend* set to ‘*c*‘.

We can see the effect clearly if we rerun the original example and print the model coefficients for each step of the walk-forward validation and compare the same with the trend term turned off.

The below example prints the coefficients each iteration with the trend constant enabled (the default).

from pandas import read_csv from pandas import datetime from matplotlib import pyplot from statsmodels.tsa.arima_model import ARIMA from sklearn.metrics import mean_squared_error from math import sqrt # load dataset def parser(x): return datetime.strptime('190'+x, '%Y-%m') series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser) # split into train and test sets X = series.values size = int(len(X) * 0.66) train, test = X[0:size], X[size:len(X)] history = [x for x in train] predictions = list() # walk-forward validation for t in range(len(test)): # fit model model = ARIMA(history, order=(4,1,0)) model_fit = model.fit(disp=False, trend='c') print(model_fit.params) # one step forecast yhat = model_fit.forecast()[0] # store forecast and ob predictions.append(yhat) history.append(test[t]) # evaluate forecasts rmse = sqrt(mean_squared_error(test, predictions)) print('Test RMSE: %.3f' % rmse)

Running the example shows the 4 AR terms specified in the order of the model plus the first term in the array, which is a trend constant.

Note that one set of parameters is printed for each model fit, one for each step of the walk-forward validation.

... [ 11.42283717 -1.16087885 -0.6519841 -0.547411 -0.28820764] [ 11.75656838 -1.11443479 -0.61607471 -0.49084722 -0.24452864] [ 11.40486702 -1.11705478 -0.65344924 -0.50213939 -0.25677931] Test RMSE: 81.545

We can repeat this experiment with the trend term disabled (*trend=’nc’*), as follows.

from pandas import read_csv from pandas import datetime from matplotlib import pyplot from statsmodels.tsa.arima_model import ARIMA from sklearn.metrics import mean_squared_error from math import sqrt # load dataset def parser(x): return datetime.strptime('190'+x, '%Y-%m') series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser) # split into train and test sets X = series.values size = int(len(X) * 0.66) train, test = X[0:size], X[size:len(X)] history = [x for x in train] predictions = list() # walk-forward validation for t in range(len(test)): # fit model model = ARIMA(history, order=(4,1,0)) model_fit = model.fit(disp=False, trend='nc') print(model_fit.params) # one step forecast yhat = model_fit.forecast()[0] # store forecast and ob predictions.append(yhat) history.append(test[t]) # evaluate forecasts rmse = sqrt(mean_squared_error(test, predictions)) print('Test RMSE: %.3f' % rmse)

Running the example shows a slightly worse RMSE score on this problem, with this ARIMA configuration.

We can see that the constant term (11.xxx) removed from the array of coefficients each iteration.

... [-0.90717131 -0.22332019 -0.11240858 -0.04008561] [-0.88836083 -0.21098412 -0.09046333 -0.02121404] [-0.89260136 -0.24120301 -0.10243393 -0.03165432] Test RMSE: 95.061

Experiment on your own problem and determine whether this constant improves performance.

My own experimentation suggests that ARIMA models may be less likely to converge with the *trend* term disabled, especially when using more than zero MA terms.

The *solver* parameter specifies the numerical optimization method to fit the coefficients to the data.

There is often little reason to tune this parameter other than execution speed if you have a lot of data. The differences will likely be quite minor.

The parameter is described as follows:

Solver to be used. The default is ‘lbfgs’ (limited memory Broyden-Fletcher-Goldfarb-Shanno). Other choices are ‘bfgs’, ‘newton’ (Newton-Raphson), ‘nm’ (Nelder-Mead), ‘cg’ – (conjugate gradient), ‘ncg’ (non-conjugate gradient), and ‘powell’. By default, the limited memory BFGS uses m=12 to approximate the Hessian, projected gradient tolerance of 1e-8 and factr = 1e2. You can change these by using kwargs.

The default is the fast “*lbfgs*” method (Limited-memory BFGS).

Nevertheless, below is an experiment that compares the RMSE model skill and execution time of each solver.

from pandas import read_csv from pandas import datetime from matplotlib import pyplot from statsmodels.tsa.arima_model import ARIMA from sklearn.metrics import mean_squared_error from math import sqrt from time import time # load dataset def parser(x): return datetime.strptime('190'+x, '%Y-%m') series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser) # split into train and test sets X = series.values size = int(len(X) * 0.66) train, test = X[0:size], X[size:len(X)] # solvers solvers = ['lbfgs', 'bfgs', 'newton', 'nm', 'cg', 'ncg', 'powell'] scores = [] times = [] for solver in solvers: start_time = time() history = [x for x in train] predictions = list() # walk-forward validation for t in range(len(test)): # fit model model = ARIMA(history, order=(4,1,0)) model_fit = model.fit(disp=False, solver=solver) # one step forecast yhat = model_fit.forecast()[0] # store forecast and ob predictions.append(yhat) history.append(test[t]) # evaluate forecasts rmse = sqrt(mean_squared_error(test, predictions)) timing = time() - start_time scores.append(rmse) times.append(timing) print('Solver=%s, Test RMSE: %.3f, Time=%f' % (solver, rmse, timing)) # plot scores ticks = [i for i in range(len(solvers))] pyplot.bar(ticks, scores) pyplot.xticks(ticks, solvers) pyplot.show() # plot times ticks = [i for i in range(len(solvers))] pyplot.bar(ticks, times) pyplot.xticks(ticks, solvers) pyplot.show()

Running the example prints the RMSE and time in seconds of each *solver*.

Solver=lbfgs, Test RMSE: 81.545, Time=1.630316 Solver=bfgs, Test RMSE: 81.545, Time=2.122630 Solver=newton, Test RMSE: 81.545, Time=2.418718 Solver=nm, Test RMSE: 81.472, Time=1.432801 Solver=cg, Test RMSE: 81.543, Time=3.474055 Solver=ncg, Test RMSE: 81.545, Time=2.643767 Solver=powell, Test RMSE: 81.704, Time=1.839257

A graph of *solver* vs RMSE is provided. As expected, there is little difference between the solvers on this small dataset.

You may see different results or different stability of the solvers on your own problem.

A graph of *solver* vs execution time in seconds is also created. The graph shows a marked difference between solvers.

Generally, “*lbfgs*” and “*bfgs*” provide good real-world tradeoff between speed, performance, and stability.

If you do decide to test out solvers, you may also want to vary the “*maxiter*” that limits the number of iterations before converge, the “*tol*” parameter that defines the precision of convergence, and the “*method*” parameter that defines the cost function being optimized.

This section lists some resources you may find useful alongside this tutorial.

- ARIMA Class API
- ARIMAResults Class API
- Source code for the ARIMA and ARIMAResults classes.
- How to Grid Search ARIMA Model Hyperparameters with Python

In this tutorial, you discovered some of the finer points in configuring your ARIMA model with Statsmodels in Python.

Specifically, you learned:

- How to turn off the noisy convergence output from the solver when fitting coefficients.
- How to evaluate the difference between different solvers to fit your ARIMA model.
- The effect of enabling and disabling a trend term in your ARIMA model.

Do you have any questions about fitting your ARIMA model in Python?

Ask your question in the comments below and I will do my best to answer.

The post How to Tune ARIMA Parameters in Python appeared first on Machine Learning Mastery.

]]>The post Simple Time Series Forecasting Models to Test So That You Don’t Fool Yourself appeared first on Machine Learning Mastery.

]]>This requires that you evaluate a suite of standard naive, or simple, time series forecasting models to get an idea of the worst acceptable performance on the problem for more sophisticated models to beat.

Applying these simple models can also uncover new ideas about more advanced methods that may result in better performance.

In this tutorial, you will discover how to implement and automate three standard baseline time series forecasting methods on a real world dataset.

Specifically, you will learn:

- How to automate the persistence model and test a suite of persisted values.
- How to automate the expanding window model.
- How to automate the rolling window forecast model and test a suite of window sizes.

This is an important topic and highly recommended for any time series forecasting project.

Let’s get started.

**Updated Apr/2019**: Updated the link to dataset.**Updated Aug/2019**: Updated data loading to use new API.

This tutorial is broken down into the following 5 parts:

**Monthly Car Sales Dataset**: An overview of the standard time series dataset we will use.**Test Setup**: How we will evaluate forecast models in this tutorial.**Persistence Forecast**: The persistence forecast and how to automate it.**Expanding Window Forecast**: The expanding window forecast and how to automate it.**Rolling Window Forecast**: The rolling window forecast and how to automate it.

An up-to-date Python SciPy environment is used, including Python 2 or 3, Pandas, Numpy, and Matplotlib.

In this tutorial, we will use the Monthly Car Sales dataset.

This dataset describes the number of car sales in Quebec, Canada between 1960 and 1968.

The units are a count of the number of sales and there are 108 observations. The source data is credited to Abraham and Ledolter (1983).

Download the dataset and save it into your current working directory with the filename “*car-sales.csv*“. Note, you may need to delete the footer information from the file.

The code below loads the dataset as a Pandas Series object.

# line plot of time series from pandas import read_csv from matplotlib import pyplot # load dataset series = read_csv('car-sales.csv', header=0, index_col=0) # display first few rows print(series.head(5)) # line plot of dataset series.plot() pyplot.show()

Running the example prints the first 5 rows of data.

Month 1960-01-01 6550 1960-02-01 8728 1960-03-01 12026 1960-04-01 14395 1960-05-01 14587 Name: Sales, dtype: int64

A line plot of the data is also provided.

It is important to evaluate time series forecasting models consistently.

In this section, we will define how we will evaluate the three forecast models in this tutorial.

First, we will hold the last two years of data back and evaluate forecasts on this data. Given the data is monthly, this means that the last 24 observations will be used as test data.

We will use a walk-forward validation method to evaluate model performance. This means that each time step in the test dataset will be enumerated, a model constructed on history data, and the forecast compared to the expected value. The observation will then be added to the training dataset and the process repeated.

Walk-forward validation is a realistic way to evaluate time series forecast models as one would expect models to be updated as new observations are made available.

Finally, forecasts will be evaluated using root mean squared error or RMSE. The benefit of RMSE is that it penalizes large errors and the scores are in the same units as the forecast values (car sales per month).

In summary, the test harness involves:

- The last 2 years of data used a test set.
- Walk-forward validation for model evaluation.
- Root mean squared error used to report model skill.

The persistence forecast involves using the previous observation to predict the next time step.

For this reason, the approach is often called the naive forecast.

Why stop with using the previous observation? In this section, we will look at automating the persistence forecast and evaluate the use of any arbitrary prior time step to predict the next time step.

We will explore using each of the prior 24 months of point observations in a persistence model. Each configuration will be evaluated using the test harness and RMSE scores collected. We will then display the scores and graph the relationship between the persisted time step and the model skill.

The complete example is listed below.

from pandas import read_csv from sklearn.metrics import mean_squared_error from math import sqrt from matplotlib import pyplot # load data series = read_csv('car-sales.csv', header=0, index_col=0) # prepare data X = series.values train, test = X[0:-24], X[-24:] persistence_values = range(1, 25) scores = list() for p in persistence_values: # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # make prediction yhat = history[-p] predictions.append(yhat) # observation history.append(test[i]) # report performance rmse = sqrt(mean_squared_error(test, predictions)) scores.append(rmse) print('p=%d RMSE:%.3f' % (p, rmse)) # plot scores over persistence values pyplot.plot(persistence_values, scores) pyplot.show()

Running the example prints the RMSE for each persisted point observation.

p=1 RMSE:3947.200 p=2 RMSE:5485.353 p=3 RMSE:6346.176 p=4 RMSE:6474.553 p=5 RMSE:5756.543 p=6 RMSE:5756.076 p=7 RMSE:5958.665 p=8 RMSE:6543.266 p=9 RMSE:6450.839 p=10 RMSE:5595.971 p=11 RMSE:3806.482 p=12 RMSE:1997.732 p=13 RMSE:3968.987 p=14 RMSE:5210.866 p=15 RMSE:6299.040 p=16 RMSE:6144.881 p=17 RMSE:5349.691 p=18 RMSE:5534.784 p=19 RMSE:5655.016 p=20 RMSE:6746.872 p=21 RMSE:6784.611 p=22 RMSE:5642.737 p=23 RMSE:3692.062 p=24 RMSE:2119.103

A plot of the persisted value (t-n) to model skill (RMSE) is also created.

From the results, it is clear that persisting the observation from 12 months ago or 24 months ago is a great starting point on this dataset.

The best result achieved involved persisting the result from t-12 with an RMSE of 1997.732 car sales.

This is an obvious result, but also very useful.

We would expect that a forecast model that is some weighted combination of the observations at t-12, t-24, t-36 and so on would be a powerful starting point.

It also points out that the naive t-1 persistence would have been a less desirable starting point on this dataset.

We can use the t-12 model to make a prediction and plot it against the test data.

The complete example is listed below.

from pandas import read_csv from sklearn.metrics import mean_squared_error from math import sqrt from matplotlib import pyplot # load data series = read_csv('car-sales.csv', header=0, index_col=0) # prepare data X = series.values train, test = X[0:-24], X[-24:] # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # make prediction yhat = history[-12] predictions.append(yhat) # observation history.append(test[i]) # plot predictions vs observations pyplot.plot(test) pyplot.plot(predictions) pyplot.show()

Running the example plots the test dataset (blue) against the predicted values (orange).

You can learn more about the persistence model for time series forecasting in the post:

An expanding window refers to a model that calculates a statistic on all available historic data and uses that to make a forecast.

It is an expanding window because it grows as more real observations are collected.

Two good starting point statistics to calculate are the mean and the median historical observation.

The example below uses the expanding window mean as the forecast.

from pandas import read_csv from sklearn.metrics import mean_squared_error from math import sqrt from numpy import mean # load data series = read_csv('car-sales.csv', header=0, index_col=0) # prepare data X = series.values train, test = X[0:-24], X[-24:] # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # make prediction yhat = mean(history) predictions.append(yhat) # observation history.append(test[i]) # report performance rmse = sqrt(mean_squared_error(test, predictions)) print('RMSE: %.3f' % rmse)

Running the example prints the RMSE evaluation of the approach.

RMSE: 5113.067

We can also repeat the same experiment with the median of the historical observations. The complete example is listed below.

from pandas import read_csv from sklearn.metrics import mean_squared_error from math import sqrt from numpy import median # load data series = read_csv('car-sales.csv', header=0, index_col=0) # prepare data X = series.values train, test = X[0:-24], X[-24:] # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # make prediction yhat = median(history) predictions.append(yhat) # observation history.append(test[i]) # report performance rmse = sqrt(mean_squared_error(test, predictions)) print('RMSE: %.3f' % rmse)

Again, running the example prints the skill of the model.

We can see that on this problem the historical mean produced a better result than the median, but both were worse models than using the optimized persistence values.

RMSE: 5527.408

We can plot the mean expanding window predictions against the test dataset to get a feeling for how the forecast actually looks in context.

The complete example is listed below.

from pandas import read_csv from sklearn.metrics import mean_squared_error from matplotlib import pyplot from numpy import mean # load data series = read_csv('car-sales.csv', header=0, index_col=0) # prepare data X = series.values train, test = X[0:-24], X[-24:] # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # make prediction yhat = mean(history) predictions.append(yhat) # observation history.append(test[i]) # plot predictions vs observations pyplot.plot(test) pyplot.plot(predictions) pyplot.show()

The plot shows what a poor forecast looks like and how it does not follow the movements of the data at all, other than a slight rising trend.

You can see more examples of expanding window statistics in the post:

A rolling window model involves calculating a statistic on a fixed contiguous block of prior observations and using it as a forecast.

It is much like the expanding window, but the window size remains fixed and counts backwards from the most recent observation.

It may be more useful on time series problems where recent lag values are more predictive than older lag values.

We will automatically check different rolling window sizes from 1 to 24 months (2 years) and start by calculating the mean observation and using that as a forecast. The complete example is listed below.

from pandas import read_csv from sklearn.metrics import mean_squared_error from math import sqrt from matplotlib import pyplot from numpy import mean # load data series = read_csv('car-sales.csv', header=0, index_col=0) # prepare data X = series.values train, test = X[0:-24], X[-24:] window_sizes = range(1, 25) scores = list() for w in window_sizes: # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # make prediction yhat = mean(history[-w:]) predictions.append(yhat) # observation history.append(test[i]) # report performance rmse = sqrt(mean_squared_error(test, predictions)) scores.append(rmse) print('w=%d RMSE:%.3f' % (w, rmse)) # plot scores over window sizes values pyplot.plot(window_sizes, scores) pyplot.show()

Running the example prints the rolling window size and RMSE for each configuration.

w=1 RMSE:3947.200 w=2 RMSE:4350.413 w=3 RMSE:4701.446 w=4 RMSE:4810.510 w=5 RMSE:4649.667 w=6 RMSE:4549.172 w=7 RMSE:4515.684 w=8 RMSE:4614.551 w=9 RMSE:4653.493 w=10 RMSE:4563.802 w=11 RMSE:4321.599 w=12 RMSE:4023.968 w=13 RMSE:3901.634 w=14 RMSE:3907.671 w=15 RMSE:4017.276 w=16 RMSE:4084.080 w=17 RMSE:4076.399 w=18 RMSE:4085.376 w=19 RMSE:4101.505 w=20 RMSE:4195.617 w=21 RMSE:4269.784 w=22 RMSE:4258.226 w=23 RMSE:4158.029 w=24 RMSE:4021.885

A line plot of window size to error is also created.

The results suggest that a rolling window of w=13 was best with an RMSE of 3,901 monthly car sales.

We can repeat this experiment with the median statistic.

The complete example is listed below.

from pandas import read_csv from sklearn.metrics import mean_squared_error from math import sqrt from matplotlib import pyplot from numpy import median # load data series = read_csv('car-sales.csv', header=0, index_col=0) # prepare data X = series.values train, test = X[0:-24], X[-24:] window_sizes = range(1, 25) scores = list() for w in window_sizes: # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # make prediction yhat = median(history[-w:]) predictions.append(yhat) # observation history.append(test[i]) # report performance rmse = sqrt(mean_squared_error(test, predictions)) scores.append(rmse) print('w=%d RMSE:%.3f' % (w, rmse)) # plot scores over window sizes values pyplot.plot(window_sizes, scores) pyplot.show()

Running the example again prints the window size and RMSE for each configuration.

w=1 RMSE:3947.200 w=2 RMSE:4350.413 w=3 RMSE:4818.406 w=4 RMSE:4993.473 w=5 RMSE:5212.887 w=6 RMSE:5002.830 w=7 RMSE:4958.621 w=8 RMSE:4817.664 w=9 RMSE:4932.317 w=10 RMSE:4928.661 w=11 RMSE:4885.574 w=12 RMSE:4414.139 w=13 RMSE:4204.665 w=14 RMSE:4172.579 w=15 RMSE:4382.037 w=16 RMSE:4522.304 w=17 RMSE:4494.803 w=18 RMSE:4360.445 w=19 RMSE:4232.285 w=20 RMSE:4346.389 w=21 RMSE:4465.536 w=22 RMSE:4514.596 w=23 RMSE:4428.739 w=24 RMSE:4236.126

A plot of the window size and RMSE is again created.

Here, we can see that best results were achieved with a window size of w=1 with an RMSE of 3947.200 monthly car sales, which was essentially a t-1 persistence model.

The results were generally worse than optimized persistence, but better than the expanding window model. We could imagine better results with a weighted combination of window observations, this idea leads to using linear models such as AR and ARIMA.

Again, we can plot the predictions from the better model (mean rolling window with w=13) against the actual observations to get a feeling for how the forecast looks in context.

The complete example is listed below.

from pandas import read_csv from sklearn.metrics import mean_squared_error from matplotlib import pyplot from numpy import mean # load data series = read_csv('car-sales.csv', header=0, index_col=0) # prepare data X = series.values train, test = X[0:-24], X[-24:] # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # make prediction yhat = mean(history[-13:]) predictions.append(yhat) # observation history.append(test[i]) # plot predictions vs observations pyplot.plot(test) pyplot.plot(predictions) pyplot.show()

Running the code creates the line plot of observations (blue) compared to the predicted values (orange).

We can see that the model better follows the level of the data, but again does not follow the actual up and down movements.

You can see more examples of rolling window statistics in the post:

In this tutorial, you discovered the importance of calculating the worst acceptable performance on a time series forecasting problem and methods that you can use to ensure you are not fooling yourself with more sophisticated methods.

Specifically, you learned:

- How to automatically test a suite of persistence configurations.
- How to evaluate an expanding window model.
- How to automatically test a suite of rolling window configurations.

Do you have any questions about baseline forecasting methods, or about this post?

Ask your questions in the comments and I will do my best to answer.

The post Simple Time Series Forecasting Models to Test So That You Don’t Fool Yourself appeared first on Machine Learning Mastery.

]]>The post Feature Selection for Time Series Forecasting with Python appeared first on Machine Learning Mastery.

]]>A univariate time series dataset is only comprised of a sequence of observations. These must be transformed into input and output features in order to use supervised learning algorithms.

The problem is that there is little limit to the type and number of features you can engineer for a time series problem. Classical time series analysis tools like the correlogram can help with evaluating lag variables, but do not directly help when selecting other types of features, such as those derived from the timestamps (year, month or day) and moving statistics, like a moving average.

In this tutorial, you will discover how you can use the machine learning tools of feature importance and feature selection when working with time series data.

After completing this tutorial, you will know:

- How to create and interpret a correlogram of lagged observations.
- How to calculate and interpret feature importance scores for time series features.
- How to perform feature selection on time series input variables.

Let’s get started.

**Updated Apr/2019**: Updated the link to dataset.**Updated Jun/2019**: Fixed indenting.**Updated Aug/2019**: Updated data loading to use new API.

This tutorial is broken down into the following 5 steps:

**Monthly Car Sales Dataset**: That describes the dataset we will be working with.**Make Stationary**: That describes how to make the dataset stationary for analysis and forecasting.**Autocorrelation Plot**: That describes how to create a correlogram of the time series data.**Feature Importance of Lag Variables**: That describes how to calculate and review feature importance scores for time series data.**Feature Selection of Lag Variables**: That describes how to calculate and review feature selection results for time series data.

Let’s start off by looking at a standard time series dataset.

Take my free 7-day email course and discover how to get started (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

In this tutorial, we will use the Monthly Car Sales dataset.

This dataset describes the number of car sales in Quebec, Canada between 1960 and 1968.

The units are a count of the number of sales and there are 108 observations. The source data is credited to Abraham and Ledolter (1983).

Download the dataset and save it into your current working directory with the filename “*car-sales.csv*“. Note, you may need to delete the footer information from the file.

The code below loads the dataset as a Pandas *Series* object.

# line plot of time series from pandas import read_csv from matplotlib import pyplot # load dataset series = read_csv('car-sales.csv', header=0, index_col=0) # display first few rows print(series.head(5)) # line plot of dataset series.plot() pyplot.show()

Running the example prints the first 5 rows of data.

Month 1960-01-01 6550 1960-02-01 8728 1960-03-01 12026 1960-04-01 14395 1960-05-01 14587 Name: Sales, dtype: int64

A line plot of the data is also provided.

We can see a clear seasonality and increasing trend in the data.

The trend and seasonality are fixed components that can be added to any prediction we make. They are useful, but need to be removed in order to explore any other systematic signals that can help make predictions.

A time series with seasonality and trend removed is called stationary.

To remove the seasonality, we can take the seasonal difference, resulting in a so-called seasonally adjusted time series.

The period of the seasonality appears to be one year (12 months). The code below calculates the seasonally adjusted time series and saves it to the file “*seasonally-adjusted.csv*“.

# seasonally adjust the time series from pandas import read_csv from matplotlib import pyplot # load dataset series = read_csv('car-sales.csv', header=0, index_col=0) # seasonal difference differenced = series.diff(12) # trim off the first year of empty data differenced = differenced[12:] # save differenced dataset to file differenced.to_csv('seasonally_adjusted.csv', index=False) # plot differenced dataset differenced.plot() pyplot.show()

Because the first 12 months of data have no prior data to be differenced against, they must be discarded.

The stationary data is stored in “*seasonally-adjusted.csv*“. A line plot of the differenced data is created.

The plot suggests that the seasonality and trend information was removed by differencing.

Traditionally, time series features are selected based on their correlation with the output variable.

This is called autocorrelation and involves plotting autocorrelation plots, also called a correlogram. These show the correlation of each lagged observation and whether or not the correlation is statistically significant.

For example, the code below plots the correlogram for all lag variables in the Monthly Car Sales dataset.

from pandas import read_csv from statsmodels.graphics.tsaplots import plot_acf from matplotlib import pyplot series = read_csv('seasonally_adjusted.csv', header=None) plot_acf(series) pyplot.show()

Running the example creates a correlogram, or Autocorrelation Function (ACF) plot, of the data.

The plot shows lag values along the x-axis and correlation on the y-axis between -1 and 1 for negatively and positively correlated lags respectively.

The dots above the blue area indicate statistical significance. The correlation of 1 for the lag value of 0 indicates 100% positive correlation of an observation with itself.

The plot shows significant lag values at 1, 2, 12, and 17 months.

This analysis provides a good baseline for comparison.

We can convert the univariate Monthly Car Sales dataset into a supervised learning problem by taking the lag observation (e.g. t-1) as inputs and using the current observation (t) as the output variable.

We can do this in Pandas using the shift function to create new columns of shifted observations.

The example below creates a new time series with 12 months of lag values to predict the current observation.

The shift of 12 months means that the first 12 rows of data are unusable as they contain *NaN* values.

from pandas import read_csv from pandas import DataFrame # load dataset series = read_csv('seasonally_adjusted.csv', header=None) # reframe as supervised learning dataframe = DataFrame() for i in range(12,0,-1): dataframe['t-'+str(i)] = series.shift(i) dataframe['t'] = series.values print(dataframe.head(13)) dataframe = dataframe[13:] # save to new file dataframe.to_csv('lags_12months_features.csv', index=False)

Running the example prints the first 13 rows of data showing the unusable first 12 rows and the usable 13th row.

t-12 t-11 t-10 t-9 t-8 t-7 t-6 t-5 \ 1961-01-01 NaN NaN NaN NaN NaN NaN NaN NaN 1961-02-01 NaN NaN NaN NaN NaN NaN NaN NaN 1961-03-01 NaN NaN NaN NaN NaN NaN NaN NaN 1961-04-01 NaN NaN NaN NaN NaN NaN NaN NaN 1961-05-01 NaN NaN NaN NaN NaN NaN NaN NaN 1961-06-01 NaN NaN NaN NaN NaN NaN NaN 687.0 1961-07-01 NaN NaN NaN NaN NaN NaN 687.0 646.0 1961-08-01 NaN NaN NaN NaN NaN 687.0 646.0 -189.0 1961-09-01 NaN NaN NaN NaN 687.0 646.0 -189.0 -611.0 1961-10-01 NaN NaN NaN 687.0 646.0 -189.0 -611.0 1339.0 1961-11-01 NaN NaN 687.0 646.0 -189.0 -611.0 1339.0 30.0 1961-12-01 NaN 687.0 646.0 -189.0 -611.0 1339.0 30.0 1645.0 1962-01-01 687.0 646.0 -189.0 -611.0 1339.0 30.0 1645.0 -276.0 t-4 t-3 t-2 t-1 t 1961-01-01 NaN NaN NaN NaN 687.0 1961-02-01 NaN NaN NaN 687.0 646.0 1961-03-01 NaN NaN 687.0 646.0 -189.0 1961-04-01 NaN 687.0 646.0 -189.0 -611.0 1961-05-01 687.0 646.0 -189.0 -611.0 1339.0 1961-06-01 646.0 -189.0 -611.0 1339.0 30.0 1961-07-01 -189.0 -611.0 1339.0 30.0 1645.0 1961-08-01 -611.0 1339.0 30.0 1645.0 -276.0 1961-09-01 1339.0 30.0 1645.0 -276.0 561.0 1961-10-01 30.0 1645.0 -276.0 561.0 470.0 1961-11-01 1645.0 -276.0 561.0 470.0 3395.0 1961-12-01 -276.0 561.0 470.0 3395.0 360.0 1962-01-01 561.0 470.0 3395.0 360.0 3440.0

The first 12 rows are removed from the new dataset and results are saved in the file “*lags_12months_features.csv*“.

This process can be repeated with an arbitrary number of time steps, such as 6 months or 24 months, and I would recommend experimenting.

Ensembles of decision trees, like bagged trees, random forest, and extra trees, can be used to calculate a feature importance score.

This is common in machine learning to estimate the relative usefulness of input features when developing predictive models.

We can use feature importance to help to estimate the relative importance of contrived input features for time series forecasting.

This is important because we can contrive not only the lag observation features above, but also features based on the timestamp of observations, rolling statistics, and much more. Feature importance is one method to help sort out what might be more useful in when modeling.

The example below loads the supervised learning view of the dataset created in the previous section, fits a random forest model (RandomForestRegressor), and summarizes the relative feature importance scores for each of the 12 lag observations.

A large-ish number of trees is used to ensure the scores are somewhat stable. Additionally, the random number seed is initialized to ensure that the same result is achieved each time the code is run.

from pandas import read_csv from sklearn.ensemble import RandomForestRegressor from matplotlib import pyplot # load data dataframe = read_csv('lags_12months_features.csv', header=0) array = dataframe.values # split into input and output X = array[:,0:-1] y = array[:,-1] # fit random forest model model = RandomForestRegressor(n_estimators=500, random_state=1) model.fit(X, y) # show importance scores print(model.feature_importances_) # plot importance scores names = dataframe.columns.values[0:-1] ticks = [i for i in range(len(names))] pyplot.bar(ticks, model.feature_importances_) pyplot.xticks(ticks, names) pyplot.show()

Running the example first prints the importance scores of the lagged observations.

[ 0.21642244 0.06271259 0.05662302 0.05543768 0.07155573 0.08478599 0.07699371 0.05366735 0.1033234 0.04897883 0.1066669 0.06283236]

The scores are then plotted as a bar graph.

The plot shows the high relative importance of the observation at t-12 and, to a lesser degree, the importance of observations at t-2 and t-4.

It is interesting to note a difference with the outcome from the correlogram above.

This process can be repeated with different methods that can calculate importance scores, such as gradient boosting, extra trees, and bagged decision trees.

We can also use feature selection to automatically identify and select those input features that are most predictive.

A popular method for feature selection is called Recursive Feature Selection (RFE).

RFE works by creating predictive models, weighting features, and pruning those with the smallest weights, then repeating the process until a desired number of features are left.

The example below uses RFE with a random forest predictive model and sets the desired number of input features to 4.

from pandas import read_csv from sklearn.feature_selection import RFE from sklearn.ensemble import RandomForestRegressor from matplotlib import pyplot # load dataset dataframe = read_csv('lags_12months_features.csv', header=0) # separate into input and output variables array = dataframe.values X = array[:,0:-1] y = array[:,-1] # perform feature selection rfe = RFE(RandomForestRegressor(n_estimators=500, random_state=1), 4) fit = rfe.fit(X, y) # report selected features print('Selected Features:') names = dataframe.columns.values[0:-1] for i in range(len(fit.support_)): if fit.support_[i]: print(names[i]) # plot feature rank names = dataframe.columns.values[0:-1] ticks = [i for i in range(len(names))] pyplot.bar(ticks, fit.ranking_) pyplot.xticks(ticks, names) pyplot.show()

Running the example prints the names of the 4 selected features.

Unsurprisingly, the results match features that showed a high importance in the previous section.

Selected Features: t-12 t-6 t-4 t-2

A bar graph is also created showing the feature selection rank (smaller is better) for each input feature.

This process can be repeated with different numbers of features to select more than 4 and different models other than random forest.

In this tutorial, you discovered how to use the tools of applied machine learning to help select features from time series data when forecasting.

Specifically, you learned:

- How to interpret a correlogram for highly correlated lagged observations.
- How to calculate and review feature importance scores in time series data.
- How to use feature selection to identify the most relevant input variables in time series data.

Do you have any questions about feature selection with time series data?

Ask your questions in the comments and I will do my best to answer.

The post Feature Selection for Time Series Forecasting with Python appeared first on Machine Learning Mastery.

]]>