# Random Forest for Time Series Forecasting

MachineLearningMastery.com, Sun, 01 Nov 2020
Random Forest is a popular and effective ensemble machine learning algorithm.

It is widely used for classification and regression predictive modeling problems with structured (tabular) data sets, e.g. data as it looks in a spreadsheet or database table.

Random Forest can also be used for time series forecasting, although it requires that the time series dataset be transformed into a supervised learning problem first. It also requires the use of a specialized technique for evaluating the model called walk-forward validation, as evaluating the model using k-fold cross validation would result in optimistically biased results.

In this tutorial, you will discover how to develop a Random Forest model for time series forecasting.

After completing this tutorial, you will know:

• Random Forest is an ensemble of decision tree algorithms that can be used for classification and regression predictive modeling.
• Time series datasets can be transformed into supervised learning using a sliding-window representation.
• How to fit, evaluate, and make predictions with a Random Forest regression model for time series forecasting.

Let’s get started.

Random Forest for Time Series Forecasting
Photo by IvyMike, some rights reserved.

## Tutorial Overview

This tutorial is divided into three parts; they are:

1. Random Forest Ensemble
2. Time Series Data Preparation
3. Random Forest for Time Series

## Random Forest Ensemble

Random forest is an ensemble of decision tree algorithms.

It is an extension of bootstrap aggregation (bagging) of decision trees and can be used for classification and regression problems.

In bagging, a number of decision trees are made where each tree is created from a different bootstrap sample of the training dataset. A bootstrap sample is a sample of the training dataset where an example may appear more than once in the sample. This is referred to as “sampling with replacement”.

Bagging is an effective ensemble algorithm as each decision tree is fit on a slightly different training dataset, and in turn, has a slightly different performance. Unlike normal decision tree models, such as classification and regression trees (CART), trees used in the ensemble are unpruned, making them slightly overfit to the training dataset. This is desirable as it helps to make each tree more different and have less correlated predictions or prediction errors.

Predictions from the individual trees are combined, typically resulting in better performance than any single tree in the model:

• Regression: Prediction is the average prediction across the decision trees.
• Classification: Prediction is the majority vote class label predicted across the decision trees.

Random forest involves constructing a large number of decision trees from bootstrap samples from the training dataset, like bagging.

Unlike bagging, random forest also involves selecting a subset of input features (columns or variables) at each split point in the construction of the trees. Typically, constructing a decision tree involves evaluating the value for each input variable in the data in order to select a split point. By reducing the features to a random subset that may be considered at each split point, it forces each decision tree in the ensemble to be more different.

The effect is that the predictions, and in turn, prediction errors, made by each tree in the ensemble are more different or less correlated. When the predictions from these less correlated trees are averaged to make a prediction, it often results in better performance than bagged decision trees.
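In scikit-learn, this difference comes down to the max_features argument of RandomForestRegressor. The sketch below (using a made-up synthetic problem, not the tutorial's data) contrasts the two configurations; the specific dataset settings are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# hypothetical synthetic regression problem for illustration
X, y = make_regression(n_samples=500, n_features=20, noise=0.5, random_state=1)
X_train, X_test, y_train, y_test = X[:400], X[400:], y[:400], y[400:]

# bagged trees: every feature is considered at each split (max_features=None)
bagging = RandomForestRegressor(n_estimators=100, max_features=None, random_state=1)
# random forest: only a random subset of features is considered at each split
forest = RandomForestRegressor(n_estimators=100, max_features='sqrt', random_state=1)

for name, model in [('bagging', bagging), ('random forest', forest)]:
    model.fit(X_train, y_train)
    print('%s R^2: %.3f' % (name, model.score(X_test, y_test)))
```

Which configuration wins depends on the dataset; the point is only that max_features is the knob that separates bagging from a random forest.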

For more on the Random Forest algorithm, see the tutorial:

## Time Series Data Preparation

Time series data can be phrased as supervised learning.

Given a sequence of numbers for a time series dataset, we can restructure the data to look like a supervised learning problem. We can do this by using previous time steps as input variables and the next time step as the output variable.

Let’s make this concrete with an example. Imagine we have a time series as follows:

time, measure
1, 100
2, 110
3, 108
4, 115
5, 120

We can restructure this time series dataset as a supervised learning problem by using the value at the previous time step to predict the value at the next time-step.

Reorganizing the time series dataset this way, the data would look as follows:

X, y
?, 100
100, 110
110, 108
108, 115
115, 120
120, ?

Note that the time column is dropped and some rows of data are unusable for training a model, such as the first and the last.

This representation is called a sliding window, as the window of inputs and expected outputs is shifted forward through time to create new “samples” for a supervised learning model.

For more on the sliding window approach to preparing time series forecasting data, see the tutorial:

We can use the shift() function in Pandas to automatically create new framings of time series problems given the desired length of input and output sequences.

This would be a useful tool as it would allow us to explore different framings of a time series problem with machine learning algorithms to see which might result in better-performing models.
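To see what shift() produces, here is a minimal sketch using the five-observation series from above (the column names X and y are illustrative):

```python
from pandas import DataFrame

# the small example series from above
df = DataFrame({'measure': [100, 110, 108, 115, 120]})
# shift down by one step to create the t-1 input column
df['X'] = df['measure'].shift(1)
df['y'] = df['measure']
print(df[['X', 'y']])
# the first row has a NaN input and would be dropped before training
```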

The function below will take a time series as a NumPy array with one or more columns and transform it into a supervised learning problem with the specified number of inputs and outputs.

# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values

We can use this function to prepare a time series dataset for Random Forest.

For more on the step-by-step development of this function, see the tutorial:

Once the dataset is prepared, we must be careful in how it is used to fit and evaluate a model.

For example, it would not be valid to fit the model on data from the future and have it predict the past. The model must be trained on the past and predict the future.

This means that methods that randomize the dataset during evaluation, like k-fold cross-validation, cannot be used. Instead, we must use a technique called walk-forward validation.

In walk-forward validation, the dataset is first split into train and test sets by selecting a cut point, e.g. all data except the last 12 months is used for training and the last 12 months is used for testing.

If we are interested in making a one-step forecast, e.g. one month, then we can evaluate the model by training on the training dataset and predicting the first step in the test dataset. We can then add the real observation from the test set to the training dataset, refit the model, then have the model predict the second step in the test dataset.

Repeating this process for the entire test dataset will give a one-step prediction for the entire test dataset from which an error measure can be calculated to evaluate the skill of the model.

For more on walk-forward validation, see the tutorial:

The function below performs walk-forward validation.

It takes the entire supervised learning version of the time series dataset and the number of rows to use as the test set as arguments.

It then steps through the test set, calling the random_forest_forecast() function to make a one-step forecast. An error measure is calculated and the details are returned for analysis.

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # split test row into input and output columns
        testX, testy = test[i, :-1], test[i, -1]
        # fit model on history and make a prediction
        yhat = random_forest_forecast(history, testX)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
        # summarize progress
        print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
    # estimate prediction error
    error = mean_absolute_error(test[:, -1], predictions)
    return error, test[:, -1], predictions

The train_test_split() function is called to split the dataset into train and test sets.

We can define this function below.

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test, :], data[-n_test:, :]

We can use the RandomForestRegressor class to make a one-step forecast.

The random_forest_forecast() function below implements this, taking the training dataset and test input row as input, fitting a model and making a one-step prediction.

# fit a random forest model and make a one-step prediction
def random_forest_forecast(train, testX):
    # transform list into array
    train = asarray(train)
    # split into input and output columns
    trainX, trainy = train[:, :-1], train[:, -1]
    # fit model
    model = RandomForestRegressor(n_estimators=1000)
    model.fit(trainX, trainy)
    # make a one-step prediction
    yhat = model.predict([testX])
    return yhat[0]

Now that we know how to prepare time series data for forecasting and evaluate a Random Forest model, next we can look at using Random Forest on a real dataset.

## Random Forest for Time Series

In this section, we will explore how to use the Random Forest regressor for time series forecasting.

We will use a standard univariate time series dataset with the intent of using the model to make a one-step forecast.

You can use the code in this section as the starting point in your own project and easily adapt it for multivariate inputs, multivariate forecasts, and multi-step forecasts.

We will use the daily female births dataset; that is, the number of female births recorded each day over one year (1959).

You can download the dataset from here and place it in your current working directory with the filename “daily-total-female-births.csv“.

The first few lines of the dataset look as follows:

"Date","Births"
"1959-01-01",35
"1959-01-02",32
"1959-01-03",30
"1959-01-04",31
"1959-01-05",44
...

First, let’s load and plot the dataset.

The complete example is listed below.

# load and plot the time series dataset
from pandas import read_csv
from matplotlib import pyplot
# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# plot dataset
pyplot.plot(values)
pyplot.show()

Running the example creates a line plot of the dataset.

We can see there is no obvious trend or seasonality.

Line Plot of the Daily Female Births Time Series Dataset

A persistence model can achieve an MAE of about 6.7 births when predicting the last 12 days. This provides a baseline in performance above which a model may be considered skillful.
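As a reference for what the persistence baseline is doing, here is a minimal sketch of computing a persistence MAE, shown on a tiny made-up series rather than the births data:

```python
from sklearn.metrics import mean_absolute_error

def persistence_mae(values, n_test):
    # each test value is predicted as the observation immediately before it
    predictions = values[-n_test-1:-1]
    actual = values[-n_test:]
    return mean_absolute_error(actual, predictions)

# tiny made-up series for illustration
series = [100, 110, 108, 115, 120]
print('MAE: %.3f' % persistence_mae(series, 3))  # errors 2, 7, 5 -> MAE: 4.667
```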

Next, we can evaluate the Random Forest model on the dataset when making one-step forecasts for the last 12 days of data.

We will use only the previous six time steps as input to the model and default model hyperparameters, except we will use 1,000 trees in the ensemble (to avoid underlearning).

The complete example is listed below.

# forecast daily births with random forest
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import RandomForestRegressor
from matplotlib import pyplot

# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test, :], data[-n_test:, :]

# fit a random forest model and make a one-step prediction
def random_forest_forecast(train, testX):
    # transform list into array
    train = asarray(train)
    # split into input and output columns
    trainX, trainy = train[:, :-1], train[:, -1]
    # fit model
    model = RandomForestRegressor(n_estimators=1000)
    model.fit(trainX, trainy)
    # make a one-step prediction
    yhat = model.predict([testX])
    return yhat[0]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # split test row into input and output columns
        testX, testy = test[i, :-1], test[i, -1]
        # fit model on history and make a prediction
        yhat = random_forest_forecast(history, testX)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
        # summarize progress
        print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
    # estimate prediction error
    error = mean_absolute_error(test[:, -1], predictions)
    return error, test[:, -1], predictions

# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
data = series_to_supervised(values, n_in=6)
# evaluate
mae, y, yhat = walk_forward_validation(data, 12)
print('MAE: %.3f' % mae)
# plot expected vs predicted
pyplot.plot(y, label='Expected')
pyplot.plot(yhat, label='Predicted')
pyplot.legend()
pyplot.show()

Running the example reports the expected and predicted values for each step in the test set, then the MAE for all predicted values.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that the model performs better than a persistence model, achieving an MAE of about 5.9 births, compared to 6.7 births for the baseline.

Can you do better?

You can test different Random Forest hyperparameters and numbers of time steps as input to see if you can achieve better performance. Share your results in the comments below.

>expected=42.0, predicted=45.0
>expected=53.0, predicted=43.7
>expected=39.0, predicted=41.4
>expected=40.0, predicted=38.1
>expected=38.0, predicted=42.6
>expected=44.0, predicted=48.7
>expected=34.0, predicted=42.7
>expected=37.0, predicted=37.0
>expected=52.0, predicted=38.4
>expected=48.0, predicted=41.4
>expected=55.0, predicted=43.7
>expected=50.0, predicted=45.3
MAE: 5.905

A line plot is created comparing the series of expected values and predicted values for the last 12 days of the dataset.

This gives a useful visual indication of how well the model performed on the test set.

Line Plot of Expected vs. Predicted Births Using Random Forest

Once a final Random Forest model configuration is chosen, a model can be finalized and used to make a prediction on new data.

This is called an out-of-sample forecast, e.g. predicting beyond the training dataset. This is identical to making a prediction during the evaluation of the model, as we always want to evaluate a model using the same procedure that we expect to use when the model is used to make predictions on new data.

The example below demonstrates fitting a final Random Forest model on all available data and making a one-step prediction beyond the end of the dataset.

# finalize model and make a prediction for daily births with random forest
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.ensemble import RandomForestRegressor

# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values

# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
train = series_to_supervised(values, n_in=6)
# split into input and output columns
trainX, trainy = train[:, :-1], train[:, -1]
# fit model
model = RandomForestRegressor(n_estimators=1000)
model.fit(trainX, trainy)
# construct an input for a new prediction
row = values[-6:].flatten()
# make a one-step prediction
yhat = model.predict(asarray([row]))
print('Input: %s, Predicted: %.3f' % (row, yhat[0]))

Running the example fits a Random Forest model on all available data.

A new row of input is prepared using the last six days of known data, and the next day beyond the end of the dataset is predicted.

Input: [34 37 52 48 55 50], Predicted: 43.053
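The example above makes a single one-step prediction. For multi-step forecasts, one common adaptation (sketched below on a made-up series with hypothetical settings, not the tutorial's code) is recursive forecasting: predict one step, append the prediction to the input window, and repeat:

```python
from numpy import asarray
from sklearn.ensemble import RandomForestRegressor

# hypothetical synthetic series with a repeating pattern, for illustration
values = asarray([float(i % 7 + 30) for i in range(100)])

n_in = 6
# build (input, output) pairs with a sliding window of n_in steps
X = asarray([values[i:i + n_in] for i in range(len(values) - n_in)])
y = values[n_in:]
model = RandomForestRegressor(n_estimators=100, random_state=1)
model.fit(X, y)

# recursive multi-step forecast: feed each prediction back in as input
window = list(values[-n_in:])
forecasts = []
for _ in range(3):
    yhat = model.predict(asarray([window]))[0]
    forecasts.append(yhat)
    # slide the window forward, appending the new prediction
    window = window[1:] + [yhat]
print(forecasts)
```

Note that errors compound with this approach, as later steps are predicted from earlier predictions rather than observations.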


## Summary

In this tutorial, you discovered how to develop a Random Forest model for time series forecasting.

Specifically, you learned:

• Random Forest is an ensemble of decision tree algorithms that can be used for classification and regression predictive modeling.
• Time series datasets can be transformed into supervised learning using a sliding-window representation.
• How to fit, evaluate, and make predictions with a Random Forest regression model for time series forecasting.

Do you have any questions?

# Time Series Forecasting With Prophet in Python

MachineLearningMastery.com, Tue, 25 Aug 2020
Time series forecasting can be challenging as there are many different methods you could use and many different hyperparameters for each method.

The Prophet library is an open-source library designed for making forecasts for univariate time series datasets. It is easy to use and designed to automatically find a good set of hyperparameters for the model in an effort to make skillful forecasts for data with trends and seasonal structure by default.

In this tutorial, you will discover how to use the Facebook Prophet library for time series forecasting.

After completing this tutorial, you will know:

• Prophet is an open-source library developed by Facebook and designed for automatic forecasting of univariate time series data.
• How to fit Prophet models and use them to make in-sample and out-of-sample forecasts.
• How to evaluate a Prophet model on a hold-out dataset.

Let’s get started.

Time Series Forecasting With Prophet in Python
Photo by Rinaldo Wurglitsch, some rights reserved.

## Tutorial Overview

This tutorial is divided into three parts; they are:

1. Prophet Forecasting Library
2. Car Sales Dataset
3. Forecast Car Sales With Prophet
1. Fit Prophet Model
2. Make an In-Sample Forecast
3. Make an Out-of-Sample Forecast
4. Manually Evaluate Forecast Model

## Prophet Forecasting Library

Prophet, or “Facebook Prophet,” is an open-source library for univariate (one variable) time series forecasting developed by Facebook.

Prophet implements what they refer to as an additive time series forecasting model, and the implementation supports trends, seasonality, and holidays.

> Implements a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects

— Package ‘prophet’, 2019.

It is designed to be easy and completely automatic, e.g. point it at a time series and get a forecast. As such, it is intended for internal company use, such as forecasting sales, capacity, etc.
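Concretely, the additive model described in the Prophet paper (Taylor and Letham, 2018) decomposes the series as:

y(t) = g(t) + s(t) + h(t) + e(t)

where g(t) is the trend function, s(t) captures periodic seasonality (e.g. yearly and weekly), h(t) models the effect of holidays, and e(t) is the error term.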

For a great overview of Prophet and its capabilities, see the post:

The library provides two interfaces, including R and Python. We will focus on the Python interface in this tutorial.

The first step is to install the Prophet library using Pip, as follows:

sudo pip install fbprophet

Next, we can confirm that the library was installed correctly.

To do this, we can import the library and print the version number in Python. The complete example is listed below.

# check prophet version
import fbprophet
# print version number
print('Prophet %s' % fbprophet.__version__)

Running the example prints the installed version of Prophet.

You should have the same version or higher.

Prophet 0.5

Now that we have Prophet installed, let’s select a dataset we can use to explore using the library.

## Car Sales Dataset

We will use the monthly car sales dataset.

It is a standard univariate time series dataset that contains both a trend and seasonality. The dataset has 108 months of data, and a naive persistence forecast can achieve a mean absolute error of about 3,235 sales; a skillful model should achieve an error below this baseline.

First, let’s load and summarize the dataset.

Prophet requires data to be in Pandas DataFrames. Therefore, we will load and summarize the data using Pandas.

We can load the data directly from the URL by calling the read_csv() Pandas function, then summarize the shape (number of rows and columns) of the data and view the first few rows of data.

The complete example is listed below.

# load the car sales dataset
from pandas import read_csv
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# summarize shape
print(df.shape)
# show first few rows
print(df.head())

Running the example first reports the number of rows and columns, then lists the first five rows of data.

We can see that as we expected, there are 108 months worth of data and two columns. The first column is the date and the second is the number of sales.

Note that the first column in the output is a row index and is not a part of the dataset, just a helpful tool that Pandas uses to order rows.

(108, 2)
Month  Sales
0  1960-01   6550
1  1960-02   8728
2  1960-03  12026
3  1960-04  14395
4  1960-05  14587

A time-series dataset does not make sense to us until we plot it.

Plotting a time series helps us actually see if there is a trend, a seasonal cycle, outliers, and more. It gives us a feel for the data.

We can plot the data easily in Pandas by calling the plot() function on the DataFrame.

The complete example is listed below.

# load and plot the car sales dataset
from pandas import read_csv
from matplotlib import pyplot
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# plot the time series
df.plot()
pyplot.show()

Running the example creates a plot of the time series.

We can clearly see the trend in sales over time and a monthly seasonal pattern to the sales. These are patterns we expect the forecast model to take into account.

Line Plot of Car Sales Dataset

Now that we are familiar with the dataset, let’s explore how we can use the Prophet library to make forecasts.

## Forecast Car Sales With Prophet

In this section, we will explore using Prophet to forecast the car sales dataset.

Let’s start by fitting a model on the dataset.

### Fit Prophet Model

To use Prophet for forecasting, first, a Prophet() object is defined and configured, then it is fit on the dataset by calling the fit() function and passing the data.

The Prophet() object takes arguments to configure the type of model you want, such as the type of growth, the type of seasonality, and more. By default, the model will work hard to figure out almost everything automatically.

The fit() function takes a DataFrame of time series data. The DataFrame must have a specific format. The first column must have the name ‘ds‘ and contain the date-times. The second column must have the name ‘y‘ and contain the observations.

This means we must change the column names in the dataset. It also requires that the first column be converted to date-time objects, if they are not already (e.g. this can be done as part of loading the dataset with the right arguments to read_csv).

For example, we can modify our loaded car sales dataset to have this expected structure, as follows:

...
# prepare expected column names
df.columns = ['ds', 'y']
df['ds']= to_datetime(df['ds'])

The complete example of fitting a Prophet model on the car sales dataset is listed below.

# fit prophet model on the car sales dataset
from pandas import read_csv
from pandas import to_datetime
from fbprophet import Prophet
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# define the model
model = Prophet()
# fit the model
model.fit(df)

Running the example loads the dataset, prepares the DataFrame in the expected format, and fits a Prophet model.

By default, the library provides a lot of verbose output during the fit process. I think it’s a bad idea in general as it trains developers to ignore output.

Nevertheless, the output summarizes what happened during the model fitting process, specifically the optimization processes that ran.

INFO:fbprophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -4.39613
Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
99       270.121    0.00413718       75.7289           1           1      120
Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
179       270.265    0.00019681       84.1622   2.169e-06       0.001      273  LS failed, Hessian reset
199       270.283   1.38947e-05       87.8642      0.3402           1      299
Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
240       270.296    1.6343e-05       89.9117   1.953e-07       0.001      381  LS failed, Hessian reset
299         270.3   4.73573e-08       74.9719      0.3914           1      455
Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
300         270.3   8.25604e-09       74.4478      0.3522      0.3522      456
Optimization terminated normally:
Convergence detected: absolute parameter change was below tolerance

I will not reproduce this output in subsequent sections when we fit the model.

Next, let’s make a forecast.

### Make an In-Sample Forecast

It can be useful to make a forecast on historical data.

That is, we can make a forecast on data used as input to train the model. Ideally, the model has seen the data before and would make a perfect prediction.

Nevertheless, this is not the case as the model tries to generalize across all cases in the data.

This is called making an in-sample (in training set sample) forecast and reviewing the results can give insight into how good the model is. That is, how well it learned the training data.

A forecast is made by calling the predict() function and passing a DataFrame that contains one column named ‘ds‘ and rows with date-times for all the intervals to be predicted.

There are many ways to create this “forecast” DataFrame. In this case, we will loop over one year of dates, e.g. the last 12 months in the dataset, and create a string for each month. We will then convert the list of dates into a DataFrame and convert the string values into date-time objects.

...
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1968-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])

This DataFrame can then be provided to the predict() function to calculate a forecast.

The result of the predict() function is a DataFrame that contains many columns. Perhaps the most important columns are the forecast date time (‘ds‘), the forecasted value (‘yhat‘), and the lower and upper bounds on the predicted value (‘yhat_lower‘ and ‘yhat_upper‘) that provide uncertainty of the forecast.

For example, we can print the first few predictions as follows:

...
# summarize the forecast
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())

Prophet also provides a built-in tool for visualizing the prediction in the context of the training dataset.

This can be achieved by calling the plot() function on the model and passing it a result DataFrame. It will create a plot of the training dataset and overlay the prediction with the upper and lower bounds for the forecast dates.

...
# plot forecast
model.plot(forecast)
pyplot.show()

Tying this all together, a complete example of making an in-sample forecast is listed below.

# make an in-sample forecast
from pandas import read_csv
from pandas import to_datetime
from pandas import DataFrame
from fbprophet import Prophet
from matplotlib import pyplot
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# define the model
model = Prophet()
# fit the model
model.fit(df)
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1968-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])
# use the model to make a forecast
forecast = model.predict(future)
# summarize the forecast
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())
# plot forecast
model.plot(forecast)
pyplot.show()

Running the example forecasts the last 12 months of the dataset.

The first five months of the prediction are reported and we can see that values are not too different from the actual sales values in the dataset.

ds          yhat    yhat_lower    yhat_upper
0 1968-01-01  14364.866157  12816.266184  15956.555409
1 1968-02-01  14940.687225  13299.473640  16463.811658
2 1968-03-01  20858.282598  19439.403787  22345.747821
3 1968-04-01  22893.610396  21417.399440  24454.642588
4 1968-05-01  24212.079727  22667.146433  25816.191457

Next, a plot is created. We can see the training data are represented as black dots and the forecast is a blue line with upper and lower bounds in a blue shaded area.

We can see that the forecasted 12 months is a good match for the real observations, especially when the bounds are taken into account.

Plot of Time Series and In-Sample Forecast With Prophet

### Make an Out-of-Sample Forecast

In practice, we really want a forecast model to make a prediction beyond the training data.

This is called an out-of-sample forecast.

We can achieve this in the same way as an in-sample forecast and simply specify a different forecast period.

In this case, a period beyond the end of the training dataset, starting 1969-01.

...
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
date = '1969-%02d' % i
future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds']= to_datetime(future['ds'])

Tying this together, the complete example is listed below.

# make an out-of-sample forecast
from pandas import read_csv
from pandas import to_datetime
from pandas import DataFrame
from fbprophet import Prophet
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# define the model
model = Prophet()
# fit the model
model.fit(df)
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
date = '1969-%02d' % i
future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds']= to_datetime(future['ds'])
# use the model to make a forecast
forecast = model.predict(future)
# summarize the forecast
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())
# plot forecast
model.plot(forecast)
pyplot.show()

Running the example makes an out-of-sample forecast for the car sales data.

The first five rows of the forecast are printed, although it is hard to get an idea of whether they are sensible or not.

ds          yhat    yhat_lower    yhat_upper
0 1969-01-01  15406.401318  13751.534121  16789.969780
1 1969-02-01  16165.737458  14486.887740  17634.953132
2 1969-03-01  21384.120631  19738.950363  22926.857539
3 1969-04-01  23512.464086  21939.204670  25105.341478
4 1969-05-01  25026.039276  23544.081762  26718.820580

A plot is created to help us evaluate the prediction in the context of the training data.

The new one-year forecast does look sensible, at least by eye.

Plot of Time Series and Out-of-Sample Forecast With Prophet

### Manually Evaluate Forecast Model

It is critical to develop an objective estimate of a forecast model’s performance.

This can be achieved by holding some data back from the model, such as the last 12 months, fitting the model on the first portion of the data, using it to make predictions on the held-back portion, and then calculating an error measure, such as the mean absolute error across the forecasts. In effect, this is a simulated out-of-sample forecast.

The score gives an estimate of how well we might expect the model to perform on average when making an out-of-sample forecast.

We can do this with the sample data by creating a new DataFrame for training with the last 12 months removed.

...
# create train dataset, remove last 12 months
train = df.drop(df.index[-12:])
print(train.tail())

A forecast can then be made on the last 12 months of date-times.

We can then retrieve the forecast values and the expected values from the original dataset and calculate a mean absolute error metric using the scikit-learn library.

...
# calculate MAE between expected and predicted values for the last 12 months
y_true = df['y'][-12:].values
y_pred = forecast['yhat'].values
mae = mean_absolute_error(y_true, y_pred)
print('MAE: %.3f' % mae)

It can also be helpful to plot the expected vs. predicted values to see how well the out-of-sample prediction matches the known values.

...
# plot expected vs actual
pyplot.plot(y_true, label='Actual')
pyplot.plot(y_pred, label='Predicted')
pyplot.legend()
pyplot.show()

Tying this together, the example below demonstrates how to evaluate a Prophet model on a hold-out dataset.

# evaluate prophet time series forecasting model on hold out dataset
from pandas import read_csv
from pandas import to_datetime
from pandas import DataFrame
from fbprophet import Prophet
from sklearn.metrics import mean_absolute_error
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# create train dataset, remove last 12 months
train = df.drop(df.index[-12:])
print(train.tail())
# define the model
model = Prophet()
# fit the model
model.fit(train)
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
date = '1968-%02d' % i
future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])
# use the model to make a forecast
forecast = model.predict(future)
# calculate MAE between expected and predicted values for the last 12 months
y_true = df['y'][-12:].values
y_pred = forecast['yhat'].values
mae = mean_absolute_error(y_true, y_pred)
print('MAE: %.3f' % mae)
# plot expected vs actual
pyplot.plot(y_true, label='Actual')
pyplot.plot(y_pred, label='Predicted')
pyplot.legend()
pyplot.show()

Running the example first reports the last few rows of the training dataset.

It confirms that the training data ends in the last month of 1967 and that 1968 will be used as the hold-out dataset.

ds      y
91 1967-08-01  13434
92 1967-09-01  13598
93 1967-10-01  17187
94 1967-11-01  16119
95 1967-12-01  13713

Next, a mean absolute error is calculated for the forecast period.

In this case we can see that the error is approximately 1,336 sales, which is much lower (better) than a naive persistence model that achieves an error of 3,235 sales over the same period.

MAE: 1336.814
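The persistence figure quoted above can be checked with a few lines of plain Python. The sketch below assumes a seasonal-naive forecast (each held-out month predicted by the value 12 months earlier) and uses a hypothetical synthetic series rather than the car sales data:

```python
def seasonal_persistence_mae(series, n_test=12, season=12):
    # predict each held-out value with the observation one season earlier
    y_true = series[-n_test:]
    y_pred = series[-(n_test + season):-season]
    # mean absolute error between the naive predictions and the actuals
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n_test

# hypothetical synthetic series: 24 "months" of linearly growing sales
series = [100 + 10 * i for i in range(24)]
print(seasonal_persistence_mae(series))  # → 120.0
```

On the synthetic series, every month is exactly 120 higher than the same month a year earlier, so the baseline MAE is 120.0; running the same function on the real car sales hold-out gives the persistence error referred to above.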

Finally, a plot is created comparing the actual vs. predicted values. In this case, we can see that the forecast is a good fit. The model has skill and produces a forecast that looks sensible.

Plot of Actual vs. Predicted Values for Last 12 Months of Car Sales

The Prophet library also provides tools to automatically evaluate models and plot results, although those tools don’t appear to work well with data at a resolution coarser than one day, such as the monthly data used here.


## Summary

In this tutorial, you discovered how to use the Facebook Prophet library for time series forecasting.

Specifically, you learned:

• Prophet is an open-source library developed by Facebook and designed for automatic forecasting of univariate time series data.
• How to fit Prophet models and use them to make in-sample and out-of-sample forecasts.
• How to evaluate a Prophet model on a hold-out dataset.

Do you have any questions?

The post Time Series Forecasting With Prophet in Python appeared first on MachineLearningMastery.com.

]]>
https://machinelearningmastery.com/time-series-forecasting-with-prophet-in-python/feed/ 76
How to Model Volatility with ARCH and GARCH for Time Series Forecasting in Python https://machinelearningmastery.com/develop-arch-and-garch-models-for-time-series-forecasting-in-python/ https://machinelearningmastery.com/develop-arch-and-garch-models-for-time-series-forecasting-in-python/#comments Thu, 23 Aug 2018 19:00:08 +0000 https://35.82.237.216/?p=6011 A change in the variance or volatility over time can cause problems when modeling time series with classical methods like ARIMA. The ARCH or Autoregressive Conditional Heteroskedasticity method provides a way to model a change in variance in a time series that is time dependent, such as increasing or decreasing volatility. An extension of this approach […]

]]>
A change in the variance or volatility over time can cause problems when modeling time series with classical methods like ARIMA.

The ARCH or Autoregressive Conditional Heteroskedasticity method provides a way to model a change in variance in a time series that is time dependent, such as increasing or decreasing volatility. An extension of this approach named GARCH or Generalized Autoregressive Conditional Heteroskedasticity allows the method to support changes in the time dependent volatility, such as increasing and decreasing volatility in the same series.

In this tutorial, you will discover the ARCH and GARCH models for predicting the variance of a time series.

After completing this tutorial, you will know:

• The problem with variance in a time series and the need for ARCH and GARCH models.
• How to configure ARCH and GARCH models.
• How to implement ARCH and GARCH models in Python.

Kick-start your project with my new book Time Series Forecasting With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

How to Develop ARCH and GARCH Models for Time Series Forecasting in Python
Photo by Murray Foubister, some rights reserved.

## Tutorial Overview

This tutorial is divided into five parts; they are:

1. Problem with Variance
2. What Is an ARCH Model?
3. What Is a GARCH Model?
4. How to Configure ARCH and GARCH Models
5. ARCH and GARCH Models in Python

## Problem with Variance

Autoregressive models can be developed for univariate time series data that is stationary (AR), has a trend (ARIMA), and has a seasonal component (SARIMA).

One aspect of a univariate time series that these autoregressive models do not model is a change in the variance over time.

Classically, a time series with modest changes in variance can sometimes be adjusted using a power transform, such as by taking the Log or using a Box-Cox transform.
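As a quick illustration of why a power transform helps, a series with multiplicative growth becomes additive (constant increments) on the log scale. The series here is contrived for the demonstration:

```python
from math import log

# a contrived series whose spread grows multiplicatively
data = [1.0, 2.0, 4.0, 8.0, 16.0]
# the log transform turns multiplicative change into additive change
logged = [log(x) for x in data]
# successive differences are now constant (log(2), about 0.693)
diffs = [b - a for a, b in zip(logged, logged[1:])]
print(diffs)
```

A Box-Cox transform generalizes this idea, with the log transform as the special case lambda = 0.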

There are some time series where the variance changes consistently over time. In the context of a time series in the financial domain, this would be called increasing and decreasing volatility.

In time series where the variance is increasing in a systematic way, such as an increasing trend, this property of the series is called heteroskedasticity. It’s a fancy word from statistics that means changing or unequal variance across the series.

If the change in variance can be correlated over time, then it can be modeled using an autoregressive process, such as ARCH.

## What Is an ARCH Model?

Autoregressive Conditional Heteroskedasticity, or ARCH, is a method that explicitly models the change in variance over time in a time series.

Specifically, an ARCH method models the variance at a time step as a function of the residual errors from a mean process (e.g. a zero mean).

The ARCH process introduced by Engle (1982) explicitly recognizes the difference between the unconditional and the conditional variance allowing the latter to change over time as a function of past errors.

A lag parameter must be specified to define the number of prior residual errors to include in the model. Using the notation of the GARCH model (discussed later), we can refer to this parameter as “q“. Originally, this parameter was called “p“, and is also called “p” in the arch Python package used later in this tutorial.

• q: The number of lag squared residual errors to include in the ARCH model.

A generally accepted notation for an ARCH model is to specify the ARCH() function with the q parameter ARCH(q); for example, ARCH(1) would be a first order ARCH model.
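In equation form, the ARCH(q) model expresses the conditional variance at time t as a linear function of the previous q squared residual errors (this is the standard textbook notation, not specific to the arch package used later):

```latex
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \, \epsilon_{t-i}^2,
\qquad \alpha_0 > 0,\ \alpha_i \ge 0
```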

The approach expects the series to be stationary, other than the change in variance, meaning it does not have a trend or seasonal component. An ARCH model is used to predict the variance at future time steps.

[ARCH] are mean zero, serially uncorrelated processes with nonconstant variances conditional on the past, but constant unconditional variances. For such processes, the recent past gives information about the one-period forecast variance.

In practice, this can be used to model the expected variance on the residuals after another autoregressive model has been used, such as an ARMA or similar.

The model should only be applied to a prewhitened residual series {e_t} that is uncorrelated and contains no trends or seasonal changes, such as might be obtained after fitting a satisfactory SARIMA model.

— Page 148, Introductory Time Series with R, 2009.

## What Is a GARCH Model?

Generalized Autoregressive Conditional Heteroskedasticity, or GARCH, is an extension of the ARCH model that incorporates a moving average component together with the autoregressive component.

Specifically, the model includes lag variance terms (e.g. the observations if modeling the white noise residual errors of another process), together with lag residual errors from a mean process.

The introduction of a moving average component allows the model to capture both the conditional change in variance over time and changes in the time-dependent variance, such as conditional increases and decreases in variance.

As such, the model introduces a new parameter “p” that describes the number of lag variance terms:

• p: The number of lag variances to include in the GARCH model.
• q: The number of lag residual errors to include in the GARCH model.

A generally accepted notation for a GARCH model is to specify the GARCH() function with the p and q parameters GARCH(p, q); for example GARCH(1, 1) would be a first order GARCH model.
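In this notation, the GARCH(p, q) conditional variance combines the ARCH sum over squared residuals with a sum over lagged conditional variances (a standard textbook form, consistent with the description above):

```latex
\sigma_t^2 = \alpha_0
  + \sum_{i=1}^{q} \alpha_i \, \epsilon_{t-i}^2
  + \sum_{j=1}^{p} \beta_j \, \sigma_{t-j}^2
```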

A GARCH model subsumes ARCH models, where a GARCH(0, q) is equivalent to an ARCH(q) model.

For p = 0 the process reduces to the ARCH(q) process, and for p = q = 0 ε(t) is simply white noise. In the ARCH(q) process the conditional variance is specified as a linear function of past sample variances only, whereas the GARCH(p, q) process allows lagged conditional variances to enter as well. This corresponds to some sort of adaptive learning mechanism.

As with ARCH, GARCH predicts the future variance and expects that the series is stationary, other than the change in variance, meaning it does not have a trend or seasonal component.

## How to Configure ARCH and GARCH Models

The configuration for an ARCH model is best understood in the context of ACF and PACF plots of the variance of the time series.

This can be achieved by subtracting the mean from each observation in the series and squaring the result, or just squaring the observation if you’re already working with white noise residuals from another model.

If a correlogram appears to be white noise […], then volatility can be detected by looking at the correlogram of the squared values since the squared values are equivalent to the variance (provided the series is adjusted to have a mean of zero).

— Pages 146-147, Introductory Time Series with R, 2009.

The ACF and PACF plots can then be interpreted to estimate values for p and q, in a similar way as is done for the ARMA model.

## ARCH and GARCH Models in Python

In this section, we will look at how we can develop ARCH and GARCH models in Python using the arch library.

First, let’s prepare a dataset we can use for these examples.

### Test Dataset

We can create a dataset with a controlled model of variance.

The simplest case would be a series of random noise where the mean is zero and the variance starts at 0.0 and steadily increases.

We can achieve this in Python using the gauss() function that generates a Gaussian random number with the specified mean and standard deviation.

# create dataset
data = [gauss(0, i*0.01) for i in range(0,100)]

We can plot the dataset to get an idea of how the linear change in variance looks. The complete example is listed below.

# create a simple white noise with increasing variance
from random import gauss
from random import seed
from matplotlib import pyplot
# seed pseudorandom number generator
seed(1)
# create dataset
data = [gauss(0, i*0.01) for i in range(0,100)]
# plot
pyplot.plot(data)
pyplot.show()

Running the example creates and plots the dataset. We can see the clear change in variance over the course of the series.

Line Plot of Dataset with Increasing Variance

### Autocorrelation

We know there is an autocorrelation in the variance of the contrived dataset.

Nevertheless, we can look at an autocorrelation plot to confirm this expectation. The complete example is listed below.

# check correlations of squared observations
from random import gauss
from random import seed
from matplotlib import pyplot
from statsmodels.graphics.tsaplots import plot_acf
# seed pseudorandom number generator
seed(1)
# create dataset
data = [gauss(0, i*0.01) for i in range(0,100)]
# square the dataset
squared_data = [x**2 for x in data]
# create acf plot
plot_acf(squared_data)
pyplot.show()

Running the example creates an autocorrelation plot of the squared observations. We see significant positive correlation in variance out to perhaps 15 lag time steps.

This might make a reasonable value for the lag parameter in the ARCH model.

Autocorrelation Plot of Data with Increasing Variance

## ARCH Model

Developing an ARCH model involves three steps:

1. Define the model
2. Fit the model
3. Make a forecast.

Before fitting and forecasting, we can split the dataset into a train and test set so that we can fit the model on the train and evaluate its performance on the test set.

# split into train/test
n_test = 10
train, test = data[:-n_test], data[-n_test:]

A model can be defined by calling the arch_model() function. We can specify a model for the mean of the series: in this case mean=’Zero’ is an appropriate model. We can then specify the model for the variance: in this case vol=’ARCH’. We can also specify the lag parameter for the ARCH model: in this case p=15.

Note, in the arch library, the names of p and q parameters for ARCH/GARCH have been reversed.

# define model
model = arch_model(train, mean='Zero', vol='ARCH', p=15)

The model can be fit on the data by calling the fit() function. There are many options on this function, although the defaults are good enough for getting started. This will return a fit model.

# fit model
model_fit = model.fit()

Finally, we can make a prediction by calling the forecast() function on the fit model. We can specify the horizon for the forecast.

In this case, we will predict the variance for the last 10 time steps of the dataset, which were withheld from the training of the model.

# forecast the test set
yhat = model_fit.forecast(horizon=n_test)

We can tie all of this together; the complete example is listed below.

# example of ARCH model
from random import gauss
from random import seed
from matplotlib import pyplot
from arch import arch_model
# seed pseudorandom number generator
seed(1)
# create dataset
data = [gauss(0, i*0.01) for i in range(0,100)]
# split into train/test
n_test = 10
train, test = data[:-n_test], data[-n_test:]
# define model
model = arch_model(train, mean='Zero', vol='ARCH', p=15)
# fit model
model_fit = model.fit()
# forecast the test set
yhat = model_fit.forecast(horizon=n_test)
# plot the actual variance
var = [i*0.01 for i in range(0,100)]
pyplot.plot(var[-n_test:])
# plot forecast variance
pyplot.plot(yhat.variance.values[-1, :])
pyplot.show()

Running the example defines and fits the model then predicts the variance for the last 10 time steps of the dataset.

A line plot is created comparing the series of expected variance to the predicted variance. Although the model was not tuned, the predicted variance looks reasonable.

Line Plot of Expected Variance to Predicted Variance using ARCH

## GARCH Model

We can fit a GARCH model just as easily using the arch library.

The arch_model() function can specify a GARCH instead of ARCH model vol=’GARCH’ as well as the lag parameters for both.

# define model
model = arch_model(train, mean='Zero', vol='GARCH', p=15, q=15)

The dataset may not be a good fit for a GARCH model given the linearly increasing variance; nevertheless, the complete example is listed below.

# example of GARCH model
from random import gauss
from random import seed
from matplotlib import pyplot
from arch import arch_model
# seed pseudorandom number generator
seed(1)
# create dataset
data = [gauss(0, i*0.01) for i in range(0,100)]
# split into train/test
n_test = 10
train, test = data[:-n_test], data[-n_test:]
# define model
model = arch_model(train, mean='Zero', vol='GARCH', p=15, q=15)
# fit model
model_fit = model.fit()
# forecast the test set
yhat = model_fit.forecast(horizon=n_test)
# plot the actual variance
var = [i*0.01 for i in range(0,100)]
pyplot.plot(var[-n_test:])
# plot forecast variance
pyplot.plot(yhat.variance.values[-1, :])
pyplot.show()

A plot of the expected and predicted variance is listed below.

Line Plot of Expected Variance to Predicted Variance using GARCH


## Summary

In this tutorial, you discovered the ARCH and GARCH models for predicting the variance of a time series.

Specifically, you learned:

• The problem with variance in a time series and the need for ARCH and GARCH models.
• How to configure ARCH and GARCH models.
• How to implement ARCH and GARCH models in Python.

Do you have any questions?

]]>
https://machinelearningmastery.com/develop-arch-and-garch-models-for-time-series-forecasting-in-python/feed/ 89
A Gentle Introduction to Exponential Smoothing for Time Series Forecasting in Python https://machinelearningmastery.com/exponential-smoothing-for-time-series-forecasting-in-python/ https://machinelearningmastery.com/exponential-smoothing-for-time-series-forecasting-in-python/#comments Sun, 19 Aug 2018 19:00:01 +0000 https://35.82.237.216/?p=5999 Exponential smoothing is a time series forecasting method for univariate data that can be extended to support data with a systematic trend or seasonal component. It is a powerful forecasting method that may be used as an alternative to the popular Box-Jenkins ARIMA family of methods. In this tutorial, you will discover the exponential smoothing […]

]]>
Exponential smoothing is a time series forecasting method for univariate data that can be extended to support data with a systematic trend or seasonal component.

It is a powerful forecasting method that may be used as an alternative to the popular Box-Jenkins ARIMA family of methods.

In this tutorial, you will discover the exponential smoothing method for univariate time series forecasting.

After completing this tutorial, you will know:

• What exponential smoothing is and how it is different from other forecasting methods.
• The three main types of exponential smoothing and how to configure them.
• How to implement exponential smoothing in Python.

Kick-start your project with my new book Time Series Forecasting With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

A Gentle Introduction to Exponential Smoothing for Time Series Forecasting in Python
Photo by Wolfgang Staudt, some rights reserved.

## Tutorial Overview

This tutorial is divided into 4 parts; they are:

1. What Is Exponential Smoothing?
2. Types of Exponential Smoothing
3. How to Configure Exponential Smoothing
4. Exponential Smoothing in Python

## What Is Exponential Smoothing?

Exponential smoothing is a time series forecasting method for univariate data.

Time series methods like the Box-Jenkins ARIMA family of methods develop a model where the prediction is a weighted linear sum of recent past observations or lags.

Exponential smoothing forecasting methods are similar in that a prediction is a weighted sum of past observations, but the model explicitly uses an exponentially decreasing weight for past observations.

Specifically, past observations are weighted with a geometrically decreasing ratio.

Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation the higher the associated weight.

— Page 171, Forecasting: principles and practice, 2013.
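This geometric decay is easy to see directly: with smoothing factor alpha, the observation k steps in the past receives weight alpha * (1 - alpha)**k. A minimal sketch (the value of alpha and the window length are arbitrary illustrative choices):

```python
# weight placed on the observation k steps in the past, most recent first
alpha = 0.5  # illustrative smoothing factor
weights = [alpha * (1 - alpha) ** k for k in range(5)]
print(weights)  # → [0.5, 0.25, 0.125, 0.0625, 0.03125]
```

The weights halve at each step back in time for alpha = 0.5, and over an infinite history they sum to 1, making the forecast a proper weighted average.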

Exponential smoothing methods may be considered as peers and an alternative to the popular Box-Jenkins ARIMA class of methods for time series forecasting.

Collectively, the methods are sometimes referred to as ETS models, referring to the explicit modeling of Error, Trend and Seasonality.

## Types of Exponential Smoothing

There are three main types of exponential smoothing time series forecasting methods.

A simple method that assumes no systematic structure, an extension that explicitly handles trends, and the most advanced approach that adds support for seasonality.

### Single Exponential Smoothing

Single Exponential Smoothing, SES for short, also called Simple Exponential Smoothing, is a time series forecasting method for univariate data without a trend or seasonality.

It requires a single parameter, called alpha (a), also called the smoothing factor or smoothing coefficient.

This parameter controls the rate at which the influence of the observations at prior time steps decay exponentially. Alpha is often set to a value between 0 and 1. Large values mean that the model pays attention mainly to the most recent past observations, whereas smaller values mean more of the history is taken into account when making a prediction.

A value close to 1 indicates fast learning (that is, only the most recent values influence the forecasts), whereas a value close to 0 indicates slow learning (past observations have a large influence on forecasts).

— Page 89, Practical Time Series Forecasting with R, 2016.
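The recursion behind SES can be sketched in a few lines of plain Python (a minimal illustration of the update rule, not the Statsmodels implementation used later in this tutorial):

```python
def ses_forecast(series, alpha):
    # initialize the level with the first observation
    level = series[0]
    for y in series[1:]:
        # blend the newest observation with the previous level
        level = alpha * y + (1 - alpha) * level
    # the one-step-ahead forecast is the final level
    return level

print(ses_forecast([10.0, 12.0, 11.0, 13.0], alpha=0.5))  # → 12.0
print(ses_forecast([10.0, 12.0, 11.0, 13.0], alpha=1.0))  # → 13.0
```

Note that alpha = 1.0 reduces to naive persistence (the forecast is simply the last observation), while smaller values of alpha pull the forecast toward the longer history.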

Hyperparameters:

• Alpha: Smoothing factor for the level.

### Double Exponential Smoothing

Double Exponential Smoothing is an extension to Exponential Smoothing that explicitly adds support for trends in the univariate time series.

In addition to the alpha parameter for controlling smoothing factor for the level, an additional smoothing factor is added to control the decay of the influence of the change in trend called beta (b).

The method supports trends that change in different ways: additive and multiplicative, depending on whether the trend is linear or exponential respectively.

Double Exponential Smoothing with an additive trend is classically referred to as Holt’s linear trend model, named for the developer of the method Charles Holt.

• Additive Trend: Double Exponential Smoothing with a linear trend.
• Multiplicative Trend: Double Exponential Smoothing with an exponential trend.

For longer range (multi-step) forecasts, the trend may continue on unrealistically. As such, it can be useful to dampen the trend over time.

Dampening means reducing the size of the trend over future time steps down to a straight line (no trend).

The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indefinitely into the future. Even more extreme are the forecasts generated by the exponential trend method […] Motivated by this observation […] introduced a parameter that “dampens” the trend to a flat line some time in the future.

— Page 183, Forecasting: principles and practice, 2013.

As with modeling the trend itself, we can use the same principles in dampening the trend, specifically additively or multiplicatively for a linear or exponential dampening effect. A damping coefficient Phi (p) is used to control the rate of dampening.

• Additive Dampening: Dampen a trend linearly.
• Multiplicative Dampening: Dampen the trend exponentially.
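With an additive (linear) damped trend, the standard h-step-ahead forecast takes the form below, where l_t is the level, b_t the trend, and phi the damping coefficient; as h grows, the trend contribution approaches the constant phi / (1 - phi) times b_t, so the forecast flattens out:

```latex
\hat{y}_{t+h} = \ell_t + \left(\phi + \phi^2 + \cdots + \phi^h\right) b_t,
\qquad 0 < \phi < 1
```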

Hyperparameters:

• Alpha: Smoothing factor for the level.
• Beta: Smoothing factor for the trend.
• Trend Type: Additive or multiplicative.
• Dampen Type: Additive or multiplicative.
• Phi: Damping coefficient.

### Triple Exponential Smoothing

Triple Exponential Smoothing is an extension of Exponential Smoothing that explicitly adds support for seasonality to the univariate time series.

This method is sometimes called Holt-Winters Exponential Smoothing, named for two contributors to the method: Charles Holt and Peter Winters.

In addition to the alpha and beta smoothing factors, a new parameter is added called gamma (g) that controls the influence on the seasonal component.

As with the trend, the seasonality may be modeled as either an additive or multiplicative process for a linear or exponential change in the seasonality.

• Additive Seasonality: Triple Exponential Smoothing with a linear seasonality.
• Multiplicative Seasonality: Triple Exponential Smoothing with an exponential seasonality.

Triple exponential smoothing is the most advanced variation of exponential smoothing and through configuration, it can also develop double and single exponential smoothing models.

Being an adaptive method, Holt-Winter’s exponential smoothing allows the level, trend and seasonality patterns to change over time.

— Page 95, Practical Time Series Forecasting with R, 2016.

Additionally, to ensure that the seasonality is modeled correctly, the number of time steps in a seasonal period (Period) must be specified. For example, if the series was monthly data and the seasonal period repeated each year, then the Period=12.

Hyperparameters:

• Alpha: Smoothing factor for the level.
• Beta: Smoothing factor for the trend.
• Gamma: Smoothing factor for the seasonality.
• Trend Type: Additive or multiplicative.
• Dampen Type: Additive or multiplicative.
• Phi: Damping coefficient.
• Seasonality Type: Additive or multiplicative.
• Period: Time steps in seasonal period.
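For reference, one common additive formulation of the Holt-Winters equations is sketched below, with level l_t, trend b_t, seasonal component s_t, and season length m (here beta and gamma denote the trend and seasonal smoothing factors):

```latex
\begin{aligned}
\ell_t &= \alpha\,(y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1 - \beta)\, b_{t-1} \\
s_t &= \gamma\,(y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma)\, s_{t-m} \\
\hat{y}_{t+h} &= \ell_t + h\, b_t + s_{t+h-m(k+1)}, \qquad k = \lfloor (h-1)/m \rfloor
\end{aligned}
```

The index on the seasonal term simply selects the most recent estimate of the seasonal component for the forecast month.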

## How to Configure Exponential Smoothing

All of the model hyperparameters can be specified explicitly.

This can be challenging for experts and beginners alike.

Instead, it is common to use numerical optimization to search for and find the smoothing coefficients (alpha, beta, gamma, and phi) for the model that result in the lowest error.

[…] a more robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data. […] the unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the SSE [sum of the squared errors].

— Page 177, Forecasting: principles and practice, 2013.

The parameters that specify the type of change in the trend and seasonality, such as whether they are additive or multiplicative and whether they should be dampened, must be specified explicitly.

## Exponential Smoothing in Python

This section looks at how to implement exponential smoothing in Python.

The implementations of Exponential Smoothing in Python are provided in the Statsmodels Python library.

The implementations are based on the description of the method in Rob Hyndman and George Athanasopoulos’ excellent book “Forecasting: Principles and Practice,” 2013 and their R implementations in their “forecast” package.

### Single Exponential Smoothing

Single Exponential Smoothing or simple smoothing can be implemented in Python via the SimpleExpSmoothing Statsmodels class.

First, an instance of the SimpleExpSmoothing class must be instantiated and passed the training data. The fit() function is then called providing the fit configuration, specifically the alpha value called smoothing_level. If this is not provided or set to None, the model will automatically optimize the value.

This fit() function returns an instance of the HoltWintersResults class that contains the learned coefficients. The forecast() or the predict() function on the result object can be called to make a forecast.

For example:

# single exponential smoothing
...
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
# prepare data
data = ...
# create class
model = SimpleExpSmoothing(data)
# fit model
model_fit = model.fit(...)
# make prediction
yhat = model_fit.predict(...)

### Double and Triple Exponential Smoothing

Single, Double and Triple Exponential Smoothing can be implemented in Python using the ExponentialSmoothing Statsmodels class.

First, an instance of the ExponentialSmoothing class must be instantiated, specifying both the training data and some configuration for the model.

Specifically, you must specify the following configuration parameters:

• trend: The type of trend component, as either “add” for additive or “mul” for multiplicative. Modeling the trend can be disabled by setting it to None.
• damped: Whether or not the trend component should be damped, either True or False.
• seasonal: The type of seasonal component, as either “add” for additive or “mul” for multiplicative. Modeling the seasonal component can be disabled by setting it to None.
• seasonal_periods: The number of time steps in a seasonal period, e.g. 12 for 12 months in a yearly seasonal structure (more here).

The model can then be fit on the training data by calling the fit() function.

This function allows you to either specify the smoothing coefficients of the exponential smoothing model or have them optimized. By default, they are optimized (e.g. optimized=True). These coefficients include:

• smoothing_level (alpha): the smoothing coefficient for the level.
• smoothing_slope (beta): the smoothing coefficient for the trend.
• smoothing_seasonal (gamma): the smoothing coefficient for the seasonal component.
• damping_slope (phi): the coefficient for the damped trend.

Additionally, the fit function can perform basic data preparation prior to modeling; specifically:

• use_boxcox: Whether or not to perform a power transform of the series (True/False) or specify the lambda for the transform.

The fit() function will return an instance of the HoltWintersResults class that contains the learned coefficients. The forecast() or the predict() function on the result object can be called to make a forecast.

# double or triple exponential smoothing
...
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# prepare data
data = ...
# create class
model = ExponentialSmoothing(data, ...)
# fit model
model_fit = model.fit(...)
# make prediction
yhat = model_fit.predict(...)

This section provides more resources on the topic if you are looking to go deeper.

### Summary

In this tutorial, you discovered the exponential smoothing method for univariate time series forecasting.

Specifically, you learned:

• What exponential smoothing is and how it is different from other forecast methods.
• The three main types of exponential smoothing and how to configure them.
• How to implement exponential smoothing in Python.

Do you have any questions?

]]>
https://machinelearningmastery.com/exponential-smoothing-for-time-series-forecasting-in-python/feed/ 67
A Gentle Introduction to SARIMA for Time Series Forecasting in Python https://machinelearningmastery.com/sarima-for-time-series-forecasting-in-python/ https://machinelearningmastery.com/sarima-for-time-series-forecasting-in-python/#comments Thu, 16 Aug 2018 19:00:18 +0000 https://35.82.237.216/?p=5991 Autoregressive Integrated Moving Average, or ARIMA, is one of the most widely used forecasting methods for univariate time series data forecasting. Although the method can handle data with a trend, it does not support time series with a seasonal component. An extension to ARIMA that supports the direct modeling of the seasonal component of the […]

The post A Gentle Introduction to SARIMA for Time Series Forecasting in Python appeared first on MachineLearningMastery.com.

]]>
Autoregressive Integrated Moving Average, or ARIMA, is one of the most widely used forecasting methods for univariate time series data forecasting.

Although the method can handle data with a trend, it does not support time series with a seasonal component.

An extension to ARIMA that supports the direct modeling of the seasonal component of the series is called SARIMA.

In this tutorial, you will discover the Seasonal Autoregressive Integrated Moving Average, or SARIMA, method for time series forecasting with univariate data containing trends and seasonality.

After completing this tutorial, you will know:

• The limitations of ARIMA when it comes to seasonal data.
• The SARIMA extension of ARIMA that explicitly models the seasonal element in univariate data.
• How to implement the SARIMA method in Python using the Statsmodels library.

Kick-start your project with my new book Time Series Forecasting With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Update:  For help using and grid searching SARIMA hyperparameters, see this post:

A Gentle Introduction to SARIMA for Time Series Forecasting in Python
Photo by Mario Micklisch, some rights reserved.

## Tutorial Overview

This tutorial is divided into four parts; they are:

1. What’s Wrong with ARIMA
2. What Is SARIMA?
3. How to Configure SARIMA
4. How to use SARIMA in Python

## What’s Wrong with ARIMA

Autoregressive Integrated Moving Average, or ARIMA, is a forecasting method for univariate time series data.

As its name suggests, it supports both autoregressive and moving average elements. The integrated element refers to differencing, allowing the method to support time series data with a trend.

A problem with ARIMA is that it does not support seasonal data, that is, a time series with a repeating cycle.

ARIMA expects data that is either not seasonal or has the seasonal component removed, e.g. seasonally adjusted via methods such as seasonal differencing.

For more on ARIMA, see the post:

An alternative is to use SARIMA.

## What is SARIMA?

Seasonal Autoregressive Integrated Moving Average, SARIMA or Seasonal ARIMA, is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component.

It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.

A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA […] The seasonal part of the model consists of terms that are very similar to the non-seasonal components of the model, but they involve backshifts of the seasonal period.

— Page 242, Forecasting: principles and practice, 2013.

## How to Configure SARIMA

Configuring a SARIMA requires selecting hyperparameters for both the trend and seasonal elements of the series.

### Trend Elements

There are three trend elements that require configuration.

They are the same as the ARIMA model; specifically:

• p: Trend autoregression order.
• d: Trend difference order.
• q: Trend moving average order.

### Seasonal Elements

There are four seasonal elements that are not part of ARIMA that must be configured; they are:

• P: Seasonal autoregressive order.
• D: Seasonal difference order.
• Q: Seasonal moving average order.
• m: The number of time steps for a single seasonal period.

Together, the notation for a SARIMA model is specified as:

SARIMA(p,d,q)(P,D,Q)m

Where the specifically chosen hyperparameters for a model are specified; for example:

SARIMA(3,1,0)(1,1,0)12

Importantly, the m parameter influences the P, D, and Q parameters. For example, an m of 12 for monthly data suggests a yearly seasonal cycle.

A P=1 would make use of the first seasonally offset observation in the model, e.g. t-(m*1) or t-12. A P=2 would use the last two seasonally offset observations, t-(m*1) and t-(m*2).

Similarly, a D of 1 would calculate a first order seasonal difference and a Q=1 would use first order errors in the model (e.g. moving average).

A seasonal ARIMA model uses differencing at a lag equal to the number of seasons (s) to remove additive seasonal effects. As with lag 1 differencing to remove a trend, the lag s differencing introduces a moving average term. The seasonal ARIMA model includes autoregressive and moving average terms at lag s.

— Page 142, Introductory Time Series with R, 2009.
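The lag-m seasonal differencing described in the quote can be sketched in a few lines; the seasonal_difference helper name below is introduced here for illustration.

```python
# seasonal differencing: subtract the observation one full season earlier
def seasonal_difference(series, m):
    # value at time t minus value at time t-m removes a stable seasonal effect
    return [series[t] - series[t - m] for t in range(m, len(series))]

# a purely periodic series with period 4 differences away to zeros
series = [1, 2, 3, 4] * 6
print(seasonal_difference(series, 4))  # → [0, 0, ..., 0] (20 zeros)
```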

The trend elements can be chosen through careful analysis of ACF and PACF plots looking at the correlations of recent time steps (e.g. 1, 2, 3).

Similarly, ACF and PACF plots can be analyzed to specify values for the seasonal model by looking at correlation at seasonal lag time steps.

For more on interpreting ACF/PACF plots, see the post:

Seasonal ARIMA models can potentially have a large number of parameters and combinations of terms. Therefore, it is appropriate to try out a wide range of models when fitting to data and choose a best fitting model using an appropriate criterion …

— Pages 143-144, Introductory Time Series with R, 2009.

Alternately, a grid search can be used across the trend and seasonal hyperparameters.

For more on grid searching SARIMA parameters, see the post:

## How to use SARIMA in Python

The SARIMA time series forecasting method is supported in Python via the Statsmodels library.

To use SARIMA there are three steps; they are:

1. Define the model.
2. Fit the defined model.
3. Make a prediction with the fit model.

Let’s look at each step in turn.

### 1. Define Model

An instance of the SARIMAX class can be created by providing the training data and a host of model configuration parameters.

# specify training data
data = ...
# define model
model = SARIMAX(data, ...)

The implementation is called SARIMAX instead of SARIMA because the “X” addition to the method name means that the implementation also supports exogenous variables.

These are parallel time series variates that are not modeled directly via AR, I, or MA processes, but are made available as a weighted input to the model.

Exogenous variables are optional and can be specified via the “exog” argument.

# specify training data
data = ...
other_data = ...
# define model
model = SARIMAX(data, exog=other_data, ...)

The trend and seasonal hyperparameters are specified as 3-element and 4-element tuples to the “order” and “seasonal_order” arguments respectively.

These elements must be specified.

# specify training data
data = ...
# define model configuration
my_order = (1, 1, 1)
my_seasonal_order = (1, 1, 1, 12)
# define model
model = SARIMAX(data, order=my_order, seasonal_order=my_seasonal_order, ...)

These are the main configuration elements.

There are other fine tuning parameters you may want to configure. Learn more in the full API:

### 2. Fit Model

Once the model is created, it can be fit on the training data.

The model is fit by calling the fit() function.

Fitting the model returns an instance of the SARIMAXResults class. This object contains the details of the fit, such as the data and coefficients, as well as functions that can be used to make use of the model.

# specify training data
data = ...
# define model
model = SARIMAX(data, order=..., seasonal_order=...)
# fit model
model_fit = model.fit()

Many elements of the fitting process can be configured, and it is worth reading the API to review these options once you are comfortable with the implementation.

### 3. Make Prediction

Once fit, the model can be used to make a forecast.

A forecast can be made by calling the forecast() or the predict() functions on the SARIMAXResults object returned from calling fit.

The forecast() function takes a single parameter that specifies the number of out of sample time steps to forecast, or assumes a one step forecast if no arguments are provided.

# specify training data
data = ...
# define model
model = SARIMAX(data, order=..., seasonal_order=...)
# fit model
model_fit = model.fit()
# one step forecast
yhat = model_fit.forecast()

The predict() function requires a start and end date or index to be specified.

Additionally, if exogenous variables were provided when defining the model, they too must be provided for the forecast period to the predict() function.

# specify training data
data = ...
# define model
model = SARIMAX(data, order=..., seasonal_order=...)
# fit model
model_fit = model.fit()
# one step forecast
yhat = model_fit.predict(start=len(data), end=len(data))

This section provides more resources on the topic if you are looking to go deeper.

## Summary

In this tutorial, you discovered the Seasonal Autoregressive Integrated Moving Average, or SARIMA, method for time series forecasting with univariate data containing trends and seasonality.

Specifically, you learned:

• The limitations of ARIMA when it comes to seasonal data.
• The SARIMA extension of ARIMA that explicitly models the seasonal element in univariate data.
• How to implement the SARIMA method in Python using the Statsmodels library.

Do you have any questions?

The post A Gentle Introduction to SARIMA for Time Series Forecasting in Python appeared first on MachineLearningMastery.com.

]]>
https://machinelearningmastery.com/sarima-for-time-series-forecasting-in-python/feed/ 134
11 Classical Time Series Forecasting Methods in Python (Cheat Sheet) https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/ https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/#comments Sun, 05 Aug 2018 19:00:45 +0000 https://35.82.237.216/?p=5952 Machine learning methods can be used for classification and forecasting on time series problems. Before exploring machine learning methods for time series, it is a good idea to ensure you have exhausted classical linear time series forecasting methods. Classical time series forecasting methods may be focused on linear relationships, nevertheless, they are sophisticated and perform […]

The post 11 Classical Time Series Forecasting Methods in Python (Cheat Sheet) appeared first on MachineLearningMastery.com.

]]>
Machine learning methods can be used for classification and forecasting on time series problems.

Before exploring machine learning methods for time series, it is a good idea to ensure you have exhausted classical linear time series forecasting methods. Classical time series forecasting methods may be focused on linear relationships; nevertheless, they are sophisticated and perform well on a wide range of problems, assuming that your data is suitably prepared and the method is well configured.

In this post, you will discover a suite of classical methods for time series forecasting that you can test on your forecasting problem prior to exploring machine learning methods.

The post is structured as a cheat sheet to give you just enough information on each method to get started with a working code example and where to look to get more information on the method.

All code examples are in Python and use the Statsmodels library. The APIs for this library can be tricky for beginners (trust me!), so having a working code example as a starting point will greatly accelerate your progress.

This is a large post; you may want to bookmark it.

Kick-start your project with my new book Time Series Forecasting With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

• Updated Apr/2020: Changed AR to AutoReg due to API change.
• Updated Dec/2020: Updated ARIMA API to the latest version of statsmodels.

11 Classical Time Series Forecasting Methods in Python (Cheat Sheet)
Photo by Ron Reiring, some rights reserved.

## Overview

This cheat sheet demonstrates 11 different classical time series forecasting methods; they are:

1. Autoregression (AR)
2. Moving Average (MA)
3. Autoregressive Moving Average (ARMA)
4. Autoregressive Integrated Moving Average (ARIMA)
5. Seasonal Autoregressive Integrated Moving-Average (SARIMA)
6. Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
7. Vector Autoregression (VAR)
8. Vector Autoregression Moving-Average (VARMA)
9. Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)
10. Simple Exponential Smoothing (SES)
11. Holt Winter’s Exponential Smoothing (HWES)

Did I miss your favorite classical time series forecasting method?
Let me know in the comments below.

Each method is presented in a consistent manner.

This includes:

• Description. A short and precise description of the technique.
• Python Code. A short working example of fitting the model and making a prediction in Python.

Each code example is demonstrated on a simple contrived dataset that may or may not be appropriate for the method. Replace the contrived dataset with your data in order to test the method.

Remember: each method will require tuning to your specific problem. In many cases, I have examples of how to configure and even grid search parameters on the blog already, try the search function.

If you find this cheat sheet useful, please let me know in the comments below.

## Autoregression (AR)

The autoregression (AR) method models the next step in the sequence as a linear function of the observations at prior time steps.

The notation for the model involves specifying the order of the model p as a parameter to the AR function, e.g. AR(p). For example, AR(1) is a first-order autoregression model.

The method is suitable for univariate time series without trend and seasonal components.

### Python Code

# AR example
from statsmodels.tsa.ar_model import AutoReg
from random import random
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = AutoReg(data, lags=1)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

## Moving Average (MA)

The moving average (MA) method models the next step in the sequence as a linear function of the residual errors from a mean process at prior time steps.

A moving average model is different from calculating the moving average of the time series.

The notation for the model involves specifying the order of the model q as a parameter to the MA function, e.g. MA(q). For example, MA(1) is a first-order moving average model.

The method is suitable for univariate time series without trend and seasonal components.

### Python Code

We can use the ARIMA class to create an MA model by setting a zeroth-order AR component. We must specify the order of the MA model in the order argument.

# MA example
from statsmodels.tsa.arima.model import ARIMA
from random import random
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = ARIMA(data, order=(0, 0, 1))
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

## Autoregressive Moving Average (ARMA)

The Autoregressive Moving Average (ARMA) method models the next step in the sequence as a linear function of the observations and residual errors at prior time steps.

It combines both Autoregression (AR) and Moving Average (MA) models.

The notation for the model involves specifying the order for the AR(p) and MA(q) models as parameters to an ARMA function, e.g. ARMA(p, q). An ARIMA model can be used to develop AR or MA models.

The method is suitable for univariate time series without trend and seasonal components.

### Python Code

# ARMA example
from statsmodels.tsa.arima.model import ARIMA
from random import random
# contrived dataset
data = [random() for x in range(1, 100)]
# fit model
model = ARIMA(data, order=(2, 0, 1))
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

## Autoregressive Integrated Moving Average (ARIMA)

The Autoregressive Integrated Moving Average (ARIMA) method models the next step in the sequence as a linear function of the differenced observations and residual errors at prior time steps.

It combines both Autoregression (AR) and Moving Average (MA) models as well as a differencing pre-processing step of the sequence to make the sequence stationary, called integration (I).

The notation for the model involves specifying the order for the AR(p), I(d), and MA(q) models as parameters to an ARIMA function, e.g. ARIMA(p, d, q). An ARIMA model can also be used to develop AR, MA, and ARMA models.

The method is suitable for univariate time series with trend and without seasonal components.

### Python Code

# ARIMA example
from statsmodels.tsa.arima.model import ARIMA
from random import random
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = ARIMA(data, order=(1, 1, 1))
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data), typ='levels')
print(yhat)

## Seasonal Autoregressive Integrated Moving-Average (SARIMA)

The Seasonal Autoregressive Integrated Moving Average (SARIMA) method models the next step in the sequence as a linear function of the differenced observations, errors, differenced seasonal observations, and seasonal errors at prior time steps.

It combines the ARIMA model with the ability to perform the same autoregression, differencing, and moving average modeling at the seasonal level.

The notation for the model involves specifying the order for the AR(p), I(d), and MA(q) models as parameters to an ARIMA function and AR(P), I(D), MA(Q) and m parameters at the seasonal level, e.g. SARIMA(p, d, q)(P, D, Q)m where “m” is the number of time steps in each season (the seasonal period). A SARIMA model can be used to develop AR, MA, ARMA and ARIMA models.

The method is suitable for univariate time series with trend and/or seasonal components.

### Python Code

# SARIMA example
from statsmodels.tsa.statespace.sarimax import SARIMAX
from random import random
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = SARIMAX(data, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

## Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)

The Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX) is an extension of the SARIMA model that also includes the modeling of exogenous variables.

Exogenous variables are also called covariates and can be thought of as parallel input sequences that have observations at the same time steps as the original series. The primary series may be referred to as endogenous data to contrast it with the exogenous sequence(s). The observations for exogenous variables are included in the model directly at each time step and are not modeled in the same way as the primary endogenous sequence (e.g. as an AR, MA, etc. process).

The SARIMAX method can also be used to model the subsumed models with exogenous variables, such as ARX, MAX, ARMAX, and ARIMAX.

The method is suitable for univariate time series with trend and/or seasonal components and exogenous variables.

### Python Code

# SARIMAX example
from statsmodels.tsa.statespace.sarimax import SARIMAX
from random import random
# contrived dataset
data1 = [x + random() for x in range(1, 100)]
data2 = [x + random() for x in range(101, 200)]
# fit model
model = SARIMAX(data1, exog=data2, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)
# make prediction
exog2 = [200 + random()]
yhat = model_fit.predict(len(data1), len(data1), exog=[exog2])
print(yhat)

## Vector Autoregression (VAR)

The Vector Autoregression (VAR) method models the next step in each time series using an AR model. It is the generalization of AR to multiple parallel time series, e.g. multivariate time series.

The notation for the model involves specifying the order for the AR(p) model as parameters to a VAR function, e.g. VAR(p).

The method is suitable for multivariate time series without trend and seasonal components.

### Python Code

# VAR example
from statsmodels.tsa.vector_ar.var_model import VAR
from random import random
# contrived dataset with dependency
data = list()
for i in range(100):
    v1 = i + random()
    v2 = v1 + random()
    row = [v1, v2]
    data.append(row)
# fit model
model = VAR(data)
model_fit = model.fit()
# make prediction
yhat = model_fit.forecast(model_fit.y, steps=1)
print(yhat)

## Vector Autoregression Moving-Average (VARMA)

The Vector Autoregression Moving-Average (VARMA) method models the next step in each time series using an ARMA model. It is the generalization of ARMA to multiple parallel time series, e.g. multivariate time series.

The notation for the model involves specifying the order for the AR(p) and MA(q) models as parameters to a VARMA function, e.g. VARMA(p, q). A VARMA model can also be used to develop VAR or VMA models.

The method is suitable for multivariate time series without trend and seasonal components.

### Python Code

# VARMA example
from statsmodels.tsa.statespace.varmax import VARMAX
from random import random
# contrived dataset with dependency
data = list()
for i in range(100):
    v1 = random()
    v2 = v1 + random()
    row = [v1, v2]
    data.append(row)
# fit model
model = VARMAX(data, order=(1, 1))
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.forecast()
print(yhat)

## Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)

The Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX) is an extension of the VARMA model that also includes the modeling of exogenous variables. It is a multivariate version of the ARMAX method.

Exogenous variables are also called covariates and can be thought of as parallel input sequences that have observations at the same time steps as the original series. The primary series (one or more) is referred to as endogenous data to contrast it with the exogenous sequence(s). The observations for exogenous variables are included in the model directly at each time step and are not modeled in the same way as the primary endogenous sequence (e.g. as an AR, MA, etc. process).

The VARMAX method can also be used to model the subsumed models with exogenous variables, such as VARX and VMAX.

The method is suitable for multivariate time series without trend and seasonal components with exogenous variables.

### Python Code

# VARMAX example
from statsmodels.tsa.statespace.varmax import VARMAX
from random import random
# contrived dataset with dependency
data = list()
for i in range(100):
    v1 = random()
    v2 = v1 + random()
    row = [v1, v2]
    data.append(row)
data_exog = [x + random() for x in range(100)]
# fit model
model = VARMAX(data, exog=data_exog, order=(1, 1))
model_fit = model.fit(disp=False)
# make prediction
data_exog2 = [[100]]
yhat = model_fit.forecast(exog=data_exog2)
print(yhat)

## Simple Exponential Smoothing (SES)

The Simple Exponential Smoothing (SES) method models the next time step as an exponentially weighted linear function of observations at prior time steps.

The method is suitable for univariate time series without trend and seasonal components.

### Python Code

# SES example
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
from random import random
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = SimpleExpSmoothing(data)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

## Holt Winter’s Exponential Smoothing (HWES)

The Holt Winter’s Exponential Smoothing (HWES) also called the Triple Exponential Smoothing method models the next time step as an exponentially weighted linear function of observations at prior time steps, taking trends and seasonality into account.

The method is suitable for univariate time series with trend and/or seasonal components.

### Python Code

# HWES example
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from random import random
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = ExponentialSmoothing(data)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

This section provides more resources on the topic if you are looking to go deeper.

## Summary

In this post, you discovered a suite of classical time series forecasting methods that you can test and tune on your time series dataset.

Did I miss your favorite classical time series forecasting method?
Let me know in the comments below.

Did you try any of these methods on your dataset?

Do you have any questions?

The post 11 Classical Time Series Forecasting Methods in Python (Cheat Sheet) appeared first on MachineLearningMastery.com.

]]>
https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/feed/ 357
A Standard Multivariate, Multi-Step, and Multi-Site Time Series Forecasting Problem https://machinelearningmastery.com/standard-multivariate-multi-step-multi-site-time-series-forecasting-problem/ https://machinelearningmastery.com/standard-multivariate-multi-step-multi-site-time-series-forecasting-problem/#comments Thu, 18 Jan 2018 18:00:20 +0000 https://35.82.237.216/?p=4676 Real-world time series forecasting is challenging for a whole host of reasons not limited to problem features such as having multiple input variables, the requirement to predict multiple time steps, and the need to perform the same type of prediction for multiple physical sites. In this post, you will discover a standardized yet complex time […]

]]>
Real-world time series forecasting is challenging for a whole host of reasons not limited to problem features such as having multiple input variables, the requirement to predict multiple time steps, and the need to perform the same type of prediction for multiple physical sites.

In this post, you will discover a standardized yet complex time series forecasting problem that has these properties, but is small and sufficiently well understood that it can be used to explore and better understand methods for developing forecasting models on challenging datasets.

After reading this post, you will know:

• The competition and motivation for addressing the air-quality dataset.
• An overview of the defined prediction problem and the data challenges it covers.
• A description of the free data files that you can download and start working with immediately.

Kick-start your project with my new book Time Series Forecasting With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

A Standard Multivariate, Multi-Step, and Multi-Site Time Series Forecasting Problem
Photo by someone, some rights reserved.

## EMC Data Science Global Hackathon

The dataset was used as the center of a Kaggle competition.

Specifically, a 24-hour hackathon hosted by Data Science London and Data Science Global, two organizations that no longer seem to exist 6 years later, as part of a Big Data Week event.

The competition involved a multi-thousand-dollar cash prize, and the dataset was provided by the Cook County, Illinois local government, suggesting all locations mentioned in the dataset are in that locality.

The motivation for the challenge is to develop a better model for predicting air quality, taken from the competition description:

The EPA’s Air Quality Index is used daily by people suffering from asthma and other respiratory diseases to avoid dangerous levels of outdoor air pollutants, which can trigger attacks. According to the World Health Organisation there are now estimated to be 235 million people suffering from asthma. Globally, it is now the most common chronic disease among children, with incidence in the US doubling since 1980.

The competition description suggests that winning models could be used as the basis for a new air-quality prediction system, although it is not clear if any models were ever transitioned for this purpose.

The competition was won by a Kaggle employee, Ben Hamner, who presumably did not collect the prize given the conflict of interest. Ben described his winning approach in the blog post titled “Chucking everything into a Random Forest: Ben Hamner on Winning The Air Quality Prediction Hackathon” and provided his code on GitHub.

There is also a good discussion of solutions and related code in this forum post titled “General approaches to partitioning the models?“.

## Predictive Modeling Problem

The data describes a multi-step forecasting problem given a multivariate time series across multiple sites or physical locations.

Given multiple weather measurements over time, predict a sequence of air quality measurements at specific future time intervals across multiple physical locations.

It is a challenging time series forecasting problem that has a lot of the qualities of real-world forecasting:

• Incomplete data: Not all weather and air quality measures are available for all locations.
• Missing data: Not all available measures have a complete history.
• Multivariate inputs: The model inputs for each forecast are comprised of multiple weather observations.
• Multi-step outputs: The model outputs are a discontiguous sequence of forecasted air quality measures.
• Multi-site outputs: The model must output a multi-step forecast for multiple physical sites.

## Description of the Dataset Files

There are 4 files of interest that you must download separately; they are:

### File: SiteLocations.csv

This file contains a list of site locations marked by unique identifiers and their precise location on Earth measured by longitude and latitude.

All coordinates appear to be relatively close together in the northern and western hemispheres, e.g. America.

Below is a sample of the file.

"SITE_ID","LATITUDE","LONGITUDE"
1,41.6709918952829,-87.7324568962847
32,41.755832412403,-87.545349670582
50,41.7075695897648,-87.5685738570845
57,41.9128621248178,-87.7227234452095
64,41.7907868783739,-87.6016464917605
...

### File: SiteLocations_with_more_sites.csv

This file has the same format as SiteLocations.csv and appears to list all of the same locations as that file with some additional locations.

As the filename suggests, it is just an updated version of the list of sites.

Below is a sample of the file.

"SITE_ID","LATITUDE","LONGITUDE"
1,41.6709918952829,-87.7324568962847
14,41.834243,-87.6238
22,41.6871654376343,-87.5393154841479
32,41.755832412403,-87.545349670582
50,41.7075695897648,-87.5685738570845
...

### File: TrainingData.csv

This file contains the training data for modeling.

The data is presented in an unnormalized manner. Each row of data contains one set of meteorological measurements for one hour across multiple locations as well as the targets or outcomes for each location for that hour.

The measures include:

• Time information, including the block of time, the index within the contiguous block of time, the average month, day of the week, and hour of the day.
• Wind measurements such as direction and speed.
• Temperature measurements such as minimum and maximum ambient temperature.
• Pressure measurements such as minimum and maximum barometric pressure.

The target variables are a collection of different air quality or pollution measures at different physical locations.

Not all locations have all weather measurements and not all locations are concerned with all target measures. Further, for those recorded variables, there are missing values marked as NA.

Below is a sample of the file.

1,1,1,10,"Saturday",21,0.01,117,187,0.3,0.3,NA,NA,NA,14.9,NA,NA,NA,NA,NA,NA,NA,NA,5.8,NA,NA,NA,NA,NA,NA,NA,NA,747,NA,NA,NA,NA,NA,NA,NA,NA,750,NA,NA,NA,NA,NA,NA,NA,NA,743,NA,NA,NA,NA,NA,2.67923294292042,6.1816228132982,NA,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,NA,2.38965627997991,NA,5.56815355612325,0.690015329704154,NA,NA,NA,NA,NA,NA,2.84349016287551,0.0920223353681394,1.69321097077376,0.368089341472558,0.184044670736279,0.368089341472558,0.276067006104418,0.892616653070952,1.74842437199465,NA,NA,5.1306307034019,1.34160578423204,2.13879182993514,3.01375212399952,NA,5.67928016629218,NA
2,1,2,10,"Saturday",22,0.01,231,202,0.5,0.6,NA,NA,NA,14.9,NA,NA,NA,NA,NA,NA,NA,NA,5.8,NA,NA,NA,NA,NA,NA,NA,NA,747,NA,NA,NA,NA,NA,NA,NA,NA,750,NA,NA,NA,NA,NA,NA,NA,NA,743,NA,NA,NA,NA,NA,2.67923294292042,8.47583334194495,NA,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,NA,1.99138023331659,NA,5.56815355612325,0.923259948195698,NA,NA,NA,NA,NA,NA,3.1011527019063,0.0920223353681394,1.94167127626774,0.368089341472558,0.184044670736279,0.368089341472558,0.368089341472558,1.73922213845783,2.14412041407765,NA,NA,5.1306307034019,1.19577906855465,2.72209869264472,3.88871241806389,NA,7.42675098668978,NA
3,1,3,10,"Saturday",23,0.01,247,227,0.5,1.5,NA,NA,NA,14.9,NA,NA,NA,NA,NA,NA,NA,NA,5.8,NA,NA,NA,NA,NA,NA,NA,NA,747,NA,NA,NA,NA,NA,NA,NA,NA,750,NA,NA,NA,NA,NA,NA,NA,NA,743,NA,NA,NA,NA,NA,2.67923294292042,8.92192983362627,NA,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,NA,1.7524146053186,NA,5.56815355612325,0.680296803933673,NA,NA,NA,NA,NA,NA,3.06434376775904,0.0920223353681394,2.52141198908702,0.460111676840697,0.184044670736279,0.368089341472558,0.368089341472558,1.7852333061419,1.93246904273093,NA,NA,5.13639545700122,1.40965825154816,3.11096993445111,3.88871241806389,NA,7.68373198968942,NA
4,1,4,10,"Sunday",0,0.01,219,218,0.2,1.2,NA,NA,NA,14,NA,NA,NA,NA,NA,NA,NA,NA,4.8,NA,NA,NA,NA,NA,NA,NA,NA,751,NA,NA,NA,NA,NA,NA,NA,NA,754,NA,NA,NA,NA,NA,NA,NA,NA,748,NA,NA,NA,NA,NA,2.67923294292042,5.09824561921501,NA,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,NA,2.38965627997991,NA,5.6776192223642,0.612267123540305,NA,NA,NA,NA,NA,NA,3.21157950434806,0.184044670736279,2.374176252498,0.460111676840697,0.184044670736279,0.368089341472558,0.276067006104418,1.86805340797323,2.08890701285676,NA,NA,5.21710200739181,1.47771071886428,2.04157401948354,3.20818774490271,NA,4.83124285639335,NA
5,1,5,10,"Sunday",1,0.01,2,216,0.2,0.3,NA,NA,NA,14,NA,NA,NA,NA,NA,NA,NA,NA,4.8,NA,NA,NA,NA,NA,NA,NA,NA,751,NA,NA,NA,NA,NA,NA,NA,NA,754,NA,NA,NA,NA,NA,NA,NA,NA,748,NA,NA,NA,NA,NA,2.67923294292042,4.87519737337435,NA,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,0.114975168664303,NA,2.31000107064725,NA,5.6776192223642,0.694874592589394,NA,NA,NA,NA,NA,NA,3.67169118118876,0.184044670736279,2.46619858786614,0.460111676840697,0.184044670736279,0.368089341472558,0.276067006104418,1.70241320431058,2.60423209091834,NA,NA,5.21710200739181,1.45826715677396,2.13879182993514,3.4998411762575,NA,4.62565805399363,NA
...
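To make the NA markers above concrete, below is a minimal sketch of loading data in this shape with Pandas. The column names here are illustrative stand-ins, not the real header, and the tiny inline sample replaces the downloaded file:

```python
from io import StringIO
from pandas import read_csv

# A hypothetical two-row sample in the same spirit as TrainingData.csv:
# one row of measurements per hour, with "NA" marking missing values.
sample = StringIO(
    "rowID,chunkID,hour,wind_speed,ambient_temp,target_1_57\n"
    "1,1,21,0.3,14.9,2.67\n"
    "2,1,22,NA,14.9,NA\n"
)
df = read_csv(sample, na_values="NA")
print(df.isnull().sum().sum())  # -> 2 missing cells
```

The real file, once downloaded, can be loaded the same way (e.g. `read_csv('TrainingData.csv', header=0)`), with the NA strings parsed as missing values.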

### File: SubmissionZerosExceptNAs.csv

This file contains a sample of the submission for the prediction problem.

Each row specifies the prediction for each target measure across all target locations for a given hour in a chunk of contiguous time.

Below is a sample of the file.

"rowID","chunkID","position_within_chunk","hour","month_most_common","target_1_57","target_10_4002","target_10_8003","target_11_1","target_11_32","target_11_50","target_11_64","target_11_1003","target_11_1601","target_11_4002","target_11_8003","target_14_4002","target_14_8003","target_15_57","target_2_57","target_3_1","target_3_50","target_3_57","target_3_1601","target_3_4002","target_3_6006","target_4_1","target_4_50","target_4_57","target_4_1018","target_4_1601","target_4_2001","target_4_4002","target_4_4101","target_4_6006","target_4_8003","target_5_6006","target_7_57","target_8_57","target_8_4002","target_8_6004","target_8_8003","target_9_4002","target_9_8003"
193,1,193,21,10,0,0,-1e+06,0,0,0,0,0,0,0,-1e+06,0,-1e+06,0,0,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,0,0,0,0,0,0,0,0,0,-1e+06,-1e+06,0,0,0,0,-1e+06,0,-1e+06
194,1,194,22,10,0,0,-1e+06,0,0,0,0,0,0,0,-1e+06,0,-1e+06,0,0,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,0,0,0,0,0,0,0,0,0,-1e+06,-1e+06,0,0,0,0,-1e+06,0,-1e+06
195,1,195,23,10,0,0,-1e+06,0,0,0,0,0,0,0,-1e+06,0,-1e+06,0,0,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,0,0,0,0,0,0,0,0,0,-1e+06,-1e+06,0,0,0,0,-1e+06,0,-1e+06
196,1,196,0,10,0,0,-1e+06,0,0,0,0,0,0,0,-1e+06,0,-1e+06,0,0,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,0,0,0,0,0,0,0,0,0,-1e+06,-1e+06,0,0,0,0,-1e+06,0,-1e+06
197,1,197,1,10,0,0,-1e+06,0,0,0,0,0,0,0,-1e+06,0,-1e+06,0,0,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,-1e+06,0,0,0,0,0,0,0,0,0,-1e+06,-1e+06,0,0,0,0,-1e+06,0,-1e+06
...

## Framing the Prediction Problem

A large part of the challenge of this prediction problem is the vast number of ways that the problem can be framed for modeling.

This is challenging because it is not clear which framing may be the best for this specific modeling problem.

For example, below are some questions to provoke thought about how the problem could be framed.

• Is it better to impute or ignore missing observations?
• Is it better to feed in a time series of weather observations or only the observations for the current hour?
• Is it better to use weather observations from one or multiple source locations for a forecast?
• Is it better to have one model for each location or one model for all locations?
• Is it better to have one model for each forecast time or one for all forecast times?


## Summary

In this post, you discovered the Kaggle air-quality dataset that provides a standard dataset for complex time series forecasting.

Specifically, you learned:

• The competition and motivation for addressing the air-quality dataset.
• An overview of the defined prediction problem and the data challenges it covers.
• A description of the free data files that you can download and start working with immediately.

Have you worked on this dataset, or do you intend to?

]]>
https://machinelearningmastery.com/standard-multivariate-multi-step-multi-site-time-series-forecasting-problem/feed/ 35
How to Convert a Time Series to a Supervised Learning Problem in Python https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/ https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/#comments Sun, 07 May 2017 19:00:31 +0000 https://35.82.237.216/?p=3984 Machine learning methods like deep learning can be used for time series forecasting. Before machine learning can be used, time series forecasting problems must be re-framed as supervised learning problems. From a sequence to pairs of input and output sequences. In this tutorial, you will discover how to transform univariate and multivariate time series forecasting […]

The post How to Convert a Time Series to a Supervised Learning Problem in Python appeared first on MachineLearningMastery.com.

]]>
Machine learning methods like deep learning can be used for time series forecasting.

Before machine learning can be used, time series forecasting problems must be re-framed as supervised learning problems. From a sequence to pairs of input and output sequences.

In this tutorial, you will discover how to transform univariate and multivariate time series forecasting problems into supervised learning problems for use with machine learning algorithms.

After completing this tutorial, you will know:

• How to develop a function to transform a time series dataset into a supervised learning dataset.
• How to transform univariate time series data for machine learning.
• How to transform multivariate time series data for machine learning.

Kick-start your project with my new book Time Series Forecasting With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

How to Convert a Time Series to a Supervised Learning Problem in Python
Photo by Quim Gil, some rights reserved.

## Time Series vs Supervised Learning

Before we get started, let’s take a moment to better understand the form of time series and supervised learning data.

A time series is a sequence of numbers that are ordered by a time index. This can be thought of as a list or column of ordered values.

For example:

0
1
2
3
4
5
6
7
8
9

A supervised learning problem is comprised of input patterns (X) and output patterns (y), such that an algorithm can learn how to predict the output patterns from the input patterns.

For example:

X,	y
1,	2
2,	3
3,	4
4,	5
5,	6
6,	7
7,	8
8,	9

For more on this topic, see the post:

## Pandas shift() Function

A key function to help transform time series data into a supervised learning problem is the Pandas shift() function.

Given a DataFrame, the shift() function can be used to create copies of columns that are pushed forward (rows of NaN values added to the front) or pulled back (rows of NaN values added to the end).

This is the behavior required to create columns of lag observations as well as columns of forecast observations for a time series dataset in a supervised learning format.

Let’s look at some examples of the shift function in action.

We can define a mock time series dataset as a sequence of 10 numbers, in this case a single column in a DataFrame as follows:

from pandas import DataFrame
df = DataFrame()
df['t'] = [x for x in range(10)]
print(df)

Running the example prints the time series data with the row indices for each observation.

t
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9

We can shift all the observations down by one time step by inserting one new row at the top. Because the new row has no data, we can use NaN to represent “no data”.

The shift function can do this for us and we can insert this shifted column next to our original series.

from pandas import DataFrame
df = DataFrame()
df['t'] = [x for x in range(10)]
df['t-1'] = df['t'].shift(1)
print(df)

Running the example gives us two columns in the dataset. The first with the original observations and a new shifted column.

We can see that shifting the series forward one time step gives us a primitive supervised learning problem, although with X and y in the wrong order. Ignore the column of row labels. The first row would have to be discarded because of the NaN value. The second row shows the input value of 0.0 in the second column (input or X) and the value of 1 in the first column (output or y).

t  t-1
0  0  NaN
1  1  0.0
2  2  1.0
3  3  2.0
4  4  3.0
5  5  4.0
6  6  5.0
7  7  6.0
8  8  7.0
9  9  8.0

We can see how, if we repeat this process with shifts of 2, 3, and more, we could create long input sequences (X) that can be used to forecast an output value (y).
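A quick sketch of that idea, stacking lags of 1, 2, and 3 as separate columns next to the output column `t`:

```python
from pandas import DataFrame

# Repeat the shift for lags 1, 2, and 3 to build a longer input
# sequence (X) alongside the output value (y) in column 't'.
df = DataFrame()
df['t'] = [x for x in range(10)]
for lag in [1, 2, 3]:
    df['t-%d' % lag] = df['t'].shift(lag)
df = df.dropna()  # discard the leading rows with NaN lag values
print(df.head())
```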

The shift operator can also accept a negative integer value. This has the effect of pulling the observations up by inserting new rows at the end. Below is an example:

from pandas import DataFrame
df = DataFrame()
df['t'] = [x for x in range(10)]
df['t+1'] = df['t'].shift(-1)
print(df)

Running the example shows a new column with a NaN value as the last value.

We can see that the original column can be taken as an input (X) and the new forecast column as an output value (y). That is, the input value of 0 can be used to forecast the output value of 1.

t  t+1
0  0  1.0
1  1  2.0
2  2  3.0
3  3  4.0
4  4  5.0
5  5  6.0
6  6  7.0
7  7  8.0
8  8  9.0
9  9  NaN

Technically, in time series forecasting terminology the current time (t) and future times (t+1, t+n) are forecast times and past observations (t-1, t-n) are used to make forecasts.

We can see how positive and negative shifts can be used to create a new DataFrame from a time series with sequences of input and output patterns for a supervised learning problem.

This permits not only classical X -> y prediction, but also X -> Y where both input and output can be sequences.

Further, the shift function also works on so-called multivariate time series problems. That is where instead of having one set of observations for a time series, we have multiple (e.g. temperature and pressure). All variates in the time series can be shifted forward or backward to create multivariate input and output sequences. We will explore this more later in the tutorial.

## The series_to_supervised() Function

We can use the shift() function in Pandas to automatically create new framings of time series problems given the desired length of input and output sequences.

This would be a useful tool as it would allow us to explore different framings of a time series problem with machine learning algorithms to see which might result in better performing models.

In this section, we will define a new Python function named series_to_supervised() that takes a univariate or multivariate time series and frames it as a supervised learning dataset.

The function takes four arguments:

• data: Sequence of observations as a list or 2D NumPy array. Required.
• n_in: Number of lag observations as input (X). Values may be between [1..len(data)]. Optional. Defaults to 1.
• n_out: Number of observations as output (y). Values may be between [0..len(data)-1]. Optional. Defaults to 1.
• dropnan: Boolean whether or not to drop rows with NaN values. Optional. Defaults to True.

The function returns a single value:

• return: Pandas DataFrame of series framed for supervised learning.

The new dataset is constructed as a DataFrame, with each column suitably named both by variable number and time step. This allows you to design a variety of different time step sequence type forecasting problems from a given univariate or multivariate time series.

Once the DataFrame is returned, you can decide how to split the rows of the returned DataFrame into X and y components for supervised learning any way you wish.
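As a minimal sketch of one such split (using a hand-built DataFrame in place of the returned one), treat every column except the last as input (X) and the last column as the value to predict (y):

```python
from pandas import DataFrame

# A tiny stand-in for the DataFrame returned by series_to_supervised().
data = DataFrame({'var1(t-1)': [0.0, 1.0, 2.0], 'var1(t)': [1, 2, 3]})
values = data.values
# all columns but the last as X, the last column as y
X, y = values[:, :-1], values[:, -1]
print(X.shape, y.shape)  # -> (3, 1) (3,)
```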

The function is defined with default parameters so that if you call it with just your data, it will construct a DataFrame with t-1 as X and t as y.

The function is confirmed to be compatible with Python 2 and Python 3.

The complete function is listed below, including function comments.

from pandas import DataFrame
from pandas import concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

Can you see obvious ways to make the function more robust or more readable?

Now that we have the whole function, we can explore how it may be used.

## One-Step Univariate Forecasting

It is standard practice in time series forecasting to use lagged observations (e.g. t-1) as input variables to forecast the current time step (t).

This is called one-step forecasting.

The example below demonstrates a one lag time step (t-1) to predict the current time step (t).

from pandas import DataFrame
from pandas import concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

values = [x for x in range(10)]
data = series_to_supervised(values)
print(data)

Running the example prints the output of the reframed time series.

var1(t-1)  var1(t)
1        0.0        1
2        1.0        2
3        2.0        3
4        3.0        4
5        4.0        5
6        5.0        6
7        6.0        7
8        7.0        8
9        8.0        9

We can see that the observations are named “var1” and that the input observation is suitably named (t-1) and the output time step is named (t).

We can also see that rows with NaN values have been automatically removed from the DataFrame.

We can repeat this example with an input sequence of arbitrary length, such as 3. This can be done by specifying the length of the input sequence as an argument; for example:

data = series_to_supervised(values, 3)

The complete example is listed below.

from pandas import DataFrame
from pandas import concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

values = [x for x in range(10)]
data = series_to_supervised(values, 3)
print(data)

Again, running the example prints the reframed series. We can see that the input sequence is in the correct left-to-right order with the output variable to be predicted on the far right.

var1(t-3)  var1(t-2)  var1(t-1)  var1(t)
3        0.0        1.0        2.0        3
4        1.0        2.0        3.0        4
5        2.0        3.0        4.0        5
6        3.0        4.0        5.0        6
7        4.0        5.0        6.0        7
8        5.0        6.0        7.0        8
9        6.0        7.0        8.0        9

## Multi-Step or Sequence Forecasting

A different type of forecasting problem is using past observations to forecast a sequence of future observations.

This may be called sequence forecasting or multi-step forecasting.

We can frame a time series for sequence forecasting by specifying another argument. For example, we could frame a forecast problem with an input sequence of 2 past observations to forecast 2 future observations as follows:

data = series_to_supervised(values, 2, 2)

The complete example is listed below:

from pandas import DataFrame
from pandas import concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

values = [x for x in range(10)]
data = series_to_supervised(values, 2, 2)
print(data)

Running the example shows the differentiation of input (t-n) and output (t+n) variables with the current observation (t) considered an output.

var1(t-2)  var1(t-1)  var1(t)  var1(t+1)
2        0.0        1.0        2        3.0
3        1.0        2.0        3        4.0
4        2.0        3.0        4        5.0
5        3.0        4.0        5        6.0
6        4.0        5.0        6        7.0
7        5.0        6.0        7        8.0
8        6.0        7.0        8        9.0
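
Splitting this framing into model-ready arrays follows the same pattern as before. With 2 input and 2 output time steps, the first two columns become X and the last two become y (the rows below are hand-copied from the output above):

```python
import numpy as np

# Reframed rows have the layout [t-2, t-1, t, t+1].
values = np.array([[0.0, 1.0, 2.0, 3.0],
                   [1.0, 2.0, 3.0, 4.0]])
# first two columns as the input sequence, last two as the forecast sequence
X, y = values[:, :2], values[:, 2:]
print(X[0].tolist(), y[0].tolist())  # -> [0.0, 1.0] [2.0, 3.0]
```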

## Multivariate Forecasting

Another important type of time series is called multivariate time series.

This is where we may have observations of multiple different measures and an interest in forecasting one or more of them.

For example, we may have two sets of time series observations obs1 and obs2 and we wish to forecast one or both of these.

We can call series_to_supervised() in exactly the same way.

For example:

from pandas import DataFrame
from pandas import concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

raw = DataFrame()
raw['ob1'] = [x for x in range(10)]
raw['ob2'] = [x for x in range(50, 60)]
values = raw.values
data = series_to_supervised(values)
print(data)

Running the example prints the new framing of the data, showing an input pattern with one time step for both variables and an output pattern of one time step for both variables.

Again, depending on the specifics of the problem, the division of columns into X and Y components can be chosen arbitrarily, such as if the current observation of var1 was also provided as input and only var2 was to be predicted.

var1(t-1)  var2(t-1)  var1(t)  var2(t)
1        0.0       50.0        1       51
2        1.0       51.0        2       52
3        2.0       52.0        3       53
4        3.0       53.0        4       54
5        4.0       54.0        5       55
6        5.0       55.0        6       56
7        6.0       56.0        7       57
8        7.0       57.0        8       58
9        8.0       58.0        9       59

You can see how this may be easily used for sequence forecasting with multivariate time series by specifying the length of the input and output sequences as above.

For example, below is an example of a reframing with 1 time step as input and 2 time steps as forecast sequence.

from pandas import DataFrame
from pandas import concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

raw = DataFrame()
raw['ob1'] = [x for x in range(10)]
raw['ob2'] = [x for x in range(50, 60)]
values = raw.values
data = series_to_supervised(values, 1, 2)
print(data)

Running the example shows the large reframed DataFrame.

var1(t-1)  var2(t-1)  var1(t)  var2(t)  var1(t+1)  var2(t+1)
1        0.0       50.0        1       51        2.0       52.0
2        1.0       51.0        2       52        3.0       53.0
3        2.0       52.0        3       53        4.0       54.0
4        3.0       53.0        4       54        5.0       55.0
5        4.0       54.0        5       55        6.0       56.0
6        5.0       55.0        6       56        7.0       57.0
7        6.0       56.0        7       57        8.0       58.0
8        7.0       57.0        8       58        9.0       59.0

Experiment with your own dataset and try multiple different framings to see what works best.

## Summary

In this tutorial, you discovered how to reframe time series datasets as supervised learning problems with Python.

Specifically, you learned:

• About the Pandas shift() function and how it can be used to automatically define supervised learning datasets from time series data.
• How to reframe a univariate time series into one-step and multi-step supervised learning problems.
• How to reframe multivariate time series into one-step and multi-step supervised learning problems.

Do you have any questions?

The post How to Convert a Time Series to a Supervised Learning Problem in Python appeared first on MachineLearningMastery.com.

]]>
https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/feed/ 385
Seasonal Persistence Forecasting With Python https://machinelearningmastery.com/seasonal-persistence-forecasting-python/ https://machinelearningmastery.com/seasonal-persistence-forecasting-python/#comments Tue, 04 Apr 2017 19:00:23 +0000 https://35.82.237.216/?p=3872 It is common to use persistence or naive forecasts as a first-cut forecast on time series problems. A better first-cut forecast on time series data with a seasonal component is to persist the observation for the same time in the previous season. This is called seasonal persistence. In this tutorial, you will discover how to […]

The post Seasonal Persistence Forecasting With Python appeared first on MachineLearningMastery.com.

]]>
It is common to use persistence or naive forecasts as a first-cut forecast on time series problems.

A better first-cut forecast on time series data with a seasonal component is to persist the observation for the same time in the previous season. This is called seasonal persistence.

In this tutorial, you will discover how to implement seasonal persistence for time series forecasting in Python.

After completing this tutorial, you will know:

• How to use point observations from prior seasons for a persistence forecast.
• How to use mean observations across a sliding window of prior seasons for a persistence forecast.
• How to apply and evaluate seasonal persistence on monthly and daily time series data.

Kick-start your project with my new book Time Series Forecasting With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

• Updated Apr/2019: Updated the links to datasets.

## Seasonal Persistence

It is critical to have a useful first-cut forecast on time series problems to provide a lower-bound on skill before moving on to more sophisticated methods.

This is to ensure we are not wasting time on models or datasets that are not predictive.

It is common to use a persistence or a naive forecast as a first-cut forecast model when time series forecasting.

Persisting the immediately prior observation makes less sense with time series data that has an obvious seasonal component. A better first-cut model for seasonal data is to use the observation at the same time in the previous seasonal cycle as the prediction.

We can call this “seasonal persistence,” and it is a simple approach that can provide an effective first-cut model.

One step better is to use a simple function of the last few observations at the same time in previous seasonal cycles, such as their mean. This can often provide a small additional benefit.

In this tutorial, we will demonstrate this simple seasonal persistence forecasting method for providing a lower bound on forecast skill on three different real-world time series datasets.

## Seasonal Persistence with Sliding Window

In this tutorial, we will use a sliding window seasonal persistence model to make forecasts.

Within a sliding window, observations at the same time in previous one-year seasons will be collected and the mean of those observations can be used as the persisted forecast.

Different window sizes can be evaluated to find a combination that minimizes error.

As an example, if the data is monthly and the month to be predicted is February, then with a window of size 1 (w=1) the observation last February will be used to make the forecast.

A window of size 2 (w=2) would involve taking observations for the last two Februaries to be averaged and used as a forecast.

An alternate interpretation might use a single point observation from a prior year (e.g. t-12, t-24, etc. for monthly data) rather than taking the mean of those point observations. Perhaps try both methods on your dataset and see what works best as a starting point model.
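These two interpretations can be sketched as a pair of small functions (illustrative only, assuming monthly data with a 12-step seasonal cycle; the function names are not from any library):

```python
from statistics import mean

def seasonal_point_forecast(history, cycle=12, offset=1):
    # persist the single observation from `offset` seasons ago (e.g. t-12)
    return history[-(offset * cycle)]

def seasonal_mean_forecast(history, cycle=12, window=3):
    # average the observations at the same step in the last `window` seasons
    return mean(history[-(y * cycle)] for y in range(1, window + 1))
```

Both assume the next time step to be forecast is one step past the end of `history`.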


## Experimental Test Harness

It is important to evaluate time series forecasting models consistently.

In this section, we will define how we will evaluate forecast models in this tutorial.

First, we will hold the last two years of data back and evaluate forecasts on this data. This works for both monthly and daily data we will look at.

We will use a walk-forward validation to evaluate model performance. This means that each time step in the test dataset will be enumerated, a model constructed on historical data, and the forecast compared to the expected value. The observation will then be added to the training dataset and the process repeated.

Walk-forward validation is a realistic way to evaluate time series forecast models as one would expect models to be updated as new observations are made available.

Finally, forecasts will be evaluated using root mean squared error, or RMSE. The benefit of RMSE is that it penalizes large errors and the scores are in the same units as the forecast values (car sales per month).

In summary, the test harness involves:

• The last 2 years of data used as a test set.
• Walk-forward validation for model evaluation.
• Root mean squared error used to report model skill.
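The harness can be sketched as a single reusable function (a minimal illustration; `forecast` is any callable that maps the history observed so far to a one-step prediction):

```python
from math import sqrt

def walk_forward_rmse(train, test, forecast):
    # enumerate the test set, forecasting one step at a time and then
    # adding the true observation to the history (walk-forward validation)
    history = list(train)
    predictions = []
    for obs in test:
        predictions.append(forecast(history))
        history.append(obs)
    # root mean squared error, in the same units as the series
    return sqrt(sum((p - o) ** 2 for p, o in zip(predictions, test)) / len(test))

# example: naive t-1 persistence on a toy series
rmse = walk_forward_rmse([1, 2, 3], [4, 5, 6], lambda h: h[-1])
```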

## Case Study 1: Monthly Car Sales Dataset

The Monthly Car Sales dataset describes the number of car sales in Quebec, Canada between 1960 and 1968.

The units are a count of the number of sales and there are 108 observations. The source of the data is credited to Abraham and Ledolter (1983).

Download the dataset and save it into your current working directory with the filename “car-sales.csv“. Note, you may need to delete the footer information from the file.

The code below loads the dataset as a Pandas Series object.

# line plot of time series
from pandas import read_csv
from matplotlib import pyplot
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# display first few rows
print(series.head())
# line plot of dataset
series.plot()
pyplot.show()

Running the example prints the first 5 rows of data.

Month
1960-01-01 6550
1960-02-01 8728
1960-03-01 12026
1960-04-01 14395
1960-05-01 14587
Name: Sales, dtype: int64

A line plot of the data is also provided. We can see both a yearly seasonal component and an increasing trend.

Line Plot of Monthly Car Sales Dataset

The final 24 months of data will be held back as test data. We will investigate seasonal persistence with sliding windows from 1 to 5 years.

The complete example is listed below.

from pandas import read_csv
from sklearn.metrics import mean_squared_error
from math import sqrt
from numpy import mean
from matplotlib import pyplot
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
# evaluate mean of different number of years
years = [1, 2, 3, 4, 5]
scores = list()
for year in years:
    # walk-forward validation
    history = [x for x in train]
    predictions = list()
    for i in range(len(test)):
        # collect observations at the same month in prior years
        obs = list()
        for y in range(1, year+1):
            obs.append(history[-(y*12)])
        # make prediction
        yhat = mean(obs)
        predictions.append(yhat)
        # add observation to history
        history.append(test[i])
    # report performance
    rmse = sqrt(mean_squared_error(test, predictions))
    scores.append(rmse)
    print('Years=%d, RMSE: %.3f' % (year, rmse))
# plot window size to error
pyplot.plot(years, scores)
pyplot.show()

Running the example prints the year number and the RMSE for the mean observation from the sliding window of observations at the same month in prior years.

The results suggest that taking the average from the last three years is a good starting model with an RMSE of 1803.630 car sales.

Years=1, RMSE: 1997.732
Years=2, RMSE: 1914.911
Years=3, RMSE: 1803.630
Years=4, RMSE: 2099.481
Years=5, RMSE: 2522.235

A plot of the relationship of sliding window size to model error is created.

The plot nicely shows the improvement with the sliding window size to 3 years, then the rapid increase in error from that point.

Sliding Window Size to RMSE for Monthly Car Sales

## Case Study 2: Monthly Writing Paper Sales Dataset

The Monthly Writing Paper Sales dataset describes the number of specialty writing paper sales.

The units are a count of the number of sales and there are 147 months of observations (just over 12 years). The counts are fractional, suggesting the data may in fact be in units of hundreds of thousands of sales. The source of the data is credited to Makridakis and Wheelwright (1989).

Download the dataset and save it into your current working directory with the filename “writing-paper-sales.csv“. Note, you may need to delete the footer information from the file.

The date-time stamps only contain the year number and month. Therefore, a custom date-time parsing function is required to load the data and base it in an arbitrary decade. The 1900s were chosen as the starting point, which should not affect this case study.

The example below loads the Monthly Writing Paper Sales dataset as a Pandas Series.

from pandas import read_csv
from datetime import datetime
from matplotlib import pyplot
# custom parser to base the dates in an arbitrary decade (the 1900s)
def parser(x):
    if len(x) == 4:
        return datetime.strptime('190'+x, '%Y-%m')
    return datetime.strptime('19'+x, '%Y-%m')
# load dataset
series = read_csv('writing-paper-sales.csv', header=0, index_col=0, parse_dates=[0], date_parser=parser).squeeze('columns')
# summarize first few rows
print(series.head())
# line plot
series.plot()
pyplot.show()

Running the example prints the first 5 rows of the loaded dataset.

Month
1901-01-01 1359.795
1901-02-01 1278.564
1901-03-01 1508.327
1901-04-01 1419.710
1901-05-01 1440.510

A line plot of the loaded dataset is then created. We can see the yearly seasonal component and an increasing trend.

Line Plot of Monthly Writing Paper Sales Dataset

As in the previous example, we can hold back the last 24 months of observations as a test dataset. Because we have much more data, we will try sliding window sizes from 1 year to 10 years.

The complete example is listed below.

from pandas import read_csv
from datetime import datetime
from sklearn.metrics import mean_squared_error
from math import sqrt
from numpy import mean
from matplotlib import pyplot
# custom parser to base the dates in an arbitrary decade (the 1900s)
def parser(x):
    if len(x) == 4:
        return datetime.strptime('190'+x, '%Y-%m')
    return datetime.strptime('19'+x, '%Y-%m')
# load dataset
series = read_csv('writing-paper-sales.csv', header=0, index_col=0, parse_dates=[0], date_parser=parser).squeeze('columns')
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
# evaluate mean of different number of years
years = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = list()
for year in years:
    # walk-forward validation
    history = [x for x in train]
    predictions = list()
    for i in range(len(test)):
        # collect observations at the same month in prior years
        obs = list()
        for y in range(1, year+1):
            obs.append(history[-(y*12)])
        # make prediction
        yhat = mean(obs)
        predictions.append(yhat)
        # add observation to history
        history.append(test[i])
    # report performance
    rmse = sqrt(mean_squared_error(test, predictions))
    scores.append(rmse)
    print('Years=%d, RMSE: %.3f' % (year, rmse))
# plot window size to error
pyplot.plot(years, scores)
pyplot.show()

Running the example prints the size of the sliding window and the resulting seasonal persistence model error.

The results suggest that a window size of 5 years is optimal, with an RMSE of 540.317 monthly writing paper sales.

Years=1, RMSE: 606.089
Years=2, RMSE: 557.653
Years=3, RMSE: 555.777
Years=4, RMSE: 544.251
Years=5, RMSE: 540.317
Years=6, RMSE: 554.660
Years=7, RMSE: 569.032
Years=8, RMSE: 581.405
Years=9, RMSE: 602.279
Years=10, RMSE: 624.756

The relationship between window size and error is graphed on a line plot showing a similar trend in error to the previous scenario. Error drops to an inflection point (in this case 5 years) before increasing again.

Sliding Window Size to RMSE for Monthly Writing Paper Sales

## Case Study 3: Daily Maximum Melbourne Temperatures Dataset

The Daily Maximum Melbourne Temperatures dataset describes the daily maximum temperatures in the city of Melbourne, Australia from 1981 to 1990.

The units are degrees Celsius and there are 3,650 observations, or 10 years of data. The source of the data is credited to the Australian Bureau of Meteorology.

Download the dataset and save it into your current working directory with the filename “max-daily-temps.csv“. Note, you may need to delete the footer information from the file.

# line plot of time series
from pandas import read_csv
from matplotlib import pyplot
# load dataset
series = read_csv('max-daily-temps.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# display first few rows
print(series.head())
# line plot of dataset
series.plot()
pyplot.show()

Running the example prints the first 5 rows of data.

Date
1981-01-01 38.1
1981-01-02 32.4
1981-01-03 34.5
1981-01-04 20.7
1981-01-05 21.5

A line plot is also created. We can see we have a lot more observations than the previous two scenarios and that there is a clear seasonal trend in the data.

Line Plot of Daily Melbourne Maximum Temperatures Dataset

Because the data is daily, we need to specify the years in the test data as a function of 365 days rather than 12 months.

This ignores leap years, which is a complication that could, or even should, be addressed in your own project.
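One way to address the leap-year complication (a sketch using only the standard library) is to align on the calendar date rather than a fixed 365-day lag, mapping 29 February back to 28 February when the prior year is not a leap year:

```python
from datetime import date

def same_date_previous_year(d):
    # look up the same calendar date one year earlier;
    # 29 Feb falls back to 28 Feb in non-leap years
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)
```

The persisted forecast would then be the observation stored under that date, e.g. by indexing the series with `same_date_previous_year(current_date)`.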

The complete example of seasonal persistence is listed below.

from pandas import read_csv
from sklearn.metrics import mean_squared_error
from math import sqrt
from numpy import mean
from matplotlib import pyplot
# load dataset
series = read_csv('max-daily-temps.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# prepare data
X = series.values
train, test = X[0:-(2*365)], X[-(2*365):]
# evaluate mean of different number of years
years = [1, 2, 3, 4, 5, 6, 7, 8]
scores = list()
for year in years:
    # walk-forward validation
    history = [x for x in train]
    predictions = list()
    for i in range(len(test)):
        # collect observations at the same day in prior years
        obs = list()
        for y in range(1, year+1):
            obs.append(history[-(y*365)])
        # make prediction
        yhat = mean(obs)
        predictions.append(yhat)
        # add observation to history
        history.append(test[i])
    # report performance
    rmse = sqrt(mean_squared_error(test, predictions))
    scores.append(rmse)
    print('Years=%d, RMSE: %.3f' % (year, rmse))
# plot window size to error
pyplot.plot(years, scores)
pyplot.show()

Running the example prints the size of the sliding window and the corresponding model error.

Unlike the previous two cases, we can see a trend where the skill continues to improve as the window size is increased.

The best result is a sliding window of all 8 years of historical data with an RMSE of 4.271.

Years=1, RMSE: 5.950
Years=2, RMSE: 5.083
Years=3, RMSE: 4.664
Years=4, RMSE: 4.539
Years=5, RMSE: 4.448
Years=6, RMSE: 4.358
Years=7, RMSE: 4.371
Years=8, RMSE: 4.271

The plot of sliding window size to model error makes this trend apparent.

It suggests that getting more historical data for this problem might be useful if an optimal model turns out to be a function of the observations on the same day in prior years.

Sliding Window Size to RMSE for Daily Melbourne Maximum Temperature

We might do just as well if the observations were averaged from the same week or month in previous seasons, and this might prove a fruitful experiment.
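A sketch of that experiment (illustrative only): average a small window of days centred on the same day one nominal 365-day year back, rather than the single point observation.

```python
from statistics import mean

def weekly_seasonal_mean(history, lag=365, half_window=3):
    # average the observations in a +/- half_window day band centred
    # on the same day in the previous (nominal 365-day) year
    centre = len(history) - lag
    return mean(history[centre - half_window : centre + half_window + 1])
```

A monthly variant would simply widen `half_window` to around 15 days.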

## Summary

In this tutorial, you discovered seasonal persistence for time series forecasting.

You learned:

• How to use point observations from prior seasons for a persistence forecast.
• How to use a mean of a sliding window across multiple prior seasons for a persistence forecast.
• How to apply seasonal persistence to daily and monthly time series data.

Do you have any questions about persistence with seasonal data?

Simple Time Series Forecasting Models to Test So That You Don’t Fool Yourself https://machinelearningmastery.com/simple-time-series-forecasting-models/ https://machinelearningmastery.com/simple-time-series-forecasting-models/#comments Thu, 30 Mar 2017 18:00:07 +0000 https://35.82.237.216/?p=3858 It is important to establish a strong baseline of performance on a time series forecasting problem and to not fool yourself into thinking that sophisticated methods are skillful, when in fact they are not. This requires that you evaluate a suite of standard naive, or simple, time series forecasting models to get an idea of […]

The post Simple Time Series Forecasting Models to Test So That You Don’t Fool Yourself appeared first on MachineLearningMastery.com.

It is important to establish a strong baseline of performance on a time series forecasting problem and to not fool yourself into thinking that sophisticated methods are skillful, when in fact they are not.

This requires that you evaluate a suite of standard naive, or simple, time series forecasting models to get an idea of the worst acceptable performance on the problem for more sophisticated models to beat.

Applying these simple models can also uncover new ideas about more advanced methods that may result in better performance.

In this tutorial, you will discover how to implement and automate three standard baseline time series forecasting methods on a real world dataset.

Specifically, you will learn:

• How to automate the persistence model and test a suite of persisted values.
• How to automate the expanding window model.
• How to automate the rolling window forecast model and test a suite of window sizes.

This is an important topic and highly recommended for any time series forecasting project.

Kick-start your project with my new book Time Series Forecasting With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

• Updated Apr/2019: Updated the link to dataset.

## Overview

This tutorial is broken down into the following 5 parts:

1. Monthly Car Sales Dataset: An overview of the standard time series dataset we will use.
2. Test Setup: How we will evaluate forecast models in this tutorial.
3. Persistence Forecast: The persistence forecast and how to automate it.
4. Expanding Window Forecast: The expanding window forecast and how to automate it.
5. Rolling Window Forecast: The rolling window forecast and how to automate it.

An up-to-date Python SciPy environment is used, including Python 2 or 3, Pandas, NumPy, and Matplotlib.

## Monthly Car Sales Dataset

In this tutorial, we will use the Monthly Car Sales dataset.

This dataset describes the number of car sales in Quebec, Canada between 1960 and 1968.

The units are a count of the number of sales and there are 108 observations. The source data is credited to Abraham and Ledolter (1983).

Download the dataset and save it into your current working directory with the filename “car-sales.csv“. Note, you may need to delete the footer information from the file.

The code below loads the dataset as a Pandas Series object.

# line plot of time series
from pandas import read_csv
from matplotlib import pyplot
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# display first few rows
print(series.head())
# line plot of dataset
series.plot()
pyplot.show()

Running the example prints the first 5 rows of data.

Month
1960-01-01 6550
1960-02-01 8728
1960-03-01 12026
1960-04-01 14395
1960-05-01 14587
Name: Sales, dtype: int64

A line plot of the data is also provided.

Monthly Car Sales Dataset Line Plot

## Experimental Test Setup

It is important to evaluate time series forecasting models consistently.

In this section, we will define how we will evaluate the three forecast models in this tutorial.

First, we will hold the last two years of data back and evaluate forecasts on this data. Given the data is monthly, this means that the last 24 observations will be used as test data.

We will use a walk-forward validation method to evaluate model performance. This means that each time step in the test dataset will be enumerated, a model constructed on history data, and the forecast compared to the expected value. The observation will then be added to the training dataset and the process repeated.

Walk-forward validation is a realistic way to evaluate time series forecast models as one would expect models to be updated as new observations are made available.

Finally, forecasts will be evaluated using root mean squared error or RMSE. The benefit of RMSE is that it penalizes large errors and the scores are in the same units as the forecast values (car sales per month).

In summary, the test harness involves:

• The last 2 years of data used as a test set.
• Walk-forward validation for model evaluation.
• Root mean squared error used to report model skill.

## Optimized Persistence Forecast

The persistence forecast involves using the previous observation to predict the next time step.

For this reason, the approach is often called the naive forecast.

Why stop with using the previous observation? In this section, we will look at automating the persistence forecast and evaluate the use of any arbitrary prior time step to predict the next time step.

We will explore using each of the prior 24 months of point observations in a persistence model. Each configuration will be evaluated using the test harness and RMSE scores collected. We will then display the scores and graph the relationship between the persisted time step and the model skill.

The complete example is listed below.

from pandas import read_csv
from sklearn.metrics import mean_squared_error
from math import sqrt
from matplotlib import pyplot
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
persistence_values = range(1, 25)
scores = list()
for p in persistence_values:
    # walk-forward validation
    history = [x for x in train]
    predictions = list()
    for i in range(len(test)):
        # make prediction: persist the observation from p steps ago
        yhat = history[-p]
        predictions.append(yhat)
        # add observation to history
        history.append(test[i])
    # report performance
    rmse = sqrt(mean_squared_error(test, predictions))
    scores.append(rmse)
    print('p=%d RMSE:%.3f' % (p, rmse))
# plot scores over persistence values
pyplot.plot(persistence_values, scores)
pyplot.show()

Running the example prints the RMSE for each persisted point observation.

p=1 RMSE:3947.200
p=2 RMSE:5485.353
p=3 RMSE:6346.176
p=4 RMSE:6474.553
p=5 RMSE:5756.543
p=6 RMSE:5756.076
p=7 RMSE:5958.665
p=8 RMSE:6543.266
p=9 RMSE:6450.839
p=10 RMSE:5595.971
p=11 RMSE:3806.482
p=12 RMSE:1997.732
p=13 RMSE:3968.987
p=14 RMSE:5210.866
p=15 RMSE:6299.040
p=16 RMSE:6144.881
p=17 RMSE:5349.691
p=18 RMSE:5534.784
p=19 RMSE:5655.016
p=20 RMSE:6746.872
p=21 RMSE:6784.611
p=22 RMSE:5642.737
p=23 RMSE:3692.062
p=24 RMSE:2119.103

A plot of the persisted value (t-n) to model skill (RMSE) is also created.

From the results, it is clear that persisting the observation from 12 months ago or 24 months ago is a great starting point on this dataset.

The best result achieved involved persisting the result from t-12 with an RMSE of 1997.732 car sales.

This is an obvious result, but also very useful.

We would expect that a forecast model that is some weighted combination of the observations at t-12, t-24, t-36 and so on would be a powerful starting point.

It also points out that the naive t-1 persistence would have been a less desirable starting point on this dataset.

Persisted Observation to RMSE on the Monthly Car Sales Dataset
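The weighted combination of seasonal lags mentioned above can be sketched with fixed, hand-picked weights (illustrative only; in practice the weights would be learned from the data):

```python
def weighted_seasonal_forecast(history, weights=None):
    # combine the observations at seasonal lags with fixed weights;
    # these example weights favour t-12 over t-24 and are illustrative
    weights = weights or {12: 0.7, 24: 0.3}
    return sum(w * history[-lag] for lag, w in weights.items())
```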

We can use the t-12 model to make a prediction and plot it against the test data.

The complete example is listed below.

from pandas import read_csv
from matplotlib import pyplot
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # make prediction: persist the observation from 12 months ago
    yhat = history[-12]
    predictions.append(yhat)
    # add observation to history
    history.append(test[i])
# plot predictions vs observations
pyplot.plot(test)
pyplot.plot(predictions)
pyplot.show()

Running the example plots the test dataset (blue) against the predicted values (orange).

Line Plot of Predicted Values vs Test Dataset for the t-12 Persistence Model

## Expanding Window Forecast

An expanding window refers to a model that calculates a statistic on all available historic data and uses that to make a forecast.

It is an expanding window because it grows as more real observations are collected.

Two good starting point statistics to calculate are the mean and the median historical observation.

The example below uses the expanding window mean as the forecast.

from pandas import read_csv
from sklearn.metrics import mean_squared_error
from math import sqrt
from numpy import mean
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # make prediction: mean of all history so far
    yhat = mean(history)
    predictions.append(yhat)
    # add observation to history
    history.append(test[i])
# report performance
rmse = sqrt(mean_squared_error(test, predictions))
print('RMSE: %.3f' % rmse)

Running the example prints the RMSE evaluation of the approach.

RMSE: 5113.067

We can also repeat the same experiment with the median of the historical observations. The complete example is listed below.

from pandas import read_csv
from sklearn.metrics import mean_squared_error
from math import sqrt
from numpy import median
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # make prediction: median of all history so far
    yhat = median(history)
    predictions.append(yhat)
    # add observation to history
    history.append(test[i])
# report performance
rmse = sqrt(mean_squared_error(test, predictions))
print('RMSE: %.3f' % rmse)

Again, running the example prints the skill of the model.

We can see that on this problem the historical mean produced a better result than the median, but both were worse models than using the optimized persistence values.

RMSE: 5527.408

We can plot the mean expanding window predictions against the test dataset to get a feeling for how the forecast actually looks in context.

The complete example is listed below.

from pandas import read_csv
from matplotlib import pyplot
from numpy import mean
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # make prediction: mean of all history so far
    yhat = mean(history)
    predictions.append(yhat)
    # add observation to history
    history.append(test[i])
# plot predictions vs observations
pyplot.plot(test)
pyplot.plot(predictions)
pyplot.show()

The plot shows what a poor forecast looks like and how it does not follow the movements of the data at all, other than a slight rising trend.

Line Plot of Predicted Values vs Test Dataset for the Mean Expanding Window Model

You can see more examples of expanding window statistics in the post:

## Rolling Window Forecast

A rolling window model involves calculating a statistic on a fixed contiguous block of prior observations and using it as a forecast.

It is much like the expanding window, but the window size remains fixed and counts backwards from the most recent observation.

It may be more useful on time series problems where recent lag values are more predictive than older lag values.

We will automatically check different rolling window sizes from 1 to 24 months (2 years) and start by calculating the mean observation and using that as a forecast. The complete example is listed below.

from pandas import read_csv
from sklearn.metrics import mean_squared_error
from math import sqrt
from matplotlib import pyplot
from numpy import mean
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
window_sizes = range(1, 25)
scores = list()
for w in window_sizes:
    # walk-forward validation
    history = [x for x in train]
    predictions = list()
    for i in range(len(test)):
        # make prediction: mean of the last w observations
        yhat = mean(history[-w:])
        predictions.append(yhat)
        # add observation to history
        history.append(test[i])
    # report performance
    rmse = sqrt(mean_squared_error(test, predictions))
    scores.append(rmse)
    print('w=%d RMSE:%.3f' % (w, rmse))
# plot scores over window sizes
pyplot.plot(window_sizes, scores)
pyplot.show()

Running the example prints the rolling window size and RMSE for each configuration.

w=1 RMSE:3947.200
w=2 RMSE:4350.413
w=3 RMSE:4701.446
w=4 RMSE:4810.510
w=5 RMSE:4649.667
w=6 RMSE:4549.172
w=7 RMSE:4515.684
w=8 RMSE:4614.551
w=9 RMSE:4653.493
w=10 RMSE:4563.802
w=11 RMSE:4321.599
w=12 RMSE:4023.968
w=13 RMSE:3901.634
w=14 RMSE:3907.671
w=15 RMSE:4017.276
w=16 RMSE:4084.080
w=17 RMSE:4076.399
w=18 RMSE:4085.376
w=19 RMSE:4101.505
w=20 RMSE:4195.617
w=21 RMSE:4269.784
w=22 RMSE:4258.226
w=23 RMSE:4158.029
w=24 RMSE:4021.885

A line plot of window size to error is also created.

The results suggest that a rolling window of w=13 was best with an RMSE of 3,901 monthly car sales.

Line Plot of Rolling Window Size to RMSE for a Mean Forecast on the Monthly Car Sales Dataset

We can repeat this experiment with the median statistic.

The complete example is listed below.

from pandas import read_csv
from sklearn.metrics import mean_squared_error
from math import sqrt
from matplotlib import pyplot
from numpy import median
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
window_sizes = range(1, 25)
scores = list()
for w in window_sizes:
    # walk-forward validation
    history = [x for x in train]
    predictions = list()
    for i in range(len(test)):
        # make prediction: median of the last w observations
        yhat = median(history[-w:])
        predictions.append(yhat)
        # add observation to history
        history.append(test[i])
    # report performance
    rmse = sqrt(mean_squared_error(test, predictions))
    scores.append(rmse)
    print('w=%d RMSE:%.3f' % (w, rmse))
# plot scores over window sizes
pyplot.plot(window_sizes, scores)
pyplot.show()

Running the example again prints the window size and RMSE for each configuration.

w=1 RMSE:3947.200
w=2 RMSE:4350.413
w=3 RMSE:4818.406
w=4 RMSE:4993.473
w=5 RMSE:5212.887
w=6 RMSE:5002.830
w=7 RMSE:4958.621
w=8 RMSE:4817.664
w=9 RMSE:4932.317
w=10 RMSE:4928.661
w=11 RMSE:4885.574
w=12 RMSE:4414.139
w=13 RMSE:4204.665
w=14 RMSE:4172.579
w=15 RMSE:4382.037
w=16 RMSE:4522.304
w=17 RMSE:4494.803
w=18 RMSE:4360.445
w=19 RMSE:4232.285
w=20 RMSE:4346.389
w=21 RMSE:4465.536
w=22 RMSE:4514.596
w=23 RMSE:4428.739
w=24 RMSE:4236.126

A plot of the window size and RMSE is again created.

Here, we can see that the best results were achieved with a window size of w=1 with an RMSE of 3947.200 monthly car sales, which is equivalent to the t-1 persistence model.

The results were generally worse than optimized persistence, but better than the expanding window model. We could imagine better results with a weighted combination of window observations; this idea leads to the use of linear models such as AR and ARIMA.

Line Plot of Rolling Window Size to RMSE for a Median Forecast on the Monthly Car Sales Dataset
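The weighted-combination idea can be sketched by estimating the weights with ordinary least squares over lagged observations (a toy illustration of what autoregressive models do internally; this is not the statsmodels API):

```python
import numpy as np

def fit_lag_weights(series, lags):
    # build a design matrix of lagged observations and solve for the
    # least-squares weights (the essence of an autoregressive model)
    max_lag = max(lags)
    X = np.array([[series[t - lag] for lag in lags]
                  for t in range(max_lag, len(series))])
    y = np.array(series[max_lag:])
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

# a series that repeats exactly every 12 steps is perfectly predicted
# by a weight of 1.0 on the t-12 lag
weights = fit_lag_weights(list(range(12)) * 4, lags=[12])
```

In practice, a library such as statsmodels would fit these coefficients along with an intercept and handle lag selection.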

Again, we can plot the predictions from the better model (mean rolling window with w=13) against the actual observations to get a feeling for how the forecast looks in context.

The complete example is listed below.

from pandas import read_csv
from matplotlib import pyplot
from numpy import mean
# load dataset
series = read_csv('car-sales.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # make prediction: mean of the last 13 observations
    yhat = mean(history[-13:])
    predictions.append(yhat)
    # add observation to history
    history.append(test[i])
# plot predictions vs observations
pyplot.plot(test)
pyplot.plot(predictions)
pyplot.show()

Running the code creates the line plot of observations (blue) compared to the predicted values (orange).

We can see that the model better follows the level of the data, but again does not follow the actual up and down movements.

Line Plot of Predicted Values vs Test Dataset for the Mean w=13 Rolling Window Model

You can see more examples of rolling window statistics in the post:

## Summary

In this tutorial, you discovered the importance of calculating the worst acceptable performance on a time series forecasting problem and methods that you can use to ensure you are not fooling yourself with more sophisticated methods.

Specifically, you learned:

• How to automatically test a suite of persistence configurations.
• How to evaluate an expanding window model.
• How to automatically test a suite of rolling window configurations.