How to Make Manual Predictions for ARIMA Models with Python

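from pandas import read_csv
from statsmodels.tsa.arima.model import ARIMA

def predict(coef, history):
	yhat = 0.0
	for i in range(1, len(coef)+1):
		yhat += coef[i-1] * history[-i]  # coef[0] weights the most recent value
	return yhat

series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0)
X = series.values
history, test = [x for x in X[:-7]], X[-7:]  # hold out the last 7 days
for obs in test:
	model = ARIMA(history, order=(1,0,0), trend='n')  # 'n' = no constant term
	model_fit = model.fit()
	ar_coef = model_fit.arparams
	yhat = predict(ar_coef, history)
	history.append(obs)
	print('>predicted=%.3f, expected=%.3f' % (yhat, obs))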

Note that, by default, the ARIMA implementation adds a constant (trend) term to models that use no differencing. This adds a constant to the regression equation that we do not need for demonstration purposes, so we turn it off by passing trend='n' (no constant) to the ARIMA constructor. The older statsmodels.tsa.arima_model.ARIMA class did the same thing with trend='nc' passed to fit().

That older fit() function also printed a lot of verbose messages, which could be turned off with disp=False; the fit() method of statsmodels.tsa.arima.model.ARIMA, used here, has no disp argument.

Running the example prints the prediction and expected value each iteration for 7 days. The final RMSE, computed over the 7 predictions as in the complete ARIMA example at the end of this tutorial, shows an error of about 1.9 degrees Celsius for this simple model.

>predicted=9.738, expected=12.900

>predicted=12.563, expected=14.600

>predicted=14.219, expected=14.000

>predicted=13.635, expected=13.600

>predicted=13.245, expected=13.500

>predicted=13.148, expected=15.700

>predicted=15.292, expected=13.000

Test RMSE: 1.928
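As a check on how that summary figure is obtained, the sketch below recomputes the RMSE in plain Python from the seven AR predictions and expected values printed above, instead of calling sqrt(mean_squared_error(...)) as the complete example does. The rmse() helper is written here only for illustration, and because the printed values are rounded to three decimals, the result only approximates the script's own calculation.

```python
from math import sqrt

# Predicted and expected values as printed by the AR(1) example (rounded to 3 decimals)
predicted = [9.738, 12.563, 14.219, 13.635, 13.245, 13.148, 15.292]
expected = [12.900, 14.600, 14.000, 13.600, 13.500, 15.700, 13.000]

def rmse(actual, forecast):
    """Root mean squared error: square root of the mean squared difference."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sqrt(sum(squared_errors) / len(squared_errors))

score = rmse(expected, predicted)
print('Test RMSE: %.3f' % score)  # Test RMSE: 1.928
```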

Experiment with AR models with different orders, such as 2 or more.
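Before trying higher orders, it helps to be clear about how predict() pairs coefficients with lags: coef[0] multiplies the most recent value, coef[1] the one before it, and so on. The sketch below repeats the tutorial's predict() and applies it to made-up AR(2) coefficients and a short made-up history; these numbers are hypothetical and do not come from a model fit to the temperature data.

```python
def predict(coef, history):
    # Same logic as the tutorial's predict(): sum of coef[i-1] * history[-i]
    yhat = 0.0
    for i in range(1, len(coef) + 1):
        yhat += coef[i - 1] * history[-i]
    return yhat

# Hypothetical AR(2) coefficients and a short history (oldest first)
ar_coef = [0.6, 0.3]
history = [10.0, 12.0, 14.0]

# 0.6 * 14.0 (lag 1) + 0.3 * 12.0 (lag 2) = 8.4 + 3.6 = 12.0
yhat = predict(ar_coef, history)
print('%.3f' % yhat)  # 12.000
```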

Moving Average Model

The moving average model, or MA, is a linear regression model of the lag residual errors.

An MA model with a lag of k can be specified in the ARIMA model as follows:

model = ARIMA(history, order=(0,0,k))

In this example, we will use a simple MA(1) for demonstration purposes.

Much like above, making a prediction requires that we retrieve the MA coefficients from the fit model and pass them, together with the lagged residual errors, to the custom predict() function defined above.
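Written as an equation, and assuming no constant term, the MA(q) one-step forecast is a weighted sum of the most recent residual errors, where each error is e_t = y_t − ŷ_t:

$$\hat{y}_{t+1} = \theta_1 e_t + \theta_2 e_{t-1} + \cdots + \theta_q e_{t-q+1}$$

For MA(1) this is just θ1 times the last residual error, which is what predict(ma_coef, resid) computes.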

The residual errors from training are available from the ‘resid’ attribute of the ARIMAResults object returned by fit().

model_fit.resid

The complete example is listed below.

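from pandas import read_csv
from statsmodels.tsa.arima.model import ARIMA

def predict(coef, history):
	yhat = 0.0
	for i in range(1, len(coef)+1):
		yhat += coef[i-1] * history[-i]  # coef[0] weights the most recent value
	return yhat

series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0)
X = series.values
history, test = [x for x in X[:-7]], X[-7:]  # hold out the last 7 days
for obs in test:
	model = ARIMA(history, order=(0,0,1), trend='n')  # MA(1) with no constant term
	model_fit = model.fit()
	ma_coef = model_fit.maparams
	resid = model_fit.resid  # in-sample residual errors, oldest first
	yhat = predict(ma_coef, resid)
	history.append(obs)
	print('>predicted=%.3f, expected=%.3f' % (yhat, obs))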

Running the example prints the predictions and expected values each iteration for 7 days and ends by summarizing the RMSE of all predictions.

The skill of the model is not great; you can use this as an opportunity to explore MA models of other orders and use them to make manual predictions.

>predicted=4.610, expected=12.900

>predicted=7.085, expected=14.600

>predicted=6.423, expected=14.000

>predicted=6.476, expected=13.600

>predicted=6.089, expected=13.500

>predicted=6.335, expected=15.700

>predicted=8.006, expected=13.000

Test RMSE: 7.568

You can see how it would be straightforward to keep track of the residual errors manually outside of the ARIMAResults object as new observations are made available. For example:

residuals = list()
...
# once the true value for the step is known:
error = expected - predicted
residuals.append(error)
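A minimal sketch of that bookkeeping for an MA(1) model, in plain Python with no statsmodels call: it assumes a hypothetical fixed coefficient and a made-up starting residual in place of the values a fitted ARIMAResults object would provide, and feeds in made-up observations. After each observation arrives, the error is appended to the residuals list, as in the snippet above, so the next prediction uses it.

```python
def predict(coef, history):
    # Same logic as the tutorial's predict(): sum of coef[i-1] * history[-i]
    yhat = 0.0
    for i in range(1, len(coef) + 1):
        yhat += coef[i - 1] * history[-i]
    return yhat

ma_coef = [0.5]                   # hypothetical MA(1) coefficient
residuals = [2.0]                 # hypothetical last in-sample residual error
observations = [1.5, -0.5, 0.25]  # made-up observations, arriving one at a time

predictions = []
for expected in observations:
    predicted = predict(ma_coef, residuals)  # theta * most recent residual
    predictions.append(predicted)
    error = expected - predicted             # residual error for this step
    residuals.append(error)                  # kept for the next prediction
    print('>predicted=%.3f, expected=%.3f' % (predicted, expected))
```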

Next, let’s put the AR and MA models together and see how we can perform manual predictions.

Autoregression Moving Average Model

We have now seen how to make manual predictions for fit AR and MA models.

These approaches can be put directly together to make manual predictions for a fuller ARMA model.

In this example, we will fit an ARMA(1,1) model that can be configured in an ARIMA model as ARIMA(1,0,1) with no differencing.

The complete example is listed below.

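from pandas import read_csv
from statsmodels.tsa.arima.model import ARIMA

def predict(coef, history):
	yhat = 0.0
	for i in range(1, len(coef)+1):
		yhat += coef[i-1] * history[-i]  # coef[0] weights the most recent value
	return yhat

series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0)
X = series.values
history, test = [x for x in X[:-7]], X[-7:]  # hold out the last 7 days
for obs in test:
	model = ARIMA(history, order=(1,0,1), trend='n')  # ARMA(1,1) with no constant term
	model_fit = model.fit()
	ar_coef, ma_coef = model_fit.arparams, model_fit.maparams
	resid = model_fit.resid
	yhat = predict(ar_coef, history) + predict(ma_coef, resid)  # AR part + MA part
	history.append(obs)
	print('>predicted=%.3f, expected=%.3f' % (yhat, obs))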

You can see that the prediction (yhat) is the sum of two dot products: the AR coefficients with the lag observations, and the MA coefficients with the lag residual errors.

yhat = predict(ar_coef, history) + predict(ma_coef, resid)

Again, running the example prints the predictions and expected values each iteration and the summary RMSE for all predictions made.

>predicted=11.920, expected=12.900

>predicted=12.309, expected=14.600

>predicted=13.293, expected=14.000

>predicted=13.549, expected=13.600

>predicted=13.504, expected=13.500

>predicted=13.434, expected=15.700

>predicted=14.401, expected=13.000

Test RMSE: 1.405
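To make the combined formula concrete, the sketch below computes one ARMA(1,1) prediction by hand. The coefficients, history, and residuals are hypothetical values chosen for easy arithmetic, not output from the fitted temperature model; predict() is the tutorial's function.

```python
def predict(coef, history):
    # Same logic as the tutorial's predict(): sum of coef[i-1] * history[-i]
    yhat = 0.0
    for i in range(1, len(coef) + 1):
        yhat += coef[i - 1] * history[-i]
    return yhat

ar_coef, ma_coef = [0.75], [0.5]  # hypothetical ARMA(1,1) coefficients
history = [11.0, 12.0]            # made-up observations, oldest first
resid = [0.5, -1.0]               # made-up residual errors, oldest first

# AR part: 0.75 * 12.0 = 9.0; MA part: 0.5 * -1.0 = -0.5; total 8.5
yhat = predict(ar_coef, history) + predict(ma_coef, resid)
print('%.3f' % yhat)  # 8.500
```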

We can now add differencing and show how to make predictions for a complete ARIMA model.

Autoregression Integrated Moving Average Model

The I in ARIMA stands for integrated and refers to the differencing performed on the time series observations before predictions are made in the linear regression model.
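In symbols, first-order differencing replaces each observation with its change from the previous one, and a forecast of the differenced series is mapped back to the original scale by adding the last observation:

$$\Delta y_t = y_t - y_{t-1}, \qquad \hat{y}_{t+1} = y_t + \widehat{\Delta y}_{t+1}$$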

When making manual predictions, we must perform this differencing of the dataset prior to calling the predict() function. Below is a function that implements differencing of the entire dataset.

import numpy

def difference(dataset):
	# first-order differencing: each value minus the one before it
	diff = list()
	for i in range(1, len(dataset)):
		value = dataset[i] - dataset[i - 1]
		diff.append(value)
	return numpy.array(diff)

A simplification would be to keep only the observations back to the oldest required lag and use them to calculate the differenced values needed for prediction, rather than differencing the entire history each time.
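A minimal sketch of that simplification, assuming first-order differencing and an AR term of order p: the p most recent differenced values depend only on the last p + 1 observations. recent_differences() is a hypothetical helper written for this illustration; it is compared against a plain-list version of the tutorial's difference() (NumPy dropped so the sketch has no dependencies), using made-up observations.

```python
def difference(dataset):
    # Plain-list version of the tutorial's difference(): dataset[i] - dataset[i - 1]
    return [dataset[i] - dataset[i - 1] for i in range(1, len(dataset))]

def recent_differences(history, p):
    # Hypothetical helper: difference only the last p + 1 observations,
    # which is all an AR(p) term on a once-differenced series needs.
    tail = history[-(p + 1):]
    return [tail[i] - tail[i - 1] for i in range(1, len(tail))]

history = [10.0, 11.5, 11.0, 13.0, 12.5]  # made-up observations, oldest first
p = 2
print(recent_differences(history, p))  # [2.0, -0.5]
print(difference(history)[-p:])        # [2.0, -0.5]
```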

This difference function can be called once for each difference required of the ARIMA model.
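For example, a second-order difference (d=2) applies the function twice. To map a forecast of the twice-differenced series back to the original scale, note that the second difference at t+1 equals y[t+1] - 2*y[t] + y[t-1], so y[t+1] equals that forecast plus 2*y[t] - y[t-1]. The sketch below uses made-up observations, a hypothetical forecast for the next second difference, and a plain-list version of difference().

```python
def difference(dataset):
    # Plain-list version of the tutorial's difference(): dataset[i] - dataset[i - 1]
    return [dataset[i] - dataset[i - 1] for i in range(1, len(dataset))]

history = [1.0, 2.0, 4.0, 7.0]  # made-up observations, oldest first
diff1 = difference(history)     # first differences: [1.0, 2.0, 3.0]
diff2 = difference(diff1)       # second differences: [1.0, 1.0]

forecast_diff2 = 1.0            # hypothetical forecast of the next second difference
# Undo both levels of differencing: y[t+1] = d2 + 2*y[t] - y[t-1]
yhat = forecast_diff2 + 2 * history[-1] - history[-2]
print(diff1, diff2, yhat)       # [1.0, 2.0, 3.0] [1.0, 1.0] 11.0
```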

In this example, we will use a difference level of 1, and combine it with the ARMA example in the previous section to give us an ARIMA(1,1,1) model.
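Putting the pieces together, and with no constant term, the one-step ARIMA(1,1,1) forecast is the last observation plus an ARMA(1,1) forecast of the differenced series. Here φ1 comes from arparams, θ1 from maparams, and e_t is the last residual error:

$$\hat{y}_{t+1} = y_t + \phi_1 (y_t - y_{t-1}) + \theta_1 e_t$$

This is the calculation performed by the yhat line inside the loop of the example below.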

The complete example is listed below.

from pandas import read_csv
from matplotlib import pyplot
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
from math import sqrt
import numpy

def predict(coef, history):
	yhat = 0.0
	for i in range(1, len(coef)+1):
		yhat += coef[i-1] * history[-i]
	return yhat

def difference(dataset):
	diff = list()
	for i in range(1, len(dataset)):
		value = dataset[i] - dataset[i - 1]
		diff.append(value)
	return numpy.array(diff)

series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0)
X = series.values
size = len(X) - 7
train, test = X[0:size], X[size:]
history = [x for x in train]
predictions = list()
for t in range(len(test)):
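	model = ARIMA(history, order=(1,1,1))  # with d=1, no constant term is included by default
	model_fit = model.fit()
	ar_coef, ma_coef = model_fit.arparams, model_fit.maparams
	resid = model_fit.resid
	diff = difference(history)  # the AR term uses the differenced series
	yhat = history[-1] + predict(ar_coef, diff) + predict(ma_coef, resid)  # adding the last observation undoes the differencing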
	predictions.append(yhat)
	obs = test[t]
	history.append(obs)
	print('>predicted=%.3f, expected=%.3f' % (yhat, obs))
rmse = sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse)
