Establishing a baseline is essential on any time series forecasting problem.

A baseline in performance gives you an idea of how well all other models will actually perform on your problem.

In this tutorial, you will discover how to develop a persistence forecast that you can use to calculate a baseline level of performance on a time series dataset with Python.

After completing this tutorial, you will know:

- The importance of calculating a baseline of performance on time series forecast problems.
- How to develop a persistence model from scratch in Python.
- How to evaluate the forecast from a persistence model and use it to establish a baseline in performance.

Let’s get started.

## Forecast Performance Baseline

A baseline in forecast performance provides a point of comparison.

It is a point of reference for all other modeling techniques on your problem. If a model achieves performance at or below the baseline, the technique should be fixed or abandoned.

The technique used to generate a forecast to calculate the baseline performance must be easy to implement and naive of problem-specific details.

Before you can establish a performance baseline on your forecast problem, you must develop a test harness. This is comprised of:

- The
**dataset**you intend to use to train and evaluate models. - The
**resampling**technique you intend to use to estimate the performance of the technique (e.g. train/test split). - The
**performance measure**you intend to use to evaluate forecasts (e.g. mean squared error).

Once prepared, you then need to select a naive technique that you can use to make a forecast and calculate the baseline performance.

The goal is to get a baseline performance on your time series forecast problem as quickly as possible so that you can get to work better understanding the dataset and developing more advanced models.

Three properties of a good technique for making a baseline forecast are:

**Simple**: A method that requires little or no training or intelligence.**Fast**: A method that is fast to implement and computationally trivial to make a prediction.**Repeatable**: A method that is deterministic, meaning that it produces an expected output given the same input.

A common algorithm used in establishing a baseline performance is the persistence algorithm.

### Stop learning Time Series Forecasting the *slow way*!

Take my free 7-day email course and discover data prep, modeling and more (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

## Persistence Algorithm (the “naive” forecast)

The most common baseline method for supervised machine learning is the Zero Rule algorithm.

This algorithm predicts the majority class in the case of classification, or the average outcome in the case of regression. This could be used for time series, but does not respect the serial correlation structure in time series datasets.

The equivalent technique for use with time series dataset is the persistence algorithm.

The persistence algorithm uses the value at the previous time step (t-1) to predict the expected outcome at the next time step (t+1).

This satisfies the three above conditions for a baseline forecast.

To make this concrete, we will look at how to develop a persistence model and use it to establish a baseline performance for a simple univariate time series problem. First, let’s review the Shampoo Sales dataset.

## Shampoo Sales Dataset

This dataset describes the monthly number of shampoo sales over a 3 year period.

The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).

Below is a sample of the first 5 rows of data, including the header row.

1 2 3 4 5 6 |
"Month","Sales" "1-01",266.0 "1-02",145.9 "1-03",183.1 "1-04",119.3 "1-05",180.3 |

Below is a plot of the entire dataset taken from Data Market where you can download the dataset and learn more about it.

The dataset shows an increasing trend, and possibly some seasonal component.

Download the dataset and place it in the current working directory with the filename “*shampoo-sales.csv*“.

The following snippet of code will load the Shampoo Sales dataset and plot the time series.

1 2 3 4 5 6 7 8 9 10 |
from pandas import read_csv from pandas import datetime from matplotlib import pyplot def parser(x): return datetime.strptime('190'+x, '%Y-%m') series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser) series.plot() pyplot.show() |

Running the example plots the time series, as follows:

## Persistence Algorithm

A persistence model can be implemented easily in Python.

We will break this section down into 4 steps:

- Transform the univariate dataset into a supervised learning problem.
- Establish the train and test datasets for the test harness.
- Define the persistence model.
- Make a forecast and establish a baseline performance.
- Review the complete example and plot the output.

Let’s dive in.

### Step 1: Define the Supervised Learning Problem

The first step is to load the dataset and create a lagged representation. That is, given the observation at t-1, predict the observation at t+1.

1 2 3 4 5 |
# Create lagged dataset values = DataFrame(series.values) dataframe = concat([values.shift(1), values], axis=1) dataframe.columns = ['t-1', 't+1'] print(dataframe.head(5)) |

This snippet creates the dataset and prints the first 5 rows of the new dataset.

We can see that the first row (index 0) will have to be discarded as there was no observation prior to the first observation to use to make the prediction.

From a supervised learning perspective, the t-1 column is the input variable, or X, and the t+1 column is the output variable, or y.

1 2 3 4 5 6 |
t-1 t+1 0 NaN 266.0 1 266.0 145.9 2 145.9 183.1 3 183.1 119.3 4 119.3 180.3 |

### Step 2: Train and Test Sets

The next step is to separate the dataset into train and test sets.

We will keep the first 66% of the observations for “training” and the remaining 34% for evaluation. During the split, we are careful to exclude the first row of data with the NaN value.

No training is required in this case; it’s just habit. Each of the train and test sets are then split into the input and output variables.

1 2 3 4 5 6 |
# split into train and test sets X = dataframe.values train_size = int(len(X) * 0.66) train, test = X[1:train_size], X[train_size:] train_X, train_y = train[:,0], train[:,1] test_X, test_y = test[:,0], test[:,1] |

### Step 3: Persistence Algorithm

We can define our persistence model as a function that returns the value provided as input.

For example, if the t-1 value of 266.0 was provided, then this is returned as the prediction, whereas the actual real or expected value happens to be 145.9 (taken from the first usable row in our lagged dataset).

1 2 3 |
# persistence model def model_persistence(x): return x |

### Step 4: Make and Evaluate Forecast

Now we can evaluate this model on the test dataset.

We do this using the walk-forward validation method.

No model training or retraining is required, so in essence, we step through the test dataset time step by time step and get predictions.

Once predictions are made for each time step in the training dataset, they are compared to the expected values and a Mean Squared Error (MSE) score is calculated.

1 2 3 4 5 6 7 |
# walk-forward validation predictions = list() for x in test_X: yhat = model_persistence(x) predictions.append(yhat) test_score = mean_squared_error(test_y, predictions) print('Test MSE: %.3f' % test_score) |

In this case, the error is more than 17,730 over the test dataset.

1 |
Test MSE: 17730.518 |

### Step 5: Complete Example

Finally, a plot is made to show the training dataset and the diverging predictions from the expected values from the test dataset.

From the plot of the persistence model predictions, it is clear that the model is 1-step behind reality. There is a rising trend and month-to-month noise in the sales figures, which highlights the limitations of the persistence technique.

The complete example is listed below.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
from pandas import read_csv from pandas import datetime from pandas import DataFrame from pandas import concat from matplotlib import pyplot from sklearn.metrics import mean_squared_error def parser(x): return datetime.strptime('190'+x, '%Y-%m') series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser) # Create lagged dataset values = DataFrame(series.values) dataframe = concat([values.shift(1), values], axis=1) dataframe.columns = ['t-1', 't+1'] print(dataframe.head(5)) # split into train and test sets X = dataframe.values train_size = int(len(X) * 0.66) train, test = X[1:train_size], X[train_size:] train_X, train_y = train[:,0], train[:,1] test_X, test_y = test[:,0], test[:,1] # persistence model def model_persistence(x): return x # walk-forward validation predictions = list() for x in test_X: yhat = model_persistence(x) predictions.append(yhat) test_score = mean_squared_error(test_y, predictions) print('Test MSE: %.3f' % test_score) # plot predictions and expected results pyplot.plot(train_y) pyplot.plot([None for i in train_y] + [x for x in test_y]) pyplot.plot([None for i in train_y] + [x for x in predictions]) pyplot.show() |

We have seen an example of the persistence model developed from scratch for the Shampoo Sales problem.

The persistence algorithm is naive. It is often called the *naive forecast*.

It assumes nothing about the specifics of the time series problem to which it is applied. This is what makes it so easy to understand and so quick to implement and evaluate.

As a machine learning practitioner, it can also spark a large number of improvements.

Write them down.

This is useful because these ideas can become input features in a feature engineering effort or simple models that may be combined in an ensembling effort later.

## Summary

In this tutorial, you discovered how to establish a baseline performance on time series forecast problems with Python.

Specifically, you learned:

- The importance of establishing a baseline and the persistence algorithm that you can use.
- How to implement the persistence algorithm in Python from scratch.
- How to evaluate the forecasts of the persistence algorithm and use them as a baseline.

Do you have any questions about baseline performance, or about this tutorial?

Ask your questions in the comments below and I will do my best to answer.

Great post. However, I think instead of (t-1) and (t+1), it should be (t) and (t+1). The former indicates a lag of 2 time steps; the persistence model only requires a 1 step look-back.

That would be clearer, thanks Kevin.

Any suggestion how to implement this?

Is it only a question of column declarations?

dataframe.columns = [‘t-1’, ‘t+1’] vs. dataframe.columns = [‘t’, ‘t+1’]

…or do I have to change some more code logic?

Great post!

Thanks Ansh.

Creating a forecast for a baseline. The hard part of baselines is, of course, the future. How good are economists or meteorologists at predicting the stock market or weather? Uncertainty is an unavoidable part of this part of the work.

Yes, not sure I follow. Perhaps you could restate your point?

Hello, I have a question.

Quote: “If a model achieves performance at or below the baseline,

..the technique should be fixed or abandoned”

Where is the baseline in the plot?

What does it mean at or below the baseline in regard of the example plots?

What does the red and the green line describe?

I assume:

blue line = training data

green line = test data

red line = prediction

Am I right?

I’m not afraid to ask stupid questions. I’ve studied a lot and know no one knows everything :-).

Here, baseline is the model you have chosen to be the baseline. E.g. a persistence forecast.

So then I have to compare the MSE-values of the baseline model and a chosen model

to decide whether my chosen model is the right model to predict something.

The performance of the baseline model is:

Test Score: 17730.518 MSE

I have feeded the shampoo-data into your Multilayer Perceptron-example.

Test A)

The performance of the Multilayer Perceptron Model on the shampoo-data is:

Train Score: 6623.57 MSE

Test Score: 19589.78 MSE

Test Score mlp model: 19589.78 MSE > Test Score baseline model: 17730.518 MSE

Conclusion:

The choosen mlp model predicts something, because its MSE is higher then

the MSE of the baseline model while using the same raw data.

‘There is some significance’.

Is this right so far?

Actually I would expect I higher error rate as bad sign.

Test B)

Airline LSTM Example (feeded with shampoo data):

testScoreMyTest = mean_squared_error(testY[0], testPredict[:,0])

print(‘testScoreMyTest: %.2f MSE’ % (testScoreMyTest))

Test Score Airline LSTM: 20288.20 MSE > Test Score baseline model: 17730.518 MSE

Conclusion:

Airline LSTM Example predicts something on shampoo data.

Here I have a problem. I wanted to see if user Wollner is right.

http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/#comment-383708

Does the baseline test shows that he isn’t right???

I also try it with RMSE

testScore = math.sqrt(mean_squared_error(test_y, predictions))

print(‘Test Score: %.2f RMSE’ % (testScore))

The smaller the value of the RMSE, the better is the predictive accuracy of the model.

http://docs.aws.amazon.com/machine-learning/latest/dg/regression-model-insights.html

The performance of the baseline model is:

testScore = math.sqrt(mean_squared_error(test_y, predictions))

print(‘Test Score: %.2f RMSE’ % (testScore))

Test Score: 133.16 RMSE

I have feeded the shampoo-data into your Multilayer Perceptron-example.

Test A)

The performance of the Multilayer Perceptron Model on the shampoo-data is:

Test Score: 139.96 RMSE > 133.16 RMSE

Test B)

Airline LSTM Example (feeded with shampoo data):

Test Score: 142.43 RMSE > 133.16 RMSE

Conclusion:

Actually in regard to the Amazon documentation, I would say both models

perform bad compared to the baseline model and therefore they

are NOT professionally qualified to solve the ‘shampoo problem’.

Do I have a misconception here?

The idea is to compare the performance of the baseline model to all other models that you evaluate on your problem.

Regarding MSE, the goal is to minimize the error, so smaller values are better.

I try to use labels in the plot according to http://matplotlib.org/users/legend_guide.html

Example:

pyplot.plot(train_y, label=’Training Data’)

But every time I have to go to options in the plot window and check auto generate labels to view my declarations.

Is this behavior normal?

I am not an expert in matplotlib sorry, but you can style everything about your plots programmatically.