Time series datasets may contain trends and seasonality, which may need to be removed prior to modeling.

Trends can result in a varying mean over time, whereas seasonality can result in a changing variance over time, both which define a time series as being non-stationary. Stationary datasets are those that have a stable mean and variance, and are in turn much easier to model.

Differencing is a popular and widely used data transform for making time series data stationary.

In this tutorial, you will discover how to apply the difference operation to your time series data with Python.

After completing this tutorial, you will know:

- The contrast between a stationary and non-stationary time series and how to make a series stationary with a difference transform.
- How to apply the difference transform to remove a linear trend from a series.
- How to apply the difference transform to remove a seasonal signal from a series.

Discover how to build models for multivariate and multi-step time series forecasting with LSTMs and more in my new book, with 25 step-by-step tutorials and full source code.

Let’s get started.

## Tutorial Overview

This tutorial is divided into 4 parts; they are:

- Stationarity
- Difference Transform
- Differencing to Remove Trends
- Differencing to Remove Seasonality

## Stationarity

Time series is different from more traditional classification and regression predictive modeling problems.

The temporal structure adds an order to the observations. This imposed order means that important assumptions about the consistency of those observations needs to be handled specifically.

For example, when modeling, there are assumptions that the summary statistics of observations are consistent. In time series terminology, we refer to this expectation as the time series being stationary.

These assumptions can be easily violated in time series by the addition of a trend, seasonality, and other time-dependent structures.

### Stationary Time Series

The observations in a stationary time series are not dependent on time.

Time series are stationary if they do not have trend or seasonal effects. Summary statistics calculated on the time series are consistent over time, like the mean or the variance of the observations.

When a time series is stationary, it can be easier to model. Statistical modeling methods assume or require the time series to be stationary.

### Non-Stationary Time Series

Observations from a non-stationary time series show seasonal effects, trends, and other structures that depend on the time index.

Summary statistics like the mean and variance do change over time, providing a drift in the concepts a model may try to capture.

Classical time series analysis and forecasting methods are concerned with making non-stationary time series data stationary by identifying and removing trends and removing stationary effects.

### Making Series Data Stationary

You can check if your time series is stationary by looking at a line plot of the series over time.

Sign of obvious trends, seasonality, or other systematic structures in the series are indicators of a non-stationary series.

A more accurate method would be to use a statistical test, such as the Dickey-Fuller test.

Should you make your time series stationary?

Generally, yes.

If you have clear trend and seasonality in your time series, then model these components, remove them from observations, then train models on the residuals.

If we fit a stationary model to data, we assume our data are a realization of a stationary process. So our first step in an analysis should be to check whether there is any evidence of a trend or seasonal effects and, if there is, remove them.

— Page 122, Introductory Time Series with R.

Statistical time series methods and even modern machine learning methods will benefit from the clearer signal in the data.

### Need help with Deep Learning for Time Series?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

## Difference Transform

Differencing is a method of transforming a time series dataset.

It can be used to remove the series dependence on time, so-called temporal dependence. This includes structures like trends and seasonality.

Differencing can help stabilize the mean of the time series by removing changes in the level of a time series, and so eliminating (or reducing) trend and seasonality.

— Page 215, Forecasting: principles and practice.

Differencing is performed by subtracting the previous observation from the current observation.

1 |
difference(t) = observation(t) - observation(t-1) |

Inverting the process is required when a prediction must be converted back into the original scale.

This process can be reversed by adding the observation at the prior time step to the difference value.

1 |
inverted(t) = differenced(t) + observation(t-1) |

In this way, a series of differences and inverted differences can be calculated.

### Lag Difference

Taking the difference between consecutive observations is called a lag-1 difference.

The lag difference can be adjusted to suit the specific temporal structure.

For time series with a seasonal component, the lag may be expected to be the period (width) of the seasonality.

### Difference Order

Some temporal structure may still exist after performing a differencing operation, such as in the case of a nonlinear trend.

As such, the process of differencing can be repeated more than once until all temporal dependence has been removed.

The number of times that differencing is performed is called the difference order.

### Calculating Differencing

We can difference the dataset manually.

This involves developing a new function that creates a differenced dataset. The function would loop through a provided series and calculate the differenced values at the specified interval or lag.

The function below named difference() implements this procedure.

1 2 3 4 5 6 7 |
# create a differenced series def difference(dataset, interval=1): diff = list() for i in range(interval, len(dataset)): value = dataset[i] - dataset[i - interval] diff.append(value) return Series(diff) |

We can see that the function is careful to begin the differenced dataset after the specified interval to ensure differenced values can, in fact, be calculated. A default interval or lag value of 1 is defined. This is a sensible default.

One further improvement would be to also be able to specify the order or number of times to perform the differencing operation.

The function below named inverse_difference() inverts the difference operation for a single forecast. It requires that the real observation value for the previous time step also be provided.

1 2 3 |
# invert differenced forecast def inverse_difference(last_ob, value): return value + last_ob |

## Differencing to Remove Trends

In this section, we will look at using the difference transform to remove a trend.

A trend makes a time series non-stationary by increasing the level. This has the effect of varying the mean time series value over time.

The example below applies the difference() function to a contrived dataset with a linearly increasing trend.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
# create a differenced series def difference(dataset, interval=1): diff = list() for i in range(interval, len(dataset)): value = dataset[i] - dataset[i - interval] diff.append(value) return diff # invert differenced forecast def inverse_difference(last_ob, value): return value + last_ob # define a dataset with a linear trend data = [i+1 for i in range(20)] print(data) # difference the dataset diff = difference(data) print(diff) # invert the difference inverted = [inverse_difference(data[i], diff[i]) for i in range(len(diff))] print(inverted) |

Running the example first prints the contrived sequence with a linear trend. Next, the differenced dataset is printed showing the increase by one unit each time step. The length of this sequence is 19 instead of 20 as the difference for the first value in the sequence cannot be calculated as there is no prior value.

Finally, the difference sequence is inverted using the prior values from the original sequence as the primer for each transform.

1 2 3 |
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20] [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20] |

## Differencing to Remove Seasonality

In this section, we will look at using the difference transform to remove seasonality.

Seasonal variation, or seasonality, are cycles that repeat regularly over time.

A repeating pattern within each year is known as seasonal variation, although the term is applied more generally to repeating patterns within any fixed period.

— Page 6, Introductory Time Series with R.

There are many types of seasonality. Some obvious examples include; time of day, daily, weekly, monthly, annually, and so on. As such, identifying whether there is a seasonality component in your time series problem is subjective.

The simplest approach to determining if there is an aspect of seasonality is to plot and review your data, perhaps at different scales and with the addition of trend lines.

The example below applies the difference() function to a contrived seasonal dataset. The dataset includes two cycles of 360 units each.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
from math import sin from math import radians from matplotlib import pyplot # create a differenced series def difference(dataset, interval=1): diff = list() for i in range(interval, len(dataset)): value = dataset[i] - dataset[i - interval] diff.append(value) return diff # invert differenced forecast def inverse_difference(last_ob, value): return value + last_ob # define a dataset with a linear trend data = [sin(radians(i)) for i in range(360)] + [sin(radians(i)) for i in range(360)] pyplot.plot(data) pyplot.show() # difference the dataset diff = difference(data, 360) pyplot.plot(diff) pyplot.show() # invert the difference inverted = [inverse_difference(data[i], diff[i]) for i in range(len(diff))] pyplot.plot(inverted) pyplot.show() |

Running the example first creates and plots the dataset of two cycles of the 360 time step series.

Next, the difference transform is applied and the result is plotted. The plot shows 360 zero values with all seasonality signal removed.

In the de-trending example above, differencing was applied with a lag of 1, which means the first value was sacrificed. Here an entire cycle is used for differencing, that is 360 time steps. The result is that the entire first cycle is sacrificed in order to difference the second cycle.

Finally, the transform is inverted showing the second cycle with the seasonality restored.

## Further Reading

- Stationary process on Wikipedia
- Seasonal Adjustment on Wikipedia
- How to Check if Time Series Data is Stationary with Python
- How to Difference a Time Series Dataset with Python
- How to Identify and Remove Seasonality from Time Series Data with Python
- Seasonal Persistence Forecasting With Python

## Summary

In this tutorial, you discovered the distinction between stationary and non-stationary time series and how to use the difference transform to remove trends and seasonality with Python.

Specifically, you learned:

- The contrast between a stationary and non-stationary time series and how to make a series stationary with a difference transform.
- How to apply the difference transform to remove a linear trend from a series.
- How to apply the difference transform to remove a seasonal signal from a series.

Do you have any questions about making time series stationary?

Ask your questions in the comments and I will do my best to answer.

Excellent article, explains a lot. Just learned about standard deviations so this is a nice compliment to that.

I find it interesting how removing trends and seasonality resembles how some networks (such as convnet) drive towards transforming inputs into (translation) invariant outputs.

On that note, theres a tangentially related article that I thought you might enjoy – http://news.mit.edu/2012/brain-waves-encode-rules-for-behavior-1121

Earl Miller is doing some really fascinating work, and so are you.

Nice!

Incredibly helpful – thanks for this!

Thanks, I’m glad to hear that.

In time series if both trend and seasonality is present then first whether we remove trend and then seasonality or vice varsa to fit the model.

Remove seasonality then trend.

Why does it make a difference which one you remove first?

Because they interact linearly.

How I can remove seasonality without knowing the lag of period?

Why not just find out the structure of the seasonality?

Hi Jason!

First, my best wishes for the New Year and thank you again for the helpful post.

I have a question concerning predicting the first order difference and recovering the initial data.

I have a panel data with autocorrelated dependent variable y lagged with 12 time steps with the independent variables X.

After taking the first order difference, I forecast that new dependent variable based on the differentiated independent variables, let’s call these y’ and X’.

y’_{i, t+1} = y_{i, t+1} – y_{i, t}

X’_{i, t} = X_{i, t} – X_{i, t-1}

and

y’_{i, t} = f(X_{i, t-1})

I run my regression model and find an R-squared of 27%.

To recover the prediction of my initial dependent variable my operation is:

\hat{y}'{i, t} = y_{i, t-12} + \sum_{s=t-11}^{t}\hat{y}’_{i, s}

(because I am not supposed to know the real ys for months t-11 through t)

My R-squared drops to -8% while the R-squared for \sum_{s=t-11}^{t}\hat{y}’_{i, s} is around 30%.

I was wondering if you can detect something wrong in that reasoning or if it is something normal.

Also, is there a way to work with non-stationary data and adjust the residual errors in the end to take into account the autocorrelation. I have some other problems in which the dependent variable can be constant for several time steps (a risk measure based on daily observations calculated over a one year time window and the time step is one month).

Thank you!

You can invert the differenced prediction using real or predicted obs, depending on what is available.

Hi Jason,

How to detrend using a scientifically determined trend approximation.

In my case, I have a measure called: GHI, which in simple terms: the amount of sun rays ground receives from the Sun. Used in energy fields of studies especially solar energy.

So, a real GHI measured usings sensors, and a clear_GHI, estimated following The Ineichen and Perez clear sky model.

The trend is very obvious, as you can see here: https://pvlib-python.readthedocs.io/en/latest/clearsky.html at least in the location I’m interested in.

First thought, I calculated

ghi / clear_ghi

But this seems very bad, since, values close to zero, at sunrise and sunset, are so fragile to division, as those values might not be so correct.

I think of

clear_ghi – ghi

Thank you so much !

Perhaps you mean seasonal cycle instead of trend?

You can remove the seasonal cycle by seasonal differencing.

Many thanks Jason, your blog is amazing. I will give it a try

Thanks.

Hi Jason,

Thank you for your explination it was very useful.

My q is why we need to remove seasonality and trend..can you summarize the reasons in clear points ?

It makes the problem a lot simpler to model.

The trend and seasonality are the easy parts, so easy that we want to remove them now so we can focus on the hard parts of the problem.

Hi Jason,

Thanks for this great article. I have few doubts and I will try to put them here in words:

1) Most of the practical time series have seasonal patterns and a trend. For making this stationary we need to remove both the seasonality as well as trend. Is it correct?

2) Models like SARIMAX(Seasonal ARIMA) have a parameter ‘d’ for differencing and a seasonal parameter too. So does it mean that the the original time series data can be fed directly to this model and let the ‘d’ difference term remove trend and seasonality parameter take care of the seasonality factor.

3) It gets confusing here because at one point we say to remove both trend and seasonality, and then talk about SARIMAX model which can handle non-stationary data.

I hope you read my confused mind. Looking forward to hear from you.

Best Regards

Chandan

Yes, if the data has trend and seasonality, both should be removed before modeling with a linear algorithm.

Yes, no need to make the data stationary when using SARIMA, as you will specify how to de-trend and de-seasonalize the model as part of the config. You can make it stationary beforehand if you wish.

I hope that helps.

Thanks for replying Sir.

So if I understand this well, we usually remove trend and seasonality by difference(with lag=1 or lag=seasonality), log transforms etc. mainly for two purposes:

– By verifying if the residuals have no pattern and is stationary

– ACF and PACF do not show high variations

And then by looking at the ACF and PACF we choose parameters which we feed to original data series when using SARIMA.

The SARIMA can model the trend and seasonality directly.

I got you Sir. But SARIMA needs to be fed with parameters and I have seen your posts that you use ACF and PACF charts to deduce them.

You can also use grid search:

https://machinelearningmastery.com/how-to-grid-search-sarima-model-hyperparameters-for-time-series-forecasting-in-python/

Hi Jason,

First of all thank you so much for creating these posts!

I was also reading this post of yours (https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/) that uses SciKit’s pipelines, and was wondering if there is an estimator for differencing the series and in this case if it would be beneficial or even a good practice to use it in the pipeline.

Kind regards,

João

Good point.

I’ve not seen one, but it would be valuable!

Why do we always prefer stationary data to perform time series analysis?

what is that which refrains us to analyse non- stationary data?

A stationary time series is much simpler to model.

Jason,

Online searches show conflicting results and I would like to know your opinion. If we use MLP, is it necessary to remove trend and seasonality? If yes, then what is the point of actually using ANN, if simple ARIMA (or ARMA) can get the job done?

It is often a good idea and will make the dataset easier to model.

Try with/without for your data and model and compare the results.

Hi Jason,

Can you please help me to find out -How does one invert the differencing after the residual forecast has been made to get back to a forecast including the trend and seasonality that was differenced out? After differencing the original data set 1 time and completing the prediction, how do I invert or reverse the differencing so that I can get the predicted data without differences?

You can invert the difference by adding the removed value and propagating this down the series.

I have an example here you can use:

https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/

Hello sir, is differencing the same as sliding window? if not, have any link to learn it ?

No, sliding window is a way to frame a time series problem, differencing is a way to remove trend/seasonality from time series data.

I am not sure if my time series data is has seasonality or trend since it looks kinda like random noise. So I assume the data is stationary already. Does it hurt to perform differencing for stationary data before training, or should I leave differencing out?

Try modeling with an without differencing and compare model error.