Time series decomposition involves thinking of a series as a combination of level, trend, seasonality, and noise components.

Decomposition provides a useful abstract model for thinking about time series generally and for better understanding problems during time series analysis and forecasting.

In this tutorial, you will discover time series decomposition and how to automatically split a time series into its components with Python.

After completing this tutorial, you will know:

- The time series decomposition method of analysis and how it can help with forecasting.
- How to automatically decompose time series data in Python.
- How to decompose additive and multiplicative time series problems and plot the results.

Let’s get started.

## Time Series Components

A useful abstraction for selecting forecasting methods is to break a time series down into systematic and unsystematic components.

- **Systematic**: Components of the time series that have consistency or recurrence and can be described and modeled.
- **Non-Systematic**: Components of the time series that cannot be directly modeled.

A given time series is thought to consist of three systematic components (level, trend, and seasonality) and one non-systematic component called noise.

These components are defined as follows:

- **Level**: The average value in the series.
- **Trend**: The increasing or decreasing value in the series.
- **Seasonality**: The repeating short-term cycle in the series.
- **Noise**: The random variation in the series.


## Combining Time Series Components

A series is thought to be an aggregate or combination of these four components.

All series have a level and noise. The trend and seasonality components are optional.

It is helpful to think of the components as combining either additively or multiplicatively.

### Additive Model

An additive model suggests that the components are added together as follows:

```
y(t) = Level + Trend + Seasonality + Noise
```

An additive model is linear where changes over time are consistently made by the same amount.

A linear trend is a straight line.

A linear seasonality has the same frequency (width of cycles) and amplitude (height of cycles).
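To make the additive structure concrete, here is a minimal sketch that builds a synthetic series by summing the four components. All of the numbers (the level of 100, the slope of 0.5, the cycle length of 12, and so on) are made up for illustration and are not taken from any dataset in this tutorial.

```python
import numpy as np

# Hypothetical component values, chosen only for illustration.
n_steps = 48   # e.g. four "years" of monthly observations
period = 12    # length of one seasonal cycle

t = np.arange(n_steps)
level = 100.0                                        # constant baseline
trend = 0.5 * t                                      # straight line: +0.5 per step
seasonality = 10.0 * np.sin(2 * np.pi * t / period)  # same width and height every cycle
rng = np.random.default_rng(seed=1)
noise = rng.normal(loc=0.0, scale=1.0, size=n_steps)

# Additive model: the components are summed.
y = level + trend + seasonality + noise

# The seasonal swing (max minus min) is the same in the first and last cycle.
first_cycle_swing = seasonality[:period].max() - seasonality[:period].min()
last_cycle_swing = seasonality[-period:].max() - seasonality[-period:].min()
print(first_cycle_swing, last_cycle_swing)
```

The point of the last two lines is that, under an additive model, the height of the seasonal cycle does not depend on how far the trend has moved the series.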

### Multiplicative Model

A multiplicative model suggests that the components are multiplied together as follows:

```
y(t) = Level * Trend * Seasonality * Noise
```

A multiplicative model is nonlinear, such as quadratic or exponential. Changes increase or decrease over time.

A nonlinear trend is a curved line.

A non-linear seasonality has an increasing or decreasing frequency and/or amplitude over time.
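As a rough illustration, the sketch below builds a synthetic multiplicative series, where trend, seasonality, and noise are factors around 1 rather than offsets around 0. The growth rate, cycle length, and noise scale are hypothetical values picked for the example. It also checks that taking the logarithm turns the product into a sum, which is one reason log transforms are often paired with multiplicative data.

```python
import numpy as np

# Hypothetical component values, chosen only for illustration.
n_steps = 48
period = 12

t = np.arange(n_steps)
level = 100.0
trend = 1.02 ** t                                         # 2% growth per step
seasonality = 1.0 + 0.2 * np.sin(2 * np.pi * t / period)  # factor between 0.8 and 1.2
rng = np.random.default_rng(seed=1)
noise = rng.normal(loc=1.0, scale=0.01, size=n_steps)     # factor close to 1

# Multiplicative model: the components are multiplied.
y = level * trend * seasonality * noise

# Taking logs converts the product into a sum of log-components.
log_sum = np.log(level) + np.log(trend) + np.log(seasonality) + np.log(noise)
print(np.allclose(np.log(y), log_sum))

# Without noise, the seasonal swing in original units grows with the trend.
signal = level * trend * seasonality
first_cycle_swing = signal[:period].max() - signal[:period].min()
last_cycle_swing = signal[-period:].max() - signal[-period:].min()
print(first_cycle_swing, last_cycle_swing)
```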

## Decomposition as a Tool

This is a useful abstraction.

Decomposition is primarily used for time series analysis, and as an analysis tool it can be used to inform forecasting models on your problem.

It provides a structured way of thinking about a time series forecasting problem, both generally in terms of modeling complexity and specifically in terms of how to best capture each of these components in a given model.

Each of these components is something you may need to think about and address during data preparation, model selection, and model tuning. You may address a component explicitly, for example by modeling the trend and subtracting it from your data, or implicitly, by providing enough history for an algorithm to model a trend if one exists.
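As one example of the explicit route, the sketch below fits a straight line to a synthetic series with NumPy and subtracts it. The series and its true slope of 0.3 are invented for the example, and a straight-line fit is only one of many possible trend models.

```python
import numpy as np

# Synthetic series: hypothetical linear trend plus Gaussian noise.
rng = np.random.default_rng(seed=1)
t = np.arange(100)
series = 5.0 + 0.3 * t + rng.normal(loc=0.0, scale=1.0, size=t.size)

# Model the trend explicitly with a degree-1 polynomial (a straight line).
slope, intercept = np.polyfit(t, series, deg=1)
fitted_trend = intercept + slope * t

# Subtract the fitted trend to leave a detrended series.
detrended = series - fitted_trend

print(slope, intercept)
print(detrended.mean())
```

The detrended values can then be passed to a model that assumes no trend, and the fitted line can be added back to its forecasts.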

You may or may not be able to cleanly or perfectly break down your specific time series as an additive or multiplicative model.

Real-world problems are messy and noisy. There may be additive and multiplicative components. There may be an increasing trend followed by a decreasing trend. There may be non-repeating cycles mixed in with the repeating seasonality components.

Nevertheless, these abstract models provide a simple framework that you can use to analyze your data and explore ways to think about and forecast your problem.

## Automatic Time Series Decomposition

There are methods to automatically decompose a time series.

The statsmodels library provides an implementation of the naive, or classical, decomposition method in a function called seasonal_decompose(). It requires that you specify whether the model is additive or multiplicative.

Both will produce a result and you must be careful to be critical when interpreting the result. A review of a plot of the time series and some summary statistics can often be a good start to get an idea of whether your time series problem looks additive or multiplicative.
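One simple summary statistic to look at is the size of the swing within each seasonal cycle compared with the average value of that cycle. The sketch below is a heuristic of my own choosing, not something provided by statsmodels; the helper name `per_cycle_summary` and both synthetic series are hypothetical. If the swing stays roughly constant while the mean rises, the series looks additive; if the swing grows with the mean, it looks multiplicative.

```python
import numpy as np

def per_cycle_summary(values, period):
    """Return the mean and the max-min swing of each complete cycle."""
    values = np.asarray(values, dtype=float)
    n_cycles = values.size // period
    # Drop any incomplete cycle at the end, then put one cycle per row.
    cycles = values[: n_cycles * period].reshape(n_cycles, period)
    return cycles.mean(axis=1), cycles.max(axis=1) - cycles.min(axis=1)

# Two noise-free synthetic series with a cycle length of 12.
t = np.arange(48)
season = np.sin(2 * np.pi * t / 12)
additive_like = 100.0 + 0.5 * t + 10.0 * season
multiplicative_like = 100.0 * 1.02 ** t * (1.0 + 0.2 * season)

add_means, add_swings = per_cycle_summary(additive_like, period=12)
mul_means, mul_swings = per_cycle_summary(multiplicative_like, period=12)

print(add_swings)  # roughly constant from cycle to cycle
print(mul_swings)  # grows from cycle to cycle
```

On real data the picture will be noisier, so treat this as a prompt for closer inspection rather than a test.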

The *seasonal_decompose()* function returns a result object. The result object contains arrays to access four pieces of data from the decomposition.

For example, the snippet below shows how to decompose a series into trend, seasonal, and residual components assuming an additive model.

The result object provides access to the trend and seasonal series as arrays. It also provides access to the residuals, which are the time series after the trend and seasonal components are removed. Finally, the original or observed data is also stored.

```python
from statsmodels.tsa.seasonal import seasonal_decompose
series = ...
result = seasonal_decompose(series, model='additive')
print(result.trend)
print(result.seasonal)
print(result.resid)
print(result.observed)
```

These four time series can be plotted directly from the result object by calling the *plot()* function. For example:

```python
from statsmodels.tsa.seasonal import seasonal_decompose
from matplotlib import pyplot
series = ...
result = seasonal_decompose(series, model='additive')
result.plot()
pyplot.show()
```

Let’s look at some examples.

## Additive Decomposition

We can create a time series comprised of a linearly increasing trend from 1 to 99 and some random noise and decompose it as an additive model.

Because the time series was contrived and was provided as an array of numbers, we must specify the frequency of the observations (the *period=1* argument, which was named *freq* in older versions of statsmodels). If a Pandas Series object with a date-time index is provided, this argument is not required.

```python
from random import randrange
from matplotlib import pyplot
from statsmodels.tsa.seasonal import seasonal_decompose
# linear trend from 1 to 99 plus a random integer between 0 and 9
series = [i+randrange(10) for i in range(1,100)]
# period=1 treats each observation as its own cycle (older statsmodels: freq=1)
result = seasonal_decompose(series, model='additive', period=1)
result.plot()
pyplot.show()
```

Running the example creates the series, performs the decomposition, and plots the 4 resulting series.

We can see that the entire series was taken as the trend component and that there was no seasonality.

We can also see that the residual plot shows zero. This is a good example where the naive, or classical, decomposition was not able to separate the noise that we added from the linear trend.
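To see why the residual is exactly zero here, it helps to look at how a classical decomposition typically estimates the trend: with a centered moving average whose window matches the cycle length. The sketch below is a simplified, NumPy-only illustration of that idea, not the statsmodels source code, and the function name `moving_average_trend` is hypothetical. With a window of 1, the "average" of each point is the point itself, so the trend equals the series and nothing is left over.

```python
import numpy as np

def moving_average_trend(values, period):
    """Centered moving average with a window matched to the cycle length."""
    values = np.asarray(values, dtype=float)
    if period % 2 == 0:
        # Even window: half weights on both ends keep the average centered.
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    # mode="valid" keeps only positions where the full window fits.
    return np.convolve(values, weights, mode="valid")

# Same shape of data as the example above: 1..99 plus an integer in 0..9.
rng = np.random.default_rng(seed=1)
series = np.arange(1, 100) + rng.integers(0, 10, size=99)

# Window of 1: the trend is the series itself and the residual is all zeros.
trend_p1 = moving_average_trend(series, period=1)
residual_p1 = series - trend_p1
print(np.abs(residual_p1).max())

# Window of 12: the noise is smoothed, but the ends of the series are lost.
trend_p12 = moving_average_trend(series, period=12)
print(series.size, trend_p12.size)
```

The second part also shows the cost of a wider window: the trend cannot be computed for the first and last few observations, which is why decomposition results often have missing values at both ends.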

The naive decomposition method is a simple one, and there are more advanced decompositions available, like Seasonal and Trend decomposition using Loess or STL decomposition.

Caution and healthy skepticism are needed when using automated decomposition methods.

## Multiplicative Decomposition

We can contrive a quadratic time series as a square of the time step from 1 to 99, and then decompose it assuming a multiplicative model.

```python
from matplotlib import pyplot
from statsmodels.tsa.seasonal import seasonal_decompose
# quadratic series: the square of each time step from 1 to 99
series = [i**2.0 for i in range(1,100)]
# period=1 treats each observation as its own cycle (older statsmodels: freq=1)
result = seasonal_decompose(series, model='multiplicative', period=1)
result.plot()
pyplot.show()
```

Running the example, we can see that, as in the additive case, the trend is easily extracted and wholly characterizes the time series.

Exponential changes can be made linear by data transforms. In this case, a quadratic trend can be made linear by taking the square root. An exponential growth in seasonality may be made linear by taking the natural logarithm.
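A quick numeric check of both transforms, using made-up series: if a transformed series is linear, the differences between consecutive values are constant.

```python
import numpy as np

t = np.arange(1, 100, dtype=float)

# Quadratic trend: the square root recovers a straight line with slope 1.
quadratic = t ** 2
sqrt_steps = np.diff(np.sqrt(quadratic))
print(sqrt_steps.min(), sqrt_steps.max())

# Exponential growth (hypothetical rate of 0.05): the natural log
# recovers a straight line with slope 0.05.
exponential = np.exp(0.05 * t)
log_steps = np.diff(np.log(exponential))
print(log_steps.min(), log_steps.max())
```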

Again, it is important to treat decomposition as a potentially useful analysis tool, but to consider exploring the many different ways it could be applied for your problem, such as on data after it has been transformed or on residual model errors.

Let’s look at a real world dataset.

## Airline Passengers Dataset

The Airline Passengers dataset describes the total number of airline passengers over a period of time.

The units are a count of the number of airline passengers in thousands. There are 144 monthly observations from 1949 to 1960.

Learn more and download the dataset from Data Market.

Download the dataset to your current working directory with the filename “*airline-passengers.csv*”.

First, let’s graph the raw observations.

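```python
from pandas import read_csv
from matplotlib import pyplot
# Series.from_csv() was removed from pandas; read_csv() with a parsed date index replaces it
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
series.plot()
pyplot.show()
```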

Reviewing the line plot, it suggests that there may be a linear trend, but it is hard to be sure from eye-balling. There is also seasonality, but the amplitude (height) of the cycles appears to be increasing, suggesting that it is multiplicative.

We will assume a multiplicative model.

The example below decomposes the airline passengers dataset as a multiplicative model.

```python
from pandas import read_csv
from matplotlib import pyplot
from statsmodels.tsa.seasonal import seasonal_decompose
# Series.from_csv() was removed from pandas; read_csv() with a parsed date index replaces it
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
result = seasonal_decompose(series, model='multiplicative')
result.plot()
pyplot.show()
```

Running the example plots the observed, trend, seasonal, and residual time series.

We can see that the trend and seasonality information extracted from the series does seem reasonable. The residuals are also interesting, showing periods of high variability in the early and later years of the series.

## Further Reading

This section lists some resources for further reading on time series decomposition.

- Section 2.2 Time Series Components, Practical Time Series Forecasting with R: A Hands-On Guide.
- Section 6.3, Classical Decomposition, Forecasting: principles and practice

## Summary

In this tutorial, you discovered time series decomposition and how to decompose time series data with Python.

Specifically, you learned:

- The structure of decomposing time series into level, trend, seasonality, and noise.
- How to automatically decompose a time series dataset with Python.
- How to decompose an additive or multiplicative model and plot the results.

Do you have any questions about time series decomposition, or about this tutorial?

Ask your questions in the comments below and I will do my best to answer.

Hi Jason, great post.

Maybe you’ll be able to help me, I’m having some trouble with the statsmodels library. When I try to run your last example, I get this AttributeError:

“AttributeError: ‘Index’ object has no attribute ‘inferred_freq'”

I checked the dataset that was exported from DataMarket and realized that the last was this one:

“International airline passengers: monthly totals in thousands. Jan 49 ? Dec 60”

I removed it and then now a TypeError:

“TypeError: ‘numpy.float64’ object cannot be interpreted as an index”

I already tried to use this statsmodels library before, got this same error several times, gave up and started using R for that. I even asked this question on Stack Overflow:

http://stackoverflow.com/questions/41730036/typeerror-on-convolution-filter-call-from-statsmodels/41747712#41747712

and it seems that it’s a compatibility issue between numpy 1.12.0 and StatsModels 0.6.1.

Having said that, what are the versions of those libraries that you’re using? Did you run into this same problem and manage to solve it in any way?

Thanks!

I have not seen this error Álvaro, sorry.

Here are the versions of libraries I am currently using:

I have also tested statsmodels 0.8.0rc1 and it works fine.

I have tested the code on Python 2 and Python 3.

Hi Alvaro,

Maybe a bit late, but I hope it helps. IMHO you are getting this error because you are feeding seasonal_decompose() a pandas Series. If so, be sure to have a datetime type in your index or it will crash. You can work around this behavior by passing the Series values to np.array() and specifying the frequency manually.

Cheers!

Faced the same problem as Alvaro. Deleted the last line in the csv file and it worked fine.

Glad to hear it.

Hi Jason and Álvaro,

Thanks Jason for the detailed description step by step. Highly appreciated!

I got the same problem using a Jupyter notebook. Please let me know if you figure out the problem.

thanks!

Jason and Álvaro,

I did the following, and although I still receive a VisibleDeprecationWarning (using a non-integer number instead of an integer will result in an error in the future return np.r_[[np.nan] * head, x, [np.nan] * tail]),

I got the plots for time series components.

– I removed the last line in the CSV file: “International airline passengers: monthly totals in thousands. Jan 49 ? Dec 60”

– I read the file as a dataframe:

time_series = pd.read_csv('~/international-airline-passengers.csv', header=0)

– I changed the Month column type to datetime:

time_series.Month = pd.to_datetime(time_series.Month, errors='coerce')

– Set the dataframe index as:

time_series = time_series.set_index('Month')

– The rest is the same:

result = seasonal_decompose(time_series, model='multiplicative')

result.plot()

pyplot.show()

Please let me know if you can find an easier way, or know how to read the file as a series and do the job.

Regards,

The footer data absolutely must be deleted. No special formatting of the file should be required after that.

Does it work if you run the example outside of a notebook, e.g. from the command line?

To Álvaro

I recommend installing statsmodels v0.8 from a wheel (whl) file, because pip will install v0.6 automatically.

That fixed the error, though it contained some other bugs.

I had the same problem and it worked for me. Thank you!

I had the same error that everyone else had. I deleted the last line and then re-ran the code with no issue.

Glad to hear it Daniel.

I had the same issue and deleted the last line of the data, code ran well.

Thanks for the note Jessica.

Hey Jason,

Thanks for the complete explanation. Maybe a stupid question, but how do you explain the random % in the model? Let’s say it has random 49% in multiplicative decomposition.

A time series trace may be thought of as comprising a signal and a random component. We can’t model the random component; at best we can measure it and factor it into our confidence intervals.

Hello. Everything works OK when I test this dataset, but when I try to run it using daily temperature data (from your other lesson: http://machinelearningmastery.com/time-series-seasonality-with-python/), I get the error: “ValueError: You must specify a freq or x must be a pandas object with a timeseries index witha freq not set to None”. I don’t understand why some data are not considered a pandas object.

You must load the data as a Pandas Series or specify the frequency (e.g. 1).

Yes, I understand that. The problem is that I use exactly the same piece of code for both files (data are loaded as pandas series). I want to avoid specifying the frequency explicitly, because I would like to adapt this code to my own data, where this frequency is unknown. Are there any specific requirements regarding the CSV structure? I noticed that the file from this lesson has values for every month, whereas the other file I mentioned has daily values, but I don’t believe that is the reason for the errors.

There are no special requirements. You may need to specify a custom function to load the date/time column, depending on whether Pandas can figure it out or not.

Using freq = 1 is what is causing the Seasonal and Residual plots to flat-line. It specifies using a moving average convolution of length 1, i.e. average over a single point.

Try setting freq = 12 (for presumed monthly data) or make the input a pandas Series and set its index to a suitably contrived DatetimeIndex.

Thanks John.

Hi,

Nice example, it’s very helpful.

Simple query: When I am using quarterly datasets, I lose the first 2 and last 2 quarters of data in the (seasonally) adjusted series. Similarly, for monthly data I lose the first 6 and last 6 months of data.

I would like to have adjusted series which is up to current period. Is there a way to get entire adjusted data series?

Thanks.

Hi,

Is there a way to detect anomalous trends in time series using machine learning?

I would recommend looking for papers on the topic on scholar.google.com to see what methods are state of the art.

Hello, your texts are very interesting and useful. But in this text, I have a question. The example with a one column of data in dataset works well, but what I need to do to use the same code column by column in a multiple column dataset? Thank you!

You will need to change the code to work for each column.

I’m using this code:

series = read_csv(file_path, header=0, index_col=0)

result = seasonal_decompose(series['Column1'], model='multiplicative', freq=12)

series_deseasonality = series / (result.seasonal*result.trend)

series_deseasonality.plot()

pyplot.show()

but it returns this error “AttributeError: ‘Index’ object has no attribute ‘inferred_freq'”.

Can you help me please?

series = df['Column 1']

result = seasonal_decompose(series.values, freq=12, model='multiplicative')

Is there a way to find those signals that are periodic? I have a bunch of timeseries data (timestamp, uuid). I want to find those uuids that are seasonal. Expected output must be a set of uuids and their respective timestamps.

To notch it up a bit, given a large set of datapoints, is there a way to find all seasonal data with different seasonality?

Yes, usually a plot will make them obvious.

You can also fit a polynomial to the series, then subtract it.

I do not know the frequency of the given data. The timestamps are in datetime format – like this – 2017-01-29 07:17:10. Their frequency could be hourly, daily, weekly…or some other frequency.

It does not matter, as long as it is consistent.

Hi, Jason, very clear and helpful article.

I noticed that in the Airline Passengers example, the first several and last several values of the trend and residual data are lost. Is this a flaw of the algorithm?

If it is, any suggestions to fix this? Usually people want to detect the most recent abnormal data instead of the historical ones.

I was not aware, are you able to confirm the difference in the number of obs?

Hello Jason thank you for your post.

I wonder if there is a way to identify the months where the seasonal component peaks.

Thank you

What do you mean exactly Gus? Can you give an example?

the number of obs is the same, but first and last three values are nan in that example.

I don’t know how to upload a pic here, so I just post the printed values below:

the lengths of the observed, trend, seasonal and residual results are 144 144 144 144

the first five values of observed part are Month

1949-01-01 112

1949-02-01 118

1949-03-01 132

1949-04-01 129

1949-05-01 121

Name: number, dtype: int64

the first five values of trend part are Month

1949-01-01 NaN

1949-02-01 NaN

1949-03-01 NaN

1949-04-01 127.857143

1949-05-01 133.000000

Name: number, dtype: float64

the first five values of seasonal part are Month

1949-01-01 1.002765

1949-02-01 1.004550

1949-03-01 0.994257

1949-04-01 0.999830

1949-05-01 0.986458

Name: number, dtype: float64

the first five values of residual part are Month

1949-01-01 NaN

1949-02-01 NaN

1949-03-01 NaN

1949-04-01 1.009110

1949-05-01 0.922264

Name: number, dtype: float64

It may be a result of creating a rolling average or similar in the method.

Agree. I dug into the source code and the trend part is calculated by convolution method, so probably it is caused by rolling average.

So in my case, I can’t call this function directly, because I want to detect abnormalities in the residual part but its last values are NaN. No one cares about an abnormality that happened three days ago. I have to build my own function to achieve the goal.

Just curious in what situation people care about historical abnormalities. If there is none, this function is really awkward…

Nice work.

Perhaps try to roll your own version of the function where you have more control?

Or model the trend/seasonality explicitly.

I tried the stl function provided by R, and it worked well, with no missing values at the beginning or end. To summarize my experience here, in the hope it helps people who are interested:

there are three common ways to decompose time series components:

1. use seasonal_decompose method provided by statsmodels.

In this case, one problem as far as I know is that the first and last values of the trend and residual are NaN. People who care about the most recent abnormalities should be careful about this.

2. use stl function provided by R.

3. like Jason said, you can also choose to build your own version of the statsmodels function so you have more control

Nice work, thanks for sharing.

Hi Jason,

Thank you for the post.

In my data, I have weekly and yearly (possibly) seasonality. I have totally 16 months of daily data. How can I get these two seasonalities from seasonal_decompose method? Or, should I use some other method?

Thank you so much,

Pushpa

Perhaps you can model the different seasonal components and then subtract them to see if the series becomes stationary.

Here are some methods to try:

https://machinelearningmastery.com/time-series-seasonality-with-python/

This is very useful analysis however there is a catch. In order to analyze seasonal changes of stock you need to specify decomposition frequency to an entire year as seasons repeat every year not every day. So if you put freq=252 (252 trading days in one year) you should be able to extract seasonal effects.

Thanks for the note.

How to use the decomposition method described here to predict the future?

I’m not sure you would.

For time series forecasting, I would recommend that you start here:

https://machinelearningmastery.com/start-here/#timeseries

So decomposition is mainly used as an EDA tool instead of a forecasting method.

Yes, or a data preparation step prior to modeling.

Jason,

I used seasonal_decompose to get trend line. But somehow this method doesn’t exclude outliers from computing the trend. Is there a way to remove outliers automatically when creating the trend line?

Here is my graph:

https://raw.githubusercontent.com/taihds/test/797f43785eaf5c7124cffe7b58c7d8f2ef2afba0/time_series.png

outliers happen around May 20 in the graph.

Perhaps try fitting a linear regression model using an approach robust to outliers, for example Huber regression (from memory).

Hi Jason, thank you for your post.

What would you suggest for multivariate time series?

I have a time series dataset with 9 dependent variables, and one binary label. The time series consist of minutely based TS for a period of 3 months.

Should we perform the decomposition on each variable separately or what?

Thank you

Perhaps operate on each series separately as a first step?

Hi Jason Brownlee

I just want to ask you if there are alternative to get seasonal decompose result as data frame?

That mean to have trend, seasonal, resid ….

Just want to ask you if I can convert this object as dataFrame ?

The results are numpy arrays I believe. You can convert them to Pandas Dataframes directly via passing them to the DataFrame constructor.

Hi Jason,

Thanks for your posts, I have been reading them since I started working in data science. I have a question: how should we model a time series which does not have trend and seasonality?

Perhaps you can model it directly.

It raises a TypeError: “PeriodIndex given. Check the `freq` attribute instead of using infer_freq”, and I can’t figure it out. Could you please help me out? Here is my data:

SLF MLF M0 M1 M2

t

2017Q3 967.74 129200.0 204428.570 1.546462e+06 4929815.300

2017Q4 1717.97 133545.0 207499.450 1.605332e+06 5000216.100

2018Q1 938.10 143265.0 228753.160 1.583823e+06 5189744.060

2018Q2 1188.50 124545.0 210851.268 1.595624e+06 5250947.522

2018Q3 0.00 0.0 0.000 0.000000e+00 0.000

First, I set the variable ‘t’ as my index.

Then I selected the time range.

Third, I converted the monthly data into quarterly data.

Sorry, I don’t see what is going on. Perhaps post your code and your error to stackoverflow?