Data transforms are intended to remove noise and improve the signal in time series forecasting.

It can be very difficult to select a good, or even best, transform for a given prediction problem. There are many transforms to choose from and each has a different mathematical intuition.

In this tutorial, you will discover how to explore different power-based transforms for time series forecasting with Python.

After completing this tutorial, you will know:

- How to identify when to use a square root transform and how to explore it.
- How to identify when to use a log transform, how to explore it, and what it expects of the raw data.
- How to use the Box-Cox transform to perform square root and log transforms, and to automatically discover the best power transform for your dataset.

Let’s get started.

## Airline Passengers Dataset

The Airline Passengers dataset describes the total number of airline passengers over time.

The units are a count of the number of airline passengers in thousands. There are 144 monthly observations from 1949 to 1960.

Learn more and download the dataset from Data Market.

Download the dataset to your current working directory with the filename “*airline-passengers.csv*”.

The example below loads the dataset and plots the data.

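```python
from pandas import read_csv
from matplotlib import pyplot
# load the dataset as a Series indexed by month
# (Series.from_csv() was removed from pandas; read_csv() replaces it)
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(series)
# histogram
pyplot.subplot(212)
pyplot.hist(series)
pyplot.show()
```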

Running the example creates two plots, the first showing the time series as a line plot and the second showing the observations as a histogram.

The dataset is non-stationary, meaning that the mean and the variance of the observations change over time. This makes it difficult to model by both classical statistical methods, like ARIMA, and more sophisticated machine learning methods, like neural networks.

This is caused by what appears to be both an increasing trend and a seasonality component.

In addition, the amount of change, or the variance, is increasing with time. This is clear when you look at the size of the seasonal component and notice that from one cycle to the next, the amplitude (from bottom to top of the cycle) is increasing.
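One way to put a number on this effect is to measure the amplitude of each yearly cycle. The sketch below is a minimal illustration on a synthetic monthly series (an assumed linear trend multiplied by an assumed 12-month seasonal pattern), not on the airline data itself. The names `cycles` and `amplitude` are introduced only for this example.

```python
import numpy as np

# Synthetic stand-in for a monthly series (NOT the airline data):
# a linear trend multiplied by a 12-month seasonal pattern.
months = np.arange(144)
trend = 100.0 + 2.5 * months
seasonal = 1.0 + 0.2 * np.sin(2 * np.pi * months / 12)
series = trend * seasonal

# One row per year, one column per month
cycles = series.reshape(12, 12)

# Amplitude of each yearly cycle, from bottom to top
amplitude = cycles.max(axis=1) - cycles.min(axis=1)

print('amplitude in first year: %.1f' % amplitude[0])
print('amplitude in last year: %.1f' % amplitude[-1])
```

If the amplitude keeps growing from one cycle to the next, as it does in this synthetic series, the variance is tied to the level of the series and a power transform is worth exploring.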

In this tutorial, we will investigate transforms that we can use on time series datasets that exhibit this property.

## Square Root Transform

A time series that has a quadratic growth trend can be made linear by taking the square root.

Let’s demonstrate this with a quick contrived example.

Consider a series of the numbers 1 to 99 squared. The line plot of this series will show a quadratic growth trend and a histogram of the values will show a right-skewed distribution with a long tail.

The snippet of code below creates and graphs this series.

```python
from matplotlib import pyplot
series = [i**2 for i in range(1,100)]
# line plot
pyplot.plot(series)
pyplot.show()
# histogram
pyplot.hist(series)
pyplot.show()
```

Running the example plots the series both as a line plot over time and a histogram of observations.

If you see a structure like this in your own time series, you may have a quadratic growth trend. This can be removed or made linear by taking the inverse operation of the squaring procedure, which is the square root.

Because the example is perfectly quadratic, we would expect the line plot of the transformed data to show a straight line. Because the source of the squared series is linear, we would expect the histogram to show a uniform distribution.

The example below performs a *sqrt()* transform on the time series and plots the result.

```python
from matplotlib import pyplot
from numpy import sqrt
series = [i**2 for i in range(1,100)]
# sqrt transform
transform = sqrt(series)
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(transform)
# histogram
pyplot.subplot(212)
pyplot.hist(transform)
pyplot.show()
```

We can see that, as expected, the quadratic trend was made linear.
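The same result can be checked numerically rather than by eye. A straight line has constant first differences, so after the transform the step between consecutive values should be the same everywhere. This is only a sketch, and the names `raw_steps` and `sqrt_steps` are made up for the example.

```python
import numpy as np

# Same contrived series as above: the numbers 1 to 99, squared
series = np.array([i**2 for i in range(1, 100)], dtype=float)
transform = np.sqrt(series)

# First differences: they grow for a quadratic, stay constant for a line
raw_steps = np.diff(series)
sqrt_steps = np.diff(transform)

print('raw steps, first and last:', raw_steps[0], raw_steps[-1])
print('sqrt steps, first and last:', sqrt_steps[0], sqrt_steps[-1])
```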

It is possible that the Airline Passengers dataset shows a quadratic growth. If this is the case, then we could expect a square root transform to reduce the growth trend to be linear and change the distribution of observations to be perhaps nearly Gaussian.

The example below performs a square root of the dataset and plots the results.

```python
from pandas import read_csv
from pandas import DataFrame
from numpy import sqrt
from matplotlib import pyplot
# Series.from_csv() was removed from pandas; read_csv() replaces it
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
# square root transform
dataframe['passengers'] = sqrt(dataframe['passengers'])
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()
```

We can see that the trend was reduced, but was not removed.

The line plot still shows an increasing variance from cycle to cycle. The histogram still shows a long tail to the right of the distribution, suggesting an exponential or long-tail distribution.

## Log Transform

A class of more extreme trends is exponential, often graphed as a hockey stick.

Time series with an exponential trend can be made linear by taking the logarithm of the values. This is called a log transform.

As with the square and square root case above, we can demonstrate this with a quick example.

The code below creates an exponential series by raising *e*, the base of the natural logarithm or Euler’s number (2.718…), to the powers 1 to 99.

```python
from matplotlib import pyplot
from math import exp
series = [exp(i) for i in range(1,100)]
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(series)
# histogram
pyplot.subplot(212)
pyplot.hist(series)
pyplot.show()
```

Running the example creates a line plot of the series and a histogram of the distribution of observations.

We see an extreme increase on the line graph and an equally extreme long tail distribution on the histogram.

Again, we can transform this series back to linear by taking the natural logarithm of the values.

This would make the series linear and the distribution uniform. The example below demonstrates this for completeness.

```python
from matplotlib import pyplot
from math import exp
from numpy import log
series = [exp(i) for i in range(1,100)]
# log transform
transform = log(series)
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(transform)
# histogram
pyplot.subplot(212)
pyplot.hist(transform)
pyplot.show()
```

Running the example creates plots, showing the expected linear result.

Our Airline Passengers dataset has a distribution of this form, but perhaps not this extreme.

The example below demonstrates a log transform of the Airline Passengers dataset.

```python
from pandas import read_csv
from pandas import DataFrame
from numpy import log
from matplotlib import pyplot
# Series.from_csv() was removed from pandas; read_csv() replaces it
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
# log transform
dataframe['passengers'] = log(dataframe['passengers'])
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()
```

Running the example results in a trend that looks a lot more linear than the one from the square root transform above. The line plot shows seemingly linear growth and a roughly constant variance from cycle to cycle.

The histogram also shows a more uniform or squashed Gaussian-like distribution of observations.

Log transforms are popular with time series data as they are effective at removing exponential variance.
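The reason is that a log turns multiplicative (percentage) growth into additive growth. As a small sketch, assume a hypothetical series that starts at 100 and grows by 5% per step. The absolute change between steps keeps growing, but the change in the log of the series is constant.

```python
import numpy as np

# Hypothetical series: starts at 100 and grows by 5% per time step
steps = np.arange(50)
series = 100.0 * 1.05 ** steps

# Absolute change per step keeps growing with the level of the series
raw_diff = np.diff(series)

# Change per step in log space is constant and equal to log(1.05)
log_diff = np.diff(np.log(series))

print('raw change, first and last: %.2f, %.2f' % (raw_diff[0], raw_diff[-1]))
print('log change, first and last: %.4f, %.4f' % (log_diff[0], log_diff[-1]))
```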

It is important to note that this operation assumes values are positive and non-zero. It is common to transform observations by adding a fixed constant to ensure all input values meet this requirement. For example:

```python
transform = log(constant + x)
```

Where *transform* is the transformed series, *constant* is a fixed value that lifts all observations above zero, and *x* is the time series.
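As a minimal sketch of the shifted log, assume a hypothetical series of counts that contains a zero and a constant of 1. The same constant must be subtracted again when mapping values back to the original scale. The array `x` and the name `restored` are illustrative only.

```python
import numpy as np

# Hypothetical counts that include a zero, so a plain log() would fail
x = np.array([0.0, 3.0, 10.0, 250.0])
constant = 1.0

# Shift, then take the log
transform = np.log(constant + x)

# Invert: exponentiate, then remove the same constant
restored = np.exp(transform) - constant

print(transform)
print(restored)
```

For the specific case of a constant of 1, NumPy's `log1p()` and `expm1()` functions compute the same transform and its inverse.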

## Box-Cox Transform

The square root transform and log transform belong to a class of transforms called power transforms.

The Box-Cox transform is a configurable data transform method that supports both square root and log transform, as well as a suite of related transforms.

More than that, it can be configured to evaluate a suite of transforms automatically and select a best fit. It can be thought of as a power tool to iron out power-based change in your time series. The resulting series may be more linear and the resulting distribution more Gaussian or Uniform, depending on the underlying process that generated it.

The *scipy.stats* library provides an implementation of the Box-Cox transform. The *boxcox()* function takes an argument, called *lmbda* (spelled this way because *lambda* is a reserved word in Python), that controls the type of transform to perform.

Below are some common values for lambda:

- *lambda* = -1.0 is a reciprocal transform.
- *lambda* = -0.5 is a reciprocal square root transform.
- *lambda* = 0.0 is a log transform.
- *lambda* = 0.5 is a square root transform.
- *lambda* = 1.0 is no transform.

For example, we can perform a log transform using the *boxcox()* function as follows:

```python
from pandas import read_csv
from pandas import DataFrame
from scipy.stats import boxcox
from matplotlib import pyplot
# Series.from_csv() was removed from pandas; read_csv() replaces it
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
# Box-Cox with lambda fixed at 0.0, which is a log transform
dataframe['passengers'] = boxcox(dataframe['passengers'], lmbda=0.0)
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()
```

Running the example reproduces the log transform from the previous section.
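As a sanity check on the lambda values, the sketch below compares *boxcox()* against the usual definition of the transform: (x**lambda - 1) / lambda for a non-zero lambda, and log(x) for a lambda of 0. The small array `x` is made up for the example. Note that under this definition a lambda of 0.5 is a shifted and scaled square root rather than a bare *sqrt()*, and a lambda of 1.0 shifts the data by one without changing its shape.

```python
import numpy as np
from scipy.stats import boxcox

# Small positive example series (illustrative values only)
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

# lambda = 0.0 is the natural log
log_result = boxcox(x, lmbda=0.0)

# lambda = 0.5 is (sqrt(x) - 1) / 0.5, a shifted and scaled square root
sqrt_result = boxcox(x, lmbda=0.5)

# lambda = 1.0 is x - 1, which only shifts the data
shift_result = boxcox(x, lmbda=1.0)

print(np.allclose(log_result, np.log(x)))
print(np.allclose(sqrt_result, (np.sqrt(x) - 1) / 0.5))
print(np.allclose(shift_result, x - 1))
```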

We can set the *lmbda* argument to None (the default) and let the function find a statistically tuned value.

The following example demonstrates this usage, returning both the transformed dataset and the chosen *lambda* value.

```python
from pandas import read_csv
from pandas import DataFrame
from scipy.stats import boxcox
from matplotlib import pyplot
# Series.from_csv() was removed from pandas; read_csv() replaces it
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
# with no lmbda given, boxcox() also returns the lambda it selected
dataframe['passengers'], lam = boxcox(dataframe['passengers'])
print('Lambda: %f' % lam)
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()
```

Running the example discovers the *lambda* value of 0.148023.

We can see that this is closer to a lambda value of 0.0 (a log transform) than to 0.5 (a square root transform), so the chosen transform is stronger than a square root but slightly weaker than a log.

```
Lambda: 0.148023
```

The line and histogram plots are also very similar to those from the log transform.
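Forecasts made on the transformed series are in transformed units, so they usually need to be mapped back to the original scale. The sketch below assumes SciPy's `scipy.special.inv_boxcox()` function for the inverse and uses a synthetic positive series rather than the airline data. The growth rate, noise range, and random seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox

# Synthetic positive series with multiplicative growth (NOT the airline data)
rng = np.random.default_rng(1)
x = 100.0 * np.exp(0.02 * np.arange(120)) * rng.uniform(0.9, 1.1, size=120)

# Let boxcox() choose lambda, and keep it for the inverse
transformed, lam = boxcox(x)
print('Lambda: %f' % lam)

# Map the transformed values back to the original scale
restored = inv_boxcox(transformed, lam)
print('Round trip matches original:', np.allclose(restored, x))
```

The important detail is that the lambda found on the training data has to be stored and reused, both to transform new observations and to invert predictions.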

## Summary

In this tutorial, you discovered how to identify when to use and how to use different power transforms on time series data with Python.

Specifically, you learned:

- How to identify a quadratic change and use the square root transform.
- How to identify an exponential change and how to use the log transform.
- How to use the Box-Cox transform to perform square root and log transforms and automatically optimize the transform for a dataset.

Do you have any questions about power transforms, or about this tutorial?

Ask your questions in the comments below and I will do my best to answer.

Thanks for this article. A lot of people write about log transformations and provide no explanation of when to not do a log transform, and Box-Cox fills in a lot of the gaps. Thanks!

A few follow up questions:

– Assuming the minimum value of a variable is > 0, are there situations in which you would not try and use Box-Cox? The scipy.stats.boxcox function will also decide when a transformation is unnecessary, so seems like there’s no harm in trying it for every variable. I’d imagine the main cost is loss of interpretability if you’re visualizing model results.

– In situations where you shouldn’t use a Box-Cox, what alternative transformations do you recommend?

I agree, try, evaluate and adopt if a transform lifts skill. Often model skill is the goal for a project.

A “Yeo-Johnson transformation” can be used as an alternative to box-cox:

https://en.wikipedia.org/wiki/Power_transform#Yeo-Johnson_transformation
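A minimal sketch of that alternative, assuming a SciPy version that includes `scipy.stats.yeojohnson()` (added in SciPy 1.2, as far as I know). Unlike Box-Cox, it accepts zero and negative values. The sample below is synthetic and the seed is arbitrary.

```python
import numpy as np
from scipy.stats import yeojohnson

# Hypothetical skewed sample that includes zero and negative values
rng = np.random.default_rng(7)
x = rng.exponential(scale=2.0, size=200) - 1.0

# With lmbda left as None, the function estimates lambda and returns it
transformed, lam = yeojohnson(x)
print('Lambda: %f' % lam)

# lmbda=1.0 is the identity for Yeo-Johnson, so the data come back unchanged
unchanged = yeojohnson(x, lmbda=1.0)
print(np.allclose(unchanged, x))
```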

Hi, thanks much for the tutorial.

When taking the square root of the airline, you say we expect the distribution to be Gaussian, while in the previous example with generated quadratic data, you expected a uniform distribution.

When will we expect one over the other ?

Thanks !

It really depends on the data.

Please, help!

after writing the first code line from the first example I have got the following error message:

>>> from pandas import Series

Traceback (most recent call last):

File “”, line 1, in

from pandas import Series

ImportError: No module named pandas

>>>

You need to install Pandas. This tutorial will help you:

https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/