A Gentle Introduction to SARIMA for Time Series Forecasting in Python

Autoregressive Integrated Moving Average, or ARIMA, is one of the most widely used forecasting methods for univariate time series data.

Although the method can handle data with a trend, it does not support time series with a seasonal component.

An extension to ARIMA that supports the direct modeling of the seasonal component of the series is called SARIMA.

In this tutorial, you will discover the Seasonal Autoregressive Integrated Moving Average, or SARIMA, method for time series forecasting with univariate data containing trends and seasonality.

After completing this tutorial, you will know:

  • The limitations of ARIMA when it comes to seasonal data.
  • The SARIMA extension of ARIMA that explicitly models the seasonal element in univariate data.
  • How to implement the SARIMA method in Python using the Statsmodels library.

Let’s get started.

Update: For help using SARIMA and grid searching its hyperparameters, see this post:

Photo by Mario Micklisch, some rights reserved.

Tutorial Overview

This tutorial is divided into four parts; they are:

  1. What’s Wrong with ARIMA
  2. What Is SARIMA?
  3. How to Configure SARIMA
  4. How to use SARIMA in Python

What’s Wrong with ARIMA

Autoregressive Integrated Moving Average, or ARIMA, is a forecasting method for univariate time series data.

As its name suggests, it supports both autoregressive and moving average elements. The integrated element refers to differencing, which allows the method to support time series data with a trend.

A problem with ARIMA is that it does not support seasonal data, that is, a time series with a repeating cycle.

ARIMA expects data that is either not seasonal or has the seasonal component removed, e.g. seasonally adjusted via methods such as seasonal differencing.

For more on ARIMA, see the post:

An alternative is to use SARIMA.

What is SARIMA?

Seasonal Autoregressive Integrated Moving Average, SARIMA or Seasonal ARIMA, is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component.

It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.

A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA […] The seasonal part of the model consists of terms that are very similar to the non-seasonal components of the model, but they involve backshifts of the seasonal period.

— Page 242, Forecasting: principles and practice, 2013.

How to Configure SARIMA

Configuring a SARIMA requires selecting hyperparameters for both the trend and seasonal elements of the series.

Trend Elements

There are three trend elements that require configuration.

They are the same as the ARIMA model; specifically:

  • p: Trend autoregression order.
  • d: Trend difference order.
  • q: Trend moving average order.

Seasonal Elements

There are four seasonal elements that are not part of ARIMA that must be configured; they are:

  • P: Seasonal autoregressive order.
  • D: Seasonal difference order.
  • Q: Seasonal moving average order.
  • m: The number of time steps for a single seasonal period.

Together, the notation for a SARIMA model is specified as:

SARIMA(p,d,q)(P,D,Q)m

where the specifically chosen hyperparameter values are substituted for each element when describing a particular model.
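For reference, one common textbook way to write the full SARIMA(p,d,q)(P,D,Q)m model uses the backshift operator B, where B y_t = y_{t-1} and B^m y_t = y_{t-m}, with ε_t a white noise error. Sign conventions for the moving average polynomials vary between sources, and no constant or trend term is included in this form:

```latex
\phi_p(B)\,\Phi_P(B^m)\,(1 - B)^d\,(1 - B^m)^D\,y_t = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t

\phi_p(B)     = 1 - \phi_1 B - \cdots - \phi_p B^p
\Phi_P(B^m)   = 1 - \Phi_1 B^m - \cdots - \Phi_P B^{Pm}
\theta_q(B)   = 1 + \theta_1 B + \cdots + \theta_q B^q
\Theta_Q(B^m) = 1 + \Theta_1 B^m + \cdots + \Theta_Q B^{Qm}
```

The lowercase polynomials carry the trend elements (p, d, q) and the uppercase ones, written in powers of B^m, carry the seasonal elements (P, D, Q). With P=1, for example, the seasonal AR factor applied to y_t gives y_t - \Phi_1 y_{t-m}, so the model draws on the observation one full season back.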

Importantly, the m parameter influences the P, D, and Q parameters. For example, an m of 12 for monthly data suggests a yearly seasonal cycle.

A P=1 would make use of the first seasonally offset observation in the model, e.g. t-(m*1) or t-12. A P=2 would use the last two seasonally offset observations, t-(m*1) and t-(m*2).
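As a rough illustration, the sketch below uses pandas to line up each observation with the seasonally offset observations that a seasonal AR term with P=2 and m=12 would draw on. The series is hypothetical (y_t = t, so the lagged values are easy to read), and SARIMAX builds these terms internally rather than from a DataFrame like this one.

```python
import numpy as np
import pandas as pd

m = 12  # seasonal period: monthly data with a yearly cycle

# Hypothetical series where y_t = t, so each lagged value is easy to check
y = pd.Series(np.arange(48, dtype=float))

# Seasonally offset observations used by a seasonal AR term of order P=2
lags = pd.DataFrame({
    "y_t": y,
    "y_t-m": y.shift(m * 1),   # t-(m*1), used when P >= 1
    "y_t-2m": y.shift(m * 2),  # t-(m*2), used when P >= 2
})

# Row 24 holds 24.0, 12.0 and 0.0; earlier rows have NaN where no lag exists
print(lags.iloc[24:27])
```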

Similarly, a D of 1 would calculate a first-order seasonal difference, and a Q=1 would use a first-order seasonal error term in the model (i.e. a seasonal moving average).
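A first-order seasonal difference (D=1) subtracts the observation one season back, y_t - y_(t-m). The sketch below applies it with pandas to a hypothetical monthly series built from a linear trend plus an exactly repeating yearly pattern, so the effect is easy to see: the pattern cancels, and only the trend's yearly increment is left.

```python
import numpy as np
import pandas as pd

m = 12  # seasonal period

# Hypothetical monthly series: a linear trend plus a fixed yearly pattern
pattern = np.array([5, 3, 0, -2, -4, -5, -4, -2, 0, 3, 5, 6], dtype=float)
t = np.arange(5 * m)
y = pd.Series(0.5 * t + np.tile(pattern, 5))

# D=1: first-order seasonal difference, y_t - y_(t-m)
seasonal_diff = y.diff(m)

# The yearly pattern cancels; only the trend's yearly increment (0.5 * 12 = 6.0) remains
print(seasonal_diff.dropna().unique())  # [6.]

# A further first-order (non-seasonal) difference, d=1, removes the remaining constant
print(seasonal_diff.diff().dropna().abs().max())  # 0.0
```

Real data will not cancel this cleanly, but the same two operations are what the D and d hyperparameters ask the model to apply.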

A seasonal ARIMA model uses differencing at a lag equal to the number of seasons (s) to remove additive seasonal effects. As with lag 1 differencing to remove a trend, the lag s differencing introduces a moving average term. The seasonal ARIMA model includes autoregressive and moving average terms at lag s.

— Page 142, Introductory Time Series with R, 2009.

The trend elements can be chosen through careful analysis of ACF and PACF plots looking at the correlations of recent time steps (e.g. 1, 2, 3).

Similarly, ACF and PACF plots can be analyzed to specify values for the seasonal model by looking at correlation at seasonal lag time steps.

For more on interpreting ACF/PACF plots, see the post:

Seasonal ARIMA models can potentially have a large number of parameters and combinations of terms. Therefore, it is appropriate to try out a wide range of models when fitting to data and choose a best fitting model using an appropriate criterion …

— Pages 143-144, Introductory Time Series with R, 2009.

Alternatively, a grid search can be used across the trend and seasonal hyperparameters.

For more on grid searching SARIMA parameters, see the post:

How to use SARIMA in Python

The SARIMA time series forecasting method is supported in Python via the Statsmodels library.

To use SARIMA, there are three steps:

  1. Define the model.
  2. Fit the defined model.
  3. Make a prediction with the fit model.

Let’s look at each step in turn.

1. Define Model

An instance of the SARIMAX class can be created by providing the training data and a host of model configuration parameters.

The implementation is called SARIMAX rather than SARIMA because the “X” in the name indicates that the implementation also supports exogenous variables.

These are parallel time series variates that are not modeled directly via AR, I, or MA processes, but are made available as a weighted input to the model.

Exogenous variables are optional and can be specified via the “exog” argument.

The trend and seasonal hyperparameters are specified as three- and four-element tuples to the “order” and “seasonal_order” arguments, respectively.

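These elements should be specified explicitly; if they are omitted, Statsmodels falls back to order=(1, 0, 0) and seasonal_order=(0, 0, 0, 0), a non-seasonal AR(1) model.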

These are the main configuration elements.

There are other fine-tuning parameters you may want to configure. Learn more in the full API:

2. Fit Model

Once the model is created, it can be fit on the training data.

The model is fit by calling the fit() function.

Fitting the model returns an instance of the SARIMAXResults class. This object contains the details of the fit, such as the data and coefficients, as well as functions for using the model.

Many elements of the fitting process can be configured, and it is worth reading the API to review these options once you are comfortable with the implementation.

3. Make Prediction

Once fit, the model can be used to make a forecast.

A forecast can be made by calling the forecast() or the predict() functions on the SARIMAXResults object returned from calling fit.

The forecast() function takes a single parameter that specifies the number of out-of-sample time steps to forecast, or assumes a one-step forecast if no arguments are provided.

The predict() function takes a start and end date or index; if these are omitted, it defaults to in-sample predictions across the whole training period.

Additionally, if exogenous variables were provided when defining the model, they must also be provided for the forecast period, via the exog argument of forecast() or predict().


Summary

In this tutorial, you discovered the Seasonal Autoregressive Integrated Moving Average, or SARIMA, method for time series forecasting with univariate data containing trends and seasonality.

Specifically, you learned:

  • The limitations of ARIMA when it comes to seasonal data.
  • The SARIMA extension of ARIMA that explicitly models the seasonal element in univariate data.
  • How to implement the SARIMA method in Python using the Statsmodels library.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


54 Responses to A Gentle Introduction to SARIMA for Time Series Forecasting in Python

  1. SARAVANAN S August 17, 2018 at 9:22 pm #

    How to configure multiple seasons in SARIMA?

    • Jason Brownlee August 18, 2018 at 5:35 am #

      Good question, you might need to develop a custom model instead.

      • SARAVANAN S August 30, 2018 at 10:51 pm #

        Are there any available custom models?
        We can set up multivariate time series analysis with ARIMA; is it possible with SARIMA too?

        • Jason Brownlee August 31, 2018 at 8:14 am #

          Yes, you can use VAR in statsmodels. I’m not sure if there is a VSARIMA; you might have to code one.

          • SARAVANAN S September 7, 2018 at 8:04 pm #

            Thank you:)

  2. Khalid Nawaz August 18, 2018 at 2:17 pm #

    What’s the difference between the SARIMA model and the X-12 ARIMA model?

    • Jason Brownlee August 19, 2018 at 6:15 am #

      What is X-12 ARIMA?

      • Costas July 15, 2019 at 6:27 pm #

        X-12 ARIMA was the software used by the U.S. Census Bureau for seasonal adjustment. It has been replaced by X-13ARIMA-SEATS. It is part of econometric packages such as EViews and gretl, and it can decompose a time series into trend, cycle, and seasonal components (including calendar effects) and noise.

  3. Anthony The Koala August 31, 2018 at 4:20 am #

    Dear Dr Jason,
    (1) How does one determine the SARIMA p, d, q, and m values?
    (2) I recall you had a method for determining p, d, q for ARIMA, briefly mentioned in this article. Which site did you mention in an earlier article about the methods to determine p, d, q for ARIMA?
    Thank you
    Anthony of Sydney

    • Jason Brownlee August 31, 2018 at 8:15 am #

      You can use ACF and PACF analysis like we do for ARIMA.

      • Anthony The Koala August 31, 2018 at 8:33 am #

        Dear Dr Jason,
        I understand the ACF and PACF for ARIMA, that is, how to determine the significant lags using ACF and PACF in an ARIMA analysis. I have seen your post on using PACF and ACF.

        Please clarify, as I am not sure of the further step for the seasonal (S) part of the SARIMA analysis. Do you then do ACF and PACF on the lagged and differenced data to work out the seasons?

        Thank you,
        Anthony of Sydney

        • Jason Brownlee August 31, 2018 at 12:10 pm #

          The seasonal aspects can also be learned from an ACF/PACF analysis.

  4. Naval September 12, 2018 at 8:37 pm #

    Dr. Jason,

    I’m looking for your suggestions on the analysis and forecasting of daily (business day) time series data (3 years of data), and I use SARIMAX to fit this data. Could you please share some basic ideas on this, as most of the reference materials use monthly data and do not offer much guidance about it?

    Thank you,
    Naval

    • Jason Brownlee September 13, 2018 at 8:01 am #

      You can use SARIMAX, you can also use ETS. What is the problem exactly?

  5. yameen shahzada December 15, 2018 at 1:52 pm #

    How can I move from ARIMA to SARIMA modeling?
    Could anyone please explain this in EViews or Minitab, etc., with data?

    • Jason Brownlee December 16, 2018 at 5:19 am #

      A SARIMA reduces to an ARIMA if you set the seasonal elements to 0.

      What are EViews and Minitab?

  6. sandeep January 31, 2019 at 3:44 pm #

    I think we can feed the output of one SARIMA to another SARIMA, with the p, d, q of the second SARIMA set to zero.

  7. Arjun Nelwade February 12, 2019 at 4:18 pm #

    How could we convert the previously changed dataset to a seasonal dataset for use with a SARIMA model?

    • Jason Brownlee February 13, 2019 at 7:52 am #

      No need, the model will calculate the seasonal adjustment that you specify by the model hyperparameters.

  8. xxsummer March 11, 2019 at 6:52 pm #

    Dr. Jason,
    I have six years of daily data and want to predict the next year. For this, I set m=365, but my model takes a very long time to run. Is it correct to set m=365 for daily data, and is there any solution for this problem?
    Thanks very much.

  9. aravind March 28, 2019 at 6:00 pm #

    Sir, I have one doubt. In time series forecasting we use the SARIMA model (or method). Is an algorithm used in the SARIMA model? Does SARIMA use any algorithm,
    e.g. like time series forecasting with the LSTM algorithm in a recurrent neural network?
    Thanks in advance…

    • Jason Brownlee March 29, 2019 at 8:26 am #

      Yes, SARIMA is a linear algorithm, like linear regression, that uses different inputs (data and residuals) and performs transforms (differencing).

  10. Jeff April 9, 2019 at 9:32 am #

    I am struggling to understand whether one needs to transform a non-stationary time series before using ARIMA or SARIMA. I’ve read several references that indicate using log transformations on a series that has an exponential trend and seasonality before modeling in SARIMA. I’ve also read where SARIMA and ARIMA account for the trend and seasonality and therefore transforming is not necessary.

    Can you provide me with your understanding / opinion?

    Thanks
    Jeff

    • Jason Brownlee April 9, 2019 at 2:39 pm #

      Maybe.

      The SARIMA can handle the differencing for trend and seasonality.

      If your data has a changing variance after the trend and seasonality are removed, you can fix it with a Box-Cox or similar power transform.

      You can manually remove trend/seasonality and then run a statistical test to see if it is stationary:
      https://machinelearningmastery.com/time-series-data-stationary-python/

      • Jeff April 10, 2019 at 11:00 pm #

        Thank you! This is what I was looking for. I’m planning to decompose the series and then test the residual. Would you agree with that approach?

  11. Sara Song April 23, 2019 at 6:55 am #

    Hi Dr. Brownlee,

    I have a question about seasonal elements. I used Grid Search SARIMA Model Hyperparameters for a time series prediction project. The seasonal elements of the best SARIMA model are (7,1,9,0). What does it mean when m (the number of time steps for a single seasonal period) = 0, but the seasonal P, D, Q are not 0? Do we need to capture this seasonality or not?

    Thanks,
    Sara

    • Jason Brownlee April 23, 2019 at 8:00 am #

      That is a good question!

      I have found this myself and the results do differ if you zero-out those values (from memory).

      I’m not sure offhand. It might require digging into the code or posting on a Q&A site.

  12. Jay May 14, 2019 at 3:46 pm #

    Hi Jason, this is a great tutorial thanks for making this. I’m using SARIMA and noticed that the Grid Search does not produce a result for trend and season element combinations of (0,1,0)x(0,0,0) or (0,1,0)x(0,1,0). Can you help explain this?

    • Jason Brownlee May 15, 2019 at 8:10 am #

      Perhaps the underlying math library failed to converge for that configuration on your dataset?

      You can see by showing warnings during the grid search.

      • Jay May 15, 2019 at 9:18 am #

        Thanks for your response. Would those have to be run manually to get the results?

  13. Emily May 29, 2019 at 4:40 am #

    Jason,

    Thanks for sharing this info. I need to set up a self-updating model predicting inventory for various products through multiple time series, and many of these products show seasonality. I have seen where in this case, for the sake of the automation and generalization, others have applied ARIMA and used differencing to remove the seasonality. Does this make the model less accurate? If I base my model off of SARIMA instead for the seasonality does it ruin the ability for automation and generalization for the products?

    • Jason Brownlee May 29, 2019 at 8:56 am #

      SARIMA can model the trend and seasonality, no need to remove beforehand.

  14. Ardeshir May 31, 2019 at 3:24 am #

    Thank you for this tutorial.

    I am trying to model an atmospheric parameter for which I have hourly time series data. I was literally trying out the example code on your website and a few others to test my data. Using model = SARIMAX(aod, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0)), which was the default code in one example, gave me a nearly perfect fit that no other model, such as ARIMA, could provide. The ARIMA model was giving exceptionally low values. The data I have is stationary.
    My question is: is it by some mistake, or something I am overlooking, that my data fit so well using the example code above, where all the non-seasonal SARIMAX parameters are 1 and the seasonal ones are 0? My data is hourly, so I also tried model = SARIMAX(aod, order=(1, 1, 1), seasonal_order=(0, 0, 0, 24)), which also gives a good result.

    • Jason Brownlee May 31, 2019 at 7:53 am #

      Well done!

      I would expect zero-order seasonal components to be the same as an ARIMA, but perhaps the statsmodels implementation is doing something different?

      • Ardeshir June 1, 2019 at 2:07 pm #

        Actually, I realised my mistake. The ACF and PACF plots might have shown a seasonal period of 24 and p and q values of 4. The previous model worked because I was predicting on the training data itself; I hadn’t split my dataset into training and test sets. When I performed the split and validated, the predicted series was a straight line. I was depressed about the situation, but later I changed the parameters of SARIMA by analysing the ACF plots, and now the model validates with an RMSE of .003.
        If you don’t mind, I have a question regarding this model. The variable I am trying to predict might also depend on one or two other variables; at least, that is what I want to show as well. Is there any way I can implement a multivariate SARIMA model?

  15. Arb June 17, 2019 at 5:41 pm #

    Thanks a lot for this wonderful article. Based on this and your other grid search article, I was able to build the foundation of my model. However, I have a doubt: my predictions are shifted by one step. Is that normal? To make use of the predictions, should I just shift them back one step?

  16. S.Saravanakumar June 27, 2019 at 3:04 pm #

    How can we use a double seasonal_order in SARIMAX? For example, I have to forecast hourly values based on the day of the week.

    • Jason Brownlee June 28, 2019 at 5:56 am #

      I’m not sure that two seasonal orders are supported.

      Perhaps use seasonal differencing for one of them prior to modeling?

  17. Tal July 19, 2019 at 12:30 am #

    Hi,
    Thank you for the article!
    I have a question: in many places I see that, before running the model, there’s a preprocessing stage where the author takes the log of the input to stabilize the variance and also takes the difference of the log in order to remove the trend.
    Your thoughts would be appreciated!

  18. Lina August 3, 2019 at 3:28 am #

    Hi, I am working with data recorded every minute, and my season is daily. So if I understand SARIMA correctly, my seasonal variable (m) should be 60*24 = 1440, right? However, the model doesn’t work with a number this high. What should I do?

    • Jason Brownlee August 3, 2019 at 8:14 am #

      Perhaps try modeling at a different resolution, e.g. resample to minutes, 15 min, 30 min, hourly, etc and compare?

      Perhaps try alternate models?

      Perhaps try modeling with a subset of features and engineered features?
