A Gentle Introduction to SARIMA for Time Series Forecasting in Python

By Jason Brownlee on August 21, 2019 in Time Series

Autoregressive Integrated Moving Average, or ARIMA, is one of the most widely used forecasting methods for univariate time series data.

Although the method can handle data with a trend, it does not support time series with a seasonal component.

An extension to ARIMA that supports the direct modeling of the seasonal component of the series is called SARIMA.

In this tutorial, you will discover the Seasonal Autoregressive Integrated Moving Average, or SARIMA, method for time series forecasting with univariate data containing trends and seasonality.

After completing this tutorial, you will know:

The limitations of ARIMA when it comes to seasonal data.
The SARIMA extension of ARIMA that explicitly models the seasonal element in univariate data.
How to implement the SARIMA method in Python using the Statsmodels library.

Let’s get started.

Update: For help using and grid searching SARIMA hyperparameters, see this post:

How to Grid Search SARIMA Model Hyperparameters for Time Series Forecasting

Tutorial Overview

This tutorial is divided into four parts; they are:

What’s Wrong with ARIMA
What Is SARIMA?
How to Configure SARIMA
How to use SARIMA in Python

What’s Wrong with ARIMA

Autoregressive Integrated Moving Average, or ARIMA, is a forecasting method for univariate time series data.

As its name suggests, it supports both autoregressive and moving average elements. The integrated element refers to differencing, which allows the method to support time series data with a trend.

A problem with ARIMA is that it does not support seasonal data, that is, a time series with a repeating cycle.

ARIMA expects data that is either not seasonal or has the seasonal component removed, e.g. seasonally adjusted via methods such as seasonal differencing.
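
As a rough illustration of seasonal differencing, the sketch below subtracts the observation from one season earlier. The synthetic monthly series and the period of 12 are assumptions made for this example, not data from the tutorial.

```python
import numpy as np
import pandas as pd

# synthetic monthly series (assumed for illustration): linear trend + yearly cycle
m = 12
t = np.arange(48)
series = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / m))

# seasonal difference: y[t] - y[t - m]
seasonal_diff = series.diff(m).dropna()

# the yearly cycle cancels and the linear trend becomes a constant (0.5 * m)
print(seasonal_diff.round(6).unique())
```

The first m values are lost to the differencing, which is why they are dropped here.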

For more on ARIMA, see the post:

How to Create an ARIMA Model for Time Series Forecasting with Python

An alternative is to use SARIMA.

What Is SARIMA?

Seasonal Autoregressive Integrated Moving Average, SARIMA or Seasonal ARIMA, is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component.

It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.

A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA […] The seasonal part of the model consists of terms that are very similar to the non-seasonal components of the model, but they involve backshifts of the seasonal period.

— Page 242, Forecasting: principles and practice, 2013.

How to Configure SARIMA

Configuring a SARIMA requires selecting hyperparameters for both the trend and seasonal elements of the series.

Trend Elements

There are three trend elements that require configuration.

They are the same as the ARIMA model; specifically:

p: Trend autoregression order.
d: Trend difference order.
q: Trend moving average order.

Seasonal Elements

There are four seasonal elements that are not part of ARIMA that must be configured; they are:

P: Seasonal autoregressive order.
D: Seasonal difference order.
Q: Seasonal moving average order.
m: The number of time steps for a single seasonal period.

Together, the notation for a SARIMA model is specified as:

SARIMA(p,d,q)(P,D,Q)m

Where the specifically chosen hyperparameters for a model are specified; for example:

SARIMA(3,1,0)(1,1,0)12

Importantly, the m parameter influences the P, D, and Q parameters. For example, an m of 12 for monthly data suggests a yearly seasonal cycle.

A P=1 would make use of the first seasonally offset observation in the model, e.g. t-(m*1) or t-12. A P=2 would use the last two seasonally offset observations, t-(m*1) and t-(m*2).
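
The arithmetic above can be sketched in a few lines. The helper name seasonal_lags is hypothetical, and the sketch only lists the purely seasonal lags; it ignores any interaction with the non-seasonal terms.

```python
def seasonal_lags(P, m):
    """Return the seasonal lags used by a seasonal AR term of order P with period m."""
    return [m * k for k in range(1, P + 1)]

print(seasonal_lags(1, 12))  # [12]
print(seasonal_lags(2, 12))  # [12, 24]
```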

Similarly, a D of 1 would calculate a first-order seasonal difference and a Q of 1 would use first-order seasonal errors in the model (e.g. moving average).

A seasonal ARIMA model uses differencing at a lag equal to the number of seasons (s) to remove additive seasonal effects. As with lag 1 differencing to remove a trend, the lag s differencing introduces a moving average term. The seasonal ARIMA model includes autoregressive and moving average terms at lag s.

— Page 142, Introductory Time Series with R, 2009.
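
For readers who prefer an equation, a SARIMA(p,d,q)(P,D,Q)m model is commonly written in backshift notation as below. The symbols are a common textbook convention and an assumption here, since the tutorial does not define them: B is the backshift operator (B y_t = y_{t-1}), lowercase polynomials are the non-seasonal parts, uppercase polynomials are the seasonal parts, and ε_t is white noise.

```latex
\phi_p(B)\,\Phi_P(B^m)\,(1 - B)^d\,(1 - B^m)^D\, y_t
  = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t

\phi_p(B)   = 1 - \phi_1 B - \dots - \phi_p B^p
\Phi_P(B^m) = 1 - \Phi_1 B^m - \dots - \Phi_P B^{Pm}
\theta_q(B)   = 1 + \theta_1 B + \dots + \theta_q B^q
\Theta_Q(B^m) = 1 + \Theta_1 B^m + \dots + \Theta_Q B^{Qm}
```

Setting P = D = Q = 0 removes every seasonal factor and leaves a plain ARIMA(p,d,q) model.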

The trend elements can be chosen through careful analysis of ACF and PACF plots looking at the correlations of recent time steps (e.g. 1, 2, 3).

Similarly, ACF and PACF plots can be analyzed to specify values for the seasonal model by looking at correlation at seasonal lag time steps.

For more on interpreting ACF/PACF plots, see the post:

A Gentle Introduction to Autocorrelation and Partial Autocorrelation
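
To make the idea of "correlation at seasonal lags" concrete, here is a minimal sketch that computes the sample autocorrelation with NumPy and prints it at a few lags. The series is synthetic and the helper sample_acf is hypothetical; in practice the plot_acf and plot_pacf functions in statsmodels.graphics.tsaplots draw these plots for you.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

# synthetic monthly series (assumed): yearly cycle plus noise
rng = np.random.default_rng(1)
t = np.arange(120)
series = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=1.0, size=t.size)

acf_values = sample_acf(series, nlags=24)
for lag in (1, 6, 12, 24):
    print(lag, round(acf_values[lag], 3))
```

Strong positive correlation at lags 12 and 24 (and negative correlation half a cycle away, at lag 6) is the kind of pattern that points to a seasonal period of m=12.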

Seasonal ARIMA models can potentially have a large number of parameters and combinations of terms. Therefore, it is appropriate to try out a wide range of models when fitting to data and choose a best fitting model using an appropriate criterion …

— Pages 143-144, Introductory Time Series with R, 2009.

Alternately, a grid search can be used across the trend and seasonal hyperparameters.

For more on grid searching SARIMA parameters, see the post:

How to Grid Search SARIMA Model Hyperparameters for Time Series Forecasting
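
As a small sketch of what such a grid looks like, the code below enumerates candidate configurations. The helper name sarima_configs and the value ranges are assumptions for illustration; this is not the grid search code from the linked post.

```python
from itertools import product

def sarima_configs(p_values, d_values, q_values, P_values, D_values, Q_values, m):
    """Enumerate (order, seasonal_order) pairs for a grid search (hypothetical helper)."""
    configs = []
    for p, d, q, P, D, Q in product(p_values, d_values, q_values,
                                    P_values, D_values, Q_values):
        configs.append(((p, d, q), (P, D, Q, m)))
    return configs

# try orders 0 and 1 for every element, with a fixed yearly period for monthly data
configs = sarima_configs([0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], m=12)
print(len(configs))  # 2**6 = 64 candidate models
print(configs[0])
```

Each pair would then be passed to a model as order and seasonal_order, fit, and scored, for example by error on a hold-out set. The grid grows quickly, so it is worth keeping the ranges small.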

How to use SARIMA in Python

The SARIMA time series forecasting method is supported in Python via the Statsmodels library.

Using SARIMA involves three steps:

Define the model.
Fit the defined model.
Make a prediction with the fit model.

Let’s look at each step in turn.

1. Define Model

An instance of the SARIMAX class can be created by providing the training data and a host of model configuration parameters.

# specify training data
data = ...
# define model
model = SARIMAX(data, ...)

The implementation is called SARIMAX instead of SARIMA because the “X” addition to the method name means that the implementation also supports exogenous variables.

These are parallel time series variates that are not modeled directly via AR, I, or MA processes, but are made available as a weighted input to the model.

Exogenous variables are optional and can be specified via the “exog” argument.

# specify training data
data = ...
# specify additional data
other_data = ...
# define model
model = SARIMAX(data, exog=other_data, ...)

The trend and seasonal hyperparameters are specified as 3 and 4 element tuples respectively to the “order” and “seasonal_order” arguments.

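Both arguments have defaults in Statsmodels (order=(1, 0, 0) and seasonal_order=(0, 0, 0, 0), i.e. no seasonal component), so set them explicitly to get the model you intend.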

# specify training data
data = ...
# define model configuration
my_order = (1, 1, 1)
my_seasonal_order = (1, 1, 1, 12)
# define model
model = SARIMAX(data, order=my_order, seasonal_order=my_seasonal_order, ...)

These are the main configuration elements.

There are other fine tuning parameters you may want to configure. Learn more in the full API:

statsmodels.tsa.statespace.sarimax.SARIMAX API

2. Fit Model

Once the model is created, it can be fit on the training data.

The model is fit by calling the fit() function.

Fitting the model returns an instance of the SARIMAXResults class. This object contains the details of the fit, such as the data and coefficients, as well as functions that can be used to make use of the model.

# specify training data
data = ...
# define model
model = SARIMAX(data, order=..., seasonal_order=...)
# fit model
model_fit = model.fit()

Many elements of the fitting process can be configured, and it is worth reading the API to review these options once you are comfortable with the implementation.

statsmodels.tsa.statespace.sarimax.SARIMAX.fit API

3. Make Prediction

Once fit, the model can be used to make a forecast.

A forecast can be made by calling the forecast() or the predict() functions on the SARIMAXResults object returned from calling fit.

The forecast() function takes a steps argument that specifies the number of out-of-sample time steps to forecast, and makes a one-step forecast if no arguments are provided.

# specify training data
data = ...
# define model
model = SARIMAX(data, order=..., seasonal_order=...)
# fit model
model_fit = model.fit()
# one step forecast
yhat = model_fit.forecast()

The predict() function takes a start and end date or index that define the prediction window.

Additionally, if exogenous variables were provided when defining the model, they too must be provided for the forecast period via the exog argument of the predict() function.

# specify training data
data = ...
# define model
model = SARIMAX(data, order=..., seasonal_order=...)
# fit model
model_fit = model.fit()
# one step forecast
yhat = model_fit.predict(start=len(data), end=len(data))

Summary

In this tutorial, you discovered the Seasonal Autoregressive Integrated Moving Average, or SARIMA, method for time series forecasting with univariate data containing trends and seasonality.

Specifically, you learned:

The limitations of ARIMA when it comes to seasonal data.
The SARIMA extension of ARIMA that explicitly models the seasonal element in univariate data.
How to implement the SARIMA method in Python using the Statsmodels library.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

137 Responses to A Gentle Introduction to SARIMA for Time Series Forecasting in Python

SARAVANAN S August 17, 2018 at 9:22 pm #

How to configure multiple seasons in SARIMA?

- Jason Brownlee August 18, 2018 at 5:35 am #
  
  Good question, you might need to develop a custom model instead.
  
  - SARAVANAN S August 30, 2018 at 10:51 pm #
    
    Are there any custom models available?
    We can set up multivariate time series analysis with ARIMA; is that possible with SARIMA too?
    
    - Jason Brownlee August 31, 2018 at 8:14 am #
      
      Yes, you can use VAR in statsmodels. Not sure if there is a VSARIMA; you might have to code one.
      
      - SARAVANAN S September 7, 2018 at 8:04 pm #
        
        Thank you:)
Khalid Nawaz August 18, 2018 at 2:17 pm #

What's the difference between the SARIMA model and the X-12 ARIMA model?

- Jason Brownlee August 19, 2018 at 6:15 am #
  
  What is X-12 ARIMA?
  
  - Costas July 15, 2019 at 6:27 pm #
    
    X-12 ARIMA was the software used by the U.S. Census Bureau for seasonal adjustment. It has been replaced by X-13 ARIMA SEATS. It is a part of econometric packages, such as Eviews or GRETL and can decompose a time series into a trend, cycle, seasonal components, including calendar effects, and noise.
    
    - Jason Brownlee July 16, 2019 at 8:13 am #
      
      Fascinating. Thanks for sharing.
      
Anthony The Koala August 31, 2018 at 4:20 am #

Dear Dr Jason,
(1) How does one determine the SARIMA p, d, q, m values?
(2) I recall you had a method for choosing p, d, q for ARIMA, briefly mentioned in this article. Which earlier article covered the methods to determine p, d, q for ARIMA?
Thank you
Anthony of Sydney

- Jason Brownlee August 31, 2018 at 8:15 am #
  
  You can use ACF and PACF analysis like we do for ARIMA.
  
  - Anthony The Koala August 31, 2018 at 8:33 am #
    
    Dear Dr Jason,
    I understand the ACF and PACF for ARIMA, where you determine the significant lags using ACF and PACF. I have seen your post on using PACF and ACF.
    
    Please clarify, as I am not sure of the further step for the seasonal (S) part of SARIMA. Do you then run ACF and PACF on the lagged and differenced data to work out the seasons?
    
    Thank you,
    Anthony of Sydney
    
    - Jason Brownlee August 31, 2018 at 12:10 pm #
      
      The seasonal aspects can also be learned from an ACF/PACF analysis.
      
      - Azad September 1, 2021 at 11:01 am #
        
        Is there an example with real values, sir?
      - Adrian Tam September 1, 2021 at 11:28 am #
        
        Do you mean those in this post: https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/
        There is a complete example that you can run.
Naval September 12, 2018 at 8:37 pm #

Dr. Jason,

I’m looking for your suggestions on TS analysis and forecasting of daily (business day) data (3 years of data), and I use SARIMAX to fit this data. Could you please share some basic ideas on this, as most of the reference materials use monthly data and do not offer much guidance for daily data?

Thank you,
Naval

- Jason Brownlee September 13, 2018 at 8:01 am #
  
  You can use SARIMAX, you can also use ETS. What is the problem exactly?
  
yameen shahzada December 15, 2018 at 1:52 pm #

How can I move from ARIMA to SARIMA modeling?
Can anyone please explain this in EViews or Minitab etc. with data?

- Jason Brownlee December 16, 2018 at 5:19 am #
  
  A SARIMA model reduces to an ARIMA model if you set the seasonal orders to 0.
  
  What are EViews and Minitab?
  
sandeep January 31, 2019 at 3:44 pm #

I think we can feed output of one SARIMA to another SARIMA , with p,d,q of second SARIMA set to zero.

- Jason Brownlee February 1, 2019 at 5:32 am #
  
  Not sure why you would want to do that?
  
Arjun Nelwade February 12, 2019 at 4:18 pm #

How could we convert a previously transformed dataset into a seasonal dataset for use with a SARIMA model?

- Jason Brownlee February 13, 2019 at 7:52 am #
  
  No need, the model will calculate the seasonal adjustment that you specify by the model hyperparameters.
  
xxsummer March 11, 2019 at 6:52 pm #

Dr. Jason,
I have six years of daily data and want to predict the next year. For this I set m=365, and my problem is the very long run time of my model. Is it correct to set m=365 for daily data, and is there any solution to this problem?
Thanks very much.

- Jason Brownlee March 12, 2019 at 6:47 am #
  
  Perhaps use less historic data as input?
  
aravind March 28, 2019 at 6:00 pm #

Sir, I have one doubt. In time series we use the SARIMA model (or method). Does the SARIMA model use any algorithm?
For example, time series with the LSTM algorithm in a recurrent neural network?
Thanks in advance…

- Jason Brownlee March 29, 2019 at 8:26 am #
  
  Yes, SARIMA is a linear algorithm, like linear regression, that uses different inputs (data and residuals) and performs transforms (differencing).
  
  - aravind March 30, 2019 at 6:09 pm #
    
    Thank you sir.
    
Jeff April 9, 2019 at 9:32 am #

I am struggling to understand whether one needs to transform a non-stationary time series before using ARIMA or SARIMA. I’ve read several references that indicate using log transformations on a series that has an exponential trend and seasonality before modeling in SARIMA. I’ve also read where SARIMA and ARIMA account for the trend and seasonality and therefore transforming is not necessary.

Can you provide me with your understanding / opinion?

Thanks
Jeff

- Jason Brownlee April 9, 2019 at 2:39 pm #
  
  Maybe.
  
  The SARIMA can handle the differencing for trend and seasonality.
  
  If your data has a changing variance after trend and seasonality is removed, you can fix it with a box cox or similar power transform.
  
  You can manually remove trend/seasonality and then run a statistical test to see if it is stationary:
  https://machinelearningmastery.com/time-series-data-stationary-python/
  
  - Jeff April 10, 2019 at 11:00 pm #
    
    Thank you! This is what I was looking for. I’m planning to decompose the series and then test the residual. Would you agree with that approach?
    
    - Jason Brownlee April 11, 2019 at 6:40 am #
      
      Perhaps try it and see?
      
Sara Song April 23, 2019 at 6:55 am #

Hi Dr. Brownlee,

I have a question about Seasonal Elements. I used Grid Search SARIMA Model Hyperparameters for a time series predictions project. The Seasonal Elements of best SARIMA model is (7,1,9,0). what does it mean that when m(The number of time steps for a single seasonal period) = 0, but seasonal P ,D,Q are not 0? do we need to capture this seasonality or not?

Thanks,
Sara

- Jason Brownlee April 23, 2019 at 8:00 am #
  
  That is a good question!
  
  I have found this myself and the results do differ if you zero-out those values (from memory).
  
  I’m not sure off hand. It might require digging into the code or posting on a Q&A site.
  
Jay May 14, 2019 at 3:46 pm #

Hi Jason, this is a great tutorial thanks for making this. I’m using SARIMA and noticed that the Grid Search does not produce a result for trend and season element combinations of (0,1,0)x(0,0,0) or (0,1,0)x(0,1,0). Can you help explain this?

- Jason Brownlee May 15, 2019 at 8:10 am #
  
  Perhaps the underlying math library failed to converge for that configuration on your dataset?
  
  You can see by showing warnings during the grid search.
  
  - Jay May 15, 2019 at 9:18 am #
    
    Thanks for your response. Would those have to be run manually to get the results?
    
    - Jason Brownlee May 15, 2019 at 2:43 pm #
      
      That is one way, yes.
      
Emily May 29, 2019 at 4:40 am #

Jason,

Thanks for sharing this info. I need to set up a self-updating model predicting inventory for various products through multiple time series, and many of these products show seasonality. I have seen where in this case, for the sake of the automation and generalization, others have applied ARIMA and used differencing to remove the seasonality. Does this make the model less accurate? If I base my model off of SARIMA instead for the seasonality does it ruin the ability for automation and generalization for the products?

- Jason Brownlee May 29, 2019 at 8:56 am #
  
  SARIMA can model the trend and seasonality, no need to remove beforehand.
  
Ardeshir May 31, 2019 at 3:24 am #

Thank you for this tutorial.

I am trying to model an atmospheric parameter that I have a data of. An hourly time series data. I was literally trying out the example codes on your websites and a few others to test the data I have. Using, model = SARIMAX(aod, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0)) which was available in a default code in some example, provided me with a nearly perfect fit that no other model like ARIMA could provide. The ARIMA model was giving exceptionally low values. The data I have is stationary.
My question was, is it by some mistake or some thing I am overlooking that my data had a nice fit using the example code above where all the SARIMAX parameters are 1 and seasonal ones are 0? My data is daily hourly data, so I also tried model = SARIMAX(aod, order=(1, 1, 1), seasonal_order=(0, 0, 0, 24)) which also gives a good result.

- Jason Brownlee May 31, 2019 at 7:53 am #
  
  Well done!
  
  I would expect zero-order seasonal components to be the same as an ARIMA, but perhaps the statsmodels implementation is doing something different?
  
  - Ardeshir June 1, 2019 at 2:07 pm #
    
    Actually I realised my mistake. The ACF and PACF plots might have showed a seasonal trend of 24 and p and q values of 4. The previous model worked because I was predicting on the trained data itself. I actually didn’t split my dataset into two sets for training and testing. That is why when I performed the split and validated it, the predicted series was a straight line. I was depressed at the situation. But later I changed the parameters of SARIMA by analysing the ACF plots and now the data seems to be validated with an RMSE of .003
    If you don’t mind, I had a question regarding this model. The variable I am trying to predict might also depend upon one or two other variables; at least that is what I want to show as well. Is there any way I can implement a multivariate SARIMA model?
    
    - Jason Brownlee June 2, 2019 at 6:38 am #
      
      Nice work!
      
      Yes, you can include additional variables via exog variables, e.g. SARIMAX. I think there is an example here:
      https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/
      
      - Ardeshir June 3, 2019 at 7:50 pm #
        
        Thanks a lot. Your SARIMA article saved my soul. Even though you wrote about one time step prediction, I extended the predict function to a number of time steps if that’s okay?
        I am trying to include each variable independently as exog and then combining them and again including them. Since I am doing this for a project, what do you recommend for a closure? I mean what parameters or test do I need to show so that I can write up and present a reasonable project thesis? Sorry if this is too much to ask for.
      - Jason Brownlee June 4, 2019 at 7:49 am #
        
        Great question, perhaps this will help:
        https://machinelearningmastery.com/how-to-grid-search-sarima-model-hyperparameters-for-time-series-forecasting-in-python/
      - Shipta December 4, 2019 at 4:09 pm #
        
        In the Sarimax model when using exog variables, can the exog variable be something like an employee ID?
        The exog data used in the model preparation step i.e., while calling the sarimax() or auto_arima() function is the same as when calling predict() on the fitted model?
      - Jason Brownlee December 5, 2019 at 6:38 am #
        
        I think an id would not be predictive and probably should not be in the model.
      - Shipta December 5, 2019 at 8:14 pm #
        
        Let’s say I want to predict the sales for a shop at product level granularity. So can I take product Id as the exogenous variable and sales per month as the endogenous variable. I have the dataset at monthly level sales of product in that shop. Now I want to predict the future sales. So in this case can product Id be taken as exogenous variable in the model.
        I am confused about what is the criteria on which we decide whether a column can be considered as exogenous or not.
        
        Also can you share some tricks to make the auto_arima() model and sarimax model run faster if taking seasonality trends into account. The model I am running is taking a lot of time, and mostly the process gets killed because the resources gets exhausted.
        
        To take seasonality into account I have to assign a value for m, and I have multiple rows of data corresponding to a single date. The data repeats itself after a year, so should the value of m be 12 or the number of observations (i.e., number of product ids) * 12?
        
        TIA
      - Jason Brownlee December 6, 2019 at 5:14 am #
        
        The id won’t be predictive and should not be used as input to the model.
        
        Yes, I wrote a custom auto tuning method here:
        https://machinelearningmastery.com/how-to-grid-search-sarima-model-hyperparameters-for-time-series-forecasting-in-python/
      - Vivek Yadav February 22, 2020 at 9:34 pm #
        
        Can I apply SARIMAX to weekly and monthly data, and can it also be applied to a festival season?
      - Jason Brownlee February 23, 2020 at 7:27 am #
        
        Perhaps try it and see?
Arb June 17, 2019 at 5:41 pm #

Thanks a lot for this wonderful article. Based on this and your other grid search article, I was able to build the foundation of my model. However, I have a doubt: my predictions are shifted by one step. Is that normal? To make use of the predictions, should I just shift them back one step?

- Jason Brownlee June 18, 2019 at 6:35 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-forecasted-time-series-right-behind-the-actual-time-series
  
  - Arb June 18, 2019 at 6:09 pm #
    
    Ah ok, thanks. That means my model is far from usable as out of sample forecast is converging near mean in as little as 10 steps. Oh well!
    
    - Jason Brownlee June 19, 2019 at 7:50 am #
      
      Perhaps try a suite of configurations and models?
      
S.Saravanakumar June 27, 2019 at 3:04 pm #

How can we use a double seasonal_order in SARIMAX? E.g. I have to forecast hourly values based on day of week.

- Jason Brownlee June 28, 2019 at 5:56 am #
  
  Not sure that two seasonal orders are supported.
  
  Perhaps use seasonal differencing for one of them prior to modeling?
  
Tal July 19, 2019 at 12:30 am #

Hi,
Thank you for the article!
I have a question: in many places I see a pre-processing stage before running the model, where the author takes the log of the input to stabilize the variance and then differences the log to remove the trend.
Your thoughts would be appreciated!

- Jason Brownlee July 19, 2019 at 9:20 am #
  
  Yes, see this:
  https://machinelearningmastery.com/power-transform-time-series-forecast-data-python/
  
Lina August 3, 2019 at 3:28 am #

Hi, I am working with data sampled every 1 minute. My season is daily. So if I understand SARIMA right, my season variable (m) should be 60*24, right? However, the model doesn’t work with a number this high. What should I do?

- Jason Brownlee August 3, 2019 at 8:14 am #
  
  Perhaps try modeling at a different resolution, e.g. resample to minutes, 15 min, 30 min, hourly, etc and compare?
  
  Perhaps try alternate models?
  
  Perhaps try modeling with a subset of features and engineered features?
  
kiya November 7, 2019 at 1:03 pm #

Hi thanks for sharing information!
I am working with daily data. What would be the best value for the m parameter? Is there a rule of thumb for setting it, like 7 for a weekly cycle?

- Jason Brownlee November 7, 2019 at 2:07 pm #
  
  Yes, weekly, monthly, yearly, etc.
  
Viswanathan C January 26, 2020 at 11:20 pm #

Hi Jason,
Very helpful article. Thanks for sharing.
I need a clarification..

I want to use SARIMA(1,1,0)(0,1,1)12 on time series data containing monthly values for 10 years.
Can this model be interpreted as follows:

‘The prediction of this month’s value(Jan) will be based on
the previous value of the differences of the Dec and Nov (since p=1,d=1) and
the moving average of the residuals of the difference of the previous two Jans (since D=1, Q=1, m=12).’

I understand SARIMA as a sum of ARIMA and the ARIMA of seasonal component.

- Jason Brownlee January 27, 2020 at 7:05 am #
  
  Nice work.
  
  Yes, that appears about right.
  
Alberto February 12, 2020 at 4:01 am #

Hi Jason. Thank you very much for your articles; they are very helpful and help me to grow.

I have a question about this paragraph of your article:

“As its name suggests, it supports both an autoregressive and moving average elements. The integrated element refers to differencing allowing the method to support time series data with a trend.

A problem with ARIMA is that it does not support seasonal data. That is a time series with a repeating cycle.”

I think that if you set good parameters for your autoregressive part and your moving average part, these parameters should fit the function to the data, right? So, can you explain mathematically why ARIMA is not good with seasonal data?

Thank you in advance for your answer

- Jason Brownlee February 12, 2020 at 5:54 am #
  
  It does not have parameters for the seasonal element. SARIMA does.
  
antonio March 3, 2020 at 10:30 pm #

I have minute data:

2020-01-09 12:00:00 90.82098
2020-01-09 12:15:00 90.61686
2020-01-09 12:30:00 86.22828

I want to apply SARIMAX … what should the seasonal_order param be?
I tried 96 (because I have 4 observations per hour, so 4×24 is 96). Is that right?

- Jason Brownlee March 4, 2020 at 5:56 am #
  
  Perhaps try a suite of configurations and see which results in the lowest prediction error.
  
Linna April 26, 2020 at 9:00 pm #

What are the differences among the ARIMA, ARIMAX, SARIMA and SARIMAX models?

- Jason Brownlee April 27, 2020 at 5:34 am #
  
  Good question, see this:
  https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/
  
Gopi June 3, 2020 at 8:34 pm #

Hi Jason,

Thank you for your excellent articles.

I have a weather-related time series dataset with more than 4000 hourly records. A few sample values are out of range, e.g. if the range is 1 – 10 the value is 15, and a few samples differ greatly from the previous value. This might be a sensor problem or something similar. I want to detect such samples based on the previous and next values. Could you please suggest a solution, and how can I impute an appropriate value once a sample is detected?

- Jason Brownlee June 4, 2020 at 6:19 am #
  
  You could design a statistical test and use it to detect outliers in real time, e.g. 3-4 standard deviations from the mean. Experimentation is required.
  
aflal June 5, 2020 at 6:19 am #

Thank you for the wonderful article.
How can I implement calendar effects in SARIMA?

I would greatly appreciate it if you kindly give me some links

- Jason Brownlee June 5, 2020 at 8:25 am #
  
  You’re welcome.
  
  Good question, I hope to write about this topic in the future.
  
Ghulam Mohyuddin July 3, 2020 at 2:57 pm #

Can we consider cross-correlation using exogenous variables?
For example, in forecasting the day-ahead and real-time prices, the correlation between these prices can be considered in this model?

- Jason Brownlee July 4, 2020 at 5:48 am #
  
  Good question, I don’t believe so. SARIMA models are really designed as univariate models.
  
Akash Maurya July 30, 2020 at 5:32 pm #

I tried daily sequential data and used m = 365 but I’m not getting all the related parameter to plot the result using

results.plot_diagnostics(figsize=(16, 8))
plt.show()

is that because the data points are not perfectly divisible by the 365?

- Jason Brownlee July 31, 2020 at 6:15 am #
  
  I don’t know sorry. Perhaps experiment.
  
- Malavika September 12, 2020 at 2:39 am #
  
  Have you figured out why a value of m=365 for daily sequential data did not work?
  
Salman Ahmad August 20, 2020 at 6:23 am #

Does it make sense to apply the SARIMA model to a fast-moving dataset (2880 points daily, one observation every 30 seconds) that has daily as well as weekly seasonality? If so, do we have to use a period of 2880 for seasonality, or do we need to resample it to minutes or hours?

- Jason Brownlee August 20, 2020 at 6:54 am #
  
  Perhaps develop a prototype for your dataset and see?
  
  Reply
  - Salman Ahmad August 20, 2020 at 8:14 am #
    
    Sorry I could not understand. What do you mean by prototype
    
    Reply
    - Jason Brownlee August 20, 2020 at 1:29 pm #
      
      Sorry, I meant develop small examples on your data that explore the ideas.
      
      Reply
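
One small prototype along these lines is to resample the 30-second data to a coarser frequency and see how the seasonal period shrinks. This is a sketch on synthetic data, assuming pandas; the series, the dates and the choice of hourly means are hypothetical.

```python
import numpy as np
import pandas as pd

# Two days of hypothetical 30-second observations (2880 per day).
index = pd.date_range("2020-01-01", periods=2 * 2880, freq="30s")
raw = pd.Series(np.sin(2 * np.pi * np.arange(len(index)) / 2880), index=index)

# Aggregate to hourly means; other aggregations (sum, last) may suit other data.
hourly = raw.resample("1h").mean()

# Seasonal period m counts observations per cycle at the new frequency.
m_daily = 24          # one day of hourly observations
m_weekly = 24 * 7     # one week of hourly observations
print(len(raw), len(hourly), m_daily, m_weekly)
```

A SARIMA model has a single seasonal period, so with both daily and weekly cycles only one of them can be passed as m.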
Athah October 18, 2020 at 12:53 am #

Statement – ARIMA Models are used when the data shows no trend.

This statement is true or False.

Please share the reasoning if possible.

Reply
- Jason Brownlee October 18, 2020 at 6:10 am #
  
  False as ARIMA can handle the trend via differencing.
  
  Reply
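
A quick way to see why differencing handles a trend is to difference a series by hand. The sketch below uses NumPy on two made-up series; it illustrates the differencing step only and is not a fitted ARIMA model.

```python
import numpy as np

t = np.arange(12)
linear = 3.0 * t + 5.0           # series with a linear trend
quadratic = 0.5 * t ** 2 + 2.0   # series with a curved trend

d1 = np.diff(linear, n=1)        # first difference, as applied when d=1
d2 = np.diff(quadratic, n=2)     # second difference, as applied when d=2

print(d1)
print(d2)
```

Both differenced series are constant, which means the trend has been removed and the remaining series is stationary in the mean.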
Pals November 26, 2020 at 9:09 pm #

Hi Sir,
Do we need to make the data stationary before applying SARIMA? My data does not have an upward or downward trend, but I see a seasonal component in the seasonal decomposition graph. I have 5 months of data. How do I decide on the seasonal value m?

Reply
- Jason Brownlee November 27, 2020 at 6:39 am #
  
  Yes, but SARIMA can do it for you by differencing the trend and seasonality as part of the model.
  
  Reply
Jonny February 10, 2021 at 4:16 am #

Where can I find out more information about the validity of the model that my grid search produced? I have these results, but I’m new to SARIMA. How do I know if this model is any good and what all of these hieroglyphics mean?

RMSE: 171310.522
SARIMAX Results
=========================================================================================
Dep. Variable:                  Actual Revenue   No. Observations:                     13
Model:           SARIMAX(2, 0, 0)x(1, 0, 0, 3)   Log Likelihood                   -95.470
Date:                         Tue, 09 Feb 2021   AIC                              200.940
Time:                                 11:10:54   BIC                              201.338
Sample:                             12-31-2017   HQIC                             198.261
                                  - 12-31-2020
Covariance Type:                           opg
=========================================================================================
                 coef    std err          z      P>|z|     [0.025     0.975]
-----------------------------------------------------------------------------------------
intercept   2.718e+05   2.73e+05      0.994      0.320  -2.64e+05   8.08e+05
ar.L1         -0.2000      0.528     -0.379      0.705     -1.235      0.835
ar.L2         -0.0870      0.427     -0.204      0.838     -0.923      0.749
ar.S.L3       -0.0680      0.695     -0.098      0.922     -1.429      1.293
sigma2      1.622e+09     11.785   1.38e+08      0.000   1.62e+09   1.62e+09
=========================================================================================
Ljung-Box (Q):                            7.39   Jarque-Bera (JB):                   0.88
Prob(Q):                                  0.39   Prob(JB):                           0.64
Heteroskedasticity (H):                   1.21   Skew:                               0.47
Prob(H) (two-sided):                      0.88   Kurtosis:                           1.68
=========================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 2.63e+24. Standard errors may be unstable.

Reply
- Jason Brownlee February 10, 2021 at 8:11 am #
  
  You can evaluate it on a dataset using walk-forward validation and compare the results to a naive model or other configurations to see if it has predictive skill.
  
  Reply
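
The evaluation described in the reply above can be sketched in plain Python. This is a minimal sketch, not the tutorial's code: `walk_forward_rmse`, `naive_forecast` and `seasonal_naive_forecast` are hypothetical helper names, the data is made up, and the SARIMA model itself is left out so that only the validation loop and the naive baselines are shown.

```python
from math import sqrt


def walk_forward_rmse(series, n_test, forecast_fn):
    """Forecast one step at a time, then add the true value to the history."""
    history = list(series[:-n_test])
    squared_errors = []
    for actual in series[-n_test:]:
        yhat = forecast_fn(history)
        squared_errors.append((actual - yhat) ** 2)
        history.append(actual)  # the real observation becomes available
    return sqrt(sum(squared_errors) / len(squared_errors))


def naive_forecast(history):
    """Persistence: predict the last observed value."""
    return history[-1]


def seasonal_naive_forecast(history, m=3):
    """Predict the value observed one seasonal period (m steps) ago."""
    return history[-m]


# Hypothetical series with a period of 3 and a slow upward drift.
data = [10, 20, 30, 11, 21, 31, 12, 22, 32, 13, 23, 33]
naive_rmse = walk_forward_rmse(data, 3, naive_forecast)
seasonal_rmse = walk_forward_rmse(data, 3, seasonal_naive_forecast)
print(round(naive_rmse, 3), round(seasonal_rmse, 3))
```

A fitted SARIMA configuration would be plugged in as another `forecast_fn`; if its RMSE is not lower than these baselines, it has no demonstrated skill on that data. With only 13 observations, as in the summary above, any such comparison is very noisy.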
samia ahmad March 2, 2021 at 4:32 am #

please let me know the merits and demerits of SARIMA Models

Reply
- Jason Brownlee March 2, 2021 at 5:47 am #
  
  It can handle trends and seasonality and is a simple linear model.
  
  Only handles a single variable.
  
  Reply
Aakash April 8, 2021 at 4:48 pm #

My data is hourly data and the seasonality is for one year. so, m = 365 would be a high value to compute. what’s the best possible solution for it?

Reply
- Aakash April 9, 2021 at 5:15 am #
  
  and what does ‘m’ indicates in terms of its unit(example days or hours or minutes). Can you explain a bit more about it.
  
  Reply
  - Jason Brownlee April 9, 2021 at 5:28 am #
    
    m is in the units of your dataset I believe.
    
    Reply
- Jason Brownlee April 9, 2021 at 5:18 am #
  
  Perhaps evaluate a suite of configurations, data preparations and model types and discover what works best for your dataset.
  
  Reply
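
Since m counts observations per seasonal cycle, its value follows from the sampling interval and the cycle length. The helper below is a hypothetical illustration using only the standard library; `seasonal_period` is not part of statsmodels.

```python
from datetime import timedelta


def seasonal_period(sample_interval, season_length):
    """Number of observations in one seasonal cycle (the SARIMA m)."""
    return round(season_length / sample_interval)


# Hourly data with a yearly cycle.
m_hourly_yearly = seasonal_period(timedelta(hours=1), timedelta(days=365))
# Daily data with a weekly cycle.
m_daily_weekly = seasonal_period(timedelta(days=1), timedelta(days=7))
# Hourly data with a daily cycle.
m_hourly_daily = seasonal_period(timedelta(hours=1), timedelta(days=1))
print(m_hourly_yearly, m_daily_weekly, m_hourly_daily)
```

For hourly data with a yearly cycle this gives a very large m, which is why resampling to a coarser frequency or modeling a shorter cycle is often considered first.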
Kaoutar May 7, 2021 at 10:23 am #

Please, what should we do when we have only seasonality and no trend? In my case the data has weekly seasonality (1 period = 7 days). When I deseasonalize using df.diff(7), the resulting data is stationary. So what model should I use, i.e. what are the parameters in the following line?
SARIMA(p,d,q)(P,D,Q,7)

Reply
- Jason Brownlee May 8, 2021 at 6:30 am #
  
  Set d to 0 to indicate no trend.
  
  Reply
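
The seasonal differencing in the question, df.diff(7), is what the seasonal D term does inside the model. The sketch below shows it on a made-up weekly pattern with pandas; the numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly shape repeated for six weeks, with no trend.
pattern = np.array([10, 12, 15, 14, 13, 30, 28], dtype=float)
series = pd.Series(np.tile(pattern, 6))

# Seasonal difference at lag 7: each value minus the value one week earlier.
seasonal_diff = series.diff(7).dropna()
print(seasonal_diff.abs().max(), len(seasonal_diff))
```

Under these assumptions there are two equivalent routes: pass the raw series with d=0 and D=1, e.g. SARIMA(p,0,q)(P,1,Q,7), or pass the already differenced series with d=0 and D=0. Doing both would difference the seasonality twice.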
Chris Wai May 28, 2021 at 4:46 pm #

Hi, can I know is SARIMA consists of residuals? if yes, how to determine its residuals for a non-linear model uses?

Reply
- Jason Brownlee May 29, 2021 at 6:46 am #
  
  It makes use of them in the MA part of the model I believe.
  
  Reply
Chris Wai May 29, 2021 at 6:29 pm #

I see. I think I understood, I will try on it. Thank you so much Dr. Jason

Reply
- Jason Brownlee May 30, 2021 at 5:48 am #
  
  You’re welcome.
  
  Reply
Rupesh S June 9, 2021 at 9:37 pm #

Hi jason,

I have 5 years of daily data. I tried to build a SARIMA model using a seasonal period of m=365 and it takes a very long time to execute. Is there any solution for this?

I also resampled my daily data to weekly and used the same approach with a seasonal period of m=52, and this runs much faster.

And how do we know whether my data has daily, weekly, monthly or yearly seasonality?

Reply
- Jason Brownlee June 10, 2021 at 5:25 am #
  
  Perhaps use less data?
  Perhaps run on a faster machine?
  Perhaps use a different model?
  
  Reply
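
On the last question, one common check is to look for peaks in the autocorrelation at candidate seasonal lags. The sketch below computes the sample autocorrelation with NumPy on a synthetic daily series that has a weekly cycle; the data and the `autocorr` helper are assumptions for illustration.

```python
import numpy as np


def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


# Hypothetical daily series: weekly sine cycle plus noise, 20 weeks long.
rng = np.random.default_rng(1)
t = np.arange(140)
daily = 10.0 * np.sin(2 * np.pi * t / 7) + rng.normal(scale=1.0, size=t.size)

# Scan candidate lags and pick the one with the strongest autocorrelation.
acf = {lag: autocorr(daily, lag) for lag in range(2, 31)}
best_lag = max(acf, key=acf.get)
print(best_lag, round(acf[best_lag], 2))
```

On real data the peaks are rarely this clean, so an ACF plot over several cycles (for example with statsmodels' `plot_acf`) is usually inspected by eye as well.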
Sofiane June 23, 2021 at 11:54 am #

Thank you for the article. Can pmdarima.auto_arima of python (Which automatically finds the optimal orders) return sarimax or arimax models, since this function contains P, D, Q and m for the seasonal parameters, and the X parameter for exogenous variables?

Reply
- Jason Brownlee June 24, 2021 at 5:57 am #
  
  Sorry, I’m not familiar with it.
  
  Reply
Vicky Vouk July 23, 2021 at 3:10 am #

Hi Jason, amazing tutorial. Do I have to make the external variables stationary, like the endogenous variable? Thank you

Reply
- Jason Brownlee July 23, 2021 at 6:01 am #
  
  It is probably a good idea. Try with and without and compare results.
  
  Reply
Vasiliki Voukelatou July 24, 2021 at 8:59 am #

Awesome. Thanks 🙂

Reply
- Jason Brownlee July 25, 2021 at 5:11 am #
  
  You’re welcome.
  
  Reply
Tomislav Primorac July 30, 2021 at 1:43 am #

Hi! I’ve been reading your posts many times now and am really grateful that you share your knowledge.

I have a question I have problem finding an answer to. How do I make static out of sample forecasts?
So, of course, the data is divided to test and train samples. Model is fitted to test data. Now, it’s easy to make dynamic predictions. But if I want to make static predictions such that each prediction is made one point at a time and with every new prediction, previous point is taken into consideration, I fail to do this on data that was not used for training, it just does the same as in dynamic prediction.

How can I specify that the test data is the one predictions need to be updated against? Is it something to do with exog? Or can this only be done via some for loop?

Hopefully it’s clear what my problem is.

Reply
- Jason Brownlee July 30, 2021 at 6:30 am #
  
  Fit on all data and call model.forecast() See this:
  https://machinelearningmastery.com/make-sample-forecasts-arima-python/
  
  Reply
  - Tomislav Primorac July 31, 2021 at 2:08 am #
    
    Hi again! First of all, I am thankful for the fact that you are even replying to this questions.
    
    However, I am a bit confused about your answer, the part where you say “fit on all data”. Wouldn’t that completely destroy the point of test data, since it is the part of the data the model is not supposed to be fitted to?
    
    My goal is to make forecasts to data without the model being retrained in each new step. I will check into your other posts, hope you won’t mind if I ask some more questions.
    
    Reply
    - Jason Brownlee July 31, 2021 at 5:38 am #
      
      Sorry, I thought you were asking how to make a prediction on out of sample data – when there is no test data.
      
      If you have a test set, you must use walk-forward validation:
      https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
      
      For a finalized model, you can choose to re-fit as new data becomes available or not. Test to see what works best for your problem. Refitting is probably advised.
      
      Reply
Tomislav Primorac July 30, 2021 at 1:44 am #

*model is fitted to TRAIN data (lapsus)

Reply
- Tomislav Primorac August 2, 2021 at 11:24 pm #
  
  Yes, it makes sense. I’ll be checking the linked post. Your posts are very useful. Thank you!
  
  Regarding refitting, there’s something that’s confusing me. The process of finding the final model is the process of finding the right coefficients. Once we find them, if we re-fit all the time, we will be getting new coefficients all the time. So what was the point of finding the final model, if it is going to be changed at each step?
  
  Anyways, I will be considering different periods for refitting, not each time a new point in time arrives, but for every n points.
  
  Reply
  - Jason Brownlee August 3, 2021 at 4:52 am #
    
    Maybe it needs to change or maybe not, it depends on your data and the model.
    
    Test and discover whatever works best for your specific case.
    
    Reply
Sofiane HAMAM August 11, 2021 at 12:33 am #

Thank you for the article. What is the difference between ARIMA with the seasonal parameters (statsmodels.tsa.arima.model.ARIMA) and SARIMAX (statsmodels.tsa.statespace.sarimax.SARIMAX)?

Reply
- Adrian Tam August 11, 2021 at 6:50 am #
  
  If you look at the source code, you will find that they are actually the same.
  
  Reply
Mafalda November 14, 2021 at 5:36 am #

Thank you for such a clear article.
I am new to data science and I am struggling with a SARIMA problem with data for 41 consecutive months. I’ve done log transform, differencing, made the series stationary, applied seasonal differencing (6 months) and split it into train/ test (80%/20%) and used auto arima with seasonal factor.
However, I do not get an in-sample forecast. When I plot it, it is blank for forecasting. Could you think of where I’ve gone wrong? This is the code I used to create the forecast:
fc= model_fit.get_forecast(steps=8, return_conf_int=True) # 95% conf #8 is length of test set
conf = fc.conf_int()

I convert it to dataframe for plotting purposes
fc_series = pd.Series(fc.predicted_mean, index=test.index)
lower_series = pd.Series(conf.loc[:, 'lower MonthlyTotals'], index=test.index)
upper_series = pd.Series(conf.loc[:, 'upper MonthlyTotals'], index=test.index)

Reply
- Adrian Tam November 14, 2021 at 3:02 pm #
  
  The code seems right to me. If you have it blank, probably somewhere else causing it.
  
  Reply
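
One possible cause worth checking, offered as an assumption rather than a diagnosis: when `pd.Series` is given an existing Series together with a new `index`, pandas aligns by label instead of by position. If the forecast's own index does not match `test.index`, every value becomes NaN and the plot is blank. The sketch below reproduces this with stand-in data; the values and dates are hypothetical.

```python
import pandas as pd

# Stand-in for fc.predicted_mean, which carries its own index.
forecast = pd.Series([1.0, 2.0, 3.0], index=[10, 11, 12])
# Stand-in for test.index.
test_index = pd.date_range("2021-01-01", periods=3, freq="D")

# Label-based alignment: no labels match, so every value is NaN.
aligned = pd.Series(forecast, index=test_index)

# Position-based assignment: strip the old index first.
positional = pd.Series(forecast.to_numpy(), index=test_index)

print(aligned.isna().all())
print(positional.tolist())
```

Printing `fc_series.isna().sum()` before plotting would show whether this is what is happening.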
K.Bhanu prakash December 3, 2021 at 11:51 pm #

Is it possible for someone who is a beginner in Python to do a time series forecasting project?

Reply
- Adrian Tam December 8, 2021 at 7:00 am #
  
  Why not? The statsmodels package is easy to use.
  
  Reply
- James Carmichael December 22, 2021 at 9:26 am #
  
  Hi K.Bhanu…Please provide more detail on the type of project you would be interested in.
  
  Regards,
  
  Reply
Hendy May 19, 2022 at 2:33 am #

Thanks for your article
I used daily data for ARIMA and I have to use the same data for SARIMA, so will the S be equal to 365 or not?

Reply
- James Carmichael May 19, 2022 at 6:37 am #
  
  Hi Hendy…There is no “S” parameter. The following may help clarify.
  
  https://neptune.ai/blog/arima-sarima-real-world-time-series-forecasting-guide
  
  Reply
  - Hendy May 24, 2022 at 4:00 pm #
    
    Thanks for your time,
    So if I want to use SARIMA, I have to use monthly data and set s = 12?
    
    Reply
Gloksinya August 27, 2022 at 6:50 pm #

Do we need to take seasonal differences before plotting acf and pacf to determine P and Q?

Reply
- James Carmichael August 28, 2022 at 8:00 am #
  
  Hi Gloksinya…You may find the following of interest:
  
  https://machinelearningmastery.com/gentle-introduction-autocorrelation-partial-autocorrelation/
  
  Reply
Timon September 29, 2022 at 12:58 am #

Hi Jason. In your very last line of code, shouldn’t that be
yhat = model_fit.predict(start=0, end=len(data))
instead of
yhat = model_fit.predict(start=len(data), end=len(data))

if not, why making start&end the same value?

Reply
- James Carmichael September 29, 2022 at 6:29 am #
  
  Thanks for the question and feedback!
  
  Reply
Timon September 29, 2022 at 2:07 am #

Actually, i just found the answer myself in your ARIMA-Tutorial:

If we used 100 observations in the training dataset to fit the model, then the index of the next time step for making a prediction would be specified to the prediction function as start=101, end=101. This would return an array with one element containing the prediction.

https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/

Reply
- James Carmichael September 29, 2022 at 6:28 am #
  
  Thank you for the feedback Timon!
  
  Reply
Jaime June 30, 2023 at 10:23 am #

Hi Dr Brownlee,

Your content here is totally saving my butt regarding my college final project X). I have used XGBoost, LSTM and Prophet to make a time series forecasting for comparison. Now I’m trying to do it with SARIMA, I got a pretty good result and these questions may sound dumb but I’m still a complete beginner:

1) The “m” parameter stands for “The number of time steps for a single seasonal period.” If my time series has daily frequency and the seasonal period is a year, would m be 12 anyway, or 365? Since the data is sampled daily, I’m having doubts whether m=12 is only for monthly time series data.

2) If I run the stationarity test and my time series is already stationary, that means that the “d” parameter should be set to zero, right?

3) When running the model and fitting it to the train data, then predicting the test data, the predictions seemed to have an offset of 1 day. That is, I took the predictions array and moved each value 1 place to the left, and then it fitted the real test data perfectly. I’m not sure whether I’m doing something wrong or what is really going on.

I’m even more confused since I set m to 12 and d to 1 and I still got a good result, even though the time series is stationary

Reply
- James Carmichael July 1, 2023 at 12:17 pm #
  
  Hi Jaime…While SARIMA is a powerful forecasting tool, it is quite limited in terms of using multiple variables to make predictions. I would recommend the following resource as it is has many full solutions to time series problems that will address all of your questions.
  
  https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/
  
  Reply
nia August 3, 2023 at 12:32 pm #

Hi Jason,

Thank you for the explanation. Do you have example forecasting using hybrid SARIMA-ANN method?

Reply

Navigation

A Gentle Introduction to SARIMA for Time Series Forecasting in Python

Tutorial Overview

What’s Wrong with ARIMA

What is SARIMA?

How to Configure SARIMA

Trend Elements

Seasonal Elements

How to use SARIMA in Python

1. Define Model

2. Fit Model

3. Make Prediction

Further Reading

Posts

Books

API

Articles

Summary

Want to Develop Time Series Forecasts with Python?

Develop Your Own Forecasts in Minutes

Finally Bring Time Series Forecasting to
Your Own Projects

More On This Topic

137 Responses to A Gentle Introduction to SARIMA for Time Series Forecasting in Python
