A time series forecast process is a set of steps or a recipe that leads you from defining your problem through to the outcome of having a time series forecast model or set of predictions.
In this post, you will discover time series forecast processes that you can use to guide you through your forecast project.
After reading this post, you will know:
- The 5-Step forecasting task by Hyndman and Athanasopoulos to guide you from problem definition to using and evaluating your forecast model.
- The iterative forecast development process by Shmueli and Lichtendahl to guide you from defining your goal to implementing forecasts.
- Suggestions and tips for working through your own time series forecasting project.
Let’s get started.
5-Step Forecasting Task
The 5 basic steps in a forecasting task are summarized by Hyndman and Athanasopoulos in their book Forecasting: principles and practice. These steps are:
- Problem Definition. The careful consideration of who requires the forecast and how the forecast will be used. This is described as the most difficult part of the process, most likely because it is entirely problem specific and subjective.
- Gathering Information. The collection of historical data to analyze and model. This also includes getting access to domain experts and gathering information that can help to best interpret the historical information, and ultimately the forecasts that will be made.
- Preliminary Exploratory Analysis. The use of simple tools, like graphing and summary statistics, to better understand the data. Review plots and summary statistics, and note obvious temporal structures such as trends and seasonality; anomalies such as missing data, corruption, and outliers; and any other structures that may impact forecasting.
- Choosing and Fitting Models. Evaluate two, three, or a suite of models of varying types on the problem. Models may be chosen for evaluation based on the assumptions they make and whether the dataset conforms. Models are configured and fit to the historical data.
- Using and Evaluating a Forecasting Model. The model is used to make forecasts, the performance of those forecasts is evaluated, and the skill of the model is estimated. This may involve back-testing with historical data or waiting for new observations to become available for comparison.
This 5-step process provides a strong overview, starting with an idea or problem statement and leading through to a model that can be used to make predictions.
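As a rough illustration of the preliminary exploratory analysis step, the sketch below computes summary statistics, counts missing observations, and flags candidate outliers. The series values, the `summarize_series` helper, and the 2-standard-deviation threshold are all hypothetical choices made for this example; they do not come from the book.

```python
from statistics import mean, stdev

def summarize_series(values):
    """Return simple summary statistics for a series that may contain None."""
    # Missing observations are represented as None in this sketch.
    observed = [v for v in values if v is not None]
    mu, sigma = mean(observed), stdev(observed)
    # Flag points more than 2 standard deviations from the mean (assumed threshold).
    outliers = [i for i, v in enumerate(values)
                if v is not None and abs(v - mu) > 2 * sigma]
    return {
        "n": len(values),
        "missing": len(values) - len(observed),
        "mean": mu,
        "stdev": sigma,
        "min": min(observed),
        "max": max(observed),
        "outlier_index": outliers,
    }

# Hypothetical monthly observations with one gap and one suspicious spike.
series = [112, 118, 132, 129, None, 135, 148, 148, 136, 119, 104, 400]
summary = summarize_series(series)
print(summary)
```

A summary like this is only a starting point; plots at several time scales are still needed to see trend and seasonality.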
The focus of the process is on understanding the problem and fitting a good model.
Each model is itself an artificial construct that is based on a set of assumptions (explicit and implicit) and usually involves one or more parameters which must be “fitted” using the known historical data.
— Page 22, Forecasting: principles and practice
Iterative Forecast Development Process
The authors Shmueli and Lichtendahl in their book Practical Time Series Forecasting with R: A Hands-On Guide suggest an 8-step process.
This process extends beyond the development of a model and making forecasts and involves iterative loops.
Their process can be summarized as follows:
- Define Goal.
- Get Data.
- Explore and Visualize Series.
- Pre-Process Data.
- Partition Series.
- Apply Forecasting Method/s.
- Evaluate and Compare Performance.
- Implement Forecasts/Systems.
Below are the iterative loops within the process:
- Explore and Visualize Series => Get Data. Data exploration can lead to questions that require access to new data.
- Evaluate and Compare Performance => Apply Forecasting Method/s. The evaluation of models may raise questions or ideas for new methods or new method configurations to try.
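The partition, apply, and evaluate steps can be sketched in a few lines. This is a minimal example under stated assumptions: the series is made up, the two forecasting methods (a training mean and the last observed value) are deliberately simple stand-ins, and mean absolute error is just one possible performance measure. None of these specifics are taken from the book.

```python
def partition(series, n_test):
    """Split chronologically: the last n_test observations form the test set."""
    return series[:-n_test], series[-n_test:]

def mean_forecast(train, horizon):
    """Forecast every future step with the mean of the training data."""
    avg = sum(train) / len(train)
    return [avg] * horizon

def last_value_forecast(train, horizon):
    """Forecast every future step with the last training observation."""
    return [train[-1]] * horizon

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical series with an upward trend.
series = [10, 12, 13, 15, 16, 18, 19, 21, 22, 24]
train, test = partition(series, n_test=3)

# Apply each method to the training period and score it on the test period.
methods = {"mean": mean_forecast, "last_value": last_value_forecast}
scores = {name: mae(test, fn(train, len(test))) for name, fn in methods.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

If neither score is acceptable, the second loop above applies: return to the method step with a new method or configuration and evaluate again on the same partition.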
The process is more focused on the ongoing development and refinement of one or more models on the problem until an acceptable level of performance is achieved.
The process can continue, with models revised and updated as new data and new insights become available.
Of course, the process does not end once forecasts are generated, because forecasting is typically an ongoing goal. Hence, forecast accuracy is monitored and sometimes forecasting method is adapted or changed to accommodate changes in the goal or the data over time.
— Page 16, Practical Time Series Forecasting with R: A Hands-On Guide
Suggestions and Tips
This section lists 10 suggestions and tips to consider when working through your time series forecasting project.
These suggestions rest on the premise that you cannot know beforehand what will work, let alone which methods will work well on your problem, and that the best source of knowledge on a forecasting project is the results of trial and error with real historical data.
- Select or devise a time series forecast process that is tailored to your project, tools, team, and level of expertise.
- Write down all assumptions and questions you have during analysis and forecasting work, then revisit them later and seek to answer them with small experiments on historical data.
- Review a large number of plots of your data at different time scales, zooms, and transforms of observations in an effort to help make exploitable structures present in the data obvious to you.
- Develop a robust test harness for evaluating models using a meaningful performance measure and a reliable test strategy, such as walk-forward validation (rolling forecast).
- Start with simple naive forecast models to provide a baseline of performance for more sophisticated methods to improve upon.
- Create a large number of perspectives or views on your time series data, including a suite of automated transforms, and evaluate each with one or a suite of models in order to help automatically discover non-intuitive representations and model combinations that result in good predictions for your problem.
- Try a suite of models of differing types on your problem, from simple to more advanced approaches.
- Try a suite of configurations for a given problem, including configurations that have worked well on other problems.
- Try automated hyperparameter optimization methods for models to flush out a suite of well-performing models as well as non-intuitive model configurations that you would not have tried manually.
- Devise automated tests of performance and skill for ongoing predictions to help to automatically determine if and when a model has become stale and requires review or retraining.
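Two of the tips above, a walk-forward test harness and a naive baseline, fit together naturally. The sketch below is one possible arrangement, assuming a one-step forecast horizon, a persistence (last value) baseline, and RMSE as the performance measure. The function names and the data are hypothetical.

```python
from math import sqrt

def walk_forward_validation(series, n_test, forecast_fn):
    """Evaluate one-step forecasts over the last n_test observations."""
    history = list(series[:-n_test])
    test = series[-n_test:]
    predictions = []
    for actual in test:
        # Forecast the next step using only data available so far.
        predictions.append(forecast_fn(history))
        # The true observation then becomes available for the next step.
        history.append(actual)
    squared_errors = [(a - p) ** 2 for a, p in zip(test, predictions)]
    rmse = sqrt(sum(squared_errors) / len(test))
    return rmse, predictions

def persistence(history):
    """Naive baseline: predict that the next value equals the last observed one."""
    return history[-1]

# Hypothetical series.
series = [5, 7, 6, 8, 9, 11, 10, 12]
rmse, predictions = walk_forward_validation(series, n_test=4, forecast_fn=persistence)
print(predictions, round(rmse, 3))
```

Any candidate model can be passed in as `forecast_fn`, so a more sophisticated method is scored by the same harness and compared directly against the baseline.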
Further Reading
This section lists some resources that you can use to learn more about the time series forecasting process.
- Section 1.3 The Forecasting Process, Practical Time Series Forecasting with R: A Hands-On Guide.
- Section 1.6 The basic steps in a forecasting task, Forecasting: principles and practice
Do you know any good resources that talk about the time series forecast process?
Share them in the comments below.
Summary
In this post, you discovered processes that you can use to work through time series forecasting problems.
Specifically, you learned:
- The 5 steps of working through a time series forecast task by Hyndman and Athanasopoulos.
- The 8-step iterative process of defining a goal and implementing a forecast system by Shmueli and Lichtendahl.
- The 10 suggestions and practical tips to consider when working through your time series forecasting project.
Do you have any questions about the time series forecasting process, or about this post?
Ask your questions in the comments below.
I can recommend this GitHub page that talks about the time series process:
https://github.com/rouseguy/TimeSeriesAnalysiswithPython
Thanks John for the useful tips on forecasting. I have a question, please help: I have past years' enrollment data for a school. What is the best algorithm to forecast the future enrollment?
Try a suite of algorithms and discover what works best for your specific dataset.
Hey Jason,
I am confused about the order of data pre-processing and the train/test split. Could you please explain the reasons for doing one before the other, and whether you can do the split before the scaling? Thanks
I recommend:
1. power transform
2. difference
3. standardize
4. normalize
In that order, where relevant.
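A minimal sketch of that ordering follows. It makes several assumptions that are not stated in the reply: a log transform stands in for the power transform, the data and split point are made up, and the scaling statistics are estimated from the training portion only and then reused on the test portion.

```python
from math import log
from statistics import mean, pstdev

def difference(values):
    """First-order differencing: each value minus the one before it."""
    return [values[i] - values[i - 1] for i in range(1, len(values))]

# Hypothetical series and split point.
series = [100, 110, 121, 133, 146, 161, 177, 195]
n_train = 6

# 1. Power transform (a log transform is assumed here).
logged = [log(v) for v in series]

# 2. Difference. Differencing drops one observation, so the training
#    period contributes n_train - 1 differenced values.
diffed = difference(logged)
train, test = diffed[:n_train - 1], diffed[n_train - 1:]

# 3. Standardize with the mean and standard deviation of the training data only.
mu, sigma = mean(train), pstdev(train)
train_std = [(v - mu) / sigma for v in train]
test_std = [(v - mu) / sigma for v in test]

# 4. Normalize with the training minimum and maximum only.
low, high = min(train_std), max(train_std)
train_norm = [(v - low) / (high - low) for v in train_std]
test_norm = [(v - low) / (high - low) for v in test_std]
print(train_norm, test_norm)
```

Applying both standardization and normalization is rarely needed; the example runs all four steps only to show the order. Test values can fall outside the 0 to 1 range because the scaling limits come from the training data.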
Hi sir, I am working on a time series dataset, but the data column has only time values in the form hh:mm:ss instead of date values like yy:mm:dd. Can you please help with how I can perform prediction based on time [hh:mm:ss], say 23:32:12? Please help me.
No difference really, the model is more concerned with sequential observations than the specific increment of time.
Now, in the current project, I am going to apply all these learnings to a real-life data set. I will work through a time series forecasting project end-to-end, from collecting or importing the data set, analyzing and transforming the time series, to training the model and making predictions on new data.
How do I do these using R?
These books will help you with time series forecasting in R:
https://machinelearningmastery.com/books-on-time-series-forecasting-with-r/
Hi Jason,
Many thanks for helping the ML community improve their skillset. I’m currently working on forecasting empty container volume, a time series problem, and need your help to find a solution. I’ve set up multiple models including ARIMA, ARMA, LASSO, RIDGE, MLP and CNN. Using these models, I’m able to forecast the empty containers given port data such as transport type (import/export), number of empty containers in the last 3 months, and delivery ports/depots with longitudes and latitudes. I can see a few ports facing this empty container volume problem due to high export: containers come back empty from the delivery port.
Now, I’m interested to use the predicted demand for empty containers at each port, to optimise the distribution of empty containers to the closest depots serving said ports where the objective is to minimise the number of container movements, over a given time horizon: 1) Import: truck carries one full container -> unloads the container at customer -> truck carries the empty container away -> deposit the empty container at the nearest port / depot. 2) Export: truck collect an empty container from the nearest port/depot -> truck carries the empty container to the customer -> loads the container at customer -> truck carries the loaded container to the port.
You’re welcome!
That sounds like an interesting project! Thanks for sharing your progress.
It’s hard to give specific advice as I don’t understand the details of your problem. Perhaps you can try prototyping a few different framings of the problem and see what works/makes sense?
Many thanks for your prompt response Jason. I’m actually stuck at thinking of combining the ports empty containers forecasting for optimization. The raw data contains number of containers, their statuses (empty, awaiting authorization…etc) with individual port information such as, (lon,lat) distance, import/export and the delivery ports. Any link or recommendation of a problem similar to mine can be really helpful. 🙂
Perhaps this will give you some ideas:
https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites
Thank you a lot for this helpful article.
In my dataset, I have individuals, each of whom has their own time series of the same measurements.
How is longitudinal forecasting usually done?
I know a bit about longitudinal (panel) data analyses, but I have not yet forecasted from longitudinal data.
Hi RT…You are very welcome! You may wish to investigate multivariate forecasting with deep learning methods:
https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/