On the Suitability of Long Short-Term Memory Networks for Time Series Forecasting

Long Short-Term Memory (LSTM) is a type of recurrent neural network that can learn the order dependence between items in a sequence.

LSTMs have the promise of being able to learn the context required to make predictions in time series forecasting problems, rather than having this context pre-specified and fixed.

Despite this promise, there is some doubt as to whether LSTMs are appropriate for time series forecasting.

In this post, we will look at the application of LSTMs to time series forecasting by some of the leading developers of the technique.

Let’s get started.

LSTM for Time Series Forecasting

We will take a closer look at a paper that seeks to explore the suitability of LSTMs for time series forecasting.

The paper is titled “Applying LSTM to Time Series Predictable through Time-Window Approaches” by Gers, Eck and Schmidhuber, published in 2001.

They start off by commenting that univariate time series forecasting problems are actually simpler than the types of problems traditionally used to demonstrate the capabilities of LSTMs.

Time series benchmark problems found in the literature … are often conceptually simpler than many tasks already solved by LSTM. They often do not require RNNs at all, because all relevant information about the next event is conveyed by a few recent events contained within a small time window.

The paper focuses on the application of LSTMs to two complex time series forecasting problems and contrasts the results of LSTMs against other types of neural networks.

The study focuses on two classical time series problems:

Mackey-Glass Series

This is a contrived time series calculated from a differential equation.
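
To make this concrete, here is a minimal sketch of generating a Mackey-Glass-like series via a simple Euler discretisation of the delay differential equation; the parameter values (tau=17, beta=0.2, gamma=0.1, n=10) are commonly used defaults and are assumptions, not necessarily the settings used in the paper.

```python
# Minimal sketch: generate a Mackey-Glass-like series via Euler integration.
# dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t)
# Parameter values are common defaults, not necessarily those used in the paper.

def mackey_glass(length=1000, tau=17, beta=0.2, gamma=0.1, n=10, x0=1.2):
    history = [x0] * (tau + 1)  # constant history for t <= 0
    series = []
    x = x0
    for _ in range(length):
        x_tau = history[-(tau + 1)]  # delayed value x(t - tau), step size dt = 1
        x = x + (beta * x_tau / (1.0 + x_tau ** n) - gamma * x)
        history.append(x)
        series.append(x)
    return series

series = mackey_glass()
print(series[:5])
```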

Plot of the Mackey-Glass Series, Taken from Scholarpedia

For more information, see the Scholarpedia article on the Mackey-Glass equation.

Chaotic Laser Data (Set A)

This is a series taken from a contest at the Santa Fe Institute.

Set A is defined as:

A clean physics laboratory experiment. 1,000 points of the fluctuations in a far-infrared laser, approximately described by three coupled nonlinear ordinary differential equations.

Example of Chaotic Laser Data (Set A), Taken from The Future of Time Series

For more information, see the Santa Fe Time Series Competition and “The Future of Time Series” by Weigend and Gershenfeld.

Autoregression

An autoregression (AR) approach was used to model these problems.

This means that the next time step was taken as a function of some number of past (or lag) observations.

This is a common approach for classical statistical time series forecasting.

The LSTM is exposed to one observation at a time, with no fixed set of lag variables, unlike the window-based multilayer perceptron (MLP).
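
To make the difference concrete, here is a minimal sketch (not code from the paper) contrasting the fixed lag-window framing used by the MLP with the one-observation-per-time-step framing used by the AR-LSTM.

```python
import numpy as np

def to_window_supervised(series, n_lags=4):
    # MLP framing: each sample is a fixed window of the n_lags most recent values
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])
        y.append(series[i])
    return np.array(X), np.array(y)

def to_stepwise_supervised(series):
    # AR-LSTM framing: one observation per time step, target is the next value;
    # the network must remember any relevant past internally
    X = np.array(series[:-1]).reshape(-1, 1, 1)  # [samples, time steps, features]
    y = np.array(series[1:])
    return X, y

series = [float(i) for i in range(10)]
X_mlp, y_mlp = to_window_supervised(series, n_lags=3)
X_lstm, y_lstm = to_stepwise_supervised(series)
print(X_mlp.shape, y_mlp.shape)    # (7, 3) (7,)
print(X_lstm.shape, y_lstm.shape)  # (9, 1, 1) (9,)
```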

For more information on AR for time series, see the post Autoregression Models for Time Series Forecasting with Python.

Analysis of Results

Some of the more salient comments were in response to the poor results of the LSTMs on the Mackey-Glass Series problem.

First, they comment that increasing the learning capacity of the network did not help:

Increasing the number of memory blocks did not significantly improve the results.

This may have required a further increase in the number of training epochs. It is also possible that a stack of LSTMs may have improved results.

They comment that in order to do well on the Mackey-Glass Series, the LSTM is required to remember recent past observations, whereas the MLP is given this data explicitly.

The results for the AR-LSTM approach are clearly worse than the results for the time window approaches, for example with MLPs. The AR-LSTM network does not have access to the past as part of its input … [for the LSTM to do well] required remembering one or two events from the past, then using this information before over-writing the same memory cells.

They comment that in general, this poses more of a challenge for LSTMs and RNNs than it does for MLPs.

Assuming that any dynamic model needs all inputs from t-tau …, we note that the AR-RNN has to store all inputs from t-tau to t and overwrite them at the adequate time. This requires the implementation of a circular buffer, a structure quite difficult for an RNN to simulate.

Again, I can’t help but think that a much larger hidden layer (more memory units) and a much deeper network (stacked LSTMs) would be better suited to learn multiple past observations.

They conclude the paper by discussing that, based on the results, LSTMs may not be suited to AR-type formulations of time series forecasting, at least when the lagged observations are close to the time being forecasted.

This is a fair conclusion given the LSTM’s performance compared to MLPs on the tested univariate problems.

A time window based MLP outperformed the LSTM pure-AR approach on certain time series prediction benchmarks solvable by looking at a few recent inputs only. Thus LSTM’s special strength, namely, to learn to remember single events for very long, unknown time periods, was not necessary here.

LSTM learned to tune into the fundamental oscillation of each series but was unable to accurately follow the signal.

They do highlight the LSTM’s ability to learn oscillatory behavior (e.g. cycles or seasonality).

Our results suggest to use LSTM only on tasks where traditional time window-based approaches must fail.

LSTM’s ability to track slow oscillations in the chaotic signal may be applicable to cognitive domains such as rhythm detection in speech and music.

This is interesting, but perhaps not as useful as it sounds, as such patterns are often explicitly removed, where possible, prior to forecasting. Nevertheless, it may highlight the possibility of LSTMs learning to forecast in the context of a non-stationary series.

Final Word

So, what does all of this mean?

Taken at face value, we may conclude that LSTMs are unsuitable for AR-based univariate time series forecasting, and that we should try MLPs with a fixed window first, turning to LSTMs only if the MLPs cannot achieve a good result.

This sounds fair.

I would argue a few points that should be considered before we write off LSTMs for time series forecasting:

  • Consider more sophisticated data preparation, at the very least scaling and making the series stationary. If a cycle or trend is obvious, then it should be removed so that the model can focus on the underlying signal. That being said, the capability of LSTMs to perform as well as or better than other methods on non-stationary data is intriguing, but I would expect any such gains to come with a commensurate increase in required network capacity and training.
  • Consider the use of both larger models and hierarchical models (stacked LSTMs) to automatically learn (or “remember”) a larger temporal dependence. Larger models can learn more; a sketch of such a model follows this list.
  • Consider fitting the model for much longer, e.g. thousands or hundreds of thousands of epochs, whilst making use of regularization techniques. LSTMs take a long time to learn complex dependencies.
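
As an illustration of the second and third points above, here is a minimal Keras sketch (my own illustration, not code from the paper) of a stacked LSTM with dropout regularization, fit for many epochs on a scaled, windowed series; the layer sizes, dropout rate, window length and epoch count are illustrative assumptions.

```python
# Minimal sketch: a stacked LSTM with dropout on a scaled, windowed series.
# All hyperparameters here are illustrative assumptions, not tuned values.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

# toy series; substitute your own data
series = np.sin(np.arange(500) * 0.1).reshape(-1, 1)

# scale to [0, 1]
scaler = MinMaxScaler()
scaled = scaler.fit_transform(series)

# frame as [samples, time steps, features] with a window of 10 steps
n_steps = 10
X = np.array([scaled[i - n_steps:i, 0] for i in range(n_steps, len(scaled))])
y = scaled[n_steps:, 0]
X = X.reshape((X.shape[0], n_steps, 1))

# stacked LSTM: the first layer returns sequences so the second can consume them
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(n_steps, 1)))
model.add(Dropout(0.2))
model.add(LSTM(50))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

# fit for many epochs; harder problems may need thousands
model.fit(X, y, epochs=500, batch_size=32, verbose=0)
```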

I won’t point out that we can move beyond AR-based models; it’s obvious, and AR models are a nice, clean proving ground on which LSTMs can take on classical statistical methods like ARIMA and well-performing neural networks like window-based MLPs.

I believe there’s great promise and opportunity for LSTMs applied to the problem of time series forecasting.

Do you agree?
Let me know in the comments below.


33 Responses to On the Suitability of Long Short-Term Memory Networks for Time Series Forecasting

  1. Colby May 26, 2017 at 2:53 pm #

    Jason,

    Do you have any recommendations or posts with example implementations of LSTM for time series prediction?

  2. Phillip May 28, 2017 at 8:53 am #

    Thank you for this informative article. While I have not trained any LSTM RNNs on my time series data, I’ve found that plain-jane feed-forward DNNs that are feature-engineered correctly and use sliding-window CV predict out-of-sample (OOS) as expected. Essentially, for my data it is not necessary to mess with LSTMs at all. And my features aren’t all i.i.d…

    I suppose this illustrates the versatility of DNNs.

    • Jason Brownlee June 2, 2017 at 12:08 pm #

      I generally agree Phillip.

      MLPs with a window do very well on small time series forecasting problems.

  3. Gerrit Govaerts May 29, 2017 at 5:09 pm #

    I agree that, along with MLPs, LSTMs have an advantage over more classical statistical approaches like ARIMA: they can fit non-linear functions and, moreover, you do not need to specify the type of non-linearity. Therein also lies the danger: regularisation is absolutely crucial to avoid overfitting.
    I still hold on to my position that a lot of very interesting time series will never be cracked by any approach that only looks at past data points to forecast the next: prices are set where demand meets supply, and both supply and demand care nothing about yesterday’s price.

  4. Jay May 31, 2017 at 5:26 am #

    I found that for the problem I’m working on (forecasting streamflow based on hydrometric station water level measurements), to my surprise, first differencing of the signal prior to LSTM worsened the results. Overall, I find that I can’t improve over a persistence model with LSTM or an MLP using a moving window. Not sure what to try next… (same problem with ARIMA)

  5. Atanas June 1, 2017 at 7:46 am #

    Great discussion,

    LSTMs are much newer when it comes to solving (simpler) AR problems; most industries have been studying and improving custom models for such tasks for decades. I believe RNNs shine for multivariate time series forecasts that have simply been too difficult to model until now. Similarly with MLPs and multivariate time series models where the different features can have different forecasting power (e.g., the sentiment of Twitter posts can have influence for a day or two, while NY Times articles could have a much longer-lasting influence).

  6. Anthony June 7, 2017 at 1:04 am #

    Why use a data-hungry algorithm like an LSTM when similar results can be obtained using other machine learning or classical time series methods?

  7. neha March 4, 2018 at 12:46 am #

    How do I work on a many-to-one time series using an LSTM? Any link with example code?

  8. neha March 4, 2018 at 9:24 pm #

    How do I work on data with multiple seasonalities with an LSTM?

    • Jason Brownlee March 5, 2018 at 6:23 am #

      Perhaps try seasonal adjusting the data prior to modeling.
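
      For example, here is a minimal sketch of one form of seasonal adjustment, seasonal differencing, assuming a yearly cycle of 12 monthly observations:

      ```python
      def seasonal_difference(series, period=12):
          # remove the seasonal cycle by subtracting the value one period earlier
          return [series[i] - series[i - period] for i in range(period, len(series))]

      def invert_seasonal_difference(last_period, differenced, period=12):
          # restore the original scale, given the last observed period of values
          restored = list(last_period)
          for d in differenced:
              restored.append(restored[-period] + d)
          return restored[period:]
      ```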

      • neha March 6, 2018 at 12:45 am #

        So can I use many-to-one here? Or, say, differing time steps?

      • neha March 6, 2018 at 5:19 am #

        What do you mean by seasonal adjustment if I have monthly and annual seasonality? I have used a monthly window size to take care of the monthly effect. If I also include the annual part, how do I merge both approaches and get a single output?

  9. Cloudy March 18, 2018 at 11:24 pm #

    Hi Dr. Jason,
    I have a multivariate time series with unevenly spaced observations. Can I use an LSTM for time series data like that, or do I need to transform my data into an equally spaced time series?
    Thank you

    • Jason Brownlee March 19, 2018 at 6:07 am #

      Perhaps try as-is and resampled and compare model performance.
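
      For example, here is a minimal pandas sketch of resampling an unevenly spaced series onto a regular grid; the daily frequency and linear interpolation are assumptions to adapt to your data:

      ```python
      import pandas as pd

      # unevenly spaced observations indexed by timestamp
      raw = pd.Series(
          [1.0, 1.4, 0.9, 1.2],
          index=pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-05', '2018-01-09']),
      )

      # resample onto a regular daily grid and fill the gaps by interpolation
      regular = raw.resample('D').mean().interpolate()
      print(regular)
      ```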

  10. Chris J March 24, 2018 at 10:42 pm #

    Here is a paper on the successful use of LSTM models for time series prediction:

    “We deploy LSTM networks for predicting out-of-sample directional movements for the constituent stocks of the S&P 500 from 1992 until 2015. With daily returns of 0.46 percent and a Sharpe Ratio of 5.8 prior to transaction costs, we find LSTM networks to outperform memory-free classification methods, i.e., a random forest (RAF), a deep neural net (DNN), and a logistic regression classifier (LOG).”

    https://www.econstor.eu/bitstream/10419/157808/1/886576210.pdf

  11. Anesh April 13, 2018 at 9:45 pm #

    I think that to consider a problem as a time series, the data should have the same interval of time. I am not sure about it.

    • Jason Brownlee April 14, 2018 at 6:44 am #

      Time series must have observations over time, i.e., a time dependence in the observations.

  12. Darshan April 15, 2018 at 9:56 am #

    The paper referenced in this article is from 2001; could the arguments be dated?

    • Jason Brownlee April 16, 2018 at 6:02 am #

      Perhaps.

      I have not seen any good results for a straight LSTM on time series and many personal experiments have confirmed these findings.

      MLPs, and even linear models, will kill an LSTM on time series forecasting.
