The Promise of Recurrent Neural Networks for Time Series Forecasting

By Jason Brownlee on August 5, 2019 in Deep Learning for Time Series

Recurrent neural networks are a type of neural network that adds the explicit handling of order in input observations.

This capability suggests that the promise of recurrent neural networks is to learn the temporal context of input sequences in order to make better predictions. That is, the suite of lagged observations required to make a prediction no longer needs to be diagnosed and specified, as it does in traditional time series forecasting or even in forecasting with classical neural networks. Instead, the temporal dependence can be learned, and perhaps changes to this dependence can also be learned.

In this post, you will discover the promised capability of recurrent neural networks for time series forecasting. After reading this post, you will know:

The focus and implicit, if not explicit, limitations of traditional time series forecasting methods.
The capabilities provided in using traditional feed-forward neural networks for time series forecasting.
The additional promise that recurrent neural networks make on top of traditional neural nets and hints of what this may mean in practice.

Let’s get started.

Time Series Forecasting

Time series forecasting is difficult.

Unlike the simpler problems of classification and regression, time series problems add the complexity of order or temporal dependence between observations.

This can be difficult, as specialized handling of the data is required when fitting and evaluating models. The temporal structure also aids modeling, providing additional structure like trends and seasonality that can be leveraged to improve model skill.

Traditionally, time series forecasting has been dominated by linear methods like ARIMA because they are well understood and effective on many problems. But these traditional methods also suffer from some limitations, such as:

Focus on complete data: missing or corrupt data is generally unsupported.
Focus on linear relationships: assuming a linear relationship excludes more complex joint distributions.
Focus on fixed temporal dependence: the relationship between observations at different times, and in turn the number of lag observations provided as input, must be diagnosed and specified.
Focus on univariate data: many real-world problems have multiple input variables.
Focus on one-step forecasts: many real-world problems require forecasts with a long time horizon.
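
To make the point about diagnosing lag observations concrete, here is a minimal sketch of one common manual diagnostic: the sample autocorrelation at each lag. The toy sine series and the helper name autocorrelation are assumptions made for illustration; they do not come from any specific library or dataset.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given positive lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total variation of the series around its mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: co-variation between the series and its lagged copy.
    numer = sum((series[t] - mean) * (series[t - lag] - mean)
                for t in range(lag, n))
    return numer / denom

# Hypothetical toy series: a sine wave that repeats every 12 time steps.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]

# Score every candidate lag, then pick the one with the strongest correlation.
acf = {lag: autocorrelation(series, lag) for lag in range(1, 25)}
best_lag = max(acf, key=acf.get)
print("strongest lag:", best_lag)
```

On this toy series the strongest correlation falls at the seasonal lag of 12. With a traditional method, a result like this would then be hard-coded into the model as the set of lag inputs.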

Existing techniques often depended on hand-crafted features that were expensive to create and required expert knowledge of the field.

— John Gamboa, Deep Learning for Time-Series Analysis, 2017

Note that some specialized techniques have been developed to address some of these limitations.

Neural Networks for Time Series

Neural networks approximate a mapping function from input variables to output variables.

This general capability is valuable for time series for a number of reasons.

Robust to Noise. Neural networks are robust to noise in input data and in the mapping function and can even support learning and prediction in the presence of missing values.
Nonlinear. Neural networks do not make strong assumptions about the mapping function and readily learn linear and nonlinear relationships.

… one important contribution of neural networks – namely their elegant ability to approximate arbitrary non-linear functions. This property is of high value in time series processing and promises more powerful applications, especially in the subfield of forecasting …

— Georg Dorffner, Neural Networks for Time Series Processing, 1996.

More specifically, neural networks can be configured to support an arbitrarily defined but fixed number of inputs and outputs in the mapping function. This means that:

Multivariate Inputs. An arbitrary number of input features can be specified, providing direct support for multivariate forecasting.
Multi-Step Forecasts. An arbitrary number of output values can be specified, providing direct support for multi-step and even multivariate forecasting.

For these capabilities alone, feed-forward neural networks are widely used for time series forecasting.
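
The fixed number of inputs and outputs can be sketched as a sliding window over the series. This is a minimal illustration under assumptions: the helper name series_to_supervised and the parameters n_in and n_out are hypothetical names, and the toy series is made up.

```python
def series_to_supervised(series, n_in, n_out):
    """Split a univariate series into (input window, output window) pairs."""
    samples = []
    # Each sample uses n_in past values to predict the next n_out values.
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        samples.append((x, y))
    return samples

# Hypothetical toy series.
series = [10, 20, 30, 40, 50, 60, 70]

# Three lag inputs, two-step forecast.
pairs = series_to_supervised(series, n_in=3, n_out=2)
for x, y in pairs:
    print(x, "->", y)
```

Both window sizes are chosen before the model sees any data. That choice is the fixed mapping a feed-forward network is then asked to learn.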

Implicit in the usage of neural networks is the requirement that there is indeed a meaningful mapping from inputs to outputs to learn. Modeling a mapping of a random walk will perform no better than a persistence model (e.g. using the last seen observation as the forecast).
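
A small simulation can make the random walk point concrete. This is only a sketch under assumptions: the walk is simulated with standard normal steps, and the mean forecast is included purely as a naive point of comparison.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Simulate a random walk: each value is the previous value plus a random step.
walk = [0.0]
for _ in range(1000):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return (sum(squared) / len(squared)) ** 0.5

actual = walk[1:]

# Persistence forecast: the prediction for each step is the last seen value.
persistence_rmse = rmse(actual, walk[:-1])

# Naive comparison: always predict the average level of the series.
mean_level = sum(walk) / len(walk)
mean_rmse = rmse(actual, [mean_level] * len(actual))

print("persistence RMSE: %.3f" % persistence_rmse)
print("mean forecast RMSE: %.3f" % mean_rmse)
```

The persistence error sits close to the standard deviation of the random steps (1.0 in this sketch). Because each step is independent of the past, there is no mapping for a model to learn that would be expected to do better.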

This expectation of a learnable mapping function also makes one of the limitations clear: the mapping function is fixed or static.

Fixed inputs. The number of lag input variables is fixed, in the same way as traditional time series forecasting methods.
Fixed outputs. The number of output variables is also fixed; although a more subtle issue, it means that for each input pattern, one output must be produced.

Sequences pose a challenge for [deep neural networks] because they require that the dimensionality of the inputs and outputs is known and fixed.

— Ilya Sutskever, Oriol Vinyals, Quoc V. Le, Sequence to Sequence Learning with Neural Networks, 2014

Feed-forward neural networks do offer great capability but still suffer from this key limitation of having to specify the temporal dependence upfront in the design of the model.

This dependence is almost always unknown and must be discovered through detailed analysis, then specified in a fixed form.

Recurrent Neural Networks for Time Series

Recurrent neural networks like the Long Short-Term Memory network add the explicit handling of order between observations when learning a mapping function from inputs to outputs.

The addition of sequence is a new dimension to the function being approximated. Instead of mapping inputs to outputs alone, the network is capable of learning a mapping function for the inputs over time to an output.

This capability unlocks time series for neural networks.

Long Short-Term Memory (LSTM) is able to solve many time series tasks unsolvable by feed-forward networks using fixed size time windows.

— Felix A. Gers, Douglas Eck, Jürgen Schmidhuber, Applying LSTM to Time Series Predictable through Time-Window Approaches, 2001

In addition to the general benefits of using neural networks for time series forecasting, recurrent neural networks can also learn the temporal dependence from the data.

Learned Temporal Dependence. The context of observations over time is learned.

That is, in the simplest case, the network is shown one observation at a time from a sequence and can learn which previously seen observations are relevant and how they are relevant to forecasting.
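
The idea of seeing one observation at a time while carrying context forward can be sketched with a single recurrent unit. Note the assumptions: this is a plain recurrent cell, not an LSTM, the function name rnn_forward is hypothetical, and the weights are fixed by hand rather than learned.

```python
import math

def rnn_forward(sequence, w_in, w_rec, bias):
    """Run a one-unit recurrent cell over a sequence, one step at a time."""
    state = 0.0  # hidden state carried from one time step to the next
    states = []
    for x in sequence:
        # The new state mixes the current input with the previous state.
        state = math.tanh(w_in * x + w_rec * state + bias)
        states.append(state)
    return states

# Hypothetical fixed weights; in practice these are learned during training.
w_in, w_rec, bias = 0.5, 0.8, 0.0

# Two sequences that end with the same observation but differ earlier on.
a = rnn_forward([1.0, 0.0, 0.0], w_in, w_rec, bias)
b = rnn_forward([0.0, 0.0, 0.0], w_in, w_rec, bias)

print("final state after [1, 0, 0]:", a[-1])
print("final state after [0, 0, 0]:", b[-1])
```

Both sequences end with the same input, yet the final states differ because the first sequence left a trace in the hidden state. Training adjusts the weights so that the trace kept is the one that helps the forecast.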

Because of this ability to learn long term correlations in a sequence, LSTM networks obviate the need for a pre-specified time window and are capable of accurately modelling complex multivariate sequences.

— Pankaj Malhotra, et al., Long Short Term Memory Networks for Anomaly Detection in Time Series, 2015

The promise of recurrent neural networks is that the temporal dependence in the input data can be learned. That is, a fixed set of lagged observations does not need to be specified.

Implicit within this promise is that a temporal dependence that varies with circumstance can also be learned.

But, recurrent neural networks may be capable of more.

It is good practice to manually identify and remove systematic structures such as trend and seasonality from time series data to make the problem easier to model (e.g. make the series stationary), and this may still be a best practice when using recurrent neural networks. But the general capability of these networks suggests that this may not be a requirement for a skillful model.
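
One common way to remove this kind of structure by hand is differencing. Below is a minimal sketch, assuming made-up toy series and hypothetical helper names difference and invert_difference.

```python
def difference(series, interval=1):
    """Subtract the value `interval` steps back from each observation."""
    return [series[i] - series[i - interval]
            for i in range(interval, len(series))]

def invert_difference(first_values, diffs, interval=1):
    """Rebuild the original series from its first values and the differences."""
    restored = list(first_values)
    for d in diffs:
        # Add each difference back onto the value `interval` steps earlier.
        restored.append(restored[-interval] + d)
    return restored

# Hypothetical series with a linear trend: lag-1 differencing removes it.
trend = [10 + 2 * t for t in range(8)]
trend_diffs = difference(trend)
print(trend_diffs)

# Hypothetical series with a repeating cycle of length 3:
# differencing at the seasonal interval removes it.
seasonal = [1, 2, 3, 1, 2, 3, 1, 2, 3]
seasonal_diffs = difference(seasonal, interval=3)
print(seasonal_diffs)

# Forecasts made on the differenced scale are mapped back the same way.
restored = invert_difference(trend[:1], trend_diffs)
```

The open question raised by the promise of recurrent networks is whether a step like this is still needed, or whether the network can learn the trend and the cycle directly from the raw series.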

Technically, the available context may allow recurrent neural networks to learn:

Trend. An increasing or decreasing level to a time series and even variation in these changes.
Seasonality. Consistently repeating patterns over time.

What do you think the promise is for LSTMs on time series forecasting problems?

Summary

In this post, you discovered the promise of recurrent neural networks for time series forecasting.

Specifically, you learned:

Traditional time series forecasting methods focus on univariate data with linear relationships and fixed and manually-diagnosed temporal dependence.
Neural networks add the capability to learn possibly noisy and nonlinear relationships with arbitrarily defined but fixed numbers of inputs and outputs supporting multivariate and multi-step forecasting.
Recurrent neural networks add the explicit handling of ordered observations and the promise of learning temporal dependence from context.

Do you disagree with my thoughts on the promise of LSTMs for time series forecasting?
Leave a comment below and join the discussion.

74 Responses to The Promise of Recurrent Neural Networks for Time Series Forecasting

Benson Dube May 22, 2017 at 7:33 am #

Thank you Jason for sharing. I am making a gentle start in Deep Learning. Currently gathering very generic information.

Reply
- Jason Brownlee May 22, 2017 at 7:56 am #
  
  Great, stick with it Benson.
  
  Reply
Sander de Winter May 22, 2017 at 10:23 pm #

Great article! I am currently working on my thesis and this is very similar to what I am writing but only a bit better. Thank you for the clear summary of a somewhat complex theory about time series predictions!

Reply
- Jason Brownlee May 23, 2017 at 7:53 am #
  
  Thanks Sander, I’m glad to hear that.
  
  Reply
Samuel May 23, 2017 at 1:02 pm #

Thanks for the helpful article Jason. I used RNN in classification with the same random seed on the same data. But running the same code multiple times gives me different results. So far, most results are the same except for maybe 1 or 2 times. But the results are drastically different (76% accuracy vs. 48%). Have you had similar experience? If so, what did you do to mitigate it?

Reply
- Jason Brownlee May 23, 2017 at 1:58 pm #
  
  If you are using the tensorflow backend, you will also need to seed the tensorflow random number generator.
  
  Reply
- Peter Marelas May 24, 2017 at 10:29 pm #
  
  If you are using Keras turn off shuffling in the fit method.
  
  Reply
Gerrit Govaerts May 23, 2017 at 5:13 pm #

With all the recent posts on time series forecasting, I would like to remind you that there are numerous time series that will not bow to even the most sophisticated RNN LSTM approach, no matter what you try. Sometimes the necessary data to make an accurate prediction are not contained in the time series data, but rather in exogenous variables. If you think you can predict the oil price of tomorrow based on historical oil price data, I have a bridge in Brooklyn for sale for you. As always, there is no “free lunch” here, no black magic.

Reply
- Jason Brownlee May 24, 2017 at 4:53 am #
  
  Great point Gerrit.
  
  Reply
- Dave May 25, 2017 at 11:10 pm #
  
  On the other hand – if oil prices trade at $50.0 for a few years we can safely assume that they will not start trading at “Microwave Oven” or “Wisp of sentient hydrogen gas at the peripheries of a white dwarf” or “The abstract concept of self identity as characterized by a lonely internet message” – in short one must certainly temper expectations and assume in part that life can be rather unpredictable and yet also understand that often it is not and within certain tolerances can be almost outright boring.
  
  Reply
Atanas Stoyanov May 25, 2017 at 3:52 am #

Hi Jason,

Have you had a chance to evaluate HTM models (https://numenta.org), they seem to fly under the radar. Maybe a future article?

>>Technically, the available context may allow recurrent neural networks to learn ..

Any actual measurements or research you can point to? I notice that in most of your recent articles you are differencing the series.

>>Scaling

Also – any measurements or references?

Thanks again, please keep the great work going – I am an avid customer for your PDF books as well.

Reply
- Jason Brownlee June 2, 2017 at 11:36 am #
  
  Yes, I dived deep into HTM back when they were first “launched”, 2008 or 09 perhaps.
  
  Sorry, not sure I understand your questions. Perhaps you could restate them?
  
  Reply
  - Atanas June 2, 2017 at 1:02 pm #
    
    Thanks Jason, would love to see your input on HTMs
    
    Sorry about my questions, I meant can you point me to research or tests/measurements on the exact influence of differencing (and whether it is beneficial at all in LSTMs) and feature scaling.
    
    Reply
    - Jason Brownlee June 2, 2017 at 1:03 pm #
      
      No, I have not seen this literature. Time series with LSTMs is a very new area.
      
      Reply
Chris July 6, 2017 at 6:12 pm #

Hi Jason,

I am currently doing my final year masters project using LSTM for battery life cycle prediction, your posts on LSTM for time series have been so helpful.

At the moment I am trying to determine how my model is affected by window size, but I want to approach it more systematically than just running empirical tests. I was just wondering, when you say “the temporal dependence in the input data can be learned,” how would this relate to the window size? Could you expand on this a little for me and maybe direct me to some papers.

Thank you 🙂

Reply
- Jason Brownlee July 9, 2017 at 10:27 am #
  
  Yes, see this post on autocorrelation:
  https://machinelearningmastery.com/gentle-introduction-autocorrelation-partial-autocorrelation/
  
  Reply
Thibault July 18, 2017 at 11:09 pm #

Hi Jason,

I am currently working for my internship on forecasting time series using neural networks, with Keras. Your blog has been so helpful so far.

But I still have one big question that I have some trouble to deal with; when you say “the promise of recurrent neural networks is that the temporal dependence in the input data can be learned. That a fixed set of lagged observations does not need to be specified”, I feel a contradiction on how we train the model.
For instance, let’s suppose we have a univariate time series x1, x2, …
The goal is to predict at the current time the next two values. In several of your posts using Keras, you shape your inputs/outputs in this way:

[x1, x2, x3, … , xt] -> [xt+1, xt+2]
[x2, x3, x4, … , xt+1] -> [xt+2, xt+3]
.
.
.

But we are clearly using a time window to predict the next values ? I think I have some troubles to understand the difference between the temporal dependence you are talking about, and the temporal dependence induced by the size of the window …

Thank you

Reply
- Jason Brownlee July 19, 2017 at 8:25 am #
  
  Not really.
  
  We must provide sequence data to the model to learn. The vectorization of the data required by the library imposes some limit. In fact, BPTT imposes a reasonable limit of 200-400 time steps anyway. These are not a window, but a limit on the number of time steps shown to the model before updating weights during training.
  
  It is different to a window of 5 or 10 time steps.
  
  That being said, LSTMs may not be the best choice for autoregression problems:
  https://machinelearningmastery.com/suitability-long-short-term-memory-networks-time-series-forecasting/
  
  Reply
  - Thibault July 25, 2017 at 7:51 pm #
    
    Hi,
    
    Thank you for your answer Jason, it’s starting to get a little clearer!
    
    Concerning your last remark; LSTM’s would not be the best choice for autoregression problems, but many of your blog posts are related to time series forecasting with LSTMs. Is it a recent analysis you made ? Or am I missing something here ?
    
    Thank you, Thibault
    
    Reply
    - Jason Brownlee July 26, 2017 at 7:52 am #
      
      Yes, both experience and some research turned me against LSTMs for autoregression.
      
      This is the rationale why my latest book on LSTMs is not focused on time series, but instead on LSTM architectures.
      
      Reply
      - naga October 29, 2019 at 1:33 pm #
        
        Dear Dr. Jason,
        
        I have a couple of questions
        
        1. We preprocess our data in this format [x1, x2, x3, … , xt] -> [xt+1, xt+2]. What is the intuition when we change the value of t ( lag time-steps) to check if the accuracy changes? As you mentioned, the model weights are recalculated as per the above format.
        
        2. “LSTMs is not focused on time series, but instead on LSTM architectures” – Can you please elaborate on this statement?
      - Jason Brownlee October 29, 2019 at 1:51 pm #
        
        My comment was that my LSTM book is not about time series. But I have a book on deep learning for time series that is more appropriate.
        
        For more on LSTMs for time series see these tutorials:
        https://machinelearningmastery.com/start-here/#deep_learning_time_series
        
        And this book:
        https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/
  - TeenA October 6, 2020 at 12:16 am #
    
    I don’t think this is correct. The LSTM network is good at remembering dependencies *within* the sequence passed as input sample but does not keep memory *between* samples. So the input sequence must be crafted so that the dependencies are contained into it. So if there is something happening every 60 time steps and I break my sequence into 10-time-step samples, the LSTM won’t be able to learn this from the context
    
    Reply
    - Jason Brownlee October 6, 2020 at 6:55 am #
      
      State is preserved across samples until it is reset manually or automatically at the end of the batch.
      
      You can learn more by reviewing stateful vs stateless LSTMs:
      https://machinelearningmastery.com/start-here/#lstm
      
      Reply
Akash August 2, 2017 at 4:09 am #

Hi Jason, this is one of the best articles I have read about LSTM and I got the most out of it.

Reply
- Jason Brownlee August 2, 2017 at 7:56 am #
  
  Thanks Akash.
  
  Reply
Javier August 8, 2017 at 10:36 pm #

Hi Jason, firstly I’d like to thank you for sharing as much knowledge as you do in all of your blog posts, I’ve read many of them and they’ve been tremendously helpful.

I’m currently trying to come up with a model to predict the power output from a wind turbine. My dataset has about 35,000 entries with 3 input variables and 1 output variable. I’m using a 5 layer LSTM network since it’s the one that’s been giving me the best results so far.

My doubt is the following: I understand there’s “trends” in the output, in the sense that, the speed of the wind for instance doesn’t suddenly go from 100mph to 0mph in 1 second. Therefore the power being generated at time t will depend on the value of the input variables at time t, but it will also depend partly on the power generated at t-1, and t-2, and so on…

How can I account for this in my neural network? Does using LSTM layers already account for this or is there a way for me to improve the results of my model? Thanks in advance!!

Reply
- Jason Brownlee August 9, 2017 at 6:32 am #
  
  Hi Javier, you are describing “serial dependence” which is a core concept in time series problems.
  
  Consider reading up a little on autocorrelation and other time series concepts here:
  https://machinelearningmastery.com/start-here/#timeseries
  
  Reply
Ghaith September 5, 2017 at 5:53 am #

Hi,

Could you please try to help me if it’s possible.
I am training on my data using an RNN. How can I give evidence that I found the best network?
For example, can I make sure I have the best network by checking its predictions, or by plotting the regression or MSE?

regards,

Reply
- Jason Brownlee September 7, 2017 at 12:39 pm #
  
  Evaluating lots of other models and configurations will give you evidence that perhaps you have a good model.
  
  Reply
Ali September 13, 2017 at 5:52 pm #

Thanks for the nice blog post. I am just wondering, since there are some linear models such as seasonal ARIMA that model data based not only on previous observations but also on previous seasonal patterns, are RNNs capable of using seasonal information, as SARIMA models do, to model data? Are they able to remember those patterns if they occurred so long ago that they do not fit into the training window?

Reply
- Jason Brownlee September 15, 2017 at 12:01 pm #
  
  Yes, it is possible. I do not have a good example though.
  
  I have found MLPs outperform LSTMs on autoregression problems during my own testing.
  
  Reply
Shruti January 25, 2018 at 5:41 pm #

Hi Jason, Thank you for your post. I am implementing LSTM for time series forecasting. The length of my series is 300. I have applied vanilla LSTM, stacked LSTM, MLP, and ARIMA to forecast my weekly time series data but LSTM is not performing better than ARIMA and MLP. I have used the ‘adam’ optimizer as discussed in your post to train the LSTM. Can you please give me some tips to optimize it better? I have also applied regularization and varied the number of epochs.
However, on another data set of mine, which has a time series of length 60, the LSTM performed better, but just marginally.

Reply
- Jason Brownlee January 26, 2018 at 5:38 am #
  
  Generally I find MLPs outperform LSTMs on time series forecasting. Stick with the MLP.
  
  Reply
Faisal January 28, 2018 at 6:30 am #

Hello Jason,
I’ve data for patients with some inputs(demographics etc) and 1 or more outputs for each of their visits to the hospital. I’ve done some experiments using NN in Matlab by considering all patients together which may not make more sense. Now I would like to do some time series analysis for each patient and seeing their behavior. I’ve some problems though

1. The data is not enough like many patients have for example 2 or 3 visits and I’ve only few patients
2. Each patient has different no. of visits

I’m still trying to think how could I make a time series out of it? Which model to use? ARMAX, NarxNet, etc. and several others.

I would be glad if you could guide me to some solutions.

Thanks.

Reply
- Jason Brownlee January 28, 2018 at 8:28 am #
  
  There is no single best answer. I would recommend testing a suite of different methods to see what works for your specific data.
  
  Reply
  - Faisal January 29, 2018 at 4:40 am #
    
    Thanks for the reply but …
    Assume a simple and hypothetical scenario (similar to my problem) like
    
    ID VisitDate Weight Height LowBP HighBP
    1 Jan 1, 2010 76 5 76 119
    1 Mar 10, 2010 77 5 73 119
    1 July 1, 2010 76 5 76 120
    2 Feb 2, 2009 55 5.5 70 132
    2 Mar 5, 2009 60 5.5 70 132
    2 Aug 2, 2009 57 5.5 71 130
    ….
    
    I would like to predict LowBP and HighBP after 1 month, 2 months, etc.
    
    As you can see the baseline for each patient will be different and the interval is not equal as well. In addition to that I’ve less data and also some missing data.
    
    I would like your opinion about tackling this problem especially the initial stage of preparing the data. How could I make it a time series?
    
    Do I need to create 2 more columns as future values of LowBP and HighBP and copy the next record of each patient as a future value. In that case the last record will have no future value and I may need to delete it.
    
    After that how could I split for train and test data. Do I need to keep all the last records of each patient as test data.
    
    Once I get some initial start I can then try to apply different techniques and models. I was reading your post about time series and also somewhere about LSTM. I’ve Matlab 2016 and LSTM I believe supported from 2017. If needed I’ll try to update.
    
    Thanks again.
    
    Reply
    - Jason Brownlee January 29, 2018 at 8:19 am #
      
      Maybe this post will help:
      https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
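
One possible framing of the question above, pairing each visit with the next visit of the same patient, can be sketched as follows. This is an illustration under assumptions, not code from the linked post: the rows are taken from the example table in the question, only the two blood pressure columns are used, and the uneven spacing between visits is ignored.

```python
from itertools import groupby

# Rows from the example table above: (ID, LowBP, HighBP), in visit order.
visits = [
    (1, 76, 119), (1, 73, 119), (1, 76, 120),
    (2, 70, 132), (2, 70, 132), (2, 71, 130),
]

samples = []
# groupby expects the rows to already be sorted by patient ID, as they are here.
for patient_id, rows in groupby(visits, key=lambda row: row[0]):
    rows = list(rows)
    # Pair each visit with the next visit of the same patient. The last
    # visit has no future value, so it only ever appears as a target.
    for current, following in zip(rows, rows[1:]):
        samples.append({
            "id": patient_id,
            "input": current[1:],
            "target": following[1:],
        })

for sample in samples:
    print(sample)
```

Each patient with three visits yields two samples, and no sample mixes visits from different patients. The time gap between visits could be added as an extra input column if the irregular spacing matters.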
      
      Reply
      - Faisal January 30, 2018 at 8:48 am #
        
        Thanks again. In addition to that post I also read this
        https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
        
        and for sure will try to apply windows method and also walk-forward.
        
        Just trying to remove one confusion regarding prediction whether it will be for each ID or for all. What I mean whether I take for example first ID and try to create different models using walk forward say for 15 times. Then I take second ID and do the same for it. Correct me if I’m wrong doing this way I’ll get predictive models that are dependent on IDs. So what about if a new patient comes in? How do we predict for him? Do we need previous values for this patient and need to create again new models for this patient?
      - Jason Brownlee January 30, 2018 at 10:00 am #
        
        You have options. You can choose to model per patient, per patient group, across all patients or some other variation.
        
        There is no one way, so I’d encourage you to explore a few different framings of your problem and see what works best for your specific data.
Faisal January 31, 2018 at 12:15 am #

Ok I’ll try these different options.

Do you have any post for increasing the amount of data. I was thinking of copula or with some regression coefficients. What is your opinion?

As I mentioned earlier I’ve less data and per patient maybe around 3 to 4 records. I’m not sure whether I understand correctly and clear in my previous reply that when I use walk around to predict the next value and then use this value to predict the other next value. Are we actually increasing the amount of data? If yes then how the model can be validated?

Reply
- Jason Brownlee January 31, 2018 at 9:44 am #
  
  Perhaps it would be worth taking some time to really nail down what you want to predict and what data could be used:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  Reply
  - Faisal February 17, 2018 at 2:36 am #
    
    I’ve generated some synthetic data using uniform distribution with same mean and variance. Also I use copula fit to generate with Gaussian distribution. Now how do I verify that it is not biased. Is there any post?
    
    Another thing regarding creating the model patient wise. When I create model (for example NarNet) and predict for one patient and then move to next patient, how to merge the previous model with this patient. Or if I create separate model for each patient then how to merge to create a final single model?
    
    Reply
    - Jason Brownlee February 17, 2018 at 8:48 am #
      
      You can plot the residual errors to see if they are Gaussian.
      
      Perhaps ensemble, or perhaps combine data and train one large model. You are only limited by your imagination.
      
      Reply
navish February 27, 2018 at 6:21 pm #

Hi Jason,
I am new to neural networks. I have closing prices of a market index and I want to model volatility using an LSTM neural network. How do I implement an LSTM in R to forecast the volatility of that index? What steps should I take to get a good out-of-sample forecast (a robust method that prevents over/under fitting)? Is there any R code showing how to model a univariate time series using LSTM?

Reply
- Jason Brownlee February 28, 2018 at 6:02 am #
  
  I have a few posts on time series forecasting with LSTMs in Python that might help, start here:
  https://machinelearningmastery.com/start-here/#lstm
  
  Reply
Siddu March 17, 2018 at 12:58 am #

Hello Jason, nice article. I have a question: can I use RNNs for multi-sensor data fusion? I have data in the form of distances, velocities and acceleration.

Reply
- Jason Brownlee March 17, 2018 at 8:39 am #
  
  Perhaps. I am not familiar with the domain sorry. I would recommend searching on google scholar for some similar examples.
  
  RNNs are suited to sequence prediction generally:
  https://machinelearningmastery.com/sequence-prediction/
  
  Reply
GE April 3, 2018 at 4:41 pm #

Hi Jason. I am trying to wrap my head around the different activation methods for neurons and how that can be useful for a more complex LSTM-based RNN. I am building a market close price predictor using OHLC, volume, # trades, RSI values, and SMAs. SMA’s and OHLC are all prices, but I am still struggling on how to tell the LSTM that a high RSI value has an inverse relationship to price. All my data points are normalized to a 0-1 scale but it feels like the LSTM just grabs all the numbers and treats them the same because the predictions are too far off. Im thinking maybe feeding the price inputs to one LSTM, and the RSI to another that uses inverse activation or something might help. Any thoughts? Thank you. GE

Reply
- Jason Brownlee April 4, 2018 at 6:08 am #
  
  You might need a lot more data or a lot more training.
  
  Perhaps start with an MLP and move to LSTM only if it can outperform the MLP. Often for time series problems, the LSTM is not the right tool.
  
  Reply
Tushar Sinha June 29, 2018 at 5:06 am #

I’m working on a project to predict the usage of all the files in a filesystem in near future based on the metadata of the file system for past 6 months. I’ve got the following attributes about the files with me :
1. The temporal sequence of file usage for last 6 months(whenever the file was read/written/modified and by whom)
2. All the users who are on the server and can access the files.
3. Last modified/written/read epoch time and by whom
4. File creation epoch time and by whom
5. Any compliance regulations on the file(whether the file contains any confidential data)
6. Size, name, extension, version, type of the file
7. Number of users who can access the file
8. File path
9. Total number of times accessed
10. Permitted users

Now, I plan to use LSTM but for standard LSTMs, the input is temporal sequence only. However, all the attributes that I have seem significant in predicting the future usage of the file.
How should I also make use of the attributes of the file that I have? Should I train a Feedforward Neural Network, disregarding the fact that it usually fails on temporal sequences? How should I proceed?
Does a variant of LSTM exist that can take into account the attributes of the file as well and predict the usage of the file in near future?

Thanks in advance 🙂 !!

Reply
- Jason Brownlee June 29, 2018 at 6:14 am #
  
  You could provide all features each time step or have a multi-headed model, with one head the sequence with an LSTM and another head a Dense with the vector.
  
  Try both and see which works best.
  
  Reply
  - Tushar Sinha June 29, 2018 at 9:07 am #
    
    Thanks for the reply Jason. Can you please point me to any research paper/ other resources which solve a similar problem?
    That can help me immensely in working my way out.
    
    Reply
    - Jason Brownlee June 29, 2018 at 3:25 pm #
      
      This might help as a start:
      https://machinelearningmastery.com/keras-functional-api-deep-learning/
      
      I do have a few more examples of multi-headed models on the blog.
      
      Reply
Bi JW July 14, 2018 at 9:51 pm #

Hi Jason,

Great article! Thank you for sharing!!

You said that “Feed-forward neural networks do offer great capability but still suffer from this key limitation of having to specify the temporal dependence upfront in the design of the model. And LSTM can overcome this limitation”
However, it seems that the parameter ‘timestep’ should be selected when using LSTM. Therefore, I wonder whether the ‘timestep’ is related to the ‘ temporal dependence’ or ‘the number of lag observations’? And how to select the ‘timestep’ when using LSTM?

Thank you!!

Reply
- Jason Brownlee July 15, 2018 at 6:17 am #
  
  It is the number of lag observations, but it can vary from sample to sample via zero-padding. The network processes one time step at a time.
  
  Reply
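The zero-padding mentioned in the reply above can be sketched in a few lines. This is a dependency-free illustration, not the author's code: the helper name `pad_left` and the sample data are hypothetical. Keras offers a `pad_sequences` utility for the same job.

```python
def pad_left(sequences, maxlen, value=0.0):
    """Left-pad (or left-truncate) each sequence to exactly maxlen steps."""
    if maxlen < 1:
        raise ValueError("maxlen must be at least 1")
    padded = []
    for seq in sequences:
        recent = list(seq)[-maxlen:]  # keep the most recent observations
        padded.append([value] * (maxlen - len(recent)) + recent)
    return padded


# Three samples with 3, 2 and 1 lag observations respectively.
samples = [[10, 20, 30], [40, 50], [60]]
padded = pad_left(samples, maxlen=3)
print(padded)
```

Every sample now has the same number of time steps, so the samples can be stacked into one array, while the count of real lag observations still varies from sample to sample.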
JG October 7, 2018 at 10:10 pm #

Sometimes the words used to name a methodology or technique are hard for newcomers. Words such as “Recurrent”, “Recursive”, “Iterate”, and “Interact” all express the idea of repeating, and as labels for methods they could almost be interchanged. Until you practice with these new tools and ideas, you do not really get the “essence” of the concepts and methodology. Only working with them gives you the real clue, not necessarily the words that technical people use!

Reply
Lindsay Moir February 12, 2019 at 2:37 pm #

Your Task
For this lesson, you must suggest one capability from both Convolutional Neural Networks and Recurrent Neural Networks that may be beneficial in modeling time series forecasting problems.
The advantage of the CNN over the recurrent-type network is that, due to the convolutional structure of the network, the number of trainable weights is small, resulting in much more efficient training and prediction.
Recurrent neural networks are a type of neural network that add the explicit handling of order in input observations. Time series obviously are ordered by date.

Reply
- Jason Brownlee February 13, 2019 at 7:49 am #
  
  Agreed. Great note, thanks Lindsay.
  
  Reply
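The point about trainable weights in the comment above can be made concrete with the usual parameter-count formulas for a 1D convolutional layer and a standard LSTM layer (with biases, no peepholes). This is a rough sketch only: the layer sizes are hypothetical, and a smaller weight count says nothing by itself about forecast skill.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus biases of a 1D convolutional layer."""
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    """Weights plus biases of a standard LSTM layer (four gates)."""
    return 4 * ((input_dim + units) * units + units)


# Hypothetical layers on a univariate series: 64 filters vs. 64 units.
conv = conv1d_params(kernel_size=3, in_channels=1, filters=64)
lstm = lstm_params(input_dim=1, units=64)
print(conv, lstm)  # 256 16896
```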
abbas May 26, 2019 at 5:39 am #

Hi Jason,
You have a great blog. I am studying LSTMs and I use Matlab to classify a series into 2 classes. My series is semi-random. With Matlab I can predict the label with accuracy=80% and loss=0.4. I need an accuracy of more than 85%. Could you please help me find a suitable topic on your blog to do this work in Python with Keras? I’m confused by this blog.
My second question: is it possible to reach 90% accuracy in this case?
My apologies for my beginner English.
Thanks

Reply
- Jason Brownlee May 26, 2019 at 6:52 am #
  
  You can learn how to diagnose and improve deep learning model performance here:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Andrea R. June 23, 2019 at 6:53 am #

Hi Dr. Jason, I was wondering which lesson I should start with to learn more about multi-step forecasting with RNNs (I’m actually trying to train my neural network to predict the sales for the following 7 days given the week before). Any advice?
I’d really appreciate your answer. Thanks.

Reply
- Jason Brownlee June 24, 2019 at 6:19 am #
  
  Start right here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  More here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
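For the framing described in the question above (predict the following 7 days given the week before), the series first has to be cut into input/output windows. The following is a minimal sketch under stated assumptions: the helper name `split_sequence`, its parameters, and the sales figures are illustrative and not taken from the linked tutorials.

```python
def split_sequence(series, n_in, n_out):
    """Slide over the series, pairing n_in past steps with n_out future steps."""
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        mid = start + n_in
        X.append(series[start:mid])        # the week before
        y.append(series[mid:mid + n_out])  # the following 7 days
    return X, y


# 16 days of made-up daily sales: 1, 2, ..., 16.
daily_sales = list(range(1, 17))
X, y = split_sequence(daily_sales, n_in=7, n_out=7)
print(len(X))      # 3
print(X[0], y[0])
```

Each row of `X` is one sample of 7 lag observations and the matching row of `y` is the 7-step target, which is the shape a vector-output multi-step model would be trained on.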
Eva August 13, 2019 at 12:40 am #

Hi Dr. Jason, I have benefited greatly from your posts on RNNs and LSTMs. They are excellent! Many thanks.
I am also interested in the math behind RNNs and LSTMs. Could you please point me to a related post that deals with the mathematics in a simple manner? Regards

Reply
- Jason Brownlee August 13, 2019 at 6:11 am #
  
  This might help as a first step:
  https://machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/
  
  Reply
  - Eva August 13, 2019 at 12:27 pm #
    
    Thanks Dr. Jason
    
    Reply
Nikhil Paliwal October 9, 2019 at 3:48 pm #

There are many new techniques, such as 3D convolution, for forecasting using temporal information. Having to define the range or portion of information to consume is a major drawback. In such cases, RNNs have great scope.

Reply
- Jason Brownlee October 10, 2019 at 6:52 am #
  
  Thanks for sharing.
  
  Reply
Muhammad Asadullah Zahid December 15, 2019 at 7:00 am #

Hi Jason. Your articles are awesome. I am working on my Master’s thesis, which is essentially time series load forecasting in an SoS (System of Systems) environment. There are different machines in an industrial factory. As a black box, I have the time series data for every machine’s single model and, in addition, some influencing factors, e.g. temperature.
My task is to forecast the overall load of the factory by combining the individual forecasts of the models, BUT also considering the influencing factors that could be of interest in the overall forecast.
Would RNNs be a good approach for this?
Do you have any more ideas on such a topic?
Thanks in advance

Reply
- Jason Brownlee December 16, 2019 at 6:04 am #
  
  Thanks!
  
  I recommend this process:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  Reply
MAHESH MADHUSHAN May 25, 2020 at 1:10 pm #

Thank you very much Jason. I also read your article “Your First Deep Learning Project in Python with Keras Step-By-Step”. Do you have any article with an example project for building a neural network for time series forecasting?

Reply
- Jason Brownlee May 25, 2020 at 1:24 pm #
  
  Yes, hundreds. Start here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Rafael Magallanes-Quintanar July 19, 2020 at 3:31 am #

Thanks for your great post. I think I am heading in the right direction: I want to predict a Standardized Precipitation Index that depends on several climatological variables. So, applying an LSTM may be the solution.

Reply
- Jason Brownlee July 19, 2020 at 6:34 am #
  
  Perhaps these tutorials will help you to get started:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
