Time Series Prediction With Deep Learning in Keras

Time series prediction is a difficult problem both to frame and to address with machine learning.

In this post, you will discover how to develop neural network models for time series prediction in Python using the Keras deep learning library.

After reading this post you will know:

  • About the airline passengers univariate time series prediction problem.
  • How to phrase time series prediction as a regression problem and develop a neural network model for it.
  • How to frame time series prediction with a time lag and develop a neural network model for it.

Let’s get started.

  • Update Oct/2016: Replaced graphs with more accurate versions, commented on the limited performance of the first method.
  • Update Mar/2017: Updated example for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.

Problem Description

The problem we are going to look at in this post is the international airline passengers prediction problem.

This is a problem where, given a year and a month, the task is to predict the number of international airline passengers in units of 1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144 observations.

The dataset is available for free from the DataMarket webpage as a CSV download with the filename “international-airline-passengers.csv”.

Below is a sample of the first few lines of the file.
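
For example, the first records of the series look like this (the exact header text may vary with the download source):

"Month","International airline passengers: monthly totals in thousands"
"1949-01",112
"1949-02",118
"1949-03",132
"1949-04",129
"1949-05",121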

We can load this dataset easily using the Pandas library. We are not interested in the date, given that each observation is separated by the same interval of one month. Therefore, when we load the dataset, we can exclude the first column.

The downloaded dataset also has footer information that we can exclude with the skipfooter argument to pandas.read_csv() set to 3 for the 3 footer lines. Once loaded we can easily plot the whole dataset. The code to load and plot the dataset is listed below.
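
A minimal sketch of that code, assuming the filename above and using matplotlib for the plot:

import pandas
import matplotlib.pyplot as plt

# load the passenger counts only; skipfooter requires the python parsing engine
dataframe = pandas.read_csv('international-airline-passengers.csv', usecols=[1], engine='python', skipfooter=3)
plt.plot(dataframe)
plt.show()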

You can see an upward trend in the plot.

You can also see some periodicity to the dataset that probably corresponds to the northern hemisphere summer holiday period.

Plot of the Airline Passengers Dataset

We are going to keep things simple and work with the data as-is.

Normally, it is a good idea to investigate various data preparation techniques to rescale the data and to make it stationary.


Multilayer Perceptron Regression

We want to phrase the time series prediction problem as a regression problem.

That is, given the number of passengers (in units of thousands) this month, what is the number of passengers next month?

We can write a simple function to convert our single column of data into a two-column dataset: the first column contains this month’s (t) passenger count and the second column contains next month’s (t+1) passenger count, to be predicted.

Before we get started, let’s first import all of the functions and classes we intend to use. This assumes a working SciPy environment with the Keras deep learning library installed.
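
For example, the imports used through the rest of this tutorial might be:

import math
import numpy
import pandas
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense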

Before we do anything, it is a good idea to fix the random number seed to ensure our results are reproducible.
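
For example (the particular seed value is arbitrary, and exact reproducibility also depends on the backend):

# fix random seed for reproducibility
numpy.random.seed(7)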

We can also use the code from the previous section to load the dataset as a Pandas dataframe. We can then extract the NumPy array from the dataframe and convert the integer values to floating point values, which are more suitable for modeling with a neural network.
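
Something like the following, reusing the read_csv() call from the previous section:

# load the dataset, keeping only the passenger counts
dataframe = pandas.read_csv('international-airline-passengers.csv', usecols=[1], engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')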

After we model our data and estimate the skill of our model on the training dataset, we need to get an idea of the skill of the model on new unseen data. For a normal classification or regression problem we would do this using cross validation.

With time series data, the sequence of values is important. A simple method that we can use is to split the ordered dataset into train and test datasets. The code below calculates the index of the split point and separates the data into a training dataset with 67% of the observations that we can use to train our model, leaving the remaining 33% for testing.
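
A sketch of that split:

# split into train and test sets, preserving temporal order
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
print(len(train), len(test))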

Now we can define a function to create a new dataset as described above. The function takes two arguments: the dataset, a NumPy array that we want to convert, and the look_back, which is the number of previous time steps to use as input variables to predict the next time period (in this case defaulted to 1).

This default will create a dataset where X is the number of passengers at a given time (t) and Y is the number of passengers at the next time (t + 1).

It can be configured and we will look at constructing a differently shaped dataset in the next section.
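
A version of create_dataset() consistent with that description:

# convert an array of values into a dataset of look_back inputs (ending at t) and an output (t+1)
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)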

Let’s take a look at the effect of this function on the first few rows of the dataset.
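
With look_back=1, the first five rows come out as:

X	Y
112	118
118	132
132	129
129	121
121	135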

If you compare these first 5 rows to the original dataset sample listed in the previous section, you can see the X=t and Y=t+1 pattern in the numbers.

Let’s use this function to prepare the train and test datasets ready for modeling.
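
For example:

# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)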

We can now fit a Multilayer Perceptron model to the training data.

We use a simple network with 1 input, 1 hidden layer with 8 neurons, and an output layer. The model is fit using mean squared error which, if we take the square root, gives us an error score in the units of the dataset.

I tried a few rough parameters and settled on the configuration below, but by no means is the listed network optimized.
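
A sketch of that configuration; the optimizer, epoch count, and batch size shown here are plausible settings rather than the exact values used:

# define and fit the Multilayer Perceptron model
model = Sequential()
model.add(Dense(8, input_dim=look_back, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=200, batch_size=2, verbose=2)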

Once the model is fit, we can estimate the performance of the model on the train and test datasets. This will give us a point of comparison for new models.
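
For example, using the model’s loss (mean squared error) as the score:

# estimate model performance on the train and test data
trainScore = model.evaluate(trainX, trainY, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))
testScore = model.evaluate(testX, testY, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))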

Finally, we can generate predictions using the model for both the train and test datasets to get a visual indication of the skill of the model.

Because of how the dataset was prepared, we must shift the predictions so that they align on the x-axis with the original dataset. Once prepared, the data is plotted, showing the original dataset in blue, the predictions for the train dataset in green, and the predictions on the unseen test dataset in red.
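
One way to do the shifting and plotting; the test block starts look_back*2+1 points after the train predictions because each framing drops look_back+1 observations:

# generate predictions for the train and test inputs
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict) + look_back, :] = trainPredict

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict) + (look_back * 2) + 1:len(dataset) - 1, :] = testPredict

# plot the original dataset and both sets of predictions
plt.plot(dataset)
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()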

We can see that the model did a pretty poor job of fitting both the training and the test datasets. It basically predicted the same input value as the output.

Naive Time Series Predictions With Neural Network
Blue=Whole Dataset, Green=Training, Red=Predictions

For completeness, the snippets above can be assembled, in order, into a single script.

Running the model prints the mean squared error for the train and test datasets.

Taking the square root of the performance estimates, we can see that the model has an average error of 23 passengers (in thousands) on the training dataset and 48 passengers (in thousands) on the test dataset.

Multilayer Perceptron Using the Window Method

We can also phrase the problem so that multiple recent time steps can be used to make the prediction for the next time step.

This is called the window method, and the size of the window is a parameter that can be tuned for each problem.

For example, to predict the value at the next time in the sequence (t+1), we can use the current time (t) as well as the two prior times (t-1 and t-2) as inputs.

When phrased as a regression problem the input variables are t-2, t-1, t and the output variable is t+1.

The create_dataset() function we wrote in the previous section allows us to create this formulation of the time series problem by increasing the look_back argument from 1 to 3.

A sample of the dataset with this formulation looks as follows:
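
X1	X2	X3	Y
112	118	132	129
118	132	129	121
132	129	121	135
129	121	135	148
121	135	148	148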

We can re-run the example in the previous section with the larger window size. We will increase the network capacity to handle the additional information. The first hidden layer is increased to 14 neurons and a second hidden layer is added with 8 neurons. The number of epochs is also increased to 400.

Only a few lines change relative to the previous section: the window size, the extra hidden layer, and the number of epochs.
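
A sketch of the changed portion, assuming the rest of the script from the previous section is reused as-is:

# window formulation: use the three prior time steps as input
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# larger network to handle the wider input window
model = Sequential()
model.add(Dense(14, input_dim=look_back, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=400, batch_size=2, verbose=2)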

Running the example reports the error on the train and test datasets.

We can see that the error was not significantly reduced compared to that of the previous section.

Looking at the graph, we can see more structure in the predictions.

Again, the window size and the network architecture were not tuned; this is just a demonstration of how to frame a prediction problem.

Taking the square root of the performance scores we can see the average error on the training dataset was 23 passengers (in thousands per month) and the average error on the unseen test set was 47 passengers (in thousands per month).

Window Method For Time Series Predictions With Neural Networks
Blue=Whole Dataset, Green=Training, Red=Predictions

Summary

In this post, you discovered how to develop a neural network model for a time series prediction problem using the Keras deep learning library.

After working through this tutorial you now know:

  • About the international airline passenger prediction time series dataset.
  • How to frame time series prediction problems as regression problems and develop a neural network model.
  • How to use the window approach to frame a time series prediction problem and develop a neural network model.

Do you have any questions about time series prediction with neural networks or about this post?
Ask your question in the comments below and I will do my best to answer.




158 Responses to Time Series Prediction With Deep Learning in Keras

  1. Steve Buckley August 13, 2016 at 2:12 am #

    Hi Jason,

    This is a new tool for me so an interesting post to get started!

    It looks to me like your plot for the first method is wrong. As you’re only giving the previous time point to predict the next, the model is going to fit (close to) a straight line and won’t pull out the periodicity your plot suggests. The almost perfect fit of the red line to the blue line also doesn’t reflect the much worse fit suggested in the model score!

    Hope that’s helpful.

  2. Curious George August 17, 2016 at 2:29 am #

    Hi Jason,

    How can you use this technique to forecast into the future?

    Thanks!

    • Jason Brownlee August 17, 2016 at 9:52 am #

      This example is forecasting t+1 in the future.

      • Curious George August 25, 2016 at 1:18 am #

        In order to forecast t+2, t+3, t+n…., is it recommended to use the previous prediction (t+1) as the assumed data point.

        For example, if I wanted to forecast t+2, I would use the available data including my prediction at t+1.

        I understand that the error would increase the further out the forecast due to relying on predictions as data points.

        Thoughts?

        • Jason Brownlee August 25, 2016 at 5:05 am #

          Yes, using this approach will provide multiple future data points. As you suggest, the further in the future you go, the more likely errors are to compound.

          Give it a go, it’s good to experiment with these models and see what they are capable of.

      • Curious George August 25, 2016 at 1:51 am #

        Also, when running the full code snippet using the window method, the graph produced does not match the one shown.

        This is what I’m getting

        http://imgur.com/a/NaoYE

        • Jason Brownlee August 25, 2016 at 5:06 am #

          I did update the plotting code with a minor change and did not update the images accordingly. I will update them ASAP.

      • Andy March 1, 2017 at 6:03 am #

        Could you show an example where maybe there was a couple more features. So, say you wanted to predict how many passengers, and you knew about temperature and day of the week (Mon-Sun).

        • Jason Brownlee March 1, 2017 at 8:46 am #

          Hi Andy,

          Yes, I am working on more sophisticated time series tutorials at the moment, they should be on the blog soon.

          • Soren Pallesen June 9, 2017 at 6:18 pm #

            Look forward to these time series forecast with multiple features examples – when do you expect to post them to your blog?

            As always thx for this valuable resource and for sharing your experience !

          • Jason Brownlee June 10, 2017 at 8:19 am #

            Perhaps a month. No promises. I am taking my time to ensure they are good.

      • Ramzan Shahid November 10, 2017 at 4:51 am #

        Sir please, share some tutorial on tensorflow and what are the differences to make models in tensorflow and keras. thanks

        • Jason Brownlee November 10, 2017 at 10:41 am #

          Tensorflow is like coding in assembly, Keras is like coding in Python.

          Keras is so much simpler and makes you more productive, but gives up some speed and flexibility, a worthy trade-off for most applications.

      • shahid January 17, 2018 at 6:06 am #

        sir can you have done any example for more than one column for time series prediction like stock data? If yes, please share the link of that. Thanks

  3. Keshav Mathur August 30, 2016 at 7:20 am #

    Hello,

    Thank you for a great article. I have a big doubt and also related to the plot posted in the earlier comment which shows a sort of lag in the prediction. Here we are training the model on t to get predictions for t+1.

    Given this I would assume that when the model sees an input of 112 it should predict around 118 (first data point in the training set). But that’s not what the predictions show. Copying the top 5 train points and their subsequent predictions generated by the code given in this post for the first example:

    trainX[:5] trainPredict[:5]
    [ 112.], [112.56],
    [ 118.], [118.47],
    [ 132.], [132.26],
    [ 129.], [129.55],
    [ 121.] [121.57],

    I am trying to understand from a model perspective as to why is it predicting with a lag?

    • Jason Brownlee October 9, 2016 at 10:54 am #

      Thanks Keshav, I have updated the description and the graphs.

  4. Jev September 5, 2016 at 7:17 am #

    Just as Steve Buckley pointed out, your first method seems to be wrong. The model indeed just fits a straight line ( yPred = a*X+b) , which can be verified by calculating predictions on an input such as arange(200).
    Because you shift the results afterwards before plotting, the outcome seems very good. However, from a conceptual point of view, it should be impossible to predict X_t+1 correctly based on only X_t, as the latter contains no trend or seasonal information.

    Here is what I’ve got after trying to reproduce your results:

    X Y yPred
    0 112.0 118.0 112.897537
    1 118.0 132.0 118.847107
    2 132.0 129.0 132.729446
    3 129.0 121.0 129.754669
    ….

    as you can see, the yPred is way off ( it should be equal to Y), but looks good when shifted one period.

    • Jason Brownlee October 9, 2016 at 10:55 am #

      Yep, right on Jev, thanks. I have updated the description and the graphs.

  5. Max Clayer September 14, 2016 at 4:01 am #

    Hi, Jason

    I also have to agree with Jev, I would expect using predict(trainX) would give values closer to trainY values not trainX values.

    • Jason Brownlee October 9, 2016 at 10:56 am #

      They do Max, you’re right. I have updated the graphs to better reflect the actual predictions made.

  6. Himadri September 24, 2016 at 11:56 pm #

    Hi Jason,
    Thanks for such a wonderful tutorial!
    I was just wondering if in function create_dataset, there should be range(len(dataset)-1) in the loop. Hence for plotting logic, it should be:


    trainPredictPlot[lb:len(train),:] = trainPredict

    testPredictPlot[len(train)+lb:len(dataset),:] = testPredict

    I am just in a big confusion with the index and getting somewhat difference plot for look_back=3 : http://imgur.com/a/DMbOU

  7. Veltzer Doron September 26, 2016 at 6:06 pm #

    Hey, thanks for a most helpful tutorial, any ideas why this seems to work better than the time series predictions using RNNs and LSTM in the sister tutorial? My intuition predicts the opposite.

    • Jason Brownlee September 27, 2016 at 7:41 am #

      I’m glad you like it Veltzer.

      Great question, the LSTMs probably require more fine tuning I expect.

  8. Newbtothis September 29, 2016 at 12:02 pm #

    Hey there! Great blog and articles – the examples really help a lot! I’m new to this so excuse the stupid question if applicable – I want to predict the next three outputs based on the same input. Is that doable in the LSTM framework? This is for predicting the water temperature for the next 3 days.

    • Jason Brownlee September 30, 2016 at 7:48 am #

      Yes, this is called sequence to sequence prediction.

      I see two main options:

      – Run the LSTM 3 times and feed output as input.
      – Change the LSTM to output 3 numbers.

  9. Han September 30, 2016 at 11:37 am #

    This particular time-series has strong seasonality and looks exponential in trend. In reality, the growth rate of this time series is more important. Could you plot the year-on-year growth rate?

    • Jason Brownlee October 1, 2016 at 8:00 am #

      There would be benefit in modeling a stationary version of the data, I agree.

  10. Han October 1, 2016 at 2:44 am #

    I agree with Steve Buckley. The code is predicting x[i+1] = x[i] (approximately), that why the last part of code, which is supposed to fix the shift, couldn’t get the shift part right.

    Try the following: pick any point in your testX, say testX[i], use the model to predict testY[i], then instead of using testX[i+1], use testY[i] as the input parameter for model.predict(), and so on. You will end up with a nearly straight line.

    I’d thank you for your wonderful posts on neural network, which helped me a lot when learning neural network. However, this particular code is not correct.

  11. Jeremy October 5, 2016 at 1:58 pm #

    Thanks for great article! It is really helpful for me. I have one question. If I have two more variable, how can i do? Take example, my data looks like follow,
    date windspeed rain price
    20160101 10 100 1000
    20160102 10 80 1010

    I’d like to predict the price.

    • Jason Brownlee October 6, 2016 at 9:26 am #

      Hi Jeremy, each input would be a feature. You could then use the window method to frame multiple time steps of multiple features as new features.

      For example:

      • Shimin November 22, 2016 at 10:53 pm #

        Hi Jason,

        Thanks for your great explanation!

        I have one question like Jeremy’s. Is there any suggestion for me if I want to predict 2 variables? Data frame shown as below:

        Date X1 X2 X3 X4 Y1 Y2

        I want to predict Y1 and Y2. Also, Y1 and Y2 have some correlations.

        • Jason Brownlee November 23, 2016 at 8:59 am #

          hi Shimin,

          Yes, this is often called a sequence prediction problem in deep learning or a multi-step prediction problem in time series prediction.

          You can use an LSTM with two outputs or you can use an MLP with two outputs to model this problem. Be sure to prepare your data into this form.

          I hope that helps.

  12. Sunny October 18, 2016 at 9:09 am #

    Jason,
    Great writeup on using Keras for TS data. My dataset is something like below:

    Date Time Power1 Power2 Power3 Meter1 Meter2
    12/02/2012 02:53:00 2.423 0.118 0.0303 0.020 1.1000

    My feature vectors/predictors are Date, Time, Power1, Power2, Power3, Meter1. i am trying to predict Meter 2.

    I would like to instead of using MLP use RNN/LSTM for the above time series prediction.
    Can you please suggest if this is possible? And if yes, any pointers would help.
    thanks
    Sunny

  13. nicoad October 31, 2016 at 7:52 pm #

    Hello, nice tutorial.

    I have one question: it would be useful to have similar stuff on live data. Let’s say I have access to some real-time data (software downloads, stock prices…); would it require training the model each time new data is available?

    • Jason Brownlee November 1, 2016 at 7:59 am #

      I agree nicoad, a real-time example would be great. I’ll look into it.

      A great thing about neural networks is that they can be updated with new data and do not have to be re-trained from scratch.

  14. sherlockatszx November 8, 2016 at 3:36 am #

    Hi,your original post code is to use 1(or 3) dimension X to predict the later 1 dimension Y.how about I want to use 48 dimension X to predict 49th and 50th.what i mean is i increase the time unit i want to predict ,predict 3 or even 10 time unit . under such condition : does that mean i just change the output_dime of the last output layer :

    model.add(Dense(
    output_dim=3))

    Is that right?

    • Jason Brownlee November 8, 2016 at 9:58 am #

      Yes, that looks right. Let me know how you go.

      • sherlockatszx November 8, 2016 at 6:56 pm #

        Hi jason, I make a quick expriment in jupyter notebook and published in the github
        github:https://github.com/sherlockhoatszx/TimeSeriesPredctionUsingDeeplearning
        the code could work.
        However If you look very carefully of the trainPredict data(IN[18] of the notebook).

        the first 3 array is:
        array([[ 128.60112 , 127.5030365 ],
        [ 121.16256714, 122.3662262 ],
        [ 144.46884155, 145.67802429]

        the list inside [128.6, 127.5] [121.2, 122.3] does not look like t+1 and t+2.
        **Instead,** it looks like 2 probable predictions for 1 unit.
        What I mean is [128.6, 127.5] doesn’t mean t+1 and t+2 predictions; it most probably means 2 possible predictions for t+1.
        one output cell with 2dimension and 2 output cell with 1 dimension is different.
        I discussed it with other guy in github .
        https://github.com/Vict0rSch/deep_learning/issues/11
        It seems i should use seq2seq or use timedistributed wrapper .

        I stilll explored this and have not got one solution .

        What is your suggestion?

        • Jason Brownlee November 9, 2016 at 9:49 am #

          That does sound like good advice. Treat the problem as sequence to sequence problem.

  15. sherlockatszx November 8, 2016 at 8:28 pm #

    hi jason , I made a experiment on the jupyter notebook and published on the github .The code could output 2 columns data.
    https://github.com/sherlockhoatszx/TimeSeriesPredctionUsingDeeplearning/blob/master/README.md

    However! If you look very carefully of the trainPredict data(IN[18] of the notebook).

    the first 3 array is:
    array([[ 128.60112 , 127.5030365 ],
    [ 121.16256714, 122.3662262 ],
    [ 144.46884155, 145.67802429]

    the list inside [128.6, 127.5] [121.2, 122.3] does not look like t+1 and t+2.
    **Instead,** it looks like 2 probable predictions for 1 unit.
    What I mean is [128.6, 127.5] doesn’t mean t+1 and t+2 predictions; it most probably means 2 possible predictions for t+1.
    1 output cell with 2 dimension and 2 output cell with 1 dimension is different.
    The input dimension and the output dimension will be tricky for the NN.

  16. Xiao November 16, 2016 at 1:59 am #

    Thanks Jason for the conceptual explaining. I have one question about the KERAS package:

    It looks you input the raw data (x=118 etc) to KERAS. Do you know whether KERAS needs to standardize (normalize) the data to (0,1) or (-1,1) or some distribution with mean of 0?

    — Xiao

    • Jason Brownlee November 16, 2016 at 9:32 am #

      Great question Xiao,

      It is a good idea to standardize data or normalize data when working with neural networks. Try it on your problem and see if it affects the performance of your model.

      • Satoshi Report November 19, 2016 at 2:24 pm #

        Wasn’t the data normalized in an early version of this post?

        • Jason Brownlee November 22, 2016 at 6:46 am #

          I don’t believe so Satoshi.

          Normalization is a great idea in general when working with neural nets, though.

  17. charith December 12, 2016 at 8:11 pm #

    I keep getting this error dt = datetime.datetime.fromordinal(ix).replace(tzinfo=UTC)

    ValueError: ordinal must be >= 1

    • Jason Brownlee December 13, 2016 at 8:05 am #

      Sorry charith, I have not seen this error before.

  18. Trex January 7, 2017 at 8:07 am #

    In Your text you say, the window size is 3, But in Your Code you use loop_back = 10 ?

    • Jason Brownlee January 7, 2017 at 8:41 am #

      Thanks Trex.

      That is a typo from some experimenting I was doing at one point. Fixed.

  19. Trex January 7, 2017 at 11:44 am #

    No problem,

    I have another question:

    what the algorithm now does is predict 1 value. I want to predict with this MLP like n-values.

    How should this work?

    • Jason Brownlee January 8, 2017 at 5:17 am #

      Reframe your training dataset to match what you require and change the number of neurons in the output layer to the number of outputs you desire.

  20. Mansolo January 7, 2017 at 12:47 pm #

    Hey Sir,

    great Tutorial.

    I am trying to build a NN for Time-Series-Prediction. But my Datas are different than yours.

    I want to predict a whole next day. But a whole day is defined as 48 values.

    Some lines of the Blank datas:
    2016-11-10 05:00:00.000 0
    2016-11-10 05:30:00.000 0
    2016-11-10 06:00:00.000 1
    2016-11-10 06:30:00.000 3
    2016-11-10 07:00:00.000 12
    2016-11-10 07:30:00.000 36
    2016-11-10 08:00:00.000 89
    2016-11-10 08:30:00.000 120
    2016-11-10 09:00:00.000 209
    2016-11-10 09:30:00.000 233
    2016-11-10 10:00:00.000 217
    2016-11-10 10:30:00.000 199
    2016-11-10 11:00:00.000 244

    There is a value for each half an hour of a whole day.

    i want to predict the values for every half an hour for the next few days. How could this work?

  21. Hem January 8, 2017 at 3:43 pm #

    Could you do an example for a Multivariate Time Series? 🙂

    • Jason Brownlee January 9, 2017 at 7:48 am #

      Yes, there are some tutorials scheduled on the blog. I will link to them once they’re out.

  22. Bonje January 12, 2017 at 12:19 am #

    Why doesn’t the ReLU activation function need the input data to be normalized between 0 and 1?

    If I use the sigmoid activation function, the input data must be normalized.

    But why doesn’t ReLU need that?

  23. Bonje January 12, 2017 at 1:10 am #

    Another Question:

    Your Input Layer uses reLu as activition Function.
    But why has your Output Layer no activition Function? Is there a default activition function which keras uses if you give one as parameter? if yes, which is it? if no, why is it possible to have a Layer without a activition function in it?

    Thanks 🙂

    • Jason Brownlee January 12, 2017 at 9:31 am #

      Yes the default is Linear, this is a desirable activation function on regression problems.

  24. Bonje January 12, 2017 at 1:11 am #

    dont give one as parameter*

  25. Dmitry N. Medvedev February 4, 2017 at 1:11 pm #

    A stupid question, sir.

    Suppose I have a dataset with two fields: “date” (timestamp), “amount” (float32) describing a year.

    on the first day of each month the amount is set to -200.

    This is true for 11 months, except for the 12th (December).

    Is there a way to train a NN so that it returns 12, marking the December as not having such and amount on its first day?

    • Jason Brownlee February 5, 2017 at 5:15 am #

      Sorry Dmitry, I’m not sure I really understand your question.

      Perhaps you’re able to ask it a different way or provide a small example?

  26. Thomas Durant February 9, 2017 at 3:36 pm #

    Is it common to only predict the single next time point? Or are there times/ways to predict 2,3, and 4 times points into the future, and if so, how do you assess performance metrics for those predictions?

    • Jason Brownlee February 10, 2017 at 9:50 am #

      Good question Thomas.

      The forecast time horizon is problem specific. You can predict multiple steps with a MLP or LSTM using multiple neurons in the output layer.

      Evaluation is problem specific but could be RMSE across the entire forecast or per forecast lead time.

  27. zhou February 28, 2017 at 8:34 pm #

    thanks for Jason’s post, I benefit a lot from it. now I have a problem:how can I get the passengers in 1961-01? anticipates your reply.

    • Jason Brownlee March 1, 2017 at 8:36 am #

      You can train your model on all available data, then call model.predict() to forecast the next out of sample observation.

      • zhou March 3, 2017 at 1:42 pm #

        it seems the model can’t forecast the next month in future?

        • Jason Brownlee March 6, 2017 at 10:41 am #

          What do you mean exactly zhou?

          • zhou March 8, 2017 at 4:39 pm #

            sorry. I want to forecast the passengers in future, what should I do?

  28. Viktor March 2, 2017 at 12:08 am #

    Thanks for the tutorial, Jason. it’s very useful. It would be nice to also know how you chose the different parameters for MLP, and you’d go about optimizing them.

  29. 0xKA March 6, 2017 at 7:35 pm #

    In the first case. If I shift model to the left side, it will be a good model for forecasting because predicted values are quite fit the original data. Is it possible to do that ?

  30. Sphurti March 22, 2017 at 4:29 pm #

    Is there any specific condition to use activation functions? how to deside which activation function is more suitable for linear or nonlinear datasets?

    • Jason Brownlee March 23, 2017 at 8:47 am #

      There are some rules.

      Relu in hidden because it works really well. Sigmoid for binary outputs, linear for regression outputs, softmax for muti-class classification.

      Often you can transform your data for the bounds of a given activation function (e.g. 0,1 for sigmoid, -1,1 for tanh, etc.)

      I hope that helps as a start.

      • Sphurti March 23, 2017 at 5:51 pm #

        how to decide the optimizer? Is there any relevance with activation function?

        • Jason Brownlee March 24, 2017 at 7:53 am #

          Not really. It’s a matter of taste it seems (speed vs time).

  31. John March 29, 2017 at 1:37 am #

    What kind of validation are you using in this tutorial? is it cross validation?

  32. Sphurti March 29, 2017 at 3:31 pm #

    Is there any another deep learning algorithms that can be used for time series prediction? why to prefer multilayer perceptron for time series prediction?

    • Jason Brownlee March 30, 2017 at 8:47 am #

      Yes, you can use Long Short-Term Memory (LSTM) networks.

  33. Qiushi Wang April 3, 2017 at 5:22 pm #

    Hi Jason,

    I always have a question, if we only predict 1 time step further (t+1), the accurate predicted result is just copy the value of t, as the first figure shows. When we add more input like (t-2, t-1, t), the predicted result get worse. Even compare with other prediction method like ARIMA, RNN, this conclusion perhaps is still correct. To better exhibit the power of these prediction methods, should we try to predict more time steps further t+2, t+3, …?

    Thanks

    • Jason Brownlee April 4, 2017 at 9:13 am #

      It is a good idea to make the input data stationary and scale it. Then the network needs to be tuned for the problem.

  34. Stephan Oelze April 10, 2017 at 1:56 am #

    Dear Jason.

    Thanks for sharing your information here. Anyway i was not able to reproduce your last figure. On my machine it still looks like the “bad” figure.

    https://www2.pic-upload.de/img/32978063/auto.png

    I used the code as stated above. Where is my missunderstanding here?

    https://pastebin.com/EzvjnvGv

    Thank You!
    silly me 🙂

  35. trupti April 11, 2017 at 3:22 pm #

    thanks for this post..actually I am referring this for my work. my dataset is linear. Can I use softplus or elu as an activation function for linear data?

    • Jason Brownlee April 12, 2017 at 7:50 am #

      Yes, but your model may be more complex than is needed. In fact, you may be better off with a linear model like Linear Regression or Logistic Regression.

  36. ikok April 20, 2017 at 8:31 am #

    Firstly thanks Jason, I try MLP and LSTM based models on my time series data, and I get some RMSE values. ( e.g. train rmse 10, and test 11) (my example count 1400, min value:21, max value 210 ) What is acceptance value of RMSE. ?

    • Jason Brownlee April 20, 2017 at 9:35 am #

      Nice work!

      An acceptable RMSE depends on your problem and how much error you can bear.

  37. Dmitry April 21, 2017 at 6:58 pm #

    Great article, thank you.
    Is it possible to make a DNN with several outputs? For example the output layer has several neurons responsible for different flight directions. What difficulties can arise?

    • Jason Brownlee April 22, 2017 at 9:25 am #

      Yes, try it.

      Skill at future time steps often degrades quickly.

  38. piemonsparrow April 21, 2017 at 10:50 pm #

    Hello, Jason, i am a student, recently i am learning from your blog. Could you make a display deep learning model training history in this article? I will be very appreciated if you can, because i am a newer. Thank you!

  39. Hans April 23, 2017 at 2:09 pm #

    Does anybody have an idea/code snippet how to store observations of this example code in a variable, so that the variable can be used to to make predictions beyond the airline dataset (one step in the future)?

  40. Hans April 24, 2017 at 3:20 pm #

    Would it be logical incorrect to extend the testX-Array with for example [0,0,0] to forecast unseen data/ a step in the future?

    • Jason Brownlee April 25, 2017 at 7:45 am #

      It would not be required.

      Fit your model on all available data. When a new observation arrives, scale it appropriately, gather it with the other lag observations your model requires as input and call model.predict().

  41. Hans April 24, 2017 at 4:45 pm #

    Is there a magic trick to get the right array-format for a prediction based on observations?
    I always get the wrong format:


    obsv1 = testPredict[4]
    obsv2 = testPredict[5]
    obsv3 = testPredict[6]

    dataset = obsv1, obsv2, obsv3
    dataX = []
    dataX.append(dataset)
    #dataX.append(obsv2)
    #dataX.append(obsv3)
    myNewX = numpy.array(dataX)

    • Hans April 24, 2017 at 5:05 pm #

      Update:

      After several days I manged to make a prediction on unseen data in this example (code below).
      Is this way correct?
      How many observations should be used to get a good prediction on unseen data.
      Are there standard tools available to measure corresponding performances and suggest the amount of observations?
      Would this topic the same as choosing the right window-size for time-series analysis, or where would be the difference?

      Code:

      obsv1 = float(testPredict[4])
      obsv2 = float(testPredict[5])
      obsv3 = float(testPredict[6])

      dataX = []
      myNewX = []
      dataX.append(obsv1)
      dataX.append(obsv2)
      dataX.append(obsv3)
      myNewX.append(dataX)
      myNewX = numpy.array(myNewX)

      futureStepPredict = model.predict(myNewX)
      print(futureStepPredict)

      • Jason Brownlee April 25, 2017 at 7:48 am #

        Looks fine.

        The number of obs required depends on how you have configured your model.

        The “best” window size for a given problem is unknown, you must discover it through trial and error, see this post:
        http://machinelearningmastery.com/a-data-driven-approach-to-machine-learning/

        • Hans April 28, 2017 at 1:04 pm #

          Is there a method or trial and error-strategy to find out how many lag observations are ‘best’ for a forecast of unseen data?
          Is there a relation between look_back (window size) and lag observations?
          In theory I could use all observations to predict one step of unseen data. Would this be useful?

      • Hans May 31, 2017 at 8:38 pm #

        If I fill the model with 3 obs, I get 3 predictions/data points of unseen data.

        If I only want to predict one step in the future, should I build an average of the resulting 3 predictions,
        or should I simply use the last of the 3 prediction steps?

        Thank you.

        • Jason Brownlee June 2, 2017 at 12:46 pm #

          I would recommend changing the model to make one prediction if only one time step prediction is required.

          • Hans June 2, 2017 at 7:49 pm #

            How would you change the Multilayer Perceptron model of this site in this regard?

      • Hans June 9, 2017 at 9:04 pm #

        I have a misconception here. Don’t do the same fellow reader!

        With “obsv(n) = float(testPredict[n])” I took predictions of the test dataset as observations.

        THAT’S WRONG!

        Instead we take a partition of the original raw data as x/observations to predict unseen data, with a trained/fitted model- IN EVERY CASE.

        Like in R:
        http://machinelearningmastery.com/finalize-machine-learning-models-in-r/#comment-401949

        Is this right Jason?

    • Jason Brownlee April 25, 2017 at 7:46 am #

      If you need a 2D array with 1 row and 2 columns, you can do something like:

  42. Md. Armanur Rahman April 28, 2017 at 2:48 pm #

    Hello Sir,

    This is Arman from Malaysia. I am a student of Multimedia University. I want to do “Self-Tuning performance of Hadoop using Deep Learning”. So which framework I will consider for this sort of problem. as like DBM, DBN , CNN, RNN ?

    I need your suggestion.

    With best regards
    Arman

  43. Hans May 1, 2017 at 6:41 pm #

    Are there any more concerns about this code. Or is it updated and technical correct now?

    • Jason Brownlee May 2, 2017 at 5:57 am #

      We can always do things better.

      For this example, I would recommend exploring providing the data as time steps and explore larger networks fit for more epochs.

      • Hans May 2, 2017 at 11:31 pm #

        Hm, I’m not sure if I understand it right.

        I believe I’m already feeding it with time-step like so:

        return datetime.strptime(x, ‘%Y-%m-%d’)

        My raw data items have a decent date column. Is this what you meant?

        How do we explore larger networks fit for more epochs?

        I have everything parameterized in a central batch file now (pipeline).

        Should I increase the epochs for…

        model.fit(trainX, trainY, epochs=myEpochs, batch_size=myBatchSize, verbose=0)

        Thank you.

  44. Hans May 1, 2017 at 6:58 pm #

    I’m trying to adapt some code from:

    http://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/

    …and build the variable EXPECTED in the context of this script.
    Unfortunately I don’t know how to do it right. I’m a little bit frustrated at this point.


    for i in range(len(test)): <-- what should I better use here?
    expected = dataset[len(train) + i + 1] <-- what should I better use here?
    print(expected)

    This looks cool so far, could I use the index to retrieve a var called EXPECTED?

    for i in range(len(testPredict)):
    pre = '%.3f' % testPredict[i]
    print(pre)

    A code example would help to solve my index-confusions.

  45. Stefan June 15, 2017 at 2:06 am #

    This is a great example that machine learning is often much more than knowing how to use the algorithms / libraries. It’s always important to understand the data we are working with. For this example as it is 1 dimensional this is luckily quite easily done.
    In the first example we are giving the algorithm one previous value and asking it “What will the next value be?”.

    Since we use a neural net not taking into account any time behavior, this system is strongly overdetermined. There are a lot of values at the y value 290 for example. For half of them the values decline, for half of them the values increase. If we don’t give the algorithm any indication, how should it know which direction this would be for the test datapoint? There is just not enough information.

    One idea could be to additionally give the algorithm the gradient, which would help in deciding whether a rising or a falling value follows (which is somehow what we do when adding a lookback of 2). Yet, the results obviously do not improve significantly.

    Here I want to come back to “understand the data you are dealing with”. If we look at the plot, there are two characteristics which are obvious. A generally rising trend and a periodicity. We want the algorithm to cover both. Only then, will the prediction be accurate. We see that there is an obvious 12 month periodicity (think of summer vacation, christmas). If we want the algorithm to cover that periodicity without including model knowledge (as we are using an ANN) we have to at least provide it the data in a format to deduct this property.

    Hence: Extending the lookback to 12 month (12 datapoints in the X) will lead to a significantly improved “1 month ahead”-prediction! Now however, we have a higher feature dimension, which might not be desired due to computational reasons (doesn’t matter for this toy example, but anyway…). Next thing we do is take only 3 month steps at lookback (still look back 12 month but skip 2 months in the data). We still cover the periodicity but reduce the feature amount. The algorithm provides almost the same performance for the “1 month ahead” prediction.

    Another possibility would surely be to add the month (Jan, Feb, etc.) as a categorical feature.

  46. Paul July 11, 2017 at 10:59 am #

    Hello Jason! Thanks for the great example! I was looking for this kind of example.
    I’m learning Neural Network these days and trying to predict the number which is temperature like this example, but I have more inputs to predict temperature.
    Then should I edit on the pandas.read.csv(…,usecols[1],…) to usecols[0:4] if I have 5 inputs?

    Thanks in advance!

    Best,
    Paul

    • Paul July 11, 2017 at 11:13 am #

      I mean something like below
      X1 X2 X3 X4 X5 Y1
      380 17.00017 9.099979 4 744 889.7142

      Thank you!

    • Jason Brownlee July 12, 2017 at 9:37 am #

      This post might help you frame your prediction problem:
      http://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/

      • Paul July 24, 2017 at 3:13 pm #

        Thanks for replying me back! 🙂 And sorry for late response.
        When I clicked the link that you wrote, it requires username and password.. :'(

      • Paul July 24, 2017 at 3:16 pm #

        NVM. I figured out it was machinelearningmastery. instead of mlmastery.staging.wpengine.com 🙂

        Thanks. 🙂

        Best,
        Paul

        • Jason Brownlee July 25, 2017 at 9:30 am #

          Yes, for some reason I liked to the staging version of my site, sorry about that.

  47. Barkın Tuncer August 2, 2017 at 7:49 am #

    Hey, I am trying to make a case where the test case is not given but the model should predict the so called future of the timeseries. Hence, I wrote a code which takes the last row of the train data and predict a value from it then put the predicted value at the end of that row and make a prediction again. After doing this procedure for let say len(testX) times. It ended up like an exponential graph. I can upload it if you want to check it out. My code is given below. I dont understand why it works like that. I hope you can enlighten me.

    prediction=numpy.zeros((testX.shape[0],1))
    test_initial=trainX[-1].copy()
    testPredictFirst = model.predict(test_initial.reshape(1,3))
    new_=create_pred(test_initial,testPredictFirst[0][0])
    prediction[0]=testPredictFirst

    for k in range(1,len(testX)):
    testPredict=model.predict(new_.reshape(1,3))
    new_=create_pred(new_,testPredict[0][0]) #this code does if new_ is [1,2,3] and testPredict[0][0] is 4 the output is [2,3,4]

    prediction[k]=testPredict

  48. rohini August 2, 2017 at 8:55 pm #

    Really awesome and useful too.

  49. Jay Shah August 15, 2017 at 7:34 pm #

    Hi,

    It’s awesome article. Very Helpful. I implemented these concepts in my Categorical TIme Series Forecasting problem.But the result I got is very unexpected.

    My TIme Series can take only 10 values from 0 to 9. I’ve approx 15k rows of data.I want to predict next value in the time series.

    But the issue is ‘1’ appears in time series most of the time. So starting from 2nd or 3rd epoch LSTM predicts only ‘1’ for whatsoever input. I tried varying Hyperparameter but it’s not working out. Can you please point out what could be the approach to solve the problem?

    • Jason Brownlee August 16, 2017 at 6:32 am #

      Perhaps your problem is too challenging for the chosen model.

      Try testing with an MLP with a large window size. Then search the hyperparameters of the model.

  50. Patt September 10, 2017 at 5:48 am #

    I’m new to coding. How can I predict t+1 from your example code? I mean from your code I want the value of t+1 or can you more explanation about the code where it predicts t+1.

  51. Dogan September 14, 2017 at 2:57 am #

    Hi Jason,

    Why do you think making the data stationary is a good idea in this approach? I know ARIMA assumes the data is stationary, but is it also valid for neural networks in general? I thought normalization would be enough.

    • Jason Brownlee September 15, 2017 at 12:08 pm #

      Yes, it will make the problem easier to model.

  52. karan September 24, 2017 at 10:55 pm #

    I am getting this error:
    Help me please i am new here. i am using tensorflow

    Traceback (most recent call last):
    File “international-airline-passengers.py”, line 49, in
    testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
    ValueError: could not broadcast input array from shape (94,1) into shape (46,1)

    • karan September 24, 2017 at 11:35 pm #

      I got my error. It was silly mistake.
      thanks

      • Jason Brownlee September 25, 2017 at 5:38 am #

        Glad to hear you worked it out.

        • Sanam September 26, 2017 at 10:47 pm #

          Hi Jason,

          Thankyou so much for all this . I have a question ! Why the obtained accuracy of regression models in terms of MSE is not good when trained using theano, tensorflow or keras. However , if we try to train MLP or anyother model by using matlabs neural network tool , the models show very good accuraccy in terms of e power negative values. why is that so ?

          • Jason Brownlee September 27, 2017 at 5:41 am #

            Accuracy is a score for classification algorithms that predict a label, RMSE is a score for regression algorithms that predict a quantity.

    • Asif khan July 18, 2018 at 11:42 pm #

      Hi, Firstly Thank you for this tutorial. I am implementing this within my design but I am getting an error in this line:

      –> 128 testPredictPlot[len(trainPredict)+(look_back*2)+1:len(Y1)-1] =
      > testPredict

      of: ValueError: could not broadcast input array from shape (19) into shape
      > (0))

      My complete code is: https://stackoverflow.com/questions/51401060/valueerror-could-not-broadcast-input-array-from-shape-19-into-shape-0/51403185#51403185

      I would really appreciate your help as I know this is probably something small but I cannot get passed it. Thank you

  53. MaCa October 5, 2017 at 3:21 am #

    Hi Jason,

    Maybe I am not understanding something.

    You say something like
    “We can see that the model did a pretty poor job of fitting both the training and the test datasets. It basically predicted the same input value as the output.”
    when talking about the first image. I don’t understand how that prediction is bad. It looks very very good to me. I am asking because I tried your code with my own dataset and I obtained something similar, i.e. it looked perfect except it was slightly shifted. But how is it bad?

    Also in the following section you say
    “Looking at the graph, we can see more structure in the predictions.”
    How do we see the structure? To me it looks like it is less precise than the first one.

    Apologies if I quoted you twice, but I don’t really understand…

  54. Wawan November 2, 2017 at 1:19 am #

    Hi Jason
    Do you how train data in PyCharm with Dynamic CNN
    Please give us more explanation..
    thank you

  55. Alessandro December 15, 2017 at 2:21 am #

    Hi Jason,

    I think I’m a little confused.
    Your post seems to address how to forecast t+1 from t.
    The output however looks pretty poor as it ends up performing as a persistence model.
    What is the value of using keras to achieve the same goal as a persistence model then?
    How would you modify your network to try to perform better than a common persistence model?

    What would the model structure look like?
    Thanks in advance!

    • Jason Brownlee December 15, 2017 at 5:37 am #

      I would recommend an MLP tuned to the problem with many lag variables as input.

  56. Volodymyr December 15, 2017 at 7:40 am #

    Hi Jason, thx for great tutorial, but i cant find value t+1. And can we use it for predicting stock prices?

  57. DC February 28, 2018 at 1:34 pm #

    Hi Jason,
    This article as well as the following comments are really helpful. I have tried this one on stock price prediction with more lookbacks, say 10~30, or more layers. But after I add one more layer into the network, it becomes harder/slower to get the loss decreased, which makes bad result over 10,000+ epochs. Do you have any idea about that?

    Thank you.

  58. Alessandro April 21, 2018 at 2:27 am #

    Dear Jason,

    I’m studying time-series prediction and I was impressed when I saw your results on the airline passengers prediction problem. I was amazed by the fact that the prediction of such a complicated non-linear problem was so perfect!

    However, when I looked at the code, I realised that what you’re showing is not really a prediction, or at least it’s not very fair. In fact, when you predict the results for the testing data, you’re only predicting the results for the next timestamp, and not for the entire sequence.
    To say that in other words, you’re predicting the future of next datapoint, given the previous datapoint.
    Maybe I misunderstood the aim of the problem, but from what I understood, you were trying to predict the passengers for a time in the future, given a previous time in the past.

    To make a fair comparison, it would be interesting to see what happens when the network predicts the future based exclusively on the past data. For example, you can predict the first testing point based on the last training point and then continue the prediction using the previous predictions. I tried doing this, and results are just shit 🙂
    I wonder now how it could be possible to write a network that actually predicts the future events based on the past events. I also tried with your LSTM example, but results were still disappointing…

    Cheers,
    Alessandro

  59. Matúš Vršanský May 1, 2018 at 10:39 pm #

    Hello, I would like to ask you something, what exactly means number of verbose write on one epoch?

    For example, I have “0s – loss: 23647.2512” , and what means that number ?

    • Jason Brownlee May 2, 2018 at 5:41 am #

      Good question.

      It reports how long the epoch took in seconds and the loss (a measure of error) on the samples in the training set for that epoch.

  60. Matúš Vršanský May 5, 2018 at 6:51 pm #

    But why each epoch shows so big loss?

    Example: – 0s – loss: 543.4524 – val_loss: 2389.2405

    … why is loss to big? and in final graph training and testing data are very similar to default dataset?

    • Jason Brownlee May 6, 2018 at 6:27 am #

      Good question, I cannot answer that. I suspect it has something to do with the scale of your data. Perhaps you need to rescale your data.

  61. Matúš Vršanský May 8, 2018 at 9:31 pm #

    Understood, and a last question, please. This dataset represents airline passengers of which country? Just for curiosity 🙂

  62. Isaac July 9, 2018 at 10:51 am #

    Thanks for the tutorial!

    Do you see any problem with shuffling the data? I.e using ‘numpy.random.shuffle(train_test_data’ to randomly select training and test data?
    (as used here)
    https://stackoverflow.com/questions/42786129/keras-doesnt-make-good-predictions/51234143#51234143

  63. Gladys September 10, 2018 at 8:13 am #

    Hi,
    Thank you for this tutorial. However, when using the exact same code in the loop_back=3 case, it seems the graph is much more similar to the first graph shown (loop_back=1) than the second one! Also, isn’t it a bit confusing to compare the error on test vs train, as the slopes are steeper in the second part of the dataset? What I mean is, if we were to train on the last 67% of the dataset and test on the first 33%, the error on the test set would reduce while the error on the train set would increase. It is kind of confusing to present the results this way (maybe the evaluation measure should be relative to the range in values for the current time-window?)
    Thanks anyway!

Leave a Reply