How to Develop Convolutional Neural Network Models for Time Series Forecasting

Convolutional Neural Network models, or CNNs for short, can be applied to time series forecasting.

There are many types of CNN models that can be used for each specific type of time series forecasting problem.

In this tutorial, you will discover how to develop a suite of CNN models for a range of standard time series forecasting problems.

The objective of this tutorial is to provide standalone examples of each model on each type of time series problem as a template that you can copy and adapt for your specific time series forecasting problem.

After completing this tutorial, you will know:

  • How to develop CNN models for univariate time series forecasting.
  • How to develop CNN models for multivariate time series forecasting.
  • How to develop CNN models for multi-step time series forecasting.

This is a large and important post; you may want to bookmark it for future reference.

Let’s get started.

Tutorial Overview

In this tutorial, we will explore how to develop a suite of different types of CNN models for time series forecasting.

The models are demonstrated on small contrived time series problems intended to give the flavor of the type of time series problem being addressed. The chosen configuration of the models is arbitrary and not optimized for each problem; that was not the goal.

This tutorial is divided into four parts; they are:

  1. Univariate CNN Models
  2. Multivariate CNN Models
  3. Multi-Step CNN Models
  4. Multivariate Multi-Step CNN Models

Univariate CNN Models

Although traditionally developed for two-dimensional image data, CNNs can be used to model univariate time series forecasting problems.

A univariate time series is a dataset comprised of a single series of observations with a temporal ordering. A model is required to learn from the series of past observations to predict the next value in the sequence.

This section is divided into two parts; they are:

  1. Data Preparation
  2. CNN Model

Data Preparation

Before a univariate series can be modeled, it must be prepared.

The CNN model will learn a function that maps a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple examples from which the model can learn.

Consider a given univariate sequence:

We can divide the sequence into multiple input/output patterns called samples, where three time steps are used as input and one time step is used as output for the one-step prediction that is being learned.

The split_sequence() function below implements this behavior and will split a given univariate sequence into multiple samples where each sample has a specified number of time steps and the output is a single time step.

We can demonstrate this function on our small contrived dataset above.

The complete example is listed below.
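A minimal sketch of such an example follows. The nine-value sequence is an assumption (the text only fixes that three input time steps yield six samples), and the function body is one straightforward way to implement the described behavior with NumPy.

```python
from numpy import array

def split_sequence(sequence, n_steps):
    """Split a univariate sequence into (input window, next value) samples."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        # n_steps consecutive observations form the input
        X.append(sequence[i:i + n_steps])
        # the observation that follows the window is the output
        y.append(sequence[i + n_steps])
    return array(X), array(y)

# assumed contrived dataset (values are illustrative)
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_steps = 3
X, y = split_sequence(raw_seq, n_steps)
# print each input window next to its output value
for i in range(len(X)):
    print(X[i], y[i])
```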

Running the example splits the univariate series into six samples where each sample has three input time steps and one output time step.

Now that we know how to prepare a univariate series for modeling, let’s look at developing a CNN model that can learn the mapping of inputs to outputs.

CNN Model

A one-dimensional CNN is a CNN model that has a convolutional hidden layer that operates over a 1D sequence. This is followed by perhaps a second convolutional layer in some cases, such as very long input sequences, and then a pooling layer whose job it is to distill the output of the convolutional layer to the most salient elements.

The convolutional and pooling layers are followed by a dense fully connected layer that interprets the features extracted by the convolutional part of the model. A flatten layer is used between the convolutional layers and the dense layer to reduce the feature maps to a single one-dimensional vector.

We can define a 1D CNN Model for univariate time series forecasting as follows.
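A sketch of such a definition using the Keras Sequential API is shown here. The 64 filters, kernel size of 2, max pooling, single-value output, Adam optimizer, and mse loss follow the configuration described in this section; the pool size of 2, the 50 units in the dense layer, and the ReLU activations are assumptions chosen for illustration.

```python
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

n_steps, n_features = 3, 1  # three input time steps, one variable

model = Sequential()
# convolutional layer reads across the input sequence
model.add(Conv1D(filters=64, kernel_size=2, activation='relu',
                 input_shape=(n_steps, n_features)))
# pooling keeps the most salient value per filter
model.add(MaxPooling1D(pool_size=2))
# flatten the feature maps to a single vector
model.add(Flatten())
# dense layer interprets the extracted features (50 units is an assumption)
model.add(Dense(50, activation='relu'))
# one output value for the one-step forecast
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```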

Key in the definition is the shape of the input; that is what the model expects as input for each sample in terms of the number of time steps and the number of features.

We are working with a univariate series, so the number of features is one, for one variable.

The number of time steps as input is the number we chose when preparing our dataset as an argument to the split_sequence() function.

The input shape for each sample is specified in the input_shape argument on the definition of the first hidden layer.

We almost always have multiple samples; therefore, the model will expect the input component of training data to have the dimensions or shape [samples, timesteps, features].

Our split_sequence() function in the previous section outputs the X with the shape [samples, timesteps], so we can easily reshape it to have an additional dimension for the one feature.
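As a small sketch of that reshape, assuming X holds six samples of three time steps each (the values are illustrative):

```python
from numpy import array

# assumed output of split_sequence(): shape [samples, timesteps] = [6, 3]
X = array([[10, 20, 30], [20, 30, 40], [30, 40, 50],
           [40, 50, 60], [50, 60, 70], [60, 70, 80]])
n_features = 1
# add a trailing dimension: [samples, timesteps] -> [samples, timesteps, features]
X = X.reshape((X.shape[0], X.shape[1], n_features))
print(X.shape)
```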

The CNN does not actually view the data as having time steps. Instead, the data is treated as a sequence over which convolutional read operations can be performed, like a one-dimensional image.

In this example, we define a convolutional layer with 64 filter maps and a kernel size of 2. This is followed by a max pooling layer and a dense layer to interpret the input feature. An output layer is specified that predicts a single numerical value.
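To make the layer shapes concrete, the sketch below traces one sample through a convolution with 64 filters and a kernel size of 2, then max pooling and flattening, using plain NumPy. The random kernel weights, the omitted bias, the pool size of 2, and the sample values are assumptions for illustration; this mirrors the arithmetic of the layers, not how Keras implements them internally.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([[10.0], [20.0], [30.0]])   # one sample: [timesteps, features] = [3, 1]
kernels = rng.normal(size=(2, 1, 64))    # [kernel_size, features, filters]

# slide a window of 2 time steps over the sequence (stride 1, no padding)
windows = np.stack([x[i:i + 2] for i in range(len(x) - 1)])          # [2, 2, 1]
# each filter produces one value per window; ReLU clips negatives to zero
conv = np.maximum(np.einsum('wkf,kfo->wo', windows, kernels), 0.0)   # [2, 64]

# max pooling with pool size 2 keeps the largest value per filter
pooled = conv.reshape(-1, 2, conv.shape[1]).max(axis=1)              # [1, 64]
# flatten to the single vector that the dense layer would receive
flat = pooled.reshape(-1)                                            # [64]
print(conv.shape, pooled.shape, flat.shape)
```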

The model is fit using the efficient Adam version of stochastic gradient descent and optimized using the mean squared error, or 'mse', loss function.

Once the model is defined, we can fit it on the training dataset.

After the model is fit, we can use it to make a prediction.

We can predict the next value in the sequence by providing the input:

And expecting the model to predict something like:

The model expects the input shape to be three-dimensional with [samples, timesteps, features]; therefore, we must reshape the single input sample before making the prediction.

We can tie all of this together and demonstrate how to develop a 1D CNN model for univariate time series forecasting and make a single prediction.
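One way the pieces might fit together is sketched here. The dataset values, the input [70, 80, 90], the 50-unit dense layer, the pool size, and the number of epochs are all assumptions for illustration rather than tuned or prescribed settings.

```python
from numpy import array
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

def split_sequence(sequence, n_steps):
    """Split a univariate sequence into (input window, next value) samples."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return array(X, dtype=float), array(y, dtype=float)

# assumed contrived dataset
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_steps, n_features = 3, 1
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] to [samples, timesteps, features]
X = X.reshape((X.shape[0], X.shape[1], n_features))

# define the model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu',
                 input_shape=(n_steps, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# fit the model (the epoch count is arbitrary)
model.fit(X, y, epochs=500, verbose=0)

# reshape the single input sample to [1, timesteps, features] and predict
x_input = array([70.0, 80.0, 90.0]).reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```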

Running the example prepares the data, fits the model, and makes a prediction.

Your results may vary given the stochastic nature of the algorithm; try running the example a few times.

We can see that the model predicts the next value in the sequence.

Multivariate CNN Models

Multivariate time series data means data where there is more than one observation for each time step.

There are two main models that we may require with multivariate time series data; they are:

  1. Multiple Input Series.
  2. Multiple Parallel Series.

Let’s take a look at each in turn.

Multiple Input Series

A problem may have two or more parallel input time series and an output time series that is dependent on the input time series.

The input time series are parallel because each series has observations at the same time steps.

We can demonstrate this with a simple example of two parallel input time series where the output series is the simple addition of the input series.

We can reshape these three arrays of data as a single dataset where each row is a time step and each column is a separate time series.

This is a standard way of storing parallel time series in a CSV file.

The complete example is listed below.
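A minimal sketch follows. The specific series values are assumptions, chosen so that the output at the third time step is 65 and the series continue toward 100 + 105 = 205, consistent with the values mentioned in this section.

```python
from numpy import array, hstack

# two assumed parallel input series and an output series that is their sum
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2

# convert each series to a column: [rows] -> [rows, 1]
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))

# stack the columns horizontally: one row per time step, one column per series
dataset = hstack((in_seq1, in_seq2, out_seq))
print(dataset)
```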

Running the example prints the dataset with one row per time step and one column for each of the two input and one output parallel time series.

As with the univariate time series, we must structure these data into samples with input and output elements.

A 1D CNN model needs sufficient context to learn a mapping from an input sequence to an output value. CNNs can support parallel input time series as separate channels, like red, green, and blue components of an image. Therefore, we need to split the data into samples maintaining the order of observations across the two input sequences.

If we chose three input time steps, then the first sample would look as follows:

Input:

Output:

That is, the first three time steps of each parallel series are provided as input to the model and the model associates this with the value in the output series at the third time step, in this case, 65.

We can see that, in transforming the time series into input/output samples to train the model, we will have to discard some values from the output time series where we do not have values in the input time series at prior time steps. In turn, the choice of the number of input time steps will have an important effect on how much of the training data is used.

We can define a function named split_sequences() that will take a dataset as we have defined it with rows for time steps and columns for parallel series and return input/output samples.

We can test this function on our dataset using three time steps for each input time series as input.

The complete example is listed below.
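The sketch below is one possible implementation, assuming the dataset has the input series in all columns but the last and the output series in the last column. The series values are the same assumed values used to illustrate this section.

```python
from numpy import array, column_stack

def split_sequences(sequences, n_steps):
    """Split rows of [inputs..., output] into windows of inputs and one output."""
    X, y = [], []
    for i in range(len(sequences) - n_steps + 1):
        end_ix = i + n_steps
        # inputs: every column except the last, across the window
        X.append(sequences[i:end_ix, :-1])
        # output: last column at the final time step of the window
        y.append(sequences[end_ix - 1, -1])
    return array(X), array(y)

# assumed contrived dataset: two inputs and their sum as the output
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
dataset = column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))

X, y = split_sequences(dataset, n_steps=3)
print(X.shape, y.shape)
for i in range(len(X)):
    print(X[i], y[i])
```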

Running the example first prints the shape of the X and y components.

We can see that the X component has a three-dimensional structure.

The first dimension is the number of samples, in this case 7. The second dimension is the number of time steps per sample, in this case 3, the value specified to the function. Finally, the last dimension specifies the number of parallel time series or the number of variables, in this case 2 for the two parallel series.

This is the exact three-dimensional structure expected by a 1D CNN as input. The data is ready to use without further reshaping.

We can then see that the input and output for each sample is printed, showing the three time steps for each of the two input series and the associated output for each sample.

We are now ready to fit a 1D CNN model on this data, specifying the number of time steps and features to expect for each input sample, in this case three and two respectively.

When making a prediction, the model expects three time steps for two input time series.

We can predict the next value in the output series providing the input values of:

The shape of the one sample with three time steps and two variables must be [1, 3, 2].

We would expect the next value in the sequence to be 100 + 105 or 205.

The complete example is listed below.

Running the example prepares the data, fits the model, and makes a prediction.

There is another, more elaborate way to model the problem.

Each input series can be handled by a separate CNN and the output of each of these submodels can be combined before a prediction is made for the output sequence.

We can refer to this as a multi-headed CNN model. It may offer more flexibility or better performance depending on the specifics of the problem that is being modeled. For example, it allows you to configure each sub-model differently for each input series, such as the number of filter maps and the kernel size.

This type of model can be defined in Keras using the Keras functional API.

First, we can define the first input model as a 1D CNN with an input layer that expects sequences with n_steps time steps and 1 feature.

We can define the second input submodel in the same way.

Now that both input submodels have been defined, we can merge the output from each model into one long vector which can be interpreted before making a prediction for the output sequence.

We can then tie the inputs and outputs together.
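The steps above can be sketched with the Keras functional API as follows. The variable names, the pool size of 2, and the 50-unit dense layer are assumptions for illustration; each submodel could be configured differently.

```python
from keras.models import Model
from keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense, concatenate

n_steps, n_features = 3, 1  # each head sees one series with three time steps

# first input submodel
visible1 = Input(shape=(n_steps, n_features))
cnn1 = Conv1D(filters=64, kernel_size=2, activation='relu')(visible1)
cnn1 = MaxPooling1D(pool_size=2)(cnn1)
cnn1 = Flatten()(cnn1)

# second input submodel, defined the same way
visible2 = Input(shape=(n_steps, n_features))
cnn2 = Conv1D(filters=64, kernel_size=2, activation='relu')(visible2)
cnn2 = MaxPooling1D(pool_size=2)(cnn2)
cnn2 = Flatten()(cnn2)

# merge the two feature vectors into one long vector and interpret it
merge = concatenate([cnn1, cnn2])
dense = Dense(50, activation='relu')(merge)
output = Dense(1)(dense)

# tie the inputs and output together
model = Model(inputs=[visible1, visible2], outputs=output)
model.compile(optimizer='adam', loss='mse')
```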

The image below provides a schematic for how this model looks, including the shape of the inputs and outputs of each layer.

Plot of Multi-Headed 1D CNN for Multivariate Time Series Forecasting

This model requires input to be provided as a list of two elements where each element in the list contains data for one of the submodels.

In order to achieve this, we can split the 3D input data into two separate arrays of input data; that is, from one array with the shape [7, 3, 2] to two 3D arrays with the shape [7, 3, 1].

These data can then be provided in order to fit the model.
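A sketch of that split is shown here, assuming X has the shape [samples, timesteps, 2]. The series values are illustrative assumptions.

```python
from numpy import array, column_stack

# assumed contrived input series
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
inputs = column_stack((in_seq1, in_seq2))
n_steps = 3
# windows of three time steps across both input series
X = array([inputs[i:i + n_steps] for i in range(len(inputs) - n_steps + 1)])

n_features = 1
# select one series at a time and restore the trailing feature dimension
X1 = X[:, :, 0].reshape((X.shape[0], X.shape[1], n_features))
X2 = X[:, :, 1].reshape((X.shape[0], X.shape[1], n_features))
print(X.shape, X1.shape, X2.shape)
```

The two arrays would then be passed to the model as the list [X1, X2], in the same order as the model inputs.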

Similarly, we must prepare the data for a single sample as two separate arrays, each with the shape [1, 3, 1], when making a single one-step prediction.

We can tie all of this together; the complete example is listed below.

Running the example prepares the data, fits the model, and makes a prediction.

Multiple Parallel Series

An alternate time series problem is the case where there are multiple parallel time series and a value must be predicted for each.

For example, given the data from the previous section:

We may want to predict the value for each of the three time series for the next time step.

This might be referred to as multivariate forecasting.

Again, the data must be split into input/output samples in order to train a model.

The first sample of this dataset would be:

Input:

Output:

The split_sequences() function below will split multiple parallel time series with rows for time steps and one series per column into the required input/output shape.

We can demonstrate this on the contrived problem; the complete example is listed below.
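A minimal sketch is given here, assuming each sample uses a window of rows as input and the row that follows as output. The series values are illustrative assumptions.

```python
from numpy import array, column_stack

def split_sequences(sequences, n_steps):
    """Split parallel series into windows of rows and the row that follows."""
    X, y = [], []
    for i in range(len(sequences) - n_steps):
        end_ix = i + n_steps
        # input: all series over the window
        X.append(sequences[i:end_ix, :])
        # output: the next value of every series
        y.append(sequences[end_ix, :])
    return array(X), array(y)

# assumed contrived dataset: three parallel series
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
dataset = column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))

X, y = split_sequences(dataset, n_steps=3)
print(X.shape, y.shape)
for i in range(len(X)):
    print(X[i], y[i])
```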

Running the example first prints the shape of the prepared X and y components.

The shape of X is three-dimensional, including the number of samples (6), the number of time steps chosen per sample (3), and the number of parallel time series or features (3).

The shape of y is two-dimensional, as we might expect, for the number of samples (6) and the number of variables per sample to be predicted (3).

The data is ready to use in a 1D CNN model that expects three-dimensional input and two-dimensional output shapes for the X and y components of each sample.

Then, each of the samples is printed showing the input and output components of each sample.

We are now ready to fit a 1D CNN model on this data.

In this model, the number of time steps and parallel series (features) are specified for the input layer via the input_shape argument.

The number of parallel series is also used in the specification of the number of values to predict by the model in the output layer; again, this is three.

We can predict the next value in each of the three parallel series by providing an input of three time steps for each series.

The shape of the input for making a single prediction must be 1 sample, 3 time steps, and 3 features, or [1, 3, 3].

We would expect the vector output to be:

We can tie all of this together and demonstrate a 1D CNN for multivariate output time series forecasting below.

Running the example prepares the data, fits the model and makes a prediction.

As with multiple input series, there is another more elaborate way to model the problem.

Each output series can be handled by a separate output CNN model.

We can refer to this as a multi-output CNN model. It may offer more flexibility or better performance depending on the specifics of the problem that is being modeled.

This type of model can be defined in Keras using the Keras functional API.

First, we can define the input model as a 1D CNN model.

We can then define one output layer for each of the three series that we wish to forecast, where each output submodel will forecast a single time step.

We can then tie the input and output layers together into a single model.
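A sketch of these three steps with the Keras functional API follows. The layer sizes and names are assumptions for illustration; the shared convolutional input model feeds three separate single-value output layers.

```python
from keras.models import Model
from keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense

n_steps, n_features = 3, 3  # three time steps of three parallel series

# shared input model
visible = Input(shape=(n_steps, n_features))
cnn = Conv1D(filters=64, kernel_size=2, activation='relu')(visible)
cnn = MaxPooling1D(pool_size=2)(cnn)
cnn = Flatten()(cnn)
cnn = Dense(50, activation='relu')(cnn)

# one output layer per series, each forecasting a single time step
output1 = Dense(1)(cnn)
output2 = Dense(1)(cnn)
output3 = Dense(1)(cnn)

# tie the input and the three outputs together
model = Model(inputs=visible, outputs=[output1, output2, output3])
model.compile(optimizer='adam', loss='mse')
```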

To make the model architecture clear, the schematic below shows the three separate output layers of the model and the input and output shapes of each layer.

Plot of Multi-Output 1D CNN for Multivariate Time Series Forecasting

When training the model, it will require three separate output arrays per sample. We can achieve this by converting the output training data that has the shape [6, 3] to three arrays with the shape [6, 1].

These arrays can be provided to the model during training.
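The column-wise split of the output data might look like the sketch below. The y array here is a stand-in built from assumed series values; in practice it would be the output component returned by split_sequences().

```python
from numpy import array, column_stack

# assumed contrived dataset: three parallel series
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
dataset = column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))
n_steps = 3
# stand-in for the output component: the row following each three-step window
y = dataset[n_steps:, :]

# one [samples, 1] array per output series
y1 = y[:, 0].reshape((y.shape[0], 1))
y2 = y[:, 1].reshape((y.shape[0], 1))
y3 = y[:, 2].reshape((y.shape[0], 1))
print(y.shape, y1.shape, y2.shape, y3.shape)
```

The arrays would be passed to the model as the list [y1, y2, y3], in the same order as the output layers.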

Tying all of this together, the complete example is listed below.

Running the example prepares the data, fits the model, and makes a prediction.

Multi-Step CNN Models

In practice, there is little difference to the 1D CNN model in predicting a vector output that represents different output variables (as in the previous example), or a vector output that represents multiple time steps of one variable.

Nevertheless, there are subtle and important differences in the way the training data is prepared. In this section, we will demonstrate the case of developing a multi-step forecast model using a vector model.

Before we look at the specifics of the model, let’s first look at the preparation of data for multi-step forecasting.

Data Preparation

As with one-step forecasting, a time series used for multi-step time series forecasting must be split into samples with input and output components.

Both the input and output components will be comprised of multiple time steps and may or may not have the same number of steps.

For example, given the univariate time series:

We could use the last three time steps as input and forecast the next two time steps.

The first sample would look as follows:

Input:

Output:

The split_sequence() function below implements this behavior and will split a given univariate time series into samples with a specified number of input and output time steps.

We can demonstrate this function on the small contrived dataset.

The complete example is listed below.
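A minimal sketch follows, assuming the same illustrative nine-value sequence and three input and two output time steps.

```python
from numpy import array

def split_sequence(sequence, n_steps_in, n_steps_out):
    """Split a univariate sequence into multi-step input/output samples."""
    X, y = [], []
    for i in range(len(sequence) - n_steps_in - n_steps_out + 1):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # input: n_steps_in observations
        X.append(sequence[i:end_ix])
        # output: the n_steps_out observations that follow
        y.append(sequence[end_ix:out_end_ix])
    return array(X), array(y)

# assumed contrived dataset
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_steps_in, n_steps_out = 3, 2
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
for i in range(len(X)):
    print(X[i], y[i])
```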

Running the example splits the univariate series into input and output time steps and prints the input and output components of each.

Now that we know how to prepare data for multi-step forecasting, let’s look at a 1D CNN model that can learn this mapping.

Vector Output Model

The 1D CNN can output a vector directly that can be interpreted as a multi-step forecast.

This approach was seen in the previous section, where one time step of each output time series was forecasted as a vector.

As with the 1D CNN models for univariate data in a prior section, the prepared samples must first be reshaped. The CNN expects data to have a three-dimensional structure of [samples, timesteps, features], and in this case, we only have one feature so the reshape is straightforward.

With the number of input and output steps specified in the n_steps_in and n_steps_out variables, we can define a multi-step time-series forecasting model.

The model can make a prediction for a single sample. We can predict the next two steps beyond the end of the dataset by providing the input:

We would expect the predicted output to be:

As expected by the model, the shape of the single sample of input data when making the prediction must be [1, 3, 1] for the 1 sample, 3 time steps of the input, and the single feature.

Tying all of this together, the 1D CNN for multi-step forecasting with a univariate time series is listed below.

Running the example forecasts and prints the next two time steps in the sequence.

Multivariate Multi-Step CNN Models

In the previous sections, we have looked at univariate, multivariate, and multi-step time series forecasting.

It is possible to mix and match the different types of 1D CNN models presented so far for the different problems. This too applies to time series forecasting problems that involve multivariate and multi-step forecasting, but it may be a little more challenging.

In this section, we will explore short examples of data preparation and modeling for multivariate multi-step time series forecasting as a template to ease this challenge, specifically:

  1. Multiple Input Multi-Step Output.
  2. Multiple Parallel Input and Multi-Step Output.

Perhaps the biggest stumbling block is in the preparation of data, so this is where we will focus our attention.

Multiple Input Multi-Step Output

There are those multivariate time series forecasting problems where the output series is separate but dependent upon the input time series, and multiple time steps are required for the output series.

For example, consider our multivariate time series from a prior section:

We may use three prior time steps of each of the two input time series to predict two time steps of the output time series.

Input:

Output:

The split_sequences() function below implements this behavior.

We can demonstrate this on our contrived dataset. The complete example is listed below.
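The sketch below is one possible implementation. It assumes the output window starts at the last input time step, in line with the earlier multiple-input framing, and uses assumed series values.

```python
from numpy import array, column_stack

def split_sequences(sequences, n_steps_in, n_steps_out):
    """Windows of input columns mapped to several steps of the output column."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        # the output starts at the last input time step
        out_end_ix = end_ix + n_steps_out - 1
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :-1])
        y.append(sequences[end_ix - 1:out_end_ix, -1])
    return array(X), array(y)

# assumed contrived dataset: two inputs and their sum as the output
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
dataset = column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))

X, y = split_sequences(dataset, n_steps_in=3, n_steps_out=2)
print(X.shape, y.shape)
for i in range(len(X)):
    print(X[i], y[i])
```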

Running the example first prints the shape of the prepared training data.

We can see that the shape of the input portion of the samples is three-dimensional, comprised of six samples, with three time steps and two variables for the two input time series.

The output portion of the samples is two-dimensional for the six samples and the two time steps for each sample to be predicted.

The prepared samples are then printed to confirm that the data was prepared as we specified.

We can now develop a 1D CNN model for multi-step predictions.

In this case, we will demonstrate a vector output model. The complete example is listed below.

Running the example fits the model and predicts the next two time steps of the output sequence beyond the dataset.

We would expect the next two steps to be [185, 205].

It is a challenging framing of the problem with very little data, and the arbitrarily configured version of the model gets close.

Multiple Parallel Input and Multi-Step Output

A problem with parallel time series may require the prediction of multiple time steps of each time series.

For example, consider our multivariate time series from a prior section:

We may use the last three time steps from each of the three time series as input to the model, and predict the next two time steps of each of the three time series as output.

The first sample in the training dataset would be the following.

Input:

Output:

The split_sequences() function below implements this behavior.

We can demonstrate this function on the small contrived dataset.

The complete example is listed below.
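A minimal sketch follows, assuming three input and two output time steps and the same illustrative series values.

```python
from numpy import array, column_stack

def split_sequences(sequences, n_steps_in, n_steps_out):
    """Windows of all series mapped to the following rows of all series."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequences):
            break
        # input: all series over the input window
        X.append(sequences[i:end_ix, :])
        # output: all series over the steps that follow
        y.append(sequences[end_ix:out_end_ix, :])
    return array(X), array(y)

# assumed contrived dataset: three parallel series
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
dataset = column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))

X, y = split_sequences(dataset, n_steps_in=3, n_steps_out=2)
print(X.shape, y.shape)
for i in range(len(X)):
    print(X[i], y[i])
```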

Running the example first prints the shape of the prepared training dataset.

We can see that both the input (X) and output (y) elements of the dataset are three-dimensional for the number of samples, time steps, and variables or parallel time series respectively.

The input and output elements of each sample are then printed side by side so that we can confirm that the data was prepared as we expected.

We can now develop a 1D CNN model for this dataset.

We will use a vector-output model in this case. As such, we must flatten the three-dimensional structure of the output portion of each sample in order to train the model. This means, instead of predicting two steps for each series, the model is trained on and expected to predict a vector of six numbers directly.
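The flattening step can be sketched as follows. The y array is a stand-in built from assumed series values; a predicted vector of six numbers can be reshaped back to [time steps, series] in the same way.

```python
from numpy import array, column_stack

# assumed contrived dataset: three parallel series
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
dataset = column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))
n_steps_in, n_steps_out = 3, 2
n_series = dataset.shape[1]

# stand-in for the 3D output component: [samples, n_steps_out, n_series]
y = array([dataset[i + n_steps_in:i + n_steps_in + n_steps_out]
           for i in range(len(dataset) - n_steps_in - n_steps_out + 1)])

# flatten each sample's output to a single vector for training
n_output = y.shape[1] * y.shape[2]
y_flat = y.reshape((y.shape[0], n_output))
print(y.shape, y_flat.shape)

# a six-number vector can be restored to [n_steps_out, n_series]
restored = y_flat[0].reshape((n_steps_out, n_series))
print(restored)
```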

The complete example is listed below.

Running the example fits the model and predicts the values for each of the three time series for the next two time steps beyond the end of the dataset.

We would expect the values for these series and time steps to be as follows:

We can see that the model forecast gets reasonably close to the expected values.

Summary

In this tutorial, you discovered how to develop a suite of CNN models for a range of standard time series forecasting problems.

Specifically, you learned:

  • How to develop CNN models for univariate time series forecasting.
  • How to develop CNN models for multivariate time series forecasting.
  • How to develop CNN models for multi-step time series forecasting.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


32 Responses to How to Develop Convolutional Neural Network Models for Time Series Forecasting

  1. JSman November 12, 2018 at 8:44 am #

    Hi Jason,

    Good post (as always)!

    I have an unrelated question. Recently I have been developing almost exclusively in JavaScript (both front end with React and back end with Node.js). It has been a long time since I have done any solid coding in Python, hence my skillset is rusty.

    Now I wonder: how do you see the choice of programming language for ML apps?
    TensorFlow now runs both in the browser (tf.js) and on the back end with Node.js (just like Python?). That sounds like a great thing: one language for everything. There are also courses on the topic, which is getting more traction:
    https://www.udemy.com/machine-learning-with-javascript/

    Is JavaScript enough for machine learning apps, or should Python be used? Can you please elaborate?

    thanks and regards
    JSman

    • Jason Brownlee November 12, 2018 at 2:08 pm #

      Hmmm, maybe for small apps.

      I cannot imagine being able to convince my team that a JS solution would make more sense, unless the existing system was all JS or it was a front-end demo or something. Or maybe if the model was fit using something fast and used to make predictions in JS.

      Really, you want to use the same tech stack as the rest of the existing system/enterprise.

  2. John November 13, 2018 at 1:33 am #

    Hi Jason,

    A very high quality article for me to learn more about deep learning. It really helps me a lot. Please keep sharing the knowledge. Thank you!

    Cheer

  3. Ron November 14, 2018 at 12:21 am #

    Nice site. Just a comment. IMO, It’s a bit pretentious and weak to put the title PhD after your name (” I’m Jason Brownlee PhD…”). You don’t need to validate yourself through a useless degree. You have already earned the respect of all of us through your wonderful work. A mention of your credentials at a bio page would have sufficed. Just my two cents.

    • Jason Brownlee November 14, 2018 at 7:31 am #

      Thanks for the feedback.

      Testing showed me that “PhD” splashed around helps with credibility for first-time visitors.

      • Armando Mendivil November 20, 2018 at 8:20 am #

        Dr. Brownlee,

        My wife has an MS in Robotics Engineering and is a Registered Professional Engineer. I have a PhD in physics from UT. I Know how hard we both worked for our credentials and I certainly would not call them useless. You earned your credentials BRAVO.

        Armando

        • Jason Brownlee November 20, 2018 at 2:03 pm #

          Agreed. Completing a degree is not useless, although it may not be required to be a practitioner in a given field (e.g. applied machine learning).

  4. Carlos November 16, 2018 at 7:50 am #

    Thanks Jason for your new clear, detailed and very well explained explanation (as always)!.

  5. khalfi November 16, 2018 at 8:45 am #

    I index an image by a low-level feature (color) in the form of a digital vector. Can I exploit the current topic for an image classifier?

  6. Andrew C November 16, 2018 at 2:42 pm #

    Thanks Jason for a very detailed explanation of CNN, and the many ways we can approach a time forecasting problem with CNNs.

  7. Samar Ansari November 17, 2018 at 2:56 am #

    Hi Jason,

    I have become a fan, after reading this post of yours.

    I have been trying to use 1D CNNs for one of my network anomaly applications, but somehow couldn’t get them to work effectively.

    This post has all that I need to get my network up and running.

    Thanks.

  8. Linda November 21, 2018 at 5:29 pm #

    Hi Jason
    Your books and posts have been very helpful in igniting my interest in machine learning. I just started learning deep learning and would like to know your approach on generating rain forecast maps given a data set with images (in gif format) of historical precipitation maps. Seeing as the sequence of past observations are images and not numbers like the examples above how would one prepare the image data.(I’m very new to deep learning)

    • Jason Brownlee November 22, 2018 at 6:21 am #

      Perhaps you can use a CNN-LSTM or ConvLSTM to read in the images?

  9. Dude from far east November 27, 2018 at 3:03 am #

    Your site is pure gold and it is becoming my reference! You are making a difference, thanks for educating us. I became an ML engineer because of your hard work, thanks again!

  10. Thanasis November 28, 2018 at 8:07 am #

    Awesome Jason!

    I would like to know your opinion on this :

    CNN architecture : Input ->Conv1d->Dropout->Conv1d . (There is no Dense Layer, as you noticed!)

    Purpose : Multistep Time series Forecasting. For example, 20 “past” input -> 3 “future” output, (continuous output and input).

    • Jason Brownlee November 28, 2018 at 2:52 pm #

      Use the structure that gives the best performance.

      I generally recommend a Dense layer as the output layer when making predictions so that you can specify the transform and structure of the output.

      • Thanasis November 28, 2018 at 7:58 pm #

        Thank you for your answer!

        In addition, what’s your opinion on using filters in “descending order”,
        I mean Input ->Conv1d(40 filters)->Dropout->Conv1d(20 filters)->Dropout->Conv1d(3 filters).

        P.S. 40,20, 3 are just random numbers.

        • Jason Brownlee November 29, 2018 at 7:39 am #

          Seems odd.

          Don’t seek my permission, use the model architecture that gives the best performance.

  11. Babak November 28, 2018 at 6:41 pm #

    Thanks for providing all this.

    I’ve got a question regarding the input dimension while fitting the model, which in the case of Conv1D is [samples, timesteps, features]. Comparing this with the following article using an MLP: https://machinelearningmastery.com/how-to-develop-multilayer-perceptron-models-for-time-series-forecasting/ the dimension becomes [samples, features]. What is the reason for this difference, although both models should handle “one dimensional” input?

    • Jason Brownlee November 29, 2018 at 7:37 am #

      The CNN must read across subsequences of the input, therefore a 3D input shape is required, much like LSTMs.

      • Babak December 2, 2018 at 8:40 pm #

        With subsequence you mean the timesteps of each given feature, right?

  12. M. Antonio Dias December 3, 2018 at 7:38 pm #

    Hi Jason,
    Great article!

    After some tests, I believe that I can’t predict the next N sequences since the output y is always dependent on the input x (unless I misunderstood the whole concept). If so, what is your advice to predict the next N sequences?

    • Jason Brownlee December 4, 2018 at 6:00 am #

      I recommend testing multiple framings of your problem and multiple techniques in order to discover what works best for your specific dataset.

  13. Mutasem December 5, 2018 at 8:11 pm #

    Thanks a lot Dr. Jason. May Allah bless you , we are excited to watch CNN after implementing it to Shampoo Sales Dataset… Do you have any idea to do this.
