How to Develop Convolutional Neural Network Models for Time Series Forecasting

Convolutional Neural Network models, or CNNs for short, can be applied to time series forecasting.

There are many types of CNN models that can be used for each specific type of time series forecasting problem.

In this tutorial, you will discover how to develop a suite of CNN models for a range of standard time series forecasting problems.

The objective of this tutorial is to provide standalone examples of each model on each type of time series problem as a template that you can copy and adapt for your specific time series forecasting problem.

After completing this tutorial, you will know:

  • How to develop CNN models for univariate time series forecasting.
  • How to develop CNN models for multivariate time series forecasting.
  • How to develop CNN models for multi-step time series forecasting.

This is a large and important post; you may want to bookmark it for future reference.

Kick-start your project with my new book Deep Learning for Time Series Forecasting, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

How to Develop Convolutional Neural Network Models for Time Series Forecasting
Photo by Bureau of Land Management, some rights reserved.

Tutorial Overview

In this tutorial, we will explore how to develop a suite of different types of CNN models for time series forecasting.

The models are demonstrated on small contrived time series problems intended to give the flavor of the type of time series problem being addressed. The chosen configuration of the models is arbitrary and not optimized for each problem; that was not the goal.

This tutorial is divided into four parts; they are:

  1. Univariate CNN Models
  2. Multivariate CNN Models
  3. Multi-Step CNN Models
  4. Multivariate Multi-Step CNN Models

Univariate CNN Models

Although traditionally developed for two-dimensional image data, CNNs can be used to model univariate time series forecasting problems.

Univariate time series are datasets comprised of a single series of observations with a temporal ordering. A model is required to learn from the series of past observations to predict the next value in the sequence.

This section is divided into two parts; they are:

  1. Data Preparation
  2. CNN Model

Data Preparation

Before a univariate series can be modeled, it must be prepared.

The CNN model will learn a function that maps a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple examples from which the model can learn.

Consider a given univariate sequence:
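
The listing of the sequence is not reproduced here. A simple contrived series such as the following, increasing by 10 at each time step, is assumed throughout this section and is consistent with the samples and predictions described below:

[10, 20, 30, 40, 50, 60, 70, 80, 90]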

We can divide the sequence into multiple input/output patterns called samples, where three time steps are used as input and one time step is used as output for the one-step prediction that is being learned.

The split_sequence() function below implements this behavior and will split a given univariate sequence into multiple samples where each sample has a specified number of time steps and the output is a single time step.

We can demonstrate this function on our small contrived dataset above.

The complete example is listed below.
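
The original listing is not included here; the following is a minimal sketch of what it might look like, assuming the contrived series above. The expected output is shown in the trailing comments.

# univariate data preparation (sketch)
from numpy import array

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence (assumed contrived series)
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

# Expected output (six samples):
# [10 20 30] 40
# [20 30 40] 50
# [30 40 50] 60
# [40 50 60] 70
# [50 60 70] 80
# [60 70 80] 90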

Running the example splits the univariate series into six samples where each sample has three input time steps and one output time step.

Now that we know how to prepare a univariate series for modeling, let’s look at developing a CNN model that can learn the mapping of inputs to outputs.


CNN Model

A one-dimensional CNN is a CNN model that has a convolutional hidden layer that operates over a 1D sequence. This may be followed by a second convolutional layer in some cases, such as very long input sequences, and then a pooling layer whose job is to distill the output of the convolutional layer to the most salient elements.

The convolutional and pooling layers are followed by a dense fully connected layer that interprets the features extracted by the convolutional part of the model. A flatten layer is used between the convolutional layers and the dense layer to reduce the feature maps to a single one-dimensional vector.

We can define a 1D CNN Model for univariate time series forecasting as follows.
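
A minimal definition consistent with the description below might look as follows. This is a sketch rather than the exact original listing; it assumes the standalone Keras API (use tensorflow.keras imports on TensorFlow 2).

from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv1D, MaxPooling1D

n_steps = 3      # input time steps per sample
n_features = 1   # univariate series

# define model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(n_steps, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')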

Key in the definition is the shape of the input; that is what the model expects as input for each sample in terms of the number of time steps and the number of features.

We are working with a univariate series, so the number of features is one, for one variable.

The number of time steps as input is the number we chose when preparing our dataset as an argument to the split_sequence() function.

The input shape for each sample is specified in the input_shape argument on the definition of the first hidden layer.

We almost always have multiple samples; therefore, the model will expect the input component of the training data to have the three-dimensional shape [samples, timesteps, features].

Our split_sequence() function in the previous section outputs the X with the shape [samples, timesteps], so we can easily reshape it to have an additional dimension for the one feature.
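
For example, a reshape along these lines adds the trailing features dimension (a sketch; X is the array returned by the split_sequence() function above):

# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))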

The CNN does not actually view the data as having time steps; instead, the data is treated as a sequence over which convolutional read operations can be performed, like a one-dimensional image.

In this example, we define a convolutional layer with 64 filter maps and a kernel size of 2. This is followed by a max pooling layer and a dense layer that interprets the extracted features. An output layer is specified that predicts a single numerical value.

The model is fit using the efficient Adam version of stochastic gradient descent and optimized using the mean squared error, or 'mse', loss function.

Once the model is defined, we can fit it on the training dataset.

After the model is fit, we can use it to make a prediction.

We can predict the next value in the sequence by providing the model with the last three observations of the series as input and expecting it to predict the value that follows (close to 100 for the contrived series assumed above).

The model expects the input shape to be three-dimensional with [samples, timesteps, features]; therefore, we must reshape the single input sample before making the prediction.

We can tie all of this together and demonstrate how to develop a 1D CNN model for univariate time series forecasting and make a single prediction.
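
The complete listing is not reproduced here; the standalone sketch below is consistent with the configuration described above (64 filters, a kernel size of 2, Adam, and MSE loss) and with the contrived series assumed earlier. The epoch count is arbitrary and the printed prediction will vary between runs.

# univariate cnn example (sketch)
from numpy import array
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv1D, MaxPooling1D

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps
        if end_ix > len(sequence)-1:
            break
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence (assumed contrived series)
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(n_steps, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=1000, verbose=0)
# demonstrate prediction: the next value after [70, 80, 90] should be close to 100
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)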

Running the example prepares the data, fits the model, and makes a prediction.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that the model predicts the next value in the sequence.

Multivariate CNN Models

Multivariate time series data means data where there is more than one observation for each time step.

There are two main models that we may require with multivariate time series data; they are:

  1. Multiple Input Series.
  2. Multiple Parallel Series.

Let’s take a look at each in turn.

Multiple Input Series

A problem may have two or more parallel input time series and an output time series that is dependent on the input time series.

The input time series are parallel because each series has observations at the same time steps.

We can demonstrate this with a simple example of two parallel input time series where the output series is the simple addition of the input series.

We can reshape these three arrays of data as a single dataset where each row is a time step and each column is a separate time series.

This is a standard way of storing parallel time series in a CSV file.

The complete example is listed below.
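
A sketch of such a listing, assuming two contrived input series and an output series that is their sum; the expected printed dataset is shown in the trailing comments.

# multivariate data preparation (sketch)
from numpy import array, hstack

# define two parallel input series and an output series that is their sum
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns: one row per time step, one column per series
dataset = hstack((in_seq1, in_seq2, out_seq))
print(dataset)

# Expected output:
# [[ 10  15  25]
#  [ 20  25  45]
#  [ 30  35  65]
#  [ 40  45  85]
#  [ 50  55 105]
#  [ 60  65 125]
#  [ 70  75 145]
#  [ 80  85 165]
#  [ 90  95 185]]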

Running the example prints the dataset with one row per time step and one column for each of the two input and one output parallel time series.

As with the univariate time series, we must structure these data into samples with input and output elements.

A 1D CNN model needs sufficient context to learn a mapping from an input sequence to an output value. CNNs can support parallel input time series as separate channels, like red, green, and blue components of an image. Therefore, we need to split the data into samples maintaining the order of observations across the two input sequences.

If we chose three input time steps, then the first sample would look as follows:

Input:

10, 15
20, 25
30, 35

Output:

65

That is, the first three time steps of each parallel series are provided as input to the model and the model associates this with the value in the output series at the third time step, in this case, 65.

We can see that, in transforming the time series into input/output samples to train the model, we will have to discard some values from the output time series where we do not have values in the input time series at prior time steps. In turn, the choice of the number of input time steps will have an important effect on how much of the training data is used.

We can define a function named split_sequences() that will take a dataset as we have defined it with rows for time steps and columns for parallel series and return input/output samples.

We can test this function on our dataset using three time steps for each input time series as input.

The complete example is listed below.
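
A sketch of the complete data preparation listing, reusing the contrived dataset assumed above; the expected shapes are noted in the comments.

# multivariate data preparation (sketch)
from numpy import array, hstack

# split a multivariate sequence into samples (input columns, last column is output)
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequences and output (assumed contrived series)
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure and stack
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps = 3
# convert into input/output samples
X, y = split_sequences(dataset, n_steps)
print(X.shape, y.shape)   # expected: (7, 3, 2) (7,)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])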

Running the example first prints the shape of the X and y components.

We can see that the X component has a three-dimensional structure.

The first dimension is the number of samples, in this case 7. The second dimension is the number of time steps per sample, in this case 3, the value specified to the function. Finally, the last dimension specifies the number of parallel time series or the number of variables, in this case 2 for the two parallel series.

This is the exact three-dimensional structure expected by a 1D CNN as input. The data is ready to use without further reshaping.

We can then see that the input and output for each sample is printed, showing the three time steps for each of the two input series and the associated output for each sample.

We are now ready to fit a 1D CNN model on this data, specifying the expected number of time steps and features to expect for each input sample, in this case three and two respectively.

When making a prediction, the model expects three time steps for two input time series.

We can predict the next value in the output series by providing the three most recent time steps of the two input series.

The shape of the one sample with three time steps and two variables must be [1, 3, 2].

We would expect the next value in the sequence to be 100 + 105, or 205.

The complete example is listed below.
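
A complete sketch combining the data preparation above with the model just described; the printed prediction should be close to 205, though it will vary between runs.

# multivariate cnn example (sketch)
from numpy import array, hstack
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv1D, MaxPooling1D

# split a multivariate sequence into samples (input columns, last column is output)
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps
        if end_ix > len(sequences):
            break
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define dataset (assumed contrived series)
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
dataset = hstack((in_seq1, in_seq2, out_seq))
# convert into input/output samples
n_steps = 3
X, y = split_sequences(dataset, n_steps)
# number of parallel input series (features), here 2
n_features = X.shape[2]
# define model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(n_steps, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=1000, verbose=0)
# demonstrate prediction: expect a value close to 100 + 105 = 205
x_input = array([[80, 85], [90, 95], [100, 105]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)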

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example prepares the data, fits the model, and makes a prediction.

There is another, more elaborate way to model the problem.

Each input series can be handled by a separate CNN and the output of each of these submodels can be combined before a prediction is made for the output sequence.

We can refer to this as a multi-headed CNN model. It may offer more flexibility or better performance depending on the specifics of the problem that is being modeled. For example, it allows you to configure each sub-model differently for each input series, such as the number of filter maps and the kernel size.

This type of model can be defined in Keras using the Keras functional API.

First, we can define the first input model as a 1D CNN with an input layer that expects sequences with n_steps time steps and 1 feature.

We can define the second input submodel in the same way.

Now that both input submodels have been defined, we can merge the output from each model into one long vector which can be interpreted before making a prediction for the output sequence.

We can then tie the inputs and outputs together.

The image below provides a schematic for how this model looks, including the shape of the inputs and outputs of each layer.

Plot of Multi-Headed 1D CNN for Multivariate Time Series Forecasting

This model requires input to be provided as a list of two elements where each element in the list contains data for one of the submodels.

In order to achieve this, we can split the 3D input data into two separate arrays of input data; that is, from one array with the shape [7, 3, 2] to two 3D arrays, each with the shape [7, 3, 1].

These data can then be provided in order to fit the model.

Similarly, when making a single one-step prediction, we must prepare the data for the single sample as two separate arrays, each reshaped to the three-dimensional shape [1, n_steps, 1] expected by its input head.

We can tie all of this together; the complete example is listed below.
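
A complete sketch of the multi-headed version, assuming the same contrived dataset and an arbitrary configuration for each head; the functional API pieces shown (Input, concatenate, Model) are standard Keras.

# multivariate multi-headed 1d cnn example (sketch)
from numpy import array, hstack
from keras.models import Model
from keras.layers import Input, Dense, Flatten, Conv1D, MaxPooling1D, concatenate

# split a multivariate sequence into samples (input columns, last column is output)
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps
        if end_ix > len(sequences):
            break
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define dataset (assumed contrived series)
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
dataset = hstack((in_seq1, in_seq2, out_seq))
# convert into input/output samples
n_steps = 3
X, y = split_sequences(dataset, n_steps)
# one time series per head
n_features = 1
# separate input data: from [7, 3, 2] into two arrays of [7, 3, 1]
X1 = X[:, :, 0].reshape(X.shape[0], X.shape[1], n_features)
X2 = X[:, :, 1].reshape(X.shape[0], X.shape[1], n_features)
# first input model
visible1 = Input(shape=(n_steps, n_features))
cnn1 = Conv1D(filters=64, kernel_size=2, activation='relu')(visible1)
cnn1 = MaxPooling1D(pool_size=2)(cnn1)
cnn1 = Flatten()(cnn1)
# second input model
visible2 = Input(shape=(n_steps, n_features))
cnn2 = Conv1D(filters=64, kernel_size=2, activation='relu')(visible2)
cnn2 = MaxPooling1D(pool_size=2)(cnn2)
cnn2 = Flatten()(cnn2)
# merge input models and interpret
merge = concatenate([cnn1, cnn2])
dense = Dense(50, activation='relu')(merge)
output = Dense(1)(dense)
# connect input and output models
model = Model(inputs=[visible1, visible2], outputs=output)
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit([X1, X2], y, epochs=1000, verbose=0)
# demonstrate prediction: expect a value close to 205
x_input = array([[80, 85], [90, 95], [100, 105]])
x1 = x_input[:, 0].reshape((1, n_steps, n_features))
x2 = x_input[:, 1].reshape((1, n_steps, n_features))
yhat = model.predict([x1, x2], verbose=0)
print(yhat)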

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example prepares the data, fits the model, and makes a prediction.

Multiple Parallel Series

An alternate time series problem is the case where there are multiple parallel time series and a value must be predicted for each.

For example, given the three parallel series prepared in the previous section (the two input series and the output series that is their sum), we may want to predict the value for each of the three time series for the next time step.

This might be referred to as multivariate forecasting.

Again, the data must be split into input/output samples in order to train a model.

The first sample of this dataset would be:

Input:

10, 15, 25
20, 25, 45
30, 35, 65

Output:

40, 45, 85

The split_sequences() function below will split multiple parallel time series with rows for time steps and one series per column into the required input/output shape.

We can demonstrate this on the contrived problem; the complete example is listed below.
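
A sketch of the complete listing; note how both the input window and the output row now cover all three series.

# multivariate output data preparation (sketch)
from numpy import array, hstack

# split a multivariate sequence into samples (predict the next step of every series)
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps
        # stop when there is no following time step to predict
        if end_ix > len(sequences)-1:
            break
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define dataset (assumed contrived series)
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps = 3
# convert into input/output samples
X, y = split_sequences(dataset, n_steps)
print(X.shape, y.shape)   # expected: (6, 3, 3) (6, 3)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])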

Running the example first prints the shape of the prepared X and y components.

The shape of X is three-dimensional, including the number of samples (6), the number of time steps chosen per sample (3), and the number of parallel time series or features (3).

The shape of y is two-dimensional as we might expect for the number of samples (6) and the number of time variables per sample to be predicted (3).

The data is ready to use in a 1D CNN model that expects three-dimensional input and two-dimensional output shapes for the X and y components of each sample.

Then, each of the samples is printed showing the input and output components of each sample.

We are now ready to fit a 1D CNN model on this data.

In this model, the number of time steps and parallel series (features) are specified for the input layer via the input_shape argument.

The number of parallel series is also used in the specification of the number of values to predict by the model in the output layer; again, this is three.

We can predict the next value in each of the three parallel series by providing an input of three time steps for each series.

The shape of the input for making a single prediction must be 1 sample, 3 time steps, and 3 features, or [1, 3, 3].

We would expect the vector output to be approximately [100, 105, 205].

We can tie all of this together and demonstrate a 1D CNN for multivariate output time series forecasting below.
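
A complete sketch for the vector-output case, again with an arbitrary configuration; the prediction should be close to [100, 105, 205].

# multivariate output 1d cnn example (sketch)
from numpy import array, hstack
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv1D, MaxPooling1D

# split a multivariate sequence into samples (predict the next step of every series)
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps
        if end_ix > len(sequences)-1:
            break
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define dataset (assumed contrived series)
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
dataset = hstack((in_seq1, in_seq2, out_seq))
# convert into input/output samples
n_steps = 3
X, y = split_sequences(dataset, n_steps)
# number of parallel series (features), here 3
n_features = X.shape[2]
# define model: one output value per parallel series
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(n_steps, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(n_features))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=3000, verbose=0)
# demonstrate prediction: expect something close to [100, 105, 205]
x_input = array([[70, 75, 145], [80, 85, 165], [90, 95, 185]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)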

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example prepares the data, fits the model and makes a prediction.

As with multiple input series, there is another more elaborate way to model the problem.

Each output series can be handled by a separate output CNN model.

We can refer to this as a multi-output CNN model. It may offer more flexibility or better performance depending on the specifics of the problem that is being modeled.

This type of model can be defined in Keras using the Keras functional API.

First, we can define the first input model as a 1D CNN model.

We can then define one output layer for each of the three series that we wish to forecast, where each output submodel will forecast a single time step.

We can then tie the input and output layers together into a single model.

To make the model architecture clear, the schematic below clearly shows the three separate output layers of the model and the input and output shapes of each layer.

Plot of Multi-Output 1D CNN for Multivariate Time Series Forecasting

When training the model, it will require three separate output arrays per sample. We can achieve this by converting the output training data that has the shape [6, 3] to three arrays with the shape [6, 1].

These arrays can be provided to the model during training.

Tying all of this together, the complete example is listed below.
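
A complete sketch of the multi-output version; each of the three Dense(1) heads is trained against its own target array, and the epoch count is arbitrary.

# multivariate output multi-headed 1d cnn example (sketch)
from numpy import array, hstack
from keras.models import Model
from keras.layers import Input, Dense, Flatten, Conv1D, MaxPooling1D

# split a multivariate sequence into samples (predict the next step of every series)
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps
        if end_ix > len(sequences)-1:
            break
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define dataset (assumed contrived series)
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
dataset = hstack((in_seq1, in_seq2, out_seq))
# convert into input/output samples
n_steps = 3
X, y = split_sequences(dataset, n_steps)
n_features = X.shape[2]
# separate the output into one array per series: [6, 3] -> three arrays of [6, 1]
y1 = y[:, 0].reshape((y.shape[0], 1))
y2 = y[:, 1].reshape((y.shape[0], 1))
y3 = y[:, 2].reshape((y.shape[0], 1))
# define model: shared convolutional base
visible = Input(shape=(n_steps, n_features))
cnn = Conv1D(filters=64, kernel_size=2, activation='relu')(visible)
cnn = MaxPooling1D(pool_size=2)(cnn)
cnn = Flatten()(cnn)
cnn = Dense(50, activation='relu')(cnn)
# define one output layer per series
output1 = Dense(1)(cnn)
output2 = Dense(1)(cnn)
output3 = Dense(1)(cnn)
# tie input and output models together
model = Model(inputs=visible, outputs=[output1, output2, output3])
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, [y1, y2, y3], epochs=2000, verbose=0)
# demonstrate prediction: expect values close to 100, 105 and 205
x_input = array([[70, 75, 145], [80, 85, 165], [90, 95, 185]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)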

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example prepares the data, fits the model, and makes a prediction.

Multi-Step CNN Models

In practice, there is little difference for the 1D CNN model between predicting a vector output that represents different output variables (as in the previous example) and predicting a vector output that represents multiple time steps of one variable.

Nevertheless, there are subtle and important differences in the way the training data is prepared. In this section, we will demonstrate the case of developing a multi-step forecast model using a vector model.

Before we look at the specifics of the model, let’s first look at the preparation of data for multi-step forecasting.

Data Preparation

As with one-step forecasting, a time series used for multi-step time series forecasting must be split into samples with input and output components.

Both the input and output components will be comprised of multiple time steps and may or may not have the same number of steps.

For example, given the univariate time series used earlier, we could use the last three time steps as input and forecast the next two time steps.

The first sample would look as follows:

Input:

10, 20, 30

Output:

40, 50

The split_sequence() function below implements this behavior and will split a given univariate time series into samples with a specified number of input and output time steps.

We can demonstrate this function on the small contrived dataset.

The complete example is listed below.
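
A sketch of the complete data preparation listing for the three-in, two-out case, assuming the same contrived series; the expected samples are noted in the trailing comments.

# multi-step data preparation (sketch)
from numpy import array

# split a univariate sequence into samples with multi-step outputs
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of the input and output parts of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence (assumed contrived series)
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of input and output time steps
n_steps_in, n_steps_out = 3, 2
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

# Expected output:
# [10 20 30] [40 50]
# [20 30 40] [50 60]
# [30 40 50] [60 70]
# [40 50 60] [70 80]
# [50 60 70] [80 90]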

Running the example splits the univariate series into input and output time steps and prints the input and output components of each.

Now that we know how to prepare data for multi-step forecasting, let’s look at a 1D CNN model that can learn this mapping.

Vector Output Model

The 1D CNN can output a vector directly that can be interpreted as a multi-step forecast.

This approach was seen in the previous section, where one time step of each output time series was forecasted as a vector.

As with the 1D CNN models for univariate data in a prior section, the prepared samples must first be reshaped. The CNN expects data to have a three-dimensional structure of [samples, timesteps, features], and in this case, we only have one feature so the reshape is straightforward.

With the number of input and output steps specified in the n_steps_in and n_steps_out variables, we can define a multi-step time-series forecasting model.

The model can make a prediction for a single sample. We can predict the next two steps beyond the end of the dataset by providing the last three observations of the series as input.

We would expect the predicted output to be approximately the next two values in the series, i.e. [100, 110] for the contrived sequence assumed above.

As expected by the model, the shape of the single sample of input data when making the prediction must be [1, 3, 1] for the 1 sample, 3 time steps of the input, and the single feature.

Tying all of this together, the 1D CNN for multi-step forecasting with a univariate time series is listed below.
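
A complete sketch, assuming the contrived series; the prediction should be somewhere near [100, 110].

# univariate multi-step vector-output 1d cnn example (sketch)
from numpy import array
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv1D, MaxPooling1D

# split a univariate sequence into samples with multi-step outputs
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequence):
            break
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence (assumed contrived series)
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of input and output time steps
n_steps_in, n_steps_out = 3, 2
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model: a vector of n_steps_out values is predicted directly
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=2000, verbose=0)
# demonstrate prediction: expect something close to [100, 110]
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)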

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example forecasts and prints the next two time steps in the sequence.

Multivariate Multi-Step CNN Models

In the previous sections, we have looked at univariate, multivariate, and multi-step time series forecasting.

It is possible to mix and match the different types of 1D CNN models presented so far for the different problems. This too applies to time series forecasting problems that involve multivariate and multi-step forecasting, but it may be a little more challenging.

In this section, we will explore short examples of data preparation and modeling for multivariate multi-step time series forecasting as a template to ease this challenge, specifically:

  1. Multiple Input Multi-Step Output.
  2. Multiple Parallel Input and Multi-Step Output.

Perhaps the biggest stumbling block is in the preparation of data, so this is where we will focus our attention.

Multiple Input Multi-Step Output

There are multivariate time series forecasting problems where the output series is separate from, but dependent upon, the input time series, and multiple time steps are required for the output series.

For example, consider our multivariate time series from the prior section, where the output series is the sum of the two input series.

We may use three prior time steps of each of the two input time series to predict two time steps of the output time series. The first sample would be:

Input:

10, 15
20, 25
30, 35

Output:

65
85

The split_sequences() function below implements this behavior.

We can demonstrate this on our contrived dataset. The complete example is listed below.
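
A sketch of the complete data preparation listing; note that the output window starts at the last input time step, so the first sample's output is [65, 85].

# multivariate multi-step data preparation (sketch)
from numpy import array, hstack

# split a multivariate sequence into samples (multi-step output from the last column)
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of the input and output parts of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define dataset (assumed contrived series)
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose the number of input and output time steps
n_steps_in, n_steps_out = 3, 2
# convert into input/output samples
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
print(X.shape, y.shape)   # expected: (6, 3, 2) (6, 2)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])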

Running the example first prints the shape of the prepared training data.

We can see that the shape of the input portion of the samples is three-dimensional, comprised of six samples, with three time steps and two variables for the two input time series.

The output portion of the samples is two-dimensional for the six samples and the two time steps for each sample to be predicted.

The prepared samples are then printed to confirm that the data was prepared as we specified.

We can now develop a 1D CNN model for multi-step predictions.

In this case, we will demonstrate a vector output model. The complete example is listed below.
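
A complete sketch for this case, reusing the data preparation above; expect a prediction somewhere near [185, 205].

# multivariate multi-step vector-output 1d cnn example (sketch)
from numpy import array, hstack
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv1D, MaxPooling1D

# split a multivariate sequence into samples (multi-step output from the last column)
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        if out_end_ix > len(sequences):
            break
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define dataset (assumed contrived series)
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
dataset = hstack((in_seq1, in_seq2, out_seq))
# convert into input/output samples
n_steps_in, n_steps_out = 3, 2
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
n_features = X.shape[2]
# define model: a vector of n_steps_out values is predicted directly
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=2000, verbose=0)
# demonstrate prediction: expect something close to [185, 205]
x_input = array([[70, 75], [80, 85], [90, 95]])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)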

Running the example fits the model and predicts the next two time steps of the output sequence beyond the dataset.

We would expect the next two steps to be [185, 205].

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

It is a challenging framing of the problem with very little data, and the arbitrarily configured version of the model gets close.

Multiple Parallel Input and Multi-Step Output

A problem with parallel time series may require the prediction of multiple time steps of each time series.

For example, consider the three parallel time series from a prior section.

We may use the last three time steps from each of the three time series as input to the model and predict the next two time steps of each of the three time series as output.

The first sample in the training dataset would be the following.

Input:

10, 15, 25
20, 25, 45
30, 35, 65

Output:

40, 45, 85
50, 55, 105

The split_sequences() function below implements this behavior.

We can demonstrate this function on the small contrived dataset.

The complete example is listed below.

Running the example first prints the shape of the prepared training dataset.

We can see that both the input (X) and output (y) elements of the dataset are three-dimensional for the number of samples, time steps, and variables or parallel time series respectively.

The input and output elements of each series are then printed side by side so that we can confirm that the data was prepared as we expected.

We can now develop a 1D CNN model for this dataset.

We will use a vector-output model in this case. As such, we must flatten the three-dimensional structure of the output portion of each sample in order to train the model. This means, instead of predicting two steps for each series, the model is trained on and expected to predict a vector of six numbers directly.

The complete example is listed below.
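
A complete sketch; the three-dimensional output ([samples, 2 steps, 3 series]) is flattened to a vector of six values per sample before training, and the prediction should be close to [90, 95, 185, 100, 105, 205].

# multivariate multi-step parallel-output 1d cnn example (sketch)
from numpy import array, hstack
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv1D, MaxPooling1D

# split a multivariate sequence into samples (multi-step output of every series)
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of the input and output parts of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define dataset (assumed contrived series)
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
dataset = hstack((in_seq1, in_seq2, out_seq))
# convert into input/output samples
n_steps_in, n_steps_out = 3, 2
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
n_features = X.shape[2]
# flatten the output from [samples, 2, 3] to [samples, 6]
n_output = y.shape[1] * y.shape[2]
y = y.reshape((y.shape[0], n_output))
# define model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(n_output))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=7000, verbose=0)
# demonstrate prediction: expect something close to [90, 95, 185, 100, 105, 205]
x_input = array([[60, 65, 125], [70, 75, 145], [80, 85, 165]])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)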

Running the example fits the model and predicts the values for each of the three time series for the next two time steps beyond the end of the dataset.

We would expect the values for these series and time steps to be approximately [90, 95, 185] for the first step and [100, 105, 205] for the second.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that the model forecast gets reasonably close to the expected values.

Summary

In this tutorial, you discovered how to develop a suite of CNN models for a range of standard time series forecasting problems.

Specifically, you learned:

  • How to develop CNN models for univariate time series forecasting.
  • How to develop CNN models for multivariate time series forecasting.
  • How to develop CNN models for multi-step time series forecasting.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


300 Responses to How to Develop Convolutional Neural Network Models for Time Series Forecasting

  1. Avatar
    JSman November 12, 2018 at 8:44 am #

    Hi Jason,

    Good post (as always)!

    I have got an unrelated question. Recently I have been developing almost exclusively in JavaScript (both front end with React and backend with Node.js). It has been a long time since I have done any solid coding in Python, hence my skillset is rusty.

    Now, I wonder, how do you see the applying of programming languages for ML apps.
    Tensorflow is running now both inn a browser tf.js as well on the backend with node js (just like python?). That sounds like a great thing – one language for everything. There are also courses on the topic, getting more traction
    https://www.udemy.com/machine-learning-with-javascript/

    Is javascript enough for machine learning apps? or python should be used? Can you please elaborate?

    thanks and regards
    JSman

  2. Avatar
    John November 13, 2018 at 1:33 am #

    Hi Jason,

    A very high quality article for me to learn more about deep learning. It really helps me a lot. Please keep sharing the knowledge. Thank you!

    Cheer

    • Avatar
      Jason Brownlee November 13, 2018 at 5:49 am #

      Thanks, I’m glad to hear that.

    • Avatar
      Mosaab April 9, 2020 at 6:10 pm #

      Thank you so much for such an informative article, I have learnt a lot.

  3. Avatar
    Ron November 14, 2018 at 12:21 am #

    Nice site. Just a comment. IMO, It’s a bit pretentious and weak to put the title PhD after your name (” I’m Jason Brownlee PhD…”). You don’t need to validate yourself through a useless degree. You have already earned the respect of all of us through your wonderful work. A mention of your credentials at a bio page would have sufficed. Just my two cents.

    • Avatar
      Jason Brownlee November 14, 2018 at 7:31 am #

      Thanks for the feedback.

      Testing showed me that “phd” splashed around helps with credibility for first-time visitors.

      • Avatar
        Armando Mendivil November 20, 2018 at 8:20 am #

        Dr. Brownlee,

        My wife has an MS in Robotics Engineering and is a Registered Professional Engineer. I have a PhD in physics from UT. I Know how hard we both worked for our credentials and I certainly would not call them useless. You earned your credentials BRAVO.

        Armando

        • Avatar
          Jason Brownlee November 20, 2018 at 2:03 pm #

          Agreed. Completing a degree is not useless, although it may not be required to be a practitioner in a given field (e.g. applied machine learning).

          • Avatar
            Suyash August 28, 2019 at 3:29 pm #

            How to increase the number of prediction???? Where in code plz tell

          • Avatar
            Jason Brownlee August 29, 2019 at 5:59 am #

            What do you mean by the number of prediction, do you mean time steps?

            If so, you can start with one of the multi-step forecasting examples and adapt it for your needs.

  4. Avatar
    Carlos November 16, 2018 at 7:50 am #

    Thanks Jason for your new clear, detailed and very well explained explanation (as always)!.

    • Avatar
      Jason Brownlee November 16, 2018 at 1:55 pm #

      I’m glad it helped.

    • Avatar
      Karndeep Singh November 8, 2021 at 7:01 pm #

      Hi Thanks for this wonderful article.
      Please help me to understand when we can use LSTMs and CNNs for Time series forecasting?

      • Avatar
        Adrian Tam November 14, 2021 at 12:21 pm #

        I think the best way is to test out both. It is hard to tell which works on what scenarios. But you can think in this way: CNN is memoryless and look at a window at once, but LSTM is stateful with cell state and hidden state built up as you feed in the data. Which one sounds more reasonable for your data? That might be the choice you want to explore first.

  5. Avatar
    khalfi November 16, 2018 at 8:45 am #

    I index an image by a low-level feature (color) in the form of a digital vector. Can I exploit the current topic for an image classifier?

  6. Avatar
    Andrew C November 16, 2018 at 2:42 pm #

    Thanks Jason for a very detailed explanation of CNN, and the many ways we can approach a time forecasting problem with CNNs.

  7. Avatar
    Samar Ansari November 17, 2018 at 2:56 am #

    Hi Jason,

    I have become a fan, after reading this post of yours.

    I have been trying to use 1D CNNs for one of my network anomaly applications, but somehow couldn’t get them to work effectively.

    This post has all that I need to get my network up and running.

    Thanks.

  8. Avatar
    Linda November 21, 2018 at 5:29 pm #

    Hi Jason
    Your books and posts have been very helpful in igniting my interest in machine learning. I just started learning deep learning and would like to know your approach on generating rain forecast maps given a data set with images (in gif format) of historical precipitation maps. Seeing as the sequence of past observations are images and not numbers like the examples above how would one prepare the image data.(I’m very new to deep learning)

  9. Avatar
    Dude from far east November 27, 2018 at 3:03 am #

    Your site is pure gold and It is becoming my reference! You are making difference, thanks for educating for us. I became a ML engineer now because your hardwork, thanks again!

  10. Avatar
    Thanasis November 28, 2018 at 8:07 am #

    Awesome Jason!

    I would like to know your opinion on this :

    CNN architecture : Input ->Conv1d->Dropout->Conv1d . (There is no Dense Layer, as you noticed!)

    Purpose : Multistep Time series Forecasting. For example, 20 “past” input -> 3 “future” output, (continuous output and input).

    • Avatar
      Jason Brownlee November 28, 2018 at 2:52 pm #

      Use the structure that gives the best performance.

      I generally recommend a Dense layer as the output layer when making predictions so that you can specify the transform and structure of the output.

      • Avatar
        Thanasis November 28, 2018 at 7:58 pm #

        Thank you for your answer!

        In addition, what’s your opinion on using filters in “descending order”,
        I mean Input ->Conv1d(40 filters)->Dropout->Conv1d(20 filters)->Dropout->Conv1d(3 filters).

        P.S. 40,20, 3 are just random numbers.

        • Avatar
          Jason Brownlee November 29, 2018 at 7:39 am #

          Seems odd.

          Don’t seek my permission, use the model architecture that gives the best performance.

  11. Avatar
    Babak November 28, 2018 at 6:41 pm #

    Thanks for providing all this.

    I’ve got a question regarding the input dimension while fitting the model, which in case of Conv1D is [samples, timesteps, features]. Now comparing this with the following article using MLP: https://machinelearningmastery.com/how-to-develop-multilayer-perceptron-models-for-time-series-forecasting/ the dimension becomes [samples, features]. What is the reason for this difference although both models should handle “one dimensional” input?

    • Avatar
      Jason Brownlee November 29, 2018 at 7:37 am #

      The CNN must read across subsequences of the input, therefore a 3D input shape is required, much like LSTMs.

      • Avatar
        Babak December 2, 2018 at 8:40 pm #

        With subsequence you mean the timesteps of each given feature, right?

  12. Avatar
    M. Antonio Dias December 3, 2018 at 7:38 pm #

    Hi Jason,
    Great article!

    After some tests, I believe that I can’t predict the next N sequences since the output y is always dependent on the input x (unless I misunderstood the all concept). If so, what is your advice to predict the next N sequences?

    • Avatar
      Jason Brownlee December 4, 2018 at 6:00 am #

      I recommend testing multiple framings of your problem and multiple techniques in order to discover what works best for your specific dataset.

  13. Avatar
    Mutasem December 5, 2018 at 8:11 pm #

    Thanks a lot Dr. Jason. May Allah bless you , we are excited to watch CNN after implementing it to Shampoo Sales Dataset… Do you have any idea to do this.

  14. Avatar
    Tom Schwörer December 16, 2018 at 2:56 am #

    Hi Jason,

    great article, thank you!

    I have a question though: could you tell me what the data structure of
    X1 = X[:, :, 0].reshape(X.shape[0], X.shape[1], n_features)
    X2 = X[:, :, 1].reshape(X.shape[0], X.shape[1], n_features)

    in the second example of the multiple input series looks like? As an exercise I’m recreating the code using tensorflow.js and while the code is mostly easy to translate, the data structures in python – a language I’m not really familiar with in detail – often get confusing.

    Most of the time you have shown a plain example of the input data, but not in this case. So it’s kind of hard for me to understand how you split the data in detail and what you feed into the two visible parts of the network.

    Thanks in advance!
    Tom

  15. Avatar
    Ather Abbas December 19, 2018 at 11:05 pm #

    Hello Jason,
    Thank you for your wonderful tutorials. I have a question (sorry if it looks stupid as I am a beginner), if we have 2 outputs from our NN, is it possible to customize the link of certain nodes from last hidden layer to certain output nodes? e.g. if we have two output nodes and 4 nodes in last hidden layer, is it possible that we link 2 nodes from last hidden layer to a specific node in output layer and other 2 nodes in last hidden layer to the other node in the output layer. If yes, can you refer me to relevant literature? I have drawn a rough sketch here. https://imgur.com/a/w8YnRwq

    • Avatar
      Jason Brownlee December 20, 2018 at 6:25 am #

      I’m sure you can, but I don’t have an example sorry.

      Perhaps try setting the weights to zero after training?

      • Avatar
        Ather Abbas December 20, 2018 at 11:48 am #

        Thank you very much for your response. Can you please elaborate it a little more? Do you mean by setting certain weights which affect these particular ‘connections’ as zero? and why did you say ‘after training’?

        • Avatar
          Jason Brownlee December 20, 2018 at 2:00 pm #

          Yes, because I don’t think you can do it other ways (e.g. disable weights). Perhaps you can find a better approach.

  16. Avatar
    dani December 20, 2018 at 12:47 am #

    if we have excel file with 40000 rows and two column than how i can transform to 2D or 3D array as you have taken just 5 number sequence?

  17. Avatar
    sanker February 22, 2019 at 3:34 am #

    i got this error

    ValueError: Negative dimension size caused by subtracting 3 from 2 for ‘conv2d_25/convolution’ (op: ‘Conv2D’) with input shapes: [?,200,2,48], [3,3,48,13].

  18. Avatar
    Vital March 8, 2019 at 1:33 pm #

    Hi,

    I’m trying to implement “Multi-Step CNN Model” on a time serie so i’m using a 1D convolutional network.

    I use a time sequence of 7 weeks as the number of steps in and 40 weeks as the number of weeks to predict.

    Is that a bad idea?

    Should the number of steps in always be greater or equal to the number of outputs?

    Thanks.

    • Avatar
      Jason Brownlee March 8, 2019 at 2:22 pm #

      I recommend testing a range of different approaches in order to discover what works best for your specific dataset.

      • Avatar
        Vital March 8, 2019 at 3:00 pm #

        Thank you for the very fast response!

        With 7 steps in and 40 steps out I get a good MAPE of about 4%.
        Even though its a good error rate, my intuition is telling me that using values in the last 7 weeks to predict values for 40 weeks in the future might not be very believable by the end user of the prediction (forecast). What I mean is that the CNN is trained on patterns in those 7 weeks and then is able to predict the pattern 40 weeks in future?

        I may be misinterpreting the whole definitions of the time steps in and out so any clarification from you will be greatly appreciated!

        I also tried 40 steps in and 40 steps out which yields a MAPE of about 10-12%.

        I think a possible reason is my time series has an upward trend with seasonal spikes every 52 weeks and so when the CNN is training it gets “confused” by the spikes which makes the rest of predictions have a higher error rate. Is there any tricks in CNNs to combat that?

        Thank you for taking the time to help me!

  19. Avatar
    Constantine March 21, 2019 at 9:34 am #

    Hello! I ‘ve been fighting the problem of utilizing the Conv1D for several hours now, and for the life of me, I can’t get it to work no matter what I do. Following your ‘Multivariate CNN’ code, I have a dataset of a pandas data frame of dimension (9666,10) [9 features and the 10th column my y), which I convert to numpy array before I run any further operations, and then use the split_sequences function with n_steps = 3, which gives me X of dimension (9664, 3, 9) and y of (9664,). When I run it gives me the “ValueError: Error when checking target: expected conv1d_25 to have 3 dimensions, but got array with shape (9664, 1)”.

    Could you please help me out? I cannot believe it won’t work after so much effort

    • Avatar
      Jason Brownlee March 21, 2019 at 2:21 pm #

      That is odd, what type of output layer do you have?

      It sounds like you might have a decoder output model attached?

      • Avatar
        Constantine March 21, 2019 at 11:10 pm #

        Firstly, thanks a lot for prompt assistance!

        I was only using the very first 1DConv layer just to check if the input was correct. When I added a Flatten() and then a Dense(1) as the output layer, it worked! I did not know that using only the 1D layer would result in such a strange dimensionality error.

        Another question, now that I got it to work: When I use “adam” as the optimizer it works fine, but when I switch it to ‘sgd’ it gives me ‘nan’ as the loss, starting from the very first Epoch, with the above data. What could that be?

  20. Avatar
    Jim Avazpour March 28, 2019 at 6:42 pm #

    Hi Jason,

    Regarding Conv1D, is there a rule of thumb for figuring out the correct number for filters and kernels?

    Thanks.

  21. Avatar
    Xu Zhang April 12, 2019 at 11:07 am #

    A great article again. Thank you so much.

    If I have a structured data set, such as Titanic data set, is it possible to use 1D convolutional NN to train this dataset? I think it is possible, but I don’t know if it is more feasible and better performance.

    oringinal X.shape = (sample, no_features)
    reshape X to X.shape = (sample, no_feature, 1)

    then use several 1D cnn layers to reduce the size of no_feature, finally use one or two dense layer to do classification.

    Your oppions are highly appreciated

    • Avatar
      Jason Brownlee April 12, 2019 at 2:44 pm #

      No, it would only be appropriate for sequence input. E.g. data with spatial or temporal relationship across input features.

      • Avatar
        Xu Zhang April 13, 2019 at 3:28 am #

        Thank you Jason!

      • Avatar
        Xu Zhang April 19, 2019 at 5:24 am #

        Hi Jason,

        I just read a paper about using CNN to tabular data. Please have a look.

        https://arxiv.org/pdf/1903.06246v1.pdf

        • Avatar
          Jason Brownlee April 19, 2019 at 6:20 am #

          What did you learn from it?

          • Avatar
            Xu Zhang April 24, 2019 at 11:39 am #

            I learned that if the collected data can be transfer into the 2D image data or 2D matrices, we can train them using the pre-trained models. Especially. when we only have a small dataset.
            However, in this paper, their transformation is hard to understand. I can’t figure out what the model learned? What are your opinions?

          • Avatar
            Jason Brownlee April 24, 2019 at 1:58 pm #

            Perhaps contact the author of the paper with your question about their method?

  22. Avatar
    Sramctc April 17, 2019 at 11:19 am #

    Dear Jason,

    Having over thousands of time-series data ( .CSV) will be used for training, for example, intra-day stock prices, I am asked to solve a problem which is to predict if a stock will rise or drop. I have no idea how to start with, says, using RNN or CNN, LSTM? or just simple classifier. Besides, I think I will use the first hour data to predict the trend.
    0001.CSV: [D1,D2……, D60] (input), [Min,Max] (Output)(should I say it “y”?)
    0002.CSV: [D1,D2……, D60] (input), [Min,Max] (Output)
    ……
    3680.CSV: [D1,D2……, D60] (input), [Min,Max] (Output)

    which models above is appropriate to do that? Thanks a lot

  23. Avatar
    Halim May 5, 2019 at 12:38 pm #

    Excuse me, your web page will be apply to my thesis for my reference. Do you have a book for discussion like this learning?

  24. Avatar
    Dan May 10, 2019 at 12:23 pm #

    Thank you very much for another great post.

    I’m confused with the two examples of the Multivariate Multi-Step CNN Models.
    You said that the model “predicts the next two time-steps of the output sequence beyond the dataset”.

    In the ‘Multiple Input Multi-Step Output’ : “..We would expect the next two steps to be [185, 205]” and in the ‘Multiple Parallel Input and Multi-Step Output’: ‘We would expect the values for these series and time steps to be as follows:[ 90, 95, 185 ] , [ 100, 105, 205].

    My question:
    In both examples the first expected output value -185 (first example) and [90,95,185] (second example) are part of the dataset (not beyond) and were in the training set, so why we need to ‘predict’ them when the model has seen them?
    isn’t it only one time-step prediction of the third feature (the out-seq)?

  25. Avatar
    aiedu May 30, 2019 at 9:25 pm #

    Hi Jason

    Pardon my ignorance, but in the Multivariate CNN Models, I am struggling to understand why the model ignores the prior results of the previous time steps. Is it because CNN is borrowed from an image recognition frame work that we cannot do something like ( I am assuming here that the 2 first columns are independent variables, and the third the dependent one, and each line is 3 time steps.

    Input

    [ 10 15 25 ]
    [ 20 25 45 ]
    [ 30 35 ? ] ( not sure what encoding the missing values should take here)

    Output

    [65]

    Thanks

    • Avatar
      Jason Brownlee May 31, 2019 at 7:49 am #

      I’m not sure I follow, sorry. Can you elaborate, which example are you referring to exactly?

      • Avatar
        aideu May 31, 2019 at 6:26 pm #

        Thanks for your time: Your example in the section “Multivariate CNN Models”
        , shows the structure of 1 data point as :

        “If we chose three input time steps, then the first sample would look as follows:”

        Input:

        1 10, 15
        2 10, 15
        3 30, 35

        Output:
        1 65

        It seems to me that there is as much to learn, given that the third column is a linear combination of the first 2, from the item 1,2 as there is from the item 3 for that sample. As in the output are all linear combination of columns 1 and 2. But the model dismisses using all the data available ( value 25 for item 1 and value 45 for item 2
        ) in the model. I thought that letting the network study the linear relationship not only at item 3 but also at item 1 and 2 would improve the results. So I was asking why not using that data structure instead:

        Input

        10 15 25
        20 25 45
        30 35 ?

        Output

        65

        instead of just

        1 10, 15
        2 10, 15
        3 30, 35

        Output:
        1 65

        that’s because 10+15 adds no value to getting to know the relationship 30+35=65
        while knowing that 10+15=25 at item 1, might help understanding the relationship 30+35=65 for that sample? (I was thinking here in a more general time series case than in this particular example. where for example the residual of 10+15 vs 25 might mean something to the residual of 30+35 vs 65)

        Thanks

        • Avatar
          Jason Brownlee June 1, 2019 at 6:12 am #

          Sure, you can use any framing of the prediction problem you wish.

          The idea of this post is to give you many examples or different framings that you can use as a starting point for your own problem.

  26. Avatar
    gustavz July 11, 2019 at 5:25 pm #

    Hi Jason,

    would it be possible to make the model able to take any input size if you make it fully convolutional, by exchanging the dense layers by a 1×1 convolution?

    Then it would not be necessary to fix the input_shape which would make the model be able to do a multi step prediction of a fixed length independent from the input length.

    Am I correct with this assumption? If yes why is this never addressed in your tutorials?

    • Avatar
      Jason Brownlee July 12, 2019 at 8:29 am #

      Perhaps, but not with Keras – it likes to nail down all shapes and sizes so it can optimize the graph.

  27. Avatar
    wang hui July 15, 2019 at 5:00 am #

    hi,jason.thank you for your tutorial. I want to ask you the question that how can we visualize the data after being processing by the pooling layer and a dense layer, and the shape of the processed data.

  28. Avatar
    Lam Vo August 3, 2019 at 1:14 am #

    HI Jason,

    In your examples, most multivariate time series are metrics, what if there are categorical and also metrics variables?

    • Avatar
      Jason Brownlee August 3, 2019 at 8:11 am #

      Each variable must be prepared prior to modeling.

      A categorical var can be encoded as either integer/one hot/embedding. Perhaps try a few approaches and see what works best for your specific dataset?

      • Avatar
        Lam Vo August 4, 2019 at 5:16 am #

        Thanks. That is really a good hint.

  29. Avatar
    Petr August 16, 2019 at 1:51 am #

    How would I structure a CNN where I have 5000 samples with 500 timesteps each and there is a binary response variable for each of the timesteps?

    I have the following setup but am getting an error – keras doesn’t like the value of my y in model.fit:

    n_steps_in, n_steps_out = 500, 500

    model = Sequential()
    model.add(Conv1D(filters=64, kernel_size=2, activation=’relu’, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(50, activation=’relu’))
    model.add(Dense(n_steps_out, name=’output’))
    model.compile(loss = ‘binary_crossentropy’, optimizer=’adam’)

    model.fit(X_train, y_train, epochs=2000)

    ValueError: Error when checking target: expected output to have 2 dimensions, but got array with shape (5000, 500, 1)

    For reference, x_train.shape is (5000, 500, 265)
    y.train.shape is (5000, 500, 1)

  30. Avatar
    Anonymous August 17, 2019 at 2:55 am #

    Hi Jason,

    I use TF version 1.13 (I believe same applies for later versions). I was not able to execute:

    from keras.layers.convolutional import Conv1D
    from keras.layers.convolutional import MaxPooling1D

    this, however, did work:

    from keras.layers import Conv1D
    from keras.layers import MaxPooling1D

    I believe that package ‘convolutional’, isn’t available even in later versions of TF, I may be wrong. It seems that this is a reference to source file rather than the package name.

    Great article BTW (as expected 😉

    • Avatar
      Jason Brownlee August 17, 2019 at 5:57 am #

      Perhaps update your version of Keras to 2.2.4 or higher?

  31. Avatar
    Anonymous August 17, 2019 at 9:07 pm #

    Hi Jason,

    Thank you for the swift response. It was counterintuitive (obviously my assumption doesn’t hold) that the TensorFlow (or any other backend) implementation of Keras works only with the default version of Keras that the backend itself comes with.

    Based on the assumption above, I haven’t considered updating Keras version without updating TensorFlow. This is why I’ve looked at TensorFlow API (implementation) to find a particular class/package.

    Once again thank you!

    • Avatar
      Jason Brownlee August 18, 2019 at 6:42 am #

      No problem, happy to hear that you resolved your issue.

  32. Avatar
    suyash August 29, 2019 at 4:25 am #

    instead of sequence of np array as input how can we load csv file input to cnn time forecasting can you tell me???

  33. Avatar
    Suyash August 29, 2019 at 9:55 pm #

    How can we increase the number of predictions like 20 step ahead in future where in code should I change plz tell

    • Avatar
      Jason Brownlee August 30, 2019 at 6:19 am #

      You have to change the framing of the prediction problem – the dataset.

      Then you can either change the model to predict 20 steps or use the model 20 times recursively for 1 step.

  34. Avatar
    Jimmy August 29, 2019 at 9:56 pm #

    That will be great if above example is using tf.data and keras fit_generator. Thank!

    • Avatar
      Jason Brownlee August 30, 2019 at 6:20 am #

      Sorry, I don’t have tutorials on tensorflow directly. Thanks for the suggestion.

      I do have many examples of using a fit_generator, you can search the blog.

  35. Avatar
    Jimmy Au August 31, 2019 at 1:16 pm #

    Just that in practical and performance it should be in tf.data. I am looking for time series data to tf.data to batch, mini batch and window conv1d to fit_generator. Thank for your sharing.

  36. Avatar
    Vasudev Gupta September 16, 2019 at 8:53 pm #

    Hi Jason,

    This is great. I’m stuck with something though, hoping to seek your help.

    Along with the times series, the dataset has features like “holiday_flag”,”day_of_week” etc. I was trying to use these as input features. Can you guide a way ?

    • Avatar
      Jason Brownlee September 17, 2019 at 6:28 am #

      You can provide them as additional input variables either as a series (booleans) or a separate input to the model.

      Does that help?

  37. Avatar
    AH September 18, 2019 at 5:06 pm #

    Thank you for introducing CNN as tool for Time Series problems.
    I am wondering if it is a good choice to use CNN1D for time series problems? When should we consider using CNN1D for time series problems? Should we explore and exhaust other options first before coming to CNN? What should be the intuition for picking CNN on a particular TS problem. and lastly what are the cons of using CNN1D against any other approach. Ofcourse the accuracy of a model should be the deciding factor but given a problem should we avoid CNN and use other tools first?

  38. Avatar
    Vijaya September 22, 2019 at 3:54 pm #

    Hi Jason,

    I have to do hourly prediction of energy.I have data for four different states and each state contains 25 years data(25 CSV files). So total 100 csv files.
    1. I need to predict energy for next 5 years as I have the data till 2014 from 1990
    2. Can I use CNN and LSTM both as LSTM is more suitable for 5 years data.
    3. How do clean the data as data from morning 4am to 7pm is the value other than that data is either negative or zero(it contains day light saving data).
    4. Can I use drift technique along with CNN and LSTM?
    5. How do I read the 4 folders in python.

    Regards,
    Vijaya

  39. Avatar
    Suraj October 19, 2019 at 4:15 am #

    Can we apply this CNN for sensor data prediction(Air pollution) in Internet of things?

  40. Avatar
    Vishnu October 21, 2019 at 7:30 am #

    Hello Jason,

    Thanks for this wonderful article! It’s very useful!

    Recently I’ve been working with the models you’ve mentioned in this article

    I have a question about the multi headed CNN, sometimes the RMSE for the predictions are low and sometimes very high

    The discrepancies are too high; is this common?

    Does the CNN (multi headed) just fit models better sometimes compared to other runs ? Even though all parameters remain the same

    Or am i possibly making some error ?

  41. Avatar
    Vishnu Suresh October 22, 2019 at 11:20 pm #

    Hello Jason,

    Your tutorials are very helpful!

    I’ve been playing around with CNN architectures for time series forecasting using 1D convolution networks

    You had elaborated on how CNN can be used for multivariate time series data in 2 ways

    One with a single 1D neural net and another multi headed CNN arrangement

    Recently i read about a multi scale CNN architecture where the idea is simple

    Different CNNs are trained using same data but the data is downsized for each CNN

    For example one CNN is trained using data available for 4 years

    Another CNN is trained using the same data but from years 1-3 and not 4

    And then both CNNs are concatenated

    I tried to use this idea using multi headed CNN, where each CNN is trained with differently sized input vectors and it did not work ???? it just said vectors should be of same sizes

    Can i get help from you regarding this ?

    Thank you
    Best Regards
    Vishnu

    • Avatar
      Jason Brownlee October 23, 2019 at 6:48 am #

      Thanks!

      Sounds like a great approach.

      I don’t have any tutorials on it, but I’d love to write about it in the future.

  42. Avatar
    Stuart October 29, 2019 at 6:00 am #

    Hi Jason,

    Are you able to estimate the confidence intervals of the forecast values using this approach?

  43. Avatar
    Michael November 1, 2019 at 12:33 pm #

    Hi Jason,

    Do you have any intuition about how well this method will work for low signal-to-noise ratio signals?

    For some signals you can only see any underlying structure, i.e. the signal, if you do additional processing e.g. FFT.

    I am asking before I expend any effort on experimenting.

    Thanks,

    Michael

    • Avatar
      Jason Brownlee November 1, 2019 at 1:42 pm #

      No, sorry. Intuition suggests that the more you do up front to expose the signal, the better the model will learn – e.g. the more useful it will be.

      I recommend prototyping a model and testing with and without data pre-processing.

  44. Avatar
    Shruti Kaushik November 13, 2019 at 4:36 pm #

    Hi Jason,

    Thank you for your efforts. Your blogs and books have been really helpful in my Ph.D. degree.

    I am performing time-series forecasting where I have 21 features. The first 20 features contain numbers between 0 and 30. The 21st feature is related to sales, so it contains numbers in the hundreds. I have 1500 points: 1300 points are used for training and 200 points are used for testing.

    My task is to predict the 21st feature. My time series is stationary; I checked by performing an ADF test. I standardized my data before training the deep learning models. I am using multi-head architectures, where each feature is passed as input to a different head, and the outputs of the heads are concatenated to predict the 21st feature.

    My question is: multi-head CNN-LSTM is performing better than multi-head LSTM. However, my dataset does not contain any spatial features. Also, multi-head CNN is performing well (even though it is mostly used for capturing spatial patterns). Why?

    • Avatar
      Jason Brownlee November 14, 2019 at 7:58 am #

      Thanks!

      I recommend testing a suite of different models in order to discover what works best for your specific dataset.

  45. Avatar
    housssem eddine Louchene December 12, 2019 at 4:59 am #

    Hi Jason,
    thanks a lot for the explanation above.
    I want to implement particle swarm optimization as an optimizer.
    How can I do it?

  46. Avatar
    Qi January 18, 2020 at 12:04 am #

    Hi Jason,

    Thank you for the article, was clear and super helpful.

    For multi-step prediction, how should the kernel size be chosen? And is the optimal kernel size related to the number of predicted steps (the prediction horizon)? For example, if we target a long prediction horizon, say H=24, does a larger kernel size work better at extracting patterns over a long horizon?

    • Avatar
      Jason Brownlee January 18, 2020 at 8:49 am #

      I think the kernel size is probably unrelated to the number of steps to forecast.

      I recommend using controlled experiments to test a suite of different configurations and discover what works best.

  47. Avatar
    Venkatesh B January 27, 2020 at 11:50 pm #

    Hi Jason,

    Thanks a lot for helping many aspiring deep learning practitioners.
    Your articles and efforts are really awesome and appreciated.
    I would like to see articles on image filtering using CNNs or other deep learning algorithms.

    Thanks,
    Venkatesh B

  48. Avatar
    Mohammed Ayub February 24, 2020 at 7:00 am #

    Hi, Good day!
    I want to disaggregate appliance-level power from the total smart meter aggregate reading. I have total active power and reactive power, and also the ground truth (active and reactive power) for each appliance. I have, let's say, 4 appliances. So how do I design a CNN model for this problem?

    • Avatar
      Jason Brownlee February 24, 2020 at 7:49 am #

      I cannot design a model for you. I teach how you can design the model yourself.

  49. Avatar
    Adonis El Hajj February 28, 2020 at 6:43 pm #

    Hi Jason,
    is it possible to mix the Multi-Headed 1D CNN for Multivariate Time Series Forecasting with an LSTM?

  50. Avatar
    Ruoyan March 10, 2020 at 1:22 pm #

    Hello, Jason. I read what you wrote and found it particularly detailed. I have run into a problem with a tone sequence recognition task that has me quite stuck. I extracted the Mel cepstrum features of the audio and used a CNN+CTC model for recognition, but the recognition results are particularly bad, with almost no accuracy. I would like to ask where the problem might lie.

  51. Avatar
    Shohreh March 21, 2020 at 11:46 am #

    Hi Jason,
    Thanks for your great tutorials. I want to combine a multi-head CNN and an LSTM, so the concatenation of the CNN modules will be the input for the LSTM module, but I get an error when I concatenate the CNNs across axis=0. The model is as follows:
    visible1 = Input(shape=(n_steps, n_features))
    cnn1 = Conv1D(filters=64, kernel_size=2, activation='relu')(visible1)
    cnn1 = MaxPooling1D(pool_size=2)(cnn1)
    cnn1 = Flatten()(cnn1)
    # second input model
    visible2 = Input(shape=(n_steps, n_features))
    cnn2 = Conv1D(filters=64, kernel_size=2, activation='relu')(visible2)
    cnn2 = MaxPooling1D(pool_size=2)(cnn2)
    cnn2 = Flatten()(cnn2)
    # merge input models
    merge = concatenate([cnn1, cnn2], axis=0)
    #rv = RepeatVector(1)(merge)
    lstm = LSTM(50, activation='relu', return_sequences=True)(merge)
    f_lstm = Flatten()(lstm)
    dense = Dense(50, activation='relu')(f_lstm)
    output = Dense(1)(dense)
    model = Model(inputs=[visible1, visible2], outputs=output)

    I received this error: Input 0 is incompatible with layer lstm_9: expected ndim=3, found ndim=2

    Could you please tell me what is wrong here? I really appreciate your time and help.

  52. Avatar
    Naresh Agarwala April 4, 2020 at 1:05 am #

    Lot of thanks to Jason Brownlee. Your contribution is really great. You have published so many things. I have learnt so many things.

  53. Avatar
    Naresh Agarwala April 4, 2020 at 1:18 am #

    Dear Shohreh,

    cnn1 = Conv1D(filters=64, kernel_size=2, activation='relu')(visible1)
    pool1 = MaxPooling1D(pool_size=2)(cnn1)
    cnn2 = Conv1D(filters=64, kernel_size=2, activation='relu')(visible2)
    pool2 = MaxPooling1D(pool_size=2)(cnn2)

    merge = concatenate([cnn1, cnn2], axis=0) # it might work
    or,
    merge = concatenate([pool1, pool2], axis=0) # it might work

    If you get a concatenation-related error, you can try changing the 'axis' value.

  54. Avatar
    Saeed April 6, 2020 at 11:48 pm #

    Hi Jason,

    thanks for your support!
    I want to implement a temperature estimation for an electric motor using a CNN; it is a regression problem.
    A CSV file contains all measurements, which are sampled at 2 Hz. The dataset consists of 12 columns: 8 columns as input data (such as current, voltage, speed, …) and 4 columns as output data or targets (the temperature of different parts of the motor).
    My question is, should this problem be treated as time series forecasting? And if so, what should the shape of the input data for a CNN be?
    I ask because time series forecasting uses a model to predict future values based on previously observed values, while regression analysis is often employed such that the current values of the independent time series (inputs) affect the current value of another time series (outputs).
    Thank you in advance for your attention.

  55. Avatar
    Sydant April 11, 2020 at 8:17 pm #

    Great article Jason. Do you have any insights on how these kinds of convnets for forecasting, like even WaveNet, compare to seq2seq RNNs in terms of general forecasting performance?

    One of the advantages of this is that we can have a variable-length sequence, as opposed to RNNs. But from what I see on Kaggle, seq2seq RNNs dominate, and Amazon's forecasting model (DeepAR) is built with RNNs. This makes me think that RNNs always dominate over CNNs for time series forecasting. Thoughts?

    • Avatar
      Jason Brownlee April 12, 2020 at 6:19 am #

      It is different for each model and each dataset; it is better to use controlled experiments for your problem rather than talk in useless generalities.

      In my experiments, I see CNNs and CNN-LSTM hybrids perform better than vanilla LSTMs.

  56. Avatar
    Jordan J. Bird April 17, 2020 at 2:29 am #

    Hi Jason,

    Hope you’re well. I’m enjoying reading your timeseries forecasting articles and I have two questions, I wonder if you could help in pointing me in the right direction:

    1. If I have multiple timeseries that may or may not be related (eg. country ID to population), how would one handle the user ID as a pointer for the existence of a different timeseries? As I would like to have a forecasting model that considers the ID and date to make a prediction, and there may be useful patterns between countries to learn. I’ve created a dataset of 70 days for each country formatted as the following:
    (values not real, just an example)
    Country ID, day, population
    1,1,1000
    1,2,1500

    1,70,50000
    2,1,2000
    2,2,2500

    and so on

    2. If the above is achieved, how then could one have static geopolitical features also considered as input? eg. if I had the population of the UK for every day of 2019, and then also input the 2019 GDP, population density, % in poverty etc. etc. since these features are static, won’t change for the duration of the experiment, but would help in prediction. Likewise my USA data would also have the same set of related geopolitical features and so on for each country in the dataset

    Eg:
    Country ID, GDP, density, poverty
    1,1000,40.5,12.5
    2,2000,20.5,9.5
    ….
    and so on

    Is this possible in Keras? I’ve been searching online a lot and I can’t find a related example that achieves either of these two things, or even better both

    Would love to hear your thoughts on this,
    Thanks,
    Jordan

    • Avatar
      Jason Brownlee April 17, 2020 at 6:25 am #

      Handling of IDs for time series would happen before/after the model with custom code – e.g. a programming question, not a modeling question. Same with dates. The model is/should be unaware of IDs and dates.
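
      As a rough illustration of that point, here is a minimal sketch (the DataFrame, column names, and values below are hypothetical, not from the tutorial): keep the id/date columns aside, fit on the numeric series only, and re-attach the ids to the predictions afterwards.

      # Sketch only: keep identifier/date columns out of the model inputs.
      import pandas as pd

      df = pd.DataFrame({
          'country_id': [1, 1, 1, 2, 2, 2],
          'day':        [1, 2, 3, 1, 2, 3],
          'population': [1000, 1500, 2000, 2000, 2500, 3000],
      })

      # the model only ever sees the numeric series; ids/dates stay outside it
      for country, group in df.groupby('country_id'):
          values = group['population'].values
          # ... prepare samples from `values` and fit/predict here ...
          print(country, values)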

  57. Avatar
    NuwanC April 23, 2020 at 1:14 pm #

    Hi Jason,

    Thank you for all publications, I have a question,

    in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90]) -> Single sequence

    How can we handle multiple sequences,

    in_seq1 = array([
    [10, 20, 30, 40, 50, 60, 70, 80, 90],
    [12, 22, 35, 41, 44, 67, 73, 84, 97],
    ……………………………
    [11, 25, 34, 46, 44, 67, 73, 84, 100]
    ])

    Thank you,

  58. Avatar
    NuwanC April 23, 2020 at 1:53 pm #

    Hi Jason,

    I highly appreciate that you can provide the link.

    Thank you.

  59. Avatar
    NuwanC April 23, 2020 at 2:04 pm #

    Hi Jason,

    Let me clear out the problem I am facing,

    I have 500 CSV files containing patients' CTG scan records as (patient_id, {time}).

    1000.CSV => { 1000ID, 149, 34, 45, ………, 156 }; likewise I have 500 CSV files.

    What I did was merge those 500 into a single CSV, where each row contains a unique ID and time-series data.

    Now I want to process these data and get a predicted output for each record.

    Highly appreciate if you can advise me on this.

    Thank you,

  60. Avatar
    NuwanC April 23, 2020 at 2:37 pm #

    Hi Jason,

    I think I can go with the “Multiple Parallel Series” which is described above.

    Any thoughts on that?

    Thank you.

  61. Avatar
    NuwanC April 26, 2020 at 9:29 pm #

    Hi Jason,

    Thank you for the guidance so far. After creating a multivariate forecasting model, what techniques can we use to evaluate it?

    Thank you.

  62. Avatar
    NuwanC May 1, 2020 at 12:30 am #

    Hi Jason,

    In “Univariate CNN Models” we train the model with only one input sequence; how can we train our model on multiple sequences (with varying numbers of time steps)?

    Thank you

  63. Avatar
    Jose May 6, 2020 at 11:29 pm #

    Hi Jason,

    Thank you so much for this informative tutorial. I was practising the “Multiple Parallel Series” tutorial for a time series task, and I'm a newbie with CNNs. When we develop models using LSTMs and RNNs we usually normalize the data using methods like min-max, but here it is not mentioned. Do we need to normalize the data before feeding it to the model?

  64. Avatar
    GowriV May 21, 2020 at 11:06 pm #

    Hi sir,

    Thank you for this tutorial. I was trying to do a univariate CNN and I'm new to CNNs. I have a sequence of shape (982, 95).

    data.shape = (982, 95)

    data[0] is the data for day 1, data[1] is the data for day 2, etc.; like that I have 982 days of data.

    With your univariate CNN I fed in day 1's data and predicted the last value using the previous 20 values, and the result was good. But I want to feed all 982 days of data as input and predict the last value of the 982nd day. How can I do that?

  65. Avatar
    Sep May 25, 2020 at 11:49 pm #

    Dear Sir,

    I have a question regarding multi-step prediction, but for classification. My question is, what should the output layer be? In classification problems the number of nodes in the output layer equals the number of classes; in multi-step prediction, the number of nodes in the output layer equals the number of steps we want to predict. But what should the number of output nodes be for multi-step prediction in a classification problem? Would it be the number of classes multiplied by the number of steps we want to predict, or what should it be?

    Thanks a lot for your kind help in advance.

    • Avatar
      Jason Brownlee May 26, 2020 at 6:26 am #

      You can use a TimeDistributed wrapper to output one classification per output time step. E.g. a seq2seq type model can be used.
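
      For reference, a minimal sketch of that idea (the layer sizes, number of output steps, and number of classes below are assumptions, not from the tutorial): a CNN encoder, a RepeatVector, an LSTM decoder, and a TimeDistributed softmax, giving one class distribution per output time step.

      # Sketch only: one softmax classification per output time step.
      from keras.models import Sequential
      from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense
      from keras.layers import RepeatVector, LSTM, TimeDistributed

      n_steps_in, n_features = 3, 1   # assumed input shape
      n_steps_out, n_classes = 2, 4   # assumed output steps and classes

      model = Sequential()
      model.add(Conv1D(filters=64, kernel_size=2, activation='relu',
                       input_shape=(n_steps_in, n_features)))
      model.add(MaxPooling1D(pool_size=2))
      model.add(Flatten())
      model.add(RepeatVector(n_steps_out))   # one context vector per output step
      model.add(LSTM(50, activation='relu', return_sequences=True))
      model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
      model.compile(optimizer='adam', loss='categorical_crossentropy')

      Targets for such a model would be one-hot encoded with shape (samples, n_steps_out, n_classes).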

      • Avatar
        Sep May 27, 2020 at 12:16 am #

        So sorry for spamming, but just to confirm that I understood correctly: should the last layer be coded as model.add(TimeDistributed(Dense(number_of_classes, activation='softmax')))?

      • Avatar
        Amir February 13, 2022 at 8:51 am #

        Thank you for your good tutorials.
        My dataset is time series based with multiple classes. It includes many rows (different results produced by offline simulation) and 250 columns. The data is of one type but changes over time (like weather). I want to train the dataset with the smallest possible time window, because the operator needs time to act in real time. What is your idea?
        X = [0.9 0.9 0.9 0.45 0.46 0.46 0.48 0.5 0.5 0.65 …………. 0.8 0.81 0.81 0.8]; the data changes at the fourth sample. I need to train the dataset with the fewest samples before and after this point.
        Regards

        • Avatar
          James Carmichael February 13, 2022 at 12:51 pm #

          You are very welcome Amir! Let me know if I can help with any specific questions regarding the code listings.

  66. Avatar
    Sep May 27, 2020 at 1:01 am #

    Or should it be model.add(TimeDistributed(Dense(1), activation='softmax'))?

  67. Avatar
    Sep June 3, 2020 at 8:23 pm #

    Dear Jason,

    Is there any possibility of changing the order of the input dimensions in a CNN or RNN for time-series prediction problems? I mean, normally the order of the 3D input is (samples, time steps, features). I would like to know if the input can be re-ordered as (features, time steps, samples)?

    Thanks a lot for your kind help in advance.

    • Avatar
      Jason Brownlee June 4, 2020 at 6:18 am #

      No.

      • Avatar
        Sep June 4, 2020 at 4:24 pm #

        Not even in (samples, features, time steps) form?

  68. Avatar
    Ivan June 8, 2020 at 7:41 pm #

    Hi,
    when using Conv1D layers for time series data, do you think batch normalization is problematic? Would it make sense to use it? Thank you

    • Avatar
      Jason Brownlee June 9, 2020 at 6:00 am #

      Perhaps try it on your data/model and see if it makes a difference.
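
      If it helps, a minimal sketch of what trying it might look like, simply inserting a BatchNormalization layer after the convolution (the rest of the configuration below is an assumption, not a recommendation):

      # Sketch only: Conv1D followed by batch normalization.
      from keras.models import Sequential
      from keras.layers import Conv1D, BatchNormalization, MaxPooling1D, Flatten, Dense

      n_steps, n_features = 3, 1  # assumed input shape

      model = Sequential()
      model.add(Conv1D(filters=64, kernel_size=2, activation='relu',
                       input_shape=(n_steps, n_features)))
      model.add(BatchNormalization())
      model.add(MaxPooling1D(pool_size=2))
      model.add(Flatten())
      model.add(Dense(50, activation='relu'))
      model.add(Dense(1))
      model.compile(optimizer='adam', loss='mse')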

  69. Avatar
    Firas Obeid June 10, 2020 at 9:50 am #

    In regard to deep belief networks, would we apply the same hierarchy, just as a high-level idea?

    • Avatar
      Jason Brownlee June 10, 2020 at 1:24 pm #

      DBN are no longer an effective method relative to CNNs, I recommend focusing on CNNs when working with image data.

  70. Avatar
    Kanishk June 30, 2020 at 7:12 am #

    Hi, Amazing tutorial.
    I just wanted to ask whether, in Multiple Parallel Series, the features affect each other, or whether the prediction of one feature is independent of the other features.

    Sorry if that was somewhat ambiguous; what I wanted to ask is:
    if I have multiple time series, say [X1, X2, X3, X4], and using Multiple Parallel Series I predict something like [Y1, Y2, Y3, Y4], would Y1 depend on all the features or just X1?
    Thanks

    • Avatar
      Jason Brownlee June 30, 2020 at 1:02 pm #

      We are modeling assuming that the target values are a function of the input variables, both each variate and lag observations of each variate.

      Does that help?

      • Avatar
        Kanishk June 30, 2020 at 4:49 pm #

        Yes thanks a lot
        I will try to add some LSTM layers as well and see how that works.
        Again thanks a lot

  71. Avatar
    Tunbi Adekunle July 16, 2020 at 5:31 pm #

    Great insight, thanks for sharing your knowledge. Does this mean that if I have a feature vector/matrix (X) and my output (y), I do not need to use the time series generator? What sort of data preparation is optimal when considering predictive model for time series not meant for forecasting?

  72. Avatar
    Ashutosh Makone July 24, 2020 at 3:18 am #

    I am new to deep learning.
    1) I want to calculate the MSE for the results. I guess I will put the actual values from the validation set in one list and the predictions in another, and then find the MSE. Is there a better way of doing this?
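
    For what it is worth, a minimal sketch of that idea using scikit-learn (the values below are placeholders): collect the expected values and the predictions, then call mean_squared_error.

    # Sketch only: MSE between expected and predicted values.
    from sklearn.metrics import mean_squared_error

    y_true = [10.0, 20.0, 30.0]   # expected values from the validation set
    yhat = [11.0, 19.5, 32.0]     # model predictions
    print(mean_squared_error(y_true, yhat))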

  73. Avatar
    Paula G September 17, 2020 at 2:45 pm #

    Hi, thank you for this post.

    I have 2 questions.

    1. Is there any way to do backtesting or cross-validation on the test set? For example, if I use the multi-step CNN with 1 feature and the output is 30, I want to know how good the prediction (accuracy) is for the first time step in the future, and so on up to the 30th time step. I guess I would have to test with several folds (in order). How should it be done?
    2. Would it be good if, as time goes on, I add new data to the training or test data (like a rolling window)? Does the model have to be retrained? I ask because my intuition says that the data near the current time would be better for the prediction. Am I wrong? However, I do not know if one added example is going to make a difference in the short term.

    Thank you very much again for your work. It is amazing. Sorry for my English.

  74. Avatar
    sarah October 8, 2020 at 3:32 pm #

    Hi Jason,

    Thank you so much for such an informative article

    I have 2 questions hopefully you can help me and answer them.

    First: I understand that using a CNN for time series forecasting requires data to be reshaped into (samples, timesteps, features); however, if my model is using (samples, features, timesteps), what does this mean? I am not familiar with the theory, which is why I am not sure about the difference.
    In my experiments, my model gives more accurate results in the second case!

    Second: if I want some feature to be used as-is at some stage of a functional model, is that possible?
    I am asking this because I used a naive model and found good results. I want to use these results in combination with the deep learning model. I tried to pass the output of the naive model by itself to a one-layer network with a linear activation, but I didn't get the same result. I want the output to be exactly equal to the input. Is that possible? I am sorry if this is a silly question.

    • Avatar
      Jason Brownlee October 9, 2020 at 6:40 am #

      It means you may have to restructure your data prior to modeling.

      You can pass the output of the naive model as input to the deep learning model, or use another model afterward to ensemble the predictions from both models.

  75. Avatar
    PaulaMG October 18, 2020 at 6:20 am #

    Hi Jason, how are you?
    I was wondering, if I want to do a multi-step forecast but the labels are binary (1 if sales go up, 0 otherwise) for each step ahead, I have two questions:
    1) Should the last layer be model.add(Dense(n_output, activation='sigmoid')) or model.add(Dense(n_output, activation='softmax'))?
    2) Should the loss function be 'binary_crossentropy' or 'categorical_crossentropy'?
    I do not know how to interpret classes in this problem. Are the classes the multi-step forecast, or the binary component of 1 if sales go up and 0 otherwise? I hope you can clear it up. Thank you very much. Have a wonderful day.

    • Avatar
      Jason Brownlee October 18, 2020 at 8:20 am #

      For binary classification you will need to use the sigmoid activation and the binary cross entropy loss function.
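
      A minimal sketch of that output configuration, assuming n_output steps ahead, each with a 0/1 label (the shapes and layer sizes below are placeholders):

      # Sketch only: multi-step binary output with sigmoid + binary cross entropy.
      from keras.models import Sequential
      from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

      n_steps_in, n_features, n_output = 10, 1, 3  # assumed shapes

      model = Sequential()
      model.add(Conv1D(filters=64, kernel_size=2, activation='relu',
                       input_shape=(n_steps_in, n_features)))
      model.add(MaxPooling1D(pool_size=2))
      model.add(Flatten())
      model.add(Dense(50, activation='relu'))
      model.add(Dense(n_output, activation='sigmoid'))  # one probability per step ahead
      model.compile(optimizer='adam', loss='binary_crossentropy')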

      • Avatar
        PaulaMG October 19, 2020 at 5:32 am #

        Thank you Jason!!!

  76. Avatar
    Danilo November 6, 2020 at 9:07 pm #

    Hi Jason, is there a chance to evaluate the gradient of the function which is approximated by the network with respect to the network input?

    • Avatar
      Jason Brownlee November 7, 2020 at 6:29 am #

      Yes, the prediction error from the model calculated using the expected output and the predicted output.

  77. Avatar
    Giulio G. November 19, 2020 at 10:37 pm #

    Dear Jason,

    Thanks for the tutorial! In the multivariate CNN case, can one of the time series be an exogenous variable, like temperature for electricity load?

    Does it make sense in the CNN structure? Or is an LSTM better?

    Last question :), can a CNN and an LSTM be concatenated?

    Thanks a lot, Giulio

    • Avatar
      Jason Brownlee November 20, 2020 at 6:46 am #

      You’re welcome.

      Yes. Try a number of models and configurations and discover what works best for your specific dataset.

  78. Avatar
    alb12 November 28, 2020 at 9:48 pm #

    Hi! Thanks for the tutorial.

    I am not understanding the very last passage (prediction) of the Multiple Input Series.

    #Prediction example
    x_input = array([[80, 85], [90, 95], [100, 105]])
    print(x_input)
    x_input = x_input.reshape(1, n_steps, n_features)
    yhat = model.predict(x_input, verbose=0)

    print(yhat)

    This code gets me the following error:
    cannot reshape an array of size 6 into shape (1,3,1)

    We have size 6 because we are understandably missing the value we want to predict…
    but then which shape should we pass to the predict function if the (1,3,1) is not available?

    Thank you very much.
    -A

  79. Avatar
    Franco Olivieri December 24, 2020 at 3:58 am #

    Hi! Thanks for the tutorial.

    I have to model a NN for this problem:

    a sales forecast for large products. I have a data set for the last 2 years of sales by month, with information about country, product code, phase-in and phase-out dates, and attributes for the product. The data are very sparse for most countries and products (2 or 3 units per year per country). I have to produce a forecast at the month level for the next 24 months; the accuracy of the first 6 months of the forecast, and in particular the first month, is the most important in terms of performance.

    Data set: 111,000,000 samples
    Row sample: month of sale, sale qty, product code, market, phase-in date, phase-out date, product attribute 1, product attribute 2, product attribute 3.

    Which approach do you suggest me? Do you have a sample of this kind of problem?

  80. Avatar
    Farnaz Khaghani January 6, 2021 at 5:11 am #

    Really great post!
    There is another variation of the input and output data I am trying to model and could not find the best match for it.
    Suppose we have an individual parallel time series (multiple parallel and multi-step) for each sample. Since the number of samples would be high, it is going to be inefficient to model each one separately. What do you recommend?

    • Avatar
      Jason Brownlee January 6, 2021 at 6:33 am #

      Each parallel time series would be a feature, in the [sample, timestep, feature] structure of input data.
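
      As a minimal sketch of that structure (the two series below are hypothetical), each parallel series becomes one column, i.e. one feature, and windows over the rows become samples:

      # Sketch only: stacking parallel series into [samples, timesteps, features].
      from numpy import array, hstack

      series1 = array([10, 20, 30, 40, 50, 60]).reshape(-1, 1)
      series2 = array([15, 25, 35, 45, 55, 65]).reshape(-1, 1)
      dataset = hstack((series1, series2))   # shape (6, 2): timesteps x features

      n_steps = 3
      X = array([dataset[i:i+n_steps] for i in range(len(dataset) - n_steps)])
      print(X.shape)  # (3, 3, 2) -> [samples, timesteps, features]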

  81. Avatar
    Muhammad Usama Zahid January 18, 2021 at 5:12 am #

    hello sir!
    I am a big fan of yours and you really make some great tutorials. I strongly suggest you run a YouTube channel; you would have limitless followers.

  82. Avatar
    Keerti January 26, 2021 at 7:00 am #

    Hi Jason, Thank you so much for this amazing article. Your articles are always helpful.

    My regression model takes 10 time steps as input and predicts the next two time steps. But during training, the validation accuracy is higher than the training accuracy, and I can't seem to find the reason. Could you maybe give some input on what might be wrong? What should I look into?

    Thanks in advance.

  83. Avatar
    David March 12, 2021 at 6:49 pm #

    Hi Jason,

    Huge thanks for these tutorials! They have been of huge help in understanding the topic in greater detail.

    I am wondering if it is possible to train a multivariate model on multiple datasets? I am trying to train the model on impulse response datasets.

    Any input would be greatly appreciated.

    David

    • Avatar
      Jason Brownlee March 13, 2021 at 5:28 am #

      Sure.

      You could combine all data into one dataset and fit a model.
      You could fit the model on each dataset in turn, saving/loading between datasets or keeping the model in memory.
      Some combination of the above two methods.

  84. Avatar
    Sami Kanderian April 1, 2021 at 10:05 am #

    Thanks for the demo. I found it helpful.

    Splitting sequences and putting them into ever-growing X and y lists, then converting them into a 3D array, is extremely slow and inefficient, especially for large datasets. It is much faster to pre-allocate the fixed size of the 3D array and populate the array in the loop.

    • Avatar
      Jason Brownlee April 2, 2021 at 5:33 am #

      Yes, it is just an example of the models, not of efficient data prep.
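
      For interested readers, a minimal sketch of the pre-allocation idea described in the comment above, assuming a univariate series and a fixed window size (names and sizes are placeholders):

      # Sketch only: pre-allocate the 3D input array instead of growing Python lists.
      from numpy import arange, zeros

      sequence = arange(1000, dtype='float32')   # hypothetical long series
      n_steps = 3
      n_samples = len(sequence) - n_steps

      X = zeros((n_samples, n_steps, 1), dtype='float32')
      y = zeros((n_samples,), dtype='float32')
      for i in range(n_samples):
          X[i, :, 0] = sequence[i:i+n_steps]
          y[i] = sequence[i+n_steps]
      print(X.shape, y.shape)  # (997, 3, 1) (997,)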

  85. Avatar
    Giacomo April 5, 2021 at 3:07 am #

    Hello,
    I have a problem similar to what you explained in “Multiple Input Multi-Step Output” and I have a doubt about what you reported. Referring to your data
    [[ 10 15 25]
    [ 20 25 45]
    [ 30 35 65]
    [ 40 45 85]
    [ 50 55 105]
    [ 60 65 125]
    [ 70 75 145]
    [ 80 85 165]]
    I have a similar structure, where the third column represents my target. If we consider each row of the “matrix” as the data related to a particular timestamp, I've prepared my input data (I will show only X_1 for simplicity) as follows:
    [ 10 15 25]
    [ 20 25 45]
    [ 30 35 65] -> y_1 (target): 85, or [85, 105] depending on whether I want a single forecast or a multi-step forecast.
    I also used the third column as input, since it represents past values which I know and which make sense to use, but I think you didn't use it in order to give a general example. Practically, at the code level I used the same approach you explained in “Multiple Input Multi-Step Output”, only changing seq_y to seq_y = sequences[end_ix:out_end_ix, -1] in your split_sequences function, in order to specify the target I want.
    I don't understand why, in your example, you consider 65 as the first desired output. In my opinion 65 represents a “current” value, not a future value I'm going to predict; the future values I want to predict are those related to the following timestamps, which are 85 and/or [85, 105]. Did you choose [65, 85] as output to give a general example, as I suppose?
    Thanks a lot in advance.

  86. Avatar
    Selman April 26, 2021 at 6:30 am #

    Hello Jason,
    Thank you for the great tutorial. I used your multivariate CNN model above, but I have a problem:

    When I choose "n_steps" as 1 or 2, I get the error "Negative dimension size caused by subtracting 2 from 1 for '{{node max_pooling1d_137/MaxPool}} = MaxPool[T=DT_FLOAT, data_format="NHWC", explicit_paddings=[], ksize=[1, 2, 1, 1], padding="VALID", strides=[1, 2, 1, 1]](max_pooling1d_137/ExpandDims)' with input shapes: [?,1,1,64]."

    I need to choose "n_steps" as 1; can you help me please? Thank you.

    • Avatar
      Jason Brownlee April 27, 2021 at 5:10 am #

      Yes, the model may need to be adjusted to support very small input sequences, or you may need to use larger sequences if the model is left unchanged.
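
      A minimal sketch of one such adjustment, assuming n_steps=1 and two input variates: drop the pooling layer and use a kernel size of 1 so the shapes remain valid (this is only one way to make the model accept a single time step, not a recommendation):

      # Sketch only: a Conv1D model adjusted to accept a single input time step.
      from keras.models import Sequential
      from keras.layers import Conv1D, Flatten, Dense

      n_steps, n_features = 1, 2   # assumed: one time step, two variates

      model = Sequential()
      model.add(Conv1D(filters=64, kernel_size=1, activation='relu',
                       input_shape=(n_steps, n_features)))
      model.add(Flatten())
      model.add(Dense(50, activation='relu'))
      model.add(Dense(1))
      model.compile(optimizer='adam', loss='mse')
      model.summary()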

  87. Avatar
    Liliana May 11, 2021 at 10:05 am #

    Hi Jason:
    I would like to be sure I understood the concept well… CNNs were originally created for the treatment of images and image sequences (video), which are 2D data (rows and columns), where the CNN preserves the spatial structure of the input data and is invariant to the position of the object and to distortions of it; in other words, a CNN better captures the spatial structure of the data.

    In the case of CNNs for time series, a 1D CNN is used, but does this 1D CNN model retain the advantage of preserving the spatial structure of the data in its feature extraction, as if it were image sequences?

    Thanks for your attention.

    • Avatar
      Jason Brownlee May 12, 2021 at 6:05 am #

      Correct.

      Yes, 1D CNN has similar properties, although applied to a 1d sequence of observations.

  88. Avatar
    Liliana May 13, 2021 at 2:55 pm #

    Ok thank you understood then, and this raises three more questions about it:

    • If instead of using a 2D CNN on an image, the image is flattened by converting the pixels into a 1D vector, could a 1D CNN be used with this image, and could the same forecast results be achieved as with a 2D CNN? That is, would the spatial structure of the data be preserved in the same way as in the image?

    • If, on the other hand, I have a multivariate time series organized in 2D (rows and columns), with values other than pixels, could I apply a 2D CNN as if it were an image, and could I achieve a regression forecast for each field of the dataset, where the spatial structure of the data is preserved? Or would there be a risk that, being a 2D CNN, the dataset is always interpreted as pixel values (between 0 and 255)?

    • Are there any advantages in terms of faster convergence between a 1D CNN and a 2D CNN?

    Thanks for your attention.

    • Avatar
      Jason Brownlee May 14, 2021 at 6:21 am #

      Working with pixels instead of raw sequence data will be worse in every way (my opinion).

      A 1D CNN can operate on multivariate sequence data directly, a 2d CNN is inappropriate (again, my opinion).

      Run experiments to confirm if you like.

  89. Avatar
    Liliana May 28, 2021 at 10:41 am #

    It was your opinion from experience precisely what I wanted. Thanks for your answer.

  90. Avatar
    Liliana May 29, 2021 at 6:13 am #

    Hi Jason:

    This publication seems excellent to me; thank you very much for your work. Could you tell me where I can find information, or what I should do, to learn how to optimize the configuration of a 1D CNN model for the multiple parallel input and multi-step output case, so that I can adapt this model to my own multivariate time series forecasting case study?

    Thanks for your attention.

  91. Avatar
    Priya June 1, 2021 at 1:14 am #

    Thanks for the very informative tutorial.

    You discussed an idea for implementing cnn1 and cnn2 and then merging the features from these two models to predict the output; my small query is: can we use the same idea for lstm1 and lstm2?

    • Avatar
      Jason Brownlee June 1, 2021 at 5:36 am #

      Sure.

      • Avatar
        priya June 1, 2021 at 3:28 pm #

        The CNN model gives an error for time_step=1. The error is "Negative dimension size caused by subtracting 2 from 1 for 'conv1d_1/convolution/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1,5], [1,2,5,64]".

        How can I rectify this error if I don't want to increase the number of time steps?

  92. Avatar
    Liliana June 4, 2021 at 2:14 am #

    Hi Jason:

    Could you tell me where I can find information, or what I should do, to learn how to optimize the configuration of a 1D CNN model for the multiple parallel input and multi-step output case, so that I can adapt this model to my own multivariate time series forecasting case study?

    Thanks for your attention.

  93. Avatar
    Elsa June 4, 2021 at 3:04 pm #

    Hi Jason,
    can you please help me? I have items (item1 … itemN)
    and I have sequences like
    Seq1 [item1 … item10]
    Seq2 [item1 … item10]
    .
    .
    .
    SeqN [item1 … item10]
    The items are not numbers or words; they are like song names, containing both numbers and characters. I want to predict the next item, item11, for each sequence, and I want the predicted items to be drawn from the whole dataset (all items). I'm using convolutional neural networks and an embedding. Please tell me how I can do that (prediction from the whole dataset), and how I should reshape my sequences to use them (should I use a sliding window, and how? Or should I take the last item of every sequence as the output?).
    And thank you for all that you do.

  94. Avatar
    Elsa June 5, 2021 at 6:24 am #

    I can’t understand, can you please explain more?

    • Avatar
      Jason Brownlee June 6, 2021 at 5:37 am #

      No problem, which part would you like me to explain?

      • Avatar
        Elsa June 6, 2021 at 8:46 am #

        I use the first nine items of each sequence as inputs X and the last item as output y … I reshape X to be [9, 9, 100]; the size of the item vector after embedding is 100 … I feed the inputs to my CNN model with an input shape of (9, 100) … and I put a final dense layer with an output of shape 100 (the size of the vector) … Am I going about this right?

        • Avatar
          Jason Brownlee June 7, 2021 at 5:17 am #

          If the embedding has a length of 100, then the number of features input to the CNN or LSTM model would be 100, e.g. [?, ?, 100].

          If you have 9 time steps per sample, then the shape would be [?, 9, 100]

  95. Avatar
    Elsa June 7, 2021 at 8:46 am #

    I have input data with shape (985, 9, 100). I put input shape [985, 9, 100] and it did not work, and I tried input shape [None, 9, 100] and it did not work either.
    I get this error: ValueError: Input 0 of layer max_pooling1d_4 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 985, 3, 100]

    this is my model
    model_CNN = Sequential()
    model_CNN.add(Conv1D(filters=100, kernel_size=3, activation='relu', input_shape=(None, 9, 100)))
    model_CNN.add(Conv1D(filters=100, kernel_size=3, activation='relu'))
    model_CNN.add(Dropout(0.25))
    model_CNN.add(Conv1D(filters=100, kernel_size=3, activation='relu'))
    model_CNN.add(MaxPooling1D(pool_size=2))
    model_CNN.add(Dropout(0.25))
    model_CNN.add(Flatten())
    model_CNN.add(Dense(200, activation='relu'))
    model_CNN.add(Dense(100))
    model_CNN.compile(optimizer='adam', loss='mse')

    This is my input X shape: (985, 9, 100),
    and my output y shape: (985, 100).
    I'm really sorry, I'm a beginner and I really need to sort this out.

    • Avatar
      Jason Brownlee June 8, 2021 at 7:09 am #

      Sorry, the cause of the issue is not clear.

      Perhaps work with one or two rows of data and the simplest model in order to debug the cause of the issue.
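
      As a minimal sketch of such a simplest-possible model, assuming X has shape (985, 9, 100) and y has shape (985, 100) as described above (the data below is random placeholder data), note that input_shape excludes the sample dimension:

      # Sketch only: Conv1D over sequences of embedded items shaped [samples, 9, 100].
      from numpy import random
      from keras.models import Sequential
      from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

      X = random.rand(985, 9, 100)   # placeholder data
      y = random.rand(985, 100)

      model = Sequential()
      model.add(Conv1D(filters=100, kernel_size=3, activation='relu',
                       input_shape=(9, 100)))   # per-sample shape only
      model.add(MaxPooling1D(pool_size=2))
      model.add(Flatten())
      model.add(Dense(100))
      model.compile(optimizer='adam', loss='mse')
      model.fit(X, y, epochs=1, verbose=0)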

  96. Avatar
    Vivek Rao June 11, 2021 at 12:23 am #

    Thanks for your tutorial. A new paper by Stanford professors “Deep Learning Statistical Arbitrage” https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3862004 uses a CNN with transformer to predict residual stock returns. Could you write about transformers?

  97. Avatar
    Liliana June 17, 2021 at 4:30 am #

    Hi Jason:

    The following question arises for me: when working with the deep learning methods MLP, CNN, LSTM, or their hybrids to forecast a time series of the multiple parallel input and multi-step output type, does the forecast these methods produce for each time step of each variable take into account the correlation with each of the other variables, in addition to the autocorrelation with the variable being predicted, or not?

    Thanks for your attention.

    • Avatar
      Jason Brownlee June 17, 2021 at 6:20 am #

      The model may or may not take into account specific properties in the training dataset. We don’t have direct control over the statistical patterns used by a neural net model.

      • Avatar
        Liliana June 19, 2021 at 6:52 am #

        Ok, thanks for that clarification.

  98. Avatar
    Shiv June 18, 2021 at 5:53 pm #

    Thanks for the detailed explanation, Dr. Jason. I am in need of your sincere suggestion regarding work I am doing now on data prediction.

    I am using a CNN regression model without a pooling layer. This is because the purpose of pooling is to merge semantically similar features into one; however, the feature images each represent one geologic phenomenon, and none of these geologic phenomena will be the same as those in the real world.
    I could not get an appropriate R2 value: training is about 71% and testing is about 54%. Could you please suggest how to improve this? By the way, I am using only 24 data points for training, since we don't have more data points. I need your sincere suggestion.

  99. Avatar
    Liliana June 19, 2021 at 7:30 am #

    Hi Jason:

    I wish you could help me with something. I am trying to configure a 1D CNN to forecast a multivariate time series with multiple parallel input and multi-step output. It is clear to me that finding the best configuration is basically achieved by trial and error, but I have made some attempts and I cannot find even an initial configuration that gives me a baseline from which to start the trial and error.

    I want to train the network with approximately 77,400 samples, each composed of an X with 10 time steps as input and a y with 3 time steps as output.

    Could you please tell me, as a first test, how large that network should be? That is, initially how many convolutional layers the network should have for this amount of data, roughly how many filters and what kernel size it should use, as well as how many nodes the dense layer should initially have, and a possible number of epochs or batch size.

    I repeat, I know that reaching a complete fit is done mostly by trial and error, but I hope you understand that I do not know how to size the network sufficiently for a first test according to the amount of data; and since the possibilities are so many, I would really appreciate it if you could help me size it for a first test by answering those questions, because this would already give me a perspective on the network dimensions I should explore.

    Thanks for your attention.

    • Avatar
      Jason Brownlee June 20, 2021 at 5:45 am #

      We cannot know what configuration will work well or best for a given prediction problem; I recommend testing a suite of methods and comparing results to naive methods.

      Consider scaling the data and consider a large number of architectures. Ensure you’re using a robust test harness, e.g. walk-forward validation, perhaps repeated.
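
      A minimal sketch of the walk-forward idea mentioned above, using a naive persistence forecast as a stand-in for the model (the data and test size below are placeholders):

      # Sketch only: walk-forward validation over the tail of a series.
      def naive_forecast(history):
          # placeholder model: persist the last observed value
          return history[-1]

      data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
      n_test = 4
      errors = []
      for i in range(len(data) - n_test, len(data)):
          history = data[:i]              # all observations before step i
          yhat = naive_forecast(history)  # replace with model fit/predict
          errors.append(abs(data[i] - yhat))
      print(sum(errors) / len(errors))    # mean absolute error on the test steps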

      • Avatar
        Liliana August 17, 2021 at 10:43 am #

        Hi Jason:

        I would like to ask: I understand that when working with a 1D CNN it is better to work with raw data, that is, as they are, without scaling them or doing any normalization or standardization. Is that right? Or is there a rule or something similar for knowing whether it is better to apply any of those processes to my data before using a 1D CNN?

        Thanks for your attention.

        • Avatar
          Adrian Tam August 17, 2021 at 11:48 am #

          Normalizing and standardizing may still be necessary in that case if you do not want to see a large input range. Remember that a sigmoidal function, for example, works best only in a small range around zero. If your input is on a scale of billions, it may take a long time for the gradient descent algorithm to converge. If you know that will not be the case for your data, then you're right.

          • Avatar
            Liliana August 19, 2021 at 4:12 am #

            Thanks for the explanation, it was just what I needed. According to this, in a case where the raw data have a range of values between 0 and 10, it may be better not to pass the data through normalization or standardization. Is that so?

            Thanks for your attention.

  100. Avatar
    Hira Jamil July 8, 2021 at 10:56 am #

    These tutorials are really helpful. I need an LSTM out-of-sample forecast example on univariate data; have you worked on this?

  101. Avatar
    Liliana August 24, 2021 at 7:05 am #

    Hi Jason:

    I would like to know your opinion: if I want to use a 1D CNN model, an MLP, or one of the LSTM models to forecast a time series of the Multiple Parallel Input and Multi-step Output type, whose data have values that only go from 0 to 10, would it be more convenient to carry out some kind of scaling, normalization, or standardization of the data?

    Thanks for your attention.

    • Avatar
      Adrian Tam August 24, 2021 at 11:56 am #

      These are data preprocessing techniques, which apply to the input data. If there is any reason these techniques would be useful for your input data (e.g., one feature is in the range 0 to 0.001 while another feature is in the range 0 to 10), then you need them. The output range is controlled by the output layer of the network. Hence you should not need a scaler for the output data.
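
      A minimal sketch of that, assuming scikit-learn and placeholder arrays: scale only the input features and leave the target as-is.

      # Sketch only: scale input features to [0, 1], leave the target unscaled.
      from numpy import array
      from sklearn.preprocessing import MinMaxScaler

      X = array([[0.0005, 2.0], [0.0010, 5.0], [0.0002, 9.0]])  # very different ranges
      y = array([1.2, 3.4, 5.6])                                # target left as-is

      scaler = MinMaxScaler()
      X_scaled = scaler.fit_transform(X)
      print(X_scaled)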

      • Avatar
        Liliana August 30, 2021 at 1:35 am #

        That is, in the case where all the data have the same range of values, such as the integers (0,1,2,3,4,5,6,7,8,9,10), there would not be a real need to apply this type of technique to the input data in order to use 1D CNN, MLP, and LSTM neural networks?

        Thank you for your attention, I remain pending.

        • Avatar
          Adrian Tam September 1, 2021 at 7:31 am #

          For data like that I am quite comfortable without any scaler. A scaler is there to help make convergence faster. Take an MLP, for example, where we initialize the weights using a standard normal and use tanh as the activation function. Convergence will be faster if the input data is within -1 to +1 (because the initial values are already in the ballpark range) than if it is within 5000 to 8000 (which needs a lot of iterations to move the weights into that range).

          • Avatar
            Liliana September 1, 2021 at 8:50 am #

            Ok Adrian, thanks for the explanation, it is very helpful to me.

          • Avatar
            Liliana September 27, 2021 at 3:12 pm #

            Hello Adrian:

            I would like to know your opinion: as I mentioned in the previous question, in the case where all the data have the same range of values, such as the integers (0,1,2,3,4,5,6,7,8,9,10), in a multivariate series, and you want to make a future forecast of the Multiple Parallel Input and Multi-step Output type, but where the series in question is sparse, that is, it has many values at zero, do you think it would be convenient in this case to carry out some type of processing of the input data, such as the ones we have already discussed (scaling, normalization, standardization) or some other such as PSA, etc.?
            This is for using 1D CNN, MLP, and LSTM neural networks, and Random Forest.

            Thank you for your kind attention, I remain attentive.

          • Avatar
            Adrian Tam September 28, 2021 at 9:31 am #

            You need to try it out to confirm whether it is helpful. But usually, sparse data is not good for StandardScaler because the variance is obviously underestimated. Otherwise, it should not do more harm than not having the preprocessing.

          • Avatar
            Liliana September 29, 2021 at 4:07 am #

            Thanks, as always Adrian, your help is very useful for me.

          • Avatar
            Adrian Tam September 30, 2021 at 1:08 am #

            You’re welcomed.

  102. Avatar
    Juan S Acevedo September 4, 2021 at 12:35 am #

    Hello Jason, thanks a lot for all your posts

    I'm working on a multi-step, multivariate CNN for demand forecasting with vector output (48 steps in, 12 steps out).
    Is there an easy way to visualize the loss on the training data? For example, like you did in the one-step LSTM for the airline passengers dataset.

    How can I plot a train/validation loss curve? I just want to make sure about the right number of epochs.

    Regards

    • Avatar
      Jason Brownlee September 4, 2021 at 5:23 am #

      Yes, you may need to capture the values yourself in a list/array and plot manually at the end of training.
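
      A minimal sketch of that, assuming Keras: the History object returned by fit() captures the per-epoch losses, which can then be plotted (the model and data below are placeholders):

      # Sketch only: plot training and validation loss from the History object.
      from numpy import random
      from keras.models import Sequential
      from keras.layers import Dense
      from matplotlib import pyplot

      X, y = random.rand(200, 48), random.rand(200, 12)   # placeholder: 48 in, 12 out

      model = Sequential()
      model.add(Dense(50, activation='relu', input_shape=(48,)))
      model.add(Dense(12))
      model.compile(optimizer='adam', loss='mse')

      history = model.fit(X, y, epochs=50, validation_split=0.2, verbose=0)
      pyplot.plot(history.history['loss'], label='train')
      pyplot.plot(history.history['val_loss'], label='validation')
      pyplot.legend()
      pyplot.show()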

  103. Avatar
    Li September 10, 2021 at 1:21 pm #

    Thank you very much for your wonderful sharing! I would like to ask if there is any CNN-LSTM model for probability prediction?

  104. Avatar
    Marcin September 22, 2021 at 5:14 am #

    Jason,

    First of all, thank you for the great blog. I've learned a lot from your articles.

    I would like to ask you for advice regarding Multiple Parallel Series.
    I have a case where my data comes from 5 probes, recorded such that if there is a peak in the signal it is marked as 1, and if there is no peak it is 0.
    My goal is to predict a potential peak in each probe.

    Examples can be found below.

     
    Two questions.

    1. As you can observe, at each step in the series there are only two 1s in the row. This is a kind of additional knowledge that the neural network should pick up to predict results.
    Question: does the CNN Multiple Parallel Series example you provided also take knowledge from the other parallel series?

    2. For the Boolean data I have, is this CNN Multiple Parallel Series approach the best way to analyze it?
     

     

                          p1   p2   p3   p4   p5

    Step   1          0     1     0      0     1
    Step   2          0     1     0      1     0
    Step   3          1     0     0      0     1
    Step   4          0     1     0      1     0
    Step   5          0     1     1      0     0
    Step   6          0     1     0      0     1
    Step   7          0     0     1      0     1
    Step   8          0     1     0      0     1
    Step   9          1     1     0      0     0
    Step 10          0     0     0      1     1
    Step 11          1     0     0      0     1
    Step 12          0     1     0      0     1
    Step 13          0     1     1      0     0
    Step 14          0     1     0      0     1
    Step 15          0     1     0      0     1

    Thank you,
    Marcin

    • Avatar
      Adrian Tam September 23, 2021 at 4:02 am #

      For (2), Boolean data should not be a problem, but you may not want to use MaxPooling1D, as you will almost always get 1 from it; AveragePooling1D may be what you want. For (1), I think you are talking about using p1 to p5 together as input, where each of them is a time series. In this case, you are not developing a 1D convolution but a 2D one, more like the image case.
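
      A minimal sketch of the pooling swap, assuming 5 parallel boolean probes and arbitrary layer sizes (all configuration values below are placeholders):

      # Sketch only: AveragePooling1D in place of MaxPooling1D for binary signals.
      from keras.models import Sequential
      from keras.layers import Conv1D, AveragePooling1D, Flatten, Dense

      n_steps, n_features = 10, 5   # assumed: 10 time steps, 5 probes

      model = Sequential()
      model.add(Conv1D(filters=64, kernel_size=2, activation='relu',
                       input_shape=(n_steps, n_features)))
      model.add(AveragePooling1D(pool_size=2))
      model.add(Flatten())
      model.add(Dense(50, activation='relu'))
      model.add(Dense(n_features, activation='sigmoid'))  # one probability per probe
      model.compile(optimizer='adam', loss='binary_crossentropy')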

  105. Avatar
    Liliana September 28, 2021 at 10:23 am #

    Hello Adrian and Jason:

    I would like to know your opinion: as I mentioned in the previous question, in the case where all the data have the same range of values, such as the integers (0,1,2,3,4,5,6,7,8,9,10), in a multivariate series, and you want to make a future forecast of the Multiple Parallel Input and Multi-step Output type, but where the series in question is sparse, that is, it has many values at zero, do you think it would be convenient in this case to carry out some type of processing of the input data, such as the ones we have already discussed (scaling, normalization, standardization) or some other such as PSA, etc.?
    This is for using 1D CNN, MLP, and LSTM neural networks, and Random Forest.

    Thank you for your kind attention, I remain attentive.

  106. Avatar
    Priya October 30, 2021 at 11:54 pm #

    Thanks for the great tutorial
    Kernel size=2 means a 2*2 kernel matrix; similarly, kernel size=3 means 3*3.
    What if I want a 1*3 or 1*2 kernel matrix? How can I get this matrix? I saw some research papers on CNNs considering kernel sizes of 1*3, 1*4, and so on.
    If I use kernel_size=(1,3) in the code, I get the error "The kernel_size argument must be a tuple of 1 integer. Received: (1, 3)."

    • Avatar
      Adrian Tam November 1, 2021 at 1:45 pm #

      In the examples here, it is Conv1D, where the kernel is a 1xN matrix. What you described is for Conv2D, which is more often seen in examples of image problems.
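
      A minimal sketch of the difference (the values are placeholders): Conv1D takes a single integer kernel_size spanning the time dimension, while Conv2D takes a tuple.

      # Sketch only: kernel_size is one integer for Conv1D, a tuple for Conv2D.
      from keras.layers import Conv1D, Conv2D

      conv1d = Conv1D(filters=64, kernel_size=3)        # kernel spans 3 time steps
      conv2d = Conv2D(filters=64, kernel_size=(1, 3))   # 1x3 kernel over a 2D input
      print(conv1d.kernel_size, conv2d.kernel_size)     # (3,) (1, 3)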

  107. Avatar
    SHG January 4, 2022 at 9:02 pm #

    You are giving the input shape as [timesteps, 1] in univariate time series prediction. How will the model predict for the other samples/rows in the dataset? For example, if I have 3 time steps for 400 samples:
    X-sample1 -> 1 2 3
    X-sample2 -> 2 3 4
    and so on

    Since we are passing just 1 sample, how are the other samples passed for training, or does it recognize them automatically?

    Also, what does the model summary showing the output dimension as None signify? How will we compute the output size from this?

  108. Avatar
    Corne van Zyl April 26, 2022 at 7:15 pm #

    So, I’m hoping to have a discussion on the suitability of the use of CNNs for time series regression modelling.

    I first noticed that all the CNN, CNN-LSTM and ConvLSTM models from your book were unable to model the volatile nature of energy demand. I plotted the predictions against the ground truth and saw that the predictions were basically flat lines compared to the real demand, as if the model could not capture the higher frequencies. The model only seems to get the general shape. Sure, you could say that the models are small and there wasn't a lot of data, but I've seen the same thing in the results of other papers that use CNNs in their architectures.

    I've read that CNNs are known to reduce noise (i.e. reduce high frequencies). I get that it is impossible to model the noise or randomness, but could this be a limitation of CNNs, or is there something I don't understand or that I'm not accounting for?

    Does the neural network only fit the patterns that exist and that its architecture allows, and everything else is just noise?

    • Avatar
      James Carmichael May 2, 2022 at 9:43 am #

      Hi Corne…Please clarify your question so that we may better assist you.

  109. Avatar
    kia June 5, 2022 at 10:48 pm #

    Hello. Thank you for your useful content.
    I have a question: are there any examples of this for R?
    Following your code, I wrote the code in R, but I have a problem compiling the model.
    Can you guide me?

  110. Avatar
    Belle July 28, 2022 at 7:43 pm #

    Hi Jason,

    Thank you so much for your useful content. I have 2 questions:
    1. How to find MAPE for evaluation (Python code).
    2. How to split the train and test datasets for univariate time series data.

    Thank you so much.

  111. Avatar
    Alecesa September 13, 2022 at 5:00 pm #

    Hi Jason,

    Great work as usual.

    Please, do you anticipate developing any WaveNet / Temporal CNN code examples?

    Thanks and regards,
    Alex

    • Avatar
      James Carmichael September 14, 2022 at 5:54 am #

      Hi Alecesa…We do not currently have content related to those topics. The following is a complete list of our ebooks:

      https://machinelearningmastery.com/products/

      • Avatar
        Sandra November 29, 2022 at 1:54 am #

        Hi Jason
        Thanks for your info

        Please can you help me with any examples of optimizing a Conv1D model using automatic or intelligent hyperparameter tuning, for Conv1D models built with Keras?

        Such as examples about:
        1. PSO with Conv1D
        2. KerasGA with Conv1D using the pygad package

        Thanks

      • Avatar
        Sandra November 29, 2022 at 5:34 pm #

        Thanks!
        I have another question:
        I ask if it is possible with Keras to use Conv2D to perform a Conv1D? The question may seem weird, but I need to use a tool that has Conv2D.

        For example, I have a raw dataset with x_train = (14322, 23).

        I need to use this code to feed the raw x_train data above, instead of the image data (100, 100, 3) the code was written for.
        If possible, what changes are needed in the input and other parameters?

        # Build the keras model using the functional API.

        input_layer = tensorflow.keras.layers.Input(shape=(100, 100, 3))
        conv_layer1 = tensorflow.keras.layers.Conv2D(filters=5,
        kernel_size=7,
        activation="relu")(input_layer)
        max_pool1 = tensorflow.keras.layers.MaxPooling2D(pool_size=(5, 5),
        strides=5)(conv_layer1)
        conv_layer2 = tensorflow.keras.layers.Conv2D(filters=3,
        kernel_size=3,
        activation="relu")(max_pool1)

        flatten_layer = tensorflow.keras.layers.Flatten()(conv_layer2)
        dense_layer = tensorflow.keras.layers.Dense(15, activation="relu")(flatten_layer)
        output_layer = tensorflow.keras.layers.Dense(4, activation="softmax")(dense_layer)
        model = tensorflow.keras.Model(inputs=input_layer, outputs=output_layer)

  112. Avatar
    Chuck October 13, 2022 at 3:38 am #

    Jason,

    Thank you for a great article.

    I am trying to determine at a glance if a Keras model is fully connected or not.

    It seems the rule of thumb is if none of the layers are Dense, then the model is NOT fully connected.

    For my univariate time series, if a 1D convolutional autoencoder model only uses Keras layers that are Conv1D, Dropout, and Conv1D transpose, how do you tell if the model is fully connected or not?

    I am using data with time and voltage to model in an autoencoder. None of the layers are Dense, but what specifically makes a Keras model NOT fully connected?

  113. Avatar
    Sandra November 29, 2022 at 5:19 pm #

    Thanks!

    I have another question:
    I ask if it's possible to use Conv2D to perform a Conv1D? The question may seem weird, but I need to use a tool that uses Conv2D.
    If possible, how?
    For example, I need to use this code for a raw dataset with
    x_train = (14255, 23), instead of the image data (100, 100, 3).
    What changes are needed in the input and other parameters, if this is possible?

    # Build the keras model using the functional API.

    input_layer = tensorflow.keras.layers.Input(shape=(100, 100, 3))
    conv_layer1 = tensorflow.keras.layers.Conv2D(filters=5,
    kernel_size=7,
    activation="relu")(input_layer)
    max_pool1 = tensorflow.keras.layers.MaxPooling2D(pool_size=(5, 5),
    strides=5)(conv_layer1)
    conv_layer2 = tensorflow.keras.layers.Conv2D(filters=3,
    kernel_size=3, …

  114. Avatar
    tfrud January 31, 2024 at 11:02 am #

    Jason – thank you so much for this tutorial; it was a godsend for me. Unlike some tutorials, it is clearly meant to enable, with brief, clear explanations where people are likely to need them, and with a logical progression in scale from the simplest examples to the more complex using the same basic framework. This is how it should be done when educating! It is very much appreciated.

    • Avatar
      James Carmichael February 1, 2024 at 10:30 am #

      Thank you tfrud for your support and feedback! We greatly appreciate it!
