How to Develop RNN Models for Human Activity Recognition Time Series Classification

Human activity recognition is the problem of classifying sequences of accelerometer data recorded by specialized harnesses or smart phones into known well-defined movements.

Classical approaches to the problem involve hand crafting features from the time series data based on fixed-sized windows and training machine learning models, such as ensembles of decision trees. The difficulty is that this feature engineering requires strong expertise in the field.

Recently, deep learning methods such as recurrent neural networks (e.g. LSTMs) and variations that make use of one-dimensional convolutional neural networks (CNNs) have been shown to provide state-of-the-art results on challenging activity recognition tasks with little or no feature engineering, instead using feature learning on raw data.

In this tutorial, you will discover three recurrent neural network architectures for modeling an activity recognition time series classification problem.

After completing this tutorial, you will know:

  • How to develop a Long Short-Term Memory Recurrent Neural Network for human activity recognition.
  • How to develop a one-dimensional Convolutional Neural Network LSTM, or CNN-LSTM, model.
  • How to develop a one-dimensional Convolutional LSTM, or ConvLSTM, model for the same problem.

Let’s get started.

Tutorial Overview

This tutorial is divided into four parts; they are:

  1. Activity Recognition Using Smartphones Dataset
  2. Develop an LSTM Network Model
  3. Develop a CNN-LSTM Network Model
  4. Develop a ConvLSTM Network Model

Activity Recognition Using Smartphones Dataset

Human Activity Recognition, or HAR for short, is the problem of predicting what a person is doing based on a trace of their movement using sensors.

A standard human activity recognition dataset is the ‘Activity Recognition Using Smart Phones Dataset’ made available in 2012.

It was prepared and made available by Davide Anguita, et al. from the University of Genova, Italy and is described in full in their 2013 paper “A Public Domain Dataset for Human Activity Recognition Using Smartphones.” The dataset was modeled with machine learning algorithms in their 2012 paper titled “Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine.”

The dataset was made available and can be downloaded for free from the UCI Machine Learning Repository:

The data was collected from 30 subjects aged between 19 and 48 years old performing one of six standard activities while wearing a waist-mounted smartphone that recorded the movement data. Video was recorded of each subject performing the activities and the movement data was labeled manually from these videos.

Below is an example video of a subject performing the activities while their movement data is being recorded.

The six activities performed were as follows:

  1. Walking
  2. Walking Upstairs
  3. Walking Downstairs
  4. Sitting
  5. Standing
  6. Laying

The movement data recorded was the x, y, and z accelerometer data (linear acceleration) and gyroscopic data (angular velocity) from the smart phone, specifically a Samsung Galaxy S II. Observations were recorded at 50 Hz (i.e. 50 data points per second). Each subject performed the sequence of activities twice: once with the device on their left-hand side and once with the device on their right-hand side.

The raw data is not available. Instead, a pre-processed version of the dataset was made available. The pre-processing steps included:

  • Pre-processing accelerometer and gyroscope using noise filters.
  • Splitting data into fixed windows of 2.56 seconds (128 data points) with 50% overlap.
  • Splitting of accelerometer data into gravitational (total) and body motion components.
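
The windowing arithmetic above can be sketched in a few lines of NumPy. This is only an illustration, not the dataset authors' pre-processing code; the sliding_windows() helper and the synthetic signal are assumptions made for the example.

```python
import numpy as np

def sliding_windows(signal, width=128, overlap=0.5):
    """Split a 1D signal into fixed-width windows with fractional overlap."""
    step = int(width * (1 - overlap))  # 64 observations for 50% overlap
    starts = range(0, len(signal) - width + 1, step)
    return np.array([signal[s:s + width] for s in starts])

# 10 seconds of synthetic data sampled at 50 Hz (500 observations)
signal = np.arange(500, dtype=float)
windows = sliding_windows(signal)
print(windows.shape)  # (6, 128)
print(128 / 50)       # 2.56 seconds per window
```

Each window starts 64 observations after the previous one, so consecutive windows share half of their data.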

Feature engineering was applied to the window data, and a copy of the data with these engineered features was made available.

A number of time and frequency features commonly used in the field of human activity recognition were extracted from each window. The result was a 561 element vector of features.

The dataset was split into train (70%) and test (30%) sets based on data for subjects, e.g. 21 subjects for train and nine for test.
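
To make the idea of a subject-wise split concrete, here is a small sketch. The dataset already ships with its own fixed split, so this is purely illustrative; the split_by_subject() helper, the random assignment, and the made-up subject ids are all assumptions.

```python
import numpy as np

def split_by_subject(subject_ids, train_fraction=0.7, seed=1):
    """Return boolean row masks for train and test, split by subject."""
    rng = np.random.default_rng(seed)
    subjects = np.unique(subject_ids)
    rng.shuffle(subjects)
    n_train = int(round(train_fraction * len(subjects)))
    train_mask = np.isin(subject_ids, subjects[:n_train])
    return train_mask, ~train_mask

# hypothetical example: 30 subjects with 10 windows each
subject_ids = np.repeat(np.arange(1, 31), 10)
train_mask, test_mask = split_by_subject(subject_ids)
print(len(np.unique(subject_ids[train_mask])))  # 21
print(len(np.unique(subject_ids[test_mask])))   # 9
```

Splitting by subject rather than by window means no person contributes data to both sets.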

Experiments with a support vector machine intended for use on a smartphone (e.g. using fixed-point arithmetic) achieved a predictive accuracy of 89% on the test dataset, similar to the result of an unmodified SVM implementation.

The dataset is freely available and can be downloaded from the UCI Machine Learning repository.

The data is provided as a single zip file that is about 58 megabytes in size. The direct link for this download is below:

Download the dataset and unzip all files into a new directory in your current working directory named “HARDataset”.

Develop an LSTM Network Model

In this section, we will develop a Long Short-Term Memory network model (LSTM) for the human activity recognition dataset.

LSTM network models are a type of recurrent neural network that is able to learn and remember over long sequences of input data. They are intended for use with data comprised of long sequences, up to about 200 to 400 time steps. They may be a good fit for this problem.

The model can support multiple parallel sequences of input data, such as each axis of the accelerometer and gyroscope data. The model learns to extract features from sequences of observations and how to map the internal features to different activity types.

The benefit of using LSTMs for sequence classification is that they can learn from the raw time series data directly, and in turn do not require domain expertise to manually engineer input features. The model can learn an internal representation of the time series data and ideally achieve comparable performance to models fit on a version of the dataset with engineered features.

This section is divided into four parts; they are:

  1. Load Data
  2. Fit and Evaluate Model
  3. Summarize Results
  4. Complete Example

Load Data

The first step is to load the raw dataset into memory.

There are three main signal types in the raw data: total acceleration, body acceleration, and body gyroscope. Each has three axes of data. This means that there are a total of nine variables for each time step.

Further, each series of data has been partitioned into overlapping windows of 2.56 seconds of data, or 128 time steps. These windows of data correspond to the windows of engineered features (rows) in the previous section.

This means that one row of data has (128 * 9), or 1,152, elements. This is a little more than double the size of the 561-element vectors in the previous section (2 * 561 = 1,122), and it is likely that there is some redundant data.

The signals are stored in the /Inertial Signals/ directory under the train and test subdirectories. Each axis of each signal is stored in a separate file, meaning that each of the train and test datasets have nine input files to load and one output file to load. We can batch the loading of these files into groups given the consistent directory structures and file naming conventions.

The input data is in CSV format where columns are separated by whitespace. Each of these files can be loaded as a NumPy array. The load_file() function below loads a dataset given the full path to the file and returns the loaded data as a NumPy array.
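
A minimal sketch of load_file() might look like the following. It assumes NumPy's loadtxt() is an acceptable loader for whitespace-separated files (pandas read_csv() with a whitespace separator would be another option); the temporary file stands in for a real signal file.

```python
import os
import tempfile
import numpy as np

def load_file(filepath):
    """Load one whitespace-separated text file as a 2D NumPy array."""
    return np.loadtxt(filepath, ndmin=2)

# small check using a temporary file in place of a real signal file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example_signal.txt')
    with open(path, 'w') as f:
        f.write('  1.0e-01  2.0e-01  3.0e-01\n  4.0e-01  5.0e-01  6.0e-01\n')
    data = load_file(path)
print(data.shape)  # (2, 3)
```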

We can then load all data for a given group (train or test) into a single three-dimensional NumPy array, where the dimensions of the array are [samples, time steps, features].

To make this clearer, there are 128 time steps and nine features, where the number of samples is the number of rows in any given raw signal data file.

The load_group() function below implements this behavior. The dstack() NumPy function allows us to stack each of the loaded 2D arrays, each shaped [samples, time steps], into a single 3D array where the variables are separated on the third dimension (features).
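
One way load_group() could be written is sketched here. The load_file() helper is repeated so the example runs on its own, and the three tiny stand-in files (a.txt, b.txt, c.txt) are invented for the demonstration.

```python
import os
import tempfile
import numpy as np

def load_file(filepath):
    """Load one whitespace-separated text file as a 2D NumPy array."""
    return np.loadtxt(filepath, ndmin=2)

def load_group(filenames, prefix=''):
    """Load a list of files and stack them as [samples, time steps, features]."""
    loaded = [load_file(os.path.join(prefix, name)) for name in filenames]
    return np.dstack(loaded)

# demo with three stand-in files: 5 samples x 4 time steps each
with tempfile.TemporaryDirectory() as tmp:
    names = ['a.txt', 'b.txt', 'c.txt']
    for i, name in enumerate(names):
        np.savetxt(os.path.join(tmp, name), np.full((5, 4), float(i)))
    group = load_group(names, prefix=tmp)
print(group.shape)  # (5, 4, 3)
```

Each file becomes one slice along the third axis, so group[:, :, 2] holds the contents of the third file.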

We can use this function to load all input signal data for a given group, such as train or test.

The load_dataset_group() function below loads all input signal data and the output data for a single group using the consistent naming conventions between the directories.
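
A sketch of load_dataset_group() is given below. The directory layout and the file names (for example total_acc_x_train.txt and y_train.txt) are assumptions based on the naming pattern of the dataset archive, so check them against your unzipped copy. The demo builds a tiny fake directory tree rather than reading the real data.

```python
import os
import tempfile
import numpy as np

def load_file(filepath):
    return np.loadtxt(filepath, ndmin=2)

def load_group(filenames, directory):
    return np.dstack([load_file(os.path.join(directory, n)) for n in filenames])

def load_dataset_group(group, prefix=''):
    """Load the nine input signals and the class labels for 'train' or 'test'."""
    signal_dir = os.path.join(prefix, group, 'Inertial Signals')
    filenames = []
    for signal in ('total_acc', 'body_acc', 'body_gyro'):
        for axis in ('x', 'y', 'z'):
            filenames.append('%s_%s_%s.txt' % (signal, axis, group))
    X = load_group(filenames, signal_dir)
    y = load_file(os.path.join(prefix, group, 'y_%s.txt' % group))
    return X, y

# demo: fake 'train' group with 5 windows of 128 time steps
with tempfile.TemporaryDirectory() as tmp:
    signal_dir = os.path.join(tmp, 'train', 'Inertial Signals')
    os.makedirs(signal_dir)
    for signal in ('total_acc', 'body_acc', 'body_gyro'):
        for axis in ('x', 'y', 'z'):
            name = '%s_%s_train.txt' % (signal, axis)
            np.savetxt(os.path.join(signal_dir, name), np.zeros((5, 128)))
    np.savetxt(os.path.join(tmp, 'train', 'y_train.txt'), np.arange(1, 6), fmt='%d')
    X, y = load_dataset_group('train', prefix=tmp)
print(X.shape, y.shape)  # (5, 128, 9) (5, 1)
```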

Finally, we can load each of the train and test datasets.

The output data is defined as an integer for the class number. We must one hot encode these class integers so that the data is suitable for fitting a neural network multi-class classification model. We can do this by calling the to_categorical() Keras function.
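
The encoding step can be illustrated with plain NumPy. The one_hot() helper below is a hypothetical stand-in that mirrors what to_categorical() produces, used here so the example runs without Keras; it also assumes the class integers run from 1 to 6, matching the activity list earlier, so they are shifted to start at zero first.

```python
import numpy as np

def one_hot(class_ids, n_classes):
    """One hot encode zero-based integer class ids."""
    encoded = np.zeros((len(class_ids), n_classes), dtype='float32')
    encoded[np.arange(len(class_ids)), class_ids] = 1.0
    return encoded

# class labels shaped [samples, 1], as loaded from the output file
y = np.array([[1], [3], [6]])
y_zero_based = y.flatten().astype(int) - 1  # shift 1..6 to 0..5
y_encoded = one_hot(y_zero_based, n_classes=6)
print(y_encoded)
```

Each row contains a single 1.0 in the column of its class, which is the form a softmax output layer is trained against.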

The load_dataset() function below implements this behavior and returns the train and test X and y elements ready for fitting and evaluating the defined models.

Fit and Evaluate Model

Now that we have the data loaded into memory ready for modeling, we can define, fit, and evaluate an LSTM model.

We can define a function named evaluate_model() that takes the train and test dataset, fits a model on the training dataset, evaluates it on the test dataset, and returns an estimate of the model’s performance.

First, we must define the LSTM model using the Keras deep learning library. The model requires a three-dimensional input with [samples, time steps, features].

This is exactly how we have loaded the data, where one sample is one window of the time series data, each window has 128 time steps, and a time step has nine variables or features.

The output for the model will be a six-element vector containing the probability of a given window belonging to each of the six activity types.

These input and output dimensions are required when fitting the model, and we can extract them from the provided training dataset.
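
Extracting the dimensions is a matter of reading the array shapes. The arrays here are zero-filled stand-ins with the shapes described above, not the real data.

```python
import numpy as np

# stand-in arrays: 10 windows, 128 time steps, 9 features, 6 classes
trainX = np.zeros((10, 128, 9))
trainy = np.zeros((10, 6))

n_timesteps, n_features, n_outputs = trainX.shape[1], trainX.shape[2], trainy.shape[1]
print(n_timesteps, n_features, n_outputs)  # 128 9 6
```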

The model is defined as a Sequential Keras model, for simplicity.

We will define the model as having a single LSTM hidden layer. This is followed by a dropout layer intended to reduce overfitting of the model to the training data. Finally, a dense fully connected layer is used to interpret the features extracted by the LSTM hidden layer, before a final output layer is used to make predictions.

The efficient Adam version of stochastic gradient descent will be used to optimize the network, and the categorical cross entropy loss function will be used given that we are learning a multi-class classification problem.

The definition of the model is listed below.

The model is fit for a fixed number of epochs, in this case 15, and a batch size of 64 samples will be used, where 64 windows of data will be exposed to the model before the weights of the model are updated.

Once the model is fit, it is evaluated on the test dataset and the accuracy of the fit model on the test dataset is returned.

Note, it is common to not shuffle sequence data when fitting an LSTM. Here we do shuffle the windows of input data during training (the default). In this problem, we are interested in harnessing the LSTM’s ability to learn and extract features across the time steps in a window, not across windows.

The complete evaluate_model() function is listed below.
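
A sketch of what evaluate_model() could look like follows, assuming the tensorflow.keras API. The 15 epochs and batch size of 64 come from the text; the 100 LSTM units, 100 dense nodes, and 0.5 dropout rate are assumptions, as are the epochs and batch_size keyword arguments, which are added so the function can be tried quickly on small data. The demo uses random arrays, so its accuracy is meaningless.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout

def evaluate_model(trainX, trainy, testX, testy, epochs=15, batch_size=64, verbose=0):
    """Fit an LSTM classifier on the training windows and return test accuracy."""
    n_timesteps, n_features, n_outputs = trainX.shape[1], trainX.shape[2], trainy.shape[1]
    model = Sequential([
        Input(shape=(n_timesteps, n_features)),
        LSTM(100),                               # single LSTM hidden layer
        Dropout(0.5),                            # reduce overfitting
        Dense(100, activation='relu'),           # interpret the extracted features
        Dense(n_outputs, activation='softmax'),  # one probability per activity
    ])
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)
    _, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
    return accuracy

# quick smoke run on random stand-in data (one epoch only)
rng = np.random.default_rng(1)
X = rng.normal(size=(32, 128, 9)).astype('float32')
y = np.eye(6, dtype='float32')[rng.integers(0, 6, size=32)]
acc = evaluate_model(X[:24], y[:24], X[24:], y[24:], epochs=1)
print('accuracy: %.3f' % acc)
```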

There is nothing special about the network structure or chosen hyperparameters; they are just a starting point for this problem.

Summarize Results

We cannot judge the skill of the model from a single evaluation.

The reason for this is that neural networks are stochastic, meaning that a different specific model will result when training the same model configuration on the same data.

This is a feature of the network in that it gives the model its adaptive ability, but requires a slightly more complicated evaluation of the model.

We will repeat the evaluation of the model multiple times, then summarize the performance of the model across each of those runs. For example, we can call evaluate_model() a total of 10 times. This will result in a population of model evaluation scores that must be summarized.

We can summarize the sample of scores by calculating and reporting the mean and standard deviation of the performance. The mean gives the average accuracy of the model on the dataset, whereas the standard deviation describes how much the accuracy typically varies around the mean.

The function summarize_results() below summarizes the results of a run.
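
A possible summarize_results() is sketched here. The output format and the returned tuple are assumptions, and the two scores are made up for illustration.

```python
import numpy as np

def summarize_results(scores):
    """Print and return the mean and standard deviation of accuracy scores."""
    mean, std = np.mean(scores), np.std(scores)
    print('Accuracy: %.3f%% (+/-%.3f)' % (mean, std))
    return mean, std

# made-up scores, for illustration only
mean, std = summarize_results([90.0, 92.0])
```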

We can bundle up the repeated evaluation, gathering of results, and summarization of results into a main function for the experiment, called run_experiment(), listed below.

By default, the model is evaluated 10 times before the performance of the model is reported.
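
The repeated evaluation loop could be sketched as below. So that the example runs on its own, run_experiment() here takes a hypothetical evaluate_fn argument in place of loading the data and calling evaluate_model() directly, and the stand-in evaluator returns a noisy made-up accuracy instead of training a network.

```python
import numpy as np

def run_experiment(evaluate_fn, repeats=10):
    """Call evaluate_fn() repeatedly and collect accuracy as a percentage."""
    scores = []
    for r in range(repeats):
        score = evaluate_fn() * 100.0
        print('>#%d: %.3f' % (r + 1, score))
        scores.append(score)
    print('Accuracy: %.3f%% (+/-%.3f)' % (np.mean(scores), np.std(scores)))
    return scores

# stand-in evaluator: a noisy accuracy around 0.9, not a trained model
rng = np.random.default_rng(1)
scores = run_experiment(lambda: 0.9 + rng.normal(scale=0.01))
```

In the tutorial setting, evaluate_fn would wrap a call to evaluate_model() on the loaded train and test arrays.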

Complete Example

Now that we have all of the pieces, we can tie them together into a worked example.

The complete code listing is provided below.

Running the example first prints the shape of the loaded dataset, then the shape of the train and test sets and the input and output elements. This confirms the number of samples, time steps, and variables, as well as the number of classes.

Next, models are created and evaluated and a debug message is printed for each.

Finally, the sample of scores is printed, followed by the mean and standard deviation. We can see that the model performed well, achieving a classification accuracy of about 89.7% trained on the raw dataset, with a standard deviation of about 1.3 percentage points.

This is a good result, considering that the original paper published a result of 89%, trained on the dataset with heavy domain-specific feature engineering, not the raw dataset.

Note: given the stochastic nature of the algorithm, your specific results may vary. If so, try running the code a few times.

Now that we have seen how to develop an LSTM model for time series classification, let’s look at how we can develop a more sophisticated CNN LSTM model.

Develop a CNN-LSTM Network Model

The CNN LSTM architecture involves using Convolutional Neural Network (CNN) layers for feature extraction on input data combined with LSTMs to support sequence prediction.

CNN LSTMs were developed for visual time series prediction problems and the application of generating textual descriptions from sequences of images (e.g. videos). Specifically, the problems of:

  • Activity Recognition: Generating a textual description of an activity demonstrated in a sequence of images.
  • Image Description: Generating a textual description of a single image.
  • Video Description: Generating a textual description of a sequence of images.

You can learn more about the CNN LSTM architecture in the post:

To learn more about the consequences of combining these models, see the paper:

The CNN LSTM model will read subsequences of the main sequence in as blocks, extract features from each block, then allow the LSTM to interpret the features extracted from each block.

One approach to implementing this model is to split each window of 128 time steps into subsequences for the CNN model to process. For example, the 128 time steps in each window can be split into four subsequences of 32 time steps.

We can then define a CNN model that expects to read in sequences with a length of 32 time steps and nine features.
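
The split into subsequences is a single reshape. The sketch below uses a small stand-in array in place of the loaded data; the variable names n_steps and n_length are assumptions.

```python
import numpy as np

n_steps, n_length = 4, 32  # four subsequences of 32 time steps

# stand-in for the loaded windows: [samples, 128 time steps, 9 features]
trainX = np.arange(2 * 128 * 9, dtype='float32').reshape((2, 128, 9))

# reshape to [samples, subsequences, time steps per subsequence, features]
trainX_sub = trainX.reshape((trainX.shape[0], n_steps, n_length, trainX.shape[2]))
print(trainX_sub.shape)  # (2, 4, 32, 9)

# the second subsequence starts at time step 32 of the original window
print(np.array_equal(trainX_sub[0, 1, 0], trainX[0, 32]))  # True
```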

The entire CNN model can be wrapped in a TimeDistributed layer to allow the same CNN model to read in each of the four subsequences in the window. The extracted features are then flattened and provided to the LSTM model to read, extracting its own features before a final mapping to an activity is made.

It is common to use two consecutive CNN layers followed by dropout and a max pooling layer, and that is the simple structure used in the CNN LSTM model here.

The updated evaluate_model() is listed below.
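
The model-definition part of the updated function might look like the sketch below, assuming the tensorflow.keras API. The filter count, kernel size, dropout rates, and layer widths are assumptions chosen to match the structure described above, and build_cnn_lstm() is a hypothetical helper name. Fitting and evaluation would proceed as for the LSTM model, on data reshaped to [samples, 4, 32, 9].

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, TimeDistributed, Conv1D, Dropout,
                                     MaxPooling1D, Flatten, LSTM, Dense)

def build_cnn_lstm(n_steps=4, n_length=32, n_features=9, n_outputs=6):
    """CNN applied to each subsequence, followed by an LSTM over subsequences."""
    model = Sequential([
        Input(shape=(n_steps, n_length, n_features)),
        # the same CNN layers are applied to each of the subsequences
        TimeDistributed(Conv1D(filters=64, kernel_size=3, activation='relu')),
        TimeDistributed(Conv1D(filters=64, kernel_size=3, activation='relu')),
        TimeDistributed(Dropout(0.5)),
        TimeDistributed(MaxPooling1D(pool_size=2)),
        TimeDistributed(Flatten()),
        # the LSTM reads one feature vector per subsequence
        LSTM(100),
        Dropout(0.5),
        Dense(100, activation='relu'),
        Dense(n_outputs, activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = build_cnn_lstm()
print(model.output_shape)  # (None, 6)
```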

We can evaluate this model as we did the straight LSTM model in the previous section.

The complete code listing is provided below.

Running the example summarizes the model performance for each of the 10 runs before a final summary of the model’s performance on the test set is reported.

We can see that the model achieved a performance of about 90.6% with a standard deviation of about 1%.

Note: given the stochastic nature of the algorithm, your specific results may vary. If so, try running the code a few times.

Develop a ConvLSTM Network Model

A further extension of the CNN LSTM idea is to perform the convolutions of the CNN (e.g. how the CNN reads the input sequence data) as part of the LSTM.

This combination is called a Convolutional LSTM, or ConvLSTM for short, and like the CNN LSTM is also used for spatio-temporal data.

Unlike an LSTM that reads the data in directly in order to calculate internal state and state transitions, and unlike the CNN LSTM that is interpreting the output from CNN models, the ConvLSTM is using convolutions directly as part of reading input into the LSTM units themselves.

For more information on how the equations for the ConvLSTM are calculated within the LSTM unit, see the paper:

The Keras library provides the ConvLSTM2D class that supports the ConvLSTM model for 2D data. It can be configured for 1D multivariate time series classification.

The ConvLSTM2D class, by default, expects input data to have the shape [samples, time, rows, columns, channels], where each time step of data is defined as an image of (rows * columns) data points.

In the previous section, we divided a given window of data (128 time steps) into four subsequences of 32 time steps. We can use this same subsequence approach in defining the ConvLSTM2D input where the number of time steps is the number of subsequences in the window, the number of rows is 1 as we are working with one-dimensional data, and the number of columns represents the number of time steps in the subsequence, in this case 32.

For this chosen framing of the problem, the input for the ConvLSTM2D would therefore be:

  • Samples: n, for the number of windows in the dataset.
  • Time: 4, for the four subsequences that we split a window of 128 time steps into.
  • Rows: 1, for the one-dimensional shape of each subsequence.
  • Columns: 32, for the 32 time steps in an input subsequence.
  • Channels: 9, for the nine input variables.

We can now prepare the data for the ConvLSTM2D model.
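
Preparing the data amounts to one reshape into five dimensions. The zero-filled array below is a stand-in for the loaded windows, and the variable names are assumptions.

```python
import numpy as np

n_steps, n_length = 4, 32  # four subsequences of 32 time steps

# stand-in for the loaded windows: [samples, 128 time steps, 9 features]
trainX = np.zeros((2, 128, 9), dtype='float32')

# reshape to [samples, time, rows, columns, channels]
trainX_5d = trainX.reshape((trainX.shape[0], n_steps, 1, n_length, trainX.shape[2]))
print(trainX_5d.shape)  # (2, 4, 1, 32, 9)
```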

The ConvLSTM2D class requires configuration both in terms of the CNN and the LSTM. This includes specifying the number of filters (e.g. 64), the two-dimensional kernel size (in this case 1 row and 3 columns of the subsequence time steps), and the activation function (in this case rectified linear, or ReLU).

As with a CNN or LSTM model, the output must be flattened into one long vector before it can be interpreted by a dense layer.
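
A sketch of the model definition is shown below, assuming the tensorflow.keras API. The 64 filters, (1, 3) kernel, and ReLU activation follow the text; the dropout rate, the dense layer width, and the build_convlstm() helper name are assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, ConvLSTM2D, Dropout, Flatten, Dense

def build_convlstm(n_steps=4, n_length=32, n_features=9, n_outputs=6):
    """ConvLSTM over [time, rows, columns, channels] inputs, then a dense classifier."""
    model = Sequential([
        Input(shape=(n_steps, 1, n_length, n_features)),
        ConvLSTM2D(filters=64, kernel_size=(1, 3), activation='relu'),
        Dropout(0.5),
        Flatten(),  # one long vector for the dense layers
        Dense(100, activation='relu'),
        Dense(n_outputs, activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = build_convlstm()
print(model.output_shape)  # (None, 6)
```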

We can then evaluate the model as we did the LSTM and CNN LSTM models before it.

The complete example is listed below.

As with the prior experiments, running the model prints the performance of the model each time it is fit and evaluated. A summary of the final model performance is presented at the end of the run.

We can see that the model consistently performs well on the problem, achieving an accuracy of about 90%, perhaps with fewer resources than the larger CNN LSTM model.

Note: given the stochastic nature of the algorithm, your specific results may vary. If so, try running the code a few times.

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Data Preparation. Consider exploring whether simple data scaling schemes can further lift model performance, such as normalization, standardization, and power transforms.
  • LSTM Variations. There are variations of the LSTM architecture that may achieve better performance on this problem, such as stacked LSTMs and Bidirectional LSTMs.
  • Hyperparameter Tuning. Consider exploring tuning of model hyperparameters such as the number of units, training epochs, batch size, and more.

If you explore any of these extensions, I’d love to know.

Summary

In this tutorial, you discovered three recurrent neural network architectures for modeling an activity recognition time series classification problem.

Specifically, you learned:

  • How to develop a Long Short-Term Memory Recurrent Neural Network for human activity recognition.
  • How to develop a one-dimensional Convolutional Neural Network LSTM, or CNN LSTM, model.
  • How to develop a one-dimensional Convolutional LSTM, or ConvLSTM, model for the same problem.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

144 Responses to How to Develop RNN Models for Human Activity Recognition Time Series Classification

  1. Tin September 28, 2018 at 9:20 am #

    Hi Jason,

    So enjoy reading with your stuff, very helpful. As we can use CNN+LSTM to predict the spatial-temporal data, can we reverse the architecture as LSTM+CNN to do the same job? Any examples for LSTM + CNN?

  2. Akilesh October 29, 2018 at 12:40 pm #

    Hi Jason,
    Can you please explain the choice of parameters for the LSTM network?
    Especially the LSTM layer and the dense layer?
    What does the value 100 signify?

    • Jason Brownlee October 29, 2018 at 2:14 pm #

      The model was configured via trial and error.

      There is no analytical way to calculate how to configure a neural network model, more details here:
      https://machinelearningmastery.com/faq/single-faq/how-many-layers-and-nodes-do-i-need-in-my-neural-network

      • Kiran April 3, 2019 at 12:33 am #

        Hi Jason, I’m just trying to understand correct me if I am wrong, does this value 100 in LSTM layer equal to 100 LSTM units in input layer? and each LSTM layer is fed with a sequence of length 128 (Time steps), right?

        • Jason Brownlee April 3, 2019 at 6:44 am #

          Yes, 100 refers to the number of parallel units or nodes. It is unrelated to the number of timesteps.

          Each node gets the full input sequence, not each layer.

          • Kiran April 3, 2019 at 6:57 pm #

            Thank you for clarifying and I have one more question regarding dense layer. what is the input that dense layer is receiving from LSTM layer?( is it the time series itself or the last time step) and what happens if # of nodes in dense layer is not equal to # of nodes in LSTM(I mean is it possible to have more nodes in dense layer )?

          • Jason Brownlee April 4, 2019 at 7:46 am #

            The LSTM creates an internal representation / extracted features from the entire input sequence.

          • Jaroslaw Goslinski August 2, 2019 at 2:32 am #

            The term: „LSTM units” is very misleading. When it comes to the number of units we actually speak about the size of an internal state vector (either hidden, input or forget), so in the end it is just a mathematical thing. In my opinion it should not be called parallel because everything is done in one place at the same time (simple matrix-vector multiplication, where both are dimensionally-extended due to the number of “LSTM units”). BTW: Very good tutorial

          • Jason Brownlee August 2, 2019 at 6:53 am #

            Thanks!

            Perhaps “nodes” would be more appropriate?

          • Jaroslaw Goslinski August 2, 2019 at 10:04 pm #

            Definitely!

          • Jason Brownlee August 3, 2019 at 8:04 am #

            Note that I took the “unit” nomenclature from the original LSTM paper, and use it all the time across this site and my books:
            https://www.bioinf.jku.at/publications/older/2604.pdf

  3. Amir October 31, 2018 at 8:13 pm #

    Thanks for such a comprehensive tutorial
    I try it and it worked as expressed in the tutorial.

    Now, I’m going to try it with my data that comes from a single axis accelerometer. It means I have only one feature so I don’t need a 3D array but a 2D.
    You mentioned “The model (RNN LSTM) requires a three-dimensional input with [samples, time steps, features]. This is exactly how we have loaded the data.”
    Then, it means that it won’t work with a 2D array? or I can consider a 3D array but the 3rd dimension has only one member?

    and my second question is:
    I need the model for a real-time classification, so I need to train once and then save the model and use it in my web application.
    how can I save the model after training and use it?

    • Jason Brownlee November 1, 2018 at 6:05 am #

      Yes, even if the 3rd dimension has 1 parameter, it is still a 3D array.

      You can call model.save() to save the model.

  4. Daniel Aguilera Garcia November 21, 2018 at 10:24 am #

    Hello Jason.

    In my case I have 2 time series from EGG and I have to design a model that classifies them into two types of signal. I don’t understand exactly how I should reshape the data.

    The freq is 256 values per second so I could divide it into windows like you did before. The problem is that I don’t know how to fill the 3rd dimension of characteristics. From each window I have 7 characteristics, not from each moment (max, min, std, fft bandwidths, fft centroids, Arima 1, Arima 2)

    Please, how could I do what you mean [samples, time steps, features] in my case??

    • Jason Brownlee November 21, 2018 at 2:10 pm #

      Perhaps those 7 characteristics are 7 features.

      If you have 256 samples per second, you can choose how many samples/seconds to use as time steps and perhaps change the resolution via averaging or removing samples.

      • Daniel Aguilera Garcia November 21, 2018 at 9:16 pm #

        Let’s see im going to make an easy example:

        channel1 (values in one second)=2,5,6,8,54,2,8,4,7,8,…,5,7,8 (in total 256 values per second)

        channel2 (values in one second)=2,5,6,8,54,2,8,4,7,8,…,5,7,8 (in total 256 values per second)

        7 diferent features

        [samples,timesteps,features]=[2, 256, 7]?

        and another questions, for example the mean feature:

        chanel 1:

        feat_mean[0]=2
        feat_mean[1]=(2+5)/2=3.5
        feat_mean[2]=(2+5+6)/3=4.33
        etc…

        is it correct? what I understood is that I have to substract features for each each moment?

  5. coolyj November 22, 2018 at 5:21 pm #

    where does the “561 element vector of features” apply to?

    • Jason Brownlee November 23, 2018 at 7:44 am #

      That vector was the pre-processed data, prepared by the authors of the study that we do not use in this tutorial.

  6. Daniel Aguilera Garcia November 26, 2018 at 8:29 pm #

    hello jason!

    should we normalize each feature??

    • Jason Brownlee November 27, 2018 at 6:33 am #

      Perhaps try it and evaluate how it impacts model performance.

      • Asif Nawaz May 21, 2019 at 3:09 am #

        Does a Batch Normalization layer serve the same purpose?

        • Jason Brownlee May 21, 2019 at 6:39 am #

          Not really, it is used to normalize a batch of activations between layers.

  7. Leon December 3, 2018 at 7:05 am #

    Hi Jason, thanks for the great article.
    I notice in your load data section, you probably mean 2.56 seconds instead of 2.65, since 128(time step) * 1/50(record sampling rate) = 2.56.

  8. Daniel Aguilera Garcia December 12, 2018 at 11:06 pm #

    Hello Jason,

    Why in Conv LSTM kernel_size=(1,3)? I don’t understand

    • Jason Brownlee December 13, 2018 at 7:53 am #

      For 1 row and 3 columns.

      • Daniel Aguilera Garcia December 14, 2018 at 6:53 am #

        what does this represent in this example?

        • Jason Brownlee December 14, 2018 at 2:34 pm #

          We split each sequence into sub-sequences. Perhaps re-read the section titled “Develop a ConvLSTM Network Model” and note the configuration that chooses these parameters.

      • Asif Nawaz May 9, 2019 at 3:47 am #

        Why we used 1 row? and why we used convLSTM2D? Can’t we model this problem like Conv1D?

        • Jason Brownlee May 9, 2019 at 6:47 am #

          A convlstm2d is not required, the tutorial demonstrates how to create different types of models for time series classification.

  9. beomyoung January 3, 2019 at 5:37 pm #

    In dataset file, there aren’t label file (about y) Can i earn that files??

    • Jason Brownlee January 4, 2019 at 6:27 am #

      They are in a separate file with “y” in the filename.

  10. beomyoung January 7, 2019 at 5:02 pm #

    i found it thank you!!

    • Jason Brownlee January 8, 2019 at 6:45 am #

      Glad to hear that.

      • beomyoung January 8, 2019 at 1:31 pm #

        thank you for answering me. I have one more question! you have 30 subjects in this experiment. so when you handle data, for example, in ‘body_acc_x_train’ all of the 30 subjects data is just merged?

        • Jason Brownlee January 9, 2019 at 8:37 am #

          Yes.

          Also note, I did not run the experiment, I just analyzed the freely available data.

          • beomyoung January 9, 2019 at 1:05 pm #

            Oh thanks. Nowadays, i’m doing task about rare activity detection using sensor data, RNN&CRNN.
            For example, i wanna detect scratch activity and non-scratch activity. But in my experiment, ratio of scratch and non-scratch window is not balanced (scratch is so rare..) then how to put my input ? Can you give me some advices?

          • Jason Brownlee January 10, 2019 at 7:43 am #

            Perhaps you can oversample the rare events? E.g. train more on the rare cases?

  11. Jemshit January 17, 2019 at 7:41 pm #

    Hi Jason, i have question regarding to feeding feature data to LSTM. I have not used CNN, but if i use regular Auto encoder (sandwich like structure) instead of CNN for feature extraction, and if i define the time step of LSTM to be lets say 128,
    1) should i extract feature from each time step and concatenate them to form a window before feeding to LSTM or
    2) should i extract feature from window itself and feed the vector to LSTM

    Thanks

    • Jason Brownlee January 18, 2019 at 5:33 am #

      The CNN must extract features from a sequence of obs (multiple time steps), not from a single observation (time step).

      • Jemshit January 18, 2019 at 6:18 am #

        But LSTM will interpret features of each time step, not of a whole window right?

  12. Kiran Krishna February 11, 2019 at 11:28 pm #

    Hi Jason,

    Thank you for the great material. My question is on data preprocessing. I have a sequence of pressure data every 5 seconds with timestamp. How can I convert the 2D dataframe (Sample, feature) into 3D (Sample, Timestep, feature)?

  13. P.Y Lee February 12, 2019 at 1:57 pm #

    Hi Jason,

    In CNN LSTM model part, why we need to split the 128 time steps into 4 subsequences of 32 time steps?
    Can’t we do this model with 128 time step directly?

    Thank you

    • Jason Brownlee February 12, 2019 at 2:00 pm #

      No, as this specific model expects sequences of sequences as input.

  14. imGaboy February 19, 2019 at 11:48 pm #

    Hi there,

    I have a question about your first model.

    You set the batch_size to 64.
    When I run your model (with verbose=1), I get this:
    Epoch 1/15
    7352/7352 [==============================] – 12s 2ms/step – loss: 1.2669 – acc: 0.4528

    Does that mean 7352 * 64?

    I ask because I want to rewrite your example to use fit_generator, and I didn’t get the same results.

    Here is my code:

    ………………

    • Jason Brownlee February 20, 2019 at 8:08 am #

      I don’t recall sorry, perhaps check the data file to confirm the number of instances?
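
For reference, a sketch of the arithmetic, assuming (as I recall for the Keras releases of that period) that the progress bar of fit() on arrays counts samples, while fit_generator counts batches. Under that assumption 7352 is the number of training windows, not 7352 * 64, and a generator would need the following number of steps per epoch:

```python
import math

n_samples = 7352   # training windows reported in the log above
batch_size = 64

# Number of weight updates (batches) needed to see every sample once
steps_per_epoch = math.ceil(n_samples / batch_size)

# The final batch is smaller because 7352 is not a multiple of 64
last_batch = n_samples - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch)  # 115 56
```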

  15. cora March 22, 2019 at 5:30 am #

    Is anyone else unable to load the data? Help. I think I followed the guide: I unzipped and renamed the dataset in the working directory.

  16. Adim April 11, 2019 at 11:35 pm #

    Hi Jason,

    Thank you for this tutorial. I am a little confused about loading the dataset. Why do we use read_csv when there are no CSV files in the dataset? Sorry for this question, I am new to this subject. Also, I applied the code for load_group, load_dataset_group and load_dataset. Can you tell me if anything needs to be added?

    • Jason Brownlee April 12, 2019 at 7:47 am #

      In this tutorial we are loading datasets from CSV files, multiple files.

  17. Manisha April 12, 2019 at 4:40 pm #

    Sir, LSTM(100) means 100 LSTM cells, with each cell having forget, input and output gates, and each LSTM cell also sends its output to the other LSTM cells, and finally every cell will give a 100-D vector as output. Am I right?

    • Jason Brownlee April 13, 2019 at 6:23 am #

      100 cells or units means that each unit gets the output and creates an activation for the next layer.

      The units in one layer do not communicate with each other.

      • Manisha April 13, 2019 at 6:48 am #

        Sorry Sir I didn’t get this.

        1: Do these 100 LSTM cells communicate with each other?

        2: If, say, we have 7352 samples with 128 time steps and 9 features and my batch size is 64, can I say that at time=1 we input the first time step of 64 samples to all 100 LSTM cells, then at time=2 we input the second time step of the 64 samples, and so on until we input the 128th time step of the 64 samples at time=128, and then do BPTT, with each LSTM cell preserving its state from time=1 to time=128?

        • Jason Brownlee April 13, 2019 at 6:53 am #

          No, cells in one layer do not communicate with each other.

          No, each sample is processed one at a time, at time step 1, all 100 cells would get the first time step of data with all 9 features.

          BPTT refers to the end of the batch when model weights are updated. One way to think about it is to unroll each cell back through time into a deep network, more here:
          https://machinelearningmastery.com/rnn-unrolling/

          • Manisha April 13, 2019 at 7:17 am #

            “No, each sample is processed one at a time, at time step 1, all 100 cells would get the first time step of data with all 9 features”

            Sir, batch size means how many samples to show to the network before the weights are updated.

            If I have 64 as the batch size, is it that at time step 1, all 100 cells get the first time step of each of the 64 data points with all 9 features, then at time step 2, all 100 cells get the second time step of each of the 64 data points with all 9 features, and so on?

            Or is it that at time step 1, all cells get the first time step of one data point with all 9 features, then at time step 2 all cells get the second time step of that data point, and when all 128 time steps of that point have been fed to the network we compute the loss, do the same for the remaining 63 points, and then update the weights?

            I am confused about how batch size works here. What am I visualizing wrong?

          • Jason Brownlee April 13, 2019 at 1:42 pm #

            If the batch size is 64, then 64 samples are shown to the network before weights are updated and state is reset.

            In a batch, samples are processed one at a time, e.g. all time steps of sample 1, then all time steps of sample 2, etc.

            I strongly recommend reading this:
            https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input

  18. Manisha April 12, 2019 at 6:31 pm #

    Also, here we are using dropout after the LSTM with 100 units. Does that mean only 50 values will be passed to the dense layer with 100 nodes, sir?
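
On the dropout question, a minimal NumPy sketch of how inverted dropout with a rate of 0.5 behaves, which is my understanding of what the Keras Dropout layer does. The function `dropout` below is a hypothetical stand-in, not the Keras implementation. During training each of the 100 values is zeroed independently with probability 0.5, so roughly (not exactly) 50 survive and are scaled up. At prediction time all 100 values pass through unchanged.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each value with probability `rate` during training."""
    if not training:
        return x  # no values are dropped when making predictions
    mask = rng.random(x.shape) >= rate
    # survivors are scaled so the expected sum stays the same
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(1)
activations = np.ones(100)  # stand-in for the 100 LSTM outputs

train_out = dropout(activations, 0.5, training=True, rng=rng)
test_out = dropout(activations, 0.5, training=False, rng=rng)

print(int((train_out != 0).sum()))  # close to 50, varies with the random mask
print(int((test_out != 0).sum()))   # 100
```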

  19. Avani Pitre April 13, 2019 at 12:19 am #

    Hello,
    I am a beginner in this area. Thanks for the wonderful tutorial.
    I wanted to work through your LSTM and CNN-RNN examples. I have downloaded the HARDataset, but
    I have a simple question: how do I provide the input CSV file at the beginning?
    Do I have to generate that file? If so, how do I do that?

    from pandas import read_csv

    # load a single whitespace-delimited file as a numpy array
    def load_file(filepath):
        dataframe = read_csv(filepath, header=None, delim_whitespace=True)
        return dataframe.values

    Please help me
    thanks in advance

    • Jason Brownlee April 13, 2019 at 6:33 am #

      You can follow the tutorial to learn how to load the CSV file.

  20. Manisha April 13, 2019 at 8:12 am #

    Sir, I printed the model summary for the CNN-LSTM. The output for the TimeDistributed layers came out with 4 dimensions, like (None, None, 30, 64). Is it 4D because we have partitioned the windows into 4 subwindows of size 32?

    What do None, None represent here? I understand that 30, 64 is the output after the 1st convolution.

    • Jason Brownlee April 13, 2019 at 1:43 pm #

      The “None” sized dimension means that the provided dataset will define the other dimensions.

  21. Manisha April 13, 2019 at 9:02 am #

    Sir, in https://machinelearningmastery.com/cnn-long-short-term-memory-networks/ you mentioned that:

    “In both of these cases, conceptually there is a single CNN model and a sequence of LSTM models, one for each time step. We want to apply the CNN model to each input image and pass on the output of each input image to the LSTM as a single time step.”

    So here we are dividing the 128 time steps into 4 sub-blocks. We give each block to the CNN at once and our CNN model gives the output features.

    This means we have converted 128 time steps into 4 time steps, which we now feed to our LSTM model.

    So earlier we were feeding 128 time steps to the LSTM (in the simple LSTM) and now we feed 4. Am I right?

    • Jason Brownlee April 13, 2019 at 1:47 pm #

      Yes, but the “4 time steps” are distilled from many more real time steps via the CNN model.

      • Manisha April 13, 2019 at 5:50 pm #

        And sir, will the CNN process each of the 32-step windows in parallel, or will it first process the first 32-step window and feed it to the LSTM, then the next 32, and so on?

        • Jason Brownlee April 14, 2019 at 5:45 am #

          Each window of data is processed by the n filters in the CNN layer in parallel – if that is what you mean.

          • Manisha April 14, 2019 at 8:35 am #

            Yes sir, got it. Thanks a lot.

  22. Manisha April 14, 2019 at 9:07 am #

    Sir, if all four 32-step windows from one 128-step window are processed by the CNN in parallel, then what do we input into our LSTM at time step 1?

    In a normal LSTM with 128 time steps, we input the 1st time step with 9 features, then the 2nd time step, and so on.

    Here, since we have processed 4 time steps in parallel, what do we input to the LSTM?

    • Jason Brownlee April 15, 2019 at 7:49 am #

      The LSTM takes the CNN activations (feature maps) as input. You can see the output shapes of each layer via the model.summary() output.

      • Manisha April 15, 2019 at 12:20 pm #

        Sir, I did that but I am getting a little confused. This is the summary:

        None,None,30,64….1st convolution
        None,None,28,64….2nd convolution
        None,None,28,64…dropout
        None,None,896…maxpool
        None,100…lstm 100
        None,100…dropout
        None,100…dense 100
        None,6…softmax

        Sir, my doubt is: we input a 32-step window from the original 128-step window to the LSTM, so is the LSTM here predicting the activity for each 32-step window, treating it as one time sequence?
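
The shapes in that summary can be reproduced with a little arithmetic. The sketch below assumes a kernel size of 3 with the default "valid" padding and stride of 1 for both Conv1D layers, and a max pooling size of 2 followed by a flatten step (the pooling size is an assumption here; it is not shown in this thread). Under those assumptions, the row labelled "maxpool" with 896 values matches the flattened output.

```python
def conv1d_valid_length(length, kernel_size):
    """Output length of a stride-1 convolution with no padding."""
    return length - kernel_size + 1

n_steps, n_length, filters = 4, 32, 64

after_conv1 = conv1d_valid_length(n_length, 3)     # 30
after_conv2 = conv1d_valid_length(after_conv1, 3)  # 28
after_pool = after_conv2 // 2                      # 14, assuming pool size 2
flat = after_pool * filters                        # 896 features per subsequence

print(after_conv1, after_conv2, after_pool, flat)  # 30 28 14 896
print((n_steps, flat))  # what the LSTM reads per window: 4 steps of 896 features
```

Read this way, the LSTM sees each 128-step window as a sequence of 4 feature vectors and emits a single 100-value output for it, so there is one activity prediction per 128-step window rather than one per 32-step subsequence.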

  23. Sàçha April 30, 2019 at 9:40 pm #

    Do you have any ideas about multi-class classification with the ECOC algorithm?

    Can we use it for unsupervised classification (clustering)?

  24. Asif Nawaz May 16, 2019 at 11:15 pm #

    Does it make sense to use dropout and max pooling after a ConvLSTM layer, like we did in the CNN?

    • Jason Brownlee May 17, 2019 at 5:54 am #

      Hmmm, maybe.

      Always test, and use a configuration that gives best performance.

  25. Alicia May 21, 2019 at 11:56 pm #

    Hi Jason,

    thank you for this tutorial.
    Why is it necessary to perform signal windowing before training the neural network?
    Can we consider the full signal instead?

    • Jason Brownlee May 22, 2019 at 8:09 am #

      We must transform the series into a supervised learning problem:
      https://machinelearningmastery.com/time-series-forecasting-supervised-learning/

      • Alicia May 22, 2019 at 7:12 pm #

        Let’s assume that the original signal is an acceleration trace of a person walking and that the aim is to establish whether the person fell on the floor.
        Let’s say the original signal is composed of 1000 samples belonging to class1. The signal is processed and divided into fixed windows of N data points: now I have sub-signals, each one labelled with class1. Is it correct to consider the different windows even if the peak of the fall is present in only one of them?

        • Jason Brownlee May 23, 2019 at 5:58 am #

          Perhaps. There are many ways to frame a given prediction problem, perhaps experiment with a few approaches and see what works best for your specific dataset?

  26. Khan May 26, 2019 at 3:09 am #

    Is there any role for LSTM units in ConvLSTM? The parameters of the ConvLSTM layer are similar to those of a CNN, but nothing related to LSTM can be seen. In the LSTM model, 100 LSTM units are used. How should we view ConvLSTM in this context?

    • Jason Brownlee May 26, 2019 at 6:48 am #

      Not sure I follow what you’re asking, can you elaborate?

      • Khan May 26, 2019 at 6:22 pm #

        The following are the input layers used in the three different models.

        model.add(LSTM(100, input_shape=(n_timesteps,n_features)))
        model.add(TimeDistributed(Conv1D(filters=64, kernel_size=3, activation='relu'), input_shape=(None,n_length,n_features)))
        model.add(ConvLSTM2D(filters=64, kernel_size=(1,3), activation='relu', input_shape=(n_steps, 1, n_length, n_features)))

        The ConvLSTM2D layer, I believe, should be a mixture of CNN and LSTM. All parameters of the ConvLSTM layer are CNN parameters, like the number of filters, filter size, activation function, etc. The standalone LSTM uses 100 LSTM cells. My question was: how many LSTM cells are used by the ConvLSTM model? I believe ConvLSTM operates with only one LSTM cell?

        • Jason Brownlee May 27, 2019 at 6:48 am #

          I believe each filter is a cell – the idea behind these two concepts is merged.

  27. Riddy June 4, 2019 at 1:27 am #

    I am generating sine and cosine curves for classification. However, I am not sure whether I need to pre-process the data or just load it into the model. My understanding is further confounded by the following statement in your post: “Further, each series of data has been partitioned into overlapping windows of 2.56 seconds of data, or 128 time step”

  28. Amy June 7, 2019 at 11:12 am #

    Jason, thanks a lot for this wonderful tutorial.

    I want to use your approach for my problem. But my data set is a little different than what you used here.
    I have 75 time series. Each of them shows two classes. From time 0 until time t (which is different for each time series) the data is class 1, and from time t until the end of the time series it is class 2. For a test time series, I want to predict the time at which the class changed from 1 to 2, or whether the class is 1 or 2 at each time. Can you help me with how I can use your approach for my problem?

  29. Amy June 8, 2019 at 10:32 pm #

    I mean each time series shows two classes, healthy and unhealthy, for one system. From time 0 until time t, it shows the healthy state, and from time t until the failure of the system, it shows the unhealthy state. We have 75 time series like that, with different lengths for both classes. Now we want to determine, for a test system, the time at which it switches from the healthy to the unhealthy state.

    Thanks

    • Jason Brownlee June 9, 2019 at 6:21 am #

      Perhaps you can predict the class value for each input time step, if you have the data?

      • Amy June 9, 2019 at 10:42 am #

        Could you please explain more? Do you think I can use your approach?

        • Jason Brownlee June 10, 2019 at 7:35 am #

          I was suggesting perhaps try modeling the problem as a one-to-one mapping so each input time step has a classification.

          More on sequence prediction here:
          https://machinelearningmastery.com/models-sequence-prediction-recurrent-neural-networks/
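
A small sketch of what the one-label-per-time-step framing could look like for the healthy/unhealthy case described above. The helper names and the toy series length are hypothetical. Each series gets a label vector that is 0 before the change time and 1 from the change time onward, and the predicted change time is read off as the first time step classified as 1.

```python
import numpy as np

def step_labels(n_steps, change_at):
    """One label per time step: 0 (healthy) before change_at, 1 (unhealthy) after."""
    labels = np.zeros(n_steps, dtype=int)
    labels[change_at:] = 1
    return labels

def first_change(pred_labels):
    """Index of the first time step predicted as class 1, or None if there is none."""
    hits = np.flatnonzero(np.asarray(pred_labels) == 1)
    return int(hits[0]) if hits.size else None

# Hypothetical series of 10 time steps that switches state at step 6
y = step_labels(10, change_at=6)
print(y.tolist())       # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
print(first_change(y))  # 6
```

A model trained on such targets would output one class per input time step, and `first_change` would be applied to its predictions for a new series.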

          • Christian Post September 7, 2019 at 12:19 am #

            I applied your tutorial to data similar to Amy’s (I guessed) where I tried to predict disease events, and for training and validation I used the n days before a disease event, and as comparison n days from an individual without disease events, and each window classified as either 1 (sick) or 0 (healthy).
            The model was performing okay with an AUC of >0.85, but I’m not sure how I would apply this in practice because the time windows for the validation data were designed with a priori knowledge.
            In practice one would have to construct a new input vector every time step, and I don’t think the classification of those vectors would be as good. But I did not try that out yet.

            What I didn’t understand is how I would apply the one-to-one mapping from your article here. You state that the one-to-one approach isn’t appropriate for RNNs since it doesn’t capture the dependencies between time points.

            @Amy you could investigate on heart rate classification with neural networks, I think that is a somewhat similar problem.

          • Jason Brownlee September 7, 2019 at 5:35 am #

            Not sure I follow the question completely.

            Generally, you must frame the problem around the way the model is intended to be used, then evaluate under those constraints. If you build/evaluate a model based on a framing that you cannot use in practice, then the evaluation is next to useless.

            Using training info or domain knowledge in the framing is fine, as long as you expect it to generalize. Again, this too can be challenged with experiments.

            If I’ve missed the point, perhaps you can elaborate here, or email me?
            https://machinelearningmastery.com/contact/

      • Amy June 9, 2019 at 10:43 am #

        Could you please explain more?

  30. vinodh June 21, 2019 at 2:55 pm #

    Hello sir,
    Great tutorial. How can I normalize or standardize this data?
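
One possible approach, sketched under the assumption that the data is a NumPy array of shape [samples, time steps, features]: compute the mean and standard deviation of each feature on the training set only, then apply the same values to both train and test. The function names and the random toy data are hypothetical.

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-feature mean and standard deviation over all samples and time steps."""
    flat = X_train.reshape(-1, X_train.shape[-1])
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero for constant features
    return mean, std

def standardize(X, mean, std):
    # broadcasting applies the per-feature statistics to every sample and time step
    return (X - mean) / std

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(50, 128, 9))  # toy data
X_test = rng.normal(loc=5.0, scale=3.0, size=(20, 128, 9))

mean, std = fit_standardizer(X_train)
X_train_s = standardize(X_train, mean, std)
X_test_s = standardize(X_test, mean, std)
print(X_train_s.shape, X_test_s.shape)  # (50, 128, 9) (20, 128, 9)
```

Because the windows in this dataset overlap, some observations are counted more than once in the statistics; that is usually acceptable, but it is worth keeping in mind.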

  31. Simon June 22, 2019 at 9:05 pm #

    Dear Mr. Brownlee,

    I am a student from Germany and first of all: thank you for your great blog! It is so much better than all the lectures I have attended so far!

    I had a question regarding the 3D array and I hope you could help me. Let’s assume we have the following case, which is similar to the one from your example:

    We measure the velocity in 3 dimensions (x,y,z direction) with a frequency of 50 Hz over one minute. We measure the velocity of 5 people in total.

    – Would the 3D array have the following form: (5*60*50; 1 ; 3)?

    – What do you mean by time steps? I am referring to “[samples, time steps, features]”.

    – Is the form of the 3D array related to the batch size of our LSTM model?

    Thank you in advance. I would really appreciate your help as I am currently stuck…

    Best regards,
    Simon

  32. jai July 3, 2019 at 6:18 pm #

    What is the use of those 561 features?

  33. jai July 3, 2019 at 6:29 pm #

    If, in that 128*9 window, 64*9 represents standing and the other 64*9 represents walking, how do I label the 128*9 window?

    • Jason Brownlee July 4, 2019 at 7:43 am #

      You can model the problem as a sequence classification problem, and try different length sequence inputs to see what works best for your specific dataset.

  34. jonniej393 August 9, 2019 at 7:21 pm #

    Amazing!

    But how do I train the network with additional new data? I’m working on a project to detect suspicious activity from surveillance videos.

    I have no idea on how to prepare such dataset. Appreciate your help!

    • Jason Brownlee August 10, 2019 at 7:14 am #

      You can save the model, load it later and update/train it on the new data or a mix of old and new. Or throw away the model and fit it anew.

      Perhaps test different approaches and compare the results.

  35. nisha August 28, 2019 at 9:46 pm #

    Hi Jason,

    Do you have a link to the trained model? I would like to quickly check how well it works on my data.

    Also, what is the size of your model? I am looking for models that are small enough to deploy on edge devices.

    • Jason Brownlee August 29, 2019 at 6:07 am #

      Sorry, I don’t share trained models.

      Perhaps try fitting the model yourself, it only takes a few minutes.

  36. Tommy September 2, 2019 at 1:45 pm #

    what is the purpose of ‘prefix’ when loading the file?

    • Jason Brownlee September 2, 2019 at 1:52 pm #

      In case you have the data located elsewhere and need to specify that location.

  37. Shivamani Patil September 3, 2019 at 9:32 pm #

    Hi Jason,

    If I export this model to an android app should I do any preprocessing on input data from mobile sensors?

  38. Tommy September 14, 2019 at 5:11 pm #

    Hi Jason,
    In the first LSTM example, you mentioned that the evaluation is repeated multiple times because of the stochastic nature of the model. But how did you get the weights with the best performance?

  39. Pranav Gundewar September 18, 2019 at 7:59 am #

    Hi Jason, thanks for the great article. I am currently working on human activity recognition (Kinetics-600) and I wanted to connect LSTM with 3D ResNet head for action prediction. Can you please tell how can I use LSTM on 1024 vectors obtained from the last layer and feed it to RNN-LSTM for action prediction?

    Thank you.

  40. Tanveer October 7, 2019 at 10:08 pm #

    model = Sequential()
    model.add(LSTM(100, input_shape=(n_timesteps,n_features)))
    model.add(Dropout(0.5))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(n_outputs, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    With this code I am getting the following traceback:

    Traceback (most recent call last):
    File "C:\Users\Tanveer\AppData\Local\Programs\Python\Python37-32\HARUP.py", line 54, in
    model = Sequential()
    NameError: name 'Sequential' is not defined
    How can I resolve this issue? Kindly guide me.

    • Jason Brownlee October 8, 2019 at 8:02 am #

      You may have skipped some lines from the full example.

  41. AGGELOS PAPOUTSIS October 18, 2019 at 11:44 pm #

    hi Jason. I am running the code from here https://medium.com/@curiousily/human-activity-recognition-using-lstms-on-android-tensorflow-for-hackers-part-vi-492da5adef64 on anaconda and notebook and pycharm with TensorFlow at 1.4 version (i had to downgrade due to the fact that the author is using placeholders which are incompatible with TensorFlow 2) and python at 3.6. The problem here is that I always get train loss: nan

    can you please suggest some ideas here because I cannot find anything when I search for the problem (there are some articles but are not helpful)

    • Jason Brownlee October 19, 2019 at 6:40 am #

      Sorry, i am not familiar with that tutorial, perhaps contact the authors?

  42. Dhiren November 12, 2019 at 6:50 pm #

    Hello Jason,
    Amazing tutorial. I am doing something very similar to this. My problem is: “If I am given a set of 128 time steps with 9 features data, i.e. an ndarray of the shape (128,9), how can I use the model.predict() method to make a prediction for the 128 time steps data?” Currently when I do model.predict( ndarray of shape (128, 9)), I get the error that “expected lstm_1_input to have 3 dimensions, but got array with shape (128, 9)”. From my understanding, I will be provided a time steps data with its feature values, and I have to predict the class for it. How can this data be 3d since I have to predict only one sample?
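
The error in that question comes from the missing samples dimension: the model expects [samples, time steps, features], so a single window needs a leading axis of size 1. A minimal sketch with NumPy follows; the `yhat` array is a made-up stand-in for what a call such as model.predict(batch) might return for a 6-class model, since no trained model is available here.

```python
import numpy as np

# A single window: 128 time steps, 9 features
window = np.zeros((128, 9))

# Add the samples axis so the shape becomes (1, 128, 9)
batch = window[np.newaxis, ...]
print(batch.shape)  # (1, 128, 9)

# Hypothetical output of model.predict(batch): one row of 6 class probabilities
yhat = np.array([[0.05, 0.10, 0.70, 0.05, 0.05, 0.05]])
predicted_class = int(np.argmax(yhat, axis=1)[0])
print(predicted_class)  # 2
```

Calling window.reshape((1, 128, 9)) gives the same result as adding the axis with np.newaxis.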
    Thank you

  43. Alex November 17, 2019 at 11:58 pm #

    hey Jason,

    Thanks for the useful tutorial 🙂

    I don’t really get how the data is shaped into windows of size 128. I know here the data has already been shaped but I am asking if you have a tutorial showing how this shaping is done for a classification problem.

    Thanks,

    • Jason Brownlee November 18, 2019 at 6:46 am #

      This shaping cannot be performed for a classification problem on tabular data, it only makes sense for a sequence prediction problem.

      • Alex November 21, 2019 at 6:18 am #

        So how can I shape my data for a classification problem? Is there a tutorial showing a similar problem?

  44. john November 22, 2019 at 5:13 am #

    Hi,

    Do you think your CNN+LSTM example would achieve good results for pose estimation? What changes would be necessary to accomplish this? Thanks in advance.

  45. Marvi Waheed December 3, 2019 at 8:36 pm #

    Hello,
    My dataset has dimensions (170,200,9) and I want to assign one class label to each sample/window of 200 time steps, not individual labels to each of the 200 time steps in a window.
    How can I do this, so that my target class output has dimensions (170,1)?

    • Jason Brownlee December 4, 2019 at 5:35 am #

      Yes, this is called time series classification.

      The above tutorial does this.
