Sequence prediction is different from other types of supervised learning problems.

The sequence imposes an order on the observations that must be preserved when training models and making predictions.

Generally, prediction problems that involve sequence data are referred to as sequence prediction problems, although there is a suite of problems that differ based on the input and output sequences.

In this tutorial, you will discover the different types of sequence prediction problems.

After completing this tutorial, you will know:

- The 4 types of sequence prediction problems.
- Definitions for each type of sequence prediction problem by the experts.
- Real-world examples of each type of sequence prediction problem.

Let’s get started.

## Tutorial Overview

This tutorial is divided into 5 parts; they are:

- Sequence
- Sequence Prediction
- Sequence Classification
- Sequence Generation
- Sequence-to-Sequence Prediction

## Sequence

Often we deal with sets in applied machine learning, such as train or test sets of samples.

Each sample in the set can be thought of as an observation from the domain.

In a set, the order of the observations is not important.

A sequence is different. The sequence imposes an explicit order on the observations.

The order is important. It must be respected in the formulation of prediction problems that use the sequence data as input or output for the model.

## Sequence Prediction

Sequence prediction involves predicting the next value for a given input sequence.

For example:

- Given: 1, 2, 3, 4, 5
- Predict: 6
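To make this framing concrete, here is a minimal sketch of how one sequence could be split into (input window, next value) pairs for training. The function name `make_next_value_pairs` and the window length `n_steps` are illustrative assumptions, not part of any library.

```python
def make_next_value_pairs(sequence, n_steps):
    """Slide a window over the sequence and pair each window with the value that follows it."""
    pairs = []
    for start in range(len(sequence) - n_steps):
        window = sequence[start:start + n_steps]
        next_value = sequence[start + n_steps]
        pairs.append((window, next_value))
    return pairs


sequence = [1, 2, 3, 4, 5, 6]
for window, target in make_next_value_pairs(sequence, n_steps=3):
    print(window, "->", target)
# [1, 2, 3] -> 4
# [2, 3, 4] -> 5
# [3, 4, 5] -> 6
```

The order of the values inside each window is preserved, which is the property that separates this framing from treating the observations as an unordered set.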

Sequence prediction attempts to predict elements of a sequence on the basis of the preceding elements

— Sequence Learning: From Recognition and Prediction to Sequential Decision Making, 2001.

A prediction model is trained with a set of training sequences. Once trained, the model is used to perform sequence predictions. A prediction consists in predicting the next items of a sequence. This task has numerous applications such as web page prefetching, consumer product recommendation, weather forecasting and stock market prediction.

— CPT+: Decreasing the time/space complexity of the Compact Prediction Tree, 2015

Sequence prediction may also generally be referred to as “*sequence learning*”.

Learning of sequential data continues to be a fundamental task and a challenge in pattern recognition and machine learning. Applications involving sequential data may require prediction of new events, generation of new sequences, or decision making such as classification of sequences or sub-sequences.

— On Prediction Using Variable Order Markov Models, 2004.

Technically, we could refer to all of the following problems in this post as a type of sequence prediction problem. This can make things confusing for beginners.

Some examples of sequence prediction problems include:

- **Weather Forecasting**. Given a sequence of observations about the weather over time, predict the expected weather tomorrow.
- **Stock Market Prediction**. Given a sequence of movements of a security over time, predict the next movement of the security.
- **Product Recommendation**. Given a sequence of past purchases of a customer, predict the next purchase of a customer.

## Sequence Classification

Sequence classification involves predicting a class label for a given input sequence.

For example:

- Given: 1, 2, 3, 4, 5
- Predict: “good” or “bad”

The objective of sequence classification is to build a classification model using a labeled dataset D so that the model can be used to predict the class label of an unseen sequence.

— Chapter 14, Data Classification: Algorithms and Applications, 2015

The input sequence may be comprised of real values or discrete values. In the latter case, such problems may be referred to as discrete sequence classification.
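As a rough illustration of the shape of the problem, the sketch below classifies a discrete sequence with a 1-nearest-neighbour rule over a labeled dataset `D`. The sequences, labels, and the choice of Hamming distance are all made up for the example; a real problem would use real data and most likely a stronger model.

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_label(labeled_dataset, sequence):
    """Return the label of the training sequence closest to the input sequence."""
    _, nearest_label = min(
        labeled_dataset, key=lambda item: hamming_distance(item[0], sequence)
    )
    return nearest_label


# Hypothetical labeled dataset D of (sequence, label) pairs.
D = [
    ("ACGTAC", "coding"),
    ("ACGTAA", "coding"),
    ("TTTTGG", "non-coding"),
    ("TTTAGG", "non-coding"),
]

print(predict_label(D, "ACGTTC"))  # coding
print(predict_label(D, "TTTTGA"))  # non-coding
```

The whole input sequence maps to a single label, in contrast to sequence prediction where the output is the next value of the sequence itself.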

Some examples of sequence classification problems include:

- **DNA Sequence Classification**. Given a DNA sequence of ACGT values, predict whether the sequence is a coding or non-coding region.
- **Anomaly Detection**. Given a sequence of observations, predict whether the sequence is anomalous or not.
- **Sentiment Analysis**. Given a sequence of text such as a review or a tweet, predict whether the sentiment of the text is positive or negative.

## Sequence Generation

Sequence generation involves generating a new output sequence that has the same general characteristics as other sequences in the corpus.

For example:

- Given: [1, 3, 5], [7, 9, 11]
- Predict: [3, 5, 7]

[recurrent neural networks] can be trained for sequence generation by processing real data sequences one step at a time and predicting what comes next. Assuming the predictions are probabilistic, novel sequences can be generated from a trained network by iteratively sampling from the network’s output distribution, then feeding in the sample as input at the next step. In other words by making the network treat its inventions as if they were real, much like a person dreaming

— Generating Sequences With Recurrent Neural Networks, 2013.
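The sample-then-feed-back loop described in the quote can be sketched without a neural network. In the example below a first-order Markov chain stands in for the RNN, purely to keep the code short; the toy corpus and the helper names `fit_transitions` and `generate` are assumptions made for illustration.

```python
import random
from collections import defaultdict


def fit_transitions(corpus):
    """Record, for each item, every item observed to follow it in the corpus."""
    transitions = defaultdict(list)
    for sequence in corpus:
        for current, following in zip(sequence, sequence[1:]):
            transitions[current].append(following)
    return transitions


def generate(transitions, start, length, rng):
    """Sample a next item, then feed that sample back in as the next input."""
    output = [start]
    while len(output) < length:
        candidates = transitions.get(output[-1])
        if not candidates:  # no observed continuation, so stop early
            break
        output.append(rng.choice(candidates))
    return output


# Toy corpus; some items have more than one observed continuation.
corpus = [[1, 3, 5], [7, 9, 11], [1, 2, 3], [3, 5, 7], [5, 6, 7]]
transitions = fit_transitions(corpus)
print(generate(transitions, start=1, length=4, rng=random.Random(0)))
```

Each generated item becomes the input for the next step, so the model treats its own samples as if they were real data. The result depends on the random seed, but every step it takes was observed somewhere in the corpus.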

Some examples of sequence generation problems include:

- **Text Generation**. Given a corpus of text, such as the works of Shakespeare, generate new sentences or paragraphs of text that read like Shakespeare.
- **Handwriting Prediction**. Given a corpus of handwriting examples, generate handwriting for new phrases that has the properties of handwriting in the corpus.
- **Music Generation**. Given a corpus of examples of music, generate new musical pieces that have the properties of the corpus.

Sequence generation may also refer to the generation of a sequence given a single observation as input.

An example is the automatic textual description of images.

**Image Caption Generation**. Given an image as input, generate a sequence of words that describes the image.

Being able to automatically describe the content of an image using properly formed English sentences is a very challenging task, but it could have great impact, for instance by helping visually impaired people better understand the content of images on the web. […] Indeed, a description must capture not only the objects contained in an image, but it also must express how these objects relate to each other as well as their attributes and the activities they are involved in. Moreover, the above semantic knowledge has to be expressed in a natural language like English, which means that a language model is needed in addition to visual understanding.

— Show and Tell: A Neural Image Caption Generator, 2015

## Sequence-to-Sequence Prediction

Sequence-to-sequence prediction involves predicting an output sequence given an input sequence.

For example:

- Given: 1, 2, 3, 4, 5
- Predict: 6, 7, 8, 9, 10
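When the input and output both come from the same series, the framing can be sketched as splitting the series into (input sequence, output sequence) pairs. The helper `make_seq2seq_pairs` and its parameters `n_in` and `n_out` are hypothetical names chosen for this example.

```python
def make_seq2seq_pairs(series, n_in, n_out):
    """Pair each input window of n_in values with the n_out values that follow it."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        source = series[start:start + n_in]
        target = series[start + n_in:start + n_in + n_out]
        pairs.append((source, target))
    return pairs


series = list(range(1, 11))  # 1, 2, ..., 10

print(make_seq2seq_pairs(series, n_in=5, n_out=5))
# [([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])]

print(make_seq2seq_pairs(series, n_in=3, n_out=2)[0])
# ([1, 2, 3], [4, 5])
```

This only covers the single-series case. In problems such as translation, the input and output sequences come as separately collected pairs and need not have the same length.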

Despite their flexibility and power, [deep neural networks] can only be applied to problems whose inputs and targets can be sensibly encoded with vectors of fixed dimensionality. It is a significant limitation, since many important problems are best expressed with sequences whose lengths are not known a-priori. For example, speech recognition and machine translation are sequential problems. Likewise, question answering can also be seen as mapping a sequence of words representing the question to a sequence of words representing the answer.

— Sequence to Sequence Learning with Neural Networks, 2014

It is a subtle but challenging extension of sequence prediction where rather than predicting a single next value in the sequence, a new sequence is predicted that may or may not have the same length or be of the same time as the input sequence.

This type of problem has recently seen a lot of study in the area of automatic text translation (e.g. translating English to French) and may be referred to by the abbreviation seq2seq.

seq2seq learning, at its core, uses recurrent neural networks to map variable-length input sequences to variable-length output sequences. While relatively new, the seq2seq approach has achieved state-of-the-art results in not only its original application – machine translation.

— Multi-task Sequence to Sequence Learning, 2016.

If the input and output sequences are a time series, then the problem may be referred to as multi-step time series forecasting.

Some examples of sequence-to-sequence problems include:

- **Multi-Step Time Series Forecasting**. Given a time series of observations, predict a sequence of observations for a range of future time steps.
- **Text Summarization**. Given a document of text, predict a shorter sequence of text that describes the salient parts of the source document.
- **Program Execution**. Given the textual description of a program or mathematical equation, predict the sequence of characters that describes the correct output.

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

- Sequence on Wikipedia
- CPT+: Decreasing the time/space complexity of the Compact Prediction Tree, 2015
- On Prediction Using Variable Order Markov Models, 2004
- An Introduction to Sequence Prediction, 2016
- Sequence Learning: From Recognition and Prediction to Sequential Decision Making, 2001
- Chapter 14, Discrete Sequence Classification, Data Classification: Algorithms and Applications, 2015
- Generating Sequences With Recurrent Neural Networks, 2013
- Show and Tell: A Neural Image Caption Generator, 2015
- Multi-task Sequence to Sequence Learning, 2016
- Sequence to Sequence Learning with Neural Networks, 2014
- Recursive and direct multi-step forecasting: the best of both worlds, 2012

## Summary

In this tutorial, you discovered the different types of sequence prediction problems.

Specifically, you learned:

- The 4 types of sequence prediction problems.
- Definitions for each type of sequence prediction problem by the experts.
- Real-world examples of each type of sequence prediction problem.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

So I assume it’s fair to say that every time-series is an example of sequence prediction but not vice-versa? Thanks for the interesting post.

Correct, Mike.

Hi Jason,

I need your help with time series classification. I have measurements of different medical parameters for patients captured every hour. The output label is whether the patient has Acute Kidney Injury (AKI) or not. Based on the first 12 hours of data, we should find out whether the patient is at risk of suffering from AKI or not (after 12 hours). I guess this falls under the classification approach (Sequence Classification). However, I have only one label (AKI == 0). So should this be considered anomaly detection in time series or sequence classification? Since I have data for more than 100 patients over 12 hours (100 * 12 datapoints with multiple input variables), how do I retain the time factor? As there is only one class, how do I do the training? I am quite stuck as there is no proper example for a beginner like me to understand. Can you please share your insights, guide me on how to approach this problem, or direct me to the appropriate resource?

One idea, you could frame the problem as “does the event occur in this sequence or not”.

Then treat it as sequence classification, much like activity recognition:

https://machinelearningmastery.com/how-to-develop-rnn-models-for-human-activity-recognition-time-series-classification/

Hello, Jason. I would like to congratulate you on the excellent article. This was very helpful to me. Have you done or thought something to predict the next element of some binary sequence based on the frequency stability of the sequence?

Thanks.

Not directly, no.

Exactly after 1yr am reading your comments 😉

How are you relating and stating this? Can you shed some light on this: “every time-series is an example of sequence prediction but not vice-versa”?

A time series is a sequence of observations: 1, 2, 3, 4

Not all sequences are a time series. The ordering could be something other than time.

So, can we say that problems like 20-question game require sequence prediction to solve? and we can use recurrent neural network to implement?

The system asks questions and after each answer, we predict an answer which helps to determine the next question. Right?

Thanks, that was exactly what I need.

I expect Q&A is a sequence prediction problem.

I have not worked on an example so I cannot give you advice about whether RNNs are appropriate. I would recommend a search on google scholar.

Hi Jason.

Could LSTM do multi-step forecasts? I have two examples below:

1. input the [1,2,3] sequence to predict the [4,5,6,7,8,9,10,…15] sequence;

2. input the [1,2,3] sequence to predict the [10,11,12] sequence.

If LSTM can do, could you give a lesson on this kind of problems?

Thank you very much.

Yes, see here:

https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/

Hello Jason,

Thank you for this post, it is very useful and interesting.

I´m thinking about the following problem…, Given a single input sequence, we want to predict several sequences, that can be of different lengths. For instance, this problem can be encountered in the Alternative Splicing phenomenon, where given a single RNA sequence, we can obtain multiple proteins.

My questions are:

1- Have the problem “Input: One sequence -> Output: Several sequences” been studied in the literature?

2- Can LSTMs solve this type of problem?

Best and thanks

I have not seen this, but LSTMs could address it. Consider a multiple-output model:

https://machinelearningmastery.com/keras-functional-api-deep-learning/

Jason –

I enjoyed this post and I believe it may help me solve a predictive problem I’ve been pondering.

The data is primarily text based, time series data involving an ‘actor’ object that I receive information on. That information, other than the date/time information is also text. I know that given information sequence ‘A’ that the next informational sequence is most often ‘B’. However there may well be several other sequences that are also highly likely.

What I’m looking for is a learning method that can identify anomalous information reports so they can be reviewed and subsequently validated as either truly anomalous or potentially a new, yet valid, sequenced item.

Anything you might be able to point me towards would be greatly appreciated.

Thanks!

I would recommend investigating the field of time series anomaly detection. Perhaps start on google scholar?

Thanks Jason, I spent a considerable amount of time yesterday looking into what you suggested.

Just to clarify, the timestamps only serve to order the reports as they arrive, they have little significance beyond that.

Do any of your publications deal with pointing an unsupervised, or minimally supervised, method at this sort of data? As opposed to say numeric data?

I’ve done a considerable amount of ‘crunching’ of the data (it’s billions of rows) and have built a reference table of the likely ‘next event’ given the previous event. However that solution is not as robust, nor as flexible as I’d like it to be.

LSTM and GAN appear to show promise for what I’m trying to do yet most of the examples I’ve seen don’t seem to fit very well with the data I have to work with.

Again, I will appreciate any insight you could share.

Thanks!

Sorry, I don’t have material on semi-supervised learning at this stage, I hope to cover it in the future.

I would recommend testing a suite of methods as well as a suite of different framings of the problem to see what works best.

Thanks Jason!

You’re welcome.

Hello Jason,

Thank you for such an informative article.

But I am not able to fit a prediction problem I’ve been working on in any category you have mentioned.

I have data of a person who visits certain places in a sequence from a sample of places.

let’s say he wants to visit [‘NY’, ‘LA’, ‘DC’, ‘TX’, ‘FL’] then he’ll visit it in this sequence [‘TX’, ‘LA’, ‘NY’,’FL’, ‘DC’].

I have historical data of his previous visits in sequence.

[‘TX’, ‘LA’, ‘NY’,’FL’, ‘DC’]

[‘AK’, ‘FL’, ‘NY’] and so on.

so for a random list of places i need to predict in which sequence he is gonna visit those places.

I’ll really appreciate if you can point me toward something.

Thanks

Perhaps this post will help you describe your problem:

http://machinelearningmastery.com/how-to-define-your-machine-learning-problem/

Hi Jason,

My interest in ML is application part of it. I am from VLSI field.

The area of ML is very vast and I don’t know where to start with for my problem.

Below is a brief description of my problem.

The system I am testing basically generates events. Sequences of these events are of interest to me.

One can manually look at these event sequences and recognize them as useful. But the manual process is very cumbersome, and there could be millions of events within which one has to look for interesting events.

The interesting event sequences are known a priori. The spacing between these events can vary though.

Do you have any suggestion as to what I should be trying out to begin with?

I am not looking for solutions actually but only for guidance.

Yes, use LSTMs.

Take an example from the blog as a starting point and adapt it for your problem.

Start here:

https://machinelearningmastery.com/start-here/#lstm

Sure. Thanks, Jason. I will read through and get back if needed

Hello Jason,

Thank you for this tutorial which is very interesting, but I would like to find a sequential dataset that I can use in my research for the predictive maintenance algorithm.

My best advice is to contrive a problem for research purposes that has the properties you require.

Thank you Jason. Very interesting post!

Just a quick but also confusing question of mine. Let’s say I have [4, 5, 6] as input and I want to output [14, 15, 16] or [24, 25, 26], etc. Of course I have the training dataset which takes the input as [1, 2, 3] and the output as [11, 12, 13], [21, 22, 23], etc., which means I have a one-to-many (not the name of the model type here) relationship in my training set. I am wondering whether the RNN (or LSTM) can even recognize these relationships simultaneously. Another thing is, since we only need to find 1 to 11, 2 to 12… it seems that if I change the order of my training dataset, i.e. [2, 3, 1] as input, [12, 13, 11] as output, the model can still learn the corresponding pattern. So here it might violate the principle that ORDER IS IMPORTANT. I have read a lot of your valuable blogs and learned a lot, but still cannot find the answer. Any response is really appreciated!

Sure, my advice would be to try it and see how you go.

The model could output two length n vectors or two sequences of n=3 timesteps. Try both.

I would suggest exploring multiple output models with one “sub-model” for each output, see here:

https://machinelearningmastery.com/keras-functional-api-deep-learning/

I’m eager to hear how you go.

So I have this data set of images that represent grid-wise crime (frequency) on daily basis. So I have a series of images i1,i2,i3,… in, and I want to forecast or predict in+1th and beyond images(crime hotspots or frequency). How do you think I should approach this problem?

This post might help you define your problem:

http://machinelearningmastery.com/how-to-define-your-machine-learning-problem/

Hello Jason,

I’d have a question regarding Time series, forex, there is a pattern named double-bottom looks like the “W” letter, as input sequence this pattern can take any arbitrary length (in time), how should I deal with this problem? Can I transform this input sequence to a sequence of fixed length?

Thanks

What do you mean deal with it? Predict this pattern?

If so, develop a dataset of examples with/without the pattern and fit a model to classify them.

Hi Jason,

Sorry for not being explicit enough. I want to classify some time series, but the lengths of the time series patterns, which are the inputs here, are required to be known in advance. However, such information is not always available. In addition, patterns of different lengths may co-exist in a time series dataset (for example the forex “W” pattern might be 8 or 38 in length, we don’t know it in advance).

How to present such inputs to the machine if their length is not known in advance?

Thanks!

Interesting.

You can address the different lengths with zero padding and a masking input layer. I have many examples on the blog.

The scale invariance might require some experimentation. Perhaps an LSTM can do it. Perhaps a CNN is required or some other compressed interpretation of the sequence.
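As a minimal sketch of the zero-padding idea, the snippet below pads variable-length patterns to a common length and builds an explicit mask marking which time steps are real. The helper `pad_with_mask` and the example values are assumptions for illustration; in Keras, a `Masking` layer is the usual way to tell downstream layers to skip padded time steps.

```python
def pad_with_mask(sequences, pad_value=0.0):
    """Pad each sequence at the end to the longest length and return a matching 0/1 mask."""
    max_len = max(len(seq) for seq in sequences)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_value] * n_pad)
        mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real value, 0 = padding
    return padded, mask


# Two hypothetical patterns of different lengths.
patterns = [[1.2, 0.8, 1.1, 0.7, 1.3], [1.0, 0.5, 1.0]]
padded, mask = pad_with_mask(patterns)

print(padded)  # [[1.2, 0.8, 1.1, 0.7, 1.3], [1.0, 0.5, 1.0, 0.0, 0.0]]
print(mask)    # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

If zero is a legitimate value in the data, a different pad value (or the explicit mask) avoids confusing real observations with padding.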

Is it possible to sort the results of the prediction?

OR the NN will give the prediction results in descending order based on the prediction values?

I’m not sure I understand. If your model outputs a sequence, why would you need to sort it?

i am sorry ! I should have been more specific.

I meant, for normal cases where the output is not a sequence, can the NN give the prediction results in descending order based on the prediction values?

You can output a prediction probability for each class in a classification problem, then rank the probabilities.

Is that what you mean?

If so, you can use a softmax in the output layer and have one neuron for each class in your problem.

If I have 5 classes and do what you asked to do (using softmax in the output layer and having one neuron for each class), the probabilities I get looks like this for each prediction:

[[ 1.32520108e-05, 7.61212826e-01, 2.38773897e-01, 1.89434655e-08, 1.21214816e-08],

[ 3.46436082e-07, 1.17851084e-03, 9.88936901e-01, 8.01233668e-03, 1.87186315e-03],……..]

and these values are not in any order.

So how can I rank them in an order?

The probabilities will be in the order of the classes (e.g. 1-5 ) for the one hot encoded class values used to train the model.
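A minimal sketch of ranking, using the two probability rows quoted above and assuming the classes are labeled 1 to 5 in the same order as the one hot encoding. The helper name `rank_classes` is made up for the example.

```python
def rank_classes(probabilities, class_labels):
    """Return (label, probability) pairs sorted from most to least likely."""
    order = sorted(
        range(len(probabilities)), key=lambda i: probabilities[i], reverse=True
    )
    return [(class_labels[i], probabilities[i]) for i in order]


class_labels = [1, 2, 3, 4, 5]  # assumed to match the one hot encoding order
predictions = [
    [1.32520108e-05, 7.61212826e-01, 2.38773897e-01, 1.89434655e-08, 1.21214816e-08],
    [3.46436082e-07, 1.17851084e-03, 9.88936901e-01, 8.01233668e-03, 1.87186315e-03],
]

for row in predictions:
    print([label for label, _ in rank_classes(row, class_labels)])
# [2, 3, 1, 4, 5]
# [3, 4, 5, 2, 1]
```

The position of each probability identifies the class, so sorting the positions by probability gives the ranking.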

Hi Jason,

I have a problem where I have training data of tag-ids and I would like to extract the pattern by learning from it. Which models are suitable to train on this sort of data? I see this as a unsupervised learning problem and in current scenario we solve it using the help of regular expressions.

Tag-ids are in this format

eg:

400-SG-01002-A600

50-SG-01010-A600/B1

V-0514

STEEL-ETAGE-1-FRMW

Given a collection of words, I should be able to find out which word is a tag-id based on the learning

This framework will help you define your problem in terms of predictive modeling:

http://machinelearningmastery.com/how-to-define-your-machine-learning-problem/

Hi Jason,

Thank you for all your material.

I’m new on this area and I’m looking for help.

The LSTM models I found to study always work with only one feature, but I would like to give more classes as input to the network.

To be more specific, I would like to use as input and output to the network: [FeatureA][FeatureB][FeatureC].

FeatureA is a categorical class with 100 different possible values.

FeatureB and FeatureC are categorical classes too but only have 5 unique values.

Any suggestions or tutorials on how to do this?

A class is an output, not input.

Here is an example of an LSTM with multiple inputs:

https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/

If you have categorical inputs, you can use a one hot encoding or integer encoding prior to modeling.

For multi-class outputs, you can use a Dense layer on the output after the LSTMs with softmax and one neuron per output class, here is an example of a dense without the lstm for multi-class classification:

https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/
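For reference, integer encoding and one hot encoding of a categorical input can be sketched in plain Python as below. The feature values and helper names are invented for the example; in practice a library encoder would normally be used.

```python
def integer_encode(values):
    """Map each distinct category to an integer index (categories sorted alphabetically)."""
    vocabulary = sorted(set(values))
    index_of = {value: index for index, value in enumerate(vocabulary)}
    return [index_of[value] for value in values], vocabulary


def one_hot(index, n_classes):
    """Return a vector of zeros with a single 1 at the given index."""
    vector = [0] * n_classes
    vector[index] = 1
    return vector


# Hypothetical categorical feature observed over four time steps.
feature_b = ["low", "high", "mid", "low"]
encoded, vocabulary = integer_encode(feature_b)

print(vocabulary)  # ['high', 'low', 'mid']
print(encoded)     # [1, 0, 2, 1]
print([one_hot(i, len(vocabulary)) for i in encoded])
# [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

With several categorical features, the one hot vectors for each time step can be concatenated to form that time step's feature vector.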

Hello Jason,

Thank you for the article.

I have N datasets and each data-set has 3 features and 1 target. All the features and target have X data points in time. I want to train a LSTM on 80% of datasets and test on rest 20%.

My problem is not exactly forecasting but multiple sequences to sequence prediction. Could you please tell me how to set the input shape my data set.

dataset1 –>

[feature1 –> [0,1,2]

feature2 –> [4,6,8]

feature3 –> [3,5,7]

target –> [1,1,2] ]

Thank you

This will help you shape your data:

https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm

Thank you. So, based on the articles, am i correct in setting the shape of the input data as (number of train datasets, length of any feature array, number of features) ?

Do you have any article that dealt with this kind of example?

Thank you again for all the articles you shared. They are very informative.

Generally, I cannot comment on “correctness” without getting deeply involved in your project.

I would recommend reviewing how to prepare data for the LSTM, perhaps reviewing what has worked on other problems, then try a suite of ways of framing the problem to see what works best for your specific case.
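For readers unsure what the three-dimensional (samples, time steps, features) layout looks like, here is a small sketch that rearranges the `dataset1` feature arrays from the question above. It only illustrates the layout, not whether this framing suits a given project; the helper name is hypothetical.

```python
def to_samples_timesteps_features(datasets):
    """Convert [dataset][feature][time step] lists into [sample][time step][feature]."""
    return [[list(step) for step in zip(*features)] for features in datasets]


# dataset1 from the question: three features, each observed at three time steps.
dataset1_features = [
    [0, 1, 2],  # feature1
    [4, 6, 8],  # feature2
    [3, 5, 7],  # feature3
]

X = to_samples_timesteps_features([dataset1_features])
print(X)  # [[[0, 4, 3], [1, 6, 5], [2, 8, 7]]]

shape = (len(X), len(X[0]), len(X[0][0]))
print(shape)  # (1, 3, 3)
```

Each inner list holds the values of all features at one time step, and each dataset becomes one sample.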

Thank you

Hi

You have best site and best article I learn a lot of solution.

I have question: my data set is numbers and i need predict after number from previous numbers and just 4 targets tar[54,26,18,32] which sequence is true for data set?

Thanks.

Sounds like a many-to-one sequence prediction problem.

806,046,009,905??????????????

What was the problem exactly?

is the sequence prediction algorithm same the Convolutional neural network algorithm?

or it has the same idea

You can use both LSTMs or 1D CNNs for sequence prediction.

Hi!

I have this problem:

https://ai.stackexchange.com/questions/6741/regression-with-more-than-one-output-neural-network

What kind of sequence do you think it could be?

Perhaps you can summarize your problem in a sentence for me?

I tried to shorten my problem description, but I couldn’t make it fit in one sentence because I felt there was too much to say. Hope you don’t mind.

There is a system in which researchers receive a classification that can be C, B, A or A1, where C is the lowest and A1 is the highest.

This classification is based on the number of products that the researcher has in his profile.

I want to make a recommendation of the number of products that a researcher must do to improve their classification within the system, taking into account the number of products and the classification that they currently have.

Sounds like a constraint optimization problem rather than a machine learning problem.

I’d recommend looking into the field of ‘operations research’ and their methods for constraint optimization.

The thing is that the recommendations must be personalized, according to the profile of the researcher. Because there are several categories of products, some are mandatory to go up in category, but besides mandatory products you can choose among several.

For example, if an investigator is a lawyer, it should be unlikely that the system would suggest making products related to medicine, or it might suggest it, in case there is activity of that type in his profile.

These sound like constraints in an optimization algorithm, like a bin packing problem or knapsack problem. It does not sound like a recommender system, but I could be wrong.

To build a recommender system for this, you need to give scores to the products or activities of the researchers to measure how important they are to them. Then you can build a user-based or item-based recommender system. Hope it helps.

hello jason i have a many to one sequence forecast question. i was hoping you could tell me how to get one number correct in massachusetts lottery keno game

a wager of one spot for $20 pays $50 back

i know its an rng with seed and algorithm

i know you have to play when it is busy

This is a common question that I answer here:

https://machinelearningmastery.com/faq/single-faq/can-i-use-machine-learning-to-predict-the-lottery

Hi Jason,

I have a problem which, according to me, does not fit any of the above situations.

Given a disparate set of entries, and a sequence as an output, is it possible to predict what the sequence would be with a different set of entries?

For example:

(a,b,c,d) always gives [d,a,b,c]

(a,c,b,d) also always gives [d,a,b,c]

and so on

Assuming it is trained with every possible letter, I want to know what (a,c,d,e) would give, for example.

One approach I had was to convert this to a sequence to sequence matching problem by feeding in every permutation of the inputs as a sequence, and matching it to the output, but in such a scenario I may not require NN in the first place.

Do you have any insight to offer on this?

Perhaps this framework will help you understand whether your problem can be framed as a supervised learning problem:

http://machinelearningmastery.com/how-to-define-your-machine-learning-problem/

Can I use this to store data of over 50 years and use it to predict what could happen in the tenth year? If so, how can I do it?

Sure, you could try.

Once you fit the model, you can call model.predict() to make an out of sample forecast.

I have tens of examples on the blog, try the search.

Hi Jason,

I was wondering if there is really any difference between sequence-to-sequence and sequence prediction problems (assuming length/dimension of sequence is known and fixed).

If there is no difference, then how would one decide between employing GAN or an ordinary neural network model?

Thanks,

I don’t follow, can you give an example?

Can we apply this to predict clinical events based on past data of others? I want to see if certain musculoskeletal injuries have a sequence to them.

Perhaps.

Thanks for this tutorial Jason.

I have a problem where we have sensor data with different parameters and we want to predict the CO alarm. As per the different values of the variables we have to predict when the next alarm would take place. The data is a time stamp data. Please guide me how to proceed with such business problems.

Aside this I have sent you LinkedIn invite please accept it.

Thanks in advance.

Jaideep Negi

Perhaps this process will help:

https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/

Hi Jason,

I am trying to predict categorical data with example 6.7 . Each row has some categorical data as below,

[ABC,DEF,GHI, XXX]

[GHI,BTY,,AAA,PPP]

[DEF,XYZ,BBB,GHI]

I followed below steps,

1) Label encoded all values

2) Looped all rows, one hot encode it and train LSTM

3) Predict

But when I do evaluate, I found I am getting same prediction value for all test data.

I followed your code exactly as in example 6.7 in the LSTM with Python ebook. Also, when I tried to compile your code in 6.7, I was getting an error.

Perhaps the problem is challenging or does not have enough data or the model needs to be tuned?

What error are you getting with the code in the tutorial?

Hi,

Good posts Jason. If I would like to do my Ph.D in Sequence Prediction specifically in stock market prediction in India which of your series is most suited for it

None.

Hi jason

could you explain which model is good for stock market prediction and why?

This is a common question that I answer here:

https://machinelearningmastery.com/faq/single-faq/can-you-help-me-with-machine-learning-for-finance-or-the-stock-market

Dear Sir,

I need an LSTM training and testing algorithm of time sequence prediction for deeply study. Is there any book or tutorial in this regards?

Thanks

Azad

Yes, you can start here:

https://machinelearningmastery.com/start-here/#deep_learning_time_series

Hi Jason,

I’m having a hard time adapting this methodology to a classification problem with more than one time series. For example, a data set for customer churn or employee attrition where each customer/employee can have their own time series. Is an LSTM NN the best way to model such a problem or is a classification algorithm with features that capture the time variant information better?

Thanks!

Ronen

Perhaps try a few different framings of the problem, this might give you some ideas:

https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites

Many thanks for your article. My problem is extracting a sequence of words representing two parts of relations. So the input is an annotated sentence with two chunks of words related to each other. The output is sequence of words representing part 1 and part 2 of the relation. Could you please advise what type of sequence is this and what is the appropriate model to use.

Thanks

I’m not sure off hand, perhaps you can give a short example?

for example, the following sentence has two parts related with Conditional relationship.

your teacher says [if you study hard], [you will pass the exam], however, I don’t think you have enough time.

the parts are enclosed in square brackets (for illustration). The model needs to extract these two chunks

Hmmm, I think you’ll have to do some research on this.

Off the cuff, the simplest approach would be to have one model output chunks with some marker between chunks, but I expect there are more efficient approaches.

Hi Jason,

Thank you for all the amazing blogs,

I would be grateful if you can clarify the following for me.

Say I have one-minute data sample collected from soccer matches with 20 features. I have just over 1500 games to train and test the model.

I tried to implement LSTM model for multiple feature forecast. I trained/tested the model with lag 5 and got a score of 91%.

My question is, given only the first-minute values, is it possible to make a prediction for the remaining 90 minutes of the game.

So my input shape will be (1,1,20) and expected output will have a shape (89,6).

I really appreciate any suggestion.

Thank you,

Abey

That would be a challenging prediction problem!

Nevertheless, try it and see.

Hello Jason –

Thanks for your selflessness with these gems (articles).

I want to mainly predict ‘when’ a patient-level event will occur in hospitals. For instance, there was an article I read a while ago on building an algorithm that could predict onset of sepsis in a patient almost 24 hours prior to the onset. What’s the better algorithm for doing this and what kind of a sequence issue is this (sounds like 1,2,3,4,5 –> 6 based on timestamps)? I can work on predicting who’s at risk but the ‘when’ they’re likely to have that event is the real question.

Thanks,

Elijah

Sounds like a great problem.

I recommend testing a suite of algorithms on the problem, e.g. start with MLP and explore CNN and LSTM. This framework will help you to get started:

https://machinelearningmastery.com/start-here/#deep_learning_time_series

Hi: now I have a problem. I have some time series with different length. I want to use LSTM auto encoder(or any other deep learning methods) to extract the features from the time series. How can I do that? I’m looking forward to your reply. Thanks a lot.

Perhaps this approach will help as a starting point:

https://machinelearningmastery.com/lstm-autoencoders/

Hi Jason,

I am working on a model to predict the next page clicked by the user based on the click sequence data of more than lakhs of users. The sequences are of varying length. Which model will be most appropriate to predict the next clicked page?

Perhaps explore an LSTM model:

https://machinelearningmastery.com/start-here/#lstm
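
One possible way to prepare varying-length click sequences for a next-page model is to turn every prefix of a session into a sample and left-pad the prefixes to a common length. The sketch below assumes page IDs are positive integers with 0 reserved for padding; `make_next_page_samples` and `pad_left` are illustrative helper names, not library functions.

```python
import numpy as np

def make_next_page_samples(sessions):
    """Turn each session into (prefix, next page) pairs."""
    prefixes, targets = [], []
    for session in sessions:
        for t in range(1, len(session)):
            prefixes.append(session[:t])   # pages clicked so far
            targets.append(session[t])     # the page clicked next
    return prefixes, targets

def pad_left(sequences, pad_value=0):
    """Left-pad integer sequences with pad_value so they share one length."""
    max_len = max(len(seq) for seq in sequences)
    padded = np.full((len(sequences), max_len), pad_value, dtype=np.int64)
    for i, seq in enumerate(sequences):
        padded[i, max_len - len(seq):] = seq
    return padded

# Hypothetical click sessions of different lengths (page IDs start at 1).
sessions = [[3, 7, 2], [5, 1], [4, 4, 6, 2]]
prefixes, targets = make_next_page_samples(sessions)
X = pad_left(prefixes)
y = np.array(targets)
print(X)
print(y)
```

The padded integer matrix could then be fed to an embedding layer followed by an LSTM, with the target treated as a class label over all pages.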

Hi Jason,

I enjoyed reading this article!

What’s the difference between “sequence generation” and “sequence to sequence prediction”?

If the input in “sequence generation” is also a sequence, then it looks very similar to “sequence to sequence prediction” right?

Thanks!

Good question.

Generally, sequence generation involves giving the model a seed and getting a much longer sequence out, e.g. a few words in and a few paragraphs out, like a simple language model.

Seq2Seq often refers to translating an input sequence to an output sequence, such that they are directly related, like German to English or text to summary, etc.

Hi Jason,

Thanks for this tutorial! I have a question about product sequences..

Suppose I have data for a single customer and all the products he has purchased in the last year.

For example: cust_id : x1

order history : order_id_1 : [product1 , product2, product3] order_id_2 : [product1 , product2 , product5]

what is the best way to predict the next set of products the customer might buy with probabilities..

Thanks

I recommend testing a suite of framings of the problem in order to discover what works best for your specific dataset.

Perhaps you can model per customer?

Perhaps you can model per customer group?

Perhaps you can model across all customers?

Perhaps you can model by product categories?

…

Let me know how you go.

Hi Jason,

Thanks for the reply! I was going to first try out by modelling per customer, but I’m not getting what model to use? I’m new to this, sorry for the silly question!

Thanks

I recommend testing a suite of models in order to discover which works best for your dataset.

This may help:

https://machinelearningmastery.com/faq/single-faq/what-algorithm-config-should-i-use

Hi Jason,

I loved this article!

how can I predict the upcoming exam questions using 10 past exams? like what algorithms or using machine learning to find the sequence. Thanks

Hmmm. That is a very hard problem.

Perhaps you can model it as a language generation problem – for fun?

Okay, thank you! And how do I do that? I am a novice. Please share any article, reading material, book, YouTube video or your own suggestion. Really appreciate your help 🙂

It would require a lot of testing and development, e.g. there are no right answers, you must discover what works. I don’t recommend it as a project for a beginner.

Perhaps start here on something simpler:

https://machinelearningmastery.com/start-here/#deeplearning

Thank you Jason 🙂

Hi Jason,

Thanks for the blog post. I do have some queries.

Say example i have an input data set :

2018, Q1 – Category classes 1, 2, 3

2018, Q2 – Category classes 1, 2, 3, 5

2018, Q3 – Category classes 3, 4

2018, Q4 – Category classes 1, 3, 4, 5

I want to predict 2019. Q1 with category classes 1, 2, 4 (For example)

* In total i have category classes : 1, 2, 3, 4, 5

From where i am seeing this, it looks like a combination of sequence classification and sequence prediction. Using only historical data as input to predict the next sequence of classification as an output.

May i know what approach should i go about working on this? As for categorical classification/sequence classification would require of me to have the input data set for the classification (in this case, wont be a prediction).

From this blog, i noticed also i should not shuffle my data set?

Thanks

This might be a multi-label (not multi-class) time series classification problem, where a given interval requires the prediction of zero or more labels/classes.

A good place to start might be here:

https://machinelearningmastery.com/start-here/#deep_learning_time_series
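
As a rough illustration of the multi-label framing, each quarter in the question above can be encoded as a multi-hot vector over the five category classes, and the vector for one quarter used as input to predict the next. This is only one possible framing under the assumption of a one-quarter lag; `multi_hot` is an illustrative helper name.

```python
import numpy as np

# All known category classes and the quarterly history from the question.
classes = [1, 2, 3, 4, 5]
history = [
    ("2018-Q1", [1, 2, 3]),
    ("2018-Q2", [1, 2, 3, 5]),
    ("2018-Q3", [3, 4]),
    ("2018-Q4", [1, 3, 4, 5]),
]

def multi_hot(labels, classes):
    """Return a 0/1 vector with a 1 for every class present in labels."""
    return np.array([1 if c in labels else 0 for c in classes], dtype=np.int64)

# One row per quarter, one column per class.
Y = np.array([multi_hot(labels, classes) for _, labels in history])

# Assumed framing: use quarter t as input and quarter t+1 as the target.
# The rows stay in time order, so nothing is shuffled here.
X, y = Y[:-1], Y[1:]
print(X)
print(y)
```

A model for this framing would typically end in one sigmoid output per class, so that zero or more classes can be predicted for the same quarter.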

Hi Jason,

Your tutorials are awesome. But it’s still hard to follow.

Can you share some weather forecasting toy example? using a few features?

Yes, see here:

https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/

Why do you respond when asked about weather forecasting but stay away when asked about financial forecasting?

I avoid advice on finance problems, here’s why:

https://machinelearningmastery.com/faq/single-faq/can-you-help-me-with-machine-learning-for-finance-or-the-stock-market

Hi,

Is there a way to generate a seed out of a sequence of numbers?

Example:

I have this list of numbers:

03 08 11 17 19 26 28 31 36 37

How can I get the seed value from this list ?

Thanks in advance

Regards

If the sequence is random or pseudo-random, then no, it’s not a learnable function.

Hi Jason,

Thank you for this great article, your other posts on LSTM are also very helpful!

It is the following ‘sequence’ definition that I have a hard time wrapping my head around.

The data that I have consists of multiple time series, say I have 200 ‘blocks’ of spatial time series. Within each block I have the location of an object per time step, say each minute, whereas the recorded length of each block is 2 hours. For the same time steps I have factors of influence on the next location of the object, for example wind speeds.

The time ‘blocks’ themselves do not create a complete time series, one block may be 2 hours recorded on the 28th of May in 2016, the other block may be 2 hours recorded on the 6th of June 2019, etc.

In a way, this problem can be described as a Sequence Generation problem you address in this article, I can feed a sequence of wind speeds of the same length of the location sequence I want to predict, add constants that give an initial starting point to the model, and ‘translate’ or ‘predict’ a sequence of locations.

What I do wonder is whether this model is capturing the characteristics –within- the blocks, as the new location of the object depends on the location it was (at least) one time step before, hence within such a time series block it is more of a Sequence Prediction problem. Though this is not what I’m ultimately interested in, as I want to generate a complete motion sequence, rather than predicting the next motion steps given part of a location sequence. Do you think a Sequence Generation LSTM can capture this ‘within’ dependency of the timesteps?

Thank you very much in advance!

Sounds like a great problem!

I would encourage you to explore different framings of the problem, e.g. per-location, per-location-time, across locations/times, etc. See what works.

Think of the problem in terms of inputs and outputs. This might also give you ideas:

https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-multiple-sites

Let me know how you go.

Hello Doctor Jason. If I have like 20 sequences/trajectories. Can I train my network with 5 of those sequences/trajectories and then train the network to predict the remaining 15 sequences/trajectories?

If so, do you have any example ,tutorial or resources that I can follow? I can predict within one sequence/trajectory by going some steps back and predicting a step forward. But my goal is to predict full trajectories. Thanks.

Sure.

Yes, this might be a good place to start:

https://machinelearningmastery.com/start-here/#deep_learning_time_series

Hi,

Is it necessary to have an equal number of input variables during training and during prediction? I am trying to teach an LSTM network an algorithm so that if I give one input (the first state, t=0) it would predict the final state (t=500). I have the whole sequence between t=0 and t=500 to train it with. I tried to train the network using the initial 499 steps as training input and the 500th step as the output. But this implies that I have to input 499 steps during the prediction stage too, which completely undermines my objective of obtaining the final step from just the initial time step.

Then I tried to train the LSTM network giving only the first and last time steps as the input and output, which resulted in overfitting. I tried simple to complex network architectures and different activation functions, but to no avail.

Can you suggest a solution? Is there any way I can train the network on all time steps but, for prediction, only need to input the single initial step?

(The algorithm is Metropolis algorithm on ising model)

Thanks in Advance…

I recommend framing the prediction problem based on how you intend to use the model.

E.g. if you want to make prediction based on the prior 7 days of data only, then construct the model to take 7 days of input for each sample, etc.
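
The "7 days in" framing mentioned above can be sketched with a simple sliding window over a univariate series. The function name `split_sequence` and the toy data are illustrative assumptions; the point is only that each training sample has the same shape as the input the model will receive at prediction time.

```python
import numpy as np

def split_sequence(series, n_steps_in, n_steps_out=1):
    """Slide a window over series and return (inputs, targets) arrays."""
    series = np.asarray(series)
    X, y = [], []
    for start in range(len(series) - n_steps_in - n_steps_out + 1):
        end = start + n_steps_in
        X.append(series[start:end])             # e.g. the prior 7 days
        y.append(series[end:end + n_steps_out])  # the value(s) to predict
    return np.array(X), np.array(y)

# Toy daily series of 10 observations.
daily = np.arange(1, 11)
X, y = split_sequence(daily, n_steps_in=7)
print(X.shape, y.shape)
```

If the model is meant to be used with a single initial time step as input, the same idea applies with `n_steps_in=1`, although that is likely to be a much harder problem to learn.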

Hello Jason,

In sequence classification problem, instead of predicting the classes [‘good’ or ‘bad’] on inputting a whole sequence [1,2,3,4,5], I just want to provide only a part of sequence as input e.g [1,2,3], and the network should predict whether it belongs to [‘good’ or ‘bad’].

So in my case, how can i approach this issue ?

Could you suggest me any links or papers ???

Note: I am using LSTM’s for this problem.

Perhaps adapt the example in this post:

https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/

May I use a TimeDistributed layer after my LSTM layer like you have mentioned in

‘https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/’

Perhaps. Not directly though.

Hello Jason,

If I follow the link you suggested (https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/), will I be able to predict the class [‘good review’, ‘bad review’] if only part of the words are given as input to the trained model?

Overview about my work

My data contains Vehicle CAN signal, dynamics data.

X_train.shape = (271,100,4)

# 271 segments, each segment is of shape 100*4

# every row in 100*4 corresponds to each Time step (t0, t1, t2, t3,…..t99)

Y_train.shape = (195,)

# each segment out of 271 segments belongs to either 0 or 1 (2 classes)

# [0,0,1,0,1,0,0,0,0,1,………………………………………………..1,0]

X_test.shape = (31,100,4) # 31 segment of shape 100*4

Y_test .shape = (31,)

MY REQUIREMENT

After training , my model should predict the correct class (either 0 or 1) if i give only a part of segment as input, say, I am sending my testing data as (31,60,4) or (31,70,4) or (31,80,4) (31,90,4) and the model should predict which class each segments belong to.

I would be happy if you provide me some hint to continue further

You must train the model in the way you intend to use it.

That means that if you want a prediction from a partial input, then you must train your model in this way.
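
One way to train on partial inputs, sketched here under some assumptions, is to create truncated copies of each training segment and pad the removed time steps with a constant so every sample keeps the same shape. The data below is synthetic (8 segments rather than the 271 in the question), and `make_partial_segments` is an illustrative helper name.

```python
import numpy as np

def make_partial_segments(X, y, keep_lengths, pad_value=0.0):
    """For each length in keep_lengths, copy X keeping only the first
    `keep` time steps; later steps are replaced by pad_value."""
    parts, labels = [], []
    for keep in keep_lengths:
        truncated = np.full_like(X, pad_value)
        truncated[:, :keep, :] = X[:, :keep, :]
        parts.append(truncated)
        labels.append(y)  # the class label does not change with truncation
    return np.concatenate(parts), np.concatenate(labels)

# Synthetic stand-in for segments of shape (samples, 100 time steps, 4 signals).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 100, 4))
y_train = rng.integers(0, 2, size=8)

X_aug, y_aug = make_partial_segments(
    X_train, y_train, keep_lengths=[60, 70, 80, 90, 100]
)
print(X_aug.shape, y_aug.shape)
```

At prediction time a partial segment would be padded in the same way. Whether the padded steps should also be masked, and whether a pad value of 0.0 is distinguishable from real signal values, depends on the data and would need to be checked.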

Hi Jason, I’m completely lost when trying to choose the type of predictive model for my problem. Is it autoregressive model, Conditional Random Field, Hidden Markov Model or other? Can you please give me some advice?

78, 18, 51, 89, 19, 43, 62, 28, 94, 49

Suppose, everyday I’m given 10 data, and an example was listed above. They’re numbers generated by two devices, namely Device A and Device B. Each of them is capable to generate numbers from 0 to 9.

The first number in the data is generated by Device A, while the second number is generated by Device B. For instance, for the first data of “78”, “7” was generated by Device A and “8” was generated by Device B. Similarly, for the last data of “49”, “4” was generated by Device A, and “9” was generated by Device B.

I want to be able to predict the next outcome variable after the last “49”.

I have a total of 300 historical data for 30 days.

From my initial investigation for the 300 data, every device tends to produce repeated sequences. For instance, Device A will repeat the sequence “6-2-9-4” (as in the last 4 data). That means this sequence appeared twice within the 300 historical data for Device A. For another example, the sequence “8-1-9-9” (the 2nd to the 5th data) in Device B appeared twice, too. Each of them produces at least three repeated sequences.

I’d like to predict the next outcome variable after the last “49”. Which model is more appropriate?

Thank you in advance!

Yes, follow this process:

https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
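
Before choosing a model for the two-device data described above, it may help to split each reading into one series per device and count how often short subsequences repeat. The sketch below uses only the 10 readings listed in the question; `count_subsequences` is an illustrative helper name, and the window length of 4 is taken from the repeated patterns the question mentions.

```python
from collections import Counter

# The ten readings from the question; the first digit comes from Device A
# and the second digit from Device B.
readings = ["78", "18", "51", "89", "19", "43", "62", "28", "94", "49"]

device_a = [int(r[0]) for r in readings]
device_b = [int(r[1]) for r in readings]

def count_subsequences(values, length):
    """Count every contiguous subsequence of the given length."""
    return Counter(
        tuple(values[i:i + length]) for i in range(len(values) - length + 1)
    )

# With the full 300-reading history, entries with a count above 1 would be
# the repeated patterns; with only 10 readings none repeat yet.
counts_a = count_subsequences(device_a, 4)
repeated_a = {seq: n for seq, n in counts_a.items() if n > 1}
print(device_a)
print(device_b)
print(repeated_a)
```

Each per-device series could then be framed as its own sequence prediction problem, or the two could be modeled jointly if the devices turn out to be related.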

Thank you for your reply, Jason. May I know why do you think that this could be a time series problem?

I’m sorry for the misrepresentation, Jason. The data was taken on every Monday, Thursday and Friday. 10 data per day. Can i still model it as a time series problem?

Perhaps this post will help you to determine if time series forecasting is an appropriate framing of your dataset:

https://machinelearningmastery.com/time-series-forecasting/

I don’t know, I got the impression that the observations were ordered by time. Sorry if that was incorrect.

Thanks again, Jason! I think it’s a time series.

Great.

I have read so many of your tutorials and blogs and it helped me a lot. You are a legend.

Thanks, I’m happy they’re useful to you!

Hi Jason,

thanks for all your tutorials about time series and their generation.

I’ve just come up with a new problem where I’m not sure ML is the right approach or if it’s even possible at all. Can you please give me your opinion about that project?

It’s about multiple vibration motors which run simultaneously and play 5 different patterns each. They aim to stimulate some kind of emotions (my labels).

Is it possible, given an emotional label, to generate a new vibration pattern for each motor with similar attributes?

I’ve considered interpreting my 5 vibration sequences as a matrix and performing something like a CNN on a 5xn matrix, where n is the number of vibrations in each sequence, or using some kind of RNN you presented in some of your articles.

If you have any ideas I’d appreciate your view.

Best wishes

Kenny

Yes. I believe you are looking for a generative model for time series data.

I don’t have a tutorial on this topic, but perhaps some searching on google or scholar.google.com will point you in the right direction.

Ty!! I´ll check this out.

Hi Jason,

Having you is a blessing for ML seekers like me, thanks!

I’ve just got a problem for which I’m struggling how to formulate and define as a ML problem.

The dataset contains blood units that have been collected from a supplier, and after going through a sequence of statuses (each status occurs in a certain time and location), they result in one of the statuses “Transfused” or “Discarded”.

The thing that I’m looking for is the pattern of discards (or something that helps me predict the possibility of being discarded for a certain blood unit).

Please let me know if more clarification needed.

I’d appreciate you advising me / referring me to a material.

Best regards,

Jaber

Good question Jaber, I believe this framework may help:

http://machinelearningmastery.com/how-to-define-your-machine-learning-problem/

Hello Jason,

Firstly, I am very thankful for all your ML blogs and books. They are very helpful for a fresher like myself.

I am currently working with solar irradiance hourly time series. I have the hourly data for several years which are then clustered into representative/typical days(say 10 days). And each day of the year is assigned to one such typical day number/ index. Thus resulting in a sequence of 365 terms with numbers ranging from 1 to 10. for one year. And I have this sequence for several years. I need a model to forecast this sequence. I tried using SARIMA model but I am not sure how to use it for discrete numbers.

Please help me find a time series classification or categorization model that could also accommodate seasonality.

Best regards,

Anuj.

Perhaps try some of the models here:

https://machinelearningmastery.com/start-here/#deep_learning_time_series

Hey Jason,

Thank you very much for your dedication, your selflessness is a huge help for our beginners.

If I have a set of pictures with temperature changes (about 27,000 picture frames), this picture shows the trend of temperature change. Can I predict the following 200 frames of the trend of temperature change from these previous 27000 picture frames, provided that there is no trend information for the subsequent temperature changes in my training data sets, and only the first 27,000 frames are in the training set.

Best wishes,

mallota

Perhaps try it and see?

Hi Jason,

Thanks for all your tutorials and blog posts!

I’m working on a educational problem for high school students. In each year (n), student (i) participates in several courses (j) and I have his/her grades for each course (A). Then when students finish their high school, they all submit their grades and a “Statement of Purpose” letter (B) to college. Then the college ranks students (C) and decide to either accept or reject (D) them. Therefore, for each student I want to predict his/her ranking as well as being accepted or rejected to a college knowing his/her grades in each course across different years. So my inputs are A(i,j,n) and B(i) while my outputs are C(i) and D(i).

Now I want to have a Machine Learning model to predict C(i) and D(i) based on the A(i,j,n) and B(i) inputs. To my understanding my dataset is sequential data and I need to use a “sequence prediction” model, is this correct? And if so, what’s the best method for doing this, should I use RNN?

Again thanks for your help.

Rajit

That sounds like a fun project.

Perhaps try modeling it and see if the framing is effective?

hello jason. pls i like to ask

is it possible to do sequence labeling or tagging with xgboost

if it is true, kindly direct me to a link where i can read more about it.

i have searched a lot but yet to see what i am looking for, thanks in advance

Perhaps.

I don’t have examples sorry.

Hi JASON

I have a text data.

I need to predict the mean funniness( estimated funniness) from 0 to 3 corresponding to every single sentence.

Can you tell me how sequence method can help me

Start by collecting or preparing a dataset made of text and funniness scores.

Thank you.

I already have a labeled data set. Now how do I start working on it?

You can follow the tutorials here to learn how to model sequence prediction problems with neural networks:

https://machinelearningmastery.com/start-here/#deep_learning_time_series

Hi Jason, thank you for your great tutorials!

Do you think modern NLP transformers with long memory like GPT-2 could outperform LSTM on non-language sequence prediction tasks like medical history or user behavior modeling? I googled hard, but didn’t find any examples of this approach.

Good question – probably.

Perhaps try it on your dataset and see?

Thank you Sir for your help.

You’re welcome.

Hello Jason,

I came to this article while searching for my problem on Google.

So far, I’m a naive in Data Science area.

Problem / requirement statement:

We have a power generator, which is continuously running. Its suggested maintenance time is after 1000 hours. We don’t want to rely on its documented schedule. There could be a time when the machine requires early maintenance.

So we want to devise a mechanism for prediction by which we can pre-plan the maintenance window and notify the teams about its downtime.

We continuously receive sensor’s data of it and keep storing all that information. I am not sure whether this is a Sequence Prediction problem? Is it related to LSTM? if Yes, then how? and if No then which algorithm or technique we shall consider to address this problem?

Your guidance and input to this would be very helpful.

Perhaps model it as a time series classification task. The tutorials here will help you to get started:

https://machinelearningmastery.com/start-here/#deep_learning_time_series
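
A time series classification framing for the generator could, for example, cut the sensor history into fixed-length windows and label each window by whether maintenance was needed shortly afterwards. Everything specific in this sketch is an assumption made for illustration: the synthetic sensor array, the known `failure_index`, the window length and the warning `horizon`.

```python
import numpy as np

def label_windows(sensor, failure_index, window, horizon):
    """Cut sensor (time steps, features) into windows that end before the
    failure; label a window 1 if the failure is fewer than `horizon`
    steps after the window ends, else 0."""
    X, y = [], []
    for end in range(window, failure_index + 1):
        X.append(sensor[end - window:end])
        y.append(1 if failure_index - end < horizon else 0)
    return np.array(X), np.array(y)

# Synthetic sensor log: 50 hourly readings of 3 signals, with an assumed
# maintenance event recorded at hour 40.
rng = np.random.default_rng(1)
sensor = rng.normal(size=(50, 3))

X, y = label_windows(sensor, failure_index=40, window=10, horizon=5)
print(X.shape, y.sum())
```

The positive class will usually be rare in this kind of data, so class imbalance is likely to need attention when evaluating any model trained on such windows.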

Alright, thanks. Will read it thru and let you know if i face any problem.

Hi Jason. I have a question about how to solve sequence comparison tasks. Say I’m trying to predict the winner of two tennis players and as my inputs I want two sequences of their respective careers (all previous matches and relevant stats). How would I go about modelling this with LSTMs? My feeling is I wouldn’t want one large sequence model as there isn’t a relationship between the neighbouring timesteps, so I would imagine I want two different LSTMs that merge somehow?

Regards,

Louis

A rating system might be more appropriate than an LSTM.

Nevertheless, a sequence of scores or prior outcomes might be a start, e.g. a time series classification task for win/loss.

Thanks for your reply. Interesting idea. Perhaps an encoder-decoder setup and then train on the winner of two encoded players?

I’m attempting to train on the sequence of prior outcomes using a shared LSTM layer from two input sequences and then a softmax classification layer but it is struggling to learn. Potentially not enough training data.

There may be many ways to frame the problem. I’d encourage you to test many approaches, see what works/sticks.

Hi Jason,

I am working on a problem where the input is a sequence, like an acceleration vs time signal . However the output is another quantity (not acceleration). Could you please tell any traditional ML methods (other than RNN) that uses sequential information of input data to predict a different output quantity. I think these kinds of problems don’t belong to sequence prediction, sequence classification, sequence generation nor sequence to sequence prediction. Thanks,

Yes, the tutorials here will provide a starting point:

https://machinelearningmastery.com/start-here/#deep_learning_time_series

Hello, Jason

I’m really happy to read this.

I have a quick question,

what is the difference between ‘sequence generation’ and ‘sequence to sequence’?

For me,

It seems like same cause both of them generate sequences.

Could you please tell me the difference between them?

Thank you

Perhaps seq2seq assumes both a sequence in and out and sequence generation does not make an assumption about the impetus.

Hello Jason.

Thank you for the concise article. I liked how you classified sequence modeling tasks that make it easy to visualize real-world use cases.

Is it possible that a sequence prediction task can be achieved such that at each time step features are fed as input and to output once again features? The features I mention are the same that would be usually fed into a feedforward neural network for classification/regression tasks. From what I’ve noticed, every example uses ID for both input and output in sequence modeling tasks. Is this the only way?

Thank you.

Not sure I follow your question.

No need to pass in IDs; they are most likely not predictive. Perhaps this will help:

https://machinelearningmastery.com/how-to-connect-model-input-data-with-predictions-for-machine-learning/