Sequence prediction is different from other types of supervised learning problems.

The sequence imposes an order on the observations that must be preserved when training models and making predictions.

Generally, prediction problems that involve sequence data are referred to as sequence prediction problems, although there are a suite of problems that differ based on the input and output sequences.

In this tutorial, you will discover the different types of sequence prediction problems.

After completing this tutorial, you will know:

- The 4 types of sequence prediction problems.
- Definitions for each type of sequence prediction problem by the experts.
- Real-world examples of each type of sequence prediction problem.

Let’s get started.

## Tutorial Overview

This tutorial is divided into 5 parts; they are:

- Sequence
- Sequence Prediction
- Sequence Classification
- Sequence Generation
- Sequence-to-Sequence Prediction

## Sequence

Often we deal with sets in applied machine learning, such as a train or test set of samples.

Each sample in the set can be thought of as an observation from the domain.

In a set, the order of the observations is not important.

A sequence is different. The sequence imposes an explicit order on the observations.

The order is important. It must be respected in the formulation of prediction problems that use the sequence data as input or output for the model.

## Sequence Prediction

Sequence prediction involves predicting the next value for a given input sequence.

For example:

- Given: 1, 2, 3, 4, 5
- Predict: 6
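The toy example above can be framed as a supervised learning problem by sliding a window over the sequence. The following is a minimal sketch in plain Python; the helper name `split_sequence` and the window size `n_steps` are assumptions made for illustration, not part of any library.

```python
def split_sequence(sequence, n_steps):
    """Split a sequence into (input window, next value) pairs."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])  # the n_steps preceding values
        y.append(sequence[i + n_steps])    # the value that follows them
    return X, y


sequence = [1, 2, 3, 4, 5, 6]
X, y = split_sequence(sequence, n_steps=3)
for inputs, target in zip(X, y):
    print(inputs, "->", target)
```

Each window keeps its observations in their original order, which is the property that distinguishes a sequence from a set.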

Sequence prediction attempts to predict elements of a sequence on the basis of the preceding elements

— Sequence Learning: From Recognition and Prediction to Sequential Decision Making, 2001.

A prediction model is trained with a set of training sequences. Once trained, the model is used to perform sequence predictions. A prediction consists in predicting the next items of a sequence. This task has numerous applications such as web page prefetching, consumer product recommendation, weather forecasting and stock market prediction.

— CPT+: Decreasing the time/space complexity of the Compact Prediction Tree, 2015

Sequence prediction may also generally be referred to as “*sequence learning*”.

Learning of sequential data continues to be a fundamental task and a challenge in pattern recognition and machine learning. Applications involving sequential data may require prediction of new events, generation of new sequences, or decision making such as classification of sequences or sub-sequences.

— On Prediction Using Variable Order Markov Models, 2004.

Technically, we could refer to all of the following problems in this post as a type of sequence prediction problem. This can make things confusing for beginners.

Some examples of sequence prediction problems include:

- **Weather Forecasting**. Given a sequence of observations about the weather over time, predict the expected weather tomorrow.
- **Stock Market Prediction**. Given a sequence of movements of a security over time, predict the next movement of the security.
- **Product Recommendation**. Given a sequence of past purchases of a customer, predict the next purchase of a customer.

## Sequence Classification

Sequence classification involves predicting a class label for a given input sequence.

For example:

- Given: 1, 2, 3, 4, 5
- Predict: “good” or “bad”

The objective of sequence classification is to build a classification model using a labeled dataset D so that the model can be used to predict the class label of an unseen sequence.

— Chapter 14, Data Classification: Algorithms and Applications, 2015

The input sequence may be comprised of real values or discrete values. In the latter case, such problems may be referred to as discrete sequence classification.

Some examples of sequence classification problems include:

- **DNA Sequence Classification**. Given a DNA sequence of ACGT values, predict whether the sequence belongs to a coding or non-coding region.
- **Anomaly Detection**. Given a sequence of observations, predict whether the sequence is anomalous or not.
- **Sentiment Analysis**. Given a sequence of text such as a review or a tweet, predict whether the sentiment of the text is positive or negative.
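To make the idea of a labeled dataset of sequences concrete, here is a minimal sketch of sequence classification. It uses a 1-nearest-neighbour rule in plain Python rather than a neural network; the toy data, the labels, and the helper names `distance` and `classify` are all assumptions made for illustration.

```python
def distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(sequence, training_data):
    """Return the label of the closest training sequence (1-nearest neighbour)."""
    _, nearest_label = min(
        training_data, key=lambda pair: distance(sequence, pair[0])
    )
    return nearest_label


# Hypothetical labeled dataset: rising sequences are "good", falling are "bad".
training_data = [
    ([1, 2, 3, 4, 5], "good"),
    ([2, 4, 6, 8, 10], "good"),
    ([5, 4, 3, 2, 1], "bad"),
    ([10, 8, 6, 4, 2], "bad"),
]

label = classify([2, 3, 4, 5, 6], training_data)
print(label)
```

The whole input sequence maps to a single class label, in contrast to sequence prediction where the output is the next value of the sequence.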

## Sequence Generation

Sequence generation involves generating a new output sequence that has the same general characteristics as other sequences in the corpus.

For example:

- Given: [1, 3, 5], [7, 9, 11]
- Predict: [3, 5, 7]

[recurrent neural networks] can be trained for sequence generation by processing real data sequences one step at a time and predicting what comes next. Assuming the predictions are probabilistic, novel sequences can be generated from a trained network by iteratively sampling from the network’s output distribution, then feeding in the sample as input at the next step. In other words by making the network treat its inventions as if they were real, much like a person dreaming

— Generating Sequences With Recurrent Neural Networks, 2013.
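The sample-and-feed-back loop described in the quote can be sketched without a neural network. The example below swaps the recurrent network for a simple first-order Markov chain, which is a much weaker model but follows the same generation procedure. The toy corpus and the function names `fit` and `generate` are assumptions made for illustration.

```python
import random
from collections import defaultdict


def fit(corpus):
    """Record which values follow each value in the training sequences."""
    followers = defaultdict(list)
    for sequence in corpus:
        for current, following in zip(sequence, sequence[1:]):
            followers[current].append(following)
    return dict(followers)


def generate(followers, seed, length, rng):
    """Sample one step at a time, feeding each sample back in as the next input."""
    output = [seed]
    while len(output) < length and output[-1] in followers:
        # Sampling from the list of observed followers is proportional
        # to how often each follower appeared in the corpus.
        output.append(rng.choice(followers[output[-1]]))
    return output


# Hypothetical corpus of short integer sequences.
corpus = [[1, 3, 5, 7], [1, 3, 5, 3, 1], [7, 5, 3, 5, 7]]
model = fit(corpus)
generated = generate(model, seed=1, length=6, rng=random.Random(0))
print(generated)
```

Every step in the generated sequence is a transition that was seen in the corpus, so the output shares the general characteristics of the training sequences while not necessarily copying any one of them.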

Some examples of sequence generation problems include:

- **Text Generation**. Given a corpus of text, such as the works of Shakespeare, generate new sentences or paragraphs of text that read like Shakespeare.
- **Handwriting Prediction**. Given a corpus of handwriting examples, generate handwriting for new phrases that has the properties of handwriting in the corpus.
- **Music Generation**. Given a corpus of examples of music, generate new musical pieces that have the properties of the corpus.

Sequence generation may also refer to the generation of a sequence given a single observation as input.

An example is the automatic textual description of images.

- **Image Caption Generation**. Given an image as input, generate a sequence of words that describes the image.

Being able to automatically describe the content of an image using properly formed English sentences is a very challenging task, but it could have great impact, for instance by helping visually impaired people better understand the content of images on the web. […] Indeed, a description must capture not only the objects contained in an image, but it also must express how these objects relate to each other as well as their attributes and the activities they are involved in. Moreover, the above semantic knowledge has to be expressed in a natural language like English, which means that a language model is needed in addition to visual understanding.

— Show and Tell: A Neural Image Caption Generator, 2015

## Sequence-to-Sequence Prediction

Sequence-to-sequence prediction involves predicting an output sequence given an input sequence.

For example:

- Given: 1, 2, 3, 4, 5
- Predict: 6, 7, 8, 9, 10

Despite their flexibility and power, [deep neural networks] can only be applied to problems whose inputs and targets can be sensibly encoded with vectors of fixed dimensionality. It is a significant limitation, since many important problems are best expressed with sequences whose lengths are not known a-priori. For example, speech recognition and machine translation are sequential problems. Likewise, question answering can also be seen as mapping a sequence of words representing the question to a sequence of words representing the answer.

— Sequence to Sequence Learning with Neural Networks, 2014

It is a subtle but challenging extension of sequence prediction where, rather than predicting a single next value in the sequence, a new sequence is predicted that may or may not have the same length or be of the same type as the input sequence.

This type of problem has recently seen a lot of study in the area of automatic text translation (e.g. translating English to French) and may be referred to by the abbreviation seq2seq.

seq2seq learning, at its core, uses recurrent neural networks to map variable-length input sequences to variable-length output sequences. While relatively new, the seq2seq approach has achieved state-of-the-art results in not only its original application – machine translation.

— Multi-task Sequence to Sequence Learning, 2016.

If the input and output sequences are a time series, then the problem may be referred to as multi-step time series forecasting.

Some examples of sequence-to-sequence problems include:

- **Multi-Step Time Series Forecasting**. Given a time series of observations, predict a sequence of observations for a range of future time steps.
- **Text Summarization**. Given a document of text, predict a shorter sequence of text that describes the salient parts of the source document.
- **Program Execution**. Given the textual description of a program or mathematical equation, predict the sequence of characters that describes the correct output.
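For the multi-step time series case, the training data can be prepared by pairing each input window with the window of values that follows it. This is a minimal sketch in plain Python; the helper name `split_into_pairs` and the window sizes are assumptions made for illustration.

```python
def split_into_pairs(sequence, n_steps_in, n_steps_out):
    """Split a sequence into (input sequence, output sequence) pairs."""
    X, y = [], []
    last_start = len(sequence) - n_steps_in - n_steps_out
    for i in range(last_start + 1):
        split = i + n_steps_in
        X.append(sequence[i:split])                    # input sequence
        y.append(sequence[split:split + n_steps_out])  # output sequence
    return X, y


sequence = list(range(1, 11))  # 1, 2, ..., 10
X, y = split_into_pairs(sequence, n_steps_in=3, n_steps_out=2)
for inputs, outputs in zip(X, y):
    print(inputs, "->", outputs)
```

Here the input and output lengths differ (3 and 2), which illustrates that the output sequence need not have the same length as the input sequence.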

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

- Sequence on Wikipedia
- CPT+: Decreasing the time/space complexity of the Compact Prediction Tree, 2015
- On Prediction Using Variable Order Markov Models, 2004
- An Introduction to Sequence Prediction, 2016
- Sequence Learning: From Recognition and Prediction to Sequential Decision Making, 2001
- Chapter 14, Discrete Sequence Classification, Data Classification: Algorithms and Applications, 2015
- Generating Sequences With Recurrent Neural Networks, 2013
- Show and Tell: A Neural Image Caption Generator, 2015
- Multi-task Sequence to Sequence Learning, 2016
- Sequence to Sequence Learning with Neural Networks, 2014
- Recursive and direct multi-step forecasting: the best of both worlds, 2012

## Summary

In this tutorial, you discovered the different types of sequence prediction problems.

Specifically, you learned:

- The 4 types of sequence prediction problems.
- Definitions for each type of sequence prediction problem by the experts.
- Real-world examples of each type of sequence prediction problem.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

So I assume it’s fair to say that every time-series is an example of sequence prediction but not vice-versa? Thanks for the interesting post.

Correct, Mike.

So, can we say that problems like the 20-questions game require sequence prediction to solve? And can we use a recurrent neural network to implement it?

The system asks questions and after each answer, we predict an answer which helps to determine the next question. Right?

Thanks, that was exactly what I need.

I expect Q&A is a sequence prediction problem.

I have not worked on an example so I cannot give you advice about whether RNNs are appropriate. I would recommend a search on google scholar.

Hi Jason.

Could LSTM do multi-step forecasts? I have two examples below:

1. input the [1,2,3] sequence to predict the [4,5,6,7,8,9,10,…15] sequence;

2. input the [1,2,3] sequence to predict the [10,11,12] sequence.

If an LSTM can do this, could you give a lesson on this kind of problem?

Thank you very much.

Yes, see here:

https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/

Hello Jason,

Thank you for this post, it is very useful and interesting.

I´m thinking about the following problem…, Given a single input sequence, we want to predict several sequences, that can be of different lengths. For instance, this problem can be encountered in the Alternative Splicing phenomenon, where given a single RNA sequence, we can obtain multiple proteins.

My questions are:

1- Has the problem “Input: One sequence -> Output: Several sequences” been studied in the literature?

2- Can LSTMs solve this type of problem?

Best and thanks

I have not seen this, but LSTMs could address it. Consider a multiple-output model:

https://machinelearningmastery.com/keras-functional-api-deep-learning/

Jason –

I enjoyed this post and I believe it may help me solve a predictive problem I’ve been pondering.

The data is primarily text based, time series data involving an ‘actor’ object that I receive information on. That information, other than the date/time information is also text. I know that given information sequence ‘A’ that the next informational sequence is most often ‘B’. However there may well be several other sequences that are also highly likely.

What I’m looking for is a learning method that can identify anomalous information reports so they can be reviewed and subsequently validated as either truly anomalous or potentially a new, yet valid, sequenced item.

Anything you might be able to point me towards would be greatly appreciated.

Thanks!

I would recommend investigating the field of time series anomaly detection. Perhaps start on google scholar?

Thanks Jason, I spent a considerable amount of time yesterday looking into what you suggested.

Just to clarify, the timestamps only serve to order the reports as they arrive, they have little significance beyond that.

Do any of your publications deal with pointing an unsupervised, or minimally supervised, method at this sort of data? As opposed to say numeric data?

I’ve done a considerable amount of ‘crunching’ of the data (it’s billions of rows) and have built a reference table of the likely ‘next event’ given the previous event. However that solution is not as robust, nor as flexible as I’d like it to be.

LSTM and GAN appear to show promise for what I’m trying to do yet most of the examples I’ve seen don’t seem to fit very well with the data I have to work with.

Again, I will appreciate any insight you could share.

Thanks!

Sorry, I don’t have material on semi-supervised learning at this stage, I hope to cover it in the future.

I would recommend testing a suite of methods as well as a suite of different framings of the problem to see what works best.

Thanks Jason!

You’re welcome.

Hello Jason,

Thank you for such an informative article.

But I am not able to fit a prediction problem I’ve been working on in any category you have mentioned.

I have data of a person who visits certain places in a sequence from a sample of places.

let’s say he wants to visit [‘NY’, ‘LA’, ‘DC’, ‘TX’, ‘FL’] then he’ll visit it in this sequence [‘TX’, ‘LA’, ‘NY’,’FL’, ‘DC’].

I have historical data of his previous visits in sequence.

[‘TX’, ‘LA’, ‘NY’,’FL’, ‘DC’]

[‘AK’, ‘FL’, ‘NY’] and so on.

So for a random list of places, I need to predict in which sequence he is going to visit those places.

I’ll really appreciate if you can point me toward something.

Thanks

Perhaps this post will help you describe your problem:

http://machinelearningmastery.com/how-to-define-your-machine-learning-problem/

Hi Jason,

My interest in ML is in the application side of it. I am from the VLSI field.

The area of ML is very vast and I don’t know where to start with for my problem.

Below is a brief description of my problem.

The system I am testing basically generates events. Sequences of these events are of interest to me.

One can manually look at these event sequences and recognize them to be useful. But the manual process is very cumbersome, and there could be millions of events within which one has to look for interesting events.

The interesting event sequences are known a-priori. The spacing between these events can vary though.

Do you have any suggestion as to what I should be trying out to begin with?

I am not looking for solutions actually but only for guidance.

Yes, use LSTMs.

Take an example from the blog as a starting point and adapt it for your problem.

Start here:

https://machinelearningmastery.com/start-here/#lstm

Sure. Thanks, Jason. I will read through and get back if needed

Hello Jason,

Thank you for this tutorial which is very interesting, but I would like to find a sequential dataset that I can use in my research for the predictive maintenance algorithm.

My best advice is to contrive a problem for research purposes that has the properties you require.

Thank you Jason. Very interesting post!

Just a quick but also confusing question of mine. Let’s say I have [4, 5, 6] as input and I want to output [14, 15, 16] or [24, 25, 26], etc. Of course, I have the training dataset, which takes the input as [1, 2, 3] and the output as [11, 12, 13], [21, 22, 23], etc., which means I have a one-to-many (not the name of the model type here) relationship in my training set. I am wondering whether the RNN (or LSTM) can even recognize these relationships simultaneously. Another thing is, since we only need to map 1 to 11, 2 to 12, and so on, it seems that if I change the order of my training dataset, i.e. [2, 3, 1] as input and [12, 13, 11] as output, the model can still learn the corresponding pattern. So here it might violate the principle that ORDER IS IMPORTANT. I have read a lot of your valuable blogs and learned a lot, but still cannot find the answer. Any response is really appreciated!

Sure, my advice would be to try it and see how you go.

The model could output two length n vectors or two sequences of n=3 timesteps. Try both.

I would suggest exploring multiple output models with one “sub-model” for each output, see here:

https://machinelearningmastery.com/keras-functional-api-deep-learning/

I’m eager to hear how you go.

So I have this data set of images that represent grid-wise crime (frequency) on daily basis. So I have a series of images i1,i2,i3,… in, and I want to forecast or predict in+1th and beyond images(crime hotspots or frequency). How do you think I should approach this problem?

This post might help you define your problem:

http://machinelearningmastery.com/how-to-define-your-machine-learning-problem/

Hello Jason,

I’d have a question regarding Time series, forex, there is a pattern named double-bottom looks like the “W” letter, as input sequence this pattern can take any arbitrary length (in time), how should I deal with this problem? Can I transform this input sequence to a sequence of fixed length?

Thanks

What do you mean deal with it? Predict this pattern?

If so, develop a dataset of examples with/without the pattern and fit a model to classify them.

Hi Jason,

Sorry for not being explicit enough. I want to classify some time series, but the lengths of the time series patterns, which are the inputs here, are required to be known in advance. However, such information is not always available. In addition, patterns of different lengths may co-exist in a time series dataset (for example, the forex “W” pattern might be 8 or 38 in length; we don’t know it in advance).

How to present such inputs to the machine if their length is not known in advance?

Thanks!

Interesting.

The different lengths you can address with zero padding and a mask input layer. I have many examples on the blog:

The scale invariance might require some experimentation. Perhaps an LSTM can do it. Perhaps a CNN is required or some other compressed interpretation of the sequence.
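As a rough illustration of the zero padding idea, the sketch below pads variable-length sequences to a common length and builds a matching mask that marks which time steps are real observations. It is plain Python, the helper name `zero_pad` is hypothetical, and padding at the front rather than the end is an assumption; in a deep learning library, a masking layer would play the role of the `mask` list.

```python
def zero_pad(sequences, pad_value=0.0):
    """Left-pad sequences to the same length and return a mask of real steps."""
    max_len = max(len(s) for s in sequences)
    padded, mask = [], []
    for s in sequences:
        n_pad = max_len - len(s)
        padded.append([pad_value] * n_pad + list(s))  # padding goes first
        mask.append([False] * n_pad + [True] * len(s))  # True marks real data
    return padded, mask


# Two hypothetical patterns of different lengths.
patterns = [[1.0, 2.0, 3.0], [4.0]]
padded, mask = zero_pad(patterns)
print(padded)
print(mask)
```

Note that zero is only a safe padding value when it cannot be confused with a real observation, which is why the mask is kept alongside the padded data.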

Is it possible to sort the results of the prediction?

OR the NN will give the prediction results in descending order based on the prediction values?

I’m not sure I understand. If your model outputs a sequence, why would you need to sort it?

i am sorry ! I should have been more specific.

I meant, for normal cases where the output is not a sequence, can the NN give the prediction results in descending order based on the prediction values?

You can output a prediction probability for each class in a classification problem, then rank the probabilities.

Is that what you mean?

If so, you can use a softmax in the output layer and have one neuron for each class in your problem.

If I have 5 classes and do what you asked to do (using softmax in the output layer and having one neuron for each class), the probabilities I get looks like this for each prediction:

[[ 1.32520108e-05, 7.61212826e-01, 2.38773897e-01, 1.89434655e-08, 1.21214816e-08],

[ 3.46436082e-07, 1.17851084e-03, 9.88936901e-01, 8.01233668e-03, 1.87186315e-03],……..]

and these values are not in any order.

So how can I rank them in an order?

The probabilities will be in the order of the classes (e.g. 1-5 ) for the one hot encoded class values used to train the model.
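One way to rank them is to pair each probability with its class and sort the pairs. This is a minimal sketch using the first row of probabilities from the comment above; the class labels 1 to 5 are an assumption standing in for whatever labels were one hot encoded for training.

```python
class_labels = [1, 2, 3, 4, 5]  # assumed labels, in one hot encoding order
probabilities = [
    1.32520108e-05, 7.61212826e-01, 2.38773897e-01,
    1.89434655e-08, 1.21214816e-08,
]

# Pair each class with its probability, then sort from most to least likely.
ranked = sorted(
    zip(class_labels, probabilities),
    key=lambda pair: pair[1],
    reverse=True,
)
for label, probability in ranked:
    print(label, probability)
```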

Hi Jason,

I have a problem where I have training data of tag-ids and I would like to extract the pattern by learning from it. Which models are suitable to train on this sort of data? I see this as an unsupervised learning problem, and in the current scenario we solve it with the help of regular expressions.

Tag-ids are in this format

eg:

400-SG-01002-A600

50-SG-01010-A600/B1

V-0514

STEEL-ETAGE-1-FRMW

Given a collection of words, I should be able to find out which word is a tag-id based on the learning

This framework will help you define your problem in terms of predictive modeling:

http://machinelearningmastery.com/how-to-define-your-machine-learning-problem/

Hi Jason,

Thank you for all your material.

I’m new on this area and I’m looking for help.

The LSTM models I found to study always work with only one feature, but I would like to give more classes as input to the network.

To be more specific, I would like to use as input and output to the network: [FeatureA][FeatureB][FeatureC].

FeatureA is a categorical class with 100 different possible values.

FeatureB and FeatureC are categorical classes too but only have 5 unique values.

Any suggestions or tutorials on how to do this?

A class is an output, not input.

Here is an example of an LSTM with multiple inputs:

https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/

If you have categorical inputs, you can use a one hot encoding or integer encoding prior to modeling.
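A one hot encoding can be sketched in a few lines of plain Python. The helper name `one_hot_encode` and the example values are assumptions made for illustration; in practice a library encoder would normally be used instead.

```python
def one_hot_encode(values):
    """Map each categorical value to a binary vector with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1  # mark the position of this category
        encoded.append(vector)
    return encoded, categories


# Hypothetical categorical feature with three unique values.
encoded, categories = one_hot_encode(["b", "a", "c", "a"])
print(categories)
print(encoded)
```

A feature with 100 possible values would produce vectors of length 100, so an integer encoding may be worth comparing for features with many categories.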

For multi-class outputs, you can use a Dense layer on the output after the LSTMs with softmax and one neuron per output class, here is an example of a dense without the lstm for multi-class classification:

https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/

Hello Jason,

Thank you for the article.

I have N datasets and each data-set has 3 features and 1 target. All the features and target have X data points in time. I want to train a LSTM on 80% of datasets and test on rest 20%.

My problem is not exactly forecasting but multiple sequences to sequence prediction. Could you please tell me how to set the input shape my data set.

dataset1 –>

[feature1 –> [0,1,2]

feature2 –> [4,6,8]

feature3 –> [3,5,7]

target –> [1,1,2] ]

Thank you

This will help you shape your data:

https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm

Thank you. So, based on the articles, am i correct in setting the shape of the input data as (number of train datasets, length of any feature array, number of features) ?

Do you have any article that dealt with this kind of example?

Thank you again for all the articles you shared. They are very informative.

Generally, I cannot comment on “correctness” without getting deeply involved in your project.

I would recommend reviewing how to prepare data for the LSTM, perhaps reviewing what has worked on other problems, then try a suite of ways of framing the problem to see what works best for your specific case.
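As a starting point only, the sketch below arranges the example dataset from the question into the three-dimensional [samples, timesteps, features] layout that LSTM inputs generally use. It is plain Python, the helper name `to_timesteps` is hypothetical, and treating each dataset as one sample is an assumption about the framing, not a confirmed answer for this project.

```python
# The example dataset from the question, one list per feature.
dataset1 = {
    "feature1": [0, 1, 2],
    "feature2": [4, 6, 8],
    "feature3": [3, 5, 7],
}


def to_timesteps(dataset, feature_names):
    """Turn per-feature lists into one row of feature values per time step."""
    columns = [dataset[name] for name in feature_names]
    return [list(step) for step in zip(*columns)]


feature_names = ["feature1", "feature2", "feature3"]
samples = [to_timesteps(dataset1, feature_names)]  # one sample per dataset

# (number of samples, number of time steps, number of features)
shape = (len(samples), len(samples[0]), len(samples[0][0]))
print(samples[0])
print(shape)
```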

Thank you

Hi

You have best site and best article I learn a lot of solution.

I have question: my data set is numbers and i need predict after number from previous numbers and just 4 targets tar[54,26,18,32] which sequence is true for data set?

Thanks.

Sounds like a many-to-one sequence prediction problem.

806,046,009,905??????????????

What was the problem exactly?

is the sequence prediction algorithm same the Convolutional neural network algorithm?

or it has the same idea

You can use either LSTMs or 1D CNNs for sequence prediction.

Hi!

I have this problem:

https://ai.stackexchange.com/questions/6741/regression-with-more-than-one-output-neural-network

What kind of sequence do you think it could be?

Perhaps you can summarize your problem in a sentence for me?

I tried to shorten my problem description, but I couldn’t make it fit in one sentence because I felt there was too much to say. Hope you don’t mind.

There is a system in which researchers receive a classification that can be C, B, A or A1, where C is the lowest and A1 is the highest.

This classification is based on the number of products that the researcher has in his profile.

I want to make a recommendation of the number of products that a researcher must do to improve their classification within the system, taking into account the number of products and the classification that they currently have.

Sounds like a constraint optimization problem rather than a machine learning problem.

I’d recommend looking into the field of ‘operations research’ and their methods for constraint optimization.

The thing is that the recommendations must be personalized, according to the profile of the researcher. Because there are several categories of products, some are mandatory to go up in category, but besides mandatory products you can choose among several.

For example, if an investigator is a lawyer, it should be unlikely that the system would suggest making products related to medicine, or it might suggest it, in case there is activity of that type in his profile.

These sound like constraints in an optimization algorithm, like a bin packing problem or knapsack problem. It does not sound like a recommender system, but I could be wrong.

Hi Jason,

I have a problem which, according to me, does not fit any of the above situations.

Given a disparate set of entries, and a sequence as an output, is it possible to predict what the sequence would be with a different set of entries?

For example:

(a,b,c,d) always gives [d,a,b,c]

(a,c,b,d) also always gives [d,a,b,c]

and so on

Assuming it is trained with every possible letter, I want to know what (a,c,d,e) would give, for example.

One approach I had was to convert this to a sequence to sequence matching problem by feeding in every permutation of the inputs as a sequence, and matching it to the output, but in such a scenario I may not require NN in the first place.

Do you have any insight to offer on this?

Perhaps this framework will help you understand whether your problem can be framed as a supervised learning problem:

http://machinelearningmastery.com/how-to-define-your-machine-learning-problem/

Can I use this to store data of over 50 years and use it to predict what could happen in the tenth year? If so, how can I do it?

Sure, you could try.

Once you fit the model, you can call model.predict() to make an out of sample forecast.

I have tens of examples on the blog, try the search.