How to Learn to Echo Random Integers with LSTMs in Keras

Long Short-Term Memory (LSTM) Recurrent Neural Networks are able to learn the order dependence in long sequence data.

They are a fundamental technique used in a range of state-of-the-art results, such as image captioning and machine translation.

They can also be difficult to understand, specifically how to frame a problem to get the most out of this type of network.

In this tutorial, you will discover how to develop a simple LSTM recurrent neural network to learn how to echo back the number in an ad hoc sequence of random integers. Although the problem is trivial, developing this network will provide the skills needed to apply LSTMs to a range of sequence prediction problems.

After completing this tutorial, you will know:

  • How to develop an LSTM for the simpler problem of echoing any given input.
  • How to avoid the beginner’s mistake when applying LSTMs to sequence problems like echoing integers.
  • How to develop a robust LSTM to echo the last observation from ad hoc sequences of random integers.

Let’s get started.

  • Update Jan/2020: Updated API for Keras 2.3 and TensorFlow 2.0.

Overview

This tutorial is divided into 4 parts; they are:

  1. Generate and Encode Random Sequences
  2. Echo Current Observation
  3. Echo Lag Observation Without Context (Beginner’s Mistake)
  4. Echo Lag Observation

Environment

This tutorial assumes you have a Python SciPy environment installed. You can use either Python 2 or 3 with this example.

This tutorial assumes you have Keras v2.0 or higher installed with either the TensorFlow or Theano backend. You do not need a GPU for this tutorial; all code will easily run on a CPU.

This tutorial also assumes you have scikit-learn, Pandas, NumPy, and Matplotlib installed.

Generate and Encode Random Sequences

The first step is to write some code to generate a random sequence of integers and encode them for the network.

Generate Random Sequence

We can generate random integers in Python using the randint() function that takes two parameters indicating the range of integers from which to draw values.

In this tutorial, we will define the problem as having integer values between 0 and 99 with 100 unique values.

We can put this in a function called generate_sequence() that will generate a sequence of random integers of the desired length, with the default length set to 25 elements.

This function is listed below.
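A minimal sketch of such a function, assuming randint() comes from Python's standard random module (where both endpoints are inclusive, so randint(0, 99) can return any of the 100 values):

```python
from random import randint

# generate a sequence of random integers in [0, 99]
def generate_sequence(length=25):
    return [randint(0, 99) for _ in range(length)]

# draw one sequence with the default length of 25
print(generate_sequence())
```

The printed values will differ on every run because no seed is fixed.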

One Hot Encode Random Sequence

Once we have generated sequences of random integers, we need to transform them into a format that is suitable for training an LSTM network.

One option would be to rescale the integer to the range [0,1]. This would work and would require that the problem be phrased as regression.

I am interested in predicting the right number, not a number close to the expected value. This means I would prefer to frame the problem as classification rather than regression, where the expected output is a class and there are 100 possible class values.

In this case, we can use a one hot encoding of the integer values, where each value is represented by a 100-element binary vector that is all “0” values except at the index of the integer, which is marked “1”.

The function below called one_hot_encode() defines how to iterate over a sequence of integers and create a binary vector representation for each and returns the result as a 2-dimensional array.
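One way such an encoder might look is sketched here. The n_unique parameter name and its default of 100 are assumptions chosen to match the problem definition above.

```python
from numpy import array

# one hot encode a sequence of integers as a 2D array [length, n_unique]
def one_hot_encode(sequence, n_unique=100):
    encoding = []
    for value in sequence:
        # start with all zeros, then mark the index of the integer
        vector = [0 for _ in range(n_unique)]
        vector[value] = 1
        encoding.append(vector)
    return array(encoding)

encoded = one_hot_encode([3, 0, 99])
print(encoded.shape)  # (3, 100)
```

Each row holds exactly one 1, so the row sums are all 1 and the array has one row per integer in the sequence.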

We also need to decode the encoded values so that we can make use of the predictions, in this case, just review them.

The one hot encoding can be inverted by using the argmax() NumPy function that returns the index of the value in the vector with the largest value.

The function below, named one_hot_decode(), will decode an encoded sequence and can be used to later decode predictions from our network.
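A minimal sketch of the decoder, assuming the encoded sequence is a 2D NumPy array with one vector per row. The small 5-class vectors are made up for illustration only.

```python
from numpy import argmax, array

# decode a one hot encoded sequence back to a list of integers
def one_hot_decode(encoded_seq):
    return [int(argmax(vector)) for vector in encoded_seq]

# the last row mimics a softmax output rather than a strict one hot vector
encoded = array([[0, 0, 1, 0, 0],
                 [1, 0, 0, 0, 0],
                 [0.1, 0.2, 0.1, 0.5, 0.1]])
print(one_hot_decode(encoded))  # [2, 0, 3]
```

Because argmax() only looks for the largest value, the same function works on the probability vectors a softmax output layer produces.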

Complete Example

We can tie all of this together.

Below is the complete code listing for generating a sequence of 25 random integers and encoding each integer as a binary vector.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example first prints the list of 25 random integers, followed by a truncated view of the binary representations of all integers in the sequence, one vector per line, then the decoded sequence again.

Now that we know how to prepare and represent random sequences of integers, we can look at using LSTMs to learn them.

Echo Current Observation

Let’s start out by looking at a simpler echo problem.

In this section, we will develop an LSTM to echo the current observation. That is, given a random integer as input, return the same integer as output.

Or slightly more formally stated as:

yhat(t) = f(X(t))

That is, the model is to predict the value at the current time (yhat(t)) as a function (f()) of the observed value at the current time (X(t)).

It is a simple problem because no memory is required, just a function to map an input to an identical output.

It is a trivial problem and will demonstrate a few useful things:

  • How to use the problem representation machinery above.
  • How to use LSTMs in Keras.
  • The capacity of an LSTM required to learn such a trivial problem.

This will lay the foundation for the echo of lag observations next.

First, we will develop a function to prepare a random sequence ready to train or evaluate an LSTM. This function must first generate a random sequence of integers, use a one hot encoding, then transform the input data to be a 3-dimensional array.

LSTMs require a 3D input comprised of the dimensions [samples, timesteps, features]. Our problem will be comprised of 25 examples per sequence, 1 time step, and 100 features for the one hot encoding.

This function is listed below, named generate_data().
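A sketch of what generate_data() could look like for this framing follows. The helper functions are repeated so the block runs on its own; their exact bodies are assumptions consistent with the descriptions earlier in the post.

```python
from random import randint
from numpy import array

# generate a sequence of random integers in [0, 99]
def generate_sequence(length=25):
    return [randint(0, 99) for _ in range(length)]

# one hot encode a sequence as a 2D array [length, n_unique]
def one_hot_encode(sequence, n_unique=100):
    encoding = []
    for value in sequence:
        vector = [0] * n_unique
        vector[value] = 1
        encoding.append(vector)
    return array(encoding)

# generate one sequence and frame it as an echo problem
def generate_data():
    sequence = generate_sequence()
    encoded = one_hot_encode(sequence)
    # reshape input to [samples, timesteps, features]
    X = encoded.reshape(encoded.shape[0], 1, encoded.shape[1])
    # the target is the same encoded sequence, kept 2D
    return X, encoded

X, y = generate_data()
print(X.shape, y.shape)  # (25, 1, 100) (25, 100)
```

The input gains a time step dimension of size 1, while the output stays two-dimensional because the Dense output layer emits one 100-element vector per sample.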

Next, we can define our LSTM model.

The model must specify the expected dimensionality of the input data. In this case, in terms of timesteps (1) and features (100). We will use a single hidden layer LSTM with 15 memory units.

The output layer is a fully connected layer (Dense) with 100 neurons for the 100 possible integers that may be output. A softmax activation function is used on the output layer to allow the network to learn and output the distribution over the possible output values.

The network will use the log loss function (categorical cross entropy) while training, suitable for multi-class classification problems, and the efficient Adam optimization algorithm. The accuracy metric will be reported each training epoch to give an idea of the skill of the model in addition to the loss.

We will fit the model manually by running each epoch by hand with a newly generated sequence. The model will be fit for 500 epochs, or stated another way, trained on 500 randomly generated sequences.

This will encourage the network to learn to reproduce the actual input rather than memorizing a fixed training dataset.

Once the model is fit, we will make a prediction on a new sequence and compare the predicted output to the expected output.

The complete example is listed below.
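The following is a sketch of the model and training loop described above, not a definitive listing. It assumes the tensorflow.keras import paths and declares the input shape with an Input layer; the compact generate_data() helper is a stand-in for the functions described earlier. The text trains for 500 epochs; the sketch uses 50 so that it runs quickly.

```python
from random import randint
from numpy import array, argmax
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# build one random sequence as one hot encoded X (3D) and y (2D)
def generate_data(length=25, n_unique=100):
    sequence = [randint(0, n_unique - 1) for _ in range(length)]
    encoded = array([[1 if i == value else 0 for i in range(n_unique)]
                     for value in sequence])
    return encoded.reshape(length, 1, n_unique), encoded

# single hidden LSTM layer with 15 units and a 100-way softmax output
model = Sequential()
model.add(Input(shape=(1, 100)))
model.add(LSTM(15))
model.add(Dense(100, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# one newly generated sequence per epoch (the text uses 500 epochs)
n_epochs = 50
for _ in range(n_epochs):
    X, y = generate_data()
    model.fit(X, y, epochs=1, verbose=0)

# compare expected and predicted integers on a fresh sequence
X, y = generate_data()
yhat = model.predict(X, verbose=0)
print('Expected: ', [int(v) for v in argmax(y, axis=1)])
print('Predicted:', [int(v) for v in argmax(yhat, axis=1)])
```

With only 50 epochs the two printed lists may not match yet; raising n_epochs toward the 500 used in the text gives the network more sequences to learn from.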

Running the example prints the log loss and accuracy each epoch.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

The network is a little over-prescribed, having more memory units and training epochs than is required for such a simple problem, and you can see this by the fact that the network quickly achieves 100% accuracy.

At the end of the run, the predicted sequence is compared to a randomly generated sequence and the two look identical.

Now that we know how to use the tools to create and represent random sequences and to fit an LSTM to learn to echo the current observation, let’s look at how we can use LSTMs to learn how to echo a past observation.

Echo Lag Observation Without Context
(The Beginner’s Mistake)

The problem of predicting a lag observation can be more formally defined as follows:

yhat(t) = f(X(t-n))

Where the expected output for the current time step yhat(t) is defined as a function (f()) of a specific previous observation (X(t-n)).

The promise of LSTMs suggests that you can show examples to the network one at a time and that the network will use internal state to learn and to sufficiently remember prior observations in order to solve this problem.

Let’s try this out.

First, we must update the generate_data() function and re-define the problem.

Rather than using the same sequence for input and output, we will use a shifted version of the encoded sequence as input and a truncated version of the encoded sequence as output.

These changes are required in order to take a sequence of numbers, such as [1, 2, 3, 4], and turn them into a supervised learning problem with input (X) and output (y) components, such as:

In this example, you can see that the first and last rows do not contain sufficient data for the network to learn. These rows could be marked with a “no data” value and masked, but a simpler solution is to remove them from the dataset.
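The framing can be illustrated with plain Python lists. This is a sketch under the assumption that X holds the current observation and y holds the observation one step earlier, with None standing in for the missing values.

```python
sequence = [1, 2, 3, 4]

# X is the current observation, y is the observation one step earlier
X = sequence + [None]
y = [None] + sequence

# show every row, including the incomplete first and last rows
for x_t, y_t in zip(X, y):
    print(x_t, y_t)

# keep only the rows where both input and output are present
pairs = [(x_t, y_t) for x_t, y_t in zip(X, y)
         if x_t is not None and y_t is not None]
print(pairs)  # [(2, 1), (3, 2), (4, 3)]
```

A sequence of 4 values therefore yields 3 usable pairs, just as a sequence of 25 values yields 24.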

The updated generate_data() function is listed below:
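A sketch of the updated function, assuming the input drops the first encoded value and the output drops the last one, which matches the shifted and truncated versions described above. The helpers are repeated so the block runs on its own.

```python
from random import randint
from numpy import array, argmax

# generate a sequence of random integers in [0, 99]
def generate_sequence(length=25):
    return [randint(0, 99) for _ in range(length)]

# one hot encode a sequence as a 2D array [length, n_unique]
def one_hot_encode(sequence, n_unique=100):
    return array([[1 if i == value else 0 for i in range(n_unique)]
                  for value in sequence])

# frame one sequence as a lag echo problem with a single time step
def generate_data():
    sequence = generate_sequence()
    encoded = one_hot_encode(sequence)
    # shifted input: drop the first value, then reshape to 3D
    X = encoded[1:, :]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    # truncated output: drop the last value
    y = encoded[:-1, :]
    return X, y

X, y = generate_data()
print(X.shape, y.shape)  # (24, 1, 100) (24, 100)
```

Row i of y decodes to the value that appeared one step before row i of X, so the decoded y sequence is the decoded X sequence delayed by one position.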

We must test out this updated representation of the data to confirm it does what we expect. To do this, we can generate a sequence and review the decoded X and y values over the sequence.

The complete code listing for this sanity check is provided below.

Running the example prints the X and y components of the problem framing.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that the first pattern will be hard (impossible) for the network to learn given the cold start. We can also see the expected pattern of yhat(t) == X(t-1) down through the data.

The network design is similar, but with one small change.

Observations are shown to the network in small batches and a weight update is performed after each batch. Because we expect the state between observations to carry the information required to learn the prior observation, we need to ensure that this state is not reset after each batch (in this case, one batch is a small group of consecutive samples from the sequence). We can do this by making the LSTM layer stateful and manually managing when the state is reset.

This involves setting the stateful argument to True on the LSTM layer and defining the input shape using the batch_input_shape argument that includes the dimensions [batchsize, timesteps, features].

There are 24 X,y pairs for a given random sequence, therefore a batch size of 6 was used (4 batches of 6 samples = 24 samples). Remember, a sequence is broken down into samples, and samples can be shown to the network in batches before an update to the network weights is performed. A large network size of 50 memory units is used, again to over-prescribe the capacity needed for the problem.

Next, after each epoch (one iteration of a randomly generated sequence), the internal state of the network can be manually reset. The model is fit for 2,000 training epochs and care is taken not to shuffle the samples within a sequence.

Putting this all together, the complete example is listed below.

Running the example gives a surprising result.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

The problem cannot be learned and training ends with a model with 0% accuracy of echoing the last observation in the sequence.

How can this be?

The Beginner’s Mistake

This is a common mistake made by beginners, and if you have been around the block with RNNs or LSTMs, then you would have spotted this error above.

Specifically, the power of LSTMs does come from the learned internal state maintained, but this state is only powerful if it is trained as a function over past observations.

Stated another way, you must provide the network the context for the prediction (e.g. the observations that may contain the temporal dependence) as time steps on the input.

The above formulation trained the network to learn the output as a function only of the current input value, as in the first example:

yhat(t) = f(X(t))

Not as a function of the last n observations, or even just the previous observation, as we require:

yhat(t) = f(X(t-1))

The LSTM does only need one input at a time in order to learn this unknown temporal dependence, but it must perform backpropagation over the sequence in order to learn this dependence. You must provide the past observations of the sequence as context.

You are not defining a window (as in the case of a Multilayer Perceptron, where each past observation is a weighted input); instead, you are defining an extent of historical observations from which the LSTM will attempt to learn the temporal dependence (f(X(t-1), … X(t-n))).

To be clear, this is the beginner’s mistake when using LSTMs in Keras, and not necessarily in general.

Echo Lag Observation

Now that we have navigated around a common pitfall for beginners, we can develop an LSTM to echo the previous observation.

The first step is to reformulate the definition of the problem, again.

We know that the network only requires the last observation as input in order to make correct predictions. But we want the network to learn which of the past observations to echo in order to correctly solve this problem. Therefore, we will provide a subsequence of the last 5 observations as context.

Specifically, if our sequence contains: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], the X,y pairs would look as follows:

In this case, you can see that the first 5 rows and the last 1 row do not contain enough data, so in this case, we will remove them.
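The windowed framing can be sketched with plain lists. This illustration assumes each input row is the window of 5 observations ending at the current time step and the output is the observation one step before the end of that window; None marks values that do not exist. The sketch stops at the final observation, so it has no trailing incomplete row to remove.

```python
sequence = list(range(1, 11))
n_steps = 5

rows = []
for t in range(len(sequence)):
    # window of the n_steps observations ending at time t, padded with None
    window = [sequence[t - k] if t - k >= 0 else None
              for k in range(n_steps - 1, -1, -1)]
    # target is the previous observation, X(t-1)
    target = sequence[t - 1] if t >= 1 else None
    rows.append((window, target))

# drop the first 5 rows and print the pairs that remain
for window, target in rows[n_steps:]:
    print(window, target)
```

The first pair printed is [2, 3, 4, 5, 6] with target 5 and the last is [6, 7, 8, 9, 10] with target 9, so the value to echo always sits in the second-to-last position of the window.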

We will use the Pandas shift() function to create shifted versions of the sequence and the Pandas concat() function to recombine the shifted sequences back together. We will then manually exclude the rows that are not viable.

The updated generate_data() function is listed below.
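A sketch of how this version of generate_data() might look, using the Pandas shift() and concat() calls mentioned above. The slice used for y is an assumption chosen so that each target is the X(t-1) value of its window; the helpers are repeated so the block runs on its own.

```python
from random import randint
from numpy import array, argmax
from pandas import DataFrame, concat

# generate a sequence of random integers in [0, 99]
def generate_sequence(length=25):
    return [randint(0, 99) for _ in range(length)]

# one hot encode a sequence as a 2D array [length, n_unique]
def one_hot_encode(sequence, n_unique=100):
    return array([[1 if i == value else 0 for i in range(n_unique)]
                  for value in sequence])

# frame one sequence as a lag echo problem with 5 time steps of context
def generate_data():
    sequence = generate_sequence()
    encoded = one_hot_encode(sequence)
    # create lag inputs, oldest observation first
    df = DataFrame(encoded)
    df = concat([df.shift(4), df.shift(3), df.shift(2), df.shift(1), df], axis=1)
    # remove non-viable rows
    values = df.values[5:, :]
    # convert to 3D input [samples, timesteps, features]
    X = values.reshape(len(values), 5, 100)
    # align the output so each row is the previous observation of its window
    y = encoded[4:-1, :]
    return X, y

X, y = generate_data()
print(X.shape, y.shape)  # (20, 5, 100) (20, 100)
```

Each of the 20 samples carries 5 one hot encoded time steps, and the target for a sample matches the fourth time step of its own window.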

Again, we can sanity check this updated function by generating a sequence and comparing the decoded X,y pairs. The complete example is listed below.

Running the example shows the context of the last 5 values as input and the last prior observation (X(t-1)) as output.

We can now develop an LSTM to learn this problem.

There are 20 X,y pairs for a given sequence; therefore, a batch size of 5 was chosen (4 batches of 5 examples = 20 samples).

The same structure was used with 50 memory units in the LSTM hidden layer and 100 neurons in the output layer. The network was fit for 2,000 epochs with internal state reset after each epoch.

The complete code listing is provided below.

Running the example shows that the network can fit the problem and correctly learn to return the X(t-1) observation as prediction within the context of 5 prior observations.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Example output is provided below.

Extensions

This section lists some extensions to the experiments in this tutorial.

  • Ignore Internal State. Care was taken to preserve internal state of the LSTMs across samples within a sequence by manually resetting state at the end of the epoch. We know that the network already has all the context and state required within each sample via timesteps. Explore whether the additional cross-sample state adds any benefit to the skill of the model.
  • Mask Missing Data. During data preparation, rows with missing data were removed. Explore the use of marking missing values with a special value (e.g. -1) and seeing whether the LSTM can learn from these examples. Also explore the use of a Masking layer as input and explore masking out missing values.
  • Entire Sequence as Timesteps. Only the last 5 observations were provided as context from which to learn to echo. Explore using the entire random sequence as context for each sample, built up as the sequence unfolds. This may require padding and even masking of missing values to meet the expectation of fixed-sized inputs to the LSTM.
  • Echo Different Lag Value. A specific lag value (t-1) was used in the echo example. Explore using a different lag value in the echo and how this affects properties such as model skill, training time, and LSTM layer size. I would expect that each lag could be learned using the same model structure.
  • Echo Lag Sequence. The network was trained to echo a specific lag observation. Explore variations where a lag sequence is echoed. This may require the use of the TimeDistributed layer on the output of the network to achieve sequence to sequence prediction.

Did you explore any of these extensions?
Share your findings in the comments below.

Summary

In this tutorial, you discovered how to develop an LSTM to address the problem of echoing a lag observation from a random sequence of integers.

Specifically, you learned:

  • How to generate and encode test data for the problem.
  • How to avoid the beginner’s mistake when attempting to address this and similar problems with LSTMs.
  • How to develop a robust LSTM to echo integers in an ad hoc sequence of random integers.

Do you have any questions?
Ask your questions in the comments and I will do my best to answer.

42 Responses to How to Learn to Echo Random Integers with LSTMs in Keras

  1. Shirish Ranade June 9, 2017 at 12:22 pm #

    Wow,

    That is neat. 100% accuracy !!

    Will this work with binomial classification problem as well?

    • Jason Brownlee June 10, 2017 at 8:12 am #

      This is a special case of a well defined small problem.

  2. MOHD SAIFUL BAHRI IBRAHIM October 4, 2017 at 12:18 pm #

    Very good …thank u

  3. Scott November 7, 2017 at 8:59 am #

    Jason, would you clarify the remark, “You are not defining a window (as in the case of Multilayer Perceptron where each past observation is a weighted input); instead, you are defining an extent of historical observations”? It seems that defining a window of weighted inputs is exactly what we are doing: We are feeding in a series of windows, and the LSTM is learning to pick out one element from that window.

  4. Aditya February 22, 2018 at 8:39 pm #

    Hi Jason

    The post is really insightful, especially the part which covers the Beginner’s mistake.
    I have one lingering question though. In the last section, where we are trying to echo the lag observation, how does an LSTM model provide an advantage over the Multilayer Perceptron?
    We could have considered the time steps to be input features for an MLP and then trained it to learn the correct weights for those inputs and get the correct predicted output.

    Basically, I want to understand the case where using an LSTM based RNN will be advantageous since an MLP model would be unsuccessful. The echo lag example doesn’t really seem to show the advantage of an LSTM model to me.

    • Jason Brownlee February 23, 2018 at 11:55 am #

      The MLP must be exposed to all time steps as features at once, where as the LSTM sees only one time step at a time and accumulates state over time steps in order to create an output.

      That is the key difference.

      • Aditya February 24, 2018 at 7:05 pm #

        Thanks Jason, I understand the difference in the implementation. However, since the echo lag problem could also be solved by using an MLP model with all time steps as features, I wanted to understand what advantage does an LSTM offer over it?
        Specifically, are there any sequences which can not be learnt with an MLP model (by using time steps as input features)?

  5. anurag October 30, 2018 at 12:48 am #

    awesome article.. thnx

  6. Scholes December 8, 2018 at 3:14 am #

    Very helpeful article, i thank you sir for the efforts.

    I’m bit new to LSTM but i kinda understood how it works, But i have a problem in my interpretation :

    – I’d like to predict the next random output from “yhat”, so i did a little loop which will predict and append the list of test each time (So it will predict more future outputs) in other meaning each “yhat” list becomes “X”list in every iteration which i thought it means it will predict the next output.

    But my problem is it stays blocked in the same X list that i gaved in the begening :

    The list of predicted numbers gets smaller despite i did append each time, so i cant have future predictions.

    Would you like i show the piece of code of that and output? it will be so helpfull for me what you did

    Thank you again

    • Jason Brownlee December 8, 2018 at 7:12 am #

      Nice work!

      • Scholes December 11, 2018 at 1:40 am #

        Good morning, I guess you didnt understand me lol sorry i dont explain well

        In every iteration each X_list is the previous yhat predicted list. means :

        X(n iteration) = yhat(n-1) , here’s the execution so you can have an idea about my problem -> :

        (‘Interation = ‘, 6)
        X file: [20, 38, 19, 48, 28, 43, 50, 14, 25, 28, 20, 19, 44, 16, 9]
        Predicted: [48, 28, 43, 50, 14, 25, 28, 20, 19, 44, 16, 9, 25, 13, 15]
        (‘Interation = ‘, 7)
        X file: [28, 43, 50, 14, 25, 28, 20, 19, 44, 16]
        Predicted: [14, 25, 28, 20, 19, 44, 16, 9, 25, 13]
        (‘Interation = ‘, 8)
        X file: [25, 28, 20, 19, 44]
        Predicted: [19, 44, 16, 9, 25]
        (‘Interation = ‘, 9)
        X file: []
        Predicted: []

        My idea was to keep predict future outputs each time we give it a predicted number, but unfortunately it shows a small vicious cycle list . if you can explain me please for my educational project. i let show you the piece of code i did about that :

        for i in range(1,10): #ITERATE 10 TIMES
            yhat_list = one_hot_decode(yhat) #Read last predicted list as an input for next prediction
            encoded = one_hot_encode(yhat_list)
            # create lag inputs
            df = DataFrame(encoded)
            df = concat([df.shift(4), df.shift(3), df.shift(2), df.shift(1), df], axis=1)
            # remove non-viable rows
            values = df.values
            values = values[5:,:]
            # convert to 3d for input
            X = values.reshape(len(values), 5, 100)
            yhat = model.predict(X, batch_size=5)
            print("Interation = ", i)
            print('X file: %s' % one_hot_decode(X))
            print('Predicted: %s' % one_hot_decode(yhat))

        THANK YOU SO MUCH FOR YOUR EXPLICATIONS

        • Jason Brownlee December 11, 2018 at 7:48 am #

          I don’t follow. What is the problem you’re having exactly?

          • Scholes December 13, 2018 at 12:49 am #

            I want to continue predicting the next 100 outputs of “yhat” , so i pass it each time as X list in each iteration. i thought it will predict new outputs in each time.

            But unfortunately the loop never progress and keep getting smaller and predicting already known numbers. (as i showed you in the previous example )

            for i in range(1,100): #ITERATE 10 TIMES
                yhat_list = one_hot_decode(yhat) #Read last predicted list as an input for next prediction
                encoded = one_hot_encode(yhat_list)
                # create lag inputs
                df = DataFrame(encoded)
                df = concat([df.shift(4), df.shift(3), df.shift(2), df.shift(1), df], axis=1)
                # remove non-viable rows
                values = df.values
                values = values[5:,:]
                # convert to 3d for input
                X = values.reshape(len(values), 5, 100)
                yhat = model.predict(X, batch_size=5)
                print("Interation = ", i)
                print('X file: %s' % one_hot_decode(X))
                print('Predicted: %s' % one_hot_decode(yhat))

          • Jason Brownlee December 13, 2018 at 7:54 am #

            I have a number of posts on multi-step predicting with LSTMs, perhaps start here:
            https://machinelearningmastery.com/start-here/#deep_learning_time_series

  7. Scholes December 13, 2018 at 11:33 pm #

    ok thank you jason i’ll check that out , best regards

  8. brandy January 31, 2019 at 8:19 am #

    Good afternoon Jason, I want to replace the random generator with my own dataset and iterate through it and get the most common numbers, it’s in a csv file how would I go about doing the in the code? Thanks in advanced

    • Jason Brownlee January 31, 2019 at 2:15 pm #

      I cannot write a custom example for you. What problem are you having exactly?

  9. jmaidagan April 2, 2019 at 8:46 pm #

    Very interesting your blog, I learned a lot reading it.

    I think the solution to the interesting problem that you have raised has nothing to do with the long LSTM memory. The result you are looking for is being supplied in axis 2 of the input. You can verify it through ‘stateful = False’ on line 47 of your code: the convergence to acc = 100% is even faster!
    On the other hand, what you call “a common pitfall for beginners” is exactly the way that is advised in https://keras.io/examples/lstm_stateful/. The calculation (tsteps = 2 lahead = 1) shows that, although the problem is not solved, there is an appreciable difference between stateful = True / False.
    From my point of view, the crucial question is:
    It is possible to train an LSTM so that (in production regime) it is capable of producing the echo receiving only a random sequence, a number at each time?
    This problem would illuminate the LSTM advantage over other networks, since it is impossible to solve it without memory.
    Regards

  10. Christophe May 8, 2019 at 3:19 pm #

    Hi Jason – May I ask why you used a custom-coded one-hot encoding rather than the keras function to_categorical?

    Also what would be the implementation if the n_unique value is extremely large?

    Thanks.

    • Jason Brownlee May 9, 2019 at 6:35 am #

      I don’t recall why, sorry.

      How large? It is common to one hot encode on NLP problems up to tens of thousands of tokens, or more.

      Also, embeddings work amazingly well for large cardinality categorical variables.

  11. aswin June 20, 2020 at 5:17 am #

    how to predict next number using this program?

  12. Shubham Chauhan July 3, 2020 at 12:40 pm #

    sir your code is working perfectly but I am not able to predict next number if I want to give my own choice of X . By using model.predict () Suppose if I want to give 34 then what will be the next number ? Sir plzzz its a two line of code plzz write in the comment section

    • Jason Brownlee July 3, 2020 at 2:24 pm #

      Perhaps start with the working code and adapt it for your required change.

  13. João Victor November 16, 2020 at 12:33 am #

    But, how can I predict the future random numbers? That’s not clear to me. Because we already put the random numbers as input, so how to predict the next one?

  14. Konradino December 14, 2020 at 2:37 am #

    If I want to limit the number of generated numbers from 25 to 5, or even to 6 – what other factors should be modified? Change just in amount of elements in generated numbers generates an error. BTW: going to buy your e-book, thanks!

    • Jason Brownlee December 14, 2020 at 6:20 am #

      Yes, change the definition of the problem, and the encoding.

      You might need to tune the model and learning hyperparameters for the change in difficulty of the problem.

  15. Darius.Nguyen March 8, 2021 at 6:19 pm #

    First of all, your post gives me more knowledge. But I have a question? When I test your model with [a,b,c,d,e], I try to change e with many numbers and keep a,b,c,d but your model can predict the right number d. In my mind, I think when we change some number, d of predict should be changed? Is it right?

  16. Marcos Berti October 1, 2022 at 4:38 am #

    Dear Jason,

    I already bought two of your e-books and learned a lot, both were on sequence and LSTM applications. Following your emails, I found this Echo Lag Observation fantastic. I´ve executed the Python code and is really interesting, around 500 epochs the accuracy reaches 1. But I need your help, cause I tried to predict the next value of the sequence, and I really don´t have the skills to do it. The program predicts exactly what it´s presented as the X input. I need to predict the next number in that sequence, and I really don´t know how to do it. Please, can you send me the code to predict the next value in the sequence?

    the last statements are:

    yhat = model.predict(X, batch_size=5)
    print('Expected: %s' % one_hot_decode(y))
    print('Predicted: %s' % one_hot_decode(yhat))

    Best regards,

    Marcos

  17. Marcos Berti October 1, 2022 at 11:57 pm #

    Thank you James.
    I want to use this Echo Random described here in this doc, the only thing I don´t know how to do is given a sequence of numbers, how to predict the next number of that sequence. This code shown by Jason predicts exactly what was presented to the algorithm. What I need to know is what are the code to predict the next number of the sequence. All examples that I see the algorithm predicts exactly the same sequence. I just need to get the next number of the sequence. I don’t know how.

  18. Tariq A April 5, 2024 at 12:47 pm #

    Thank you for the blog! Indeed, it works great for any length of random integers.

    I wanted to ask.. If I use conventional method and apply LSTM, I split the dataset into training and testing, and after prediction, I can validate the training and make prediction of the next values in the sequence too.

    In this case of random integers, if I input a certain random sequence to this mode, how do I predict the next values in the sequence?

    • James Carmichael April 7, 2024 at 7:16 am #

      Hi Tariq…Predicting the next values in a sequence of random integers using an LSTM (Long Short-Term Memory) model is an intriguing task because random sequences by definition do not have predictable patterns or dependencies that traditional time series or sequence prediction models exploit. However, if the sequence is pseudo-random or contains hidden patterns or dependencies not immediately apparent, LSTM models might be able to capture some of these characteristics.

      Here’s a general approach to how you might attempt to predict the next values in a sequence of integers using an LSTM model:

      1. Preparing the Dataset
      – Sequence Creation: Convert the sequence of random integers into a supervised learning problem. This typically involves creating input-output pairs where the inputs are sequences of integers and the outputs are the next integer(s) in the sequence. For example, from a sequence [x_1, x_2, x_3, x_4, …], you can create input-output pairs like ([x_1, x_2, x_3], x_4).
      – Normalization: Depending on the range of integers, you might need to normalize or scale your data to help the LSTM model perform better.

      2. Model Design
      – Input Layer: Design your LSTM with an input layer that matches the dimension of your data. For a sequence input, this is typically the sequence length and number of features (e.g., (sequence_length, 1) for a single-feature sequence).
      – LSTM Layers: Add one or more LSTM layers. The complexity of the model can be adjusted depending on the dataset size and the computational resources available.
      – Output Layer: Since you are predicting integers, the output layer could be a dense layer with a linear activation function (if predicting the next integer as a regression problem) or a softmax activation layer (if classifying into categories).

      3. Training the Model
      – Loss Function: Use MSE (Mean Squared Error) for regression problems or cross-entropy for classification problems.
      – Optimizer: Common choices include Adam or SGD (Stochastic Gradient Descent).
      – Epochs and Batches: Choose appropriate values based on your dataset size and overfitting behavior.

      4. Prediction Phase
      – Using Last Known Values: To predict the next integer(s) in the sequence, input the last known values into the model. For instance, if your last input sequence during training was [x_{n-2}, x_{n-1}, x_n], to predict x_{n+1}, you feed [x_{n-2}, x_{n-1}, x_n] into the model.
      – Sequential Prediction: If you want to predict several future steps, you can use the predictions as new inputs. For example, predict x_{n+1}, then use [x_{n-1}, x_n, x_{n+1}] to predict x_{n+2}, and so forth.

      5. Evaluation
      – Evaluate the model’s performance using appropriate metrics (e.g., RMSE for regression).
      – Perform diagnostics to check if the model is merely memorizing the training data or actually capturing useful patterns.

      6. Challenges with Random Data
      – True Randomness: If the data is truly random, you will likely find that the LSTM model does not perform well in predicting future values since there are no patterns to learn.
      – Pseudo-random Patterns: If there are underlying patterns or the sequence is generated through deterministic pseudo-random algorithms, the LSTM might capture some of these.

      Using an LSTM to predict future values in a sequence of random integers is more of an experimental than a practical approach, given the nature of randomness. If you’re exploring this as a theoretical exercise or for learning purposes, it can provide valuable insights into sequence modeling and LSTM capabilities.
