Difference Between Return Sequences and Return States for LSTMs in Keras

By Jason Brownlee on August 14, 2019 in Long Short-Term Memory Networks

The Keras deep learning library provides an implementation of the Long Short-Term Memory, or LSTM, recurrent neural network.

As part of this implementation, the Keras API provides access to both return sequences and return state. The use and difference between these data can be confusing when designing sophisticated recurrent neural network models, such as the encoder-decoder model.

In this tutorial, you will discover the difference and result of return sequences and return states for LSTM layers in the Keras deep learning library.

After completing this tutorial, you will know:

That return sequences return the hidden state output for each input time step.
That return state returns the hidden state output and cell state for the last input time step.
That return sequences and return state can be used at the same time.

Kick-start your project with my new book Long Short-Term Memory Networks With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

Long Short-Term Memory
Return Sequences
Return States
Return States and Sequences

Long Short-Term Memory

The Long Short-Term Memory, or LSTM, is a recurrent neural network that is composed of memory units with internal gates.

Unlike other recurrent neural networks, the network’s internal gates allow the model to be trained successfully using backpropagation through time, or BPTT, and avoid the vanishing gradients problem.

In the Keras deep learning library, LSTM layers can be created using the LSTM() class.

Creating a layer of LSTM memory units allows you to specify the number of memory units within the layer.

Each unit or cell within the layer has an internal cell state, often abbreviated as “c“, and outputs a hidden state, often abbreviated as “h“.

The Keras API allows you to access these data, which can be useful or even required when developing sophisticated recurrent neural network architectures, such as the encoder-decoder model.
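
To make the roles of c and h concrete, here is a minimal NumPy sketch of a single LSTM time step using the standard LSTM equations. It is an illustration only, not the Keras implementation: the names lstm_step, W, U and b, the gate ordering (input, forget, candidate, output) and the random weights are all assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper, standard equations)."""
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b              # shape (batch, 4 * units)
    i = sigmoid(z[:, 0 * units:1 * units])    # input gate
    f = sigmoid(z[:, 1 * units:2 * units])    # forget gate
    g = np.tanh(z[:, 2 * units:3 * units])    # candidate cell values
    o = sigmoid(z[:, 3 * units:4 * units])    # output gate
    c_t = f * c_prev + i * g                  # internal cell state "c"
    h_t = o * np.tanh(c_t)                    # hidden state output "h"
    return h_t, c_t

# assumed sizes and random weights, for illustration only
rng = np.random.default_rng(0)
units, features = 1, 1
W = rng.normal(size=(features, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

# one sample, 3 time steps, 1 feature
sequence = np.array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
h = np.zeros((1, units))  # both states start at zero
c = np.zeros((1, units))
hidden_states = []
for t in range(sequence.shape[1]):
    h, c = lstm_step(sequence[:, t, :], h, c, W, U, b)
    hidden_states.append(h)

all_h = np.stack(hidden_states, axis=1)
print(all_h.shape)  # (1, 3, 1): one hidden state per time step
print(h, c)         # final hidden state and final cell state
```

Note that h is the output gate multiplied by tanh(c), so the hidden state is generally not equal to the cell state or to tanh(c) alone.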

For the rest of this tutorial, we will look at the API for accessing these data.

Return Sequences

Each LSTM cell will output one hidden state h for each input.

h = LSTM(X)

We can demonstrate this in Keras with a very small model with a single LSTM layer that itself contains a single LSTM cell.

In this example, we will have one input sample with 3 time steps and one feature observed at each time step:

t1 = 0.1
t2 = 0.2
t3 = 0.3

The complete example is listed below.

Note: all examples in this post use the Keras functional API.

from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array
# define model
inputs1 = Input(shape=(3, 1))
lstm1 = LSTM(1)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)
# define input data
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))
# make and show prediction
print(model.predict(data))

Running the example outputs a single hidden state for the input sequence with 3 time steps.

Your specific output value will differ given the random initialization of the LSTM weights.

[[-0.0953151]]

It is possible to access the hidden state output for each input time step.

This can be done by setting the return_sequences argument to True when defining the LSTM layer, as follows:

LSTM(1, return_sequences=True)

We can update the previous example with this change.

The full code listing is provided below.

from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array
# define model
inputs1 = Input(shape=(3, 1))
lstm1 = LSTM(1, return_sequences=True)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)
# define input data
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))
# make and show prediction
print(model.predict(data))

Running the example returns a sequence of 3 values, one hidden state output for each input time step for the single LSTM cell in the layer.

[[[-0.02243521]
  [-0.06210149]
  [-0.11457888]]]
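
The difference between the two configurations is easiest to see in the output shapes. The sketch below builds both variants on the same input and prints their shapes; the variable names are chosen for this example, and verbose=0 is passed only to silence the progress bar.

```python
from keras.models import Model
from keras.layers import Input, LSTM
from numpy import array

# one sample, 3 time steps, 1 feature
data = array([0.1, 0.2, 0.3]).reshape((1, 3, 1))

inputs1 = Input(shape=(3, 1))
# default: only the hidden state for the last time step
last_only = Model(inputs=inputs1, outputs=LSTM(1)(inputs1))
# return_sequences=True: the hidden state for every time step
full_sequence = Model(inputs=inputs1, outputs=LSTM(1, return_sequences=True)(inputs1))

last_shape = last_only.predict(data, verbose=0).shape
seq_shape = full_sequence.predict(data, verbose=0).shape
print(last_shape)  # (1, 1)    -> (samples, units)
print(seq_shape)   # (1, 3, 1) -> (samples, time steps, units)
```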

You must set return_sequences=True when stacking LSTM layers so that the second LSTM layer has a three-dimensional sequence input. For more details, see the post:

Stacked Long Short-Term Memory Networks
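
As a rough sketch of stacking, assuming two small layers chosen only for illustration, the first LSTM returns its full sequence so that the second LSTM receives a three-dimensional input:

```python
from keras.models import Model
from keras.layers import Input, LSTM
from numpy import array

inputs1 = Input(shape=(3, 1))
# first layer returns the whole sequence: shape (samples, 3, 2)
hidden1 = LSTM(2, return_sequences=True)(inputs1)
# second layer consumes that sequence and returns only its last hidden state
hidden2 = LSTM(1)(hidden1)
model = Model(inputs=inputs1, outputs=hidden2)

data = array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
stacked_output = model.predict(data, verbose=0)
print(stacked_output.shape)  # (1, 1)
```

Without return_sequences=True on the first layer, the second layer would receive a two-dimensional tensor and the model definition would fail.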

You may also need to access the sequence of hidden state outputs when predicting a sequence of outputs with a Dense output layer wrapped in a TimeDistributed layer. See this post for more details:

How to Use the TimeDistributed Layer for Long Short-Term Memory Networks in Python
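
A minimal sketch of that pattern follows. The layer sizes are arbitrary choices for this example; the point is only that the Dense layer is applied to the hidden state at each time step, giving one prediction per step.

```python
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed
from numpy import array

inputs1 = Input(shape=(3, 1))
# sequence of hidden states, one per time step: shape (samples, 3, 4)
sequence = LSTM(4, return_sequences=True)(inputs1)
# the same Dense layer is applied independently at every time step
outputs = TimeDistributed(Dense(1))(sequence)
model = Model(inputs=inputs1, outputs=outputs)

data = array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
per_step = model.predict(data, verbose=0)
print(per_step.shape)  # (1, 3, 1): one output value per input time step
```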

Return States

The output of an LSTM cell or layer of cells is called the hidden state.

This is confusing, because each LSTM cell retains an internal state that is not output, called the cell state, or c.

Generally, we do not need to access the cell state unless we are developing sophisticated models where subsequent layers may need to have their cell state initialized with the final cell state of another layer, such as in an encoder-decoder model.

Keras provides the return_state argument to the LSTM layer that will provide access to the hidden state output (state_h) and the cell state (state_c). For example:

lstm1, state_h, state_c = LSTM(1, return_state=True)(inputs1)

This may look confusing because both lstm1 and state_h refer to the same hidden state output. The reason for these two tensors being separate will become clear in the next section.

We can demonstrate access to the hidden and cell states of the cells in the LSTM layer with a worked example listed below.

from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array
# define model
inputs1 = Input(shape=(3, 1))
lstm1, state_h, state_c = LSTM(1, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])
# define input data
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))
# make and show prediction
print(model.predict(data))

Running the example returns 3 arrays:

The LSTM hidden state output for the last time step.
The LSTM hidden state output for the last time step (again).
The LSTM cell state for the last time step.

[array([[ 0.10951342]], dtype=float32),
 array([[ 0.10951342]], dtype=float32),
 array([[ 0.24143776]], dtype=float32)]
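
With more than one unit in the layer, each returned state holds one value per unit. The sketch below uses 5 units (an arbitrary choice for this example) and checks that the layer output and state_h are the same values when return_sequences is left at its default.

```python
from keras.models import Model
from keras.layers import Input, LSTM
from numpy import array, allclose

inputs1 = Input(shape=(3, 1))
lstm1, state_h, state_c = LSTM(5, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])

data = array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
out, h, c = model.predict(data, verbose=0)
print(out.shape, h.shape, c.shape)  # each is (1, 5): (samples, units)
print(allclose(out, h))             # the layer output is the last hidden state
```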

The hidden state and the cell state could in turn be used to initialize the states of another LSTM layer with the same number of cells.
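
One way to do this is the initial_state argument when calling the second layer. The following is a small sketch in the spirit of an encoder-decoder model, not a complete one: the names encoder_inputs and decoder_inputs, the layer sizes, and the decoder input values are all assumptions made for illustration.

```python
from keras.models import Model
from keras.layers import Input, LSTM
from numpy import array

# "encoder": keep only the final hidden and cell states
encoder_inputs = Input(shape=(3, 1))
_, state_h, state_c = LSTM(2, return_state=True)(encoder_inputs)

# "decoder": an LSTM with the same number of units, started from those states
decoder_inputs = Input(shape=(4, 1))
decoder_outputs = LSTM(2, return_sequences=True)(
    decoder_inputs, initial_state=[state_h, state_c])

model = Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_outputs)

encoder_data = array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
decoder_data = array([0.0, 0.1, 0.2, 0.3]).reshape((1, 4, 1))
decoded = model.predict([encoder_data, decoder_data], verbose=0)
print(decoded.shape)  # (1, 4, 2): (samples, decoder time steps, units)
```

The two layers must have the same number of units, because each state has one value per unit.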

Return States and Sequences

We can access both the sequence of hidden states and the final cell state at the same time.

This can be done by configuring the LSTM layer to both return sequences and return states.

lstm1, state_h, state_c = LSTM(1, return_sequences=True, return_state=True)(inputs1)

The complete example is listed below.

from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array
# define model
inputs1 = Input(shape=(3, 1))
lstm1, state_h, state_c = LSTM(1, return_sequences=True, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])
# define input data
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))
# make and show prediction
print(model.predict(data))

Running the example, we can now see why the LSTM output tensor and hidden state output tensor are declared separately.

The layer returns the hidden state for each input time step, then separately, the hidden state output for the last time step and the cell state for the last input time step.

This can be confirmed by seeing that the last value in the returned sequences (first array) matches the value in the hidden state (second array).

[array([[[-0.02145359],
        [-0.0540871 ],
        [-0.09228823]]], dtype=float32),
 array([[-0.09228823]], dtype=float32),
 array([[-0.19803026]], dtype=float32)]
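
The same check can be made in code rather than by eye. This sketch slices the last time step out of the returned sequence and compares it with state_h; the variable names are chosen for this example.

```python
from keras.models import Model
from keras.layers import Input, LSTM
from numpy import array, allclose

inputs1 = Input(shape=(3, 1))
lstm1, state_h, state_c = LSTM(1, return_sequences=True, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])

data = array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
seq, h, c = model.predict(data, verbose=0)

# last time step of the sequence, shape (samples, units)
last_step = seq[:, -1, :]
print(seq.shape, h.shape, c.shape)  # (1, 3, 1) (1, 1) (1, 1)
print(allclose(last_step, h))       # last hidden state in the sequence equals state_h
```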

Summary

In this tutorial, you discovered the difference and result of return sequences and return states for LSTM layers in the Keras deep learning library.

Specifically, you learned:

That return sequences return the hidden state output for each input time step.
That return state returns the hidden state output and cell state for the last input time step.
That return sequences and return state can be used at the same time.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

147 Responses to Difference Between Return Sequences and Return States for LSTMs in Keras

Nikeita October 24, 2017 at 5:22 pm #

Thanks for this!

To help people understand some applications of the output sequence and state visually, a picture like in the following stats overflow answer is great!

https://stats.stackexchange.com/a/181544/37863

Reply
- Jason Brownlee October 25, 2017 at 6:39 am #
  
  Thanks.
  
  Reply
Thabet Ali October 25, 2017 at 12:44 am #

Hi Jason,

Do you have plans to use more of the functional API in your blog series?
If so, why?

Best regards
Thabet

Reply
- Jason Brownlee October 25, 2017 at 6:48 am #
  
  Yes, it is needed for more advanced model development.
  
  I will have a “how to…” post on the functional API soon. It is scheduled.
  
  Reply
Alex October 27, 2017 at 1:33 am #

Hi Jason, is it possible to access the internal states through return_state = True and return_sequences = True with the Sequential API? Moreover, is it possible to set the hidden state through a function like set_state()?

Thanks!

Reply
- Jason Brownlee October 27, 2017 at 5:24 am #
  
  Perhaps, but not as far as I know or have tested.
  
  Reply
Eldar October 27, 2017 at 6:43 am #

Hey Jason, I wanted to show you this cool new RNN cell I’ve been trying out called “Recurrent Weighted Average” – it implements attention into the recurrent neural network – the keras implementation is available at https://github.com/keisuke-nakata/rwa and the whitepaper is at https://arxiv.org/pdf/1703.01253.pdf

I’ve also seen that GRU is often a better choice unless the LSTM’s bias is initialized to ones, and it’s baked into Keras now (whitepaper for that at http://proceedings.mlr.press/v37/jozefowicz15.pdf )

Reply
- Jason Brownlee October 27, 2017 at 2:53 pm #
  
  Very cool!
  
  Reply
Alex October 28, 2017 at 5:13 am #

Just a note to say that return_state seems to be a recent addition to keras (since tensorflow 1.3 – if you are using keras in tensorflow contrib).

Shame it’s not available in earlier versions – I was looking forward to playing around with it 🙂

Alex

Reply
- Jason Brownlee October 28, 2017 at 5:16 am #
  
  Thanks Alex.
  
  Reply
Alex October 30, 2017 at 9:54 pm #

Hi Jason, in these examples you don’t call fit().
When you define the model like this: model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c]) and then fit, the fit() function expects three values for the output instead of 1. How do I correctly print the states (to see how they change during training and/or prediction)?

Reply
- Jason Brownlee October 31, 2017 at 5:33 am #
  
  For a model that takes 2 inputs, they must be provided to fit() as an array.
  
  Reply
  - Alex April 27, 2018 at 6:29 pm #
    
    Hi Jason, the question was about the outputs, not the inputs.. The problem is that if i set outputs=[lstm1, state_h, state_c] in the Model(), then the fit() function will expect three arrays as target arrays.
    
    Reply
    - Pradyumna Majumder November 9, 2018 at 11:53 pm #
      
      Hi Alex, did u find how to handle the fit in this case?
      
      Suppose i have
      
      model = Model(inputs=[input_x, h_one_in , h_two_in], outputs=[y1,y2,state_h,state_c])
      
how would I write my model.fit? What goes in the inputs and outputs?
      
      Thanks,
      
      Reply
      - Fatih June 21, 2019 at 7:30 pm #
        
        +1
        
        I use random initialization but the results are disappointing.
        
        Any other ideas?
      - Prajwal T R May 27, 2024 at 3:41 pm #
        
        Check this notebook, not a LSTM but similar training and test data shape.
        
        in the inp_data_generator function it yields similar data matching model input and output (inputs and outputs are clubbed).
        ie. yield [np.array(inp_batch), np.array(ext_batch)], [np.array(touch_batch), np.array(cropped_batch)] # [x1, x2], [y1, y2]
        
        and model fit is called like so
        history = model.fit(train_data, validation_data = validation_data, validation_steps = validation_steps_per_epoch, steps_per_epoch = train_steps_per_epoch, epochs = epochs, callbacks = [early_stop])
MT November 8, 2017 at 9:47 am #

Jason,

Brilliant post as usual. I am also going to buy your LSTM book.

I however had a question on Keras LSTM error I have been getting and was hoping if you could help that?

Getting an error like this

"You must feed a value for placeholder tensor 'embedding_layer_input'"

/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
465 compat.as_text(pywrap_tensorflow.TF_Message(status)),
–> 466 pywrap_tensorflow.TF_GetCode(status))
467 finally:

InvalidArgumentError: You must feed a value for placeholder tensor 'embedding_layer_input' with dtype float
[[Node: embedding_layer_input = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
[[Node: output_layer_2/bias/read/_237 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1546_output_layer_2/bias/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Here is the code I wrote for this:

def model_param(self):
    # Method to do deep learning

    from keras.models import Sequential
    from keras.layers import Dense, Flatten, Dropout, Activation
    from keras.layers import LSTM
    from keras.layers.embeddings import Embedding
    from keras.initializers import TruncatedNormal

    tn=TruncatedNormal(mean=0.0, stddev=1/sqrt(self.x_train.shape[1]*self.x_train.shape[1]), seed=2)

    self.model = Sequential()
    self.model.add(Embedding(self.len_vocab,300,input_length=self.x_train.shape[1]))

    # Adding LSTM cell
    self.model.add(LSTM(self.num_units,dropout=0.30,kernel_initializer=tn,name="lstm_1"))
    # Adding the dense output layer for Output
    self.model.add(Dense(1,activation="sigmoid",name="output_layer"))

    #sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    self.model.compile(loss='binary_crossentropy',
                       optimizer="adam",
                       metrics=['accuracy'])

    self.model.summary()

def fit(self):
    # Training the deep learning network on the training data

    # Adding the callbacks for Logging
    import keras
    logger_tb=keras.callbacks.TensorBoard(
        log_dir="logs_sentiment_lstm",
        write_graph=True,
        histogram_freq=5
    )

    self.model.fit(self.x_train, self.y_train,validation_split=0.20,
                   epochs=10,
                   batch_size=128,callbacks=[logger_tb]
                   )

Reply
- Jason Brownlee November 9, 2017 at 9:51 am #
  
  Ouch, I have not seen this fault before.
  
  Perhaps try simplifying the example to flush out the cause?
  
  Reply
- Kumar Nilay June 21, 2019 at 11:07 am #
  
  histogram_freq=5 is causing this error, this is a bug in keras, set histogram_freq=0 and it should work fine
  
  Reply
  - Jason Brownlee June 21, 2019 at 2:01 pm #
    
    Thanks for sharing.
    
    Reply
Julian November 12, 2017 at 8:27 am #

This is another great post, Jason! I am a fan of all your RNN posts. 😉 Thanks!

In case anyone was wondering the difference between c (Internal state) and h (Hidden state) in a LSTM, this answer was very helpful for me:

https://www.quora.com/What-is-the-difference-between-states-and-outputs-in-LSTM

Would it be correct to say that in a GRU and SimpleRNN, c = h?

Thanks in advance!

Reply
- Jason Brownlee November 12, 2017 at 9:10 am #
  
  Thanks Julian.
  
  Reply
Kaushal Shetty November 24, 2017 at 12:21 am #

Hi Jason,
In the implementation of encoder-decoder in keras we do a return state in the encoder network which means that we get state_h and state_c after which the [state_h,state_c] is set as initial state of the decoder network. What does setting the initial state mean for a LSTM network? Is it that the state_h of decoder = [state_h,state_c].

Thanks in advance.

Reply
- Jason Brownlee November 24, 2017 at 9:47 am #
  
  Great question, here is an example:
  https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/
  
  Reply
  - Kaushal Shetty November 24, 2017 at 9:37 pm #
    
    Great. I think i get it now. Basically when we give return_state=True for a LSTM then the LSTM will accept three inputs that is decoder_actual_input,state_h,state_c. And during inference we again reset the state_c and state_h with state_h and state_c of previous prediction. Am i correct on my assumption ?
    
    I am still a little bit confused why we use three keras models (model,encoder_model and decoder_model).
    
    Thank You.
    
    Reply
    - Jason Brownlee November 25, 2017 at 10:19 am #
      
      The return_state argument only controls whether the state is returned. A different argument is used to initialize state for an LSTM (e.g. during the definition of the model with the functional API).
      
      Reply
      - Kaushal Shetty November 27, 2017 at 4:57 pm #
        
        Got it. Its initial_state. Thank You Jason.
      - Jason Brownlee November 28, 2017 at 8:36 am #
        
        Glad to hear it.
Nathan D. January 9, 2018 at 1:17 am #

Hi Jason,

I do enjoy reading your blog. I have 2 short questions for this post and hope you could kindly address them briefly:

1. Can we return the sequence of cell states (a sort of variable similar to *lstm1*)?

2. No matter the dimension (I mean #features) of the input sequence, as we place 1 LSTMcell in the layer, both the hidden and cell states are always a scalar, right? As such, the kernel_ and recurrent_kernel_ properties in Keras (at each gate) are not in the matrix form. However, I believe your standpoint on viewing each LSTM cell having 1Dim hidden state/cell makes sense in the case of dropout in deep learning.

Please correct me if I misunderstood your post. Thank you.

Reply
- Jason Brownlee January 9, 2018 at 5:32 am #
  
  Not directly, perhaps by calling the model recursively.
  
  I think you’re right.
  
  Reply
- Tianyu October 2, 2019 at 9:57 pm #
  
  I have the same questions like Q1, so how do you output the sequence of cell states? Thank you.
  
  Reply
  - Jason Brownlee October 3, 2019 at 6:47 am #
    
    You don’t, generally. You output a sequence of activations, referred to in the papers as h.
    
    Reply
Zebo Li January 31, 2018 at 2:06 pm #

Hi, very good explanation.
One question, I thought h = activation (o), is that correct? (h: hidden state output, o: hidden cell)
But tanh(-0.19803026) does not equals -0.09228823. (The default activation for LSTM should be tanh)

Reply
Vinayaka February 27, 2018 at 1:29 am #

Thank you so much, Jason. This cleared my doubt.

Reply
- Jason Brownlee February 27, 2018 at 6:34 am #
  
  I’m glad to hear that.
  
  Reply
Jason March 6, 2018 at 2:55 pm #

Thank you so much for writing this. This is really a big help.

Reply
- Jason Brownlee March 6, 2018 at 2:58 pm #
  
  Thanks, I’m glad it helped.
  
  Reply
tiopon March 16, 2018 at 3:53 pm #

hey Jason， how could i get the final hidden state and sequence both when using a bidirectional wrapper？

Reply
- Jason Brownlee March 17, 2018 at 8:31 am #
  
  Does the above post not help?
  
  Reply
Alex April 12, 2018 at 4:28 pm #

Hi Jason, can I connect the output of a dense layer to the c state of a LSTM in such a way that it initialize the state c with this value before each batch? Thanks

Reply
- Jason Brownlee April 13, 2018 at 6:34 am #
  
  I’m sure you can (it’s all just code), but it might require some careful programming. I don’t have good advice other than lots of trial and error.
  
  Let me know how you go.
  
  Reply
Andrew April 13, 2018 at 2:18 am #

What is the hidden state and cell state of the first input if it does not have a previous hidden or cell state to reference?

Reply
- Jason Brownlee April 13, 2018 at 6:42 am #
  
  It will be zero.
  
  Reply
Ali July 17, 2018 at 4:12 am #

When I use the following code based on a bidirectional LSTM, it returns this error:
‘ is not connected, no input to return.’)
AttributeError: Layer sequential_1 is not connected, no input to return.

But when the ordinary LSTM version (the commented code) is run, it returns correctly.

self.model = Sequential()
# self.model.add(LSTM(input_shape=(None,self.num_encoder_tokens), units=self.n_hidden,
#                     return_sequences=True,name='hidden'))
# self.model.add(LSTM(units=self.num_encoder_tokens, return_sequences=True))
# self.intermediate_layer = Model(inputs=self.model.input, outputs=self.model.get_layer('hidden').output)
self.model.add(Bidirectional(LSTM(input_shape=(None,self.num_encoder_tokens), units=self.n_hidden,
                                  return_sequences=True,name='hidden'),merge_mode='concat'))
self.model.add(Bidirectional(LSTM(units=self.num_encoder_tokens, return_sequences=True),merge_mode='concat'))
self.intermediate_layer = Model(input=self.model.input,output=self.model.get_layer('hidden').output)

why?

Reply
- Jason Brownlee July 17, 2018 at 6:21 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
- Harish May 14, 2019 at 2:28 pm #
  
  https://stackoverflow.com/questions/49313650/how-could-i-get-both-the-final-hidden-state-and-sequence-in-a-lstm-layer-when-us
  
  This can help
  
  Reply
Ali July 17, 2018 at 11:01 pm #

Is there no reply for this?

Reply
Sam September 6, 2018 at 7:25 am #

Thank you so much for your explanation!

Reply
- Jason Brownlee September 6, 2018 at 2:09 pm #
  
  I’m happy it helped Sam!
  
  Reply
Klaas Brau November 14, 2018 at 4:23 am #

Awesome Work Jason. I always thought the last hidden state is equal to the cell state. So I was wrong and the hidden state and the cell state is never the same?
Thank you

Reply
- Jason Brownlee November 14, 2018 at 7:36 am #
  
  Yes, they are different things.
  
  Reply
Hussain November 23, 2018 at 8:20 pm #

Hi,
I just wanna thank you for the entire site.
Whenever I am stuck in code or concepts I visit your site and things get cleared up.

Reply
- Jason Brownlee November 24, 2018 at 6:31 am #
  
Thanks, I’m glad it’s helpful!
  
  Reply
Clive December 23, 2018 at 4:55 am #

Hi, so in the above example our network consists of only one LSTM node or cell,
and the 3 time steps are fed to it one at a time?

Or does it have 3 cells, one for each time step?

Reply
- Jason Brownlee December 23, 2018 at 6:09 am #
  
  The number of nodes in the LSTM is unrelated to the number of time steps in the data sample.
  
  Perhaps this will make things clearer for you:
  https://machinelearningmastery.com/prepare-univariate-time-series-data-long-short-term-memory-networks/
  
  Reply
mk January 17, 2019 at 6:41 pm #

“Generally, we do not need to access the cell state unless we are developing sophisticated models where subsequent layers may need to have their cell state initialized with the final cell state of another layer, such as in an encoder-decoder model.”
In another of your posts, the encoder-decoder LSTM model code is as follows:
model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
model.add(RepeatVector(n_outputs))
model.add(LSTM(200, activation='relu', return_sequences=True))
but return_state = False?

Reply
- Jason Brownlee January 18, 2019 at 5:31 am #
  
  Correct.
  
  Reply
M February 9, 2019 at 12:16 am #

Hello Jason,
Thanks for the great post. I have a quick question about the bottleneck in the LSTM encoder above. As I understand it, if the encoder has say 50 cells, then does this mean that the hidden state of the encoder LSTM layer contains 50 values, one per cell?

If this is correct, then would it be accurate to say that if the original data had 50 timesteps and a dimensionality/feature count of 3, then having an encoder LSTM with 20 cells (which would give a hidden state of 20 values) could be considered to be a sort of compression/dimensionality reduction (a la autoencoders and compressed representations) ?

Finally, does it make sense to apply a fully-connected layer with some nonlinearity to the hidden state for purposes of dimensionality reduction, i.e. hidden state with 50 values -> FF layer with 10 neurons, ‘compressing’ the 50 values to 10…?

Thanks again

Reply
- Jason Brownlee February 9, 2019 at 5:58 am #
  
  Yes, correct.
  
  If you want to use the hidden state as a learned feature, you could feed it into a new fully connected model.
  
  Reply
  - M February 9, 2019 at 11:14 pm #
    
    Brilliant, thanks!
    
    Reply
Shlomi Schwartz February 24, 2019 at 11:24 pm #

Excellent post, how would one save the state when prediction samples arrives from multiple sources, like the question posted here https://stackoverflow.com/questions/54850854/keras-restore-lstm-hidden-state-for-a-specific-time-stamp ?

Reply
- Jason Brownlee February 25, 2019 at 6:43 am #
  
  You can save state by retrieving it from the model and saving it to a file.
  
  Reply
  - Shlomi Schwartz February 25, 2019 at 6:46 pm #
    
    Thank you so much 🙂
    
    Reply
Sandeli March 3, 2019 at 8:50 pm #

Dear Jason

Thank you very much for the great post. Currently I am working on two-stream LSTM network(sequence of images) and I am trying to extract both LSTMs each time step’s cell state and calculate the average value. Afterwards update next time step with this previous time step’s average value + existing cell state value. And continue this process thru all time steps.

Greatly appreciate if you could explain me how do we update LSTM cell states(as each time steps) by giving additional value. Thank you very much.

Reply
- Jason Brownlee March 4, 2019 at 6:58 am #
  
  Why are you trying to average the cell state exactly?
  
  Reply
  - Sandeli March 4, 2019 at 1:47 pm #
    Thank you very much for your response. I am doing it the following way
    
    Reply
    - Sandeli March 4, 2019 at 1:50 pm #
      
      Please be noted 2nd LSTM is for Optical flow stream mistakenly comment both LSTMs for RGB. Thank you.
      
      Reply
    - Jason Brownlee March 4, 2019 at 2:18 pm #
      
      Why? Why do you want to do this?
      
      Reply
      - Sandeli March 4, 2019 at 6:04 pm #
        
Currently I am working on two-stream networks with image sequences. As part of my research, I want to study whether there is any advantage in communicating cell states between both streams at each time step, rather than not communicating (as in a normal 2-stream network). Thank you for your concern.
Sandeli March 4, 2019 at 4:56 pm #

Currently I am working on two-stream networks with image sequences. As part of my research, I want to study whether there is any advantage in communicating cell states between both streams at each time step, rather than not communicating (as in a normal 2-stream network). Thank you for your concern.

Reply
- Jason Brownlee March 5, 2019 at 6:31 am #
  
  Interesting, let me know how you go.
  
  Reply
saria March 6, 2019 at 4:34 am #

Really great article, Thanks a lot:)

Reply
- Jason Brownlee March 6, 2019 at 7:56 am #
  
  Thanks.
  
  Reply
Jay April 27, 2019 at 6:10 pm #

It really solved my confusion. Thank you 🙂

Reply
- Jason Brownlee April 28, 2019 at 6:54 am #
  
  I’m happy to hear that.
  
  Reply
Harish May 14, 2019 at 2:23 pm #

Thanks Jason,

Can you pls tell me how to use return states with Bidirectional wrapper on LSTM? The unpacking of outputs throws error

code:
encoder = Bidirectional(LSTM(n_a, return_state=True))
encoder_outputs, state_h, state_c = encoder(encoder_inputs)

error:
ValueError: too many values to unpack (expected 3)

Reply
- Jason Brownlee May 14, 2019 at 2:29 pm #
  
  Perhaps assign the result to one variable and inspect it to see what you have?
  
  Reply
Niclas H June 19, 2019 at 3:45 pm #

Question: Is only the hidden state forwarded to upper layers in LSTM, or is also the memory cell state forwarded to upper layers?

Or is the memory cell state only forwarded along the time sequence?
Thank you!

Reply
- Jason Brownlee June 20, 2019 at 8:23 am #
  
Only the hidden state is output; the memory state remains internal to the node.
  
  Reply
QuantCub June 27, 2019 at 4:58 am #

Hi Jason,

Thanks for sharing. I am not sure if I understand Keras.LSTM correctly. Could you please help me clarify / correct the following statements?

1. Keras LSTM is an output-to-hidden recurrent by default, e.g. it sends previous output to current hidden layers;
2. To create a hidden-to-hidden LSTM, can we do:
lstm1, state_h, state_c = LSTM(1, return_sequences=True, return_state=True)(inputs1)
model = Model(inputs=(inputs1, state_h), outputs=lstm1)
3. Does Keras train LSTM using teacher forcing or BPTT?

Reply
- Jason Brownlee June 27, 2019 at 8:04 am #
  
  Not sure I follow. The LSTM has outputs and hidden state.
  
  Reply
  - QuantCub June 28, 2019 at 1:16 am #
    
    Sorry for the confusion. My output-to-hidden refers to the 2nd of the three patterns in Goodfellow’s Deep Learning: Chapter 10
    ”
    Some examples of important design patterns for recurrent neural networks include the following:
    • Recurrent networks that produce an output at each time step and have recurrent connections between hidden units, illustrated in figure 10.3.
    • Recurrent networks that produce an output at each time step and have recurrent connections only from the output at one time step to the hidden units at the next time step, illustrated in figure 10.4
    • Recurrent networks with recurrent connections between hidden units, that read an entire sequence and then produce a single output, illustrated in figure 10.5.
    “
    
    Reply
    - QuantCub June 28, 2019 at 1:21 am #
      
      Back to my question:
      1. Is Keras LSTM pattern 2 (previous output to current hidden) by default?
      2. Can we use return_state to create a pattern 1 (previous hidden to current hidden) model?
      3. Does Keras train LSTM using BPTT, or can it choose between teacher forcing and BPTT based on the pattern?
      
      Reply
      - Jason Brownlee June 28, 2019 at 6:08 am #
        
        If you mean laterally within a layer, then no. If you mean across layers, then yes.
        
        Yes, Keras supports a version of BPTT, more details here in general:
        https://machinelearningmastery.com/gentle-introduction-backpropagation-time/
        
        And here for Keras:
        https://machinelearningmastery.com/truncated-backpropagation-through-time-in-keras/
      - QuantCub June 28, 2019 at 12:45 pm #
        
        Thank you, Jason!
Harshula June 29, 2019 at 7:11 pm #

Hi Jason.
I want to stack 2 GRUs. The first one has 64 hidden units and the second one 50.
I am unsure how to go about defining that.
Can you please help?

encoder_inputs = Input(batch_shape=(32, 103, 1), name='encoder_inputs')

encoder_gru1 = GRU(64, return_sequences=True, return_state=True, name='encoder_gru1')
encoder_out1, encoder_state1 = encoder_gru1(encoder_inputs)

encoder_gru2 = GRU(50, return_sequences=True, name='encoder_gru')
encoder_out, encoder_state = encoder_gru2(encoder_out1)

decoder_gru = GRU(50, return_sequences=True, name='decoder_gru')
decoder_out, decoder_state = decoder_gru(encoder_out)

However i get following error

encoder_out, encoder_state = encoder_gru2(encoder_out1)

File “C:\Users\Harshula\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py”, line 457, in __iter__
“Tensor objects are only iterable when eager execution is ”

TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn.

Reply
- Jason Brownlee June 30, 2019 at 9:36 am #
  
  You can use this tutorial as a starting point and change the LSTMs to GRUs:
  https://machinelearningmastery.com/stacked-long-short-term-memory-networks/
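
As a rough sketch of that change, assuming the Keras bundled with TensorFlow: a GRU has no cell state, so with return_state=True it returns two tensors (output and final hidden state), and without it only one. The error in the question is consistent with unpacking two values from a layer created without return_state=True. The layer sizes follow the question; everything else is illustrative.

```python
from tensorflow.keras.layers import Input, GRU
from tensorflow.keras.models import Model

encoder_inputs = Input(shape=(103, 1))

# The first GRU passes its full output sequence to the next layer.
out1, state1 = GRU(64, return_sequences=True, return_state=True)(encoder_inputs)

# return_state=True is needed on every layer whose result is unpacked into two names.
out2, state2 = GRU(50, return_sequences=True, return_state=True)(out1)

model = Model(inputs=encoder_inputs, outputs=[out2, state2])
model.summary()
```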
  
  Reply
sopa July 2, 2019 at 5:56 pm #

Thank you for this understandable article. I have a question about how to plot predictions. When I apply a sine wave to the code to see the three outputs of the LSTM, how can I plot the outputs in the form of a continuous signal?

Reply
- Jason Brownlee July 3, 2019 at 8:26 am #
  
  You could use matplotlib and the plot() function.
  
  Reply
sopa July 9, 2019 at 4:37 pm #

I mean I want to plot lstm1, state_h, state_c. but when I write model.fit like that:

model.fit(trainX, trainY=[lstm1, state_h, state_c], epochs=10, batch_size=1, verbose=2)

I got this error:
TypeError: Unrecognized keyword arguments: {‘trainY’: [, array([[]],

dtype=object), array([],

dtype=object)]}

Should all of lstm1, state_h, state_c have three dimensions?

Reply
- Jason Brownlee July 10, 2019 at 8:04 am #
  
  This looks really wrong, e.g. state variables as target variables in a call to fit.
  
  What are you trying to achieve exactly?
  
  Reply
Leo July 15, 2019 at 8:37 pm #

Hi Jason

My code has three outputs from the LSTM: output, hidden_state, cell_state.

I want to see all of them.

lstm, h, c = LSTM(units=20, batch_input_shape=(1, 10, 2), return_sequences=True, return_state=True)(inp)
dense = Dense(2)(lstm)
model = Model(inputs=inp, outputs=dense)

I want to plot all three of my outputs. I can plot lstm, but I can’t plot h_state and c_state.

How can I do that?

Reply
- Jason Brownlee July 16, 2019 at 8:16 am #
  
  That is odd.
  
  Nevertheless, you could use matplotlib to plot anything you wish. e.g. plot(…)
  
  Reply
Youssef MELLAH August 7, 2019 at 8:00 pm #

Hello Jason,

Thank you for this good post.

I have the hidden states of two Bi-LSTMs, H1 and H2, and I want to use them as inputs to two Dense layers. Should I connect the two Dense layers to the two Bi-LSTMs and that’s it, or connect them directly to the hidden states?

some thing like this :

hidden1 = Dense(100)(H1)
hidden2 = Dense(100)(H2)

thanks again!

Reply
- Jason Brownlee August 8, 2019 at 6:33 am #
  
  If by hidden states you mean those states that are internal to the LSTM layers, then I don’t think there is an effective way to pass them to a dense.
  
  If you mean the outputs of the layer (the common meaning), then this looks fine.
  
  Reply
  - Youssef MELLAH August 8, 2019 at 7:10 pm #
    
    To be clearer: I have two text inputs, I use embeddings to encode them, and I feed the embedding outputs into two Bi-LSTMs.
    
    I want to use the hidden states of the two Bi-LSTMs to make predictions. The hidden state for the first input is returned as follows:
    lstm, forward_h, forward_c, backward_h, backward_c = Bidirectional(..)(Embedding)
    and H1 is calculated as: H1 = Concatenate()([forward_h, backward_h]).
    
    I did the same for the second input and calculated H2.
    
    Mathematically, how can I implement the following formula:
    
    softmax(V tanh(W1*H1 + W2*H2))
    
    where W and V represent all trainable parameter matrices and vectors, respectively.
    
    Thanks, Jason, for helping me.
    
    Reply
    - Jason Brownlee August 9, 2019 at 8:09 am #
      
      Nice work!
      
      Looks like you want a weighted sum of the two vectors, perhaps a custom layer?
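
One possible way to express softmax(V tanh(W1*H1 + W2*H2)) without a custom layer is sketched below, assuming the Keras bundled with TensorFlow. Each trainable matrix is modelled as a Dense layer without a bias term. The sizes, the stand-in inputs for H1 and H2, and the reading of V as a projection onto n_classes scores are all assumptions, not part of the original question.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, Add, Activation
from tensorflow.keras.models import Model

state_size, n_hidden, n_classes = 200, 100, 3  # hypothetical sizes

# Stand-ins for H1 and H2 (e.g. concatenated forward_h and backward_h).
H1 = Input(shape=(state_size,))
H2 = Input(shape=(state_size,))

proj1 = Dense(n_hidden, use_bias=False)(H1)  # W1 * H1
proj2 = Dense(n_hidden, use_bias=False)(H2)  # W2 * H2
hidden = Activation("tanh")(Add()([proj1, proj2]))  # tanh(W1*H1 + W2*H2)
output = Dense(n_classes, use_bias=False, activation="softmax")(hidden)  # softmax(V ...)

model = Model(inputs=[H1, H2], outputs=output)

# Quick check with random data: every row of probabilities sums to 1.
h1 = np.random.rand(2, state_size).astype("float32")
h2 = np.random.rand(2, state_size).astype("float32")
probs = model.predict([h1, h2], verbose=0)
print(probs.sum(axis=1))
```

Note that a softmax over a single unit always returns 1.0, so the final layer needs one unit per class (or a sigmoid for a single binary output).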
      
      Reply
      - Youssef MELLAH August 9, 2019 at 6:59 pm #
        
        Perhaps, but to decrease complexity I removed the two Bi-LSTMs, so I use only the embeddings for encoding.
        
        So, in order to do classification using the 2 embeddings, can I use this formula: softmax(V tanh(W1*E1 + W2*E2))? If so, is the code below a correct representation of it?
        
        input1 = Input(shape=(25,))
        E1 = Embedding(vocab_size, 100, input_length=25,
        weights=[embedding_matrix], trainable=False)(input1 )
        input1_hidden1 = Dense(100)(E1)
        
        input2 = Input(shape=(25,))
        E2 = Embedding(vocab_size, 100, input_length=25,
        weights=[embedding_matrix], trainable=False)(input2 )
        input1_hidden2 = Dense(100)(E2 )
        
        added = add([userQuestion_hidden1, tableShema_hidden1])
        added = Activation('tanh')(added)
        output1 = Dense(1, activation='softmax')(added)
        model = Model(inputs=[input1 , input2],outputs=output1)
      - Jason Brownlee August 10, 2019 at 7:14 am #
        
        I’m eager to help, but I don’t have the capacity to review/debug your code.
        
        Perhaps try posting to the keras user group:
        https://machinelearningmastery.com/get-help-with-keras/
      - Youssef MELLAH August 15, 2019 at 12:44 am #
        
        that’s interesting, thanks a lot Jason
      - Jason Brownlee August 15, 2019 at 8:12 am #
        
        No problem.
Dawjidda August 16, 2019 at 1:57 am #

please i have an error

TypeError: GRU can accept only 1 positional arguments (‘units’,), but you passed the following positional arguments: [4, 200]

Reply
- Jason Brownlee August 16, 2019 at 8:00 am #
  
  Perhaps this will help you to better understand the input shape:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Reply
Usman October 2, 2019 at 2:44 am #

Amazing explanation!
Just have one confusion.

In the very first example, where LSTM is defined as LSTM(1)(inputs1) and Input as Input(shape=(3,1)).

So, the way you have reshaped data (1,3,1) means the timesteps value is 3(middle value) and the no of cells in LSTM is 1 i.e., LSTM(1).

I am confused about how 1-LSTM is going to process 3 timestep value. I mean shouldn’t there be 3 neurons/LSTM(3) to process the (1,3,1) shape data? Or is the LSTM going to process each input one after the other in sequence?

Thanks

Reply
- Jason Brownlee October 2, 2019 at 8:03 am #
  
  Thanks.
  
  More on time steps vs samples vs features here:
  https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
  
  Number of LSTM units is unrelated to timesteps/features/samples.
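
A small sketch of that point, assuming the Keras bundled with TensorFlow: the same single-unit layer is applied to the time steps one after another, so it can read sequences of different lengths with the same weights. The variable-length input (None time steps) and the sequence lengths 3 and 7 are illustrative choices.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

# None means "any number of time steps"; there is 1 feature per step.
inputs1 = Input(shape=(None, 1))
model = Model(inputs=inputs1, outputs=LSTM(1)(inputs1))

# The weight count depends on units and features only, not on time steps.
print(model.count_params())  # 12

shapes = {}
for n_steps in (3, 7):
    data = np.linspace(0.1, 0.1 * n_steps, n_steps).reshape((1, n_steps, 1))
    shapes[n_steps] = model.predict(data, verbose=0).shape
print(shapes)  # {3: (1, 1), 7: (1, 1)}
```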
  
  Reply
Hugo April 9, 2020 at 4:25 am #

Perfectly clear. It was very helpful to me.

Reply
- Jason Brownlee April 9, 2020 at 8:08 am #
  
  Thanks, I’m happy to hear that!
  
  Reply
Nima May 12, 2020 at 1:41 am #

Hi,

Thanks for the clear discussion. I have a question about a slightly different implementation. I have a model with an LSTM layer, where the hidden state of the last time step is passed to a softmax to produce a sentiment. Is there any way that I could access the hidden states of this model when passing a new sequence to it?

Reply
- Jason Brownlee May 12, 2020 at 6:47 am #
  
  Yes, you can define the model using the functional api to output the hidden state as a separate output of the model.
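
A minimal sketch of that idea, assuming the Keras bundled with TensorFlow. The feature size, unit count, class count and the names classifier and inspector are hypothetical. Both models are built from the same layers, so the second one reuses whatever weights the first one learns:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

n_features, n_units, n_classes = 8, 16, 3  # hypothetical sizes

inputs = Input(shape=(None, n_features))
lstm_out, state_h, state_c = LSTM(n_units, return_state=True)(inputs)
sentiment = Dense(n_classes, activation="softmax")(lstm_out)

# Model used for training: one output, so fit() needs only sentiment labels.
classifier = Model(inputs=inputs, outputs=sentiment)

# Second model over the same layers that also exposes the final states.
inspector = Model(inputs=inputs, outputs=[sentiment, state_h, state_c])

new_sequence = np.random.rand(1, 12, n_features).astype("float32")
pred, h, c = inspector.predict(new_sequence, verbose=0)
print(pred.shape, h.shape, c.shape)  # (1, 3) (1, 16) (1, 16)
```

To see the hidden state at every time step rather than only the last one, the LSTM would also need return_sequences=True.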
  
  Reply
Nishchay Chawla May 18, 2020 at 8:28 pm #

Hi,
Thanks for the clear explanation. I am trying to make an encoder-decoder model, but this model will have two decoders(d1 and d2) and one encoder. One decoder(d1) gets input only from encoder while another one(d2) will get input from encoder and other decoder(d1). d2 must get hidden states from d1 only when d1 makes a particular type of prediction. Both decoders have a different set of vocabulary. Say d1 has “a,b,c,d” and d2 has “P,Q,R,S”. I want to pass a hidden state from d1 to d2 only when d1 predicts “b”.
I hope this statement gives some sense of what I am trying to do. Thanks!

Reply
- Jason Brownlee May 19, 2020 at 6:02 am #
  
  That sounds complex. Not sure what I can do for you, sorry.
  
  Perhaps experiment with different prototypes until you achieve what you need?
  
  Reply
Giri September 19, 2020 at 7:03 pm #

Dear Jason,

I often visit your website when I have a question. All your articles are crisp, and so is this one on return sequences and return state: no complex coding, and straight to the point. Thanks for the good work you are doing.

I have a question. What if, in the above examples, we use LSTM(5) instead of LSTM(1)?

inputs1 = Input(shape=(3, 1))
lstm1 = LSTM(5, return_sequences=True)(inputs1)

Then my output will be a 3D array. But I wonder how 5 hidden states are generated at each time step in the LSTM. Each LSTM has 1 hidden state and 1 cell state, right?

Can you please clarify my question?

Reply
- Jason Brownlee September 20, 2020 at 6:43 am #
  
  Good question, see this:
  https://machinelearningmastery.com/faq/single-faq/how-is-data-processed-by-an-lstm
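
A quick shape check may also help here, sketched with the Keras bundled with TensorFlow. With LSTM(5) the layer has five units, each producing its own hidden state value at every time step, so the last dimension of the output grows from 1 to 5 while the number of time steps stays at 3:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

data = np.array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
inputs1 = Input(shape=(3, 1))

one_unit = Model(inputs=inputs1, outputs=LSTM(1, return_sequences=True)(inputs1))
five_units = Model(inputs=inputs1, outputs=LSTM(5, return_sequences=True)(inputs1))

out1 = one_unit.predict(data, verbose=0)
out5 = five_units.predict(data, verbose=0)
print(out1.shape)  # (1, 3, 1): 1 sample, 3 time steps, 1 unit
print(out5.shape)  # (1, 3, 5): 1 sample, 3 time steps, 5 units
```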
  
  Reply
Pratik Sen November 12, 2020 at 4:37 am #

[[[0.1]
[0.2]
[0.3]]] is the input given to the LSTM. But instead, Can the input to the LSTM be [[0.1 0.2 0.3]]

Reply
Pratik Sen November 12, 2020 at 4:52 am #

Ok, I have found the Answer. The LSTM layer requires input only in 3D format.

https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
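
For illustration, a 2D array such as [[0.1 0.2 0.3]] can be converted to the 3D form with a NumPy reshape. This sketch assumes the three values are three time steps of a single feature:

```python
import numpy as np

flat = np.array([[0.1, 0.2, 0.3]])  # shape (1, 3): 1 sample, 3 values
print(flat.shape)

# Add a trailing feature axis: (samples, time steps, features).
data = flat.reshape((flat.shape[0], flat.shape[1], 1))
print(data.shape)  # (1, 3, 1)
```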

Reply
Adam December 10, 2020 at 12:32 pm #

Jason,

A quick question.

When you produce a single hidden state output, does that mean the prediction for t4 based on the input data set of [t1, t2, t3]? or the prediction on t3?

Along the same lines, when producing a three-step hidden state output, does that mean the prediction for [t1, t2, t3] or [t2, t3, t4]?

Also, if we were to want to get a single hidden state output say n steps ahead (t+n), how do we specify that in your example?

Thanks and hope to hear back from you soon!

Reply
- Adam December 10, 2020 at 12:51 pm #
  
  This was a dumb question.
  
  There’s no timestep-based prediction set up here including data prep and training accordingly for that need.
  
  I’d interpret hidden state outputs literally as outputs that carry over information up to t3 from t1.
  
  Thanks
  
  Reply
- Jason Brownlee December 10, 2020 at 1:27 pm #
  
  When you use return state, you are only getting the state for the last time step.
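
This can be checked with a short sketch, assuming the Keras bundled with TensorFlow and the same toy input as the tutorial. The returned hidden state should match the last time step of the returned sequence, whatever the randomly initialized weights are:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

inputs1 = Input(shape=(3, 1))
lstm1, state_h, state_c = LSTM(1, return_sequences=True, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])

data = np.array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
seq_out, h_out, c_out = model.predict(data, verbose=0)

# seq_out holds one hidden state per time step; h_out is the one for the last step.
last_step = seq_out[:, -1, :]
print(np.allclose(last_step, h_out))  # True
```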
  
  Reply
Takhir January 6, 2021 at 1:06 am #

Thank you Jason!
Your materials help me very much in learning.
Your simple and clear explanations are what newcomers really need.

Reply
- Jason Brownlee January 6, 2021 at 6:29 am #
  
  You’re very welcome!
  
  Reply
Lawrence January 11, 2021 at 3:02 pm #

My LSTM is like this:

def _get_model(input_shape, latent_dim, num_classes):
    inputs = Input(shape=input_shape)
    lstm_lyr, state_h, state_c = LSTM(latent_dim, dropout=0.1, return_state=True)(inputs)
    fc_lyr = Dense(num_classes)(lstm_lyr)
    soft_lyr = Activation('relu')(fc_lyr)
    model = Model(inputs, [soft_lyr, state_h, state_c])
    model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
    return model

model = _get_model((n_steps_in, n_features), latent_dim, n_steps_out)
history = model.fit(X_train, Y_train)

print(history.history.keys())
dict_keys(['loss', 'activation_26_loss', 'lstm_151_loss', 'activation_26_accuracy', 'lstm_151_accuracy', 'val_loss', 'val_activation_26_loss', 'val_lstm_151_loss', 'val_activation_26_accuracy', 'val_lstm_151_accuracy'])

I get 2 loss and 3 accuracies like this:

Epoch 1/2000
1/1 [==============================] – 1s 698ms/step – loss: 0.2338 – activation_26_loss: 0.1153 – lstm_151_loss: 0.1185 – activation_26_accuracy: 0.0000e+00 – lstm_151_accuracy: 0.0000e+00 – val_loss: 0.2341 – val_activation_26_loss: 0.1160 – val_lstm_151_loss: 0.1181 – val_activation_26_accuracy: 0.0000e+00 – val_lstm_151_accuracy: 0.0000e+00

How to read the losses and accuracies?

Reply
- Jason Brownlee January 12, 2021 at 7:48 am #
  
  If you are using MSE loss, then calculating accuracy is invalid. You can learn more here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-calculate-accuracy-for-regression
  
  Reply
tanunchai September 19, 2021 at 8:26 am #

On pages 102 and 104 of the book Long Short-Term Memory Networks With Python, when I run the program in Listing 8.25, I find that the Sequential object has no predict_classes method. See:
https://www.tensorflow.org/api_docs/python/tf/keras/Sequential

Note that there are predict, predict_on_batch and predict_step methods, but predict_classes is no longer there.

The problem is at the marked line:

# prediction on new data
X, y = generate_examples(size, 1)
yhat = model.predict_classes(X, verbose=0)  # problem at this line
expected = "Right" if y[0]==1 else "Left"
predicted = "Right" if yhat[0]==1 else "Left"
print('Expected: %s, Predicted: %s' % (expected, predicted))

The error message says that the Sequential object has no attribute "predict_classes".

I also found information on the internet that model.predict_classes is deprecated and was removed after 2021-01-01, which means we can no longer use predict_classes.

How I solved it:

I replaced the "yhat" line with the 2 lines below:
import numpy as np
yhat = np.argmax(model.predict(X), axis=-1)

After this replacement, everything runs successfully.

Did I do it right?
Please answer me

Reply
- Adrian Tam September 20, 2021 at 2:27 pm #
  
  That’s correct. You did it perfectly.
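
For reference, a NumPy-only sketch of the two replacements that are commonly suggested for predict_classes. The probability arrays are made-up stand-ins for what model.predict(X) would return, so that the snippet runs without a trained model:

```python
import numpy as np

# Hypothetical softmax output of model.predict(X): one row per sample, one column per class.
softmax_probs = np.array([[0.2, 0.8],
                          [0.9, 0.1]])
yhat_multiclass = np.argmax(softmax_probs, axis=-1)
print(yhat_multiclass)  # [1 0]

# Hypothetical sigmoid output of model.predict(X): a single probability per sample.
sigmoid_probs = np.array([[0.7],
                          [0.2]])
yhat_binary = (sigmoid_probs > 0.5).astype("int32")
print(yhat_binary.ravel())  # [1 0]
```

Which form applies depends on the final layer of the model: argmax for a softmax layer with one unit per class, thresholding at 0.5 for a single sigmoid unit.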
  
  Reply
Anthony The Koala November 17, 2021 at 2:56 am #

Dear Dr Jason,
Thank you for your tutorial.

In the section “Return Sequences”, you had a sequence

# define input data
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))

Questions please:
* I would have thought that the next value in the sequence would be around 0.4, but in the example and trying it out myself as in the above example it is

-0.09.


YET, at https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/ where the sequence is

x_input = array([70, 80, 90])


The output was close to 100:

[[102.09213]]


In other words, for the first example with input [0.1, 0.2, 0.3], why did we get -0.09 instead of something near 0.4, in the same way that the sequence [70, 80, 90] gave something close to 100 (102.09)?

Thank you
Anthony of Sydney

Reply
- Adrian Tam November 17, 2021 at 7:16 am #
  
  As said “Your specific output value will differ given the random initialization of the LSTM weights and cell state.”
  The prediction is from a model that is NOT TRAINED, hence it is random and depends on the initialization of the LSTM layer. If you called model.fit() then you should see a better answer (given enough data to train it, which in this particular example we don’t have).
  
  Reply
  - Anthony The Koala November 17, 2021 at 2:52 pm #
    
    Dear Dr Adrian,
    The key concept was the model was “NOT TRAINED” and key statement in model building is “model.fit()” which was not implemented in order to train the model.
    Thank you for the clarification.
    Anthony from Sydney.
    
    Reply
    - Adrian Tam November 18, 2021 at 5:31 am #
      
      You’re welcome!
      
      Reply

Anthony The Koala November 17, 2021 at 3:05 am #

Dear Dr Jason,
Under the subheading “Return States”,
When you were using return_state=True at

lstm1, state_h, state_c = LSTM(1, return_state=True)(inputs1)


I need please clarification on the last two numbers.
In the text of the tutorial it says “..Running the example returns a sequence of 3 values, one hidden state output for each input time step for the single LSTM cell in the layer….”
If the first value is the predicted value, you then have “one hidden state output”, what is the 3rd output of -0.11 as in the following output.

[[[-0.02243521]
  [-0.06210149]
  [-0.11457888]]]


Thank you,
Anthony of Sydney

Adrian Tam November 17, 2021 at 7:20 am #

LSTM has a cell state, a hidden state, and an output. The three numbers you quoted are all outputs, one for each of your input [0.1, 0.2, 0.3]
See the figure at this stackoverflow question: https://stackoverflow.com/a/50235563

Anthony The Koala November 17, 2021 at 4:17 pm #

Dear Dr Adrian,
Thank you for your reply.
In the LSTM model you mentioned that it had a cell state, a hidden state and an output.
I need clarification please on the order of presentation/print of the following which came from the example model.

[[[-0.02243521]
  [-0.06210149]
  [-0.11457888]]]


Question: is the order of the numbers in the 3D array cell state, hidden state and output?

Then I tried to model.fit() using the following code:

from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array,reshape
#split the sequence
# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
	X, y = list(), list()
	for i in range(len(sequence)):
		# find the end of this pattern
		end_ix = i + n_steps
		# check if we are beyond the sequence
		if end_ix > len(sequence)-1:
			break
		# gather input and output parts of the pattern
		seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
		X.append(seq_x)
		y.append(seq_y)
	return array(X), array(y)
# define input data
data =  [0.1,0.2,0.3,0.4]
x, y = split_sequence(data,1)
x= reshape(x,(x.shape[0],x.shape[1],1))
# define model
inputs1 = Input(shape=(3, 1))
lstm1 = LSTM(1, return_sequences=True)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)
model.compile(loss='mse', optimizer ='adam')
history = model.fit(x,y, epochs = 100, batch_size=1, verbose=2)
# make and show prediction
print(model.predict(data))


I wanted to get a one step ahead prediction for x.
However I received an error.

Epoch 1/100
Traceback (most recent call last):

  File "C:\Python39\untitled4.py", line 31, in 
    history = model.fit(x,y, epochs = 100, batch_size=1, verbose=2)

  File "c:\python39\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None

  File "c:\python39\lib\site-packages\tensorflow\python\framework\func_graph.py", line 1129, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)

ValueError: in user code:

    File "c:\python39\lib\site-packages\keras\engine\training.py", line 878, in train_function  *
        return step_function(self, iterator)
    File "c:\python39\lib\site-packages\keras\engine\training.py", line 867, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "c:\python39\lib\site-packages\keras\engine\training.py", line 860, in run_step  **
        outputs = model.train_step(data)
    File "c:\python39\lib\site-packages\keras\engine\training.py", line 808, in train_step
        y_pred = self(x, training=True)
    File "c:\python39\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "c:\python39\lib\site-packages\keras\engine\input_spec.py", line 263, in assert_input_compatibility
        raise ValueError(f'Input {input_index} of layer "{layer_name}" is '

    ValueError: Input 0 of layer "model_27" is incompatible with the layer: expected shape=(None, 3, 1), found shape=(1, 1, 1)


The “split_sequence” was based under subheading “Vanilla LSTM” at https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

Thank you,
Anthony of Sydney

Adrian Tam November 18, 2021 at 5:35 am #

Your model expects an input shape of (3,1), but you called “x, y = split_sequence(data,1)”, which produces an input shape of (1,1). Changing it to “x, y = split_sequence(data,3)” should fix that.
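
The effect of n_steps on the shapes can be seen with a NumPy-only sketch that reuses the split_sequence function and the four-value series from the question:

```python
from numpy import array

def split_sequence(sequence, n_steps):
    # Slide a window of n_steps inputs over the sequence; the value after each window is the target.
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps
        if end_ix > len(sequence) - 1:
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix])
    return array(X), array(y)

data = [0.1, 0.2, 0.3, 0.4]
shapes = {}
for n_steps in (1, 3):
    x, y = split_sequence(data, n_steps)
    x = x.reshape((x.shape[0], x.shape[1], 1))  # (samples, time steps, features)
    shapes[n_steps] = (x.shape, y.shape)
    print(n_steps, x.shape, y.shape)
# 1 (3, 1, 1) (3,)
# 3 (1, 3, 1) (1,)
```

With n_steps=3 the four values give a single sample of shape (3, 1), which is what Input(shape=(3, 1)) expects. Any array later passed to model.predict() needs the same 3D layout.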

Anthony The Koala November 18, 2021 at 8:24 pm #

Dear Dr Adrian,
Unfortunately the modification did not work. I still had runtime errors.

So what I did was to replicate the code under the subheading “Return Sequences” and add compile and fit.

from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array
# define model
inputs1 = Input(shape=(3, 1))
lstm1, state_h, state_c = LSTM(1, return_sequences=True,return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])
# define input data
x = array([0.1, 0.2, 0.3]).reshape((1,3,1))
y = array([0.2,0.3,0.4]).reshape((1,3))
model.compile(loss='mse', optimizer='adam')
history = model.fit(x,y, epochs = 300, batch_size=1, verbose=2)

# make and show prediction
print(model.predict(x))


The result: there were no errors, but none of the outputs was the 0.4 I expected as the next value.

Epoch 300/300
1/1 - 0s - loss: 0.0935 - lstm_92_loss: 0.0451 - lstm_92_1_loss: 0.0390 - lstm_92_2_loss: 0.0095 - 16ms/epoch - 16ms/step
[array([[[0.06654506],
        [0.10294821],
        [0.12087241]]], dtype=float32), array([[0.12087241]], dtype=float32), array([[0.24834378]], dtype=float32)]


In sum:
* I made x, in the shape (1,3,1) = shape (samples, time_steps, features) as per https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
x = array([0.1, 0.2, 0.3]).reshape((1,3,1))

* I made y, in the shape (1,3) = shape (samples, time_steps)
y = array([0.2,0.3,0.4]).reshape((1,3))

* Added compile and fit which was not in the original model
* The aim was to predict the next value of y to be 0.5

Result: I could not find anything from the predictions resembling y being close to 0.5

Thank you,
Anthony of Sydney

Adrian Tam November 19, 2021 at 10:30 am #

300 epochs with batch_size=1 and a dataset size of 1 means you allow only 300 chances to update the network weights. I tried bumping this up to 10000 and it seems to produce a better result. Also try setting return_state=False, as you’re not using the state and not training it anyway.

Anthony The Koala November 19, 2021 at 2:55 pm #

Dear Dr Adrian,
Thank you for your reply.
By setting return_state=False, I get the following runtime error:

runfile('C:/Users/A/.spyder-py3/autosave/temp.py', wdir='C:/Users/A/.spyder-py3/autosave')
Traceback (most recent call last):

  File "C:\Users\A\.spyder-py3\autosave\temp.py", line 20, in 
    lstm1, state_h, state_c = LSTM(1, return_sequences=True,return_state=False)(inputs1)

  File "c:\python39\lib\site-packages\keras\engine\keras_tensor.py", line 379, in __iter__
    raise TypeError(

TypeError: Cannot iterate over a Tensor with unknown first dimension.


However keeping return_state=True, I don’t have problems. Nevertheless, I don’t have anything matching the predicted value of 0.4.

When epochs=1000, the last value is close to 0.4, at 0.377:

Epoch 1000/1000
1/1 - 0s - loss: 0.0358 - lstm_14_loss: 0.0154 - lstm_14_1_loss: 0.0078 - lstm_14_2_loss: 0.0126 - 10ms/epoch - 10ms/step
[array([[[0.08551978],
        [0.17597863],
        [0.26665625]]], dtype=float32), array([[0.26665625]], dtype=float32), array([[0.37717542]], dtype=float32)]


When epochs = 2000, last value is 0.345 closer to 0.4
When epochs = 4000, last value is 0.303 larger departure from 0.4
When epochs = 8000, last value is 0.322 closer to 0.4
When epochs = 16000, last value is 0.308 larger departure from 0.4

Summary:
* Setting return_state=False causes a runtime error.
* Increasing the number of epochs does not necessarily mean more accurate. In this experiment, increasing the epochs to 16000 produced a prediction worse than when no. of epochs was 2000.
* I would like to understand why an LSTM for a really short array did not accurately predict the next value in the sequence.

Thank you,
Anthony of Sydney

Adrian Tam November 20, 2021 at 1:50 am #

If you set return_state=False, your LSTM will return only output but not the states. So you should write:

lstm1 = LSTM(1, return_sequences=True,return_state=False)(inputs1)
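
The difference between the two settings can be seen in a short sketch, assuming the Keras bundled with TensorFlow. With return_state=False the layer call returns a single tensor, which cannot be unpacked into three names. With return_state=True it returns three tensors:

```python
from tensorflow.keras.layers import Input, LSTM

inputs1 = Input(shape=(3, 1))

# A single tensor: assign it to one name.
only_output = LSTM(1, return_sequences=True, return_state=False)(inputs1)
print(only_output.shape)

# Three tensors: output sequence, final hidden state, final cell state.
with_states = LSTM(1, return_sequences=True, return_state=True)(inputs1)
print(len(with_states))  # 3
lstm1, state_h, state_c = with_states
```

The earlier training log lists a separate loss for each of the three model outputs, which suggests that the two states were also being fitted against y when all three were used as model outputs.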


Anthony The Koala November 20, 2021 at 4:54 pm #

Dear Dr Adrian,
Thank you for your reply.
It appears that the results have much improved with this modification:

epochs = 300, final value = 0.32

Epoch 300/300
1/1 - 0s - loss: 0.0064 - 10ms/epoch - 10ms/step
[[[0.11324593]
  [0.22792535]
  [0.32113817]]]


epochs = 1000, final result = 0.417

Epoch 1000/1000
1/1 - 0s - loss: 8.4346e-04 - 15ms/epoch - 15ms/step
[[[0.15274288]
  [0.2961673 ]
  [0.41666785]]]


epochs = 2000, final result = 0.4

Epoch 2000/2000
1/1 - 0s - loss: 2.2531e-04 - 9ms/epoch - 9ms/step
[[[0.176003  ]
  [0.30995682]
  [0.40018946]]]


epochs = 4000, final result = 0.391

Epoch 4000/4000
1/1 - 0s - loss: 1.9473e-04 - 7ms/epoch - 7ms/step
[[[0.18674503]
  [0.31818673]
  [0.39119297]]]


Here is the final program, followed by a summary and questions:

from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array
# define model
inputs1 = Input(shape=(3, 1))
#lstm1, state_h, state_c = LSTM(1, return_sequences=True,return_state=True)(inputs1)
#The 'simplified' model only interested in lstm1 
lstm1 = LSTM(1, return_sequences=True,return_state=False)(inputs1)
#model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])
#The 'simplified' model only interested in lstm1 
model = Model(inputs=inputs1, outputs=[lstm1])
# define input data
x = array([0.1, 0.2, 0.3]).reshape((1,3,1))
y = array([0.2,0.3,0.4]).reshape((1,3))
model.compile(loss='mse', optimizer='adam')
history = model.fit(x,y, epochs = 4000, batch_size=1, verbose=2)

# make and show prediction
print(model.predict(x))

Summary:
* The ‘simplified’ model produced no runtime errors. It uses only the LSTM output.
* Results were more accurate with return_state=False.
* Increasing the number of epochs can improve accuracy, as the loss reported at each epoch shows. However, training for longer can also make the prediction worse. In this case, going beyond 2000 epochs moved the final value further from the target of 0.4 (0.391 at 4000 epochs), even though the reported loss kept falling.
* The best number of epochs probably lies between 1000 and 2000. It can be determined graphically or with an early stopping mechanism.
* Note on return_state=True versus return_state=False: results were less accurate with return_state=True.
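The early stopping idea mentioned in the summary can be sketched in plain Python. This is only an illustration of the logic: the function name early_stopping_epoch, the patience value and the loss history below are hypothetical and are not taken from the runs above. Keras also ships an EarlyStopping callback in keras.callbacks that applies the same kind of rule during fit().

```python
def early_stopping_epoch(losses, patience=3, min_delta=0.0):
    """Return (best_epoch, best_loss), stopping once the loss has not
    improved by more than min_delta for `patience` epochs in a row."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss - min_delta:
            # improvement: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # no improvement for `patience` epochs
    return best_epoch, best_loss

# Hypothetical monitored loss per epoch (made-up numbers)
history = [0.5, 0.3, 0.2, 0.25, 0.26, 0.27, 0.1]
best_epoch, best_loss = early_stopping_epoch(history, patience=3)
print(best_epoch, best_loss)
```

With a patience of 3, the loop stops after epoch 6 and reports epoch 3 as the best, so the later dip at epoch 7 is never seen. A larger patience trades longer training for a lower chance of stopping too early.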

Further questions please:
* To get an optimum result, the number of epochs had to be over 1000. For such a simple AR(1) process, why are so many epochs required to produce an accurate result?
* I wanted to predict with a new x:

#x is new , want to predict when x = [0.2, 0.3, 0.4]
x = array([0.2,0.3,0.4]).reshape((1,3,1))
print(model.predict(x))

(i) Why did I need a length-3 array of [0.2, 0.3, 0.4]? Why couldn’t I predict with a length-1 array of [0.4]?

(ii) To get as close as possible to a prediction of 0.5, I had to increase the number of epochs to 10000, but the result was only 0.45805833, not 0.5 as expected.

(iii) When I used return_state=True, the result was less accurate. I need clarification on return_state=False versus return_state=True. There is documentation, but not on the consequences for the final result in a simple model.

Thank you again for your assistance as it gets me to understand the concept a little more clearly.

Anthony of Sydney

Adrian Tam November 21, 2021 at 7:50 am #

(i) Because you set return_sequences=True.
(ii) Because there is too little data for the LSTM to learn that this is the rule.
(iii) Empirically, you see the effect. I didn’t dig into the code to tell why, but I guess the internal design of TensorFlow lets it ignore the unused variables and hence train the network better, a bit like increasing the signal and reducing the noise.
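On point (ii), one way to give the LSTM more examples of the rule is to cut overlapping windows from a longer series that follows the same pattern. The sketch below is an assumption-laden illustration: the series 0.0 to 0.9 and the window length of 3 are made up for this example, and the variable names are not from the thread.

```python
import numpy as np

# Illustrative series that follows the same "add 0.1" rule (assumed data)
series = np.arange(10) / 10.0  # 0.0, 0.1, ..., 0.9
window = 3

# Each input window is paired with the same window shifted by one step
n_samples = len(series) - window
X = np.array([series[i:i + window] for i in range(n_samples)])
y = np.array([series[i + 1:i + 1 + window] for i in range(n_samples)])

# Reshape inputs to the (samples, time steps, features) layout an LSTM expects
X = X.reshape((n_samples, window, 1))

print(X.shape, y.shape)   # (7, 3, 1) (7, 3)
print(X[0, :, 0], y[0])
```

Arrays built this way have the same layout as the x and y used in the program above, only with seven samples instead of one.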

Anthony The Koala November 21, 2021 at 4:55 pm #

Dear Dr Adrian,
Thank you again for your kind reply, it is appreciated.

I understood the answers to (ii) and (iii), especially the need for more data and accepting the possible quirkiness of the TensorFlow backend in handling return_state=True or False.

Nevertheless, for (i), even after setting return_sequences=False, I don’t understand why I still need len(x) = 3 instead of 1:

lstm1 = LSTM(1, return_sequences=False,return_state=False)(inputs1)
# blah blah blah blah blah - all the other code skipped

#
# make and show prediction
x = array([0.2,0.3,0.4]).reshape((1,3,1))
#x=array ([0.4]).reshape((1,1,1))
print(model.predict(x))

I had to have x to be of length 3 to predict.

I could not use x = array([0.4]).reshape((1,1,1)) to predict:

# make and show prediction
#######x = array([0.2,0.3,0.4]).reshape((1,3,1))

# I only want to predict using one value, x = array([[[0.4]]]),
# in LSTM input form: samples, time steps, features
x = array([0.4]).reshape((1,1,1))
print(model.predict(x))

I get a runtime error:

Epoch 300/300
1/1 - 0s - loss: 0.0081 - 7ms/epoch - 7ms/step
Traceback (most recent call last):

  File "C:\Users\A\.spyder-py3\autosave\temp.py", line 33, in <module>
    print(model.predict(x))

  File "c:\python39\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None

  File "c:\python39\lib\site-packages\tensorflow\python\framework\func_graph.py", line 1129, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)

ValueError: in user code:

    File "c:\python39\lib\site-packages\keras\engine\training.py", line 1621, in predict_function  *
        return step_function(self, iterator)
    File "c:\python39\lib\site-packages\keras\engine\training.py", line 1611, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "c:\python39\lib\site-packages\keras\engine\training.py", line 1604, in run_step  **
        outputs = model.predict_step(data)
    File "c:\python39\lib\site-packages\keras\engine\training.py", line 1572, in predict_step
        return self(x, training=False)
    File "c:\python39\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "c:\python39\lib\site-packages\keras\engine\input_spec.py", line 263, in assert_input_compatibility
        raise ValueError(f'Input {input_index} of layer "{layer_name}" is '

    ValueError: Input 0 of layer "model_17" is incompatible with the layer: expected shape=(None, 3, 1), found shape=(None, 1, 1)

In other words, if I only want to predict one value, why do I need three values as input?

Anyway, I appreciate your replies because it helps me better understand,

Thank you,
Anthony of Sydney

Adrian Tam November 23, 2021 at 1:08 pm #

Notice that you passed an Input layer to the LSTM layer, and in the Input layer you declared the shape you expect: Input(shape=(3, 1)). Hence the number of time steps must be 3, not 1.
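If predictions on sequences of different lengths are needed, one option that Keras supports is to declare the time-step dimension as None in the Input layer. The following is a minimal sketch under that assumption, using an untrained model purely to show that both shapes are accepted. The variable names three_steps and one_step are made up for this example.

```python
from keras.models import Model
from keras.layers import Input, LSTM
from numpy import array

# None in the time-step position means "any number of time steps"
inputs1 = Input(shape=(None, 1))
lstm1 = LSTM(1, return_sequences=False, return_state=False)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)

# The same (untrained) model now accepts 3 time steps or 1 time step
three_steps = array([0.2, 0.3, 0.4]).reshape((1, 3, 1))
one_step = array([0.4]).reshape((1, 1, 1))

pred_three = model.predict(three_steps, verbose=0)
pred_one = model.predict(one_step, verbose=0)
print(pred_three.shape, pred_one.shape)
```

Keep in mind that a single-step input gives the LSTM no history to work with, so the prediction from [0.4] alone is not expected to match the one from [0.2, 0.3, 0.4].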

Anthony The Koala November 23, 2021 at 3:14 pm #

Dear Dr Adrian,

The conclusion is that, to predict with an LSTM model, the prediction data must have the same 3D shape as the input layer of the model.

Again thank you very much for your kind reply,
Anthony of Sydney

- Adrian Tam November 24, 2021 at 1:02 pm #
  
  That’s correct. I think that’s partially a limitation imposed by Keras’ design.
  
Anthony The Koala November 24, 2021 at 7:16 pm #

Dear Dr Adrian,
Thank you, it is appreciated
Anthony of Sydney

Steffen January 26, 2022 at 3:04 am #

Hi Adrian,

If it is possible to get the outputs like that, is it also possible to change the RNN and LSTM layers in some way, so that several hidden states can be used as input and internally in the LSTM cell?

If yes, what needs to be changed in the RNN layer, or what can be used? I think concatenating the hidden states beforehand doesn’t work on its own, since I want to separate them again afterwards in the cell.

Thank you for your answer

- James Carmichael February 27, 2022 at 12:26 pm #
  
  Hi Steffen…Please kindly reduce your post to a specific question regarding the tutorial content, code listing or ebook so that I may better assist you.
  
Ugur Kahveci January 31, 2022 at 8:24 pm #

I am confused about input, hidden and output layers. Is there one input layer and one output layer by default, and when we add layers to the model, are we just changing the hidden layers?

Ananth May 21, 2024 at 1:57 pm #

The advertisements are important to continue running the show but unfortunately they have really become an irritant making us unable to concentrate while reading through the content.
