Exploding gradients are a problem where large error gradients accumulate and result in very large updates to neural network model weights during training. This has the effect of your model being unstable and unable to learn from your training data. In this post, you will discover the problem of exploding gradients with deep artificial neural […]

# Archive | Long Short-Term Memory Networks

Long Short-Term Memory Networks

## What is Teacher Forcing for Recurrent Neural Networks?

Teacher forcing is a method for quickly and efficiently training recurrent neural network models that use the output from a prior time step as input. It is a network training method critical to the development of deep learning language models used in machine translation, text summarization, and image captioning, among many other applications. In this […]

## How to Develop an Encoder-Decoder Model for Sequence-to-Sequence Prediction in Keras

The encoder-decoder model provides a pattern for using recurrent neural networks to address challenging sequence-to-sequence prediction problems such as machine translation. Encoder-decoder models can be developed in the Keras Python deep learning library and an example of a neural machine translation system developed with this model has been described on the Keras blog, with sample […]

## Gentle Introduction to Global Attention for Encoder-Decoder Recurrent Neural Networks

The encoder-decoder model provides a pattern for using recurrent neural networks to address challenging sequence-to-sequence prediction problems such as machine translation. Attention is an extension to the encoder-decoder model that improves the performance of the approach on longer sequences. Global attention is a simplification of attention that may be easier to implement in declarative deep […]

## Understand the Difference Between Return Sequences and Return States for LSTMs in Keras

The Keras deep learning library provides an implementation of the Long Short-Term Memory, or LSTM, recurrent neural network. As part of this implementation, the Keras API provides access to both return sequences and return state. The use and difference between these data can be confusing when designing sophisticated recurrent neural network models, such as the […]

## Implementation Patterns for the Encoder-Decoder RNN Architecture with Attention

The encoder-decoder architecture for recurrent neural networks is proving to be powerful on a host of sequence-to-sequence prediction problems in the field of natural language processing. Attention is a mechanism that addresses a limitation of the encoder-decoder architecture on long sequences, and that in general speeds up the learning and lifts the skill of the […]

## How to Develop an Encoder-Decoder Model with Attention for Sequence-to-Sequence Prediction in Keras

The encoder-decoder architecture for recurrent neural networks is proving to be powerful on a host of sequence-to-sequence prediction problems in the field of natural language processing such as machine translation and caption generation. Attention is a mechanism that addresses a limitation of the encoder-decoder architecture on long sequences, and that in general speeds up the […]

## A Gentle Introduction to RNN Unrolling

Recurrent neural networks are a type of neural network where the outputs from previous time steps are fed as input to the current time step. This creates a network graph or circuit diagram with cycles, which can make it difficult to understand how information moves through the network. In this post, you will discover the […]

## Making Predictions with Sequences

Sequence prediction is different from other types of supervised learning problems. The sequence imposes an order on the observations that must be preserved when training models and making predictions. Generally, prediction problems that involve sequence data are referred to as sequence prediction problems, although there are a suite of problems that differ based on the […]

## How to Diagnose Overfitting and Underfitting of LSTM Models

It can be difficult to determine whether your Long Short-Term Memory model is performing well on your sequence prediction problem. You may be getting a good model skill score, but it is important to know whether your model is a good fit for your data or if it is underfit or overfit and could do […]