Search results for "attention"

Text Generation with LSTM in PyTorch

By Adrian Tam on April 8, 2023 in Deep Learning with PyTorch

A recurrent neural network can be used for time series prediction, in which case a regression neural network is created. It can also be used as a generative model, which is usually a classification neural network model. A generative model learns certain patterns from data, such that when it is presented with some prompt, it can […]

Using Activation Functions in Deep Learning Models

By Adrian Tam on April 8, 2023 in Deep Learning with PyTorch

A deep learning model in its simplest form is a stack of layers of perceptrons connected in tandem. Without any activation functions, they are just matrix multiplications with limited power, regardless of how many there are. Activation functions are what allow a neural network to approximate a wide variety of non-linear functions. In PyTorch, there are many […]

A Brief Introduction to BERT

By Adrian Tam on January 6, 2023 in Attention

Having learned what a Transformer is and how we might train the Transformer model, we notice that it is a great tool for making a computer understand human language. However, the Transformer was originally designed as a model to translate one language to another. If we repurpose it for a different task, we would […]

Inferencing the Transformer Model

By Stefania Cristina on January 6, 2023 in Attention

We have seen how to train the Transformer model on a dataset of English and German sentence pairs and how to plot the training and validation loss curves to diagnose the model’s learning performance and decide at which epoch to run inference on the trained model. We are now ready to run inference on the […]

Plotting the Training and Validation Loss Curves for the Transformer Model

By Stefania Cristina on January 6, 2023 in Attention

We have previously seen how to train the Transformer model for neural machine translation. Before moving on to inferencing the trained model, let us first explore how to modify the training code slightly to be able to plot the training and validation loss curves that can be generated during the learning process. The training and […]

Training the Transformer Model

By Stefania Cristina on January 6, 2023 in Attention

We have put together the complete Transformer model, and now we are ready to train it for neural machine translation. We shall use a training dataset for this purpose, which contains short English and German sentence pairs. We will also revisit the role of masking in computing the accuracy and loss metrics during the training […]

Joining the Transformer Encoder and Decoder Plus Masking

By Stefania Cristina on January 6, 2023 in Attention

We have arrived at a point where we have implemented and tested the Transformer encoder and decoder separately, and we may now join the two together into a complete model. We will also see how to create padding and look-ahead masks by which we will suppress the input values that will not be considered in […]

Implementing the Transformer Decoder from Scratch in TensorFlow and Keras

By Stefania Cristina on January 6, 2023 in Attention

There are many similarities between the Transformer encoder and decoder, such as their implementation of multi-head attention, layer normalization, and a fully connected feed-forward network as their final sub-layer. Having implemented the Transformer encoder, we will now go ahead and apply our knowledge in implementing the Transformer decoder as a further step toward implementing the […]

Implementing the Transformer Encoder from Scratch in TensorFlow and Keras

By Stefania Cristina on January 6, 2023 in Attention

Having seen how to implement the scaled dot-product attention and integrate it within the multi-head attention of the Transformer model, let’s progress one step further toward implementing a complete Transformer model by applying its encoder. Our end goal remains to apply the complete model to Natural Language Processing (NLP). In this tutorial, you will discover how […]

The Vision Transformer Model

By Stefania Cristina on January 6, 2023 in Attention

With the Transformer architecture revolutionizing the implementation of attention, and achieving very promising results in the natural language processing domain, it was only a matter of time before we could see its application in the computer vision domain too. This was eventually achieved with the implementation of the Vision Transformer (ViT). In this tutorial, you […]
