Author Archive | Adrian Tam


A Gentle Introduction to Multi-Head Attention and Grouped-Query Attention

Language models need to understand relationships between words in a sequence, regardless of their distance. This post explores how attention mechanisms enable this capability and their various implementations in modern language models. Let's get started.

Overview

This post is divided into three parts; they are:

- Why Attention is Needed
- The Attention Operation
- Multi-Head Attention (MHA) […]
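As a taste of what the post covers, here is a minimal NumPy sketch of scaled dot-product attention, the operation that multi-head attention runs several times in parallel over projected subspaces. The function name, shapes, and toy data below are illustrative assumptions, not the post's own code.

```python
# Minimal sketch of scaled dot-product attention (illustrative, not the
# post's exact code). Q, K have shape (seq_len, d_k); V has (seq_len, d_v).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                         # 4 tokens, 8 dimensions
print(scaled_dot_product_attention(x, x, x).shape)  # self-attention -> (4, 8)
```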

Continue Reading

Interpolation in Positional Encodings and Using YaRN for Larger Context Window

Transformer models are trained with a fixed sequence length, but during inference they may need to process sequences of different lengths. This poses a challenge because positional encodings are computed from token positions, so the model may struggle with encodings for positions it hasn't encountered during training. The ability to handle varying sequence lengths is […]
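To make the idea concrete before reading on, below is a hedged sketch of plain position interpolation, which YaRN refines with frequency-dependent scaling; the function name and constants are assumptions for illustration only.

```python
# Plain position interpolation, sketched: positions beyond the trained context
# are squeezed back into the trained range before computing rotary-style angles.
# (YaRN scales different frequencies differently; this is the simpler baseline.)
import numpy as np

def rotary_angles(positions, d_model, scale=1.0):
    # One inverse frequency per pair of dimensions, as in rotary encodings
    inv_freq = 1.0 / (10000 ** (np.arange(0, d_model, 2) / d_model))
    return np.outer(positions * scale, inv_freq)    # shape (seq_len, d_model // 2)

train_len, target_len, d_model = 2048, 8192, 64
angles = rotary_angles(np.arange(target_len), d_model, scale=train_len / target_len)
print(angles.shape)    # (8192, 32)
print(angles[-1, 0])   # 2047.75 -- the largest angle stays inside the trained range
```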

Continue Reading

Positional Encodings in Transformer Models

Natural language processing (NLP) has evolved significantly with transformer-based models. A key innovation in these models is positional encodings, which help capture the sequential nature of language. In this post, you will learn about:

- Why positional encodings are necessary in transformer models
- Different types of positional encodings and their characteristics
- How to implement various positional […]
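As a preview, here is a minimal sketch of the classic sinusoidal encoding from the original transformer paper, one of the encoding types a post like this typically covers; the function name and sizes are illustrative assumptions.

```python
# Sinusoidal positional encoding: even dimensions use sine, odd use cosine,
# with wavelengths forming a geometric progression across dimensions.
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]   # (seq_len, 1)
    i = np.arange(d_model)[None, :]     # (1, d_model)
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

pe = sinusoidal_encoding(seq_len=50, d_model=128)
print(pe.shape)  # (50, 128); each row is added to the token embedding at that position
```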

Continue Reading

Word Embeddings in Language Models

Natural language processing (NLP) has long been a fundamental area in computer science. However, its trajectory changed dramatically with the introduction of word embeddings. Before embeddings, NLP relied primarily on rule-based approaches that treated words as discrete tokens. With word embeddings, computers gained the ability to understand language through vector space representations. In this article, […]
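A toy illustration of the vector-space idea, with made-up vectors rather than trained embeddings: related words sit close together, which cosine similarity makes measurable.

```python
# Made-up 3-dimensional "embeddings" purely for illustration; real embeddings
# are learned from data and have hundreds of dimensions.
import numpy as np

emb = {
    "king":  np.array([0.80, 0.65, 0.10]),
    "queen": np.array([0.78, 0.70, 0.12]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))  # close to 1: related words
print(cosine(emb["king"], emb["apple"]))  # much lower: unrelated words
```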

Continue Reading

Tokenizers in Language Models

Tokenization is a crucial preprocessing step in natural language processing (NLP) that converts raw text into tokens that can be processed by language models. Modern language models use sophisticated tokenization algorithms to handle the complexity of human language. In this article, we will explore common tokenization algorithms used in modern LLMs, their implementation, and how […]
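To show the core idea behind one such algorithm, here is a minimal sketch of a single byte-pair encoding (BPE) merge step; real tokenizers add byte-level handling, special tokens, and trained merge tables, so treat this as a simplified illustration.

```python
# One BPE merge step, sketched: count adjacent symbol pairs across a toy
# corpus and merge the most frequent pair everywhere it occurs.
from collections import Counter

corpus = [list("low"), list("lower"), list("lowest")]

def most_frequent_pair(words):
    pairs = Counter()
    for w in words:
        pairs.update(zip(w, w[1:]))
    return pairs.most_common(1)[0][0]

def merge(words, pair):
    a, b = pair
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(a + b)   # fuse the pair into one symbol
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged

pair = most_frequent_pair(corpus)   # ('l', 'o') appears in all three words
print(pair, merge(corpus, pair))    # [['lo', 'w'], ['lo', 'w', 'e', 'r'], ...]
```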

Continue Reading

Encoders and Decoders in Transformer Models

Transformer models have revolutionized natural language processing (NLP) with their powerful architecture. While the original transformer paper introduced a full encoder-decoder model, variations of this architecture have emerged to serve different purposes. In this article, we will explore the different types of transformer models and their applications. Let's get started.

Overview

This article is divided […]
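One concrete difference between the variants, sketched below as a simplified illustration rather than the article's own code: decoder self-attention applies a causal mask so each token attends only to earlier positions, while encoder self-attention is unmasked.

```python
# The causal (lower-triangular) mask that distinguishes decoder self-attention:
# position i may attend to positions 0..i only. Encoders use no such mask.
import numpy as np

seq_len = 5
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
print(causal_mask.astype(int))
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
```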

Continue Reading

A Gentle Introduction to Attention and Transformer Models

The transformer is a deep learning architecture popular in natural language processing (NLP) tasks. It is a type of neural network designed to process sequential data, such as text. In this article, we will explore the concept of attention and the transformer architecture. Specifically, you will learn:

- What problems do the transformer models address […]

Continue Reading

Next-Level Data Science (7-Day Mini-Course)

Before it got its name, data science was known simply as statistical analysis, since that was the primary method for extracting information from data. With recent advances in technology, machine learning models have been introduced, expanding our ability to analyze and understand data. There are many machine learning models available, but you don't need to […]

Continue Reading
