Archive | Training Transformer Models

Creating a Llama or GPT Model for Next-Token Prediction

By Adrian Tam on January 21, 2026 in Training Transformer Models

Natural language generation (NLG) is challenging because human language is complex and unpredictable. A naive approach of generating words randomly one by one would not be meaningful to humans. Modern decoder-only transformer models have proven effective for NLG tasks when trained on large amounts of text data. These models can be huge, but their structure […]

Training a Tokenizer for the Llama Model

By Adrian Tam on January 12, 2026 in Training Transformer Models

The Llama models are a family of large language models released by Meta (formerly Facebook). These decoder-only transformer models are used for generation tasks. Almost all decoder-only models nowadays use the Byte-Pair Encoding (BPE) algorithm for tokenization. In this article, you will learn about BPE. In particular, you will learn: What BPE is compared to other […]

How to Speed Up Training Convergence of Language Models

By Adrian Tam on January 12, 2026 in Training Transformer Models

Language model training is slow, even when your model is not very large. This is because you need to train the model on a large dataset and handle a large vocabulary. Therefore, the model requires many training steps to converge. However, some techniques can speed up training. In this article, you will learn about them. […]

Fine-Tuning a BERT Model

By Adrian Tam on January 12, 2026 in Training Transformer Models

BERT is a foundational NLP model trained to understand language, but it may not perform well on any specific task out of the box. However, you can build upon BERT by adding appropriate model heads and training it for a specific task. This process is called fine-tuning. In this article, you will learn how to […]

Pretrain a BERT Model from Scratch

By Adrian Tam on January 21, 2026 in Training Transformer Models

BERT is a transformer-based model for NLP tasks. As an encoder-only model, it has a highly regular architecture. In this article, you will learn how to create and pretrain a BERT model from scratch using PyTorch. Let’s get started. Overview This article is divided into three parts; they are: Creating a BERT Model the Easy […]

Preparing Data for BERT Training

By Adrian Tam on January 21, 2026 in Training Transformer Models

BERT is an encoder-only transformer model pretrained on the masked language model (MLM) and next sentence prediction (NSP) tasks before being fine-tuned for various NLP tasks. Pretraining requires special data preparation. In this article, you will learn how to: Create masked language model (MLM) training data Create next sentence prediction (NSP) training data Set up […]

BERT Models and Its Variants

By Adrian Tam on January 12, 2026 in Training Transformer Models

BERT is a transformer-based model for NLP tasks that Google released in 2018. It has proven useful for a wide range of NLP tasks. In this article, you will get an overview of the architecture of BERT and how it is trained. Then you will learn about some of its later […]

Training a Tokenizer for BERT Models

By Adrian Tam on January 21, 2026 in Training Transformer Models

BERT is an early transformer-based model for NLP tasks that’s small and fast enough to train on a home computer. Like all deep learning models, it requires a tokenizer to convert text into integer tokens. This article explains how to train a WordPiece tokenizer according to BERT’s original design. Let’s get started. Overview This article […]

Datasets for Training a Language Model

By Adrian Tam on January 21, 2026 in Training Transformer Models

A language model is a mathematical model that describes a human language as a probability distribution over its vocabulary. To train a deep learning network to model a language, you need to identify the vocabulary and learn its probability distribution. You can’t create the model from nothing. You need a dataset for your model to […]
