Archive | Training Transformer Models

Creating a Llama or GPT Model for Next-Token Prediction

Natural language generation (NLG) is challenging because human language is complex and unpredictable. Generating words at random, one at a time, produces text that is meaningless to humans. Modern decoder-only transformer models have proven effective for NLG tasks when trained on large amounts of text data. These models can be huge, but their structure […]

Training a Tokenizer for the Llama Model

The Llama models are a family of large language models released by Meta (formerly Facebook). These decoder-only transformer models are used for generation tasks. Almost all decoder-only models nowadays use the Byte-Pair Encoding (BPE) algorithm for tokenization. In this article, you will learn about BPE. In particular, you will learn: What BPE is compared to other […]

Fine-Tuning a BERT Model

BERT is a foundational NLP model trained to understand language, but it may not perform well on any specific task out of the box. However, you can build upon BERT by adding appropriate model heads and training it for a specific task. This process is called fine-tuning. In this article, you will learn how to […]

Pretrain a BERT Model from Scratch

BERT is a transformer-based model for NLP tasks. As an encoder-only model, it has a highly regular architecture. In this article, you will learn how to create and pretrain a BERT model from scratch using PyTorch. Let’s get started. Overview: This article is divided into three parts: Creating a BERT Model the Easy […]

Preparing Data for BERT Training

BERT is an encoder-only transformer model pretrained on the masked language model (MLM) and next sentence prediction (NSP) tasks before being fine-tuned for various NLP tasks. Pretraining requires special data preparation. In this article, you will learn how to: create masked language model (MLM) training data; create next sentence prediction (NSP) training data; set up […]

Training a Tokenizer for BERT Models

BERT is an early transformer-based model for NLP tasks that’s small and fast enough to train on a home computer. Like all deep learning models, it requires a tokenizer to convert text into integer tokens. This article explains how to train a WordPiece tokenizer according to BERT’s original design. Let’s get started. Overview: This article […]

Machine Learning Mastery is part of Guiding Tech Media, a leading digital media publisher focused on helping people figure out technology.