Archive | Training Transformer Models


A Gentle Introduction to Language Model Fine-tuning

Through pretraining, a language model learns about human languages. You can enhance the model’s domain-specific understanding by training it on additional data. You can also train the model to perform specific tasks when it is given a specific instruction. This additional training after pretraining is called fine-tuning. In this article, you will learn how to fine-tune […]
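Instruction fine-tuning, as described above, typically trains on instruction/response pairs. The sketch below is a framework-free illustration of one common way to prepare such a pair: the instruction tokens are masked out of the labels so that only the response is supervised. It assumes a toy whitespace tokenizer in place of a real subword tokenizer. The helpers `build_vocab` and `make_example` are hypothetical. The value -100 is chosen because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. Masking the prompt is a common choice rather than a requirement.

```python
# A minimal sketch of turning one instruction/response pair into training data.
# Assumptions: a toy whitespace tokenizer stands in for a real subword tokenizer,
# and -100 marks positions to skip, matching the default ignore_index of
# PyTorch's CrossEntropyLoss. The next-token shift is left to the model/loss.
IGNORE_INDEX = -100

def build_vocab(texts):
    """Hypothetical helper: give each whitespace-separated word an integer id."""
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def make_example(instruction, response, vocab):
    """Hypothetical helper: return (input_ids, labels) supervising only the response."""
    prompt_ids = [vocab[word] for word in instruction.split()]
    response_ids = [vocab[word] for word in response.split()]
    input_ids = prompt_ids + response_ids
    # Mask the instruction so the loss only rewards producing the response.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels

instruction = "Translate to French: good morning"
response = "bonjour"
vocab = build_vocab([instruction, response])
input_ids, labels = make_example(instruction, response, vocab)
print(input_ids)  # [0, 1, 2, 3, 4, 5]
print(labels)     # [-100, -100, -100, -100, -100, 5]
```

The model still reads the whole sequence as input. The masked labels only change which positions contribute to the loss.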


Train Your Large Model on Multiple GPUs with Tensor Parallelism

Tensor parallelism is a model-parallelism technique that shards a tensor along a specific dimension. It distributes the computation of a tensor across multiple devices with minimal communication overhead. This technique is suitable for models with very large parameter tensors where even a single matrix multiplication is too large to fit on a single GPU. In […]
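The sharding idea above can be checked on one machine. The sketch below simulates two standard ways of splitting the weight of a linear layer Y = X W: by output columns, where each shard yields a slice of Y that is then concatenated (an all-gather), and by input rows, where each shard yields a partial sum that is then added up (an all-reduce). Here, "devices" are just list entries, and all helper names are hypothetical. A real setup would place each shard on its own GPU and use collective communication.

```python
# Simulated tensor parallelism for Y = X @ W on a single process.
# Assumption: each list entry plays the role of one device; real code would
# keep shards on separate GPUs and use all-gather / all-reduce collectives.

def matmul(a, b):
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def split_columns(w, parts):
    """Shard w along its column (output) dimension into equal parts."""
    size = len(w[0]) // parts
    return [[row[p * size:(p + 1) * size] for row in w] for p in range(parts)]

def split_rows(w, parts):
    """Shard w along its row (input) dimension into equal parts."""
    size = len(w) // parts
    return [w[p * size:(p + 1) * size] for p in range(parts)]

def column_parallel(x, w, parts):
    # Every device sees the full input and computes a slice of the output columns.
    partial = [matmul(x, shard) for shard in split_columns(w, parts)]
    # "All-gather": concatenate the column slices row by row.
    return [sum((p[i] for p in partial), []) for i in range(len(x))]

def row_parallel(x, w, parts):
    # Each device gets the matching slice of the input features ...
    size = len(x[0]) // parts
    x_shards = [[row[p * size:(p + 1) * size] for row in x] for p in range(parts)]
    partial = [matmul(xs, ws) for xs, ws in zip(x_shards, split_rows(w, parts))]
    # ... and produces a full-size partial result; "all-reduce" sums them.
    return [[sum(p[i][j] for p in partial) for j in range(len(w[0]))]
            for i in range(len(x))]

x = [[1, 2, 3, 4], [5, 6, 7, 8]]
w = [[1, 0, 2, 1], [0, 1, 1, 2], [3, 1, 0, 1], [1, 2, 1, 0]]
full = matmul(x, w)
print(full)  # [[14, 13, 8, 8], [34, 29, 24, 24]]
print(column_parallel(x, w, 2) == full, row_parallel(x, w, 2) == full)  # True True
```

Frameworks commonly pair the two splits, a column-sharded layer followed by a row-sharded one, so that the intermediate activation never has to be gathered between them.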


Train a Model Faster with torch.compile and Gradient Accumulation

Training a language model with a deep transformer architecture is time-consuming. However, there are techniques you can use to accelerate training. In this article, you will learn about using torch.compile() to speed up the model and using gradient accumulation to train a model with a larger effective batch size. Let’s get started! Overview: This article is […]
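Gradient accumulation gets its larger effective batch size by adding up gradients from several micro-batches before taking one optimizer step. The sketch below uses a toy one-parameter linear model with a mean-squared-error loss, in plain Python rather than PyTorch, to show why this works. It assumes equal-sized micro-batches and a loss averaged over examples. Under those conditions, scaling each micro-batch gradient by `1/accum_steps` reproduces the full-batch gradient. The function names, data, and learning rate are illustrative assumptions.

```python
import math

def grad_mse(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(data, batch_size, accum_steps, lr=0.01, w=0.0):
    """SGD where each update combines accum_steps micro-batches of batch_size examples."""
    grad_buffer = 0.0
    micro_batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    for step, batch in enumerate(micro_batches, start=1):
        # Scaling by 1/accum_steps mirrors dividing the loss before backward().
        grad_buffer += grad_mse(w, batch) / accum_steps
        if step % accum_steps == 0:
            w -= lr * grad_buffer  # the optimizer step
            grad_buffer = 0.0      # like zeroing gradients for the next cycle
    return w

data = [(x, 2.0 * x + 0.1) for x in range(1, 9)]
w_big = train(data, batch_size=4, accum_steps=1)    # two updates, batches of 4
w_accum = train(data, batch_size=2, accum_steps=2)  # two updates, 2 x 2 micro-batches
print(w_big, w_accum, math.isclose(w_big, w_accum))
```

Both runs make two updates from the same examples, so the final weights agree up to floating-point rounding. The accumulated run only ever holds a batch of 2 at a time.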


Training a Model with Limited Memory using Mixed Precision and Gradient Checkpointing

Training a language model is memory-intensive, not only because the model itself is large but also because training data batches often contain long sequences. Training a model with limited memory is challenging. In this article, you will learn techniques that enable model training in memory-constrained environments. In particular, you will learn about: Low-precision floating-point numbers […]
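The limits of low-precision floating-point numbers are easy to see without a GPU. Python's standard `struct` module supports IEEE 754 half precision (format character `e`), and the sketch below uses it to round values to float16 and back. It shows two effects: small increments to 1.0 are lost, and tiny gradient-sized values underflow to zero. It also shows loss scaling, a standard mixed-precision technique named here as an assumption and not necessarily the exact method the article uses: values are multiplied up before conversion and divided back afterwards in higher precision.

```python
import struct

def to_fp16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# float16 has a 10-bit fraction, so the spacing of values near 1.0 is 2**-10.
print(to_fp16(1.0001))  # 1.0
print(to_fp16(1.001))   # 1.0009765625

# A gradient this small underflows to zero in float16 ...
grad = 1e-8
print(to_fp16(grad))    # 0.0

# ... unless it is scaled up before conversion and unscaled in higher precision.
scale = 1024.0
print(to_fp16(grad * scale) / scale)  # non-zero, close to 1e-8
```

In practice, the scale factor is usually adjusted during training, since too large a factor would overflow the float16 range (maximum 65504).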


Machine Learning Mastery is part of Guiding Tech Media, a leading digital media publisher focused on helping people figure out technology. Visit our corporate website to learn more about our mission and team.