Text Generation with GPT-2 Model

Text generation is one of the most fascinating applications of deep learning. With the advent of large language models like GPT-2, we can now generate human-like text that’s coherent, contextually relevant, and surprisingly creative. In this tutorial, you’ll discover how to implement text generation using GPT-2. You’ll learn through hands-on examples that you can run right away, and by the end of this guide, you’ll understand both the theory and practical implementation details.

After completing this tutorial, you will know:

  • How GPT-2’s transformer architecture enables sophisticated text generation
  • How to implement text generation with different sampling strategies
  • How to optimize generation parameters for different use cases

Kick-start your project with my book NLP with Hugging Face Transformers. It provides self-study tutorials with working code.

Let’s get started.

Photo by Peter Herrmann. Some rights reserved.

Overview

This tutorial is in four parts; they are:

  • The Core Text Generation Implementation
  • What are the Parameters in Text Generation?
  • Batch Processing and Padding
  • Tips for Better Generation Results

The Core Text Generation Implementation

Let’s start with a basic implementation that demonstrates the fundamental concept. Below, you will create a class that generates text from a given prompt using a pre-trained GPT-2 model. You will extend this class in the subsequent sections of this tutorial.
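
Here is a minimal sketch of such a class. The names (TextGenerator, generate_text, and the extra keyword arguments collected in gen_kwargs) follow the discussion in this tutorial, but the exact defaults and internal details are illustrative rather than definitive.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class TextGenerator:
    def __init__(self, model_name="gpt2"):
        # Load the pre-trained tokenizer and model; use a GPU if one is available
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = GPT2Tokenizer.from_pretrained(model_name)
        self.model = GPT2LMHeadModel.from_pretrained(model_name).to(self.device)
        self.model.eval()

    def generate_text(self, prompt, max_length=100, **gen_kwargs):
        # Sampling parameters with sensible defaults; callers can override any of them
        gen_kwargs.setdefault("temperature", 0.7)
        gen_kwargs.setdefault("top_k", 50)
        gen_kwargs.setdefault("top_p", 0.95)
        gen_kwargs.setdefault("do_sample", True)   # sample instead of greedy decoding

        # Encode the prompt; the tokenizer also returns the attention mask
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
        with torch.no_grad():
            output_ids = self.model.generate(
                inputs["input_ids"],
                attention_mask=inputs["attention_mask"],
                max_length=max_length,
                pad_token_id=self.tokenizer.eos_token_id,
                **gen_kwargs,
            )
        # Decode the generated token IDs back into a string
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```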

Let’s break down this implementation.

In this code, you use the GPT2LMHeadModel and GPT2Tokenizer classes from the transformers library to load a pre-trained GPT-2 model and tokenizer. As a user, you don’t even need to understand how GPT-2 works internally. The TextGenerator class holds both and runs the model on a GPU if you have one. If you haven’t installed the library, you can do so with the pip command:
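
```
pip install transformers torch
```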

In the generate_text method, you handle the core generation process with several important parameters:

  • max_length: Controls the maximum length of generated text
  • temperature: Adjusts randomness (higher values = more creative)
  • top_k: Limits vocabulary to $k$ highest probability tokens
  • top_p: Uses nucleus sampling to dynamically limit tokens

Here’s how to use this implementation to generate text:
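
The three prompts below are illustrative placeholders:

```python
generator = TextGenerator()

prompts = [
    "The future of artificial intelligence is",
    "In a small village by the sea,",
    "The most important scientific discovery of the century was",
]

for prompt in prompts:
    print(generator.generate_text(prompt, max_length=60))
    print("-" * 40)
```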

Because sampling is involved, the generated text will be different each time you run this code.

You used three different prompts here, and three strings of text were generated. The model is trivial to use: you pass a prompt to the generate_text method, which encodes it with the tokenizer and forwards the token IDs to the model together with the attention mask. The attention mask is provided by the tokenizer; for a single, unpadded prompt, it is simply a tensor of ones with the same shape as the input.
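
You can verify this by inspecting what the tokenizer returns for a single prompt:

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
enc = tokenizer("Hello, world!", return_tensors="pt")
print(enc["input_ids"])       # the token IDs of the prompt
print(enc["attention_mask"])  # all ones, same shape as input_ids
```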

What are the Parameters in Text Generation?

If you look at the generate_text method, you will see that there are several parameters passed via gen_kwargs. Some of the most important parameters are top_k, top_p, and temperature. You can see the effect of top_k and top_p by experimenting with different values:
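
For instance, reusing the TextGenerator class from above (the prompt and the exact values are illustrative):

```python
generator = TextGenerator()
prompt = "The future of artificial intelligence is"

# 1. A small top_k: only the 5 most likely tokens are considered at each step
print(generator.generate_text(prompt, max_length=60, top_k=5, top_p=1.0))

# 2. top_k=0 disables top-k filtering; rely on nucleus (top-p) sampling alone
print(generator.generate_text(prompt, max_length=60, top_k=0, top_p=0.9))

# 3. Combine a larger top_k with top_p
print(generator.generate_text(prompt, max_length=60, top_k=50, top_p=0.95))
```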

The exact output varies from run to run, but you should be able to see the qualitative differences between the three settings.

The top_k and top_p parameters fine-tune the sampling strategy. To understand them, remember that at each step the model outputs a probability distribution over the entire vocabulary, which for GPT-2 contains over 50,000 tokens. You can always pick the token with the highest probability, but you can also sample a token at random according to that distribution, so the same prompt can produce different output each time. This is the multinomial sampling used by GPT-2.

The top_k parameter limits the choice to the $k>0$ most likely tokens. Instead of considering tens of thousands of tokens in the vocabulary, setting top_k shortlists the candidates to a more tractable subset.

The top_p parameter further shortlists the choice. It keeps only the smallest set of most likely tokens whose cumulative probability reaches the threshold $P$; this is known as nucleus sampling. The next token is then sampled from this reduced set according to the renormalized probabilities.

The code above demonstrates three different sampling approaches.

  • The first example sets top_k to a small value, severely limiting the choice. The output is focused but potentially repetitive.
  • The second example turns off top_k by setting it to 0 and relies on top_p alone, i.e., nucleus sampling. Low-probability tokens are removed from the sampling pool, offering more natural variation.
  • The third example combines both strategies. A larger top_k allows more diversity, while top_p still trims away the low-probability tokens, keeping the generation natural and of high quality.

However, how is the probability of each token determined? That is where the temperature parameter comes in. Let’s look at another example:
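
A sketch of such a comparison, again with an illustrative prompt:

```python
generator = TextGenerator()
prompt = "The scientist opened the door and saw"

for temperature in [0.3, 0.7, 1.0]:
    print(f"Temperature {temperature}:")
    # Same prompt and max_length each time; only the temperature changes
    print(generator.generate_text(prompt, max_length=60, temperature=temperature))
    print("-" * 40)
```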

Note that the same prompt is used for all three examples, so any difference in the output is due to the temperature alone.

So what is the effect of temperature? You can see that:

  • A low temperature of 0.3 produces more focused and deterministic output. It can read as dull, but that makes it suitable for tasks requiring accuracy.
  • The medium temperature of 0.7 strikes a balance between creativity and coherence.
  • The high temperature of 1.0 generates more diverse and creative text. Each example uses the same max_length for a fair comparison.

Behind the scenes, temperature is a parameter in the softmax function, which is applied to the model’s output logits to turn them into the probabilities used to pick the next token. The softmax function is:

$$
s(x_j) = \frac{e^{x_j/T}}{\sum_{i=1}^{V} e^{x_i/T}}
$$

where $T$ is the temperature parameter and $V$ is the vocabulary size. Scaling the model outputs $x_1,\dots,x_V$ by $T$ changes the relative probabilities of the tokens. A high temperature makes the probabilities more uniform, making improbable tokens more likely to be chosen. A low temperature concentrates the probabilities on the highest-probability tokens, so the output is more deterministic.
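
You can see this effect numerically by applying the temperature-scaled softmax to a few made-up logits (the numbers are arbitrary, chosen only for illustration):

```python
import torch

# Toy logits for a vocabulary of four tokens
logits = torch.tensor([2.0, 1.0, 0.5, -1.0])

for T in [0.3, 0.7, 1.0, 2.0]:
    probs = torch.softmax(logits / T, dim=-1)
    # Low T concentrates probability on the largest logit; high T flattens the distribution
    print(f"T={T}: {probs.numpy().round(3)}")
```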

Batch Processing and Padding

The code above is good for a single prompt. However, in practice, you may need to generate text for multiple prompts. The following code shows how to handle a batch of prompts efficiently:
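
Here is a sketch of the BatchGenerator class described below. Its structure mirrors TextGenerator; the left-padding and the particular defaults are illustrative choices.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class BatchGenerator:
    def __init__(self, model_name="gpt2"):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = GPT2Tokenizer.from_pretrained(model_name)
        # GPT-2 defines no padding token; reuse the EOS token for padding
        self.tokenizer.add_special_tokens({"pad_token": self.tokenizer.eos_token})
        # Left-pad so each generated continuation follows its prompt directly
        self.tokenizer.padding_side = "left"
        self.model = GPT2LMHeadModel.from_pretrained(model_name).to(self.device)
        self.model.eval()

    def generate_batch(self, prompts, max_length=100, **gen_kwargs):
        # Tokenize all prompts at once, padding them to the same length
        inputs = self.tokenizer(prompts, return_tensors="pt", padding=True).to(self.device)
        with torch.no_grad():
            output_ids = self.model.generate(
                inputs["input_ids"],
                attention_mask=inputs["attention_mask"],
                max_length=max_length,
                do_sample=True,
                pad_token_id=self.tokenizer.pad_token_id,
                **gen_kwargs,
            )
        # Decode each sequence; results come back in the same order as the prompts
        return [self.tokenizer.decode(ids, skip_special_tokens=True) for ids in output_ids]

batch_generator = BatchGenerator()
results = batch_generator.generate_batch(
    ["The weather today is", "Once upon a time,", "The best way to learn programming is"],
    max_length=60, temperature=0.7, top_k=50, top_p=0.95,
)
for text in results:
    print(text)
    print("-" * 40)
```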

Each prompt gets its own generated continuation, returned together as a batch.

The BatchGenerator implementation makes some slight changes. The generate_batch method takes a list of prompts and passes the other parameters on to the model’s generate method. Most importantly, it pads the prompts to the same length and then generates text for every prompt in the batch. The results are returned in the same order as the prompts.

The GPT-2 model is trained to handle batched input, but to pack a batch into a single tensor, all prompts must be padded to the same length. The tokenizer readily handles batched input; however, GPT-2 does not define a padding token, so you need to set one yourself using the tokenizer’s add_special_tokens() method. The code above reuses the EOS token. In fact, you can use any token, since the attention mask forces the model to ignore it.

Tips for Better Generation Results

You now know how to use the GPT-2 model to generate text. But what should you expect from the output? The answer depends on the task, but here are some tips that can help you get better results.

First is prompt engineering. You need to be specific and clear in your prompts to get high-quality output. Ambiguous words or phrases lead to ambiguous output, so be specific, concise, and precise. You may also include relevant context to help the model understand the task.

You can also tune the parameters to get better results. Depending on the task, you may want the output to be more focused or more creative. Adjust the temperature parameter to control the randomness of the output, and combine it with top_k and top_p to control its diversity. Because generation is auto-regressive, longer outputs take longer to produce, so set the max_length parameter to trade output length against speed.
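
As a rough starting point, you might keep two presets and reuse the TextGenerator class from earlier (the values below are illustrative, not prescriptive):

```python
# Illustrative presets; tune them for your own task
FOCUSED = dict(temperature=0.3, top_k=20, top_p=0.9, max_length=80)
CREATIVE = dict(temperature=1.0, top_k=100, top_p=0.95, max_length=150)

generator = TextGenerator()
print(generator.generate_text("The quarterly report shows that", **FOCUSED))
print(generator.generate_text("The dragon looked at the knight and said,", **CREATIVE))
```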

Finally, the code above is not fault-tolerant. In production, you should implement proper error handling, set reasonable timeouts, monitor memory usage, and apply rate limiting.

Further Reading

Below are some further readings that can help you better understand text generation with the GPT-2 model.

Summary

In this tutorial, you learned how to generate text with GPT-2 and use the transformers library to build real-world applications with a few lines of code. In particular, you learned:

  • How to implement text generation using GPT-2
  • How to control generation parameters for different use cases
  • How to implement batch processing for efficiency
  • Best practices and common pitfalls to avoid

