Building Q&A Systems with DistilBERT and Transformers

Question Answering is a crucial natural language processing task that enables machines to understand and respond to human questions by extracting relevant information from a given context. DistilBERT, a distilled version of BERT, offers an excellent balance between performance and computational efficiency for building Q&A systems.

In this tutorial, you will learn how to build a powerful Question Answering (Q&A) system using DistilBERT and the transformers library, covering everything from a basic implementation to more advanced features. In particular, you will learn:

  • How to implement a basic Q&A system with DistilBERT
  • Advanced techniques for improving answer quality

Let’s get started.

Overview

This post is in three parts; they are:

  • Building a Simple Q&A System
  • Handling Large Contexts
  • Building an Expert System

Building a Simple Q&A System

A question answering system is not just about throwing a question at a model and getting an answer back. You want the answer to be accurate and well-supported. The way to do this is to provide a “context” in which the answer should be found. While this prevents the model from answering an open-ended question, it also prevents it from hallucinating an answer. A model that can do this task must understand both the question and the context, which takes more than just a language model.

A model that can do this is BERT. Below, you will use the DistilBERT model to build a simple Q&A system:
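Below is a minimal sketch of such a system. The question and context strings are illustrative choices, arranged so that the top answer “Paris” sits at character positions 54 to 59, matching the discussion that follows:

```
from transformers import pipeline

# Load a DistilBERT model fine-tuned on SQuAD for extractive question answering
model_name = "distilbert-base-uncased-distilled-squad"
qa_pipeline = pipeline("question-answering", model=model_name, tokenizer=model_name)

# The context is the text in which the answer must be found
context = ("France is a country in Western Europe. Its capital is Paris, "
           "which is known for its art, fashion, and culture.")
question = "What is the capital of France?"

# Ask for the top 3 candidate answers instead of just the best one
answers = qa_pipeline(question=question, context=context, top_k=3)
for answer in answers:
    print(answer)
```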

This is the output you will get:
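The structure (answer text, score, and character span) is what the question-answering pipeline returns; the scores shown here are placeholders and your exact values will differ:

```
{'score': 0.9889, 'start': 54, 'end': 59, 'answer': 'Paris'}
{'score': 0.0061, 'start': 39, 'end': 59, 'answer': 'Its capital is Paris'}
{'score': 0.0018, 'start': 54, 'end': 60, 'answer': 'Paris,'}
```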

The model used is distilbert-base-uncased-distilled-squad, a DistilBERT model fine-tuned on the SQuAD dataset. It is an “uncased” model, which means it treats input as case-insensitive. It was produced by knowledge distillation, so it retains most of the accuracy of its larger BERT teacher at a fraction of the size. Hence, it is particularly good for question-answering tasks that require understanding both the question and the context.

To use it, you created a pipeline using the transformers library. You requested a question-answering pipeline but specified the model and tokenizer to use rather than letting the pipeline() function pick one for you.

When you invoke the pipeline, you provide the question and the context. The model finds the answer within the context and returns it. Instead of a bare answer string, it returns the answer together with the character positions in the context where it was found and a score (between 0 and 1). Since top_k is set to 3, three such answers are returned.

From the output, you can see that the answer with the highest score is simply “Paris” (character positions 54 to 59 in the context string), while the other answers are not wrong, just presented differently. You can modify the code above to pick the best answer based on the score.

Handling Large Contexts

The problem with this simple Q&A system is that it can only handle short contexts. The model has a limit on the maximum sequence length it can accept, which for this particular model is 512 tokens.
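You can verify this limit from the tokenizer (a quick sketch):

```
from transformers import AutoTokenizer

# The tokenizer records the maximum sequence length the model accepts
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
print(tokenizer.model_max_length)  # prints 512
```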

Usually, the problem with this limit is not the question but the context: the background information is often a large piece of text, while the question is a single sentence whose answer you want to find in that context. To handle this, you can “chunk” the context, that is, split the long context string into smaller pieces and feed them to the Q&A model one by one. You repeat the same question but iterate over the different chunks to find the answer.

With top_k=3, you can expect three answers from each chunk. Since each answer comes with a score, you can simply pick the answer with the highest score across all chunks. You can also discard answers with low scores before selecting the best one. In this way, you can tell when the context does not provide enough information to answer the question.

Let’s see how to implement this:
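A minimal sketch of this implementation follows. The class and parameter names (QASystem, QAConfig, score_threshold) are illustrative assumptions, but the behavior matches the description below:

```
from dataclasses import dataclass
from transformers import pipeline

@dataclass
class QAConfig:
    """Parameters of the Q&A system (names here are illustrative)"""
    model_name: str = "distilbert-base-uncased-distilled-squad"
    max_sequence_length: int = 512  # chunk size in characters, not tokens
    score_threshold: float = 0.1    # discard answers scoring below this
    top_k: int = 3                  # candidate answers per chunk

class QASystem:
    def __init__(self, config=None):
        self.config = config or QAConfig()
        self.cache = {}  # maps (question, context) pairs to answers
        self.pipeline = pipeline("question-answering",
                                 model=self.config.model_name,
                                 tokenizer=self.config.model_name)

    def preprocess_context(self, context):
        """Split the context at spaces into chunks below the length limit"""
        chunks, current = [], ""
        for word in context.split():
            if current and len(current) + 1 + len(word) > self.config.max_sequence_length:
                chunks.append(current)
                current = word
            else:
                current = f"{current} {word}" if current else word
        if current:
            chunks.append(current)
        return chunks

    def get_answer(self, question, context):
        # Return the cached answer if this pair was asked before
        key = (question, context)
        if key in self.cache:
            return self.cache[key]
        # Collect candidate answers from each chunk, keeping valid ones
        candidates = []
        for chunk in self.preprocess_context(context):
            results = self.pipeline(question=question, context=chunk,
                                    top_k=self.config.top_k)
            if isinstance(results, dict):  # a single answer is not wrapped in a list
                results = [results]
            candidates.extend(r for r in results
                              if r["score"] >= self.config.score_threshold)
        # Pick the best answer, or report that nothing scored high enough
        if candidates:
            answer = max(candidates, key=lambda r: r["score"])["answer"]
        else:
            answer = "No answer found"
        self.cache[key] = answer
        return answer

# Example usage with a context longer than one chunk
long_context = ("The European Union has 27 member states. " * 20
                + "France is a country in Western Europe. Its capital is Paris.")
qa = QASystem()
print(qa.get_answer("What is the capital of France?", long_context))
```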

This wraps the workflow into a class to make it easier to use. You pass the question and the context to the get_answer() method, and it will return the answer with the highest score.

In the get_answer() method, the answer is returned immediately if it is already in the cache. Otherwise, the context is preprocessed into chunks by splitting at spaces so that each chunk stays below the length limit. Each chunk is then paired with the question to get candidate answers (with scores) from the Q&A model. Only the answers with scores above the threshold are considered valid, and the best one is picked. If no answer scores high enough, the result is marked as “No answer found”.

For convenience, the parameters used are stored in a dataclass object. Note that it sets max_sequence_length to 512 characters. This is a conservative choice: the model can handle up to 512 tokens, which corresponds to roughly 1500 characters. However, a shorter sequence length helps the model run more efficiently, since the time and memory complexity of transformer models is quadratic in the sequence length.

The output of this code is:
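With the usage line at the bottom of the sketch above, the printout is simply the best answer found across the chunks, which should be along these lines:

```
Paris
```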

You may notice a problem with the implementation above: a chunk may be split in the middle of the sentence where the most appropriate answer lies. In that case, the Q&A model may fail to find the answer, or return a suboptimal one. This is a limitation of the splitting algorithm in the preprocess_context() method. You may consider using a longer chunk size or creating chunks that overlap by a few words. You can try implementing this as an exercise.

Building an Expert System

With the Q&A system above as a building block, you can automate the process of constructing a context for a question. With a database of documents that can be used as context for Q&A, you can build an expert system that can answer a wide range of questions.

Building a good expert system is a complex task that involves many considerations, but the high-level framework is not difficult to understand. It is similar to the idea of RAG (retrieval-augmented generation), where the context is retrieved from a database of documents and the answer is generated by the model. One key component is a database that can retrieve the most relevant context for a question. Let’s see how you can build one:
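A minimal sketch of such a class follows; the eviction policy (dropping the oldest context when full) and the max_contexts parameter are assumptions:

```
import re
from collections import OrderedDict

class ContextManager:
    """Keeps a limited number of contexts and finds the most relevant ones"""
    def __init__(self, max_contexts=10):
        self.max_contexts = max_contexts
        self.contexts = OrderedDict()  # maps context ID to text

    def add_context(self, context_id, text):
        # Evict the oldest context when the store is full (assumed policy)
        if len(self.contexts) >= self.max_contexts:
            self.contexts.popitem(last=False)
        self.contexts[context_id] = text

    def get_context(self, context_id):
        return self.contexts.get(context_id)

    def search_relevant_context(self, question, top_k=1):
        """Rank contexts by Jaccard similarity of their word sets"""
        question_words = set(re.findall(r"\w+", question.lower()))
        scored = []
        for context_id, text in self.contexts.items():
            context_words = set(re.findall(r"\w+", text.lower()))
            union = question_words | context_words
            score = len(question_words & context_words) / len(union) if union else 0.0
            scored.append((score, context_id))
        scored.sort(reverse=True)
        return scored[:top_k]
```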

This class is named ContextManager. You can add a piece of text to it with a context ID, and the context manager keeps only a limited number of contexts. You can get back the text using the context ID. But the most important method is search_relevant_context(), which searches for the most relevant contexts for a given question. You can use a different algorithm to calculate the relevance score; here a simple one is used, the Jaccard similarity, which counts the words shared by the question and the context, normalized by the size of their combined vocabulary.

With this class, you can build an expert system that can answer a wide range of questions. Here is an example of how to use it:
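A usage sketch, reusing the QASystem class from the previous section (the contexts here are illustrative):

```
qa = QASystem()
manager = ContextManager(max_contexts=10)

# Populate the context database
manager.add_context("europe", "France is a country in Western Europe. Its capital "
                              "is Paris, and it is known for its art and culture.")
manager.add_context("asia", "Japan is an island country in East Asia. Its capital "
                            "is Tokyo, and it is famous for its cuisine.")

# Find the most relevant context, then ask the Q&A system
question = "What is the capital of France?"
score, context_id = manager.search_relevant_context(question, top_k=1)[0]
print(f"Best context: {context_id} (relevance {score:.2f})")
print("Answer:", qa.get_answer(question, manager.get_context(context_id)))
```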

You first add some contexts to the context manager. Depending on the maximum size you configured, you can add a lot of text to the system. Given a question, you search for the most relevant context, then feed the question and that context to the Q&A system from the previous section, where the chunking and the iterative search for the best answer happen behind the scenes.

You can extend this to try more than just the top context, so the answer is sought in a wider range of contexts, as shown in the sketch below. This is a simple way to avoid missing an answer that lives in a context that did not score the best. However, if you have a better way to score the relevance of a context, such as using a neural network model to compute the relevance score, you may not need to try as many contexts.
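For example, the lookup could consult the top three contexts instead of one (a hypothetical extension of the usage above):

```
# Query up to the top 3 contexts and report an answer from each
for score, context_id in manager.search_relevant_context(question, top_k=3):
    context = manager.get_context(context_id)
    print(context_id, "->", qa.get_answer(question, context))
```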

The output of the basic usage example above will be:
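Assuming the illustrative contexts used above, the printout would be along these lines (the relevance score follows from the Jaccard computation; the answer itself comes from the model, so exact results may vary):

```
Best context: europe (relevance 0.16)
Answer: Paris
```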

Putting it all together, below is the complete code:
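As a condensed sketch, the pieces fit together like this, assuming the QASystem and ContextManager classes as defined in the sketches above:

```
# Assumes QASystem and ContextManager are defined as in the sketches above
qa = QASystem()
manager = ContextManager(max_contexts=10)

documents = {
    "europe": "France is a country in Western Europe. Its capital is Paris, "
              "and it is known for its art and culture.",
    "asia": "Japan is an island country in East Asia. Its capital is Tokyo, "
            "and it is famous for its cuisine.",
}
for context_id, text in documents.items():
    manager.add_context(context_id, text)

for question in ["What is the capital of France?",
                 "What is the capital of Japan?"]:
    results = manager.search_relevant_context(question, top_k=1)
    if not results:
        print(f"{question} -> no relevant context")
        continue
    score, context_id = results[0]
    answer = qa.get_answer(question, manager.get_context(context_id))
    print(f"{question} -> {answer} (context: {context_id}, relevance {score:.2f})")
```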

Summary

In this tutorial, you have built a comprehensive Q&A system using DistilBERT. In particular, you learned how to:

  • Build a Q&A system using the pipeline() function in transformers
  • Handle large contexts by chunking
  • Use a context manager to manage the contexts and build an expert system on top of it
