A Gentle Introduction to Hallucinations in Large Language Models

Large Language Models (LLMs) are known to have “hallucinations.” This is a behavior in which the model presents false information as if it were accurate. In this post, you will learn why hallucinations are inherent to the nature of an LLM. Specifically, you will learn:

  • Why LLMs hallucinate
  • How to make hallucinations work for you
  • How to mitigate hallucinations

Get started and apply ChatGPT with my book Maximizing Productivity with ChatGPT. It provides real-world use cases and prompt examples designed to get you using ChatGPT quickly.

Let’s get started.

Picture generated by the author using Stable Diffusion. Some rights reserved.


This post is divided into three parts; they are

  • What are Hallucinations in Large Language Models
  • Using Hallucinations
  • Mitigating Hallucinations

What are Hallucinations in Large Language Models

A large language model is a trained machine learning model that generates text based on the prompt you provide. The model’s training equipped it with knowledge derived from its training data. It is difficult to tell what knowledge a model has retained and what it has not. In fact, when a model generates text, it cannot tell whether the generation is accurate.

In the context of LLMs, “hallucination” refers to a phenomenon where the model generates text that is incorrect, nonsensical, or not grounded in reality. Since LLMs are not databases or search engines, they do not cite the sources their responses are based on. These models generate text as an extrapolation from the prompt you provided. The result of the extrapolation is not necessarily supported by any training data; it is simply the text most correlated with the prompt.

To understand hallucination, you can build a simple letter-bigram Markov model from some text: take a long piece of text, build a table of every pair of neighboring letters, and tally the counts. For example, “hallucinations in large language models” produces the pairs “HA”, “AL”, “LL”, “LU”, and so on, with one count of “LU” and two counts of “LA.” Now if you start with a prompt of “L”, you are twice as likely to produce “LA” as “LL” or “LS”. Then with a prompt of “LA”, you have an equal probability of producing “AL”, “AT”, “AR”, or “AN”. You can continue this process with a prompt of “LAT”, and so on. Eventually, this model may invent a word that does not exist, purely as a result of the statistical patterns it learned. You may say your Markov model hallucinated a spelling.
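The letter-bigram model described above can be sketched in a few lines of Python. This is a minimal illustration, not production code; the function names are my own:

```python
from collections import defaultdict
import random

def build_bigram_model(text):
    """Tally how often each letter follows each letter in the text."""
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(letters, letters[1:]):
        counts[a][b] += 1
    return counts

def generate(model, start, length, seed=None):
    """Extend the prompt one letter at a time, sampling by bigram counts."""
    rng = random.Random(seed)
    out = start
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break
        letters, weights = zip(*followers.items())
        out += rng.choices(letters, weights=weights)[0]
    return out

model = build_bigram_model("hallucinations in large language models")
print(generate(model, "L", 10))
```

Running this repeatedly produces strings like “LANGELSIN” that look vaguely English but are words no one ever wrote: the model’s “hallucinations.”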

Hallucination in an LLM is not much more complex than this, even though the model is far more sophisticated. At a high level, hallucination is caused by limited contextual understanding: the model must transform the prompt and the training data into an abstraction, and some information may be lost in the process. Moreover, noise in the training data may provide a skewed statistical pattern that leads the model to respond in ways you do not expect.

Using Hallucinations

You may also consider hallucination a feature of large language models. You want to see the models hallucinate if you want them to be creative. For example, if you ask ChatGPT or another large language model for the plot of a fantasy story, you want it not to copy an existing one but to generate new characters, scenes, and a storyline. This is possible only if the model is not merely reproducing the data it was trained on.

Another case where you may want hallucination is when looking for diversity, for example, when asking for ideas. It is like asking the model to brainstorm for you. You want derivations of the existing ideas found in the training data, but not exact copies. Hallucination lets you explore different possibilities.

Many language models have a “temperature” parameter, which controls the randomness of generation. In ChatGPT, you can set the temperature through the API but not through the web interface. A higher temperature introduces more randomness, and hence more hallucination.
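What temperature does can be illustrated without any API. The sketch below applies temperature-scaled softmax sampling to a made-up set of logits; the token names and numbers are invented for illustration, but the mechanism is the standard one:

```python
import math
import random

def sample_with_temperature(logits, temperature, seed=None):
    """Divide logits by temperature, softmax, then sample an index."""
    rng = random.Random(seed)
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs)[0], probs

tokens = ["Paris", "London", "Dublin"]   # hypothetical next-token candidates
logits = [3.0, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    _, probs = sample_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

At a low temperature, almost all the probability mass sits on the most likely token; at a high temperature, the distribution flattens and unlikely tokens get sampled more often, which is why higher temperatures produce more surprising (and more hallucinated) text.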

Mitigating Hallucinations

Language models are not search engines or databases, so hallucinations are unavoidable. What is troublesome is that the models generate text with mistakes that are hard to spot.

If contaminated training data caused the hallucination, you can clean up the data and retrain the model. However, most models are too large to train on your own devices; even fine-tuning an existing model may be infeasible on commodity hardware. The best mitigation may be human intervention in the result: asking the model to regenerate when the output has gone gravely wrong.

The other way to mitigate hallucination is controlled generation: providing enough details and constraints in the prompt so that the model has limited freedom to hallucinate. This is the point of prompt engineering: you specify a role and scenario to the model to guide the generation, so that it does not hallucinate unbounded.
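One way to apply this idea is to assemble the prompt from an explicit role, a list of constraints, and the context the answer must come from. The helper below is a hypothetical sketch of that pattern, not a real library API:

```python
def controlled_prompt(role, constraints, context, question):
    """Assemble a prompt that narrows the model's freedom to hallucinate."""
    parts = [f"You are {role}."]
    parts += [f"Constraint: {c}" for c in constraints]
    parts.append(f"Context: {context}")
    parts.append(f"Question: {question}")
    return "\n".join(parts)

prompt = controlled_prompt(
    role="a fact-checking assistant",
    constraints=[
        "Answer in at most three sentences.",
        "Use only the context below; if it is insufficient, reply: I don't know.",
    ],
    context="The Eiffel Tower was completed in 1889.",
    question="When was the Eiffel Tower completed?",
)
print(prompt)
```

Compared with simply asking “Tell me about the Eiffel Tower,” a prompt built this way gives the model a bounded scenario and an explicit escape hatch, both of which reduce the room it has to invent facts.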


In this post, you learned how and why an LLM hallucinates. In particular, you learned:

  • Why LLMs hallucinate
  • Why hallucination can be useful
  • How to limit hallucination

It’s worth noting that while hallucination can be mitigated, it probably cannot be completely eliminated. There is a trade-off between creativity and accuracy.

