Hallucination in language models
Hallucination is a critical failure mode where language models generate text that is fluent and coherent but factually incorrect or inconsistent with the provided context. Unlike random errors, hallucinations are often internally coherent falsehoods that can mislead users into trusting incorrect information.
Definition and characteristics
Hallucination occurs when a model generates output that:
- Contradicts established facts or ground truth
- Contradicts its own context or prior statements (internal inconsistency)
- Invents entities, relationships, or events that don't exist
- Attributes false statements to real people or sources
Key distinction from other failures:
- Unlike other failures such as refusals or vague answers, hallucinations are confidently wrong
- Their plausibility and fluency make hallucinations particularly dangerous, because users may not notice the errors
Types of hallucinations
Intrinsic hallucinations: Output contradicts source material provided in the context (e.g., a model is given a Wikipedia article and generates a fact contradicting it).
Extrinsic hallucinations: Output cannot be verified against the provided source and is inconsistent with world knowledge (e.g., inventing a false scientific discovery or historical event).
Internal hallucinations: Output contradicts itself (e.g., the model says "Paris is in France" and later says "Paris is the capital of Germany").
Causes
Knowledge gaps: Models lack certain facts in their training data, particularly recent information or specialized domain knowledge.
Retrieval failures: Even when facts are in the model's parameters, it may fail to retrieve them correctly during generation.
Decoding artifacts: Decoding procedures such as greedy decoding or nucleus sampling can select tokens that are individually likely but that together form a false statement.
Prompt adversarialism: Some prompts or tasks elicit hallucinations more readily than others (e.g., prompts asking for novel facts vs. prompts asking for facts that appear in the training data).
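The decoding-artifact cause can be illustrated with a toy example. The sketch below uses a hypothetical two-step token distribution (the `FIRST` and `SECOND` tables are invented for illustration and do not come from any real model) to show that choosing the most likely token at each step need not yield the most likely sequence overall. Likelihood is not truth, so this only illustrates the "individually likely, collectively poor" effect and is not a model of hallucination itself.

```python
# Hypothetical toy model: P(first token) and P(second token | first token).
FIRST = {"A": 0.5, "B": 0.4, "C": 0.1}
SECOND = {
    "A": {"x": 0.4, "y": 0.35, "z": 0.25},
    "B": {"w": 0.9, "x": 0.1},
    "C": {"x": 1.0},
}


def greedy_decode():
    """Pick the single most likely token at each step."""
    first = max(FIRST, key=FIRST.get)
    second = max(SECOND[first], key=SECOND[first].get)
    return (first, second), FIRST[first] * SECOND[first][second]


def best_sequence():
    """Exhaustively find the sequence with the highest joint probability."""
    candidates = {
        (first, second): FIRST[first] * prob
        for first, next_tokens in SECOND.items()
        for second, prob in next_tokens.items()
    }
    best = max(candidates, key=candidates.get)
    return best, candidates[best]


if __name__ == "__main__":
    print("greedy:", greedy_decode())
    print("best:  ", best_sequence())
```

In this toy setup greedy decoding commits to `A` because it is the most likely first token, and then can only reach a sequence with joint probability 0.2, while the sequence starting with `B` has a joint probability of about 0.36.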
Impact and applications
Hallucinations are especially problematic in high-stakes domains:
- Medicine: False diagnoses or drug interactions
- Law: Fabricated case law or statutes
- Finance: False market data or investment advice
- Journalism: Spreading misinformation
- Academic research: False citations or invented results
Detection and measurement
Fact-checking approaches:
- Comparing output against curated knowledge bases
- Using external retrievers to verify claims
- Human expert annotation
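As a rough illustration of the knowledge-base comparison, the sketch below checks already-extracted claims against a tiny in-memory knowledge base. Everything here is an assumption made for the example: the `(subject, relation, object)` claim format, the `KB` contents, and the three verdict labels. Extracting claims from free text is a hard problem in its own right and is not shown.

```python
# Hypothetical knowledge base: (subject, relation) -> object.
KB = {
    ("Paris", "capital_of"): "France",
    ("Berlin", "capital_of"): "Germany",
}


def check_claim(claim, kb):
    """Return 'supported', 'contradicted', or 'unverifiable' for one claim.

    claim is a (subject, relation, object) triple that is assumed to have
    been extracted from model output by some earlier step.
    """
    subject, relation, obj = claim
    known = kb.get((subject, relation))
    if known is None:
        # The knowledge base has no entry for this subject and relation.
        return "unverifiable"
    return "supported" if known == obj else "contradicted"


if __name__ == "__main__":
    claims = [
        ("Paris", "capital_of", "France"),
        ("Paris", "capital_of", "Germany"),
        ("Lyon", "capital_of", "France"),
    ]
    for claim in claims:
        print(claim, "->", check_claim(claim, KB))
```

Keeping "unverifiable" separate from "contradicted" matters in practice, since a missing knowledge-base entry says nothing about whether the claim is false.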
Metrics:
- Hallucination rate: percentage of generated statements that are false
- Inconsistency rate: fraction of statements that are internally contradictory
- Attribution scores: how well claims are supported by the provided context
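The metrics above reduce to simple ratios once each statement has been annotated. The following is a minimal sketch, assuming per-statement boolean labels produced by a human or automated judge; the `AnnotatedStatement` type and its field names are hypothetical, and defining the attribution score as the fraction of supported statements is one possible choice among several.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedStatement:
    """One generated statement with judge-provided labels (hypothetical schema)."""
    text: str
    is_false: bool
    is_self_contradictory: bool
    is_supported: bool  # supported by the provided context


def rate(statements, attribute):
    """Fraction of statements for which the given boolean label is True."""
    if not statements:
        raise ValueError("at least one annotated statement is required")
    return sum(getattr(s, attribute) for s in statements) / len(statements)


def summarize(statements):
    """Compute the three ratios as values between 0 and 1."""
    return {
        "hallucination_rate": rate(statements, "is_false"),
        "inconsistency_rate": rate(statements, "is_self_contradictory"),
        "attribution_score": rate(statements, "is_supported"),
    }


if __name__ == "__main__":
    sample = [
        AnnotatedStatement("s1", True, False, False),
        AnnotatedStatement("s2", False, False, True),
        AnnotatedStatement("s3", False, True, True),
        AnnotatedStatement("s4", False, True, True),
    ]
    print(summarize(sample))
```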
Benchmarks:
- HaluEval: designed specifically to evaluate hallucination in LLMs
- TruthfulQA: questions written so that imitating common human misconceptions produces false answers
- AFHB: Adversarial Factual Hallucination Benchmark
Mitigation strategies
Retrieval-augmented generation (RAG):
- Retrieve relevant documents and condition generation on them
- Provides grounding and access to up-to-date information
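The retrieve-then-condition pattern can be sketched without any model call. The example below assumes a toy word-overlap retriever and a hypothetical prompt template; real systems typically use dense or sparse retrieval over an index, and the resulting prompt would then be passed to a language model, which is not shown.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Place the retrieved passages in the prompt so generation is conditioned on them."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [
        "The Eiffel Tower is in Paris.",
        "Mount Fuji is in Japan.",
        "Paris is the capital of France.",
    ]
    question = "What is the capital of France?"
    print(build_prompt(question, retrieve(question, docs)))
```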
Fine-tuning approaches:
- Training on data with high factual accuracy
- RLHF with rewards for factual outputs
- Learning to abstain when uncertain
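One way to see why a factuality reward can encourage abstention is a small expected-value calculation. The sketch below assumes a hypothetical reward scheme (+1 for a correct answer, a penalty for a wrong answer, 0 for abstaining); it is not the reward used by any particular RLHF system. Under that scheme, answering beats abstaining only when the probability of being correct exceeds `wrong_penalty / (1 + wrong_penalty)`.

```python
def expected_reward(p_correct, wrong_penalty):
    """Expected reward of answering: +1 if correct, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct, wrong_penalty):
    """Answer only if the expected reward beats the abstention reward of 0."""
    return expected_reward(p_correct, wrong_penalty) > 0.0


def abstain_threshold(wrong_penalty):
    """Probability of correctness at which answering and abstaining break even."""
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    penalty = 2.0
    print("break-even confidence:", abstain_threshold(penalty))
    for p in (0.6, 0.7, 0.9):
        print(p, "->", "answer" if should_answer(p, penalty) else "abstain")
```

A larger penalty for wrong answers raises the break-even confidence, so a policy optimized against such a reward has more reason to abstain on uncertain questions.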
Prompting techniques:
- Chain-of-thought to improve reasoning
- Few-shot examples of correct factuality
- Explicitly instructing models to cite sources
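The few-shot and cite-your-sources techniques can be combined in a single prompt. The following is a minimal sketch; the instruction wording, the `Q:`/`A:` layout, and the function name `build_few_shot_prompt` are all assumptions chosen for illustration, and the effect of any particular wording would need to be evaluated empirically.

```python
def build_few_shot_prompt(question, examples):
    """Assemble a prompt from an instruction, worked examples, and a new question.

    examples is a list of (question, answer) pairs whose answers already
    cite a source, so the model sees the expected format.
    """
    lines = [
        "Answer the question. Cite a source in square brackets for every claim.",
        "If you are not sure, say that you do not know.",
        "",
    ]
    for example_question, example_answer in examples:
        lines.append(f"Q: {example_question}")
        lines.append(f"A: {example_answer}")
        lines.append("")
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


if __name__ == "__main__":
    shots = [
        ("What is the capital of France?", "Paris [source: provided atlas excerpt]."),
    ]
    print(build_few_shot_prompt("What is the capital of Germany?", shots))
```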
Decoding constraints:
- Constrained decoding to enforce consistency with context
- Reducing sampling temperature for more deterministic outputs
- Beam search with factuality-aware scoring
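The effect of sampling temperature can be shown directly on a next-token distribution. The sketch below applies the standard temperature-scaled softmax to a set of made-up logits; lowering the temperature concentrates probability on the top token, which makes outputs more deterministic but does not by itself make them more accurate.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # made-up values for three candidate tokens
    for t in (1.0, 0.5, 0.1):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```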
Key papers
- Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment — Survey with hallucination evaluation framework; measures hallucination rates across six LLMs (davinci, OPT-1.3B, text-davinci-003, flan-t5-xxl, ChatGPT, GPT-4) using multiple-choice fact-based questions
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection — SELF-RAG reduces hallucinations by teaching models to retrieve grounding and predict support tokens for their outputs
- A Survey on Evaluation of Large Language Models — comprehensive survey with dedicated section on hallucination detection and factuality evaluation
- Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity — in-depth coverage of hallucination causes, detection methods, and mitigation strategies
Related topics
- Factuality in large language models — broader problem of ensuring LLM outputs match reality
- Large Language Models — the architectures prone to hallucination
- Misinformation — false information that can be propagated through hallucinations
- Retrieval-Augmented Generation — reducing hallucinations through grounding