Abstractive Summarization

Abstractive summarization systems read source documents and generate shorter, human-readable summaries that capture the essential information. Unlike extractive summarization (which selects existing sentences), abstractive systems must paraphrase, compress, and reorganize content—tasks that require deep language understanding and generation.

Architectures

Modern abstractive summarization systems typically use transformer-based sequence-to-sequence models such as BART, T5, and PEGASUS. These models are pretrained on large text corpora and then fine-tuned on paired datasets of documents and summaries. The encoder reads the source document and the decoder generates the summary token by token. Fine-tuning maximizes the likelihood of the reference summaries.
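
The likelihood objective can be made concrete with a small calculation. The sketch below is illustrative only: `sequence_nll` and the toy probability tables are hypothetical, standing in for the per-step output distributions a real decoder would produce when it is fed the source and the preceding reference tokens (teacher forcing).

```python
import math


def sequence_nll(step_distributions, reference_tokens):
    """Negative log-likelihood of a reference summary under teacher forcing.

    step_distributions[t] maps each candidate token to the probability the
    decoder assigns it at step t, given the source document and the
    reference tokens before position t.
    """
    if len(step_distributions) != len(reference_tokens):
        raise ValueError("need one distribution per reference token")
    nll = 0.0
    for dist, token in zip(step_distributions, reference_tokens):
        # Each step contributes -log p(reference token | source, prefix).
        nll -= math.log(dist[token])
    return nll


# Hypothetical decoder outputs for the two-token reference "rates rose".
toy_distributions = [
    {"rates": 0.5, "prices": 0.5},
    {"rose": 0.25, "fell": 0.75},
]
loss = sequence_nll(toy_distributions, ["rates", "rose"])
```

Minimizing this quantity over a training set is the same as maximizing the likelihood of the reference summaries. Note that the objective rewards matching the reference, not being faithful to the source.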

Hallucination problem

Abstractive summarization is particularly vulnerable to hallucination: generating plausible-sounding claims that contradict the source document or cannot be verified from it. This matters most in high-stakes applications (medical, legal, and news summarization), where false information can cause harm.

Intrinsic hallucinations: Summary contradicts the source document (e.g., claiming a vaccine was approved in 2021 when the source says 2019).

Extrinsic hallucinations: Summary includes information that cannot be verified from the source, whether or not it happens to be factually correct (e.g., adding background knowledge about a person not discussed in the article).
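
The two categories can be illustrated with a toy labeling routine. This is a minimal sketch under strong assumptions: facts are already available as (subject, relation, object) triples, and each subject-relation pair has a single correct object in the source. The function name `classify_claims` and the triples are hypothetical.

```python
def classify_claims(source_triples, summary_triples):
    """Label each summary triple as supported, intrinsic, or extrinsic."""
    # Index the source by (subject, relation) so conflicts are easy to spot.
    source_facts = {}
    for subj, rel, obj in source_triples:
        source_facts.setdefault((subj, rel), set()).add(obj)

    labels = []
    for subj, rel, obj in summary_triples:
        known_objects = source_facts.get((subj, rel))
        if known_objects is None:
            # The source says nothing about this subject and relation.
            labels.append("extrinsic")
        elif obj in known_objects:
            labels.append("supported")
        else:
            # The source covers this subject and relation with a different object.
            labels.append("intrinsic")
    return labels


source = [("vaccine", "approved_in", "2019")]
summary = [
    ("vaccine", "approved_in", "2021"),
    ("vaccine", "developed_by", "a university lab"),
]
labels = classify_claims(source, summary)  # ["intrinsic", "extrinsic"]
```

The second claim is labeled extrinsic regardless of whether it is true in the world, because the source offers nothing to check it against.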

Evaluation and detection

Key metrics for measuring hallucination in abstractive summarization:
- Information Extraction (IE)-based: Extract entities and relations from source and summary; compare overlap
- Natural Language Inference (NLI): Check whether summary is entailed by the source
- QA-based: Generate questions from the summary and check if the source answers them consistently
- Human evaluation: Crowdsourced judgment of faithfulness
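
As a rough illustration of the IE-based idea, the sketch below scores a summary by the fraction of its entities that also appear in the source. It assumes a deliberately crude stand-in for entity extraction (a regular expression matching capitalized words and numbers); practical systems would use a trained named-entity or relation extractor. The function names and example texts are hypothetical.

```python
import re


def extract_entities(text):
    """Crude entity proxy: capitalized words and numbers, lowercased.

    Known limitation: a capitalized common word at the start of a sentence
    is also treated as an entity.
    """
    return {m.lower() for m in re.findall(r"\b(?:[A-Z][a-z]+|\d+)\b", text)}


def entity_precision(source, summary):
    """Return (share of summary entities found in the source, unsupported set)."""
    summary_entities = extract_entities(summary)
    if not summary_entities:
        # Nothing to check, so nothing is unsupported.
        return 1.0, set()
    source_entities = extract_entities(source)
    supported = summary_entities & source_entities
    unsupported = summary_entities - source_entities
    return len(supported) / len(summary_entities), unsupported


source = "Acme won approval for the vaccine in 2019 after trials in Brazil."
summary = "Acme won approval in 2021 after trials in Brazil."
precision, unsupported = entity_precision(source, summary)
# Two of the three summary entities appear in the source; "2021" does not.
```

An overlap score like this flags unsupported entities but cannot tell whether correctly copied entities are combined into a false statement, which is one reason NLI-based and QA-based checks are used alongside it.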

Mitigation strategies

Architecture methods: Modify encoders/decoders to enforce grounding (e.g., adding explicit retrieval of supporting phrases from source)
Training methods: Contrastive learning, reinforcement learning with factuality-aware rewards, joint training on entailment
Post-processing: Filter or correct hallucinated spans using learned correction models or fact-checking systems
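
The post-processing idea can be sketched as a simple filter that drops summary sentences mentioning entities absent from the source. This is a toy stand-in, not a learned correction model: the regular-expression entity proxy, the sentence splitter, and the name `filter_unsupported_sentences` are all assumptions made for illustration.

```python
import re


def entities(text):
    """Crude entity proxy: capitalized words and numbers, lowercased."""
    return {m.lower() for m in re.findall(r"\b(?:[A-Z][a-z]+|\d+)\b", text)}


def filter_unsupported_sentences(source, summary):
    """Keep only summary sentences whose entities all occur in the source."""
    source_entities = entities(source)
    kept = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        if sentence and entities(sentence) <= source_entities:
            kept.append(sentence)
    return " ".join(kept)


source = "Acme won approval for the vaccine in 2019 after trials in Brazil."
summary = "Acme won approval in 2019. Acme ran trials in Brazil and Chile."
filtered = filter_unsupported_sentences(source, summary)
# The second sentence is dropped because "Chile" never appears in the source.
```

Filtering whole sentences is blunt: it removes the supported part of a sentence along with the unsupported part. Span-level correction models aim to repair only the offending phrase.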

Key papers

Survey of Hallucination in Natural Language Generation: Section 7 provides a comprehensive overview of hallucination definitions, metrics, and mitigation methods in abstractive summarization

Natural Language Generation — broader task area
Hallucination in language models — cross-task hallucination phenomenon
Information Extraction — used in evaluation metrics
Neural language models — underlying models