Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Authors: Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

Venue: NeurIPS 2020 (arXiv:2005.11401)

TL;DR

This work proposes RAG (Retrieval-Augmented Generation), a hybrid architecture that augments a pre-trained language model with a non-parametric memory component. The model retrieves relevant Wikipedia passages at generation time and conditions its output on them. It achieves state-of-the-art results on open-domain QA and strong results on other knowledge-intensive NLP tasks, using a single architecture that is fine-tuned per task rather than redesigned for each one.

Contributions

  • Hybrid architecture combining parametric and non-parametric memory: parametric memory is a pre-trained seq2seq model (BART); non-parametric memory is a dense vector index of Wikipedia
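  • Two formulations: RAG-Sequence conditions the entire output sequence on one retrieved document at a time and marginalizes over documents at the sequence level, while RAG-Token marginalizes at every output token, so different tokens can draw on different documents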
  • Unified framework for knowledge-intensive tasks: evaluated on open-domain QA, abstractive QA, Jeopardy question generation, and fact verification
  • Knowledge updatability: the retrieval index can be swapped at test time without retraining, enabling the model to adapt to new facts
  • End-to-end training: the retriever's query encoder and the generator are jointly fine-tuned on task-specific data, while the document encoder and the index are kept fixed

Method

RAG models use Maximum Inner Product Search (MIPS) to efficiently retrieve the top-K documents using a query encoder and document index. The retrieved documents are then treated as latent variables that condition the generation process.
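The retrieval step can be illustrated with a minimal sketch. It uses exact brute-force inner-product search over toy 2-d vectors, whereas a real system would use an approximate MIPS index over learned embeddings. The function names, the toy vectors, and the choice to renormalize the document distribution over the top-K set are assumptions of this sketch, not details taken from the paper.

```python
import math


def top_k_mips(query, doc_vectors, k):
    """Exact (brute-force) maximum inner product search.

    Returns the k (doc_index, score) pairs with the largest inner product,
    sorted from highest to lowest score.
    """
    scores = [
        (idx, sum(q * d for q, d in zip(query, vec)))
        for idx, vec in enumerate(doc_vectors)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


def retrieval_distribution(top_docs):
    """Softmax over the top-k scores (assumption: renormalize over the top-k set)."""
    max_score = max(score for _, score in top_docs)
    exps = [(idx, math.exp(score - max_score)) for idx, score in top_docs]
    total = sum(e for _, e in exps)
    return [(idx, e / total) for idx, e in exps]


# Toy data: one query embedding and four document embeddings (hypothetical values).
query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [-1.0, 0.0]]

top_docs = top_k_mips(query, docs, k=2)
doc_probs = retrieval_distribution(top_docs)
```

The resulting `doc_probs` plays the role of the weights over the latent documents in the marginalization.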

RAG-Sequence: Each of the top-K retrieved documents is used to generate the complete output sequence, and the model marginalizes over the documents to compute the generation likelihood:

\[p(y|x) \approx \sum_{z \in \text{top-}k(p(\cdot|x))} p_\eta(z|x) p_\theta(y|x, z)\]
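As a small numerical illustration of this sum, the sketch below evaluates the sequence-level marginal in log space, which is how such sums are usually computed for stability. The helper names and the toy probabilities are hypothetical. In a real model the per-token log-probabilities would come from the generator.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def rag_sequence_log_likelihood(doc_log_probs, token_log_probs):
    """log p(y|x) = log sum_z p(z|x) * p(y|x, z).

    doc_log_probs[z]      : log p(z|x) for retrieved document z
    token_log_probs[z][i] : log p(y_i | x, z, y_{1:i-1}) under document z
    """
    # Summing token log-probs gives the whole-sequence log p(y|x, z).
    per_doc = [dl + sum(tl) for dl, tl in zip(doc_log_probs, token_log_probs)]
    return logsumexp(per_doc)


# Two retrieved documents, a two-token output (hypothetical probabilities).
doc_log_probs = [math.log(0.6), math.log(0.4)]
token_log_probs = [
    [math.log(0.5), math.log(0.8)],  # p(y|x, z0) = 0.4
    [math.log(0.2), math.log(0.5)],  # p(y|x, z1) = 0.1
]

log_likelihood = rag_sequence_log_likelihood(doc_log_probs, token_log_probs)
# exp(log_likelihood) = 0.6 * 0.4 + 0.4 * 0.1 = 0.28
```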

The retriever uses Dense Passage Retrieval (DPR) with a BERT-based bi-encoder. The generator is initialized from BART.
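For reference, the bi-encoder scores a document by the inner product between its dense embedding and the query embedding, and the retrieval distribution is proportional to the exponential of that score. Here \(\mathbf{d}(z)\) and \(\mathbf{q}(x)\) denote the outputs of the document encoder and the query encoder:

\[p_\eta(z|x) \propto \exp\left(\mathbf{d}(z)^\top \mathbf{q}(x)\right), \quad \mathbf{d}(z) = \text{BERT}_d(z), \quad \mathbf{q}(x) = \text{BERT}_q(x)\]

This inner-product form is what makes top-K retrieval a MIPS problem.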

RAG-Token: The same top-K documents are retrieved once for the input, but the marginalization over documents is performed separately for each generated token:

\[p(y|x) \approx \prod_i \sum_{z \in \text{top-}k(p(\cdot|x))} p_\eta(z|x) p_\theta(y_i|x, z, y_{1:i-1})\]

This formulation allows the model to leverage different evidence for different output tokens.
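The difference between the two formulations can be seen on a toy case where each document supports a different token. The sketch below uses hypothetical probabilities and function names. It is only an illustration of the two formulas, not the paper's implementation.

```python
import math


def rag_token_likelihood(doc_probs, token_probs):
    """p(y|x) = prod_i sum_z p(z|x) * p(y_i | x, z, y_{1:i-1})."""
    num_tokens = len(token_probs[0])
    likelihood = 1.0
    for i in range(num_tokens):
        # Marginalize over documents separately at each token position.
        likelihood *= sum(pz * token_probs[z][i] for z, pz in enumerate(doc_probs))
    return likelihood


def rag_sequence_likelihood(doc_probs, token_probs):
    """p(y|x) = sum_z p(z|x) * prod_i p(y_i | x, z, y_{1:i-1})."""
    return sum(pz * math.prod(token_probs[z]) for z, pz in enumerate(doc_probs))


# Document 0 supports the first token, document 1 supports the second.
doc_probs = [0.5, 0.5]
token_probs = [
    [0.9, 0.1],  # p(y_1 | z0), p(y_2 | z0)
    [0.1, 0.9],  # p(y_1 | z1), p(y_2 | z1)
]

token_level = rag_token_likelihood(doc_probs, token_probs)        # 0.5 * 0.5 = 0.25
sequence_level = rag_sequence_likelihood(doc_probs, token_probs)  # 0.045 + 0.045 = 0.09
```

In this constructed example the token-level marginal assigns the output a higher likelihood, because no single document explains both tokens well.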

Results

  • Open-Domain QA: RAG-Token achieves 44.1% exact match on Natural Questions, outperforming REALM (40.4%) and the extractive DPR baseline (41.5%)
  • Abstractive QA (MS-MARCO NLG): RAG-Sequence outperforms BART by 2.6 BLEU points
  • Jeopardy Question Generation: in human evaluation of factuality, RAG was judged better than BART in 42.7% of cases, while BART was judged better in only 7.1%
  • Fact Verification (FEVER): RAG comes within 4.3% of state-of-the-art on 3-way classification
  • Knowledge grounding: RAG can generate correct answers even when the answer is not in any retrieved document (11.8% accuracy for NQ), showing it leverages both parametric and non-parametric memory

Generation from RAG models is more specific, diverse, and factual than generation from the parametric-only BART baseline. The model's knowledge can also be updated at test time by replacing the retrieval index: on a world-leaders probe, RAG answers 70% of questions correctly using a December 2016 Wikipedia index for 2016 leaders and 68% using a December 2018 index for 2018 leaders, while accuracy with mismatched indices is low.
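The index-swapping idea can be sketched with a toy retriever whose document store is replaced without touching any model weights. The class name `ToyRetriever`, its methods, and the placeholder passages are all hypothetical and stand in for a real dense index.

```python
class ToyRetriever:
    """Hypothetical retriever whose document index can be replaced at test time."""

    def __init__(self, index):
        # index maps passage text -> embedding vector
        self.index = index

    def swap_index(self, new_index):
        # Only the non-parametric memory changes; no parameters are retrained.
        self.index = new_index

    def retrieve(self, query_vec):
        # Return the passage with the largest inner product with the query.
        return max(
            self.index,
            key=lambda doc: sum(q * d for q, d in zip(query_vec, self.index[doc])),
        )


# Placeholder snapshots of the same knowledge source at two points in time.
index_2016 = {
    "The office was held by Person A (2016 snapshot).": [1.0, 0.0],
    "An unrelated passage.": [0.0, 1.0],
}
index_2018 = {
    "The office was held by Person B (2018 snapshot).": [1.0, 0.0],
    "An unrelated passage.": [0.0, 1.0],
}

retriever = ToyRetriever(index_2016)
query_vec = [1.0, 0.0]  # hypothetical embedding of "who holds the office?"

before = retriever.retrieve(query_vec)
retriever.swap_index(index_2018)
after = retriever.retrieve(query_vec)
```

The same query retrieves different evidence before and after the swap, which is the mechanism behind the matched-index accuracies reported above.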

Notes

This highly influential paper introduced RAG and has spawned numerous follow-up works. Its key insight, that pre-trained models can be augmented with retrieval at generation time without task-specific architectures, has become foundational in modern NLP. The factuality improvements are particularly relevant for applications such as fact-checking and misinformation detection, where grounding in evidence is critical. The work demonstrates that parametric and non-parametric knowledge are both useful, and that jointly fine-tuning the retriever's query encoder and the generator is effective.