Evidence Extraction

Evidence extraction is the task of automatically identifying and retrieving the most relevant text snippets or sentences from a document collection that either support or refute a given claim. This is a core subtask in the fact-checking pipeline, bridging document retrieval and claim verification.

Problem formulation

Given:
  • A claim (string)
  • A collection of documents or sentences

Find:
  • The subset of sentences/passages that provide evidence for or against the claim
  • A ranking or relevance score for each piece of evidence
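The inputs and outputs described here can be written down as a small interface. This is only an illustrative sketch: the names `Stance`, `Evidence`, and `EvidenceExtractor` are hypothetical and do not come from any particular library or dataset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Protocol


class Stance(Enum):
    """Whether a piece of evidence speaks for or against the claim."""
    SUPPORTS = "supports"
    REFUTES = "refutes"


@dataclass(frozen=True)
class Evidence:
    text: str       # the extracted sentence or passage
    stance: Stance  # supports or refutes the claim
    score: float    # relevance score; higher means more relevant


class EvidenceExtractor(Protocol):
    """Hypothetical interface: any object with this method qualifies."""

    def extract(self, claim: str, candidates: List[str]) -> List[Evidence]:
        """Return evidence for or against `claim`, most relevant first."""
        ...
```

Any concrete extractor (a classifier, a ranker, or a span extractor) could then be placed behind the same `extract` method.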

Evidence extraction can be formulated as:
  1. Classification: labeling sentences as supporting, refuting, or irrelevant
  2. Ranking: ordering sentences by their relevance for validating the claim
  3. Span extraction: identifying the minimal text spans containing essential evidence
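As a rough illustration of the ranking formulation, the sketch below scores each candidate sentence by TF-IDF cosine similarity to the claim and returns the top results. This is a simple lexical baseline chosen for the example, not a method prescribed here; the function names `tokenize` and `rank_evidence` and the sample sentences are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_evidence(claim, sentences, top_k=3):
    """Return up to top_k (score, sentence) pairs, highest score first."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # Document frequency: number of candidate sentences containing each term.
    df = Counter(term for doc in docs for term in set(doc))

    def tfidf(tokens):
        counts = Counter(tokens)
        # Smoothed IDF so terms absent from the candidates still get a weight.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    claim_vec = tfidf(tokenize(claim))
    scored = [(cosine(claim_vec, tfidf(doc)), s)
              for doc, s in zip(docs, sentences)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


claim = "The Eiffel Tower is located in Paris."
sentences = [
    "The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
    "Bananas are rich in potassium.",
    "Paris is the capital of France.",
]
for score, sentence in rank_evidence(claim, sentences, top_k=2):
    print(f"{score:.3f}  {sentence}")
```

A score threshold on top of this ranking would turn it into a crude relevant/irrelevant classifier, but deciding whether a sentence supports or refutes the claim needs a model that reasons about meaning rather than word overlap.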

Challenges

  • Relevance vs. similarity: Lexically similar sentences may not be relevant evidence (e.g., mentioning the same topics without addressing the claim)
  • Multi-hop reasoning: Evidence sometimes requires combining information across multiple sentences
  • Source reliability: In heterogeneous document collections, unreliable sources may contribute misleading or false "evidence"
  • Granularity: Determining the right unit (word span, sentence, paragraph, document) for evidence
  • Fine-grained evidence: Annotating which parts of a sentence are actually evidence vs. background
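
The relevance vs. similarity challenge can be made concrete with a small sketch. It uses Jaccard token overlap as a stand-in for any purely lexical similarity measure; the helper names and the example sentences are assumptions made up for illustration. The distractor shares most of the claim's words yet says nothing about whether the claim is true, while the genuine evidence shares none of them.

```python
import re


def tokens(text):
    """Set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


claim = "Marie Curie won two Nobel Prizes."

# Topically similar, but does not address whether the claim is true.
distractor = ("Two schools named after Marie Curie display posters "
              "about the Nobel Prizes.")

# Actual supporting evidence, phrased with different words and a pronoun.
evidence = ("She received the physics award in 1903 and the chemistry "
            "award in 1911.")

print("distractor:", round(jaccard(claim, distractor), 3))
print("evidence:  ", round(jaccard(claim, evidence), 3))
```

Under this measure the distractor outranks the real evidence, which also illustrates why resolving references such as "She" and drawing on surrounding context matter for evidence extraction.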

Key papers

See also