Evidence Extraction
Evidence extraction is the task of automatically identifying and retrieving the most relevant text snippets or sentences from a document collection that either support or refute a given claim. This is a core subtask in the fact-checking pipeline, bridging document retrieval and claim verification.
Problem formulation
Given:
- A claim (string)
- A collection of documents or sentences
Find:
- The subset of sentences/passages that provide evidence for or against the claim
- A ranking or relevance score for each piece of evidence
Evidence extraction can be formulated as:
1. Classification: labeling sentences as supporting, refuting, or irrelevant
2. Ranking: ordering sentences by their relevance for validating the claim
3. Span extraction: identifying the minimal text spans containing essential evidence
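
The ranking formulation can be sketched in a few lines. This is a minimal illustration, not a method from any of the cited papers: `ScoredSentence`, `overlap_score`, and `rank_evidence` are hypothetical names, and the token-overlap scorer is only a stand-in for a learned relevance model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoredSentence:
    index: int    # position of the sentence in the input collection
    text: str
    score: float  # relevance score assigned by the scorer


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim: str, sentence: str) -> float:
    """Toy scorer (assumption): fraction of claim tokens found in the sentence."""
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokenize(sentence)) / len(claim_tokens)


def rank_evidence(
    claim: str,
    sentences: List[str],
    scorer: Callable[[str, str], float] = overlap_score,
    k: int = 5,
) -> List[ScoredSentence]:
    """Score every candidate sentence against the claim and keep the top k."""
    scored = [
        ScoredSentence(i, s, scorer(claim, s)) for i, s in enumerate(sentences)
    ]
    scored.sort(key=lambda item: item.score, reverse=True)
    return scored[:k]


# Made-up inputs for illustration only.
claim = "The Eiffel Tower is located in Paris."
sentences = [
    "The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
    "Bananas are rich in potassium.",
    "Paris is the capital of France.",
]

for item in rank_evidence(claim, sentences, k=2):
    print(item.index, round(item.score, 2), item.text)
```

Because the scorer is passed in as a parameter, the same function could wrap a classifier's "relevant" probability, which is one way the classification and ranking formulations can share code.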
Challenges
- Relevance vs. similarity: Lexically similar sentences may not be relevant evidence (e.g., mentioning the same topics without addressing the claim)
- Multi-hop reasoning: Evidence sometimes requires combining information across multiple sentences
- Source reliability: In heterogeneous document collections, unreliable sources may supply misleading or false "evidence"
- Granularity: Determining the right unit (word span, sentence, paragraph, document) for evidence
- Fine-grained evidence: Annotating which parts of a sentence are actually evidence vs. background
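
The relevance vs. similarity problem can be seen with a purely lexical measure. The sketch below uses Jaccard similarity over token sets, chosen here only as a simple example of a lexical score. The three sentences are invented for illustration, and `tokens` and `jaccard` are hypothetical helper names.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# Made-up sentences for illustration only.
claim = "Aspirin reduces the risk of heart attacks."
distractor = (
    "The risk of heart attacks and the use of aspirin "
    "are discussed in chapter two."
)
paraphrase = (
    "Daily low-dose acetylsalicylic acid lowered cardiac "
    "event rates in the trial."
)

# The distractor shares most of the claim's vocabulary but says nothing about
# whether the claim holds; the paraphrase addresses the claim in other words.
print(f"distractor: {jaccard(claim, distractor):.2f}")
print(f"paraphrase: {jaccard(claim, paraphrase):.2f}")
```

Here the distractor scores higher than the paraphrase, so a ranker built only on word overlap would put the irrelevant sentence first.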
Key papers
- Hanselowski et al. (2019) — A Richly Annotated Corpus for Different Tasks in Automated Fact-Checking — fine-grained evidence extraction with detailed sentence-level annotations; compared ranking (FEVER-style) and classification approaches; best models (BiLSTM, rankingESIM) achieve recall@5 of 0.637 and 0.507 respectively; identifies challenges of paraphrased evidence and topic overlap without relevance
- Thorne et al. (2018) — FEVER: A Large-Scale Dataset for Fact Extraction and VERification — pipeline combining document retrieval, sentence selection, and textual entailment; evidence is Wikipedia sentences, annotated at the sentence level
- Thorne et al. (2018) — The Fact Extraction and VERification (FEVER) Shared Task — shared task with emphasis on sentence selection as evidence retrieval
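
Recall@5, the metric quoted above, is commonly computed as the share of gold evidence sentences that appear among the top five ranked candidates. The sketch below assumes that definition; the exact evaluation protocol in the cited papers may differ, and `recall_at_k` is a hypothetical helper name.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[int], gold_ids: Set[int], k: int = 5) -> float:
    """Fraction of gold evidence ids found among the first k ranked ids."""
    if not gold_ids:
        raise ValueError("gold_ids must not be empty")
    top_k = set(ranked_ids[:k])
    return len(top_k & gold_ids) / len(gold_ids)


# Made-up ranking and gold labels for illustration only.
ranked = [7, 2, 9, 4, 1, 3]  # sentence ids, best first
gold = {2, 3, 4}             # ids annotated as evidence

print(recall_at_k(ranked, gold, k=5))
```

In this example two of the three gold sentences fall within the top five, so the score is 2/3. A per-claim score like this is typically averaged over all claims in the evaluation set.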
See also
- Fact-checking and corrections — evidence extraction as a pipeline stage
- Claim Verification — evidence typically feeds into claim verification
- Natural Language Inference — often used to assess evidence-claim relationships