MultiFC

MultiFC was, at the time of its release, the largest publicly available dataset of naturally occurring factual claims for automatic claim verification. It consists of 34,918 claims collected from 26 English-language fact-checking websites, paired with rich metadata and evidence pages retrieved via Google Search.

Key Features

34,918 naturally occurring claims from fact-checking websites (not artificially constructed)
26 fact-checking domains run by different kinds of organizations (news outlets, dedicated fact-checking sites, government agencies)
Rich metadata: claim text, labels, speakers, fact-checkers, claim dates, publication dates, categories
Evidence retrieval: Claims matched with evidence pages via Google Search API; Wikipedia and news sources heavily represented
Entity linking: 25,763 unique entities linked to Wikipedia; 42% of claims contain linkable entities
Multi-domain labels: Domain-specific veracity label schemas (2–27 distinct labels per domain)
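Because each domain uses its own label schema, a model typically needs a separate label-to-index table per domain. The sketch below shows one way to build such tables from (domain, label) pairs. The function name `build_label_vocabs`, the domain names, and the labels are illustrative assumptions, not part of the dataset release.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_label_vocabs(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Map each domain to its own label -> index table (hypothetical helper)."""
    labels_by_domain = defaultdict(set)
    for domain, label in pairs:
        labels_by_domain[domain].add(label)
    # Sort labels so the index assignment is deterministic across runs.
    return {
        domain: {label: i for i, label in enumerate(sorted(labels))}
        for domain, labels in labels_by_domain.items()
    }


# Toy (domain, label) pairs; the domain names and labels are made up.
pairs = [
    ("site_a", "true"), ("site_a", "false"), ("site_a", "half-true"),
    ("site_b", "correct"), ("site_b", "incorrect"), ("site_a", "false"),
]
vocabs = build_label_vocabs(pairs)
print(vocabs["site_a"])
print(vocabs["site_b"])
```

Keeping one table per domain avoids forcing labels such as "half-true" and "incorrect" into a single shared scale, which the differing schemas do not support.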

Dataset Statistics

Total claims: 34,918 (after deduplication and after filtering out claims whose label occurs fewer than 5 times)
Domains: 26 fact-checking websites with varying claim counts (20–2,943 per domain)
Entities: 25,763 unique entities; average 2.9 entities per claim (range 1–35)
Evidence sources: Wikipedia (4.43%), Snopes (3.99%), news outlets (Washington Post, NYT, Guardian)
Label distribution: Varies dramatically by domain; because the label sets differ across domains, the authors treat each domain as a separate task in a multi-task learning setup
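Entity figures like the ones above can be recomputed from per-claim entity lists. The following is a minimal sketch, assuming each claim comes with a (possibly empty) list of linked entity names and that the per-claim average is taken only over claims with at least one entity, which the stated range of 1–35 suggests but does not confirm. The function name `entity_stats` and the toy data are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def entity_stats(entities_per_claim: List[List[str]]) -> Dict[str, float]:
    """Summarise entity coverage over a non-empty list of claims."""
    # Only claims with at least one linked entity enter the per-claim figures.
    with_entities = [ents for ents in entities_per_claim if ents]
    if not with_entities:
        return {"coverage": 0.0, "unique_entities": 0}
    sizes = [len(ents) for ents in with_entities]
    unique = {entity for ents in with_entities for entity in ents}
    return {
        "coverage": len(with_entities) / len(entities_per_claim),
        "mean_per_claim": mean(sizes),
        "min_per_claim": min(sizes),
        "max_per_claim": max(sizes),
        "unique_entities": len(unique),
    }


# Toy input: five claims, two of them without any linked entity.
toy = [["A", "B"], [], ["A"], [], ["C", "D", "E"]]
stats = entity_stats(toy)
print(stats)
```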

Format

Claims are provided as JSON with:

- Claim ID and text
- Veracity label (domain-specific)
- Metadata (speaker, fact-checker, publication date, claim date, label reason, category)
- Evidence pages (URLs, titles, snippets, full text)
- Entity annotations (linked to Wikipedia)
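To make the record layout concrete, here is a minimal sketch of how one claim might be parsed into typed Python objects. All key names (`claim_id`, `text`, `label`, `evidence`, and so on) are assumptions chosen for illustration; check the released files for the actual field names before relying on them. The example record is invented and not taken from the dataset.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvidencePage:
    url: str
    title: str = ""
    snippet: str = ""
    full_text: str = ""


@dataclass
class Claim:
    claim_id: str
    text: str
    label: str  # domain-specific veracity label
    speaker: Optional[str] = None
    checker: Optional[str] = None
    publish_date: Optional[str] = None
    claim_date: Optional[str] = None
    reason: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    evidence: List[EvidencePage] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)


def parse_claim(raw: str) -> Claim:
    """Parse one JSON object (hypothetical key names) into a Claim."""
    obj = json.loads(raw)
    # Evidence pages are nested objects, so convert them separately.
    evidence = [EvidencePage(**page) for page in obj.pop("evidence", [])]
    return Claim(evidence=evidence, **obj)


# Invented record for illustration only.
example = json.dumps({
    "claim_id": "example-00001",
    "text": "An illustrative claim, not taken from the dataset.",
    "label": "false",
    "speaker": "Jane Doe",
    "evidence": [{"url": "https://example.org/page", "title": "Example page"}],
    "entities": ["Example_Entity"],
})
claim = parse_claim(example)
print(claim.label, len(claim.evidence))
```

Optional metadata defaults to `None` or an empty list in this sketch, since not every fact-checking site publishes every field.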

FEVER: A Large-Scale Dataset for Fact Extraction and VERification - FEVER dataset with 185,445 constructed claims and Wikipedia evidence
Liar, Liar Pants on Fire: A New Benchmark Dataset for Fake News Detection - LIAR dataset with 12,836 claims from PolitiFact
EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection - Event adversarial neural networks for multimodal fake news detection
Embracing Domain Differences in Fake News: Cross-domain Fake News Detection using Multimodal Data - Cross-domain multimodal disinformation detection

Benchmark Results

The best multi-task learning model achieves a Macro F1 of 49.2% across domains, 7.4 percentage points (about 17.7% relative) above the evidence-agnostic baseline of 41.8%. The modest absolute score indicates that the dataset is a challenging testbed for real-world claim verification.
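Macro F1 averages the per-label F1 scores with equal weight, so rare labels count as much as frequent ones. The sketch below computes it in plain Python for a single label set; it is a generic illustration of the metric and not the authors' evaluation script. The gold and predicted labels are made up.

```python
from typing import List


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three labels, one of which is never predicted.
gold = ["true", "true", "false", "mixed"]
pred = ["true", "false", "false", "true"]
score = macro_f1(gold, pred)
print(round(score, 4))
```

In this toy case the never-predicted "mixed" label contributes an F1 of 0 and pulls the average down, which is why macro-averaged scores are often low on datasets with many infrequent labels.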

Download

Available at https://github.com/copenlu/multifc

Introduced in: MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims