DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning

Authors: Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, Gerhard Weikum

Affiliation: Max Planck Institute for Informatics (Saarbrücken), Amazon Inc. (Seattle)

TL;DR

DeClarE is an end-to-end neural network for evidence-aware credibility assessment of arbitrary natural-language claims. It encodes evidence articles with bidirectional LSTMs and claim-specific attention, learning which parts of the retrieved web articles support or refute a claim without hand-crafted features or lexicons. Evaluation on four datasets (Snopes, PolitiFact, NewsTrust, SemEval) shows the method outperforms text-only and distant-supervision baselines, demonstrating the value of jointly modeling claims with external evidence.

Contributions

End-to-end model: An automated neural network (DeClarE) for assessing the credibility of textual claims without hand-crafted features, lexicons, or rich linguistic preprocessing.
Evidence integration: Incorporates external evidence from web articles; searches for and retrieves relevant articles, then jointly models claim-evidence interactions via bidirectional LSTMs.
Claim-specific attention: Attention mechanism focuses on salient words in evidence articles with respect to the claim, making the model's verdicts interpretable and transparent.
Extensive evaluation: Benchmarks across four diverse datasets (fact-checking websites, Twitter rumors, news credibility reviews) and ablation studies demonstrating the contribution of each component.

Method

DeClarE operates in three stages:

Input representations: Claims and articles are represented as sequences of word embeddings. For each claim \(C_n\) with length \(l\), the words are embedded as \(\mathbf{c} = [c_1, c_2, \ldots, c_l]\) where \(c_t \in \mathbb{R}^d\). Reporting articles are retrieved via web search and represented similarly.
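As a minimal sketch of this input step, assuming a toy embedding table, a claim becomes a list of \(d\)-dimensional vectors. The vocabulary, the vector values, the zero vector for unknown words, and the helper name `embed` are all hypothetical and not taken from the paper.

```python
# Toy embedding table with d = 3; a real system would load pretrained vectors.
# All words and values here are made up for illustration.
EMBEDDINGS = {
    "earth": [0.2, -0.1, 0.5],
    "is": [0.0, 0.3, -0.2],
    "flat": [-0.4, 0.1, 0.1],
}
UNKNOWN = [0.0, 0.0, 0.0]  # assumption: out-of-vocabulary words map to zeros


def embed(text):
    """Turn a whitespace-tokenized text into a sequence of word vectors."""
    return [EMBEDDINGS.get(token, UNKNOWN) for token in text.lower().split()]


claim = embed("Earth is flat")
print(len(claim), len(claim[0]))  # l = 3 words, d = 3 dimensions
```

The same function would apply to the text of a retrieved article, which gives the "represented similarly" case.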

Article representation: A bidirectional LSTM encodes each article's content while attending to claim-relevant terms. The model computes an attention weight \(\alpha_k\) for each word in an article with respect to the claim, capturing which parts of the article are relevant for assessing the claim's credibility. Formally:

\[\alpha_k = \frac{\exp(a_k')}{\sum_j \exp(a_j')}\]

where \(a_k'\) is a learned relevance score for the \(k\)-th article word. The article representation is then an attention-weighted average of the BiLSTM hidden states for the article's words.
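The softmax normalization and the weighted average can be sketched in plain Python. This is an illustration only: the relevance scores are passed in as given numbers rather than learned, and the function names `softmax` and `attend` and the toy vectors are assumptions.

```python
import math


def softmax(scores):
    """Normalize raw relevance scores a'_k into attention weights alpha_k."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word_vectors, scores):
    """Return the attention-weighted average of per-word vectors."""
    weights = softmax(scores)
    dim = len(word_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, word_vectors))
        for i in range(dim)
    ]


# Three article words with 2-dimensional vectors and made-up relevance scores.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.0, 0.0]
weights = softmax(scores)
article_repr = attend(word_vectors, scores)
```

Because the first word has the highest score, it receives the largest weight and dominates the pooled vector; the weights always sum to 1.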

Credibility aggregation: Multiple reporting articles contribute to the final credibility score. The model aggregates per-article credibility scores via averaging:

\[\text{cred}(C) = \frac{1}{M} \sum_{m=1}^{M} s_m\]

where \(M\) is the number of reporting articles and \(s_m\) is the credibility score for article \(m\). For classification tasks, a sigmoid or softmax layer produces the final label; for regression (credibility rating prediction), a linear layer outputs a continuous score.
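A minimal sketch of the averaging step follows. The per-article scores are invented, and the 0.5 decision threshold in `to_label` is an assumption made for illustration, not a value stated in the paper.

```python
def aggregate_credibility(article_scores):
    """Average per-article credibility scores s_m into cred(C)."""
    if not article_scores:
        raise ValueError("need at least one reporting article")
    return sum(article_scores) / len(article_scores)


def to_label(cred, threshold=0.5):
    """Assumed decision rule: scores at or above the threshold count as true."""
    return "true" if cred >= threshold else "false"


scores = [0.9, 0.7, 0.2]  # hypothetical scores for M = 3 articles
cred = aggregate_credibility(scores)
print(round(cred, 2), to_label(cred))  # 0.6 true
```

Averaging means a single dissenting article (0.2 here) lowers the score but does not flip the verdict on its own.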

Results

Snopes and PolitiFact (classification): DeClarE (Full) achieves 78.96% accuracy on true claims and 78.32% on false claims on Snopes, outperforming the LSTM-text (64.65% / 64.21%) and CNN-text (67.15% / 63.14%) baselines. On PolitiFact, the model achieves 67.32% / 69.62% (true/false) with 0.75 AUC, compared to 62.67% / 69.05% for the best baseline. Ablation studies show that both attention and source embeddings contribute an improvement of roughly 2–3 percentage points.
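The separate "true" and "false" figures read as per-class accuracies, that is, accuracy computed only over claims whose gold label is that class. Under that assumed reading, the metric can be sketched as below; the labels are made up and are not results from the paper.

```python
def per_class_accuracy(gold, predicted, label):
    """Fraction of claims with gold label `label` that were predicted correctly."""
    relevant = [(g, p) for g, p in zip(gold, predicted) if g == label]
    if not relevant:
        raise ValueError(f"no gold examples with label {label!r}")
    return sum(g == p for g, p in relevant) / len(relevant)


# Made-up labels for six claims; these are not results from the paper.
gold = ["true", "true", "true", "false", "false", "false"]
pred = ["true", "true", "false", "false", "false", "true"]
true_acc = per_class_accuracy(gold, pred, "true")
false_acc = per_class_accuracy(gold, pred, "false")
```

Reporting both numbers guards against a model that looks accurate overall only because it favors the majority class.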

NewsTrust (regression): Mean squared error (MSE) of 0.29 with the full model, compared to 0.35 for LSTM-text and 0.53 for CNN-text baselines.

SemEval-2017 Task 8 (rumor veracity): Macro-accuracy of 0.57 on the closed variant, outperforming IITP (0.39) and NileTMRG (0.54).

Connections

Related to Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking in that both analyze the language of claims and articles, though DeClarE draws on external evidence rather than intrinsic linguistic features alone.
Related to FEVER: A Large-Scale Dataset for Fact Extraction and VERification and SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours on evidence-based fact verification; DeClarE targets open-domain claim credibility rather than structured knowledge-base matching.
Differs from CSI: A Hybrid Deep Model for Fake News Detection in that CSI combines temporal engagement and user behavior, while DeClarE focuses exclusively on content and external evidence.
Precursor to later evidence-aware work such as The Fact Extraction and VERification (FEVER) Shared Task, which explicitly models evidence retrieval and entailment.

Notes

Strengths:
- End-to-end neural approach eliminates the need for feature engineering, making the system generalizable to new domains.
- Clear interpretability via attention weights: the model highlights which parts of evidence articles drive its credibility verdicts.
- Comprehensive evaluation across diverse datasets shows robustness.
- Joint modeling of claim and evidence outperforms the text-only baselines by roughly 12–15 percentage points of per-class accuracy on Snopes.

Weaknesses:
- Relies on a web search API to retrieve articles; coverage depends on search engine indexing and may miss relevant evidence.
- Limited analysis of failure modes: when does evidence retrieval fail or mislead?
- The computational cost of retrieving and processing multiple articles per claim is not discussed.
- The model assumes credibility can be derived from evidence articles alone; malicious evidence crafted specifically to fool the model is not evaluated.

Open questions:
- How does the model perform when evidence articles themselves contain contradictions or are from low-credibility sources?
- Can the attention mechanism be fooled by adversarially crafted articles designed to manipulate credibility scores?
- How does performance degrade in low-resource domains where web articles are scarce?