Explainable fake news detection
Explainable (or interpretable) fake news detection goes beyond binary classification to identify why a news article is predicted to be fake. Rather than a single score, explainable methods surface evidence: key sentences in the article, important features, or user comments that support the fake/real label. This transparency is critical for academic research, legal contexts, and user trust.
Key approaches:
- Attention mechanisms: Hierarchical attention weights on words and sentences highlight which parts of the news drove the classification decision. See hierarchical attention.
- Comment-based explanations: User responses (skepticism, fact-checking comments) signal which sentences are disputed or false. Jointly modeling news content and comments grounds the explanations in reader feedback.
- Feature importance analysis: Post-hoc interpretation of learned representations (e.g., attention weights, word embeddings) to identify which input tokens or linguistic features contributed most to the prediction.
- Fact-checking integration: Explicit modeling of fact-checked claims within the article text, so that each predicted fake-news label is paired with sentence-level fact-check scores.
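As a rough illustration of the attention-based approach, the sketch below turns sentence-level attention scores into a top-k explanation. The sentences and the raw scores are hypothetical; in a real detector the scores would come from a trained attention layer, and the helper names `softmax` and `top_k_sentences` are assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_sentences(sentences, raw_scores, k=2):
    """Return the k (sentence, weight) pairs with the highest attention weight."""
    weights = softmax(raw_scores)
    ranked = sorted(zip(sentences, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical article sentences and hypothetical unnormalized attention scores.
sentences = [
    "The city council met on Tuesday.",
    "Sources say the mayor secretly sold the bridge.",
    "The meeting lasted two hours.",
    "No official record of the sale exists.",
]
raw_scores = [0.2, 2.1, 0.4, 1.5]

explanation = top_k_sentences(sentences, raw_scores, k=2)
for sentence, weight in explanation:
    print(f"{weight:.2f}  {sentence}")
```

The normalized weights sum to 1, so they can be read as the relative share of attention each sentence received. Whether a high weight is a faithful explanation of the prediction is a separate question that the evaluation methods have to answer.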
Evaluating explainability is challenging. Common measures include human evaluation of explanation quality (e.g., annotators rating whether the top-k sentences truly "explain" the fake label), precision and recall of the identified check-worthy sentences, and ranking metrics such as MAP (Mean Average Precision).
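The ranking metric can be made concrete with a small sketch of MAP@k. It assumes each article has a model-produced ranking of sentence indices and a ground-truth set of check-worthy sentence indices. Normalizing by `min(k, number of relevant items)` is one common convention and is an assumption here; papers differ on this detail.

```python
def average_precision_at_k(ranked, relevant, k):
    """AP@k for one article: ranked is a list of sentence ids, relevant a set of ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / position  # precision at this cut-off
    return score / min(k, len(relevant))

def mean_average_precision_at_k(rankings, relevants, k):
    """MAP@k: the mean of AP@k over all articles."""
    aps = [average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps) if aps else 0.0

# Hypothetical rankings for two articles and their check-worthy sentence ids.
rankings = [[2, 0, 3, 1], [1, 0, 2]]
relevants = [{2, 3}, {0}]

map_at_3 = mean_average_precision_at_k(rankings, relevants, k=3)
print(round(map_at_3, 4))
```

For the first article the relevant sentences appear at ranks 1 and 3, giving AP@3 = (1/1 + 2/3) / 2; for the second the single relevant sentence appears at rank 2, giving AP@3 = 1/2.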
Key papers
- Shu et al. (2019), dEFEND: Combines hierarchical attention with sentence-comment co-attention to jointly detect fake news and explain the prediction via the top-k sentences and comments. Sentence explanations are compared against HAN using MAP@k over check-worthy sentences, and a human evaluation on AMT finds that dEFEND ranks explanatory comments better than HPA-BLSTM.
- Jin et al. (2021), Towards Fine-Grained Reasoning for Fake News Detection: Provides explainability through fine-grained reasoning over claim-evidence graphs, identifying which evidence groups matter most and which tokens within the evidence drive predictions. The model uses kernel-based attention mechanisms and importance priors to surface interpretable reasoning steps. Case studies show it correctly distinguishes suspicious evidence (e.g., anonymous server hack claims) from mainstream coverage.
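To make the idea of sentence-comment co-attention more tangible, here is a deliberately simplified, parameter-free sketch. It is not the dEFEND formulation: the published model learns its weight matrices during training, whereas this example uses a plain dot-product affinity, and the vectors and function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def co_attention(sentence_vecs, comment_vecs):
    """Return (sentence_weights, comment_weights) from a shared affinity matrix."""
    # affinity[i][j] measures how strongly sentence i and comment j relate.
    affinity = [[math.tanh(dot(s, c)) for c in comment_vecs] for s in sentence_vecs]
    # A sentence is important if many comments relate to it, and vice versa.
    sentence_scores = [sum(row) for row in affinity]
    comment_scores = [sum(row[j] for row in affinity) for j in range(len(comment_vecs))]
    return softmax(sentence_scores), softmax(comment_scores)

# Hypothetical 2-d embeddings: three sentences and two comments.
sentence_vecs = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
comment_vecs = [[0.0, 2.0], [0.0, 1.5]]

sentence_weights, comment_weights = co_attention(sentence_vecs, comment_vecs)
top_sentence = max(range(len(sentence_weights)), key=sentence_weights.__getitem__)
top_comment = max(range(len(comment_weights)), key=comment_weights.__getitem__)
print(top_sentence, top_comment)
```

The point of the design is that both weight vectors are derived from the same affinity matrix, so the sentences and comments selected as explanations are the ones that refer to each other.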
Connections
- Hierarchical attention networks are a primary mechanism for explainability in neural fake-news detectors.
- User comments provide external signals that ground explanations in community skepticism and fact-checking.
- Content-based detection typically derives explanations from signals internal to the model, such as attention weights over words and sentences; social-context methods can add explanations drawn from social features such as user comments.