Fake news detection
Methods, approaches, and psychological mechanisms underlying the detection and identification of false news stories and misinformation.
Key papers and articles
- DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection (2024) — DELL integrates LLMs at three stages: generating synthetic user reactions from diverse perspectives, creating explainable proxy tasks with LLM explanations, and merging task-specific expert predictions; achieves up to 16.8% improvement in macro F1-score on fake news detection across seven datasets.
- Su, Cardie & Nakov (2023) — Adapting Fake News Detection to the Era of Large Language Models: Evaluates fake news detectors across three eras (human-written, mixed human-machine, machine-dominated content) revealing that detectors trained exclusively on human-written fake news fail on machine-generated fakes. Recommends balanced training data mixing human and AI-generated content for robust detection across GossipCop++ and PolitiFact++.
- Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements: Comprehensive tutorial covering user engagement in misinformation dissemination, detection techniques using weak social supervision (TriFN, dEFEND, MWSS), and mitigation strategies. Weak social supervision leverages user behavior patterns (sentiment, credibility, network structure) as training signals; TriFN achieves 0.87 AUC; dEFEND achieves ~0.9 F1 with explainability via sentence-comment co-attention.
- TI-CNN: Convolutional Neural Networks for Fake News Detection: TI-CNN combines text and image information via parallel CNNs to detect fake news about the 2016 US presidential election. Achieves F₁ 0.9210 by integrating explicit features (word patterns, image properties) and learned representations, outperforming text-only and image-only approaches.
- Tacchini et al. (2017) — Some Like it Hoax: Detects Facebook hoaxes with >99% accuracy using user interaction patterns (likes) rather than content. Proposes logistic regression and harmonic boolean crowdsourcing approaches, showing knowledge transfers across Facebook communities even with minimal labeled training data.
- Wang (2017) — Liar, Liar Pants on Fire: A New Benchmark Dataset for Fake News Detection: Introduces LIAR, a foundational benchmark dataset of 12,836 labeled political statements from PolitiFact spanning 2007–2016 with 6-way fine-grained labels (pants-fire, false, barely-true, half-true, mostly-true, true) and rich metadata; demonstrates that integrating speaker credibility history with text improves detection over text-only methods.
- Fighting an Infodemic: COVID-19 Fake News Dataset: Curated dataset of 10,700 COVID-19 related posts/articles with binary labels (real vs. fake) from social media and fact-checking websites; benchmarks four ML baselines, achieving 93.32% F1-score with SVM; addresses pandemic-specific misinformation with a balanced, annotated public dataset.
- Oshikawa, Qian, & Wang (2020) — A Survey on Natural Language Processing for Fake News Detection: NLP-focused survey systematically comparing task formulations (classification vs. regression), nine benchmark datasets (LIAR, FEVER, FakeNewsNet, SNS data), and five methodological approaches; reports that attention-based LSTM models achieve the highest accuracy while highlighting the importance of exploiting metadata.
- Lu & Li (2020) — GCAN: Graph-aware Co-Attention Networks for Explainable Fake News Detection: Novel neural model for short-text fake news detection on Twitter using only source tweets and retweet sequences; combines GCN-based user interaction graphs with a dual co-attention mechanism to achieve 87.7% accuracy on Twitter15 and 90.8% on Twitter16, outperforming prior work by ~18%; provides explainability by highlighting suspicious users and informative words.
- Shu et al. (2017) — Fake News Detection on Social Media: A Data Mining Perspective: Comprehensive survey organizing fake news detection through characterization (psychological/social foundations) and detection methods (content-based: knowledge/style-based; context-based: stance/propagation-based).
- Sharma et al. (2018) — Combating Fake News: A Survey on Identification and Mitigation Techniques: Survey organizing detection methods into content-based (POS tags, PCFG, CNNs, RNNs with attention) and feedback-based (propagation kernels, SEIZ process models, temporal patterns, user/group analysis); enumerates 23+ datasets with detailed task/annotation characteristics and proposes future directions in intent detection and dynamic knowledge bases.
- Potthast et al. (2017) — A Stylometric Inquiry into Hyperpartisan and Fake News: Stylometric analysis of fake news and hyperpartisan content using a manually fact-checked corpus of 1,627 articles; finds style-based fake news detection achieves only F1=0.46 but shows hyperpartisan detection (F1=0.78) could serve as a pre-filter.
- Ruchansky, Seo, & Liu (2017) — CSI: A Hybrid Deep Model for Fake News Detection: Deep learning model combining text, temporal engagement patterns, and user behavior for detection; achieves 89.2% accuracy on Twitter by explicitly modeling source characteristics through group behavior analysis.
- Rashkin et al. (2017) — Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking: Linguistic analysis of fake news across satire, hoax, and propaganda categories; demonstrates that stylistic features distinguish unreliable from trusted news.
- Guess et al. (2020) — A digital media literacy intervention increases discernment between mainstream and false news in the United States and India: RCT of an intervention teaching simple heuristics for evaluating news credibility; improves discernment between false and mainstream news.
- Jones-Jang, Mortensen, & Liu (2021) — Does Media Literacy Help Identification of Fake News?: Empirical investigation of which literacy types predict accurate fake news identification; information literacy is the key predictor, not general media literacy.
- Solaiman et al. (2019) — OpenAI Release Strategies: Analysis of detection capabilities for GPT-2 generated news articles. Shows that a fine-tuned RoBERTa detector achieves ~95% accuracy on outputs from the largest models; human studies show 75% credibility for 1.5B outputs. Demonstrates that fine-tuning reduces detection by existing systems. Presents key findings on the arms race between generation and detection quality.
- Pennycook & Rand — Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning: Examines the relationship between analytic thinking and susceptibility to fake news.
- Dou et al. (2021) — User Preference-aware Fake News Detection (UPFD): Leverages user endogenous preferences (from historical posts) alongside exogenous propagation context via GNNs for detection. Shows confirmation bias drives sharing decisions; jointly modeling user behavior and news propagation patterns outperforms content-only and graph-only baselines by ~1% on PolitiFact/GossipCop; releases augmented FakeNewsNet benchmark.
- Vo & Lee (2021) — Hierarchical Multi-head Attentive Network: Proposes MAC, combining word-level and document-level multi-head attention for evidence-aware fact-checking. Jointly learns to identify important words in claims/evidence and important documents among multiple retrieved articles. Achieves 88.7% AUC on Snopes (9.47% improvement) and 75.8% on PolitiFact, with ablation studies showing both attention mechanisms are essential.
- A Benchmark Study of Machine Learning Models for Online Fake News Detection: Comprehensive benchmark comparing 19 machine learning models (traditional, deep learning, pre-trained transformers) on three fake news datasets. Finds BERT-based models (RoBERTa) achieve 96%+ accuracy on large datasets and >90% with only 500 training samples, substantially outperforming traditional and deep learning approaches.
- Mayank, Sharma & Sharma (2021) — DEAP-FAKED: Knowledge Graph based Approach for Fake News Detection: Combines biLSTM news encoding with knowledge graph embeddings using only article titles. Extracts named entities, maps to Wikidata, and embeds via ComplEx. Achieves 88% F1 on Kaggle and 78% on CoAID datasets after systematic bias removal; demonstrates entity information provides ~13% F1 improvement over text-only baselines.
- Kaliyar, Goswami & Narang (2021) — FakeBERT: Fake News Detection in Social Media with a BERT-based Deep Learning Approach: Combines BERT embeddings with parallel 1D CNNs using multiple kernel sizes for multi-scale feature extraction. Achieves 98.90% accuracy on a real-world fake news dataset (2016 election), substantially outperforming CNN (92.70%) and LSTM (97.55%) baselines with BERT embeddings, demonstrating the value of contextualized transformers with architectural innovation.
- Jin et al. (2021) — Towards Fine-Grained Reasoning for Fake News Detection: Proposes FinerFact, a framework for fine-grained reasoning over claim-evidence graphs using mutual-reinforcement-based evidence ranking and a bi-channel kernel graph network. Achieves 91.7% F1 on PolitiFact and 86.4% F1 on GossipCop while providing human-interpretable explanations of predictions; demonstrates that evidence quality and claim-level reasoning improve detection over coarse-grained document classification.
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News: Addresses multimodal neural fake news detection using visual-semantic inconsistency. Proposes DIDAN, which identifies mismatches between text, images, and captions by analyzing named entity co-occurrence. Introduces NeuralNews dataset of 128K articles (real and machine-generated). Shows naive humans achieve only 46.2% accuracy at detection, while trained humans reach 67.8% with visual-semantic cues; identifies Type C articles (generated text + real images) as most deceptive.
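
Several entries above detect hoaxes from user interactions rather than content; the Tacchini et al. entry, for example, describes logistic regression over which users liked a post. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes a tiny hand-made dataset, hypothetical user IDs (`u0` to `u4`, `u9`), hypothetical helper names (`encode`, `train_logistic`, `predict`, `classify_post`), and plain batch gradient descent in pure Python in place of any particular library.

```python
import math

# Toy data (hypothetical): each post is (set of user IDs who liked it, label),
# where label 1 = hoax and 0 = non-hoax.
posts = [
    ({"u0", "u1"}, 1),
    ({"u0", "u2"}, 1),
    ({"u1", "u2"}, 1),
    ({"u3", "u4"}, 0),
    ({"u3"}, 0),
    ({"u4"}, 0),
]

# One binary feature per user seen in training: did this user like the post?
users = sorted(set().union(*(likers for likers, _ in posts)))


def encode(likers):
    """Map a set of likers to a 0/1 vector; users unseen in training are ignored."""
    return [1.0 if u in likers else 0.0 for u in users]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (hoax) if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


X = [encode(likers) for likers, _ in posts]
y = [label for _, label in posts]
w, b = train_logistic(X, y)


def classify_post(likers):
    """Classify a new post from its likers alone, without looking at content."""
    return predict(w, b, encode(likers))
```

For instance, `classify_post({"u0", "u2", "u9"})` scores a new post using only the users who liked it; `u9` never appeared in training, so it contributes nothing to the score. The learned weights are one coefficient per user, which is why this kind of model can only transfer to posts that share at least some likers with the training data.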