Misinformation and fake news detection

Detection of misinformation combines computational methods (NLP, machine learning, network analysis) with human curation (fact-checkers, journalists, expert annotators). Detection systems operate at multiple levels: identifying false claims, detecting unreliable sources, spotting coordinated inauthentic behavior, and flagging deepfakes.

Approaches

Linguistic and stylometric methods:
Analyzing language patterns, emotional intensity, expressions of certainty, and rhetorical structure to detect fabrication or exaggeration.
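
As a rough illustration of the surface cues listed above, the sketch below counts a few of them per token. The cue lexicons and the feature names are hypothetical and chosen only for the example; a real system would typically feed features like these, or learned embeddings, into a trained classifier.

```python
import re

# Hypothetical cue lexicons, chosen only for illustration.
CERTAINTY_WORDS = {"always", "never", "definitely", "proven", "undeniable", "guaranteed"}
EMOTION_WORDS = {"shocking", "outrageous", "terrifying", "miracle", "horrifying"}


def stylometric_features(text: str) -> dict[str, float]:
    """Return simple per-token surface features for one piece of text."""
    tokens = re.findall(r"[A-Za-z']+", text)
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    lowered = [t.lower() for t in tokens]
    return {
        # share of tokens that assert certainty
        "certainty_rate": sum(t in CERTAINTY_WORDS for t in lowered) / n,
        # share of tokens from the emotional-intensity lexicon
        "emotion_rate": sum(t in EMOTION_WORDS for t in lowered) / n,
        # share of fully upper-case tokens longer than one letter
        "all_caps_rate": sum(t.isupper() and len(t) > 1 for t in tokens) / n,
        # exclamation marks relative to the number of tokens
        "exclamations_per_token": text.count("!") / n,
    }
```

For the headline "SHOCKING! Doctors never tell you this proven cure!" the function reports a certainty rate of 0.25, since two of the eight tokens are in the certainty lexicon. Such features are weak signals on their own and are best treated as inputs to a model rather than as a verdict.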

Source credibility analysis:
Evaluating publisher reputation, network position, historical accuracy, and financial incentives to assess the reliability of an information source.
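
One of these signals, historical accuracy, can be sketched as a smoothed fraction of a source's fact-checked claims that were rated accurate. The `SourceRecord` type, its field names, and the prior parameters below are assumptions made for this example, not an established scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    """Hypothetical per-source tally of fact-check outcomes."""
    name: str
    checked_claims: int   # claims from this source that fact-checkers reviewed
    accurate_claims: int  # how many of those were rated accurate


def historical_accuracy(record: SourceRecord,
                        prior_mean: float = 0.5,
                        prior_strength: float = 2.0) -> float:
    """Smoothed accuracy estimate in [0, 1].

    The prior acts like `prior_strength` pseudo-observations with accuracy
    `prior_mean`, so a source with no history is scored at the prior mean
    instead of at 0 or 1.
    """
    return ((record.accurate_claims + prior_mean * prior_strength)
            / (record.checked_claims + prior_strength))
```

The smoothing matters for new or rarely checked sources: a source with 17 accurate claims out of 18 scores 0.9, while a source with no checked claims scores 0.5 rather than being treated as fully reliable or fully unreliable.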

Network and propagation analysis:
Studying how false claims spread (velocity, reach, and the characteristics of the users involved) to detect anomalous patterns indicative of coordinated campaigns or bot amplification.
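
The propagation measures named above can be sketched from a list of share events. The event format (timestamp in seconds, user id), the function name, and the 60-second burst window are assumptions for this example; what counts as anomalous would have to be calibrated against real cascades.

```python
from bisect import bisect_right


def cascade_stats(events: list[tuple[float, str]],
                  window_s: float = 60.0) -> dict[str, float]:
    """Summarize one cascade given (timestamp_seconds, user_id) share events."""
    if not events:
        return {"shares": 0, "reach": 0, "shares_per_hour": 0.0, "peak_window_shares": 0}
    times = sorted(t for t, _ in events)
    # Floor the duration at one second so a single-instant cascade does not divide by zero.
    duration_h = max((times[-1] - times[0]) / 3600.0, 1.0 / 3600.0)
    # For each share, count the shares falling in [t, t + window_s]; keep the maximum.
    peak = max(bisect_right(times, t + window_s) - i for i, t in enumerate(times))
    return {
        "shares": len(times),                        # total share events
        "reach": len({user for _, user in events}),  # distinct users who shared
        "shares_per_hour": len(times) / duration_h,  # average velocity
        "peak_window_shares": peak,                  # burstiness within one window
    }
```

A high peak relative to the average velocity, or many shares coming from few distinct users, is the kind of pattern that might prompt a closer look for coordination. The statistics alone do not establish it.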

Multimodal detection:
Applying reverse image search, deepfake detection, caption-image consistency checks, and metadata analysis to image- and video-based misinformation.
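
Reverse image search often relies on some form of perceptual hashing to match re-encoded or lightly edited copies of an image. The sketch below shows one simple variant, an average hash, and assumes the image has already been converted to a small grayscale grid by an image library; both function names are illustrative.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash of a small grayscale grid (for example 8x8 values in 0..255).

    Each pixel contributes one bit: 1 if it is brighter than the grid mean.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Because each bit depends only on whether a pixel is above the mean, a uniformly brightened copy of an image produces the same hash, while a visually different image produces a distant one. A small Hamming distance therefore suggests, but does not prove, that two files show the same picture.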

Automated fact verification:
Pairing claims with knowledge bases or external sources to assess factuality; challenges include incomplete knowledge bases and the context dependence of truth.
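
A minimal sketch of claim-to-knowledge-base matching follows, assuming claims have already been parsed into (subject, predicate, object) triples and that each predicate has a single correct object. The toy knowledge base, the predicate names, and the `verify` function are all hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    NOT_ENOUGH_INFO = "not enough info"


# Toy knowledge base: (subject, predicate) -> object. Illustrative only.
KB = {
    ("paris", "capital_of"): "france",
    ("water", "boils_at_celsius_at_sea_level"): "100",
}


def verify(subject: str, predicate: str, obj: str,
           kb: dict[tuple[str, str], str] = KB) -> Verdict:
    """Check one triple against the knowledge base."""
    key = (subject.lower(), predicate)
    if key not in kb:
        # An incomplete knowledge base cannot confirm or refute the claim.
        return Verdict.NOT_ENOUGH_INFO
    return Verdict.SUPPORTED if kb[key] == obj.lower() else Verdict.REFUTED
```

The third verdict reflects the incompleteness problem: a true claim about a fact missing from the knowledge base comes back as "not enough info" rather than "supported". The boiling-point predicate hints at context dependence, since the stored value only holds under the stated condition.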

Key papers in this wiki

A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration. A dataset of more than 800 million COVID-19 tweets for misinformation research; it enables real-time sentiment tracking, cascade analysis, and source credibility assessment during the pandemic.
Suarez-Lledo & Alvarez-Galvez (2021). A systematic review of 69 studies that identifies the dominant health misinformation topics and characterizes detection methodologies (content analysis, sentiment analysis, social network analysis, quality evaluation); methodological preferences differ by platform, which suggests that platforms offer different analytical affordances.
Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data. An empirical taxonomy of real-world GenAI-enabled misinformation and disinformation; documents tactics (falsification, impersonation, scaling and amplification) and the emergence of low-tech, GenAI-based content creation at scale.
Misinformation Detection on YouTube Using Video Captions (2021). Shows that YouTube video captions distinguish misinformation effectively (0.92–0.95 F1-score in binary classification); metadata such as views and likes is insufficient, whereas NLP-based caption analysis with pre-trained embeddings performs strongly across five conspiracy topics.
DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection (2024). Proposes DELL, which generates synthetic user reactions from diverse perspectives, creates explainable proxy tasks with LLM explanations, and merges task-specific expert predictions; reports an improvement of up to 16.8% in macro F1-score over baselines on fake news, framing, and propaganda detection.
Can LLM-Generated Misinformation Be Detected? (2024). Provides empirical evidence that LLM-generated misinformation is harder for both humans (9.6% vs 40.7% success) and detectors to identify than human-written misinformation with the same semantics.
Combating Misinformation in the Age of LLMs: Opportunities and Challenges (2023). A comprehensive survey of the opportunities and challenges in using large language models for misinformation detection, intervention, and attribution.
Computational fact checking from knowledge networks (2015). Frames fact checking as graph analysis; uses semantic proximity in the Wikipedia knowledge graph to assess the truthfulness of statements.
Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams. An early network-based approach to detecting political astroturfing; combines diffusion topology, sentiment analysis, and supervised learning to identify coordinated deceptive campaigns with ~90% accuracy.
Lazer et al. (2018), The Science of Fake News. Reviews prevalence and detection challenges; highlights that scientific capacity to measure human attention to fake news is limited; discusses the potential of algorithmic detection and bot vulnerabilities.
Zhou & Zafarani (2020), A Survey of Fake News. A comprehensive survey of detection methods and datasets.
Lee et al. (2020), Misinformation Has High Perplexity. Uses language model perplexity as a signal of falseness when the model is trained on truthful evidence; a data-efficient approach that reaches 75% accuracy on COVID-19 claims with minimal supervision.
Truthful AI: Developing and Governing AI That Does Not Lie. Discusses governance frameworks for preventing AI-generated misinformation through truthfulness standards.
On the Risk of Misinformation Pollution with Large Language Models. Investigates the vulnerability of open-domain question answering (ODQA) systems to LLM-generated misinformation; proposes detection and defense strategies.
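
The perplexity signal described in the Lee et al. (2020) entry can be illustrated with a toy model. The sketch below is not the authors' setup: it substitutes an add-one-smoothed unigram model for a real language model, and the function names and example sentences are invented for the illustration.

```python
import math
from collections import Counter


def train_unigram(evidence: list[str]) -> tuple[Counter, int]:
    """Count lower-cased, whitespace-split words in the truthful evidence."""
    counts = Counter(w for sentence in evidence for w in sentence.lower().split())
    return counts, sum(counts.values())


def perplexity(claim: str, counts: Counter, total: int) -> float:
    """Perplexity of a claim under the add-one-smoothed unigram model."""
    words = claim.lower().split()
    if not words:
        raise ValueError("claim must contain at least one word")
    vocab = len(counts) + 1  # one extra slot for any unseen word
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))


evidence = ["vaccines reduce severe illness", "masks reduce transmission"]
counts, total = train_unigram(evidence)
consistent = perplexity("vaccines reduce illness", counts, total)
unsupported = perplexity("vaccines cause magnetism", counts, total)
```

Here `unsupported` comes out higher than `consistent`, because the second claim uses words the evidence never contains. A unigram model ignores word order and meaning, so this only demonstrates the direction of the signal; it cannot tell a false claim from a true one phrased in unfamiliar vocabulary.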

Related concepts

Social media and misinformation: platforms as the primary distribution channel
Platform algorithms and curation: algorithmic amplification of false content
Fact-checking and corrections: post-hoc corrections vs. prevention
Coordinated inauthentic behavior and bots: adversarial manipulation of detection systems
