Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements

Authors: Kai Shu, Suhang Wang, Dongwon Lee, Huan Liu

Venue: Book chapter, Springer — arXiv preprint 2001.00623

TL;DR

This comprehensive tutorial surveys the landscape of fake news and disinformation research across three dimensions: understanding user engagement in information disorder, techniques for detecting and mitigating false content, and emerging challenges like fake news literacy and neural generation. The authors emphasize weak social supervision (leveraging social media engagements as training signals) as a practical approach to detection under limited labeled data.

Contributions

Defines an information disorder taxonomy: disinformation (false information spread with intent to mislead or harm), misinformation (false information shared without intent to harm), and malinformation (genuine information shared with intent to cause harm).
Categorizes fake news characterization along intent-based axes (who creates it, why, how it spreads) and content-based axes (linguistic, visual, multimodal cues).
Presents the weak social supervision (WSS) framework: leveraging user behavior patterns (sentiment, credibility, network structure) as weak labels for detection.
Surveys detection strategies: content-based approaches, social context approaches, and multimodal fusion.
Reviews mitigation strategies including fact-checking, credibility modeling, and user interventions.
Covers emerging challenges: fake news literacy, machine-generated content, blockchain defense, and incongruent news headlines.

Method

The paper is organized as a tutorial with three parts:

Part I: User Engagements in Information Disorder

Examines who engages with and spreads misinformation. Key insights from social theories:
- Social homophily: Users tend to follow like-minded peers and receive news confirming existing beliefs, creating echo chambers.
- Cognitive biases: Naive realism and confirmation bias make users vulnerable to false narratives.
- Network structure: Hierarchical propagation networks (posting, reposting, replying) differ statistically between fake and real news.
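
The structural differences mentioned in the last item are usually measured through simple cascade statistics. The sketch below is a minimal illustration, assuming a propagation network is available as a tree of (parent, child) edges; the function name `cascade_features`, the chosen features, and the toy cascade are hypothetical and not taken from the paper.

```python
from collections import defaultdict, deque


def cascade_features(edges, root):
    """Compute simple structural features of a propagation tree.

    edges: list of (parent, child) pairs, e.g. (post, repost) or (post, reply).
    root:  id of the source post. The input is assumed to be a tree (no cycles).
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    size = 1                      # number of nodes, counting the root
    depth = 0                     # longest path from the root, in hops
    level_widths = defaultdict(int)
    level_widths[0] = 1
    queue = deque([(root, 0)])    # breadth-first traversal from the source post

    while queue:
        node, level = queue.popleft()
        depth = max(depth, level)
        for child in children.get(node, []):
            size += 1
            level_widths[level + 1] += 1
            queue.append((child, level + 1))

    return {
        "size": size,
        "depth": depth,
        "max_breadth": max(level_widths.values()),
        "max_out_degree": max((len(c) for c in children.values()), default=0),
    }


# Hypothetical toy cascade: "a" is reposted by "b" and "c"; "b" is reposted by "d".
toy_edges = [("a", "b"), ("a", "c"), ("b", "d")]
features = cascade_features(toy_edges, "a")
```

Features like these could then be compared between fake and real news cascades with a standard statistical test; which features actually separate the two classes is an empirical question that this sketch does not answer.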

Proposes characterizing users along credibility and bias dimensions: high-bias users are more likely to spread misinformation, while high-credibility users act as gatekeepers.

Part II: Detection and Mitigation

Core approach: Weak Social Supervision (WSS). Rather than relying on expensive manual annotation, extract weak labels from three aspects of social media engagements:
- Sentiment: Conflicting viewpoints or high sentiment variance → fake news signal.
- Publisher bias: Measure from historical tweet patterns; biased publishers more likely to disseminate misinformation.
- User credibility: Credibility inferred from user behavior clusters; low-credibility users more likely to share false content.
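
The three signals above can be read as simple labeling rules. The following is a minimal sketch under stated assumptions: the sentiment, bias, and credibility scores are taken as already computed, the thresholds are arbitrary illustrative values, and all function names are hypothetical. It is not the labeling procedure from the paper.

```python
from statistics import pvariance

# Hypothetical thresholds, chosen only for illustration.
SENTIMENT_VAR_THRESHOLD = 0.25
BIAS_THRESHOLD = 0.5
CREDIBILITY_THRESHOLD = 0.5


def weak_label_sentiment(sentiment_scores):
    """Return 1 (possibly fake) if engagement sentiment varies widely, else 0.

    sentiment_scores: per-engagement scores in [-1, 1]. Abstains (None)
    when there are fewer than two engagements.
    """
    if len(sentiment_scores) < 2:
        return None
    return 1 if pvariance(sentiment_scores) > SENTIMENT_VAR_THRESHOLD else 0


def weak_label_bias(publisher_bias):
    """Return 1 if the publisher's bias score in [-1, 1] is strongly partisan."""
    return 1 if abs(publisher_bias) > BIAS_THRESHOLD else 0


def weak_label_credibility(user_credibilities):
    """Return 1 if the engaging users have low mean credibility in [0, 1]."""
    if not user_credibilities:
        return None
    mean = sum(user_credibilities) / len(user_credibilities)
    return 1 if mean < CREDIBILITY_THRESHOLD else 0


def weak_labels(item):
    """Collect the three weak labels for one news item (a plain dict)."""
    return {
        "sentiment": weak_label_sentiment(item["sentiments"]),
        "bias": weak_label_bias(item["publisher_bias"]),
        "credibility": weak_label_credibility(item["user_credibilities"]),
    }


# Hypothetical news items with precomputed engagement signals.
contested = {"sentiments": [-0.9, 0.8, -0.7, 0.9],
             "publisher_bias": 0.8,
             "user_credibilities": [0.2, 0.3, 0.6]}
calm = {"sentiments": [0.5, 0.6],
        "publisher_bias": 0.1,
        "user_credibilities": [0.9, 0.8]}

contested_labels = weak_labels(contested)
calm_labels = weak_labels(calm)
```

Weak labels like these are noisy by construction, so they would typically be used as auxiliary training signals alongside a small set of clean labels rather than as ground truth.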

Key models presented:
- TriFN (Tri-relationship for Fake News): Models publisher bias and user credibility jointly; achieves 0.87 AUC on real datasets.
- dEFEND: Uses bidirectional LSTM with co-attention between news sentences and user comments to detect fake news with explanations.
- MWSS (Multiple sources of Weak Social Supervision): Jointly learns from multiple weak supervision sources (sentiment, bias, credibility) for early detection with minimal content.
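
To make the co-attention idea concrete, here is a simplified, parameter-free sketch of affinity-based co-attention between sentence and comment vectors. It is an assumption-laden stand-in: dEFEND learns its encoders and attention weights, whereas this sketch uses fixed toy vectors, a plain dot-product affinity, and mean pooling. The function names and inputs are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def co_attention(sentences, comments):
    """Return (sentence_weights, comment_weights) from a shared affinity matrix.

    affinity[i][j] = tanh(sentence_i . comment_j). Each sentence is scored by
    its mean affinity to all comments, and each comment by its mean affinity
    to all sentences; both score lists are normalized with a softmax.
    """
    affinity = [[math.tanh(dot(s, c)) for c in comments] for s in sentences]
    sentence_scores = [sum(row) / len(row) for row in affinity]
    comment_scores = [
        sum(affinity[i][j] for i in range(len(sentences))) / len(sentences)
        for j in range(len(comments))
    ]
    return softmax(sentence_scores), softmax(comment_scores)


# Hypothetical 2-dimensional embeddings for three sentences and two comments.
sentences = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
comments = [[1.0, 0.0], [0.9, 0.1]]
sentence_weights, comment_weights = co_attention(sentences, comments)
```

In this toy input the first sentence aligns most closely with both comments, so it receives the largest sentence weight. High-weight sentence and comment pairs are the kind of alignment an explainable detector would surface to the user.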

Visual and multimodal approaches: Exploit image features (statistical, content-based, neural CNN features) and video transcripts for detection.

Part III: Emerging Issues

Fake news literacy and semantic understanding.
Neural generation of fake news using GANs, GPT-2, and adversarial methods.
Incongruent news headlines (misleading titles paired with factual content).
YouTube information environment and comment toxicity.
Blockchain for immutable source verification and tamper-proof records.

Results

Key empirical findings:
- TriFN achieves 0.87 AUC on detecting disinformation using publisher bias and user credibility signals.
- dEFEND achieves ~0.9 F1-score on explainable fake news detection, with interpretable sentence-comment alignments.
- MWSS improves detection even with limited labeled data by exploiting multiple weak supervision sources simultaneously.
- Hierarchical propagation networks show structural differences between fake and real news, useful as early detection signals before widespread dissemination.
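
For readers less familiar with the two metrics reported above, the sketch below shows how AUC and F1 are computed for a binary fake/real classifier. The labels and scores are made-up toy values, not results from the paper, and the helper names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = fake): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = fake, 0 = real.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 decision threshold

toy_f1 = f1_score(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

AUC depends only on how the scores rank the items, while F1 depends on the chosen decision threshold, so the two numbers reported for different models are not directly comparable.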

Connections

Related to Zafarani's prior work on social bots and rumor propagation via shared focus on social dynamics.
Cites Zhou and Zafarani's survey on fake news detection and datasets (LIAR, FakeNewsNet).
Extends work on credibility and trust in social media by Castillo and others.
Precursor to later work on neural fake news generation and adversarial robustness.

Notes

Strengths:
- Exceptionally comprehensive and well-structured tutorial; balances conceptual clarity with technical depth.
- Practical emphasis on weak supervision addresses the real bottleneck of labeled data in production systems.
- Clear taxonomy of information disorder concepts (disinformation/misinformation/malinformation) helpful for framing research.
- Strong coverage of multimodal approaches and user behavior modeling, not just content features.

Limitations:
- Primarily focuses on English-language social media (Twitter); cross-lingual and non-Western platforms are underrepresented.
- Neural fake news generation (Part III) covers early models (SeqGAN, GPT-2); larger, more recent language models are only briefly mentioned.
- The blockchain defense section is speculative; practical deployment challenges are not deeply explored.
- Some results (e.g., TriFN) are evaluated on datasets from the authors' own prior work; external validation would strengthen the claims.

Impact: This tutorial has become a key reference in the fake news detection literature, widely cited for its integrated view of information disorder and weak supervision paradigm. The systematic treatment of user engagement and network effects was influential in shifting focus from purely content-centric approaches.