Weak supervision
Weak supervision is a machine learning paradigm that trains on noisy, incomplete, or indirect signals instead of requiring expensive, hand-annotated datasets. In fake news detection, these signals typically come from social media user behavior, engagement patterns, and network structure, which yield weak labels without human annotation effort.
Core concepts
Motivation: Manual annotation of misinformation is costly and time-consuming. Weak supervision converts readily available signals (user behavior, propagation patterns, metadata) into training examples, enabling scalable learning with minimal labeled data.
Sources of weak labels in misinformation:
- User engagement patterns: Sentiment variance (conflicting viewpoints), posting frequency, comment sentiment distribution
- User credibility: Inferred from user behavior clusters, historical accuracy, follower counts
- Publisher bias: Measured from historical publishing patterns across news sources
- Network structure: Hierarchical propagation patterns (retweets, replies, shares) that differ between fake and real news
- Interaction networks: Publisher-user-news tripartite graphs modeling relationships
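As a rough illustration of how two of these sources might be turned into numbers, the sketch below computes a sentiment-variance signal and a mean user-credibility signal for one news item. The `Engagement` record, its field names, and the score ranges are assumptions made for this example; they are not taken from the cited papers.

```python
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import List, Optional


@dataclass
class Engagement:
    """Hypothetical record of the social engagement around one news item."""
    news_id: str
    comment_sentiments: List[float]   # assumed: one score in [-1, 1] per comment
    sharer_credibility: List[float]   # assumed: one score in [0, 1] per sharing user


def sentiment_variance(e: Engagement) -> float:
    """Population variance of comment sentiment; 0.0 if there are no comments."""
    if not e.comment_sentiments:
        return 0.0
    return pvariance(e.comment_sentiments)


def mean_credibility(e: Engagement) -> Optional[float]:
    """Average credibility of sharing users; None if nobody shared the item."""
    if not e.sharer_credibility:
        return None
    return mean(e.sharer_credibility)


item = Engagement("n1", [0.9, -0.8, 0.7, -0.9], [0.2, 0.4])
print(sentiment_variance(item), mean_credibility(item))
```

A high variance here reflects strongly conflicting comments, which is the kind of engagement pattern the list above treats as a weak indicator. How sentiment and credibility scores are obtained in the first place is left open.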
Weak label generation: Instead of relying on hand-assigned binary {fake, real} labels, the approach extracts intermediate signals (e.g., "high sentiment variance" → weak fake signal) and uses them as noisy training constraints.
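One common way to make this concrete is to write each heuristic as a small labeling function that either votes or abstains, and then combine the votes. The sketch below uses a plain majority vote as a stand-in aggregation; it is not the method of the papers listed on this page. The thresholds, signal names, and function names are all illustrative assumptions.

```python
from typing import Callable, Dict, List

FAKE, REAL, ABSTAIN = 1, 0, -1

Signals = Dict[str, float]
LabelingFunction = Callable[[Signals], int]


def lf_sentiment_variance(s: Signals) -> int:
    # Assumed threshold: strongly conflicting comments hint at fake news.
    return FAKE if s["sentiment_variance"] > 0.5 else ABSTAIN


def lf_user_credibility(s: Signals) -> int:
    # Assumed thresholds on the mean credibility of sharing users.
    if s["mean_credibility"] < 0.3:
        return FAKE
    if s["mean_credibility"] > 0.7:
        return REAL
    return ABSTAIN


def lf_publisher_bias(s: Signals) -> int:
    # Assumed threshold: highly partisan publishers hint at fake news.
    return FAKE if s["publisher_bias"] > 0.8 else ABSTAIN


def weak_label(s: Signals, lfs: List[LabelingFunction]) -> int:
    """Majority vote over non-abstaining functions; ABSTAIN on ties or no votes."""
    votes = [v for v in (lf(s) for lf in lfs) if v != ABSTAIN]
    fake = sum(1 for v in votes if v == FAKE)
    real = len(votes) - fake
    if fake == real:
        return ABSTAIN
    return FAKE if fake > real else REAL


LFS = [lf_sentiment_variance, lf_user_credibility, lf_publisher_bias]
signals = {"sentiment_variance": 0.69, "mean_credibility": 0.25, "publisher_bias": 0.4}
print(weak_label(signals, LFS))
```

Items that end up with `ABSTAIN` would simply receive no weak label. A downstream classifier can then be trained on the weakly labeled items, ideally alongside whatever small set of clean labels is available.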
Key papers in this wiki
- Shu et al. (2020), "Mining Disinformation and Fake News": Introduces the weak social supervision (WSS) framework; presents TriFN (tri-relationship embedding for fake news), which models publisher bias and user credibility jointly; dEFEND for explainable detection; and MWSS for learning from multiple weak supervision sources. Demonstrates that WSS enables effective detection with minimal labeled data.
- Shu et al. (2019), "Beyond News Contents: The Role of Social Context for Fake News Detection": The TriFN model leverages publisher bias and user credibility as weak social supervision; achieves 0.87 AUC on real-world datasets using primarily engagement-based signals rather than article content.
- Shu et al. (2019), "dEFEND: Explainable Fake News Detection": Uses user comments as weak supervisory signals; bidirectional GRU encoders with co-attention between news sentences and comments identify which sentences and comments are most indicative of fake news, so detection comes with an explanation of what drove the prediction.
Related concepts
- Fake news detection — broader category; weak supervision is one detection paradigm
- Misinformation detection methods — alternative approaches (supervised learning, unsupervised, semi-supervised)
- Social context for detection — using network and engagement features