
Weak supervision

Weak supervision is a machine learning paradigm that uses noisy, incomplete, or indirect signals as training labels instead of requiring expensive, hand-annotated datasets. In fake news detection, these signals typically come from social media user behavior, engagement patterns, and network structure, producing weak labels without any human annotation effort.

Core concepts

Motivation: Manual annotation of misinformation is costly and time-consuming. Weak supervision converts readily available signals (user behavior, propagation patterns, metadata) into training examples, enabling scalable learning with minimal labeled data.

Sources of weak labels in misinformation:
- User engagement patterns: Sentiment variance (conflicting viewpoints), posting frequency, comment sentiment distribution
- User credibility: Inferred from user behavior clusters, historical accuracy, and follower counts
- Publisher bias: Measured from historical publishing patterns across news sources
- Network structure: Hierarchical propagation patterns (retweets, replies, shares) that differ between fake and real news
- Interaction networks: Publisher-user-news tripartite graphs modeling the relationships among these entities
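
As a minimal sketch of turning raw engagement data into two of the signals listed above (sentiment variance and propagation depth), the code below assumes each engagement record carries a news ID, a user ID, the ID of the user it was reshared from (or `None` for a direct share), and a precomputed sentiment score in [-1, 1]. The `Engagement` schema and the `sentiment_variance` and `cascade_depth` helpers are hypothetical names for illustration, not an API from any paper in this wiki.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Optional


@dataclass
class Engagement:
    """One user action on a news item (hypothetical schema)."""
    news_id: str
    user_id: str
    parent_user_id: Optional[str]  # user this was reshared from; None = direct share
    sentiment: float               # assumed precomputed score in [-1, 1]


def sentiment_variance(engagements: List[Engagement], news_id: str) -> float:
    """Population variance of engagement sentiment; high values suggest conflicting viewpoints."""
    scores = [e.sentiment for e in engagements if e.news_id == news_id]
    return pvariance(scores) if len(scores) > 1 else 0.0


def cascade_depth(engagements: List[Engagement], news_id: str) -> int:
    """Number of levels in the reshare tree, counting direct shares as level 1."""
    children = defaultdict(list)
    roots = []
    for e in engagements:
        if e.news_id != news_id:
            continue
        if e.parent_user_id is None:
            roots.append(e.user_id)
        else:
            children[e.parent_user_id].append(e.user_id)

    # Breadth-first traversal, one level at a time; `seen` guards against cycles in noisy data.
    # Reshares whose parent never appears in the data are unreachable and not counted.
    depth, frontier, seen = 0, roots, set(roots)
    while frontier:
        depth += 1
        next_level = []
        for user in frontier:
            for child in children[user]:
                if child not in seen:
                    seen.add(child)
                    next_level.append(child)
        frontier = next_level
    return depth


engagements = [
    Engagement("n1", "u1", None, 0.9),
    Engagement("n1", "u2", "u1", -0.8),
    Engagement("n1", "u3", "u2", 0.7),
    Engagement("n2", "u4", None, 0.5),
    Engagement("n2", "u5", None, 0.6),
]
print(sentiment_variance(engagements, "n1"), cascade_depth(engagements, "n1"))
print(sentiment_variance(engagements, "n2"), cascade_depth(engagements, "n2"))
```

Here `n1` has polarized comments spread through a three-level reshare chain, while `n2` has two agreeing direct shares. Real systems would compute many more features (posting frequency, credibility scores, tripartite-graph embeddings) in the same per-article fashion.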

Weak label generation: Rather than requiring binary {fake, real} labels, weak supervision extracts intermediate signals (e.g., "high sentiment variance" → weak fake signal) that serve as noisy training labels or constraints.
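
One common way to turn such intermediate signals into weak labels is to write small labeling functions that each vote FAKE, REAL, or ABSTAIN, then aggregate the votes. The sketch below uses a plain majority vote; this aggregation choice is an illustrative assumption, not necessarily the method used by the papers in this wiki. The feature names (`sentiment_variance`, `mean_user_credibility`, `publisher_bias`), the thresholds, and the vote directions for credibility and bias are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

FAKE, REAL, ABSTAIN = 1, 0, -1
Features = Dict[str, float]


# Hypothetical labeling functions. Feature names and thresholds are illustrative
# assumptions, not values taken from any paper in this wiki.
def lf_sentiment_variance(x: Features) -> int:
    # Conflicting viewpoints in the comments -> weak "fake" signal.
    return FAKE if x["sentiment_variance"] > 0.4 else ABSTAIN


def lf_user_credibility(x: Features) -> int:
    # Assumed direction: low-credibility audience -> fake, high-credibility -> real.
    if x["mean_user_credibility"] < 0.3:
        return FAKE
    if x["mean_user_credibility"] > 0.7:
        return REAL
    return ABSTAIN


def lf_publisher_bias(x: Features) -> int:
    # Assumed direction: strongly biased publishing history -> weak "fake" signal.
    return FAKE if x["publisher_bias"] > 0.8 else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[Features], int]] = [
    lf_sentiment_variance,
    lf_user_credibility,
    lf_publisher_bias,
]


def weak_label(x: Features, lfs: List[Callable[[Features], int]] = LABELING_FUNCTIONS) -> int:
    """Majority vote over non-abstaining labeling functions; ties or no votes -> ABSTAIN."""
    votes = [v for v in (lf(x) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting signals of equal strength
    return ranked[0][0]


articles = {
    "n1": {"sentiment_variance": 0.58, "mean_user_credibility": 0.2, "publisher_bias": 0.5},
    "n2": {"sentiment_variance": 0.01, "mean_user_credibility": 0.9, "publisher_bias": 0.1},
    "n3": {"sentiment_variance": 0.60, "mean_user_credibility": 0.9, "publisher_bias": 0.1},
}
weak_labels = {news_id: weak_label(x) for news_id, x in articles.items()}
print(weak_labels)
```

Article `n3` receives ABSTAIN because its signals conflict; such examples are typically dropped before training a classifier on the remaining weak labels. Majority vote treats every signal as equally reliable, whereas label-model approaches (for example, the one in the Snorkel framework) estimate each source's accuracy from its agreements and disagreements with the others.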

Key papers in this wiki