The Spread of True and False News Online

Authors: Soroush Vosoughi, Deb Roy, Sinan Aral
Venue: MIT Initiative on the Digital Economy (IDE) Research Brief, 2017
URL: ide.mit.edu

TL;DR

Analyzes ~126,000 fact-checked true and false rumor cascades on Twitter (2006–2017), spread by ~3 million users more than 4.5 million times. False news diffuses significantly farther, faster, deeper, and more broadly than the truth in all categories of information, and false political news spreads fastest. Humans, not bots, drive this differential spread; the perceived novelty of false news is offered as the leading explanation for why people preferentially share it.

Contributions

Large-scale empirical evidence of differential diffusion: falsehood reached 1,500 people ~6× faster than truth; reached depth 19 nearly 10× faster than truth reached depth 10
Comprehensive cascade metrics quantifying depth, size, breadth, and structural virality across verified true/false news
Fact-checked dataset labeled by six independent fact-checking organizations (Snopes, PolitiFact, FactCheck, TruthOrFiction, HoaxSlayer, UrbanLegends), which agreed on 95–98% of classifications
Human-vs.-bot analysis showing that bots accelerate true and false news at the same rate, so humans, not automated accounts, are responsible for the greater spread of false news
Novelty hypothesis: false news is perceived as more novel, and novel information is more likely to be retweeted, which is offered as an explanation for the disparity

Method

Data collection: Rumor cascades on Twitter from 2006 to 2017. A cascade is an unbroken retweet chain originating from a single original tweet; one rumor can give rise to several independent cascades. The dataset contains ~126,000 cascades spread by ~3 million users more than 4.5 million times.
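Because several cascades can belong to the same rumor, a first processing step is to group retweets by the original tweet they descend from. The sketch below assumes each tweet already carries a pointer to the tweet it retweeted (`parent_of`, a hypothetical input format); reconstructing those pointers from raw Twitter data is a separate problem not shown here.

```python
from collections import defaultdict


def group_into_cascades(parent_of):
    """Map each cascade root (an original tweet) to all tweets in its retweet chain.

    parent_of: dict tweet_id -> parent tweet_id, or None for an original tweet.
    (Hypothetical input format, assumed for illustration.)
    """
    def root_of(tweet_id):
        # Follow parent pointers until reaching a tweet with no parent.
        while parent_of[tweet_id] is not None:
            tweet_id = parent_of[tweet_id]
        return tweet_id

    cascades = defaultdict(list)
    for tweet_id in parent_of:
        cascades[root_of(tweet_id)].append(tweet_id)
    return dict(cascades)


# Toy example: "a" and "d" are two independent originals about the same rumor,
# so the rumor produces two separate cascades.
parent_of = {"a": None, "b": "a", "c": "b", "d": None, "e": "d"}
print(group_into_cascades(parent_of))  # {'a': ['a', 'b', 'c'], 'd': ['d', 'e']}
```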

Verification: Each cascade was labeled true, false, or mixed using verdicts from six independent fact-checking organizations, which agreed on 95–98% of classifications.
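The brief does not say how the 95–98% agreement figure was computed. One simple reading is pairwise percent agreement: for each pair of organizations, the share of jointly rated rumors that received the same label. The sketch below implements that assumed definition on made-up verdicts from hypothetical organizations "A", "B", and "C".

```python
from itertools import combinations


def pairwise_agreement(verdicts):
    """Percent agreement for each pair of fact-checkers on the rumors both rated.

    verdicts: dict org_name -> dict rumor_id -> label ("true", "false", "mixed").
    Assumes plain percent agreement; the study's exact measure is not stated in the brief.
    """
    result = {}
    for org_a, org_b in combinations(sorted(verdicts), 2):
        shared = verdicts[org_a].keys() & verdicts[org_b].keys()
        if not shared:
            continue  # no jointly rated rumors, so agreement is undefined for this pair
        agree = sum(verdicts[org_a][r] == verdicts[org_b][r] for r in shared)
        result[(org_a, org_b)] = agree / len(shared)
    return result


# Hypothetical verdicts for illustration only.
verdicts = {
    "A": {"r1": "false", "r2": "true", "r3": "false", "r4": "mixed"},
    "B": {"r1": "false", "r2": "true", "r3": "true"},
    "C": {"r2": "true", "r4": "mixed"},
}
print(pairwise_agreement(verdicts))
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic such as Fleiss' kappa would be a stricter alternative.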

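Cascade quantification: Four metrics characterize each cascade:
1. Depth: number of retweet hops from the origin tweet, tracked over time
2. Size: number of unique users involved in the cascade, tracked over time
3. Maximum breadth: largest number of users at any single depth
4. Structural virality: a measure that interpolates between content spread by a single large broadcast and content spread over many generations, with each user directly responsible for only a fraction of the total; computed as the average distance between all pairs of nodes in the cascade tree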
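The four metrics can all be computed from a single retweet tree. The sketch below assumes the tree is given as a parent map with the origin tweet at depth 0, and that each node is a distinct user, so size equals the node count. Structural virality is computed as the mean shortest-path distance over all pairs of nodes, treating the tree as undirected.

```python
from collections import Counter, defaultdict, deque


def cascade_metrics(parent_of):
    """Depth, size, maximum breadth and structural virality of one retweet tree.

    parent_of: dict node -> parent node, with None for the origin tweet.
    Assumes one node per unique user (illustrative simplification).
    """
    def depth(node):
        # Number of retweet hops back to the origin tweet.
        d = 0
        while parent_of[node] is not None:
            node = parent_of[node]
            d += 1
        return d

    depths = {node: depth(node) for node in parent_of}
    users_per_depth = Counter(depths.values())

    # Undirected adjacency so distances can run "up" and "down" the tree.
    adj = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            adj[child].append(parent)
            adj[parent].append(child)

    def distances_from(src):
        # Breadth-first search gives shortest-path lengths in an unweighted tree.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    n = len(parent_of)
    # Summing over ordered pairs and dividing by n(n-1) gives the mean over all pairs.
    total = sum(sum(distances_from(u).values()) for u in parent_of)
    virality = total / (n * (n - 1)) if n > 1 else 0.0

    return {
        "depth": max(depths.values()),
        "size": n,
        "max_breadth": max(users_per_depth.values()),
        "structural_virality": virality,
    }


# A broadcast (one tweet, four direct retweets) vs. a chain of the same size.
broadcast = {"r": None, "a": "r", "b": "r", "c": "r", "d": "r"}
chain = {"r": None, "a": "r", "b": "a", "c": "b", "d": "c"}
print(cascade_metrics(broadcast)["structural_virality"])  # 1.6
print(cascade_metrics(chain)["structural_virality"])      # 2.0
```

With the same five users, the multi-generation chain scores higher on structural virality than the single broadcast, matching the interpolation the metric is meant to capture. The all-pairs breadth-first search is quadratic in cascade size, which is acceptable for a sketch but not for very large cascades.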

Bot detection and removal: Automated accounts were identified with a bot-detection algorithm, and the analyses were run both with and without bot activity to isolate the bots' contribution.
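The with/without comparison can be sketched as running the same tally twice, once on all accounts and once after dropping accounts flagged as bots. Here `bot_score` stands in for the output of some bot classifier and `threshold=0.5` is an arbitrary cutoff; both are hypothetical, and the brief does not specify the detector's scores or threshold.

```python
def retweet_counts(events, bot_score, threshold=0.5, exclude_bots=False):
    """Count retweets of true vs. false rumors, optionally dropping likely bots.

    events: list of (account_id, veracity) pairs, veracity in {"true", "false"}.
    bot_score: dict account_id -> score in [0, 1] from a bot classifier (hypothetical).
    Accounts missing from bot_score are treated as human (an assumption).
    """
    counts = {"true": 0, "false": 0}
    for account, veracity in events:
        if exclude_bots and bot_score.get(account, 0.0) >= threshold:
            continue
        counts[veracity] += 1
    return counts


# Toy data: "b1" is a likely bot that retweets one true and one false rumor.
events = [("h1", "false"), ("h2", "false"), ("h3", "true"), ("b1", "false"), ("b1", "true")]
bot_score = {"h1": 0.1, "h2": 0.2, "h3": 0.05, "b1": 0.9}
print(retweet_counts(events, bot_score))                     # {'true': 2, 'false': 3}
print(retweet_counts(events, bot_score, exclude_bots=True))  # {'true': 1, 'false': 2}
```

In this toy case the bot adds one retweet to each side, so removing it leaves the true/false gap unchanged, the same pattern the study reports at scale.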

Results

Falsehood spreads more pervasively:
- Falsehood reached 1,500 people ~6× faster than truth
- Truth never diffused beyond cascade depth 10, while falsehood reached depth 19 nearly 10× faster than truth reached depth 10
- Falsehood reached more unique users than truth at every cascade depth
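Speed comparisons such as "reached 1,500 people ~6× faster" rest on a time-to-reach-size measure: the elapsed time from the origin tweet until a cascade first involves a given number of distinct users. A minimal sketch, assuming each cascade is a list of `(timestamp, user_id)` events that includes the origin tweet; the toy timestamps below are invented and use a target of 3 users rather than 1,500.

```python
def time_to_reach(events, n_users):
    """Elapsed time until a cascade first involves n_users distinct users.

    events: (timestamp, user_id) pairs, including the origin tweet.
    Returns None if the cascade never reaches n_users (or has no events).
    """
    if not events:
        return None
    events = sorted(events)
    start = events[0][0]
    seen = set()
    for ts, user in events:
        seen.add(user)
        if len(seen) >= n_users:
            return ts - start
    return None


# Invented toy cascades (timestamps in hours).
false_cascade = [(0, "u1"), (1, "u2"), (2, "u3")]
true_cascade = [(0, "v1"), (5, "v2"), (6, "v2"), (12, "v3")]  # v2 retweets twice
t_false = time_to_reach(false_cascade, 3)
t_true = time_to_reach(true_cascade, 3)
print(t_false, t_true, t_true / t_false)  # 2 12 6.0
```

Counting distinct users rather than raw events means repeat retweets by the same account do not inflate a cascade's apparent reach.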

Category-specific findings:
- False political news showed the most dramatic spread (depth, speed, breadth)
- Political, urban legend, and science news reached the most people overall
- Political and urban legend news spread fastest and were most viral

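Retweeting likelihood:
- Falsehoods were 70% more likely to be retweeted than truth, even after controlling for the original tweeter's account age, activity level, follower and followee counts, and verification status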
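The brief does not state the model behind the 70% figure. If it is read as an odds ratio from a logistic regression with a binary "false" indicator (an assumption), the arithmetic linking the coefficient, the odds ratio, and the change in probability looks like this. The 10% baseline retweet probability is purely illustrative.

```python
import math


def odds_ratio_from_coef(beta):
    """Odds ratio implied by a logistic-regression coefficient on a binary indicator."""
    return math.exp(beta)


def apply_odds_ratio(p_baseline, odds_ratio):
    """Probability obtained by multiplying the baseline odds by odds_ratio."""
    odds = p_baseline / (1 - p_baseline) * odds_ratio
    return odds / (1 + odds)


beta = math.log(1.7)                          # coefficient that would yield 70% higher odds
print(round(beta, 3))                         # 0.531
print(round(odds_ratio_from_coef(beta), 2))   # 1.7
print(round(apply_odds_ratio(0.10, 1.7), 3))  # 0.159
```

Under this reading, 70% higher odds corresponds to a somewhat smaller relative increase in probability (here from 0.10 to about 0.159); the two coincide only when the baseline probability is small.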

Human vs. bot responsibility:
- Bots accelerated true and false news at the same rate
- Humans, not bots, are predominantly responsible for the differential spread of falsehood

Novelty as explanatory mechanism:
- False news was perceived as significantly more novel than true news
- Twitter users were more likely to retweet novel information
- This is consistent with information theory (novel information is the most valuable for updating beliefs and making decisions) and social psychology (sharing novel information signals that one is "in the know")
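One way to score novelty is to compare a tweet's topic distribution with the distribution of topics a user has already seen: the larger the distance, the more novel the tweet. The sketch below assumes both are already available as probability vectors over the same topics (from a topic model not shown) and uses two standard distances, KL divergence and Bhattacharyya distance. The distributions are made up.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same topics.

    eps guards against log(p / 0) when q assigns zero mass to a topic.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def bhattacharyya_distance(p, q):
    """-ln of the Bhattacharyya coefficient; 0 when p == q, inf when disjoint."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.inf if bc == 0 else -math.log(bc)


# Made-up topic distributions over three topics.
prior_exposure = [0.7, 0.2, 0.1]   # what the user has recently seen
familiar_tweet = [0.6, 0.3, 0.1]
novel_tweet = [0.1, 0.1, 0.8]

for name, tweet in [("familiar", familiar_tweet), ("novel", novel_tweet)]:
    print(name,
          round(kl_divergence(tweet, prior_exposure), 3),
          round(bhattacharyya_distance(tweet, prior_exposure), 3))
```

Both distances rank the novel tweet as farther from prior exposure than the familiar one. KL divergence is asymmetric, so the argument order (tweet first, prior exposure second) is a design choice; Bhattacharyya distance is symmetric.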

Connections

Related to A Survey of Fake News which frames propagation-based detection as one of four perspectives
Foundational empirical work in propagation dynamics; later studies like Network-based Fake News Detection build on these cascade patterns
Directly addresses the human-behavior dimension that The Role of User Profiles for Fake News Detection investigates
Informs policy implications discussed in Fake News Early Detection: An Interdisciplinary Study

Notes

Strengths:
- Largest longitudinal study of misinformation diffusion at the time, with ~126,000 cascades labeled by multiple independent fact-checking organizations
- Separates human from bot behavior, challenging the common assumption that bots drive the spread of false news
- Clear finding: differences in network structure and user characteristics do not account for the gap, and novelty preference is offered as the explanation
- Implications for policy and intervention (behavioral vs. bot-centric approaches)
- Well-written research brief accessible to both practitioners and academics

Weaknesses:
- Limited to Twitter; generalizability to other platforms (Facebook, Reddit) is unclear
- Temporal scope (2006–2017) predates later platform and algorithmic-amplification changes
- Covers only rumors investigated by established fact-checking organizations; emerging and non-English rumors are not addressed
- Novelty is inferred computationally from tweet content, and the study does not establish that novelty causes retweeting
- Policy implications (e.g., labeling interventions) are suggested but not empirically tested in the brief

Follow-up opportunities:
- Cross-platform diffusion studies (Facebook, TikTok, Instagram)
- Interventions testing behavioral nudges vs. labels vs. friction
- Understanding how algorithmic amplification (post-2017) changes human-bot dynamics
- Multilingual and cross-cultural spread patterns
- Temporal evolution of false news: does novelty decay, and when does fact-checking reach users?