The spread of low-credibility content by social bots

Authors: Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, Filippo Menczer

Venue: arXiv, 2017 (arXiv:1707.07592)

TL;DR

An empirical analysis of 14 million Twitter messages posted during and after the 2016 U.S. election finds that social bots disproportionately amplify low-credibility content: although likely bots make up only 6% of the accounts sharing such content, they account for 31% of the tweets. Bots employ two strategies, rapid early amplification and targeting of influential users, that make them disproportionately effective at spreading such content.

Contributions

Large-scale quantitative analysis of bot-driven misinformation spread on Twitter, using the complete set of public tweets linking to the tracked articles rather than a sample
Evidence that accounts actively spreading low-credibility content are 5× more likely to be bots than the general population
Identification of two distinct bot manipulation strategies: early viral amplification and targeting of high-influence users
Network analysis showing that disconnecting likely bot accounts substantially reduces the diffusion of low-credibility content
Validation of robustness across different bot-score thresholds and source-selection criteria

Method

The authors analyzed 389,569 articles from 120 low-credibility news websites identified via consensus of fact-checking organizations (e.g., Snopes, FactCheck.org, PolitiFact). Using the Hoaxy platform, they tracked all 13,617,425 public tweets linking to these articles from mid-May 2016 through March 2017. Hoaxy collects tweets through Twitter's streaming API, filtering on links to the tracked sources, which yields complete coverage of the public tweets that match rather than a sample.

To classify accounts as bot or human, they used Botometer, a machine learning classifier trained on thousands of Twitter accounts that combines features from user metadata, network patterns, temporal activity, and sentiment analysis. Botometer produces a bot score between 0 and 1 (equivalently 0–100%) rather than a binary classification. The authors used a threshold of 0.5 (50%) to maximize accuracy, while acknowledging the inherent ambiguity in automated bot detection.
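The thresholding step, and the robustness check across thresholds mentioned under Contributions, can be sketched as follows. This is a minimal illustration, not the authors' code: the account names and scores are invented, the function names are hypothetical, and treating a score strictly above the threshold as "likely bot" is an assumption (the summary does not say how ties at exactly 0.5 are handled).

```python
from typing import Dict


def classify_accounts(bot_scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    """Label an account as a likely bot when its score is strictly above the threshold.

    The strict inequality is an assumption made for this sketch.
    """
    return {account: score > threshold for account, score in bot_scores.items()}


def likely_bot_fraction(bot_scores: Dict[str, float], threshold: float = 0.5) -> float:
    """Fraction of accounts labeled as likely bots at the given threshold."""
    labels = classify_accounts(bot_scores, threshold)
    return sum(labels.values()) / len(labels) if labels else 0.0


# Hypothetical bot scores on a 0-1 scale (not real Botometer output).
scores = {"alice": 0.12, "newsbot": 0.91, "bob": 0.48, "amplifier": 0.67}

# Sweep a few thresholds to see how sensitive the bot fraction is to the cutoff.
for threshold in (0.4, 0.5, 0.6, 0.7):
    print(threshold, likely_bot_fraction(scores, threshold))
```

Sweeping the cutoff like this is one simple way to check that a conclusion does not hinge on the particular value 0.5.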

For network analysis, they constructed directed retweet networks with 630,368 nodes (accounts) and 2,236,041 edges, weighting edges by the number of retweets between account pairs. Each account was assigned a bot score, and diffusion impact was measured by in-degree (retweet reception) and out-degree (retweet broadcast).
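A weighted, directed retweet network of this kind can be sketched with plain dictionaries. The edge direction used here (from the retweeted account to the retweeting account, following the flow of information) is an assumption, since the summary does not state it; the function names and the toy retweet list are likewise hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def build_retweet_network(retweets: Iterable[Tuple[str, str]]) -> Counter:
    """Build weighted edges from (retweeter, retweeted_account) pairs.

    Assumed convention: an edge points from the retweeted account to the
    retweeter, and its weight is the number of retweets between that pair.
    """
    edges: Counter = Counter()
    for retweeter, retweeted in retweets:
        edges[(retweeted, retweeter)] += 1
    return edges


def weighted_degrees(edges: Dict[Edge, int]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (weighted in-degree, weighted out-degree) per account."""
    in_deg: Dict[str, int] = defaultdict(int)
    out_deg: Dict[str, int] = defaultdict(int)
    for (source, target), weight in edges.items():
        out_deg[source] += weight
        in_deg[target] += weight
    return dict(in_deg), dict(out_deg)


# Toy data: bob retweets newsbot twice, carol retweets newsbot and bob once each.
retweets = [("bob", "newsbot"), ("bob", "newsbot"), ("carol", "newsbot"), ("carol", "bob")]
edges = build_retweet_network(retweets)
in_deg, out_deg = weighted_degrees(edges)
print(edges)
print(in_deg, out_deg)
```

Under the opposite edge convention the in- and out-degree roles simply swap, so the convention should be fixed before interpreting either measure.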

Results

Bot prevalence in low-credibility spread:

- Only 6% of accounts sharing low-credibility content score as likely bots, but they account for 31% of all tweets linking to such articles
- Accounts spreading low-credibility content are significantly more likely to be bots than the general population
- By contrast, the distribution of bot participation in fact-checking content is indistinguishable from that of low-credibility content
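The gap between the share of accounts (6%) and the share of tweets (31%) arises because a few accounts post far more than the rest. The sketch below computes both shares on invented data; the function name, the per-account tweet counts, and the choice to treat accounts without a score as non-bots are all assumptions for illustration.

```python
from typing import Dict, Tuple


def bot_shares(
    tweet_counts: Dict[str, int],
    bot_scores: Dict[str, float],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Return (share of accounts that are likely bots, share of tweets they posted).

    Accounts missing from bot_scores are treated as non-bots (an assumption).
    """
    if not tweet_counts:
        return 0.0, 0.0
    bots = {a for a in tweet_counts if bot_scores.get(a, 0.0) > threshold}
    total_tweets = sum(tweet_counts.values())
    account_share = len(bots) / len(tweet_counts)
    tweet_share = sum(tweet_counts[a] for a in bots) / total_tweets if total_tweets else 0.0
    return account_share, tweet_share


# Toy data: one hyperactive automated account among four.
tweet_counts = {"alice": 3, "bob": 2, "carol": 5, "newsbot": 40}
bot_scores = {"alice": 0.1, "bob": 0.3, "carol": 0.2, "newsbot": 0.9}

account_share, tweet_share = bot_shares(tweet_counts, bot_scores)
print(f"likely bots: {account_share:.0%} of accounts, {tweet_share:.0%} of tweets")
```

The toy numbers are deliberately exaggerated; they only show the mechanism, not the paper's measurements.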

Early amplification and targeting strategies: Bots engage in "super-spreader" activity, particularly in the critical first seconds after an article is published. Accounts in the top tercile of bot scores mention users who have 7.5–8 million followers on average, compared to 7 million for users mentioned by lower-score accounts, suggesting that bots preferentially target influential users through replies and mentions.
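The tercile comparison can be sketched as a simple grouped average. The input format (one record per mention, pairing the mentioning account's bot score with the mentioned user's follower count), the function name, and splitting terciles over mentions rather than over accounts are assumptions of this sketch; the numbers are invented.

```python
from statistics import mean
from typing import List, Tuple


def mean_followers_by_tercile(mentions: List[Tuple[float, int]]) -> List[float]:
    """Average follower count of mentioned users, per bot-score tercile.

    Each record is (bot score of the mentioning account, followers of the
    mentioned user). Records are sorted by bot score and cut into three
    groups of near-equal size; at least three records are required.
    """
    if len(mentions) < 3:
        raise ValueError("need at least three mention records")
    ordered = sorted(mentions, key=lambda record: record[0])
    n = len(ordered)
    cuts = [0, n // 3, 2 * n // 3, n]
    return [
        mean(followers for _, followers in ordered[cuts[i]:cuts[i + 1]])
        for i in range(3)
    ]


# Toy records, not real data.
mentions = [(0.1, 100), (0.2, 200), (0.3, 300), (0.5, 400), (0.8, 900), (0.9, 1100)]
low, middle, high = mean_followers_by_tercile(mentions)
print(low, middle, high)
```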

Network dismantling analysis: Removing accounts in decreasing order of bot score is an effective strategy for reducing the spread of low-credibility content. Disconnecting the accounts in the top 10% of bot scores reduces overall retweet volume by 43%, whereas removing the same number of randomly selected accounts has negligible impact. This indicates that targeting likely bots is far more effective than indiscriminate account removal.
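A dismantling experiment of this kind can be sketched by removing a set of accounts and measuring how much retweet weight survives. The edge list, scores, and function names below are hypothetical, and the toy network is constructed so that the highest-scoring account is also the hub, so the output illustrates the procedure rather than reproducing the reported 43% figure.

```python
import random
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def remaining_retweet_fraction(edges: Dict[Edge, int], removed: Set[str]) -> float:
    """Fraction of total retweet weight left after disconnecting the removed accounts."""
    total = sum(edges.values())
    if total == 0:
        return 0.0
    kept = sum(
        weight
        for (source, target), weight in edges.items()
        if source not in removed and target not in removed
    )
    return kept / total


def top_by_bot_score(bot_scores: Dict[str, float], k: int) -> Set[str]:
    """The k accounts with the highest bot scores."""
    return set(sorted(bot_scores, key=bot_scores.get, reverse=True)[:k])


def random_accounts(bot_scores: Dict[str, float], k: int, seed: int = 0) -> Set[str]:
    """k accounts chosen uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(bot_scores), k))


# Toy network: weights are retweet counts between account pairs.
edges = {("newsbot", "alice"): 5, ("newsbot", "bob"): 4, ("alice", "carol"): 1}
bot_scores = {"newsbot": 0.9, "alice": 0.1, "bob": 0.2, "carol": 0.3}

by_score = remaining_retweet_fraction(edges, top_by_bot_score(bot_scores, 1))
by_chance = remaining_retweet_fraction(edges, random_accounts(bot_scores, 1))
print("remaining after removing top bot score:", by_score)
print("remaining after removing a random account:", by_chance)
```

In a real analysis the random baseline would be averaged over many seeds, and the removal fraction would be swept rather than fixed.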

Content popularity: Successful low-credibility articles receive amplification comparable to fact-checking articles in total tweet volume, but the distribution is heavily skewed: a small number of articles go viral while most receive little or no sharing. Popular low-credibility sources are heavily supported by bots, whereas satire websites (e.g., The Onion) and fact-checking sites do not show the same level of bot support.

Connections

Related to Propagation Models via shared network-diffusion framework
Builds on Bot detection work, specifically using Botometer for account classification
Extends Misinformation and fake news detection research by focusing on amplification mechanism rather than content veracity
Compared against recent work (Vosoughi et al.) analyzing rumors on Twitter with different methodologies
Relevant to Social media manipulation and countermeasures via automated account detection

Notes

This paper provides robust empirical evidence for bot-driven amplification of misinformation, addressing prior work that lacked systematic data or relied on anecdotal evidence. The use of complete tweet corpora (not samples) is methodologically strong, and the results hold across different bot-score thresholds and source-selection criteria.

The findings have clear policy implications: curbing social bots may effectively mitigate the spread of low-credibility content. However, the paper notes a trade-off: bot detection at scale risks false positives that could suspend legitimate accounts. The authors also acknowledge limitations: Botometer is not perfectly accurate, and the distinction between automated and human-operated accounts is blurry. The focus on Twitter means the findings may not generalize to other platforms such as Facebook, where similar dynamics have been observed but not systematically studied.

The paper is cited extensively in subsequent bot-detection and misinformation work and contributed to increased attention on automated account detection as a mitigation strategy.