Disinformation Warfare: Understanding State-Sponsored Trolls on Twitter and Their Influence on the Web
Authors: Savvas Zannettou, Tristan Caulfield, Emiliano De Cristofaro, Michael Sirivianos, Gianluca Stringhini, Jeremy Blackburn
Venue: arXiv:1801.09288, 2018
TL;DR
A large-scale empirical study of 2.7K Russian state-sponsored troll accounts and 27K tweets (Jan 2016–Sep 2017) finds minimal direct influence on Twitter but significant content amplification through the RT news outlet. Trolls actively target political events, adopt multiple identities, concentrate geographically in USA/Germany/Eastern Europe, and exhibit sophisticated account behavior, including batch tweet deletion. Their influence flows primarily through Russian state-sponsored news (RT) rather than organic Twitter virality.
Contributions
- Ground-truth dataset: Leverages the US Congressional disclosure of 2.7K Russian troll accounts, enabling analysis of state-sponsored behavior on Twitter at scale (27K tweets)
- Multi-platform influence analysis: Quantifies troll influence on Twitter, Reddit, 4chan, and RT using Hawkes processes; finds that the Russian state-sponsored news outlet RT, rather than organic social media spread, is the primary amplification vector
- Temporal and geographic characterization: Documents activity patterns tied to real-world events (US election, Charlottesville protests, Republican National Convention); identifies geographic targeting concentrated in USA, Germany, and Eastern Europe
- Account behavior dynamics: Documents account creation patterns (71% before 2017), screen name changes, follower/friend evolution, client usage, and tweet deletion behavior
- Content dissemination analysis: Analyzes language distribution (61% English, 27% Russian), hashtag strategy, URL usage, mentions of political figures, and sentiment analysis of tweet content
Method
Data collection:
- Twitter IDs of 2.7K Russian troll accounts disclosed by the US Congress during the 2016 election investigation
- 27K tweets collected via the Twitter Streaming API (Jan 2016 – Sep 2017)
- Baseline: 1K random Twitter users with similar temporal coverage (96K tweets)
- Platforms analyzed: Twitter, Reddit, 4chan, RT (Russia Today)
- Analysis scope: Account creation dates, profile metadata, content, temporal patterns, and cross-platform dissemination
Features analyzed:
1. Temporal patterns: Hour-of-day and day-of-week distributions; peak activity periods
2. Account characteristics: Screen names (top words: news, info, Trump, politics), account creation dates, language distribution in profile descriptions
3. Client analysis: Device/application used to post (Web Client, TweetDeck, automated tools)
4. Geographic analysis: Self-reported location from user profiles (geo-coded to 178+ unique locations)
5. Content analysis: Tweet length, character/word distributions, hashtags, mentions, URLs (domains and expansion)
6. Sentiment analysis: Lexicon-based scoring of tweet sentiment with the Pattern library; comparison with random baseline users
7. Influence quantification: Hawkes processes modeling information cascades across platforms (Twitter, Reddit, /pol/, RT)
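The temporal-pattern feature above can be sketched as a pair of normalized histograms. This is a minimal illustration, not the authors' code: the function name `activity_distributions` and the sample timestamps are hypothetical, and the sketch assumes timezone-aware timestamps.

```python
from collections import Counter
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def activity_distributions(timestamps):
    """Return (hour_share, weekday_share): fractions of tweets per UTC hour and weekday.

    Assumes every timestamp is timezone-aware; each one is converted to UTC first.
    """
    utc = [ts.astimezone(timezone.utc) for ts in timestamps]
    n = len(utc)
    hours = Counter(ts.hour for ts in utc)
    days = Counter(WEEKDAYS[ts.weekday()] for ts in utc)
    hour_share = {h: hours[h] / n for h in range(24)}
    weekday_share = {d: days[d] / n for d in WEEKDAYS}
    return hour_share, weekday_share

# Hypothetical timestamps for illustration only; not data from the paper.
sample = [
    datetime(2016, 7, 18, 9, 5, tzinfo=timezone.utc),    # Monday
    datetime(2016, 7, 18, 9, 40, tzinfo=timezone.utc),   # Monday
    datetime(2016, 7, 19, 14, 0, tzinfo=timezone.utc),   # Tuesday
    datetime(2016, 7, 23, 22, 30, tzinfo=timezone.utc),  # Saturday
]
hour_share, weekday_share = activity_distributions(sample)
print(hour_share[9], weekday_share["Monday"])
```

Computing the same two distributions for the baseline users and plotting them side by side is one way to compare posting rhythms between the groups.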
Statistical methods:
- Cumulative Distribution Functions (CDF) for account metrics
- Hawkes processes for modeling temporal influence propagation
- Latent Dirichlet Allocation (10 topics) for topic extraction
- Kolmogorov-Smirnov tests for distribution comparisons
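To make the CDF and Kolmogorov-Smirnov items concrete, here is a small pure-Python sketch of an empirical CDF and the two-sample KS statistic (the largest vertical gap between two empirical CDFs). The function names and the toy samples are illustrative assumptions; the sketch computes only the statistic, not a p-value.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return F where F(x) is the fraction of sample values <= x."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        return bisect_right(xs, x) / n

    return F

def ks_statistic(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)|.

    Both ECDFs are right-continuous step functions that jump only at
    observed values, so checking every observed value is sufficient.
    """
    Fa, Fb = ecdf(a), ecdf(b)
    points = sorted(set(a) | set(b))
    return max(abs(Fa(x) - Fb(x)) for x in points)

# Toy samples for illustration only (e.g. tweet lengths of two groups).
group_a = [1, 2, 3, 4]
group_b = [3, 4, 5, 6]
print(ks_statistic(group_a, group_b))
```

In practice a library routine such as `scipy.stats.ks_2samp` would typically be used, since it also returns the p-value needed for a significance claim.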
Results
Temporal characteristics:
- Russian trolls active 8:00–15:00 UTC (suggesting Moscow timezone coordination); peak on Mondays–Wednesdays
- Account creation spike in July 2016 (Republican National Convention period); steady creation through 2017
Account behavior:
- 71% of accounts created before 2017; notable creation bursts around major events (July 2016 RNC, Aug 2017 Charlottesville)
- Screen names heavily weighted toward "news" terms (1.3%), suggesting impersonation of news sources
- 19% of troll accounts changed screen names during the observation period (up to 11 name changes per account)
Language and clients:
- 61% of tweets in English, 27% in Russian, 3.5% in German
- Primary client: Twitter Web Client (50.1%); mobile clients (22.6%); third-party (TweetDeck, IFTTT, Zapier)
- 65% of Russian trolls used a single client; 28% used two different clients
Geographic patterns:
- 261 unique self-reported locations geocoded; 75% of tweets from USA, Russia, and Eastern Europe
- Concentration in Moscow, St. Petersburg, Bern, and major US cities (NYC, LA, DC)
- Likely location spoofing to appear local and manipulate opinions in target regions
Content characteristics:
- Tweet length: Russian trolls post longer tweets than the baseline (higher mean character count)
- Hashtag usage: 32% of tweets; top hashtags #news, #politics, #POTUS, #ISIS, #MAGA; some controversial (#IslamKills, #BlackLivesMatter)
- URL usage: 35% of tweets contain URLs; extensive use of URL shorteners (bit.ly, tinyurl); top domains: news aggregators, political outlets, social networks
- Mentions: 46% of tweets include mentions, covering 8.5K unique users; heavy mention of politicians (Trump, Obama, Clinton)
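The content statistics above (share of tweets with hashtags or URLs, top hashtags, mentioned users, URL domains) can be derived from raw tweet text. The following is a rough sketch with simplified regular expressions and hypothetical example tweets; it is not the authors' pipeline, and it does not expand shortened URLs.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")

def content_features(tweets):
    """Count hashtags, mentions, and URL domains, plus per-tweet usage shares."""
    hashtags, mentions, domains = Counter(), Counter(), Counter()
    with_hashtag = with_url = 0
    for text in tweets:
        urls = URL_RE.findall(text)
        # Remove URLs first so '#fragment' or '@' inside a URL is not counted.
        stripped = URL_RE.sub(" ", text)
        tags = [t.lower() for t in HASHTAG_RE.findall(stripped)]
        hashtags.update(tags)
        mentions.update(m.lower() for m in MENTION_RE.findall(stripped))
        domains.update(urlparse(u).netloc.lower() for u in urls)
        with_hashtag += bool(tags)
        with_url += bool(urls)
    n = len(tweets)
    return {
        "hashtags": hashtags,
        "mentions": mentions,
        "domains": domains,
        "hashtag_share": with_hashtag / n,
        "url_share": with_url / n,
    }

# Hypothetical tweets for illustration only.
example_tweets = [
    "Breaking #news from @example_user http://bit.ly/abc",
    "#News and #politics today",
    "no entities here",
    "read https://example.com/story via @example_user",
]
features = content_features(example_tweets)
print(features["hashtags"].most_common(2), features["url_share"])
```

Lowercasing hashtags and handles before counting is a design choice made here so that `#News` and `#news` are tallied together.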
Sentiment analysis:
- 30% of Russian troll tweets have positive sentiment; 18% negative; 52% neutral
- Significant sentiment differences vs. baseline users (p < 0.01)
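For intuition about how a lexicon-based scorer arrives at positive/negative/neutral labels, here is a deliberately tiny sketch. The four-word lexicon, the averaging rule, and the 0.1 threshold are all assumptions made for illustration; they are not the Pattern library's lexicon or scoring rule.

```python
import re

# Hypothetical toy lexicon; real sentiment lexicons are far larger.
LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}

def polarity(text):
    """Average the scores of the lexicon words found in the text (0.0 if none)."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def label(text, threshold=0.1):
    """Map polarity to a label; the threshold is an arbitrary illustrative choice."""
    p = polarity(text)
    if p > threshold:
        return "positive"
    if p < -threshold:
        return "negative"
    return "neutral"

print(label("What a great day"))
print(label("Oh great, more delays"))  # sarcasm is still scored as positive
```

The second example shows why context-free scoring can mislabel sarcastic posts: the scorer sees only the word "great".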
Influence estimation (Hawkes process):
- Reddit: Russian trolls have higher influence (4.10% mean increase in event probability) than other news sources
- Twitter: Minimal direct influence (0.01% mean increase), except that news URLs show higher propagation
- RT news outlet: Dominant amplification vector; Russian state-sponsored news is more likely to resonate on all platforms
- 4chan's /pol/ board: Limited but notable troll presence; mean 3.27% influence
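As background for how a Hawkes process links events on one platform to later events on another, the sketch below evaluates the conditional intensity of a generic multivariate Hawkes process with an exponential kernel. This is a textbook continuous-time form chosen for illustration and may differ from the model fitted in the paper; the process names, parameter values, and function names are all hypothetical.

```python
import math

def intensity(t, k, events, mu, alpha, beta):
    """Rate of process k at time t.

    lambda_k(t) = mu[k] + sum over processes j and past events t_i < t of
                  alpha[j][k] * beta * exp(-beta * (t - t_i))

    mu[k] is the background rate; alpha[j][k] is the expected number of
    events on k directly triggered by one event on j (the kernel integrates
    to alpha[j][k]); beta controls how quickly the excitation decays.
    """
    rate = mu[k]
    for j, times in events.items():
        for ti in times:
            if ti < t:
                rate += alpha[j][k] * beta * math.exp(-beta * (t - ti))
    return rate

def expected_triggered(j, k, events, alpha):
    """Expected number of events on k directly caused by the events seen on j."""
    return alpha[j][k] * len(events[j])

# Hypothetical two-process example: one troll post at t = 0, nothing yet on Reddit.
events = {"trolls": [0.0], "reddit": []}
mu = {"trolls": 0.1, "reddit": 0.2}
alpha = {
    "trolls": {"trolls": 0.0, "reddit": 0.5},
    "reddit": {"trolls": 0.0, "reddit": 0.0},
}
print(intensity(1.0, "reddit", events, mu, alpha, beta=1.0))
print(expected_triggered("trolls", "reddit", events, alpha))
```

Under this form, influence percentages of the kind reported above can be read as the share of a destination platform's events attributed to a source process rather than to the background rate; the fitted weights themselves come from an inference procedure that this sketch does not cover.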
Tweet deletion patterns:
- 13% of Russian troll accounts deleted some tweets; median 9.7% of tweets deleted
- 27% of accounts deleted ≥1 tweet; deletion heavily concentrated in October 2016 (suggesting pre-election cleanup)
Account evolution:
- Follower increase: mean +2,065 followers during the observation period (baseline: +425)
- Friend/follower count accumulation suggests targeted network building and influence amplification
Connections
- Zannettou et al. (2018) — Who Let The Trolls Out: later comparative study of Russian and Iranian trolls across multiple platforms; this earlier paper focuses narrowly on Russian Twitter operations
- Coordinated inauthentic behavior topic: empirical characterization of troll account coordination tactics
- State-sponsored disinformation topic: documents Russian operational playbook and influence mechanisms
- Twitter platform analysis topic: reveals state-sponsored behavior patterns on mainstream social networks
- Information operations topic: maps specific tactics (identity spoofing, news outlet impersonation, geographic targeting)
Notes
Strengths:
- Ground-truth dataset at scale: 2.7K troll accounts from Congressional disclosure provides unambiguous attribution and substantial scale
- Temporal grounding: Activity patterns clearly linked to real-world political events (RNC, election, Charlottesville), strengthening causal interpretation
- Multi-platform scope: Extends beyond Twitter to Reddit, 4chan, and RT to quantify cross-platform amplification; identifies RT as primary influence vector rather than organic Twitter virality
- Sophisticated behavior documentation: Captures account evolution (screen name changes, follower growth), deletion patterns, and client usage as evidence of coordinated infrastructure
- Influence quantification methodology: Uses Hawkes processes (a standard tool for information cascade modeling) to estimate influence across platforms, an uncommon level of analytical rigor for this kind of study
Limitations & caveats:
- Temporal truncation: Data ends September 2017; misses post-suspension dynamics (most IRA accounts removed late 2018)
- Influence uncertainty: Hawkes process estimates represent statistical correlation (which platform pairs show timing relationships), not confirmed causality. Tweet deletion and account changes during observation make causal inference fragile
- Platform access restrictions: Twitter 1% streaming sample limits completeness; retweet counts may be artificially deflated compared to full API
- Attribution assumption: Congressional disclosure provides strong evidence but independent verification not shown for all accounts; potential for misclassification or actor-spoofing not discussed
- Sentiment and topic coarseness: LDA with 10 topics provides broad strokes; lexicon-based sentiment (no context) may misclassify sarcasm or double-meaning content typical of troll posts
- Baseline comparison fairness: Random Twitter users may differ in account age, verification status, and network position, so observed differences may reflect platform mechanics rather than coordinated troll sophistication
Significance:
This paper provides empirical evidence that Russian state-sponsored trolls operated at scale, coordinated behavior across platforms, and strategically targeted political content, yet their direct influence on Twitter was minimal. The key finding is that influence flowed through RT (the Russian state-sponsored news outlet), not through organic viral spread. This nuance is crucial: it suggests the troll ecosystem functions as a distribution layer for state media rather than as an autonomous propaganda engine. The paper is foundational for understanding how state-sponsored disinformation leverages official news outlets as amplification vectors.
The dataset and methodology (Hawkes processes for cross-platform influence) establish a template for future comparative studies of state-sponsored operations.