Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls
Authors: Savvas Zannettou, Tristan Caulfield, William Setzer, Michael Sirivianos, Gianluca Stringhini, Jeremy Blackburn
Venue: arXiv:1811.03130, 2018
TL;DR
Comparative analysis of 934 Russian and 770 Iranian state-sponsored trolls across Twitter and Reddit (10M posts) reveals distinct operational patterns: Russian trolls prioritized URL amplification (retweets, mentions) and pro-Trump messaging; Iranian trolls pursued regional and anti-Trump narratives motivated by real-world events. Both groups' behavior evolved over time with sophisticated platform-specific tactics, but Russian trolls were consistently more effective at spreading content across networks.
Contributions
- Ground-truth troll dataset: First study combining Twitter's October 2018 IRA disclosure with Reddit's discovery of Russian/Iranian accounts (944 total), enabling comparative analysis that is impossible with single-platform data.
- Multi-platform behavior characterization: Documents how troll operations adapted content, timing, and client usage differently across Twitter, Reddit, Gab, and 4chan; reveals platform-aware strategic shifts.
- Temporal dynamics: Identifies critical inflection points (e.g., 2016 U.S. election, Ukrainian conflict, Crimean referendum) showing real-time adaptation to geopolitical events.
- Influence measurement: Uses word embeddings, Hawkes processes, and URL mention networks to quantify troll influence; Russian trolls achieved 2-3x higher URL mention rates than Iranian trolls on Twitter.
- Ideological profiling: Demonstrates Russian trolls cluster around pro-Trump, divisive hashtags (#MAGA, #TrumpRally); Iranian trolls cluster around anti-Trump, regional conflict topics (#Iran, #FreePalestine, #SaveYemen).
Method
Data collection:
- Twitter: 934 Russian trolls, 770 Iranian trolls identified by Twitter's October 2018 IRA disclosure and Reddit's April 2018 identification of state-sponsored accounts
- Reddit: 944 accounts (Russian and Iranian combined)
- Gab: State-sponsored accounts identified via prior research and manual verification
- 4chan: Politically Incorrect board (/pol/) posts (limited dataset, ~17K posts)
- Temporal span: 2012–2018 for Twitter; varies by platform for other communities
- Total posts/tweets analyzed: 10M+
Feature analysis:
1. Account characteristics: Follower/friend ratios, account creation dates, profile descriptions, language diversity
2. Temporal patterns: Hour of day/week, tweet volume trends, seasonal variations, activity clustering around events
3. Language analysis: Word embeddings (word2vec), language composition, linguistic markers of ideological positioning
4. Client usage: Extraction of Twitter clients (Web, mobile, automated tools) as proxy for coordination sophistication
5. Content analysis: Hashtag networks (visualization and community detection), URL domains, topic modeling (Latent Dirichlet Allocation)
6. Influence estimation: Mention/retweet networks, Hawkes process modeling of information cascade timing and magnitude
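The hashtag networks in item 5 start from a co-occurrence graph: two hashtags are linked whenever they appear in the same post. The sketch below is one plausible way to build those weighted edges, not the authors' pipeline; the function name `cooccurrence_edges`, the regular expression, and the sample tweets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import re

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")

def cooccurrence_edges(tweets):
    """Count how often each unordered pair of hashtags shares a tweet."""
    edges = Counter()
    for text in tweets:
        # Lowercase and de-duplicate so "#MAGA #maga" is one node, not a pair.
        tags = sorted({t.lower() for t in HASHTAG_RE.findall(text)})
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1
    return edges

# Toy input, invented for illustration only.
sample = [
    "Rally tonight #MAGA #TrumpRally",
    "#maga #TrumpRally #news",
    "Stand with them #FreePalestine #SaveYemen",
]
edges = cooccurrence_edges(sample)
```

The resulting weighted edge list is the kind of input a community-detection routine such as Louvain would take; that step is omitted here because it needs a third-party graph library.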
Statistical methods:
- Cumulative Distribution Functions (CDF) for follower counts, account age
- Community detection (Louvain algorithm) on hashtag co-occurrence graphs
- word2vec embeddings (100-dimensional) for semantic word similarity
- Hawkes processes to model temporal clustering and influence decay
- Latent Dirichlet Allocation (10 topics) for semantic topic extraction
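A Hawkes process is self-exciting: each event temporarily raises the rate of further events, and that boost decays over time. The sketch below evaluates the conditional intensity of a univariate Hawkes process with an exponential kernel. Both the univariate form and the exponential kernel are simplifying assumptions, since these notes do not state which variant the paper fits, and the parameter names `mu`, `alpha`, `beta` and the event times are hypothetical.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Intensity at time t: mu + sum over past events of alpha * exp(-beta * (t - t_i)).

    mu    -- background rate (events that occur without any trigger)
    alpha -- jump in intensity caused by each past event
    beta  -- decay rate of that jump
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i))
        for t_i in event_times
        if t_i < t  # only events strictly before t contribute
    )
    return mu + excitation

# Hypothetical event times (e.g., hours at which a URL was posted).
events = [1.0, 1.5, 4.0]
mu, alpha, beta = 0.2, 0.8, 1.0

before_any = hawkes_intensity(0.5, events, mu, alpha, beta)   # background only
after_burst = hawkes_intensity(2.0, events, mu, alpha, beta)  # two recent events
decayed = hawkes_intensity(3.5, events, mu, alpha, beta)      # same events, later
```

With this kernel each event triggers on average `alpha / beta` further events, so a ratio below 1 keeps the process stable. Measuring influence between groups or platforms would need a multivariate version with one such weight per source-target pair.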
Results
Account characteristics:
- Russian trolls: Majority created 2014–2016 (Ukrainian conflict period); pro-Trump orientation evident in profile descriptions.
- Iranian trolls: More even distribution across years; regional/anti-U.S. focus.
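Statements such as "majority created 2014–2016" are read off a cumulative distribution of account creation dates. A minimal empirical CDF, written under the assumption that creation dates are reduced to years, could look like this; `ecdf` and the sample years are hypothetical and do not come from the paper's data.

```python
from bisect import bisect_right

def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("ecdf needs at least one value")

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F

# Hypothetical account-creation years, invented for illustration.
creation_years = [2013, 2014, 2014, 2015, 2015, 2015, 2016, 2017]
F = ecdf(creation_years)

# Share of accounts created in 2014-2016 inclusive.
share_2014_2016 = F(2016) - F(2013)
```

The same helper applies unchanged to follower counts or account age in days.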
Temporal analysis:
- Russian trolls on Twitter: Active throughout day with slight dip on Sunday; coordinated activity spikes around major political events (2016 election, Trump presidency announcement)
- Iranian trolls: Less activity in early hours (UTC); more concentrated in late evening
- Both groups: Substantial ramp-up in engagement during the 2016 U.S. election period; heavy Russian activity on Reddit from January 2015 onwards
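The hour-of-day and day-of-week patterns above come from bucketing post timestamps. The sketch below assumes timestamps arrive as ISO 8601 strings with an explicit UTC offset; `activity_profile` and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

def activity_profile(iso_timestamps):
    """Count posts per UTC hour (0-23) and per weekday (Monday=0 ... Sunday=6)."""
    by_hour = Counter()
    by_weekday = Counter()
    for ts in iso_timestamps:
        # Normalize to UTC so accounts in different time zones are comparable.
        dt = datetime.fromisoformat(ts).astimezone(timezone.utc)
        by_hour[dt.hour] += 1
        by_weekday[dt.weekday()] += 1
    return by_hour, by_weekday

# Invented timestamps around the 2016 U.S. election, for illustration only.
sample = [
    "2016-11-08T14:30:00+00:00",
    "2016-11-08T14:45:00+00:00",
    "2016-11-09T02:10:00+00:00",
    "2016-11-06T21:00:00+00:00",
]
by_hour, by_weekday = activity_profile(sample)
```

Dividing each count by the total gives the normalized profile that makes two groups with different posting volumes comparable.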
Language and client usage:
- Russian trolls: Predominantly Russian (53% of tweets), English (36%), German (9%), French (8%); primary client "Twitter Web Client" (28.5%); heavy use of automated posting tools
- Iranian trolls: More multilingual (English, Arabic, Turkish, Farsi); initially used Facebook's "Share" button (before 2015); shifted to "Twitter Web Client" by 2016
- Geographic indicators: Geolocated Russian tweets came mainly from the USA (29%) and Russia (34%), concentrated in major cities; geolocated Iranian tweets came from the USA (8%), France (26%), and Brazil (9%), suggesting targeting of foreign audiences, particularly French speakers, regarding the Iran nuclear deal negotiations
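A shift in client usage, such as the move from Facebook's share button to the Twitter Web Client, shows up when client shares are computed per year. The sketch below assumes each tweet is reduced to a `(year, client)` pair; the function name and the records are hypothetical.

```python
from collections import Counter, defaultdict

def client_share_by_year(records):
    """Map year -> {client: fraction of that year's tweets sent from it}."""
    counts = defaultdict(Counter)
    for year, client in records:
        counts[year][client] += 1
    shares = {}
    for year, ctr in counts.items():
        total = sum(ctr.values())
        shares[year] = {client: n / total for client, n in ctr.items()}
    return shares

# Invented records, for illustration only.
records = [
    (2014, "Facebook"),
    (2014, "Facebook"),
    (2014, "Twitter Web Client"),
    (2016, "Twitter Web Client"),
    (2016, "Twitter Web Client"),
]
shares = client_share_by_year(records)
```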
Content analysis:
Hashtags:
- Russian trolls: A central general-audience cluster of sports/news hashtags (#news, #politics); a pro-Trump cluster (#MAGA, #TrumpRally, #MakeAmericaGreat); co-opted Black Lives Matter hashtags (#BlackLivesMatter, #BLM)
- Iranian trolls: Regional topics dominate (#Iran, #FreePalestine, #SaveYemen, nuclear deal); anti-Trump messaging secondary to regional geopolitical focus
URLs and Subreddits:
- Russian Twitter: 5.4% of tweets included URLs (highest among analyzed platforms); top domains: news aggregators, political outlets
- Russian Reddit: r/uncen (11% of posts), dedicated to state-sponsored narrative control; other subreddits, such as r/russian_ira, exploited for cryptocurrency and election-manipulation content
- Iranian Twitter/Reddit: Limited URL sharing; localized domains (jordan-times.com, irandaily.com)
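Figures like the share of posts containing URLs and the list of top domains can be derived from raw post text. The following sketch shows one way to do that with the standard library; `url_stats`, the URL pattern, and the example posts (using placeholder `example.com` / `example.org` addresses) are assumptions for illustration.

```python
from collections import Counter
from urllib.parse import urlparse
import re

# Assumed URL pattern: http(s) scheme followed by non-whitespace characters.
URL_RE = re.compile(r"https?://\S+")

def url_stats(posts):
    """Return (fraction of posts containing a URL, Counter of linked domains)."""
    domains = Counter()
    with_url = 0
    for text in posts:
        urls = URL_RE.findall(text)
        if urls:
            with_url += 1
        for url in urls:
            host = urlparse(url).netloc.lower()
            # Treat "www.example.com" and "example.com" as the same domain.
            if host.startswith("www."):
                host = host[4:]
            domains[host] += 1
    rate = with_url / len(posts) if posts else 0.0
    return rate, domains

# Invented posts with placeholder domains, for illustration only.
posts = [
    "Read this https://www.example.com/a",
    "no link here",
    "two links http://example.org/x and https://example.com/b",
    "plain text",
]
rate, domains = url_stats(posts)
```

Shortened links (for example `t.co`) would have to be expanded first, otherwise the shortener itself dominates the domain counts.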
Influence estimation:
Hawkes process analysis of mention/retweet cascades:
- Russian trolls: Tweets mentioning URLs reached a mean of 500–1,000 retweets within cascade windows; influence multiplier 1.5–2.0x human baselines
- Iranian trolls: Tweets reached 100–500 retweets; less efficient at initiating cascades
- URL domain reach: Russian-shared domains appeared in 5.4% of all tweets on Twitter; Iranian domains <1%
Behavior evolution:
Russian trolls: 2014–2015 initial broad campaign; 2016 shift toward election influence; 2017–2018 tactical adaptation (cryptocurrency and more diverse narrative themes)
Iranian trolls: 2016 emergence; steady growth; 2017–2018 campaign refinement around Middle Eastern conflicts (Saudi Arabia, Yemen, Palestine)
Connections
- Linvill & Warren (2020) — Troll Factories: complementary organizational analysis of Russian IRA structure; this paper reveals behavioral patterns across platforms
- Bail et al. (2020) — Assessing Russian IRA Impact: causal assessment of IRA's political effects; this paper documents the mechanisms of content distribution and influence amplification
- Golovchenko et al. (2020) — Cross-Platform State Propaganda: examines Russian propaganda across Twitter and YouTube; this paper extends the platform portfolio to include Reddit and Gab
- Lukito (2019) — Multi-Platform Disinformation: studies coordinated campaigns; this paper adds comparative analysis of competing state actors (Russia vs. Iran)
- Coordinated inauthentic behavior topic: foundational empirical characterization of state-sponsored coordination
- State-sponsored information operations topic: maps Russian and Iranian tactics across multiple platforms
Notes
Strengths:
- Novel comparative framing: First side-by-side analysis of Russian and Iranian operations, enabling hypothesis generation about state strategic priorities (Russia: U.S. election/polarization; Iran: regional geopolitics/anti-U.S. messaging)
- Multi-platform scope: Extends beyond Twitter to Reddit, Gab, and 4chan; the platform-specific adaptation it reveals (e.g., different client usage, domain focus) suggests actors learned and optimized per-platform strategies
- Temporal grounding: Connecting activity spikes to real-world events (Crimea, Ukrainian conflict, 2016 election) strengthens causal interpretation of why troll campaigns intensified
- Diverse analysis methods: Combines network analysis (hashtag/URL graphs), NLP (word embeddings, LDA), time series (Hawkes), and qualitative content review—triangulation across methods
- Ground truth at scale: 10M posts from verified state-sponsored accounts (not heuristic-detected bots); enables precision unavailable in prior bot-detection studies
Limitations & caveats:
- Platform bias: Analysis is restricted to platforms that disclosed accounts (Twitter via its October 2018 disclosure; Reddit via its own identification of accounts); activity on VK, Telegram, or closed platforms is unmeasured. Findings may not generalize to Russian/Iranian operations on domestic platforms, where they may be more active
- Temporal truncation: Data is predominantly from 2012–2018, so the study misses post-2018 operational evolution, particularly after platform suspensions (IRA accounts mostly removed late 2018)
- Attribution uncertainty: While accounts were disclosed or identified by the platforms, the link to the Russian/Iranian governments is assumed rather than independently verified; the possibility of misattribution or actor-spoofing is not discussed
- Influence vs. reach: Study measures retweet/mention counts and URL propagation as proxies for "influence" but cannot assess persuasion (cf. Bail et al. 2020 for causal effects). High reach ≠ high impact
- Geographic bias: Over-sampled U.S. and Western audiences (focus on Twitter, English hashtags); may underestimate domestic influence in Russia, Iran, or allied regions
- Mechanism opacity: Identifies what trolls did (content, timing, platforms) and where they had reach, but not why certain campaigns succeeded (the roles of algorithms, network homophily, and incumbent amplification remain unclear)
Significance:
This paper is foundational for empirical characterization of state-sponsored troll operations at scale. It demonstrates that:
- State actors employ sophisticated, platform-aware, real-time responsive strategies—not rote mass spamming
- Different states prioritize different geographic and ideological targets, aligned with geopolitical interests
- Operational effectiveness varies dramatically by platform and content type (Russians excel at URL amplification; Iranians at narrative-building in niche communities)
- Troll influence operates through legitimacy imitation (appearing as authentic news sources, hashtag exploitation) and cascade mechanics, not just follower inflation
The comparative lens and multi-platform scope make this a key reference for understanding state-sponsored disinformation operations (distinct from effects, which are debated; see Bail et al. 2020 for null findings on polarization).
The dataset and code are released for reproducibility, enabling follow-up studies on troll detection, bot-human interaction patterns, and downstream influence on genuine users—a model for responsible disclosure in this sensitive domain.