Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls

Authors: Savvas Zannettou, Tristan Caulfield, William Setzer, Michael Sirivianos, Gianluca Stringhini, Jeremy Blackburn

Venue: arXiv:1811.03130, 2018

TL;DR

Comparative analysis of 934 Russian and 770 Iranian state-sponsored trolls across Twitter and Reddit (10M posts) reveals distinct operational patterns: Russian trolls prioritized URL amplification (retweets, mentions) and pro-Trump messaging; Iranian trolls pursued regional and anti-Trump narratives motivated by real-world events. Both groups' behavior evolved over time with sophisticated platform-specific tactics, but Russian trolls were consistently more effective at spreading content across networks.

Contributions

Ground-truth troll dataset: First study combining Twitter's October 2018 IRA disclosure with Reddit's identification of Russian and Iranian accounts (944 Reddit accounts in total), enabling comparative analysis that is impossible with single-platform data.
Multi-platform behavior characterization: Documents how troll operations adapted content, timing, and client usage differently across Twitter, Reddit, Gab, and 4chan; reveals platform-aware strategic shifts.
Temporal dynamics: Identifies critical inflection points (e.g., 2016 U.S. election, Ukrainian conflict, Crimean referendum) showing real-time adaptation to geopolitical events.
Influence measurement: Uses word embeddings, Hawkes processes, and URL mention networks to quantify troll influence; Russian trolls achieved 2-3x higher URL mention rates than Iranian trolls on Twitter.
Ideological profiling: Demonstrates Russian trolls cluster around pro-Trump, divisive hashtags (#MAGA, #TrumpRally); Iranian trolls cluster around anti-Trump, regional conflict topics (#Iran, #FreePalestine, #SaveYemen).

Method

Data collection:

- Twitter: 934 Russian trolls and 770 Iranian trolls, taken from Twitter's October 2018 disclosure
- Reddit: 944 accounts (Russian and Iranian combined), taken from Reddit's April 2018 identification of state-sponsored accounts
- Gab: State-sponsored accounts identified via prior research and manual verification
- 4chan: Politically Incorrect board (/pol/) posts (limited dataset, ~17K posts)
- Temporal span: 2012–2018 for Twitter; varies by platform for the other communities
- Total posts/tweets analyzed: 10M+

Feature analysis:

1. Account characteristics: Follower/friend ratios, account creation dates, profile descriptions, language diversity
2. Temporal patterns: Hour of day/week, tweet volume trends, seasonal variations, activity clustering around events
3. Language analysis: Word embeddings (word2vec), language composition, linguistic markers of ideological positioning
4. Client usage: Extraction of Twitter clients (Web, mobile, automated tools) as proxy for coordination sophistication
5. Content analysis: Hashtag networks (visualization and community detection), URL domains, topic modeling (Latent Dirichlet Allocation)
6. Influence estimation: Mention/retweet networks, Hawkes process modeling of information cascade timing and magnitude

Statistical methods:

- Cumulative Distribution Functions (CDF) for follower counts, account age
- Community detection (Louvain algorithm) on hashtag co-occurrence graphs
- word2vec embeddings (100-dimensional) for semantic word similarity
- Hawkes processes to model temporal clustering and influence decay
- Latent Dirichlet Allocation (10 topics) for semantic topic extraction
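As an illustration of the CDF step listed above, the following is a minimal sketch of an empirical CDF over follower counts. The function name `ecdf` and the sample counts are assumptions made for this example; they are not taken from the paper.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("ecdf needs at least one value")

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return F


# Hypothetical follower counts for six accounts (illustrative only)
followers = [3, 10, 10, 250, 1200, 40000]
F = ecdf(followers)
print(F(10))  # 0.5: half of the sample accounts have at most 10 followers
```

Plotting `F` for each group on the same axes gives the kind of side-by-side comparison that a CDF analysis of follower counts or account age relies on.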

Results

Account characteristics:

Russian trolls: Majority created 2014–2016 (Ukrainian conflict period); pro-Trump orientation evident in profile descriptions
Iranian trolls: More even distribution across years; regional/anti-U.S. focus

Temporal analysis:

Russian trolls on Twitter: Active throughout day with slight dip on Sunday; coordinated activity spikes around major political events (2016 election, Trump presidency announcement)
Iranian trolls: Less activity in early hours (UTC); more concentrated in late evening
Both groups: Substantial engagement ramp during 2016 U.S. election period; Russian activity detected heavily on Reddit Jan 2015 onwards
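The hour-of-day patterns described above can be derived from post timestamps. The sketch below assumes ISO-8601 timestamp strings and treats timestamps without an offset as UTC; the function name `activity_by_hour` and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def activity_by_hour(timestamps):
    """Share of posts per UTC hour (0-23) from ISO-8601 strings."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        if dt.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC
            dt = dt.replace(tzinfo=timezone.utc)
        counts[dt.astimezone(timezone.utc).hour] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hour: counts[hour] / total for hour in range(24)}


# Hypothetical timestamps (illustrative only)
sample = [
    "2016-11-08T21:15:00+00:00",
    "2016-11-08T23:30:00+02:00",  # 21:30 UTC
    "2016-11-09T03:05:00",        # no offset, assumed UTC
]
shares = activity_by_hour(sample)
```

Normalizing to shares rather than raw counts makes groups with very different posting volumes comparable.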

Language and client usage:

Russian trolls: Predominantly Russian (53% of tweets), English (36%), German (9%), French (8%); primary client "Twitter Web Client" (28.5%); heavy use of automated posting tools
Iranian trolls: More multilingual (English, Arabic, Turkish, Farsi); initially used Facebook's "Share" button (before 2015); shifted to "Twitter Web Client" by 2016
Geographic indicators: Geolocated Russian tweets came from the USA (29%) and Russia (34%), concentrated in major cities; geolocated Iranian tweets came from the USA (8%), France (26%), and Brazil (9%), suggesting targeting of foreign audiences, particularly French speakers during the Iran nuclear deal negotiations
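A client breakdown like the one above amounts to counting the client field of each tweet. This is a minimal sketch that assumes each tweet is a dict with a `source` key; the key name, the function `client_shares`, and the client name "ExampleScheduler" are illustrative assumptions.

```python
from collections import Counter


def client_shares(tweets, key="source"):
    """Return (client, share) pairs sorted from most to least used."""
    counts = Counter(t.get(key, "unknown") for t in tweets)
    total = sum(counts.values())
    return [(client, n / total) for client, n in counts.most_common()]


# Hypothetical tweets (illustrative only)
tweets = [
    {"source": "Twitter Web Client"},
    {"source": "Twitter Web Client"},
    {"source": "ExampleScheduler"},  # made-up automation tool
    {},                              # missing field counted as "unknown"
]
shares = client_shares(tweets)
print(shares[0])  # ('Twitter Web Client', 0.5)
```

Running the same count per year would expose shifts such as the move between clients over time that the summary reports.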

Content analysis:

Hashtags:

- Russian trolls: Sports and news hashtags (#news, #politics) form a general-audience core; a pro-Trump cluster (#MAGA, #TrumpRally, #MakeAmericaGreat); co-opted Black Lives Matter hashtags (#BlackLivesMatter, #BLM)
- Iranian trolls: Regional topics dominate (#Iran, #FreePalestine, #SaveYemen, nuclear deal); anti-Trump messaging is secondary to the regional geopolitical focus
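Hashtag clusters of this kind come from a co-occurrence graph: two hashtags are linked whenever they appear in the same post, weighted by how often that happens. The sketch below builds only the weighted edge list; the regular expression, the function name `cooccurrence_edges`, and the sample posts are assumptions, and a community-detection step such as Louvain would then run on the resulting graph.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


def cooccurrence_edges(posts):
    """Count how often each unordered pair of hashtags shares a post."""
    edges = Counter()
    for text in posts:
        # Lowercase and de-duplicate so #MAGA and #maga are one node
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical posts (illustrative only)
posts = [
    "#MAGA rally tonight #TrumpRally",
    "#maga #TrumpRally #news",
    "#Iran #SaveYemen",
]
edges = cooccurrence_edges(posts)
print(edges[("maga", "trumprally")])  # 2
```

Sorting the tags before pairing keeps each edge key in a canonical order, so the same pair is never counted under two different keys.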

URLs and Subreddits:

- Russian Twitter: 5.4% of tweets contained URLs (highest among analyzed platforms); top domains: news aggregators, political outlets
- Russian Reddit: r/uncen (11% of posts), dedicated to state-sponsored narrative control; r/uncen and r/russian_ira were also exploited for cryptocurrency and election-manipulation content
- Iranian Twitter/Reddit: Limited URL sharing; localized domains (jordan-times.com, irandaily.com)
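Figures such as the share of posts containing URLs and the most-shared domains can be computed directly from post text. This is a rough sketch, assuming plain-text posts with unshortened links; the URL pattern, the function name `url_stats`, and the sample posts are illustrative.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL = re.compile(r"https?://\S+")


def url_stats(posts):
    """Return (share of posts with a URL, Counter of linked domains)."""
    domains = Counter()
    with_url = 0
    for text in posts:
        urls = URL.findall(text)
        if urls:
            with_url += 1
        for url in urls:
            host = urlparse(url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]  # treat www.example.com as example.com
            if host:
                domains[host] += 1
    rate = with_url / len(posts) if posts else 0.0
    return rate, domains


# Hypothetical posts (illustrative only)
posts = [
    "Read this https://www.example.com/a",
    "no link here",
    "http://example.org/x and https://example.com/b",
    "plain text",
]
rate, domains = url_stats(posts)
print(rate)  # 0.5
```

Real troll data typically contains shortened links, so a production pipeline would need to resolve those before counting domains; that step is omitted here.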

Influence estimation:

Hawkes process analysis of mention/retweet cascades:

- Russian trolls: Tweets mentioning URLs reached a mean of 500–1,000 retweets within cascade windows; influence multiplier of 1.5–2.0x over human baselines
- Iranian trolls: Tweets reached 100–500 retweets; less efficient at initiating cascades
- URL domain reach: Russian-shared domains appeared in 5.4% of all tweets on Twitter; Iranian domains <1%
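To make the Hawkes modeling concrete, here is a minimal sketch of the conditional intensity of a univariate Hawkes process with an exponential kernel. The univariate form, the exponential kernel, and all parameter values are simplifying assumptions for illustration; the summary does not specify the exact model the authors fitted.

```python
import math


def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity at time t: mu + sum of alpha * exp(-beta * (t - s)).

    mu    : background rate of events that occur spontaneously
    alpha : jump in intensity caused by each past event
    beta  : decay rate of that excitation
    Only events strictly before t contribute.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - s)) for s in events if s < t
    )
    return mu + excitation


def branching_ratio(alpha, beta):
    """Expected number of follow-on events directly caused by one event."""
    return alpha / beta


# Hypothetical event times and parameters (illustrative only)
events = [0.0, 0.5, 0.6]
before = hawkes_intensity(0.0, events, mu=0.2, alpha=0.5, beta=1.0)
after = hawkes_intensity(0.7, events, mu=0.2, alpha=0.5, beta=1.0)
```

In this sketch `before` equals the background rate because no event precedes time 0, while `after` is higher because three recent events are still exciting the process. That self-exciting behavior is what lets a Hawkes model attribute bursts of sharing to earlier posts.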

Behavior evolution:

Russian trolls: 2014–2015 initial broad campaign; 2016 shift toward election influence; 2017–2018 tactical adaptation (cryptocurrency and more diverse narrative themes)

Iranian trolls: 2016 emergence; steady growth; 2017–2018 campaign refinement around Middle Eastern conflicts (Saudi Arabia, Yemen, Palestine)

Connections

Linvill & Warren (2020) — Troll Factories: complementary organizational analysis of Russian IRA structure; this paper reveals behavioral patterns across platforms
Bail et al. (2020) — Assessing Russian IRA Impact: causal assessment of IRA's political effects; this paper documents the mechanisms of content distribution and influence amplification
Golovchenko et al. (2020) — Cross-Platform State Propaganda: examines Russian propaganda across Twitter and YouTube; this paper extends the platform portfolio to include Reddit and Gab
Lukito (2019) — Multi-Platform Disinformation: studies coordinated campaigns; this paper adds comparative analysis of competing state actors (Russia vs. Iran)
Coordinated inauthentic behavior topic: foundational empirical characterization of state-sponsored coordination
State-sponsored information operations topic: maps Russian and Iranian tactics across multiple platforms

Notes

Strengths:

Novel comparative framing: First side-by-side analysis of Russian and Iranian operations, enabling hypothesis generation about state strategic priorities (Russia: U.S. election/polarization; Iran: regional geopolitics/anti-U.S. messaging)
Multi-platform scope: Extends beyond Twitter to Reddit, Gab, and 4chan; the platform-specific adaptation it reveals (e.g., different client usage, domain focus) suggests actors learned and optimized per-platform strategies
Temporal grounding: Connecting activity spikes to real-world events (Crimea, Ukrainian conflict, 2016 election) strengthens causal interpretation of why troll campaigns intensified
Diverse analysis methods: Combines network analysis (hashtag/URL graphs), NLP (word embeddings, LDA), time series (Hawkes), and qualitative content review—triangulation across methods
Ground truth at scale: 10M posts from verified state-sponsored accounts (not heuristic-detected bots); enables precision unavailable in prior bot-detection studies

Limitations & caveats:

Platform bias: Analysis is restricted to platforms that disclosed accounts (Twitter via its October 2018 disclosure; Reddit via its April 2018 identification); activity on VK, Telegram, or closed platforms is unmeasured. Findings may not generalize to Russian/Iranian operations on domestic platforms, where they may be more active
Temporal truncation: Data covers predominantly 2012–2018 and misses post-2018 operational evolution, particularly after platform suspensions (IRA accounts were mostly removed in late 2018)
Attribution uncertainty: Although the accounts were disclosed by Twitter or identified by Reddit, the causal link to the Russian and Iranian governments is assumed rather than independently verified; the possibility of misattribution or actor-spoofing is not discussed
Influence vs. reach: Study measures retweet/mention counts and URL propagation as proxies for "influence" but cannot assess persuasion (cf. Bail et al. 2020 for causal effects). High reach ≠ high impact
Geographic bias: Over-sampled U.S. and Western audiences (focus on Twitter, English hashtags); may underestimate domestic influence in Russia, Iran, or allied regions
Mechanism opacity: Identifies what trolls did (content, timing, platforms) and where they had reach, but not why certain campaigns succeeded (the roles of platform algorithms, network homophily, and incumbent amplification remain unclear)

Significance:

This paper is foundational for empirical characterization of state-sponsored troll operations at scale. It demonstrates that:

State actors employ sophisticated, platform-aware, real-time responsive strategies—not rote mass spamming
Different states prioritize different geographic and ideological targets, aligned with geopolitical interests
Operational effectiveness varies dramatically by platform and content type (Russians excel at URL amplification; Iranians at narrative-building in niche communities)
Troll influence operates through legitimacy imitation (appearing as authentic news sources, hashtag exploitation) and cascade mechanics, not just follower inflation

The comparative lens and multi-platform scope make this a key reference for understanding state-sponsored disinformation operations (distinct from effects, which are debated; see Bail et al. 2020 for null findings on polarization).

The dataset and code are released for reproducibility, enabling follow-up studies on troll detection, bot-human interaction patterns, and downstream influence on genuine users—a model for responsible disclosure in this sensitive domain.