Characterizing the Use of Images in State-Sponsored Information Warfare Operations by Russian Trolls on Twitter
Authors: Savvas Zannettou, Tristan Caulfield, Barry Bradlyn, Emiliano De Cristofaro, Gianluca Stringhini, Jeremy Blackburn
Venue: 14th International AAAI Conference on Web and Social Media (ICWSM 2020)
ArXiv: 1901.05997
TL;DR
Large-scale analysis of 1.8M images posted by ~3.6K Russian state-sponsored Twitter accounts (IRA-controlled). The trolls used images and memes strategically, and political imagery spread roughly twice as efficiently as non-political content (23.6% vs. 11.3%). The study shows how state-sponsored actors use visual content to amplify disinformation across Web communities.
Contributions
- First comprehensive empirical study of image use by state-sponsored accounts on Twitter, analyzing 1.8M images from 9M+ tweets over October 2015–October 2018.
- Image processing pipeline using Google Cloud Vision API for entity extraction, clustering, and semantic analysis of image content.
- Temporal analysis showing spike in image posting after 2016, particularly around political events (e.g., 2017 Unite the Right rally, Charlottesville).
- Entity-based analysis identifying top targeted entities: Vladimir Putin, Donald Trump, Barack Obama, Hillary Clinton, Ukraine, Russia.
- Cross-platform influence analysis using Hawkes Processes to quantify how images posted by Russian trolls propagated across Reddit, The_Donald, Gab, 4chan, and mainstream news outlets.
- Finding that political-related images were over 2× more influential (23.6% efficiency) than non-political content (11.3%), particularly effective on The_Donald and Reddit.
Method
The authors built an image processing pipeline operating on a ground truth dataset of tweets posted by Russian trolls. Steps included:
- Dataset construction: Collected all tweets (9M+) from 3.6K Twitter accounts that Twitter identified as controlled by the Russian Internet Research Agency (IRA). The dataset spans October 2015 to October 2018 and yields 1.8M unique images (19% of tweets contained images).
- Image preprocessing: Extracted a perceptual hash (pHash) for each image to identify visual duplicates. Found 78,624 clusters containing 753,634 images, with significant duplication (43% of images identical to others).
- Entity extraction and annotation: For each image cluster's medoid, used the Google Cloud Vision API to extract entities (objects, people, text) and the URLs where the image appears. This yielded structured annotations of what the images depicted.
- Clustering and semantic analysis: Grouped visually similar images using pHash distance and the Cloud Vision API output. Manually annotated clusters to understand thematic content and identify image communities.
- Hawkes Process modeling: For each community (Reddit, Twitter, The_Donald, Gab, etc.), fit self-exciting temporal point processes to model how image postings on one platform influence subsequent postings on others. Fit separate models for images and for URLs to compare their influence.
- Cross-platform influence analysis: Estimated influence matrices giving the probability that an image's appearance on a source platform causes its appearance on a destination platform. Compared these to baseline rates and to the influence of regular Twitter users.
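The preprocessing and medoid steps above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes each image has already been reduced to a 64-bit pHash stored as an integer, it uses simple single-linkage grouping as a stand-in for whatever clustering algorithm the paper applies, and the file names, hash values, and the `max_dist` threshold are all hypothetical.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two perceptual hashes."""
    return bin(a ^ b).count("1")


def group_by_distance(hashes: dict, max_dist: int) -> list:
    """Single-linkage grouping (union-find): two images share a group when
    a chain of pairs with Hamming distance <= max_dist connects them.
    NOTE: a stand-in for the paper's clustering, chosen for brevity."""
    parent = {name: name for name in hashes}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(hashes, 2):
        if hamming(hashes[a], hashes[b]) <= max_dist:
            parent[find(a)] = find(b)

    groups = {}
    for name in hashes:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


def medoid(group: list, hashes: dict) -> str:
    """Member with the smallest total distance to all other members."""
    return min(
        group,
        key=lambda m: sum(hamming(hashes[m], hashes[o]) for o in group),
    )


# Hypothetical hashes: three near-duplicates and one unrelated image.
hashes = {
    "a.png": 0xFFFF0000FFFF0000,
    "b.png": 0xFFFF0000FFFF0001,
    "c.png": 0xFFFF0000FFFF0003,
    "d.png": 0x0000FFFF0000FFFF,
}
groups = group_by_distance(hashes, max_dist=2)  # threshold is an assumption
representatives = [medoid(g, hashes) for g in groups]
```

Only the medoid of each group would then be sent for annotation, which is why the pipeline needs one Cloud Vision call per cluster rather than one per image.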
Results
General characterization:
- Only 9.7% of Russian troll accounts shared no images.
- Average of 502.2 images per account; median of 37.
- Peak activity in August 2017 (Charlottesville rally and counter-protester death), coinciding with real-world political events.
- 19% of tweets contained images; 11.8% of those were auto-generated via URL preview, and the rest were explicitly posted.
Entity analysis:
- Top entities in troll-shared images: Vladimir Putin (1,377 clusters), Donald Trump (1,281 clusters), Russia (783 clusters), Barack Obama (571 clusters).
- Entities heavily focused on US politics, Russia, Ukraine, and Western nations.
Cross-platform influence:
- Russian trolls' images appeared across major platforms: Pinterest (9,433 clusters), Twitter (8,371), mainstream news outlets, Reddit, YouTube.
- Hawkes analysis showed Russian state-sponsored accounts had the highest influence on the The_Donald subreddit (68.5% efficiency), followed by Reddit (19.5%) and Gab (1.31%).
- Political imagery was roughly 2.1× more efficient than non-political imagery (23.6% vs. 11.3%).
- Republican party–related images: Russian trolls were ~96.5% efficient on /pol/ and 66.3% on Twitter, and highly influential on Reddit.
- Democratic party–related images: Russian trolls were more efficient on Gab (4.0%) and 4chan (3.18%) than on mainstream platforms.
- Comparison: Russian trolls were more efficient at spreading images (6.5% external influence on mainstream communities) than news URLs (1.29%).
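The influence figures above come from Hawkes processes, in which every posting temporarily raises the rate of later postings on the same or other platforms. The sketch below shows that self-exciting intensity for a small multivariate case. It assumes a continuous-time model with an exponential decay kernel, which is a common textbook choice and not necessarily the formulation used in the paper; the names `mu`, `W`, `beta`, and all numeric values are hypothetical.

```python
import math


def intensity(t, k, mu, W, beta, events):
    """Conditional intensity of process k at time t.

    mu[k]   : background rate of process k
    W[j][k] : expected number of events on k triggered by one event on j
    beta    : decay rate of the exponential kernel (assumption)
    events  : list of (time, source_index) pairs observed so far
    """
    rate = mu[k]
    for t_i, j in events:
        if t_i < t:  # only strictly earlier events can excite the process
            rate += W[j][k] * beta * math.exp(-beta * (t - t_i))
    return rate


# Two hypothetical communities: 0 = troll accounts, 1 = a destination community.
mu = [0.1, 0.2]
W = [[0.0, 0.5],   # one troll posting triggers 0.5 postings on community 1 on average
     [0.0, 0.0]]   # community 1 triggers nothing in this toy setup
events = [(0.0, 0)]  # a single troll posting at time 0

before = intensity(0.0, 1, mu, W, beta=1.0, events=events)  # background rate only
after = intensity(1.0, 1, mu, W, beta=1.0, events=events)   # raised by the earlier posting
```

Fitting such a model to observed posting times yields estimates of the weights `W[j][k]`, and those weights are what make it possible to attribute postings on one community to earlier postings on another.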
Most influential images:
- Democratic Party: Bigfoot meme (mocking environmentalism), images of Hillary Clinton, Bernie Sanders.
- Republican Party: Trump campaign imagery, pro-Trump political messaging, Mike Pence images.
Connections
- Related to Computational Propaganda via shared focus on state-sponsored platform manipulation.
- Complements Russian disinformation and state-sponsored information operations work by analyzing visual (not textual) content strategies.
- Builds on temporal analysis methods used in Information diffusion in social networks and Information Spread.
- Related to Visual misinformation detection and spread.
- Connected to Twitter-based disinformation research and computational propaganda detection.
- Cited by subsequent work on meme-based disinformation and cross-platform influence.
Notes
Strengths:
- First large-scale empirical analysis of images by state-sponsored accounts; novel methodological contribution with image clustering + Hawkes modeling.
- Comprehensive dataset (1.8M images, 9M tweets) and rigorous ground truth identification.
- Clear temporal patterns (spike post-2016) align with known political events, validating findings.
- Cross-platform influence modeling goes beyond single-platform analysis.
Limitations:
- Relies on the Google Cloud Vision API for entity extraction, which has unknown accuracy and bias on politically charged content.
- Ground truth relies on Twitter's (likely incomplete) identification of Russian troll accounts; the dataset methodology remains contested.
- Analysis focuses on image sharing patterns, not actual human reception or downstream behavior change.
- No analysis of false positives or confirmation that detected images were actually disinformation (some may be legitimate news).
Open questions:
- Why were Russian trolls particularly effective on The_Donald vs. mainstream outlets? (Platform norms, audience composition?)
- How do automatically generated image previews (from URLs) interact with deliberate image posting strategies?
- What role did image-based memes play in the 2016 and 2020 US elections specifically?