The Gospel According to Q: Understanding the QAnon Conspiracy from the Perspective of Canonical Information
Authors: Antonis Papasavva, Max Aliapoulios, Cameron Ballard, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Jeremy Blackburn
Venue: Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM 2022); arXiv preprint, 2021
TL;DR
This paper systematically studies the QAnon conspiracy theory by collecting and analyzing 4,961 unique Q drops from six aggregation sites. The authors find that aggregation sites show poor agreement on what constitutes a canonical Q drop, that Q drops are incoherent and low-quality relative to other social media platforms, and that stylometric analysis suggests multiple authors were behind the Q persona. The study traces QAnon's spread from fringe imageboards (4chan, 8kun) to mainstream social networks (Reddit, Twitter, YouTube), with Reddit playing a critical role in mainstreaming the conspiracy theory.
Contributions
- Dataset and collection: Gathered 30,320 Q drops (4,961 unique) from six aggregation sites (qagg, qalerts, operationq, qanonews, qanonpub, qmap.pub) from 2017–2020; collected corresponding Reddit, 4chan, and 8kun data; published dataset as a research resource.
- Canonicalization analysis: Demonstrates that aggregation sites, which are meant to archive and preserve Q's authentic posts, show low agreement (low Fleiss' κ) on what constitutes the canonical set of Q drops, raising questions about the authenticity and integrity of the conspiracy theory's source material.
- Stylometric characterization: Uses character-level and word-level features (digits, special characters, punctuation, vocabulary richness) to show that Q's writing habits change significantly over time and that multiple individuals likely authored posts under the Q persona.
- Content analysis: Applies word embeddings, topic modeling, and sentiment analysis to characterize the topics, toxicity, and coherence of Q drops; finds Q drops are exceptionally incoherent and low-quality compared to mainstream platforms.
- Platform spread analysis: Traces QAnon's dissemination across social media platforms, documenting Reddit's critical role in transitioning the conspiracy from fringe communities to mainstream adoption; examines aggregation link dynamics on Reddit and impact of platform enforcement actions.
Method
The study employs multiple complementary methods:
Data collection: Six aggregation sites were scraped for all available Q drops using custom crawlers; corresponding Reddit posts mentioning Q drops were collected via Pushshift; 4chan and 8kun threads were harvested from archive.org. Total: 4,961 unique Q drops, 121,956 Reddit posts and comments, and data from 4chan/8kun boards.
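The gap between the total number of scraped drops and the number of unique drops implies a deduplication step across sites. The following is a minimal sketch of one way to do that, assuming drops are matched on whitespace- and case-normalized text. The matching rule and the helper names `drop_key` and `merge_sites` are illustrative assumptions, not the authors' documented procedure.

```python
import hashlib
import re


def drop_key(text):
    """Hash a drop after collapsing whitespace and lowercasing (assumed matching rule)."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_sites(site_drops):
    """Map each unique drop key to the set of sites that list it.

    site_drops: dict of site name -> list of drop texts scraped from that site.
    """
    presence = {}
    for site, drops in site_drops.items():
        for text in drops:
            presence.setdefault(drop_key(text), set()).add(site)
    return presence


# Toy input with made-up texts and hypothetical site names.
scraped = {
    "site_a": ["Trust the plan.", "Watch the water."],
    "site_b": ["trust  the plan.", "Future proves past."],
}
presence = merge_sites(scraped)
total_collected = sum(len(drops) for drops in scraped.values())
unique_drops = len(presence)
# Drops that at least one site omits.
partial = [key for key, sites in presence.items() if len(sites) < len(scraped)]
```

Keeping the per-drop set of sites, not only the unique count, makes it straightforward to ask later which drops are missing from which archive.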
Canonicalization analysis: Compared the set of Q drops across aggregation sites using Fleiss' kappa as a measure of inter-annotator agreement to assess how well different archives agree on the canonical set.
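To make the agreement measure concrete, the sketch below computes Fleiss' kappa by treating each drop as a subject, each aggregation site as a rater, and "included" versus "excluded" as the two categories. This framing, the function names, and the toy site contents are assumptions for illustration; the summary does not specify how the authors constructed their rating table.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for table[i][j] = number of raters putting subject i in category j."""
    if not table:
        raise ValueError("need at least one subject")
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_subjects = len(table)
    # Mean observed agreement across subjects.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_subjects
    # Chance agreement from the marginal category proportions.
    n_categories = len(table[0])
    p_j = [
        sum(row[j] for row in table) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return float("nan")  # all ratings in one category: kappa is undefined
    return (p_bar - p_e) / (1.0 - p_e)


def inclusion_table(site_drops):
    """Build [included, excluded] counts per drop from dict of site -> set of drop ids."""
    universe = sorted(set().union(*site_drops.values()))
    n_sites = len(site_drops)
    table = []
    for drop_id in universe:
        included = sum(drop_id in drops for drops in site_drops.values())
        table.append([included, n_sites - included])
    return table


# Hypothetical sites and drop ids.
sites = {"a": {1, 2, 3}, "b": {1, 2}, "c": {1, 3}}
kappa = fleiss_kappa(inclusion_table(sites))
```

One caveat of this framing is that a drop omitted by every site never enters the table, so the "excluded" category is only observed for drops that at least one site lists.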
Stylometric analysis: Extracted character-level (digits, special characters, punctuation) and word-level (vocabulary richness) features for each tripcode era; used Kolmogorov-Smirnov tests to determine whether distributions differ between tripcode-defined eras, providing evidence that different individuals wrote Q posts.
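The stylometric comparison can be sketched as follows: compute a few per-drop features, then compare their distributions between two tripcode eras with the two-sample Kolmogorov-Smirnov statistic. The exact feature definitions here (digit fraction, punctuation fraction, type-token ratio as a stand-in for vocabulary richness) and the function names are assumptions, and the example texts are made up. Only the test statistic is computed; a significance test would also need the null distribution, which in practice a library routine such as `scipy.stats.ks_2samp` provides.

```python
import string


def stylometric_features(text):
    """Per-drop features: digit fraction, punctuation fraction, type-token ratio."""
    n_chars = len(text) or 1
    digits = sum(ch.isdigit() for ch in text)
    punct = sum(ch in string.punctuation for ch in text)
    words = text.lower().split()
    ttr = len(set(words)) / len(words) if words else 0.0
    return {
        "digit_frac": digits / n_chars,
        "punct_frac": punct / n_chars,
        "type_token_ratio": ttr,
    }


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Made-up drops for two hypothetical tripcode eras.
era_1 = ["Q 17!", "Alice & Wonderland.", "4 10 20"]
era_2 = ["Who controls the narrative?", "Think logically.", "Patriots stay together"]
digits_1 = [stylometric_features(t)["digit_frac"] for t in era_1]
digits_2 = [stylometric_features(t)["digit_frac"] for t in era_2]
d_digits = ks_statistic(digits_1, digits_2)
```

A large statistic for a feature says only that the two eras differ on that feature; as with any stylometric evidence, it cannot by itself distinguish a change of author from a deliberate change of style.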
Content analysis: Used word2vec embeddings, Latent Dirichlet Allocation, and community detection to identify topic clusters within Q drops; analyzed toxicity and incoherence using Google's Perspective API with models for "severe toxicity," "threat," and "incoherence"; performed sentiment analysis.
Platform dissemination: Tracked temporal patterns of Q mentions on Reddit across subreddits; measured link sharing behavior to aggregation sites; analyzed how subreddit bans affected the platform's Q-related activity.
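The link-sharing measurement can be illustrated with a small counter over Reddit posts. This is a sketch under stated assumptions: posts are dicts with `subreddit`, `created_utc`, and `body` keys, links are extracted with a simple regular expression, and the function name `aggregation_link_counts` is hypothetical. Only `qmap.pub` is a domain named in this summary; the example posts are invented.

```python
import re
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlparse

# Simple URL matcher; stops at whitespace and common closing delimiters.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def aggregation_link_counts(posts, domains):
    """Count links to the given domains per (subreddit, YYYY-MM) pair."""
    counts = Counter()
    for post in posts:
        month = datetime.fromtimestamp(
            post["created_utc"], tz=timezone.utc
        ).strftime("%Y-%m")
        for url in URL_RE.findall(post.get("body", "")):
            host = (urlparse(url).hostname or "").lower()
            if host.startswith("www."):
                host = host[4:]
            if host in domains:
                counts[(post["subreddit"], month)] += 1
    return counts


# Invented posts; timestamps are 2017-11-01 and 2018-01-01 (UTC).
posts = [
    {"subreddit": "CBTS_Stream", "created_utc": 1509494400,
     "body": "new drop: https://qmap.pub/read/1 and https://example.com/x"},
    {"subreddit": "greatawakening", "created_utc": 1514764800,
     "body": "see (https://www.qmap.pub/read/2) for details"},
    {"subreddit": "greatawakening", "created_utc": 1514764800,
     "body": "no links here"},
]
counts = aggregation_link_counts(posts, {"qmap.pub"})
```

Grouping by subreddit and month makes it possible to see both when links first appear outside QAnon-focused communities and how counts change around a ban date.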
Results
Canonicalization: The six aggregation sites show poor agreement on which drops are canonical. The authors identified 302 drops that do not appear on all six sites, suggesting that site operators make editorial decisions about authenticity and curation. This undermines the claim that aggregation sites provide an authoritative archive of Q's words.
Stylometry: Tripcode analysis shows statistically significant differences between era-specific writing habits (e.g., digit usage, special character frequency, punctuation patterns). This suggests either that multiple individuals authored Q posts or that Q deliberately changed writing style over time; either reading undermines claims of a single, consistent Q persona.
Content: Word embeddings reveal Q drops discuss government oversight, religious/spiritual themes, and accusations against officials. Q drops score significantly lower on coherence (median 0.04) than mainstream platforms like Reddit (0.14) or news sites (0.19–0.27), suggesting that Q's appeal lies not in logical consistency but in other factors (conspiracy mystique, interpretive flexibility, emotional resonance).
Toxicity: Q drops are not particularly toxic (median 0.04 on the API's severity scale), contradicting claims that Q promotes violent rhetoric. However, Q drops are incoherent: 99% score above 0.5 on incoherence, much higher than baseline platforms.
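The two summary statistics quoted for these scores, a median and the share of drops above a threshold, are simple to compute once per-drop scores are available. The sketch below uses made-up scores in the range 0 to 1 and a hypothetical helper name `summarize_scores`; it does not call the Perspective API.

```python
from statistics import median


def summarize_scores(scores, threshold=0.5):
    """Return the median score and the fraction of scores strictly above threshold."""
    if not scores:
        raise ValueError("scores must be non-empty")
    above = sum(s > threshold for s in scores) / len(scores)
    return {"median": median(scores), "frac_above": above}


# Made-up scores for illustration only.
example_scores = [0.25, 0.75, 1.0, 0.5]
summary = summarize_scores(example_scores)
```

Reporting both numbers is useful because a median describes the typical drop while the threshold share describes how much of the corpus falls in the high-score tail.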
Dissemination: Reddit played a critical role in QAnon's transition to mainstream visibility. r/CBTS_Stream and r/greatawakening were the primary QAnon subreddits, with activity peaking in late 2017 and early 2018 before Reddit banned them. After the bans, activity persisted but declined overall. Links to aggregation sites were initially shared only in QAnon-focused communities but later spread to a wider set of subreddits, accelerating the conspiracy's reach.
Connections
- Related to Conspiracy theories as an empirical case study of how conspiracy theories evolve, canonicalize, and spread across social networks.
- Extends psychological work on conspiracy beliefs to the specific case of QAnon, examining mechanisms of belief adoption and community formation.
- Complements Marwick & Lewis on disinformation ecosystems by analyzing QAnon as a large-scale coordination effort across multiple platforms.
- Contributes to Platform moderation research by documenting how platform enforcement (subreddit bans) impacts conspiracy theory dissemination and resilience.
- Relevant to Computational Propaganda and information operation analysis; QAnon exemplifies large-scale coordinated narratives spread across decentralized channels.
- Related to social media analysis of fringe-to-mainstream pipeline; demonstrates how algorithmic recommendations and aggregation sites accelerate conspiracy adoption.
Notes
Strengths:
- Comprehensive multi-platform dataset combining original sources (4chan, 8kun) with aggregation sites and mainstream platforms.
- Rigorous stylometric analysis providing empirical evidence against the "single authentic author" narrative.
- Clear quantitative documentation of canonicalization failures, a novel perspective on how conspiracy theories lose internal coherence as they scale.
- Honest assessment of Q's incoherence and low quality, resisting the temptation to over-interpret Q's content.
Limitations:
- Stylometric analysis assumes that tripcode changes mark distinct authors, but cannot rule out deliberate style-shifting by a single individual.
- Content analysis via topic modeling and Perspective API has known limitations (Perspective API's toxicity/threat/incoherence models are imperfect and may encode cultural biases).
- Dataset does not capture all Q drops or all related activity; some archive.org data may be incomplete.
Open questions:
- How do aggregation site editorial decisions influence believer interpretations of Q authenticity?
- What specific factors drove Q's appeal despite its demonstrable incoherence and low logical quality?
- How did algorithmic amplification on Reddit contribute to the rapid mainstreaming between late 2017 and 2018?