Botometer 101: Social bot practicum for computational social scientists
Authors: Kai-Cheng Yang, Emilio Ferrara, Filippo Menczer
Venue: arXiv, 2022 — arXiv:2201.01608
TL;DR
This tutorial introduces Botometer, a supervised machine learning classifier for detecting social bots on Twitter. The authors explain how Botometer extracts features from account profiles and tweets, describe multiple versions optimized for different use cases (including BotometerLite for large-scale analysis), and provide practical guidance on interpreting bot scores and avoiding common pitfalls in bot detection research.
Contributions
- Practical tutorial on using Botometer for computational social scientists unfamiliar with bot detection
- Explanation of Botometer's architecture: supervised ML classifier using ~1,000 features across six categories (user profile, friends, network, temporal, content/language, sentiment)
- Documentation of Botometer versions (V1 through V4, with V4 built on the ESC architecture; BotometerLite for speed) and their evolution
- Case study applying Botometer to cryptocurrency-related Twitter discussions
- Recommended practices for threshold selection, score interpretation, and responsible use
Method
Botometer-V4 is a supervised machine learning classifier that distinguishes bot-like from human-like accounts using feature vectors extracted from account metadata and recent tweets. Features span six categories: user profile (name patterns, profile image, account age, description), friends (the account's friends and followers), network, temporal patterns (posting frequency and regularity), content and language (word counts, parts of speech, with an emphasis on English), and sentiment. The classifier uses an Ensemble of Specialized Classifiers (ESC) architecture with multiple Random Forest models, one for each bot type in the training data, to capture diverse bot behaviors.
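The ensemble idea can be sketched in a few lines of Python. This is a minimal illustration, not Botometer's implementation: the specialists are toy functions standing in for trained Random Forests, the bot-type names and the two features are hypothetical, and combining the specialists by taking the maximum score is an assumption of this sketch, since the summary above does not say how their outputs are aggregated.

```python
from typing import Callable, Dict, List

# A "specialized classifier" is modeled as any callable that maps a feature
# vector to a bot probability in [0, 1]. In Botometer-V4 these would be
# trained Random Forests; the lambdas below are toy stand-ins.
Classifier = Callable[[List[float]], float]


def esc_score(features: List[float],
              specialists: Dict[str, Classifier]) -> Dict[str, float]:
    """Score one account with every specialist and add an overall score.

    ASSUMPTION: the overall score is the maximum over specialists.
    """
    scores = {bot_type: clf(features) for bot_type, clf in specialists.items()}
    scores["overall"] = max(scores.values())
    return scores


# Hypothetical specialists keyed by hypothetical bot-type names.
# Hypothetical features: [tweets_per_hour, fraction_of_retweets]
specialists = {
    "spammer": lambda f: min(1.0, f[0] / 10.0),
    "fake_follower": lambda f: 0.1,
    "amplifier": lambda f: f[1],
}

result = esc_score([2.0, 0.9], specialists)
for name, score in result.items():
    print(f"{name}: {score:.2f}")
```

One reason to keep per-specialist scores alongside the overall score is that they indicate which kind of bot behavior an account most resembles, not just whether it looks automated.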
During inference, Botometer fetches up to 200 recent tweets and user metadata from Twitter, extracts features from this data, and produces both raw bot scores (in the [0, 1] range) and Complete Automation Probability (CAP) scores (rescaled Bayesian posteriors representing the probability that an account is automated). Because CAP combines the classifier's prediction with prior knowledge of bot prevalence on Twitter, it helps users weigh false positives against false negatives when choosing a cutoff.
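The Bayesian step behind a CAP-style score can be illustrated with Bayes' rule. The sketch below is not Botometer's actual procedure: the function name is hypothetical, and the likelihoods and priors are made-up numbers chosen only to show how the assumed prevalence of bots changes the posterior for the same classifier evidence.

```python
def posterior_bot_probability(p_score_given_bot: float,
                              p_score_given_human: float,
                              prior_bot: float) -> float:
    """Bayes' rule: P(bot | score) from the two likelihoods and a prior.

    All inputs are hypothetical; how Botometer estimates them is not
    covered by this sketch.
    """
    joint_bot = p_score_given_bot * prior_bot
    joint_human = p_score_given_human * (1.0 - prior_bot)
    return joint_bot / (joint_bot + joint_human)


# Same evidence (a score 8x more likely under "bot" than under "human"),
# evaluated under three different assumed bot prevalences.
for prior in (0.05, 0.15, 0.50):
    cap = posterior_bot_probability(0.8, 0.1, prior)
    print(f"prior={prior:.2f} -> posterior={cap:.3f}")
```

The lower the assumed prevalence of bots, the more evidence is needed before the posterior crosses a given cutoff, which is why a posterior of this kind tends to be more conservative than a raw score.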
BotometerLite is a faster variant introduced to enable large-scale analysis. It relies only on user metadata (embedded in each tweet), avoiding extra API queries and dramatically reducing overhead. BotometerLite uses data selection mechanisms to choose a subset of training data that optimizes for accuracy, generalization, and consistency across bot types.
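To make the metadata-only idea concrete, the sketch below derives a few features from the user object embedded in a single tweet, with no further API calls. It assumes tweets in the Twitter API v1.1 JSON layout (`user.followers_count`, `user.friends_count`, `user.statuses_count`, `user.screen_name`, and `created_at` timestamps). The function name and the particular features are illustrative choices, not BotometerLite's actual feature set.

```python
from datetime import datetime

# Timestamp layout used by Twitter API v1.1, e.g. "Sat Jan 01 00:00:00 +0000 2022"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def lite_features(tweet: dict) -> dict:
    """Derive illustrative features from the user object embedded in a tweet."""
    user = tweet["user"]
    observed_at = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
    joined_at = datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT)
    # Account age at the time of the tweet, floored at one hour to avoid
    # dividing by zero for brand-new accounts.
    age_hours = max((observed_at - joined_at).total_seconds() / 3600.0, 1.0)
    return {
        "statuses_count": user["statuses_count"],
        "followers_count": user["followers_count"],
        "friends_count": user["friends_count"],
        "tweets_per_hour": user["statuses_count"] / age_hours,
        "followers_per_hour": user["followers_count"] / age_hours,
        "followers_to_friends": user["followers_count"] / max(user["friends_count"], 1),
        "screen_name_length": len(user["screen_name"]),
        "has_description": bool(user.get("description")),
    }


# Made-up tweet for demonstration only.
example_tweet = {
    "created_at": "Sat Jan 01 00:00:00 +0000 2022",
    "user": {
        "created_at": "Fri Dec 31 00:00:00 +0000 2021",
        "screen_name": "example_user_123",
        "description": "",
        "statuses_count": 480,
        "followers_count": 12,
        "friends_count": 600,
    },
}

features = lite_features(example_tweet)
for name, value in features.items():
    print(f"{name}: {value}")
```

Because every input comes from the tweet payload itself, a function like this can be mapped over an existing tweet collection, which is what makes a metadata-only approach practical at scale.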
Results
Botometer-V4 achieves an AUC (area under the receiver operating characteristic curve) of 0.99 in controlled experimental settings on annotated datasets, indicating strong discrimination between bots and humans. The case study covers three cashtags, two tied to cryptocurrencies ($SHIB, $FLOKI) and one to Apple stock ($AAPL), and reveals bimodal bot score distributions, with some cashtags showing substantially higher bot activity than others (e.g., 66.4% of tweets from likely bots on $SHIB using a 0.5 threshold, vs. 36.9% on $AAPL). Importantly, Botometer scores for individual accounts fluctuate over time (as shown in a time series from September 2020 to November 2021), because each score reflects only the account's recent activity.
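The "share of tweets from likely bots" statistic depends directly on the chosen threshold, which a short sketch makes visible. The scores below are made up for illustration and are not data from the paper; the function name is hypothetical, and treating a score equal to the threshold as "likely bot" is an assumption of this sketch.

```python
from typing import Sequence


def share_from_likely_bots(tweet_scores: Sequence[float],
                           threshold: float = 0.5) -> float:
    """Fraction of tweets whose author's bot score is at or above threshold.

    tweet_scores holds one entry per tweet (the score of its author), so
    prolific accounts count once per tweet they posted.
    """
    if not tweet_scores:
        raise ValueError("tweet_scores must not be empty")
    flagged = sum(1 for score in tweet_scores if score >= threshold)
    return flagged / len(tweet_scores)


# Made-up bimodal sample: most scores sit near 0 or near 1.
scores = [0.05, 0.10, 0.15, 0.45, 0.55, 0.80, 0.90, 0.95]

for threshold in (0.3, 0.5, 0.7):
    share = share_from_likely_bots(scores, threshold)
    print(f"threshold={threshold:.1f} -> {share:.1%} of tweets from likely bots")
```

Reporting the statistic at several thresholds, rather than at a single cutoff, shows readers how sensitive a conclusion is to that choice.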
Connections
- Online Human-Bot Interactions: Detection, Estimation, and Characterization — foundational work on detecting and characterizing bot behavior on Twitter
- The spread of low-credibility content by social bots — analyzes social bots' role in spreading low-credibility information
- The Rise of Social Bots — early overview of the social bot phenomenon and its prevalence
- Scalable and Generalizable Social Bot Detection through Data Selection — prior work by Yang et al. on efficient bot detection through data selection
- Computational social science and large-scale text analysis — broader methodological context for applying computational methods to social phenomena
- Twitter Analysis — platform-specific research on Twitter and misinformation
- Machine Learning Feature Engineering — technical foundation for supervised bot detection
Notes
This is a practicum paper: a tutorial rather than a novel methodological contribution. Its value lies in making Botometer accessible to researchers unfamiliar with bot detection tools. The authors emphasize its limitations: Botometer's imperfect accuracy, the transient nature of bot scores, and the potential for misuse (e.g., calling users "bots" as harassment). The case study likewise acknowledges its own limitations (small sample, potential non-representativeness). For researchers planning large-scale Twitter analysis, the distinction between full Botometer-V4 (comprehensive but slower) and BotometerLite (faster but less detailed) is practical and well-motivated. The paper includes executable Python code and encourages reproducibility, which strengthens its utility as a guide to the tool.