Machine learning

Machine learning methods, particularly supervised learning with feature engineering and ensemble approaches, are widely used in misinformation and bot detection systems. Researchers extract features from user profiles (metadata, posting patterns, temporal signatures, content), social networks (follower structure, retweet cascades), and text (linguistic patterns, sentiment); train classifiers on labeled datasets of authentic and inauthentic accounts or content; and deploy the resulting models to score new accounts or posts.
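The feature-extraction step can be sketched in a few lines. This is a minimal illustration, not any particular system's pipeline: the record fields (`followers`, `following`, `default_avatar`, `post_times`) and the feature names are assumptions chosen for the example.

```python
from datetime import datetime
from statistics import mean, pstdev

def extract_features(account):
    """Turn a raw account record (hypothetical schema) into numeric features."""
    times = sorted(datetime.fromisoformat(t) for t in account["post_times"])
    # Gaps between consecutive posts, in seconds (a simple temporal signature).
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        # +1 avoids division by zero for accounts that follow nobody.
        "follower_ratio": account["followers"] / (account["following"] + 1),
        "mean_gap_s": mean(gaps) if gaps else 0.0,
        # Very regular posting (low spread) is one possible automation signal.
        "gap_std_s": pstdev(gaps) if gaps else 0.0,
        "has_default_avatar": 1.0 if account["default_avatar"] else 0.0,
        "posts": float(len(times)),
    }

example = {
    "followers": 10,
    "following": 999,
    "default_avatar": True,
    "post_times": [
        "2024-01-01T00:00:00",
        "2024-01-01T00:10:00",
        "2024-01-01T00:20:00",
    ],
}
features = extract_features(example)
```

Real systems typically compute hundreds of such features; the point here is only that each account ends up as a fixed-length numeric vector a classifier can consume.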

Common approaches

Supervised classification: Train a classifier (random forest, logistic regression, SVM, neural networks) to distinguish bots from humans, false claims from true ones, or misinformation sources from credible ones. Features are hand-engineered from metadata, network structure, and text.
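As a rough sketch of supervised classification, the following trains a logistic regression by batch gradient descent on a toy dataset. The two features (a scaled posting rate and a scaled follower ratio), the data, and the hyperparameters are all invented for illustration; in practice a library implementation would normally be used instead of hand-written training code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that feature vector x belongs to a bot."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data: [scaled posting rate, scaled follower ratio]; label 1 = bot.
X = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.1, 0.8], [0.2, 0.9], [0.15, 0.7]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(X, y)
bot_like = predict_proba(w, b, [0.9, 0.1])
human_like = predict_proba(w, b, [0.1, 0.9])
```

The same train-then-score pattern applies whether the model is a logistic regression, a random forest, or an SVM; only the fitting procedure changes.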

Ensemble methods: Combine multiple classifiers (bagging, boosting, stacking) to improve robustness and generalization. Example: Botometer v4 uses an ensemble of specialized classifiers, one per bot type.
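One simple way to combine specialized classifiers is to score an account with each of them and report the maximum. The sketch below assumes that combination rule and uses hand-written stand-in functions in place of trained models; it is an illustration of the idea, not Botometer's implementation, and every name in it is hypothetical.

```python
def spammer_score(f):
    # Hypothetical stand-in for a model trained on spam bots vs. humans.
    return min(1.0, f["links_per_post"])

def fake_follower_score(f):
    # Hypothetical stand-in for a model trained on fake followers vs. humans.
    return 0.9 if f["posts"] == 0 and f["following"] > 1000 else 0.1

SPECIALISTS = {"spammer": spammer_score, "fake_follower": fake_follower_score}

def ensemble_score(features, specialists=SPECIALISTS):
    """Return (overall score, most likely bot type, per-type scores)."""
    scores = {name: model(features) for name, model in specialists.items()}
    top_type = max(scores, key=scores.get)
    return scores[top_type], top_type, scores

account = {"links_per_post": 0.2, "posts": 0, "following": 5000}
overall, bot_type, per_type = ensemble_score(account)
```

Taking the maximum means an account only has to look like one kind of bot to receive a high overall score, which is the motivation for training a separate model per type rather than one model for all bots.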

Unsupervised clustering: Group accounts or content by behavioral similarity without labeled training data. Useful for discovering bot networks and coordinated behavior.
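A minimal sketch of similarity-based grouping: accounts are linked when the sets of items they shared (URLs, hashtags) overlap strongly, and linked accounts are merged into clusters with a union-find structure. The Jaccard measure, the 0.5 threshold, and the account data are assumptions for illustration; this is threshold-based single-linkage clustering, one of many possible choices.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_accounts(shared_items, threshold=0.5):
    """Group accounts whose shared-item sets have Jaccard >= threshold."""
    parent = {acct: acct for acct in shared_items}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(shared_items, 2):
        if jaccard(shared_items[a], shared_items[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for acct in shared_items:
        clusters.setdefault(find(acct), set()).add(acct)
    # Largest clusters first, names sorted for stable output.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))

shared = {
    "a": {"u1", "u2", "u3"},
    "b": {"u1", "u2", "u3", "u4"},
    "c": {"u2", "u3", "u4"},
    "d": {"x1"},
    "e": {"y1", "y2"},
}
clusters = cluster_accounts(shared)
```

Because no labels are involved, a large cluster is only a lead for investigation: genuine communities also share the same links, so cluster membership alone does not establish coordination.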

Deep learning: Use neural networks (CNNs, RNNs, transformers) to learn representations of text and network structure end-to-end, without hand-engineered features.

Challenges

Data quality and bias: Labeled datasets are expensive and subject to annotation error; ground truth is often ambiguous (e.g., what counts as "disinformation"?).
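Annotation noise is commonly quantified with an inter-annotator agreement statistic. The sketch below computes Cohen's kappa for two annotators who labeled the same ten accounts; the labels are invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same class independently.
    classes = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in classes) / (n * n)
    if expected == 1.0:
        return 1.0  # Both annotators used a single identical label throughout.
    return (observed - expected) / (1 - expected)

annotator_1 = ["bot", "bot", "human", "human", "bot",
               "human", "human", "human", "bot", "human"]
annotator_2 = ["bot", "human", "human", "human", "bot",
               "human", "bot", "human", "bot", "human"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 8 of 10 accounts (80%), but because chance agreement is 52%, kappa is only about 0.58. Low agreement on a labeling task suggests the category definitions themselves may be ambiguous.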

Concept drift: Bot behavior and disinformation tactics often evolve faster than models are retrained, so systems that perform well on historical data can degrade on new data.
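One way to make drift visible is to evaluate a fixed model on data grouped by time period instead of on a single pooled test set. The sketch below assumes predictions and true labels are already available per period; the records, the period names, and the 0.3 tolerance are illustrative.

```python
def accuracy_by_period(records):
    """records: iterable of (period, predicted_label, true_label)."""
    totals, correct = {}, {}
    for period, pred, true in records:
        totals[period] = totals.get(period, 0) + 1
        correct[period] = correct.get(period, 0) + (pred == true)
    return {p: correct[p] / totals[p] for p in sorted(totals)}

def drifted_periods(acc, baseline_period, tolerance=0.3):
    """Periods whose accuracy fell more than `tolerance` below the baseline."""
    baseline = acc[baseline_period]
    return [p for p, a in acc.items() if a < baseline - tolerance]

records = [
    ("2022", "bot", "bot"), ("2022", "human", "human"),
    ("2022", "bot", "bot"), ("2022", "human", "human"),
    ("2023", "bot", "bot"), ("2023", "human", "bot"),
    ("2023", "human", "human"), ("2023", "bot", "bot"),
    ("2024", "human", "bot"), ("2024", "human", "bot"),
    ("2024", "human", "human"), ("2024", "bot", "bot"),
]
acc = accuracy_by_period(records)
flagged = drifted_periods(acc, baseline_period="2022")
```

The same per-slice evaluation works for other partitions, such as topic or platform, when the question is generalization across contexts instead of across time.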

Cross-domain generalization: Models trained on one domain (e.g., the 2016 U.S. election) often fail to generalize to different contexts or time periods.

Fairness: Classifiers may exhibit disparate error rates across demographic groups, languages, or regions.
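Disparate error rates can be checked by computing an error metric separately for each group. The sketch below compares false positive rates (humans wrongly flagged as bots) across two language groups; the group codes and records are invented for illustration.

```python
def false_positive_rates(records):
    """records: iterable of (group, predicted_bot, is_bot) with boolean flags."""
    false_pos, negatives = {}, {}
    for group, predicted_bot, is_bot in records:
        if not is_bot:  # Only genuine humans can be false positives.
            negatives[group] = negatives.get(group, 0) + 1
            false_pos[group] = false_pos.get(group, 0) + (1 if predicted_bot else 0)
    return {g: false_pos[g] / negatives[g] for g in negatives}

records = [
    ("en", False, False), ("en", False, False), ("en", True, False),
    ("en", False, False), ("en", True, True),
    ("pt", True, False), ("pt", True, False), ("pt", False, False),
    ("pt", False, False), ("pt", True, True),
]
fpr = false_positive_rates(records)
gap = max(fpr.values()) - min(fpr.values())
```

In this toy data humans posting in "pt" are wrongly flagged twice as often as those posting in "en". Which metric to equalize (false positives, false negatives, or something else) is a policy decision, not one the code can settle.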

Interpretability: Black-box models such as deep neural networks make predictions hard to explain; practitioners need to understand why an account was flagged as a bot before they can act on the decision.
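For contrast, a linear model can be explained directly: each feature's contribution to the score is its weight times its value. The sketch below ranks contributions for one account; the weights, bias, and feature values are hypothetical and not taken from any trained model.

```python
def explain_linear(weights, bias, features):
    """Return the logit and per-feature contributions, largest magnitude first."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    logit = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return logit, ranked

# Hypothetical weights of an already-trained linear bot classifier.
weights = {"posts_per_hour": 0.5, "follower_ratio": -2.0, "default_avatar": 1.0}
bias = -2.0
account = {"posts_per_hour": 6.0, "follower_ratio": 0.25, "default_avatar": 1.0}

logit, ranked = explain_linear(weights, bias, account)
```

Here the high posting rate contributes most to the flag. Black-box models offer no such direct decomposition, which is why post-hoc explanation methods are often applied to them.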

Key papers in this wiki