Ensemble methods

Ensemble methods combine predictions from multiple models and often outperform any single member. In machine learning, ensembles rely on the observation that diverse models tend to make different errors, so aggregating their predictions lets some of those errors cancel. Depending on the method, the main gain is reduced variance (as in bagging) or reduced bias (as in boosting).

Key concepts

Diversity and complementarity: Ensemble effectiveness depends on model diversity. An ensemble of identical models provides no benefit, because its members all make the same errors. Diversity can arise from different architectures, hyperparameters, training data subsets, or random initializations.
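
The effect of diversity can be illustrated with a small calculation. The sketch below assumes an idealized setting that real ensembles only approximate: binary classification, an odd number of models, and members that are each correct with the same probability, independently of one another. The function name majority_vote_accuracy is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models is correct.

    Assumes binary classification, an odd n_models, and that each model is
    correct with probability p independently of the others.
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


single = 0.7
identical = single  # identical copies always agree, so voting changes nothing
independent_3 = majority_vote_accuracy(3, single)
independent_11 = majority_vote_accuracy(11, single)

print(f"single model:           {single:.3f}")
print(f"3 identical models:     {identical:.3f}")
print(f"3 independent models:   {independent_3:.3f}")  # 0.784
print(f"11 independent models:  {independent_11:.3f}")
```

Under these assumptions, three independent models at 70% accuracy reach about 78% with majority voting, while three identical copies stay at 70%. Real models are correlated, so the actual gain usually lies between those two cases.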

Aggregation strategies:

  • Voting (hard): Select the class with the most votes among ensemble members. Common for classification tasks.
  • Averaging (soft): Average probability scores from each model. Generally outperforms hard voting when models are well-calibrated.
  • Weighted averaging: Assign model-specific weights based on validation performance or confidence scores.
  • Stacking: Use a meta-learner trained on ensemble member outputs to learn optimal combination weights.
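
The first three strategies can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names hard_vote and soft_vote, the probability values, and the weights are all made up for the example, and each model is assumed to output one probability per class.

```python
from collections import Counter


def hard_vote(probs_per_model):
    """Each model votes for its argmax class; the most common vote wins.

    Ties go to the class that received its first vote earliest.
    """
    votes = [max(range(len(p)), key=p.__getitem__) for p in probs_per_model]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probs_per_model, weights=None):
    """Average class probabilities (optionally weighted), then take the argmax."""
    if weights is None:
        weights = [1.0] * len(probs_per_model)
    total = sum(weights)
    n_classes = len(probs_per_model[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probs_per_model)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=avg.__getitem__), avg


# Three hypothetical models scoring one example over classes [0, 1].
probs = [
    [0.55, 0.45],  # weakly prefers class 0
    [0.60, 0.40],  # weakly prefers class 0
    [0.05, 0.95],  # strongly prefers class 1
]

hard = hard_vote(probs)            # two of three votes go to class 0
soft, avg = soft_vote(probs)       # averaged probabilities are about [0.4, 0.6]
weighted, _ = soft_vote(probs, weights=[0.45, 0.45, 0.10])

print(hard, soft, weighted)  # 0 1 0
```

Hard and soft voting disagree here because soft voting takes the third model's high confidence into account. Down-weighting that model, as might happen if it scored poorly on validation data, moves the decision back to class 0.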

Ensemble architectures:

  • Bagging: Train multiple models on bootstrap samples (random subsets of the training data drawn with replacement), then aggregate their predictions. Primarily reduces variance.
  • Boosting: Train models sequentially; later models focus on examples the previous models misclassified. Primarily reduces bias.
  • Stacking: Train first-level models on the training set, then train a second-level meta-model on their predictions for a held-out validation set.
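
As an illustration of the first architecture, the sketch below implements bagging with a deliberately simple base learner, a one-dimensional threshold classifier ("stump"). The base learner, the toy dataset, and the names fit_stump, bagging_fit, and bagging_predict are assumptions made for this example. In practice the base model is usually something stronger, such as a decision tree.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D threshold classifier that predicts 1 if x > threshold, else 0.

    Candidate thresholds are midpoints between consecutive sorted x values;
    the one with the fewest training errors is returned.
    """
    xs = sorted(x for x, _ in points)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in points)

    return min(candidates, key=errors)


def bagging_fit(points, n_models=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(points)
    return [fit_stump(rng.choices(points, k=n)) for _ in range(n_models)]


def bagging_predict(thresholds, x):
    """Aggregate the stumps' predictions by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy data: the label is 1 when x > 5, plus one noisy point at x = 4.5.
data = [(float(x), int(x > 5)) for x in range(11)] + [(4.5, 1)]

ensemble = bagging_fit(data)
print(sorted(set(ensemble)))  # the distinct thresholds learned across samples
print(bagging_predict(ensemble, 1.0), bagging_predict(ensemble, 9.0))
```

Each bootstrap sample includes or omits the noisy point a different number of times, so the individual stumps learn different thresholds. Majority voting smooths over that variation, which is the variance reduction bagging is meant to provide.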

Key papers in this wiki

  • DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection (2024): LLM-based expert ensemble for misinformation detection; proposes three ensemble merging strategies (Vanilla, Confidence, Selective) that combine task-specific expert predictions using LLMs as judges; reports that merging with confidence scores yields better-calibrated predictions than the individual experts.
  • A Heuristic-driven Uncertainty based Ensemble Framework for Fake News Detection in Tweets and News Articles: Ensemble of multiple pre-trained language models (BERT, RoBERTa, XLNet, DeBERTa, ERNIE 2.0, ELECTRA) combined by soft voting for fake news detection; reports that soft voting outperforms both hard voting and the individual models; augmented with a Statistical Feature Fusion Network (SFFN) and heuristic post-processing to achieve state-of-the-art results on COVID-19 Fake News (F1=0.9892) and FakeNewsNet (F1=0.9156).