Pre-trained language models
Transfer learning by fine-tuning pre-trained contextual language models (BERT, RoBERTa, ELECTRA, DistilBERT, ELMo) has emerged as the state-of-the-art approach to fake news and misinformation detection. These models learn bidirectional contextual representations from large unlabeled corpora (for example, Wikipedia and BookCorpus for BERT) and transfer to downstream tasks with relatively little domain-specific labeled data, achieving high accuracy even in low-resource settings.
Key characteristics
Pre-training: Self-supervised learning on billions of tokens, via masked language modeling (BERT, RoBERTa), replaced-token detection (ELECTRA), or forward and backward LSTM language modeling (ELMo), produces contextualized word representations that capture semantic and syntactic knowledge.
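Transfer learning: Fine-tuning adds a single classification layer on top of the pre-trained encoder. In the standard recipe all weights are updated with a small learning rate (the BERT authors recommend 2e-5 to 5e-5) so that the pre-trained representations are adjusted rather than overwritten; alternatively, lower layers can be frozen to reduce training cost.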
Low-resource robustness: Unlike traditional machine learning or deep learning models trained from scratch, pre-trained models perform well with limited training data; in the benchmark study listed below they exceed 90% accuracy with only 500 labeled examples.
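Trade-offs: Inference is expensive (more parameters, slower prediction), but the training burden is lower because each new task needs only fine-tuning, not pre-training. Smaller variants reduce cost while remaining competitive: DistilBERT halves the number of Transformer layers, and ALBERT shrinks the parameter count through factorized embeddings and cross-layer parameter sharing, which saves memory but not per-token compute.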
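The pre-training objectives listed above differ mainly in how the input is corrupted. The sketch below follows the masking recipe described in the BERT paper (select about 15% of positions; replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged) and then derives ELECTRA-style replaced-token detection targets. It is simplified under stated assumptions: positions are sampled independently per token, tokens are plain strings rather than subword IDs, and the random vocabulary draw stands in for ELECTRA's small generator network. The names `bert_mask` and `electra_targets` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def bert_mask(
    tokens: Sequence[str],
    vocab: Sequence[str],
    rng: random.Random,
    mask_prob: float = 0.15,
) -> Tuple[List[str], List[Optional[str]]]:
    """Corrupt tokens for masked language modeling (simplified BERT recipe).

    Returns the corrupted sequence and per-position labels: the original token
    where the model must predict it, None where no loss is computed."""
    corrupted = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                          # position not selected for prediction
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # 80%: replace with the mask token
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return corrupted, labels


def electra_targets(original: Sequence[str], corrupted: Sequence[str]) -> List[int]:
    """Replaced-token detection labels: 1 if the token was changed, else 0."""
    return [int(o != c) for o, c in zip(original, corrupted)]


rng = random.Random(13)
vocab = ["fake", "news", "claims", "vaccine", "election", "viral"]
sentence = "viral post claims election was rigged".split()

masked, labels = bert_mask(sentence, vocab, rng, mask_prob=0.3)
print(masked)
print(labels)

# Assumed stand-in for ELECTRA's generator: fill each [MASK] slot with a random
# vocabulary token; the discriminator then learns to flag changed positions.
generated = [rng.choice(vocab) if t == MASK else t for t in masked]
print(electra_targets(sentence, generated))
```

Because labels are None at unselected positions, the MLM loss covers only about 15% of tokens, whereas the ELECTRA discriminator receives a target at every position; this is the efficiency argument made by the ELECTRA authors.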
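The size and cost trade-offs, and how small the added classification layer is, can be made concrete by counting parameters. The sketch below is a minimal estimate, assuming the standard Hugging Face base configurations (BERT-base: vocabulary 30,522, hidden size 768, 12 layers, feed-forward size 3,072; DistilBERT: 6 layers, no segment embeddings or pooler; ALBERT-base: vocabulary 30,000, 128-dimensional factorized embeddings, one layer shared 12 times). `EncoderConfig` and the helper functions are hypothetical names, and the compute proxy counts only dense matrix multiply-adds, ignoring attention scores, biases, and the embedding projection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical description of an encoder-only Transformer."""
    vocab_size: int = 30522
    hidden: int = 768
    layers: int = 12
    intermediate: int = 3072
    max_positions: int = 512
    type_vocab: int = 2                  # segment embeddings (0 = none)
    embedding_dim: Optional[int] = None  # ALBERT-style factorized embedding size
    share_layers: bool = False           # ALBERT-style cross-layer sharing
    pooler: bool = True                  # dense layer on the [CLS] vector


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm(d: int) -> int:
    """Gain and bias vectors of a LayerNorm."""
    return 2 * d


def encoder_params(cfg: EncoderConfig) -> int:
    e = cfg.embedding_dim or cfg.hidden
    h, f = cfg.hidden, cfg.intermediate
    # Token, position and segment embeddings, followed by a LayerNorm.
    total = (cfg.vocab_size + cfg.max_positions + cfg.type_vocab) * e + layer_norm(e)
    if e != h:
        total += linear(e, h)  # projection from embedding size up to hidden size
    attention = 4 * linear(h, h) + layer_norm(h)  # Q, K, V and output projections
    ffn = linear(h, f) + linear(f, h) + layer_norm(h)
    stored_layers = 1 if cfg.share_layers else cfg.layers
    total += stored_layers * (attention + ffn)
    if cfg.pooler:
        total += linear(h, h)
    return total


def dense_macs_per_token(cfg: EncoderConfig) -> int:
    """Rough per-token compute: multiply-adds of the dense layers only.
    Shared layers are still applied cfg.layers times."""
    h, f = cfg.hidden, cfg.intermediate
    return cfg.layers * (4 * h * h + 2 * h * f)


BERT_BASE = EncoderConfig()
DISTILBERT = EncoderConfig(layers=6, type_vocab=0, pooler=False)
ALBERT_BASE = EncoderConfig(vocab_size=30000, embedding_dim=128, share_layers=True)

for name, cfg in [("BERT-base", BERT_BASE), ("DistilBERT", DISTILBERT), ("ALBERT-base", ALBERT_BASE)]:
    print(f"{name:12s} params={encoder_params(cfg):>12,d}  dense MACs/token={dense_macs_per_token(cfg):>11,d}")

# A binary real/fake classification head on the 768-d [CLS] vector.
print("classification head params:", linear(768, 2))
```

Under these assumptions BERT-base comes to about 109.5M parameters, DistilBERT to about 66.4M with half the per-token dense compute, and ALBERT-base to about 11.7M with the same per-token compute as BERT-base, while the binary classification head adds only 1,538 parameters.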
Key papers in this wiki
- A Benchmark Study of Machine Learning Models for Online Fake News Detection: benchmark comparing BERT, RoBERTa, DistilBERT, ELECTRA, and ELMo on several fake news datasets. RoBERTa scores 96% on the large dataset and 98% on the election-focused dataset. Pre-trained models substantially outperform traditional ML and deep learning models, especially on small training sets (>90% accuracy with 500 samples vs. 65% for Naive Bayes and 75% for Bi-LSTM).
- Oshikawa, Qian, & Wang (2020), A Survey on Natural Language Processing for Fake News Detection: surveys NLP methods for detection, noting the recent dominance of pre-trained transformers and their superior accuracy over hand-crafted linguistic features.