Pre-trained language models

Transfer learning with pre-trained contextual language models (BERT, RoBERTa, ELECTRA, DistilBERT, ELMo) has emerged as a state-of-the-art approach to fake news and misinformation detection. These models learn bidirectional contextual representations from massive unlabeled corpora (Wikipedia, BookCorpus) and are then adapted to downstream tasks with little domain-specific labeled data, achieving high accuracy even in low-resource settings. The Transformer-based models are usually fine-tuned end to end, whereas ELMo is more often used as a source of fixed contextual features.

Key characteristics

Pre-training: Self-supervised learning on billions of tokens, via masked language modeling (BERT) or replaced token detection (ELECTRA), produces contextualized word representations that capture semantic and syntactic knowledge.
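The masked language modeling objective can be illustrated with a minimal sketch of the input corruption step. It assumes the masking recipe described in the BERT paper (select about 15% of positions; replace 80% of those with a mask token, 10% with a random token, and leave 10% unchanged). The function name `mask_tokens`, the toy vocabulary, and whitespace tokenization are illustrative assumptions, not part of any library.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token sequence for masked language modeling.

    Returns (corrupted, labels). labels[i] holds the original token at
    positions selected for prediction and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)            # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)    # 80%: replace with the mask token
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)     # 10%: keep the original token
        else:
            labels.append(None)           # position not used in the loss
            corrupted.append(tok)
    return corrupted, labels

# Toy usage with a hypothetical vocabulary and whitespace tokenization.
tokens = "the senator denied the viral claim".split()
vocab = ["the", "senator", "denied", "viral", "claim", "report", "photo"]
corrupted, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
```

During pre-training the model sees `corrupted` and is trained to predict the non-`None` entries of `labels`, which is what forces it to use context on both sides of each selected position.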

Transfer learning: Fine-tuning adds a single classification layer on top of the pre-trained encoder. The pre-trained parameters are either frozen or updated with a low learning rate, so that the representations learned during pre-training are preserved.
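The frozen-encoder variant of this setup can be sketched in a few lines. This is a toy illustration only: `frozen_encoder` is a hypothetical stand-in that computes two hand-written features, where a real system would use the pooled output of a pre-trained model such as BERT. The example texts and labels are invented for the sketch. Only the classification head (a logistic regression layer) is trained.

```python
import math

def frozen_encoder(text):
    """Hypothetical stand-in for a pre-trained encoder (never updated).

    Maps text to a fixed feature vector: the share of all-caps words and
    the (capped) number of exclamation marks per word.
    """
    words = text.split()
    n = max(len(words), 1)
    caps = sum(w.isupper() for w in words) / n
    bangs = min(text.count("!"), n) / n
    return [caps, bangs]

def predict(head, features):
    """Probability of the positive class from the classification head."""
    weights, bias = head
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(head, feats, labels):
    """Mean binary cross-entropy of the head on fixed features."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(feats, labels):
        p = predict(head, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(feats)

def fine_tune_head(feats, labels, lr=0.5, epochs=200):
    """Train only the head with full-batch gradient descent."""
    weights, bias = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(feats, labels):
            err = predict((weights, bias), x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(feats) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(feats)
    return weights, bias

# Invented toy data: 1 = fake, 0 = real.
texts = [
    "SHOCKING cure doctors HATE !!!",
    "Council approves annual budget",
    "You WON'T BELIEVE this trick!",
    "Study reports modest rise in rainfall",
]
labels = [1, 0, 1, 0]

feats = [frozen_encoder(t) for t in texts]  # computed once; encoder is frozen
head = fine_tune_head(feats, labels)
```

Because the encoder is frozen, its outputs can be computed once and cached, which keeps training cheap. Updating the encoder as well, with a low learning rate, typically yields better accuracy at a higher training cost.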

Low-resource robustness: Unlike traditional machine learning models or deep networks trained from scratch, pre-trained models perform strongly with limited training data, often exceeding 90% accuracy with only 500 labeled examples.

Trade-offs: Inference is expensive (more parameters, slower prediction), but the training burden is lower than training a deep model from scratch. Smaller variants (DistilBERT, ALBERT) reduce computational requirements while maintaining competitive performance.

Key papers in this wiki