BERT and transformer embeddings

BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained contextual language models achieve strong, often state-of-the-art, results on fake news detection and misinformation identification because they capture semantic meaning and long-range dependencies in text.

Key papers

  • RoBERTa: A Robustly Optimized BERT Pretraining Approach — Systematically studies BERT's pretraining design choices and shows that BERT was undertrained. Proposes RoBERTa, which uses dynamic masking, drops the NSP (next sentence prediction) loss, and trains on longer sequences with larger batch sizes, achieving state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks at the time of publication. It has since become a widely used pretrained base model for downstream applications.
  • A Benchmark Study of Machine Learning Models for Online Fake News Detection — Comprehensive benchmark comparing 19 machine learning models on fake news datasets; demonstrates BERT-based models (RoBERTa) achieve 96%+ accuracy on large datasets and >90% with 500 training samples, substantially outperforming traditional ML and CNN/LSTM approaches.
  • FakeBERT: Fake News Detection in Social Media with a BERT-based Deep Learning Approach — Combines BERT embeddings with parallel 1D CNNs using varying kernel sizes, achieving 98.90% accuracy on a real-world fake news dataset and demonstrating the effectiveness of bidirectional transformer embeddings for social media misinformation detection.
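
The dynamic masking mentioned in the RoBERTa entry can be illustrated with a minimal sketch. It contrasts a mask pattern fixed once during preprocessing with one redrawn every time a sequence is seen. The function names, the 15% masking rate default, and the seed are assumptions made for illustration. The sketch also simplifies real masked-language-model preprocessing, which does not replace every selected token with `[MASK]`.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob, rng):
    """Return a copy of tokens with about mask_prob of positions set to [MASK]."""
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]

def static_masking(tokens, epochs, mask_prob=0.15, seed=0):
    """Mask once up front and reuse the same pattern in every epoch."""
    masked = mask_tokens(tokens, mask_prob, random.Random(seed))
    return [list(masked) for _ in range(epochs)]

def dynamic_masking(tokens, epochs, mask_prob=0.15, seed=0):
    """Draw a fresh mask pattern each time the sequence is fed to the model."""
    rng = random.Random(seed)
    return [mask_tokens(tokens, mask_prob, rng) for _ in range(epochs)]

if __name__ == "__main__":
    sentence = "officials denied the claim that circulated widely online".split()
    for epoch, view in enumerate(dynamic_masking(sentence, epochs=3)):
        print(epoch, " ".join(view))
```

With static masking the model sees the same masked positions for a given sentence in every epoch. With dynamic masking the positions vary, so the same sentence yields different prediction targets across epochs.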

Notes

BERT's bidirectional training and pre-training on large corpora provide substantial advantages over static word embeddings (GloVe, Word2Vec) for fake news detection. A static embedding assigns each word a single vector regardless of context, whereas a BERT embedding for a token depends on the whole surrounding sentence. This contextualization captures the semantic nuance and long-distance dependencies that are critical for understanding writing style and factual claims in misinformation.
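
The difference between static and contextual embeddings can be shown with a toy sketch. It is not BERT: the two-dimensional vectors are invented for illustration, and the "contextual" step is a single pass of dot-product self-attention with no learned weights, whereas BERT stacks many layers with learned projections and multiple heads. The point is only that a lookup table gives "bank" one vector everywhere, while attention over the sentence gives it a different vector in each context.

```python
import math

# Hypothetical static vectors (a GloVe/Word2Vec-style lookup table).
# Dimension 0 loosely means "nature", dimension 1 loosely means "finance".
STATIC = {
    "the": [0.1, 0.1],
    "river": [1.0, 0.0],
    "money": [0.0, 1.0],
    "bank": [0.5, 0.5],
}

def static_embed(tokens):
    """Look up one fixed vector per word, ignoring context."""
    return [STATIC[t] for t in tokens]

def contextual_embed(tokens):
    """Mix each token's vector with its neighbours via softmax attention."""
    vecs = static_embed(tokens)
    out = []
    for query in vecs:
        scores = [sum(a * b for a, b in zip(query, key)) for key in vecs]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[d] for w, v in zip(weights, vecs)) / total
            for d in range(len(query))
        ])
    return out

if __name__ == "__main__":
    for sentence in (["the", "river", "bank"], ["the", "money", "bank"]):
        print(sentence, "static:", static_embed(sentence)[2],
              "contextual:", contextual_embed(sentence)[2])
```

In this toy setup the static vector for "bank" is identical in both sentences, while the contextual vector leans toward the "nature" dimension next to "river" and toward the "finance" dimension next to "money".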