BERT and transformer embeddings

BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained contextual language models have achieved state-of-the-art results on fake news detection and misinformation identification. They do so by capturing semantic meaning and long-range dependencies in text.

Key papers

RoBERTa: A Robustly Optimized BERT Pretraining Approach. Systematically studies BERT pretraining design choices and shows that BERT was undertrained. Proposes RoBERTa, which uses dynamic masking, drops the next sentence prediction (NSP) loss, and trains on longer sequences with larger batches. It achieved state-of-the-art results on GLUE, RACE, and SQuAD at publication and became a widely used pretrained base for downstream applications.
A Benchmark Study of Machine Learning Models for Online Fake News Detection. Benchmarks 19 machine learning models on fake news datasets. BERT-based models (RoBERTa) reach 96%+ accuracy on large datasets and over 90% with only 500 training samples, substantially outperforming traditional ML and CNN/LSTM approaches.
FakeBERT: Fake News Detection in Social Media with a BERT-based Deep Learning Approach. Combines BERT embeddings with parallel 1D CNNs of varying kernel sizes, achieving 98.90% accuracy on a real-world fake news dataset and demonstrating the effectiveness of bidirectional transformer embeddings for detecting misinformation on social media.
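
The dynamic masking named in the RoBERTa entry can be illustrated with a minimal sketch. Static masking picks the masked positions once during preprocessing, so every epoch sees the same mask. Dynamic masking samples a fresh mask each time a sequence is served to the model. The helper `mask_tokens`, the example sentence, and the simplified rule of always substituting `[MASK]` are assumptions made for illustration; the actual BERT scheme also leaves some selected tokens unchanged or swaps in random tokens.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, rng, mask_prob=0.15):
    """Return a copy of tokens with about mask_prob of positions set to [MASK].

    Simplified for illustration: at least one token is always masked,
    and every selected token is replaced by [MASK].
    """
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]

# Hypothetical example sentence (13 tokens, so 2 get masked at 15%).
tokens = "the senator denied the report that was published last week by the agency".split()

# Static masking: the mask is chosen once and reused for every epoch.
static_masked = mask_tokens(tokens, random.Random(0))
static_epochs = [static_masked for _ in range(3)]

# Dynamic masking: a new mask is sampled every time the sequence is served.
dynamic_rng = random.Random(0)
dynamic_epochs = [mask_tokens(tokens, dynamic_rng) for _ in range(3)]

for epoch, masked in enumerate(dynamic_epochs):
    print(epoch, " ".join(masked))
```

Because the dynamic variant draws from the random generator on every call, the model is not tied to one fixed set of masked positions across epochs.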

Related topics

Deep learning (implementation method)
Natural Language Processing (broader discipline)
Transformers (architectural family)
Fake news detection (application domain)

Notes

BERT's bidirectional training and pre-training on large corpora give it substantial advantages over static word embeddings (GloVe, Word2Vec) for fake news detection. Static embeddings assign each word a single vector regardless of context, whereas BERT's contextualized embeddings capture the semantic nuance and long-distance dependencies needed to model writing style and factual claims in misinformation.
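
To make the contrast with GloVe- or Word2Vec-style embeddings concrete, here is a toy sketch. It is not BERT. A static lookup returns the same vector for "bank" in every sentence, while a single self-attention step mixes in the neighbouring tokens, so the vector for "bank" depends on its context. The 2-dimensional vectors in `STATIC` and the helper `contextualize` are invented for illustration; real models use learned query/key/value projections, many layers, and hundreds of dimensions.

```python
import math

# Hypothetical 2-d static embeddings; the values are made up for illustration.
STATIC = {
    "the":   [0.1, 0.1],
    "river": [1.0, 0.0],
    "money": [0.0, 1.0],
    "bank":  [0.5, 0.5],
}

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contextualize(tokens):
    """One self-attention step with identity query/key/value projections.

    Each output vector is a weighted average of all token vectors in the
    sentence, weighted by dot-product similarity to the current token.
    """
    vecs = [STATIC[t] for t in tokens]
    out = []
    for q in vecs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in vecs])
        out.append([sum(w * v[d] for w, v in zip(weights, vecs))
                    for d in range(len(q))])
    return out

sent_a = ["the", "river", "bank"]
sent_b = ["the", "money", "bank"]

bank_a = contextualize(sent_a)[2]  # "bank" next to "river"
bank_b = contextualize(sent_b)[2]  # "bank" next to "money"

print("static :", STATIC["bank"], STATIC["bank"])  # same vector in both sentences
print("context:", bank_a, bank_b)                  # differs by sentence
```

In this toy setup the contextual vector for "bank" shifts toward the "river" axis in the first sentence and toward the "money" axis in the second, which is the kind of context sensitivity a static table cannot express.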
