Transformers
The Transformer is a neural network architecture that relies on self-attention, rather than recurrence or convolution, to capture dependencies between positions in a sequence. Recurrent models step through a sequence token by token. Each Transformer layer instead relates all positions to one another in a few matrix operations. This makes training highly parallelizable, and Transformers have become the foundation for most modern NLP models.
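The parallelism described above comes from computing attention for every position at once. Below is a minimal sketch of single-head scaled dot-product self-attention, softmax(QKᵀ / √d_k)·V, which is the form introduced in the original Transformer paper. It makes several simplifying assumptions. Matrices are plain Python lists. The projection matrices `w_q`, `w_k`, and `w_v` are illustrative and supplied by the caller. There is no masking and there are no multiple heads. Real implementations use a tensor library and learned weights.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over x (n tokens x d_model)."""
    q = matmul(x, w_q)  # queries: n x d_k
    k = matmul(x, w_k)  # keys:    n x d_k
    v = matmul(x, w_v)  # values:  n x d_v
    d_k = len(w_q[0])
    # One matrix product scores every position against every other position;
    # there is no sequential dependence between time steps.
    scores = matmul(q, transpose(k))  # n x n
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)  # each output row is a weighted mix of value rows


# Usage: three 2-dimensional token vectors with identity projections (illustrative only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(x, identity, identity, identity)
print(len(out), len(out[0]))  # 3 2
```

The √d_k scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax toward very peaked distributions.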
Foundational work
- Attention Is All You Need — the original Transformer paper, introducing the architecture and demonstrating state-of-the-art results on machine translation.
Key papers
- RoBERTa: A Robustly Optimized BERT Pretraining Approach — systematic study of BERT pretraining design choices, proposing improvements such as dynamic masking, removal of the next-sentence prediction (NSP) objective, training on longer sequences, and larger batches; establishes RoBERTa as a strong pretrained Transformer baseline for transfer learning across NLP tasks.
- A Benchmark Study of Machine Learning Models for Online Fake News Detection — comprehensive benchmark of machine learning approaches to fake news detection; finds that Transformer-based models (BERT, RoBERTa) substantially outperform traditional machine learning and CNN/LSTM models.
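The dynamic masking listed among RoBERTa's changes can be sketched as follows. Static masking fixes the masked positions during data preprocessing. Dynamic masking draws a fresh mask each time a sequence is fed to the model. The selection rule here is BERT's, which RoBERTa keeps:

- Choose about 15% of positions.
- Replace 80% of the chosen positions with the mask token.
- Replace 10% with a random token.
- Leave the remaining 10% unchanged.

This is a simplified sketch under stated assumptions. `MASK_ID` and `VOCAB_SIZE` are hypothetical placeholders. Token IDs are plain integers. Special tokens are not excluded, although real tokenizers would exclude them.

```python
import random
from typing import List, Optional, Tuple

MASK_ID = 0        # hypothetical id of the [MASK] token
VOCAB_SIZE = 100   # hypothetical vocabulary size

Masked = Tuple[List[int], List[Optional[int]]]


def mask_tokens(token_ids: List[int], rng: random.Random, mask_prob: float = 0.15) -> Masked:
    """Return (inputs, labels); labels hold the original id at selected positions, None elsewhere."""
    inputs = list(token_ids)
    labels: List[Optional[int]] = [None] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID                    # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


def dynamic_masking_epochs(sequence: List[int], num_epochs: int, seed: int = 0) -> List[Masked]:
    """Dynamic masking: draw a new mask for the same sequence on every pass."""
    rng = random.Random(seed)
    return [mask_tokens(sequence, rng) for _ in range(num_epochs)]


# Usage: the same sequence receives a different mask on each of three passes.
seq = list(range(1, 21))
for inputs, labels in dynamic_masking_epochs(seq, num_epochs=3):
    print(inputs)
```

A static variant would call `mask_tokens` once and reuse that result for every epoch. Re-sampling the mask exposes the model to more distinct masked versions of each sequence over training.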
Related topics
- Attention mechanisms in NLP — core mechanism in Transformers
- Sequence Modeling — the broader field of modeling sequential data, including sequence-to-sequence learning
- Neural NLP — neural network approaches to natural language processing, the main application domain of Transformers