Transformers¶

The Transformer is a neural network architecture that relies entirely on attention mechanisms, with no recurrence or convolution, to capture dependencies in sequences. Unlike recurrent architectures, which process tokens one step at a time, self-attention relates all positions of a sequence at once, so training parallelizes well. Transformers have become the foundation for most modern NLP models.
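To make the mechanism concrete, here is a minimal sketch of single-head scaled dot-product self-attention, softmax(QKᵀ / √d_k)V, written in plain Python. The helper names (`matmul`, `softmax`, `self_attention`, `random_matrix`), the toy dimensions, and the random projection matrices are assumptions for illustration only; real models learn the projections and add multiple heads, masking, and batching.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over x with shape (seq_len, d_model)."""
    # Project the same input into queries, keys, and values.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    # Every position scores every other position in one matrix product.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned parameters or token embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, d_k = 4, 8, 8  # toy sizes chosen for the example
x = random_matrix(seq_len, d_model, rng)
w_q, w_k, w_v = (random_matrix(d_model, d_k, rng) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
print(len(out), len(out[0]))  # prints: 4 8
```

Because the score matrix is computed for all positions in a single product, no step depends on the output of a previous position, which is what allows the parallel training described above.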

Foundational work¶

Attention Is All You Need — The original Transformer paper (Vaswani et al., 2017), introducing the architecture and reporting state-of-the-art machine translation results at the time of publication.

Key papers¶

RoBERTa: A Robustly Optimized BERT Pretraining Approach — Systematic study of BERT pretraining design choices, showing gains from dynamic masking, removal of the next sentence prediction (NSP) objective, longer sequences, and larger batches; establishes RoBERTa as a widely used pretrained Transformer baseline for transfer learning across NLP tasks.
A Benchmark Study of Machine Learning Models for Online Fake News Detection — Comprehensive benchmark comparing machine learning approaches for fake news detection; demonstrates that Transformer-based models (BERT, RoBERTa) substantially outperform traditional machine learning and CNN/LSTM methods.

Related topics¶

Attention mechanisms in NLP — core mechanism in Transformers
Sequence Modeling — broader context of sequence-to-sequence learning
Neural NLP — application domain in natural language processing
