Skip to content

Transformers

The Transformer is a neural network architecture that relies entirely on self-attention mechanisms to capture dependencies in sequences. Unlike recurrent and convolutional architectures, Transformers allow for highly parallelizable training and have become the foundation for most modern NLP models.

Foundational work

  • Attention Is All You Need — the original Transformer paper, introducing the architecture and demonstrating state-of-the-art results on machine translation.

Key papers