Language Models
Language models are statistical or neural models that assign probabilities to sequences of tokens (words, characters, or subword units). Most are trained to predict the next token given the preceding tokens, typically by maximum likelihood estimation; masked models such as BERT instead predict held-out tokens from the surrounding context.
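As a rough illustration of the definition above, the sketch below fits a bigram model by maximum likelihood (relative frequencies of observed token pairs) and scores a sequence with the chain rule. The toy corpus, the `<s>`/`</s>` boundary markers, and the function names `train_bigram` and `sequence_log_prob` are assumptions made for this example. Neural language models replace the count table with a learned network, but the training objective has the same form.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Estimate P(next | prev) by maximum likelihood, i.e. relative frequency."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        # Hypothetical boundary markers so the model also learns starts and ends.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {tok: c / sum(nxts.values()) for tok, c in nxts.items()}
        for prev, nxts in counts.items()
    }


def sequence_log_prob(model, sentence):
    """Chain rule: log P(w_1..w_n) = sum_i log P(w_i | w_{i-1})."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p = model.get(prev, {}).get(nxt, 0.0)
        if p == 0.0:
            # Unseen bigram: an unsmoothed MLE model assigns it zero probability.
            return float("-inf")
        total += math.log(p)
    return total


# Toy corpus, assumed purely for illustration.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigram(corpus)

print(model["the"])                                   # next-token distribution after "the"
print(math.exp(sequence_log_prob(model, "the cat sat")))  # probability of a seen sentence
print(sequence_log_prob(model, "the bird sat"))        # contains an unseen bigram
```

The unsmoothed estimate gives zero probability to any sequence containing an unseen token pair, which is one reason practical count-based models add smoothing and why neural models generalize through shared representations.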
Key papers
- Efficient Estimation of Word Representations in Vector Space — Proposes CBOW and Skip-gram architectures for learning word representations efficiently from large corpora; foundational to modern embedding-based approaches
- RoBERTa: A Robustly Optimized BERT Pretraining Approach — Systematic study of BERT pretraining design choices showing that the original BERT was significantly undertrained; proposes RoBERTa, which uses dynamic masking, drops the next sentence prediction (NSP) objective, and trains on longer sequences with larger batches. It achieved state-of-the-art results at publication and became a canonical base model for fine-tuning
- [[2018-howard-ulmfit]] — Demonstrates practical fine-tuning of pretrained language models for text classification using discriminative layer-wise learning rates and gradual unfreezing; widely applicable transfer learning approach for NLP
- Discovering Latent Knowledge in Language Models Without Supervision — Probes a model's internal activations without labels, searching for a direction that satisfies logical consistency constraints (a statement and its negation should receive opposite truth values)
- Discovering Language Model Behaviors with Model-Written Evaluations — Uses language models to generate evaluations for testing diverse LM behaviors including bias, political views, and goal-seeking
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection — Teaches language models to perform adaptive retrieval and self-critique via reflection tokens
- [[2020-guu-realm]] — Augments language model pre-training with a learned neural knowledge retriever; introduces unsupervised pre-training that jointly learns retrieval and masked language modeling
- Language Models are Few-Shot Learners — Introduces GPT-3 and shows that sufficiently large language models can perform new tasks from a few examples given in the prompt, with no fine-tuning or gradient updates
- Atlas: Few-shot Learning with Retrieval Augmented Language Models — Combines T5 language models with dense retrieval for efficient few-shot learning on knowledge-intensive tasks
- The Spread of True and False News Online — Studies information propagation, relevant to understanding how language models affect content spread
- A General Language Assistant as a Laboratory for Alignment — Develops an evaluation framework for language model alignment using human feedback; compares the scaling properties of training techniques