Text Representations
Text representations are methods for converting text into continuous vectors or other learned representations that capture semantic and syntactic properties. Learning these representations is foundational to modern NLP because it lets algorithms operate on meaningful features rather than on discrete tokens.
Approaches
- Word embeddings: Static vector representations of individual words, with one fixed vector per word regardless of context
- Phrase embeddings: Representations of multi-word units and idioms
- Contextual embeddings: Context-dependent representations (e.g., BERT, ELMo) that assign different vectors to the same word in different contexts
- Document embeddings: Representations of entire documents, paragraphs, or sentences
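The static and document-level approaches above can be illustrated with a minimal sketch. The vectors in `WORD_VECTORS` are made-up toy values, not trained embeddings, and `embed_word`, `embed_document`, and `cosine` are hypothetical helper names. Mean pooling is only one simple way to build a document vector and is assumed here for illustration.

```python
import math

# Toy 3-dimensional vectors, invented for illustration (not trained).
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
}


def embed_word(word):
    """Static lookup: the same word always maps to the same vector."""
    return WORD_VECTORS[word]


def embed_document(tokens):
    """Mean-pool the vectors of known tokens into one document vector."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        raise ValueError("no known tokens to embed")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


pets = embed_document(["cat", "dog"])
print("cat vs dog:", cosine(embed_word("cat"), embed_word("dog")))
print("cat vs car:", cosine(embed_word("cat"), embed_word("car")))
print("pets doc vs car:", cosine(pets, embed_word("car")))
```

A contextual model differs in that the lookup step is replaced by a function of the whole sentence, so the same word can receive different vectors in different sentences. This sketch does not attempt to show that.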
Key Papers
- Efficient Estimation of Word Representations in Vector Space — Foundational work on efficient word embedding methods (CBOW, Skip-gram)
- Distributed Representations of Words and Phrases and their Compositionality — Extends embeddings to phrases with data-driven identification
- Attention Is All You Need — Transformer architecture enabling contextual embeddings and modern language models
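The two objectives named in the first paper differ in the direction of prediction: Skip-gram predicts each context word from the center word, while CBOW predicts the center word from its surrounding context. The sketch below only generates the training examples each objective would consume. The function names `skipgram_pairs` and `cbow_examples` are hypothetical, and the fixed symmetric `window` is a simplifying assumption rather than a description of any particular implementation.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """Return (context_words, center) examples, one per token position."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


sentence = ["the", "quick", "brown", "fox"]
for pair in skipgram_pairs(sentence, window=1):
    print("skip-gram:", pair)
for example in cbow_examples(sentence, window=1):
    print("cbow:", example)
```

A model trained on these examples would learn one vector per vocabulary word. The training loop itself is omitted here.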
Related Topics
- Word Embeddings (word-level representations)
- Phrase Embeddings (multi-word representations)
- Natural Language Processing (NLP applications)