Text Representations

Text representations are methods for converting text into continuous vectors, or other learned representations, that capture semantic and syntactic properties. Learning these representations is foundational to modern NLP: it lets algorithms operate on dense features in which related words and passages lie close together, rather than on discrete token identifiers that carry no notion of similarity.

Approaches

Word embeddings: Static vector representations of individual words
Phrase embeddings: Representations of multi-word units and idioms
Contextual embeddings: Context-dependent representations (e.g., BERT, ELMo) that assign different vectors to the same word in different contexts
Document embeddings: Representations of entire documents, paragraphs, or sentences
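
The difference between word-level and document-level representations in the list above can be sketched with a toy example. This is a minimal illustration, not a trained model: the `WORD_VECTORS` table, its three-dimensional values, and the helper names `cosine` and `mean_pool` are all assumptions made up for this sketch. Averaging word vectors is only one simple way to obtain a sentence or document embedding.

```python
import math

# Hypothetical 3-dimensional static word vectors (made-up values, not trained).
# "Static" means each word maps to exactly one vector, whatever its context.
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
    "sat": [0.2, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pool(tokens):
    """Average the vectors of known tokens into one fixed-size vector."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        raise ValueError("no known tokens to pool")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Word level: related words should score higher than unrelated ones.
print(cosine(WORD_VECTORS["cat"], WORD_VECTORS["dog"]))
print(cosine(WORD_VECTORS["cat"], WORD_VECTORS["car"]))

# Sentence level: one vector for the whole token sequence.
print(mean_pool(["the", "cat", "sat"]))
```

A contextual model would instead return a different vector for the same word in each sentence, so a fixed lookup table like this one could not represent it.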

Key Papers

Efficient Estimation of Word Representations in Vector Space: Foundational work on efficient word embedding training, introducing the CBOW and Skip-gram models
Distributed Representations of Words and Phrases and their Compositionality: Extends word embeddings to phrases, which are identified from corpus co-occurrence statistics rather than a hand-built list
Attention Is All You Need: Introduces the Transformer architecture, which underlies contextual embedding models such as BERT and most modern language models
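
The two objectives named in the first paper differ in the direction of prediction: Skip-gram predicts each surrounding word from the center word, while CBOW predicts the center word from its surrounding words. The sketch below only shows how training examples could be generated for each. The function names `skipgram_pairs` and `cbow_examples` and the `window` parameter are assumptions for illustration, and the model and training loop are omitted.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: one pair per context word in the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """Return (context_words, target) examples: the whole window predicts the center."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples


sentence = ["the", "cat", "sat"]
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

Skip-gram therefore produces more, smaller training examples from the same text, whereas CBOW combines the window into a single prediction per position.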

Related topics

Word Embeddings (word-level representations)
Phrase Embeddings (multi-word representations)
Natural Language Processing (NLP applications)
