Text Representations

Methods for converting text into continuous vectors, or other learned representations, that capture semantic and syntactic properties. Text representation learning is foundational to modern NLP because it lets algorithms operate on dense, meaningful features rather than on discrete tokens.

Approaches

  • Word embeddings: Static vector representations of individual words (e.g., word2vec, GloVe); each word gets one vector regardless of context
  • Phrase embeddings: Representations of multi-word units and idioms
  • Contextual embeddings: Context-dependent representations (e.g., BERT, ELMo) that assign different vectors to the same word in different contexts
  • Document embeddings: Representations of longer spans of text, such as sentences, paragraphs, or entire documents
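
The difference between these approaches can be made concrete with a minimal sketch. It is an illustration only: the 3-dimensional vectors in STATIC are hand-picked, hypothetical values, and toy_contextual_embed is a crude stand-in that averages a word's vector with its neighbours' vectors. It is not how BERT or ELMo compute representations; it only shows the property that the same word can receive different vectors in different contexts. Mean pooling is likewise just one simple way to form a document-level vector.

```python
import math

# Hypothetical static word vectors, hand-picked for illustration only.
STATIC = {
    "the":   [0.0, 0.0, 1.0],
    "river": [0.9, 0.1, 0.0],
    "money": [0.1, 0.9, 0.0],
    "bank":  [0.5, 0.5, 0.0],
}


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def static_embed(tokens):
    """Word embeddings: one fixed vector per word, independent of context."""
    return [STATIC[t] for t in tokens]


def toy_contextual_embed(tokens, window=1):
    """Toy stand-in for contextual embeddings (an assumption, not BERT/ELMo):
    each token's vector is the mean of the static vectors in a small window."""
    static = static_embed(tokens)
    out = []
    for i in range(len(tokens)):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        out.append(mean_vector(static[lo:hi]))
    return out


def document_embed(tokens):
    """Document embedding by mean pooling the static word vectors."""
    return mean_vector(static_embed(tokens))


s1 = ["the", "river", "bank"]
s2 = ["the", "money", "bank"]

# Static: "bank" has the same vector in both sentences.
same_static = static_embed(s1)[2] == static_embed(s2)[2]

# Toy contextual: "bank" differs because its neighbours differ.
same_contextual = toy_contextual_embed(s1)[2] == toy_contextual_embed(s2)[2]

# Document-level similarity between the two sentences.
doc_similarity = cosine(document_embed(s1), document_embed(s2))
```

Here same_static is true and same_contextual is false, which is the distinction the list above draws between static and context-dependent representations. In practice the vectors would come from a trained model rather than a hand-written table.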

Key Papers