Phrase Embeddings
Distributed vector representations of multi-word units (phrases, idioms, named entities) that capture meaning beyond what the vectors of their individual words compose to. Phrase embeddings address a basic limitation of word embeddings: many phrases have non-compositional meanings that combining word vectors does not capture. For example, "Boston Globe" names a newspaper, which is not a natural combination of the meanings of "Boston" and "Globe"; "New York Times" behaves the same way.
Key Papers
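- Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., NIPS 2013): identifies phrases in a data-driven way by scoring bigrams from unigram and bigram counts, then treats each detected phrase as a single token during training; reports 72% accuracy on a phrase analogy dataset (best model, trained on about 33 billion words); shows that word vectors compose additively, e.g. vec("Czech") + vec("currency") lies closest to vec("koruna").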
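The phrase-detection step in that paper scores a bigram as score(w_i, w_j) = (count(w_i w_j) - δ) / (count(w_i) × count(w_j)), where δ is a discounting coefficient that keeps very infrequent word pairs from becoming phrases; bigrams scoring above a chosen threshold are merged into one token, and the paper runs 2 to 4 passes with decreasing thresholds so longer phrases can form. Below is a minimal Python sketch of that idea, assuming whitespace-tokenized sentences. The function names (`score_bigrams`, `merge_phrases`, `detect_phrases`), the underscore-joined phrase tokens, the greedy left-to-right merge, and the toy corpus and threshold values are all hypothetical illustrative choices, not the paper's code.

```python
from collections import Counter
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]


def score_bigrams(sentences: List[List[str]], delta: float) -> Dict[Bigram, float]:
    """Score every adjacent word pair as (count(a b) - delta) / (count(a) * count(b))."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    # delta discounts rare pairs: a pair seen <= delta times scores <= 0.
    return {
        (a, b): (n - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n in bigrams.items()
    }


def merge_phrases(
    sentences: List[List[str]], scores: Dict[Bigram, float], threshold: float
) -> List[List[str]]:
    """Greedily join adjacent pairs scoring above threshold into one token, left to right."""
    merged = []
    for tokens in sentences:
        out: List[str] = []
        i = 0
        while i < len(tokens):
            pair = (tokens[i], tokens[i + 1]) if i + 1 < len(tokens) else None
            if pair is not None and scores.get(pair, 0.0) > threshold:
                out.append("_".join(pair))  # hypothetical convention, e.g. "boston_globe"
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        merged.append(out)
    return merged


def detect_phrases(
    sentences: List[List[str]], delta: float, thresholds: List[float]
) -> List[List[str]]:
    """Re-score and merge once per threshold (pass them in decreasing order)."""
    for threshold in thresholds:
        sentences = merge_phrases(sentences, score_bigrams(sentences, delta), threshold)
    return sentences


# Hypothetical toy corpus: "the" is frequent, so "the boston" scores low
# while "boston globe" scores high.
corpus = [s.split() for s in [
    "the boston globe ran the story",
    "she reads the boston globe",
    "boston has a cold winter",
    "the globe is round",
    "the story was long",
    "the cat sat on the mat",
]]

scores = score_bigrams(corpus, delta=1.0)
merged = merge_phrases(corpus, scores, threshold=0.09)
print(merged[0])  # ['the', 'boston_globe', 'ran', 'the', 'story']
```

Here "boston globe" scores 1/9 while "the boston" scores only 1/21, because dividing by the unigram counts penalizes pairs involving a very common word. Since this sketch does not normalize by corpus size, a usable threshold depends on the corpus. Libraries such as gensim ship a `Phrases` model for this step.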
Related Topics
- Word Embeddings (atomic units)
- Text Representations (broader category)
- Natural Language Processing (NLP applications)