Long-Context Language Models
Language models that can process much longer input sequences than earlier transformer models. Recent advances have extended context windows from roughly 2K tokens to 100K+ tokens through techniques such as sparse attention, rotary position embeddings (RoPE) and methods for scaling them to longer positions, and more efficient attention implementations. However, a longer context window does not guarantee that the model makes effective use of the information inside it.
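Rotary position embeddings encode a token's position by rotating pairs of query/key dimensions. As a result, the query-key dot product depends only on the relative offset between two tokens. The sketch below is a minimal pure-Python illustration that assumes the standard RoPE frequency schedule (`base = 10000`). The `scale` parameter is an assumption added for illustration: it shows linear position interpolation, one published way of reusing a RoPE model beyond its training length by squeezing positions back into the trained range. It is not something this note itself describes. The names `rope` and `dot` are hypothetical helpers.

```python
import math
from typing import List


def rope(x: List[float], pos: float, base: float = 10000.0, scale: float = 1.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by the angle (pos * scale) * theta_i.

    theta_i = base ** (-2i / d) is the usual RoPE frequency schedule.
    scale < 1 sketches linear position interpolation (an assumption here):
    e.g. scale = 2048 / 8192 maps position 8000 onto the trained position 2000.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    m = pos * scale  # effective (possibly interpolated) position
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2 * i / d)
        c, s = math.cos(m * theta), math.sin(m * theta)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2-D rotation of the pair by angle m * theta
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: List[float], v: List[float]) -> float:
    """Plain dot product, standing in for an unnormalized attention score."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.5, 0.8]
    k = [1.0, 0.4, -0.7, 0.2]
    # Same relative offset (5) at different absolute positions gives the same score
    print(dot(rope(q, 10), rope(k, 5)), dot(rope(q, 105), rope(k, 100)))
```

Because each pair is only rotated, vector norms are preserved. Every position the model sees, interpolated or not, is expressed as an angle. Interpolation keeps those angles inside the range seen during training instead of extrapolating to unseen ones.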
Key papers
- [[2023-liu-lost-in-middle]] — demonstrates that models struggle to use information in the middle of long contexts despite having large context windows
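The kind of probe behind this finding can be sketched as a position sweep. Hold the question and documents fixed, move the single answer-bearing document from the first slot to the last, and measure accuracy at each slot. The sketch below is loosely modeled on a multi-document question-answering setup and rests on several assumptions: a caller-supplied `generate` function wrapping some model, the prompt template, and a simple substring-match correctness check. None of these are taken from the paper. `build_prompt` and `accuracy_by_position` are hypothetical helpers, and every example is assumed to have the same number of distractor documents.

```python
from typing import Callable, Dict, List, Tuple

# (question, answer, answer-bearing document, distractor documents)
Example = Tuple[str, str, str, List[str]]


def build_prompt(question: str, gold: str, distractors: List[str], gold_position: int) -> str:
    """Place the answer-bearing document at index gold_position among the distractors."""
    if not 0 <= gold_position <= len(distractors):
        raise ValueError("gold_position out of range")
    docs = distractors[:gold_position] + [gold] + distractors[gold_position:]
    lines = [f"Document [{i + 1}]: {d}" for i, d in enumerate(docs)]
    return (
        "Answer the question using the documents below.\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )


def accuracy_by_position(generate: Callable[[str], str], examples: List[Example]) -> Dict[int, float]:
    """Fraction of examples answered correctly for each position of the gold document."""
    n_positions = len(examples[0][3]) + 1  # assumes a fixed number of distractors
    hits = {p: 0 for p in range(n_positions)}
    for question, answer, gold, distractors in examples:
        for p in range(n_positions):
            output = generate(build_prompt(question, gold, distractors, p))
            # Substring match is a simplifying assumption of this sketch
            if answer.lower() in output.lower():
                hits[p] += 1
    return {p: hits[p] / len(examples) for p in range(n_positions)}
```

A model that behaves as the paper describes would score higher at the first and last positions than in the middle. As a sanity check, a toy `generate` that only "reads" the first and last documents would produce exactly that shape.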