Large Language Models
Large language models (LLMs) are transformer-based neural language models trained on massive text corpora; they achieve state-of-the-art performance across a wide range of downstream NLP tasks. Scaling laws suggest that performance improves smoothly and predictably as model size, training data, and compute increase, while some emergent capabilities appear only beyond certain scale thresholds.
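As a rough illustration of what "improves smoothly" means, scaling-law studies often model test loss as a power law in parameter count, L(N) = (N_C / N)^ALPHA. The sketch below evaluates that form. The constants `N_C` and `ALPHA` and the helper names are placeholders chosen for illustration, not fitted values from any paper listed on this page.

```python
# Illustrative only: a power-law loss curve L(N) = (N_C / N) ** ALPHA.
# N_C and ALPHA are placeholder constants, not values from this page's sources.
N_C = 1e14    # hypothetical scale constant (in parameters)
ALPHA = 0.08  # hypothetical power-law exponent


def predicted_loss(num_params: float) -> float:
    """Return the modeled loss for a model with `num_params` parameters."""
    if num_params <= 0:
        raise ValueError("num_params must be positive")
    return (N_C / num_params) ** ALPHA


def loss_ratio(scale_factor: float) -> float:
    """Factor by which the modeled loss changes when N grows by `scale_factor`."""
    return scale_factor ** (-ALPHA)


if __name__ == "__main__":
    for n in (1e8, 1e9, 1e10, 1e11):
        print(f"N = {n:.0e}  modeled loss = {predicted_loss(n):.3f}")
```

Under this form, every tenfold increase in parameters multiplies the modeled loss by the same factor, `loss_ratio(10)`, which is why such curves appear as straight lines on log-log axes.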
Key papers
- Zhang et al. (2023) — How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances — Comprehensive survey of methods to keep LLMs up-to-date with world knowledge without retraining; categorizes approaches into implicit methods (knowledge editing, continual learning) and explicit methods (memory-augmented, retrieval-augmented, internet-enhanced); analyzes scalability-efficiency trade-offs and identifies evaluation challenges
- Quelle & Bovet (2023) — The Perils & Promises of Fact-checking with Large Language Models — Evaluates GPT-3.5 and GPT-4 for fact-checking across English and 16+ languages; demonstrates that LLMs with retrieval-augmented reasoning (ReAct + Google Search) achieve 79–87% accuracy but are limited by language-specific training-data bias; identifies data leakage risks and veracity inconsistencies in ambiguous categories
- Wang & Shu (2023) — Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models — Proposes FOLK, which uses first-order-logic (FOL) decomposition and knowledge grounding to guide LLMs in claim verification and explanation generation; reports state-of-the-art results on multiple fact-checking benchmarks
- On the Opportunities and Risks of Foundation Models — Comprehensive report on opportunities and risks of foundation models, covering capabilities, misuse, fairness, and societal impact
- Red Teaming Language Models with Language Models — Demonstrates safety vulnerabilities in LLMs through automated red teaming; uncovers offensive outputs, data leakage, and distributional biases
- A Survey on Evaluation of Large Language Models — Comprehensive survey on evaluation methodologies across three dimensions: what to evaluate, where to evaluate, and how to evaluate
- Language Models are Few-Shot Learners — Introduces GPT-3, a 175B-parameter model, and demonstrates strong in-context and few-shot learning capabilities
- RAIDAR: Generative AI Detection via Rewriting — Observes that LLMs, when asked to rewrite text, tend to leave machine-generated text largely unchanged while modifying human-written text more heavily; uses this difference to detect machine-generated content
- Advancing Graph Representation Learning with Large Language Models: A Comprehensive Survey of Techniques — Comprehensive survey of integrating LLMs with graph representation learning; proposes taxonomy decomposing models into knowledge extractors (attribute, structure, label) and organizers (GNN-centric, LLM-centric, hybrid); covers integration strategies (input-level, hidden-level, alignment-based) and training techniques (pre-training, prompting, instruction tuning)
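
The rewriting-based detection idea in the RAIDAR entry above can be sketched roughly as follows: ask a model to rewrite a passage, measure how much the text changed, and flag passages that change little as likely machine-generated. This is a minimal sketch under stated assumptions, not the paper's implementation. `rewrite` is a hypothetical callable standing in for an LLM call, the change measure is the word-level `difflib.SequenceMatcher` ratio from the Python standard library, and the `threshold` value is arbitrary.

```python
from difflib import SequenceMatcher
from typing import Callable


def change_ratio(original: str, rewritten: str) -> float:
    """Word-level change: 0.0 means identical, 1.0 means no words matched."""
    matcher = SequenceMatcher(None, original.split(), rewritten.split())
    return 1.0 - matcher.ratio()


def looks_machine_generated(
    text: str,
    rewrite: Callable[[str], str],  # hypothetical stand-in for an LLM rewrite call
    threshold: float = 0.2,         # arbitrary illustrative cutoff, not from the paper
) -> bool:
    """Flag text that the rewriter leaves largely unchanged."""
    return change_ratio(text, rewrite(text)) < threshold
```

In practice the cutoff would need to be calibrated on labeled examples of human-written and machine-generated text, since the amount of rewriting depends on the model and the prompt.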