Large Language Models

Large language models (LLMs) are transformer-based neural language models trained on massive text corpora; they achieve state-of-the-art performance on a diverse array of downstream NLP tasks. Scaling laws suggest that performance improves smoothly and predictably as model size, training data, and compute increase, while some capabilities appear to emerge only once models pass certain scale thresholds.
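As one illustration of what "improves smoothly" can mean (this page does not commit to a particular law), Kaplan et al. (2020) fit test cross-entropy loss L as a power law in the number of non-embedding parameters N, for models trained with enough data and compute that N is the bottleneck. N_c and α_N are fitted constants, with α_N ≈ 0.076 in their setup:

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076

% Doubling N rescales the loss by a constant factor that does not depend on N:
\frac{L(2N)}{L(N)} = 2^{-\alpha_N} \approx 0.949
```

Because the ratio L(2N)/L(N) is independent of N, each doubling of model size cuts the loss by roughly 5%, which shows up as a straight line on a log-log plot. The constants are specific to that study's models, tokenizer, and data, so they are best read as indicative rather than universal.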

Key papers

Zhang et al. (2023) — How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances — Comprehensive survey of methods for keeping LLMs up to date with world knowledge without retraining; categorizes approaches into implicit methods (knowledge editing, continual learning) and explicit methods (memory-augmented, retrieval-augmented, internet-enhanced); analyzes scalability-efficiency trade-offs and identifies evaluation challenges
Quelle & Bovet (2023) — The Perils & Promises of Fact-checking with Large Language Models — Evaluates GPT-3.5 and GPT-4 for fact-checking across English and 16+ languages; demonstrates that LLMs with retrieval-augmented reasoning (ReAct + Google Search) achieve 79–87% accuracy but are limited by language-specific training-data bias; identifies data leakage risks and veracity inconsistencies in ambiguous categories
Wang & Shu (2023) — Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models — Proposes FOLK, which uses first-order logic (FOL) claim decomposition and knowledge grounding to guide LLMs in claim verification and explanation generation; achieves state-of-the-art results on multiple fact-checking benchmarks
On the Opportunities and Risks of Foundation Models — Comprehensive report on opportunities and risks of foundation models, covering capabilities, misuse, fairness, and societal impact
Red Teaming Language Models with Language Models — Demonstrates safety vulnerabilities in LLMs through automated red teaming; uncovers offensive outputs, data leakage, and distributional biases
A Survey on Evaluation of Large Language Models — Comprehensive survey of LLM evaluation, organized along three dimensions: what to evaluate, where to evaluate, and how to evaluate
Language Models are Few-Shot Learners — Introduces GPT-3, a 175B-parameter model that demonstrates strong in-context learning and few-shot capabilities
RAIDAR: Generative AI Detection via Rewriting — Shows that, when asked to rewrite text, LLMs make fewer edits to machine-generated text than to human-written text; uses this edit gap to detect machine-generated content
Advancing Graph Representation Learning with Large Language Models: A Comprehensive Survey of Techniques — Comprehensive survey of integrating LLMs with graph representation learning; proposes a taxonomy that decomposes models into knowledge extractors (attribute, structure, label) and organizers (GNN-centric, LLM-centric, hybrid); covers integration strategies (input-level, hidden-level, alignment-based) and training techniques (pre-training, prompting, instruction tuning)
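The rewriting signal described in the RAIDAR entry can be sketched as follows, assuming we already have some function that sends text to an LLM with a "rewrite this" prompt and returns the result. That function (`rewrite`), the similarity measure (Python's `difflib.SequenceMatcher`, swapped in for whatever edit-distance features the paper actually uses), and the fixed `threshold` are all illustrative assumptions, not RAIDAR's implementation:

```python
import difflib
from typing import Callable


def edit_ratio(original: str, rewritten: str) -> float:
    """Share of the text changed by rewriting: 0.0 = identical, 1.0 = nothing in common."""
    similarity = difflib.SequenceMatcher(None, original, rewritten).ratio()
    return 1.0 - similarity


def looks_machine_generated(
    text: str,
    rewrite: Callable[[str], str],
    threshold: float = 0.1,
) -> bool:
    """Flag `text` as likely machine-generated when the rewriter barely changes it.

    `rewrite` is a hypothetical wrapper around an LLM prompted to rewrite its input;
    `threshold` is an arbitrary illustrative cutoff, not a value taken from RAIDAR.
    """
    return edit_ratio(text, rewrite(text)) < threshold


# Toy stand-ins for an LLM rewriter (hypothetical, for illustration only).
def keep_as_is(s: str) -> str:
    return s  # mimics an LLM that leaves the text unchanged


def change_everything(s: str) -> str:
    return s.upper()  # mimics an LLM that edits the text heavily


print(looks_machine_generated("hello world", keep_as_is))         # True
print(looks_machine_generated("hello world", change_everything))  # False
```

In practice a single fixed threshold is fragile; a detector would more plausibly combine several edit-based features and calibrate the decision rule on labeled human and machine text.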
