
Foundation Models

Foundation models are large neural networks trained on broad, mostly unlabeled data (usually with self-supervision) that can be adapted to a wide variety of downstream tasks through fine-tuning, prompt-based learning, or in-context adaptation. The term encompasses large language models (BERT, GPT-3, LLaMA), vision models (ViT), multimodal models (CLIP, DALL-E, GPT-4V), and models in other domains.
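In-context adaptation works differently from fine-tuning. The task is specified by a few labeled demonstrations placed directly in the prompt, the model is asked to continue the pattern, and no weights are updated. The sketch below only assembles such a prompt. The helper name `build_few_shot_prompt`, the `Input:`/`Output:` format, and the sentiment task are illustrative assumptions. Sending the prompt to a particular model is omitted because model APIs differ.

```python
# Minimal sketch of few-shot (in-context) adaptation: the task lives entirely
# in the prompt text. The helper name and prompt format are hypothetical.

def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, labeled demonstrations, and a new query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")  # blank line between demonstrations
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model is expected to continue from here
    return "\n".join(lines)


examples = [
    ("The film was a delight from start to finish.", "positive"),
    ("I walked out after twenty minutes.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    examples,
    "A clever script with outstanding performances.",
)
print(prompt)
```

Fine-tuning would instead use the same labeled pairs to update the model's parameters, which costs compute but does not consume prompt space at inference time.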

Key characteristics

  • Scale: Foundation models range from hundreds of millions of parameters (BERT) to hundreds of billions (GPT-3) and are trained on very large corpora, often terabytes of data.
  • Broad pretraining: Trained on diverse, general-purpose data rather than task-specific curated datasets.
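  • Emergence: Some capabilities, such as few-shot in-context learning, are not explicitly trained for and appear only once models reach sufficient scale. Instruction-following also improves with scale, although it usually relies on additional instruction tuning as well.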
  • Homogenization: A single foundation model can be adapted to many downstream tasks, reducing the need for task-specific architectures.
  • Transfer learning: Knowledge learned during pretraining transfers to specialized domains, reducing the data and compute required for downstream applications.
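The "Scale" characteristic can be made concrete with two common back-of-the-envelope estimates: the memory needed just to store the weights, and total training compute via the widely used rule of thumb C ≈ 6 · N · D (N parameters, D training tokens). The sketch assumes 2 bytes per parameter (fp16/bf16) and uses GPT-3's published figures of about 175 billion parameters and about 300 billion training tokens. The helper names are hypothetical. The estimate ignores optimizer state, gradients, and activations, all of which add substantially to training memory.

```python
# Rough scale estimates for a dense model. Assumptions: a fixed storage
# precision per parameter, and the C ~= 6 * N * D approximation for
# training FLOPs. Helper names are illustrative, not from any library.

def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory to hold the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def training_flops(num_params, num_tokens):
    """Approximate total training compute in floating-point operations."""
    return 6 * num_params * num_tokens


# GPT-3 scale: about 175 billion parameters, about 300 billion training tokens.
n_params = 175e9
n_tokens = 300e9

print(f"weights at 2 bytes/param: {weight_memory_gb(n_params):.0f} GB")
print(f"training compute: {training_flops(n_params, n_tokens):.2e} FLOPs")
```

Even the weights alone exceed the memory of a single accelerator at this scale, which is one reason training and serving such models requires large clusters.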

Opportunities

Foundation models enable powerful applications across language (machine translation, question answering, text summarization), vision (image recognition, image generation, visual question answering), and reasoning (semantic understanding, logical inference). Because they can be adapted with little task-specific data, they can also help with low-resource languages and rare tasks, and they shorten the time needed to deploy new applications.

Risks and harms

Generation of harmful content

Foundation models can be used to generate convincing misinformation, deepfakes, fake profiles, and personalized manipulative content at scale, lowering the cost and skill barrier for malicious actors.

Misrepresentation and bias

Foundation models inherit and can amplify biases present in their training data, harming underrepresented groups. These harms include representational bias (underrepresentation), misrepresentation (negative stereotypes), and allocational harms (denial of opportunities or resources). Because many downstream applications are built on the same foundation model, these harms compound: a bias in the base model can surface in every application adapted from it.

Environmental impact

Training and deploying foundation models requires massive computational resources, which can produce substantial carbon emissions depending on the energy sources that power the hardware. This environmental burden often falls on regions and communities with the least capacity to cope with the effects of climate change.
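How large these emissions are depends heavily on where and how a model is trained. A common way to estimate operational emissions is to multiply the energy a run consumes (accelerator power × time, scaled by the data center's power usage effectiveness, PUE) by the carbon intensity of the local electricity grid. The sketch below uses entirely hypothetical inputs, and the helper names are illustrative. It leaves out embodied emissions from manufacturing hardware and the ongoing energy cost of inference.

```python
# Rough operational-carbon estimate for a training run.
# All numbers below are hypothetical placeholders; real accounting needs
# measured power draw, the facility's actual PUE, and the local grid mix.

def training_energy_kwh(num_accelerators, avg_power_kw, hours, pue):
    """Facility energy: accelerator energy scaled by power usage effectiveness."""
    return num_accelerators * avg_power_kw * hours * pue


def emissions_tonnes_co2e(energy_kwh, grid_kg_co2e_per_kwh):
    """Convert energy (kWh) to tonnes of CO2-equivalent via grid intensity."""
    return energy_kwh * grid_kg_co2e_per_kwh / 1000.0


# Hypothetical run: 1,000 accelerators averaging 0.4 kW for 30 days, PUE 1.1.
energy = training_energy_kwh(1000, 0.4, 30 * 24, 1.1)

# The same run on two hypothetical grids with different carbon intensities.
for label, intensity in [("low-carbon grid", 0.05), ("fossil-heavy grid", 0.7)]:
    print(f"{label}: {emissions_tonnes_co2e(energy, intensity):.1f} t CO2e")
```

With identical hardware and runtime, the grid's carbon intensity alone changes the estimate by more than an order of magnitude in this example, which is why siting and energy sourcing matter as much as model size.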

Concentration of power

Foundation models are developed primarily by large technology companies and well-resourced institutions, concentrating power over AI capabilities and limiting the diversity of perspectives embedded in these systems.

Security and interpretability challenges

Foundation models are difficult to interpret, making it hard to understand why they make specific predictions or generate specific outputs. This opacity complicates security auditing and accountability.

Key papers