Foundation Models¶
Foundation models are large neural networks trained on broad data, usually unlabeled and with self-supervised objectives, that can be adapted to a wide range of downstream tasks through fine-tuning, prompting, or in-context learning. The term covers language models (BERT, GPT-3, LLaMA), vision models (ViT), text-to-image and other multimodal models (DALL-E, CLIP, GPT-4V), and models in other domains.
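In-context adaptation can be illustrated without any model at all, because the "adaptation" lives entirely in the prompt. The sketch below assembles a few-shot prompt from labeled demonstrations. The function name `build_few_shot_prompt` and the `Input:`/`Label:` template are assumptions chosen for illustration, not a format required by any particular model.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, labeled demos, then the open query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The final item has no label; the model is expected to continue from here.
    lines.append(f"Input: {query}")
    lines.append("Label:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("I loved this film.", "positive"), ("The plot was dull.", "negative")],
    "A wonderful surprise.",
)
print(prompt)
```

The resulting string would be sent to a pretrained model as-is. No weights are updated, which is what distinguishes this from fine-tuning.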
Key characteristics¶
- Scale: Foundation models range from hundreds of millions of parameters (BERT) to hundreds of billions (GPT-3), and the largest are trained on terabytes of data.
- Broad pretraining: Trained on diverse, general-purpose data rather than task-specific curated datasets.
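- Emergence: Foundation models can exhibit capabilities that were not explicitly trained for and that appear only at sufficient scale, such as few-shot and in-context learning. Instruction following also improves with scale, although it is usually strengthened by instruction tuning.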
- Homogenization: A single foundation model can be adapted to many downstream tasks, reducing the need for task-specific architectures.
- Transfer learning: Knowledge learned during pretraining transfers to specialized domains, reducing the data and compute required for downstream applications.
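To give a sense of the scale involved, the sketch below uses a common rule of thumb for the non-embedding parameter count of a standard Transformer: about 12 × layers × d_model². The layer counts and hidden sizes are the published configurations of BERT-base and the largest GPT-3 model. The helper name `approx_transformer_params` is an assumption, and the estimate ignores embeddings, biases, and layer norms, so it is only approximate.

```python
def approx_transformer_params(n_layers, d_model):
    """Rough non-embedding parameter count for a standard Transformer.

    Per layer: about 4*d^2 for the attention projections (Q, K, V, output)
    plus about 8*d^2 for a feed-forward block with hidden size 4*d.
    Embeddings, biases, and layer norms are ignored.
    """
    return 12 * n_layers * d_model ** 2


bert_base = approx_transformer_params(n_layers=12, d_model=768)
gpt3 = approx_transformer_params(n_layers=96, d_model=12288)

print(f"BERT-base: ~{bert_base / 1e6:.0f}M non-embedding parameters")
print(f"GPT-3:     ~{gpt3 / 1e9:.0f}B non-embedding parameters")
```

The BERT-base estimate falls below the commonly quoted 110M total because the embedding tables are excluded here.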
Opportunities¶
Foundation models enable applications across language (machine translation, question answering, text summarization), vision (image recognition, image generation, visual question answering), and reasoning (semantic understanding, logical inference). Because adaptation needs comparatively little task-specific data, they can also extend to low-resource languages and rare tasks and shorten the time to deployment.
Risks and harms¶
Generation of harmful content¶
Foundation models can generate high-quality misinformation, deepfakes, fake profiles, and personalized manipulative content at scale, lowering the cost and skill barrier for malicious actors.
Misrepresentation and bias¶
Foundation models inherit and amplify biases present in training data, leading to harms for underrepresented groups. Representational bias (underrepresentation), misrepresentation (negative stereotypes), and allocation harms (denial of opportunities) compound across many downstream applications using the same foundation model.
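One simple way to make an allocation harm measurable is to compare how often a downstream system grants a favorable outcome to each group. The sketch below computes per-group approval rates and the gap between them. The data, the group names `A` and `B`, and the helper `selection_rates` are hypothetical and chosen only for illustration. A rate gap is one of several possible disparity measures, not a complete fairness audit.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the approval rate per group from (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / totals[group] for group in totals}


# Hypothetical decisions from one downstream application.
decisions = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = selection_rates(decisions)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

If several applications are built on the same foundation model, running the same check on each one would show whether a disparity recurs across them.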
Environmental impact¶
Training and deploying foundation models require massive computational resources, which can produce substantial carbon emissions. The resulting environmental burden is often borne by the regions and communities with the least capacity to mitigate climate change.
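A back-of-the-envelope estimate shows how compute translates into emissions: energy is roughly accelerator count × hours × power draw × data-center overhead (PUE), and emissions are energy × the carbon intensity of the grid. Every number below is a hypothetical placeholder rather than a measurement of any real training run, and the function name `training_emissions_kg` is likewise an assumption.

```python
def training_emissions_kg(n_gpus, hours, gpu_power_kw, pue, grid_kg_co2_per_kwh):
    """Estimate CO2-equivalent emissions (kg) for a training run.

    energy (kWh) = accelerators * hours * average power (kW) * PUE
    emissions    = energy * grid carbon intensity (kg CO2e per kWh)
    """
    energy_kwh = n_gpus * hours * gpu_power_kw * pue
    return energy_kwh * grid_kg_co2_per_kwh


# Hypothetical run: 512 accelerators for two weeks.
emissions = training_emissions_kg(
    n_gpus=512,
    hours=24 * 14,
    gpu_power_kw=0.4,          # assumed average draw per accelerator
    pue=1.2,                   # assumed data-center overhead factor
    grid_kg_co2_per_kwh=0.4,   # assumed grid carbon intensity
)
print(f"~{emissions / 1000:.0f} tonnes CO2e")
```

Under these assumptions the estimate is on the order of tens of tonnes. Changing only the grid intensity shifts the result proportionally, which is why the location of training matters.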
Concentration of power¶
Foundation models are developed primarily by large technology companies and well-resourced institutions, concentrating power over AI capabilities and limiting the diversity of perspectives embedded in these systems.
Security and interpretability challenges¶
Foundation models are difficult to interpret, making it hard to understand why they make specific predictions or generate specific outputs. This opacity complicates security auditing and accountability.
Key papers¶
- On the Opportunities and Risks of Foundation Models (the Stanford report that introduced the term "foundation model" and surveys capabilities, applications, and societal impact)
- A Survey of Large Language Models (survey of pre-training, adaptation, use, and evaluation of large language models)
- A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT (survey of generative AI models and applications, from GANs to ChatGPT)
Related topics¶
- Large Language Models (a specific class of foundation models)
- Generative Models (models designed to generate new data)
- Pre-trained language models (related to foundation models in NLP)
- AI Safety (safety considerations for foundation models)
- Misuse (how foundation models can be misused)
- Fake news detection methods (using foundation models to detect misinformation)