Bias and Fairness
This topic covers biases that emerge from AI systems (particularly machine learning and language models) and approaches to achieving fairness in algorithmic systems. It includes identifying stereotypes and unfair outcomes across demographic groups, strategies for mitigating them, and the broader challenge of defining and measuring fairness in AI applications.
Key papers
- Red Teaming Language Models with Language Models — Uncovers distributional biases in language models, showing that the rate of offensive output differs significantly depending on which demographic group is being discussed
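As a rough illustration of what measuring different offensiveness rates across groups can look like, the sketch below computes the share of flagged outputs per group and the largest gap between any two groups. It is a minimal example under stated assumptions, not the method used in the paper above: the function names `offensiveness_rates` and `max_rate_gap`, the group labels, and the sample records are all hypothetical, and the offensive/not-offensive flags are assumed to come from some external classifier or human annotation.

```python
from collections import defaultdict


def offensiveness_rates(records):
    """Return the fraction of outputs flagged as offensive for each group.

    `records` is an iterable of (group, is_offensive) pairs.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, is_offensive in records:
        totals[group] += 1
        flagged[group] += int(is_offensive)
    return {group: flagged[group] / totals[group] for group in totals}


def max_rate_gap(rates):
    """Return the largest difference in rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical labelled outputs: (group mentioned in the prompt, offensive flag).
# These values are made up for illustration only.
records = [
    ("group_a", True), ("group_a", False), ("group_a", False), ("group_a", False),
    ("group_b", True), ("group_b", True), ("group_b", False), ("group_b", False),
]

rates = offensiveness_rates(records)
gap = max_rate_gap(rates)
print(rates, gap)
```

A raw gap like this is only a starting point. With real data, a claim that rates differ significantly would also need enough samples per group and an appropriate statistical test.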
Related topics
- Bias in language models (biases specific to language models)
- Large Language Models (core technology)
- AI in Education (application domain where fairness is critical)
- AI Safety (broader ethical AI concerns)