Bias detection
Detection and measurement of discriminatory biases in machine learning systems, datasets, and AI-generated content. Bias detection focuses on identifying systematic disparities in how models treat different demographic groups, entity categories, and populations.
Scope
Bias detection encompasses:
- Dataset bias: Systematic imbalances or stereotypes in training data
- Model bias: Disparate performance or behavior across demographic groups
- Output bias: Discriminatory, stereotypical, or unfair content generation
- Measurement: Quantitative metrics for demographic parity, equal opportunity, and calibration
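As a rough illustration of the measurement item, the sketch below computes two of the named metrics for a binary classifier: the demographic parity difference (the gap in positive-prediction rates between groups) and the equal opportunity difference (the gap in true positive rates). The function names, the group labels `a` and `b`, and the toy data are all hypothetical and chosen only for illustration; calibration is left out because it needs predicted probabilities rather than hard labels.

```python
import math
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return per-group selection rate and true positive rate (TPR)."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "true_pos": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += p       # predicted positive
        c["pos"] += t            # actually positive
        c["true_pos"] += t * p   # positive and predicted positive
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "selection_rate": c["pred_pos"] / c["n"],
            # TPR is undefined for a group with no positive examples.
            "tpr": c["true_pos"] / c["pos"] if c["pos"] else math.nan,
        }
    return rates


def demographic_parity_difference(rates):
    """Largest gap in selection rate between any two groups."""
    sel = [r["selection_rate"] for r in rates.values()]
    return max(sel) - min(sel)


def equal_opportunity_difference(rates):
    """Largest gap in TPR between groups that have positive examples."""
    tprs = [r["tpr"] for r in rates.values() if not math.isnan(r["tpr"])]
    return max(tprs) - min(tprs)


# Toy data (invented): labels and predictions are 0/1, one group per example.
y_true = [1, 0, 1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
dp = demographic_parity_difference(rates)
eo = equal_opportunity_difference(rates)
print(rates)
print("demographic parity difference:", dp)
print("equal opportunity difference:", eo)
```

A value of 0 for either metric means the groups are treated identically under that criterion. The two criteria can disagree, so in practice they are usually reported together rather than collapsed into a single score.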
Key papers
- A Survey on Evaluation of Large Language Models — comprehensive survey including extensive evaluation methodologies for bias detection and ethical concerns in LLMs
- Toxicity in ChatGPT: Analyzing Persona-assigned Language Models — systematic analysis of discriminatory bias in ChatGPT, showing higher toxicity toward certain races, sexual orientations, countries (especially non-colonial nations), and professions; demonstrates that persona assignment amplifies these biases
- Toxicity Detection with Generative Prompt-based Inference — toxicity detection and measurement across demographic categories
- Krishnamurthy et al. (2018) — Detecting Deception in Multimodal Narratives — analysis of deception in visual and textual narratives with consideration of demographic factors
Related topics
- Fairness — broader fairness and equity concerns in AI systems
- Language Models — biases inherent in pre-trained language models
- Toxicity detection — detection of discriminatory toxic content
- AI Safety — ensuring AI systems do not perpetuate or amplify discrimination