Bias in language models¶

Large-scale language models encode biases present in their training data, including gender, racial, religious, and cultural stereotypes, and can replicate and sometimes amplify them in generated text. Understanding and mitigating these biases is crucial for fair NLP applications.

Scope¶

Language models learn patterns from internet-scale text (e.g., Common Crawl, web pages, books), which reflects real-world societal biases in language use. These biases manifest as:

Stereotypical associations: Words such as "nurse" are strongly associated with female pronouns, and words such as "engineer" with male pronouns.
Demographic disparities: The model behaves differently on text about different racial, religious, or gender groups.
Language preference: Models trained on English-dominant data may perform poorly on other languages or reflect Western-centric biases.
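
The demographic disparities item can be probed by filling one fixed template with different group terms and comparing a score across the results. The following is a minimal sketch under stated assumptions: `group_scores`, `max_gap`, and `toy_score` are hypothetical names introduced here, and `toy_score` is a placeholder (sentence length) standing in for a real model-derived score such as sentiment or toxicity.

```python
from typing import Callable, Dict, List


def group_scores(
    template: str,
    groups: List[str],
    score_fn: Callable[[str], float],
) -> Dict[str, float]:
    """Fill `template` with each group term and score the resulting sentence.

    `template` must contain the placeholder "{group}". `score_fn` is a
    hypothetical stand-in for any model-derived score of a sentence.
    """
    return {g: score_fn(template.format(group=g)) for g in groups}


def max_gap(scores: Dict[str, float]) -> float:
    """Largest difference between any two groups' scores."""
    values = list(scores.values())
    return max(values) - min(values)


def toy_score(text: str) -> float:
    """Placeholder scorer (sentence length). NOT a real model score."""
    return float(len(text))


scores = group_scores("The {group} applied for the job.", ["woman", "man"], toy_score)
print(scores, max_gap(scores))
```

With a real scorer in place of `toy_score`, a large gap on otherwise identical sentences would suggest that the model treats the groups differently. The gap printed here only reflects the placeholder and carries no meaning about bias.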

When biased models are deployed in recommendation systems, content moderation, hiring tools, or educational applications, they can harm downstream users and communities.

Key mechanisms¶

Training data biases:
Models learn from text produced by humans whose writing reflects social biases. If certain groups are underrepresented or stereotyped in training data, the model learns to replicate and generalize those patterns.
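
The kind of skew described above can be made visible by counting which gendered pronouns appear in sentences that mention a given word. This is a minimal sketch, not a standard bias metric: the function name `pronoun_cooccurrence`, the pronoun lists, and the four-sentence corpus are all assumptions written for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable

FEMALE = {"she", "her", "hers"}
MALE = {"he", "him", "his"}


def pronoun_cooccurrence(sentences: Iterable[str], target: str) -> Dict[str, int]:
    """Count gendered pronouns in sentences that mention `target`."""
    counts = Counter({"female": 0, "male": 0})
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if target not in tokens:
            continue  # only sentences mentioning the target word are counted
        counts["female"] += sum(t in FEMALE for t in tokens)
        counts["male"] += sum(t in MALE for t in tokens)
    return dict(counts)


# Hypothetical toy corpus, written by hand for this example only.
corpus = [
    "The nurse said she would return.",
    "The engineer said he fixed the bug.",
    "The nurse finished her shift.",
    "The engineer and the nurse left.",
]
print(pronoun_cooccurrence(corpus, "nurse"))
print(pronoun_cooccurrence(corpus, "engineer"))
```

A model trained on text with counts this lopsided has little evidence for any other pairing, which is one way the stereotyped pattern carries over into its predictions.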

Fine-tuning effects:
Fine-tuning on domain-specific data (e.g., online forums, specialized corpora) can amplify or introduce new biases depending on the target domain's composition.

Generation artifacts:
When prompted with ambiguous context, models may default to stereotypical completions. For example, a prompt such as "The doctor said the nurse should..." may be continued with female pronouns for the nurse even though the prompt itself specifies no gender.
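
A default of this kind can be checked by sampling several completions for the same prompt and tallying the pronouns they use. The sketch below assumes the completions have already been collected. The names `first_pronoun_label` and `tally_completions` are hypothetical, the completions are hand-written placeholders rather than real model output, and taking the first pronoun is a crude heuristic that ignores which person the pronoun refers to.

```python
import re
from collections import Counter
from typing import Iterable

PRONOUN_LABEL = {
    "she": "female", "her": "female", "hers": "female",
    "he": "male", "him": "male", "his": "male",
    "they": "neutral", "them": "neutral", "their": "neutral",
}


def first_pronoun_label(text: str) -> str:
    """Return the label of the first listed pronoun in `text`, or "none"."""
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in PRONOUN_LABEL:
            return PRONOUN_LABEL[token]
    return "none"


def tally_completions(completions: Iterable[str]) -> Counter:
    """Count how many completions start referring to someone as each label."""
    return Counter(first_pronoun_label(c) for c in completions)


# Hand-written placeholder completions, NOT real model output.
completions = [
    "check on her patients first.",
    "finish his paperwork.",
    "take her break.",
    "rest before the next shift.",
]
tally = tally_completions(completions)
print(tally)
```

Over a large sample from a real model, a tally that leans heavily toward one label for a gender-neutral prompt would be consistent with the stereotypical default described above.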

Key papers¶

Solaiman et al. (2019), OpenAI GPT-2 Release Report: An exploratory analysis of biases in the 774M- and 1.5B-parameter GPT-2 models. It found strong gender bias (male associations with "criminal"), religion bias (Christianity strongly associated with "God"), and a shift in language preference (the 1.5B model was more receptive to non-English text and non-Latin scripts). The authors published the top 1,000 WebText training domains to facilitate bias research and distributed a model card documenting the models' limitations.

Related topics¶

Language Models: neural language models more broadly
Fairness NLP: fairness and ethics in NLP
Generated text detection: detecting AI-generated content with potential biases
