Measuring Political Bias in Large Language Models: What Is Said and How It Is Said¶
Authors: Yejin Bang, Delong Chen, Nayeon Lee, Pascale Fung Affiliation: Centre for Artificial Intelligence Research (CAiRE), The Hong Kong University of Science and Technology arXiv: 2403.18932
TL;DR¶
This paper proposes a two-tiered framework for measuring political bias in LLM-generated content, separating bias into what is said (content bias) and how it is said (style/framing bias). Using extreme anchor comparison and frame analysis on 11 open-source LLMs across 14 politically divisive topics, the authors find that models exhibit liberal leanings on social issues, have a US-centric focus despite claims of global training data, and show nuanced biases that vary by issue and model architecture. The framework provides fine-grained, explainable measurement of political bias beyond traditional left-right spectrum analysis.
Contributions¶
- Two-tiered evaluation framework for political bias in LLMs that combines political stance analysis with framing bias decomposition, enabling fine-grained topic-specific assessment
- Political stance analysis methodology using extreme anchor comparison: prompting models to generate opposed stances (e.g., "pro-choice" vs. "pro-life") and comparing model outputs to reference distributions to measure degree of leaning
- Framing bias decomposition into content bias (what topics/entities are mentioned) and style bias (sentiment/lexical polarity in how topics are discussed), using Boydstun frame dimensions and NER-based entity extraction
- Large-scale empirical evaluation across 11 open-source models (LLaMa-2-Chat, Yi-Chat, Vicuna, Falcon, Solar, Mistral, Jais, etc.) on 14 politically sensitive topics totaling 14,000 generated samples, revealing patterns in model bias
- Detailed analysis of bias manifestations including US-centric entity focus, differential entity mention frequencies across models, lexical polarity patterns, and how multilingual training affects content focus
Method¶
Political Stance Analysis (Section 2.1): The framework measures a model's political stance on a topic by comparing its generated content distribution to two reference anchor distributions representing opposed stances. For each of 14 politically divisive topics, the authors prompt models to generate news headlines and extract stance via extreme anchor comparison:
- Obtain two reference distributions \(P(Y_{\text{pro}})\) and \(P(Y_{\text{opp}})\) by prompting the LLM with "pro" and "opposing" stance tags (e.g., "pro same-sex marriage" vs. "anti same-sex marriage")
- Generate \(n\) samples from the LLM's natural distribution \(P(Y)\)
- Compute similarity distances between model outputs and each reference using SentenceBERT embeddings
- Estimate stance vector \(\vec{s}\) by finding the nearest neighbor in each reference distribution
- Calculate degree of stance imbalance: \(||\vec{s}|| = |d_{\text{pro}} - d_{\text{opp}}|\)
Framing Bias Analysis (Section 2.2): To understand how bias manifests beyond stance, the authors decompose framing into content and style components by extracting a latent variable \(Z = [C, S]\) from model generations:
- Content bias: Uses frame dimensions from Boydstun et al. (15 dimensions including "economics," "morality," "health and safety," "cultural identity") and NER-based entity extraction to identify what topics and entities are discussed
- Style bias: Analyzes lexical polarity (positive/negative/neutral sentiment) toward identified entities using a target sentiment analysis classifier
- Inverse functions \(h_C^{-1}: P(Y) \to C\) and \(h_S^{-1}: P(Y) \to S\) extract these variables from model generations for comparative analysis across models
Experiment Setup: - Generated 14,000 samples (1,000 per topic) across 14 topics: reproductive rights, immigration, gun control, same-sex marriage, death penalty, climate change, drug price regularization, public education, healthcare reform, social media regulation, plus 4 political events - Evaluated 11 open-source instruction-tuned LLMs of varying sizes and training data - Task: generate 10 news headlines per topic with varying stance tags for reference anchor generation - Distance function: cosine similarity between SentenceBERT embeddings (Equation 1) - Threshold for neutrality: p-value > 0.01 on imbalance score
Results¶
Political Stance Findings (Section 4.1):
Models show consistent liberal bias on social issues: - Same-sex marriage: LLaMa-2-Chat(13B) shows score of 7.4 (strongly proponent), while opposing immigration with 14 (strongly opponent) - Climate change & education: Models broadly support climate action and public education reform - Reproductive rights: Highest polarization across models with five models showing anti-choice stance, five pro-choice, one neutral - Neutrality rare: Only 10.9% of model-topic combinations exhibit genuine neutrality (12 of 110 combinations) - Topic heterogeneity: The topic of reproductive rights shows widest range of stances across models; models exhibit variable intensity of bias depending on issue
Framing Bias Findings (Section 4.2):
Content analysis reveals significant variation in how models discuss topics: - Entity mention frequency: Different models emphasize different entities when discussing same-sex marriage (Figure 5)—LLaMa-2-Chat mentions "Supreme Court" twice as often as average across models, while Yi-Chat(34B) mentions "LGBTQ" and other entities with different frequencies - Frame dimensions: Significant variation in topic coverage; on same-sex marriage, models vary most in emphasis on "fairness and equality" frame dimensions while maintaining overall similar focus - Lexical polarity: Style bias clearly evident—LLaMa-2(13B) and Yi-Chat(34B) generate proportionate amounts about same-sex marriage but with different sentiment polarity toward entities like "Same-Sex Marriage Ban" (negative) and "LGBTQ" (positive)
Cross-model patterns:
- Models are liberal-leaning: Results consistent with recent research finding liberal bias in LLMs on gun control, same-sex marriage, public education, healthcare reform
- Model heterogeneity: Even models in same family (e.g., LLaMa-2-Chat 7B vs. 13B) show different political biases across topics, suggesting factors beyond model capacity influence bias
- US-centric focus: Analysis reveals strong focus on US-related topics; "US" entities appear in top-10 mentions for majority of models; Trump ranks among top entities for nearly all models (27% average mention rate)
- Multilingual models differ: JAIS (Arabic-English bilingual) predominantly features UAE-related topics in 64% of examined areas; Yi does not consistently highlight China-specific issues except single instance on drug price regulation
- Larger size ≠ neutrality: Yi-6B shows lower overall stance intensity (4.77) and media bias despite smaller size; Falcon-40B shows both liberal and conservative views
- Model family variance: LLaMa-2-Chat models (7B vs. 13B) show 19% stance difference on same-sex marriage despite shared family lineage, illustrating importance of issue-specific analysis
Connections¶
- Political bias detection in online discourse and media shares analytical frameworks with LLM bias measurement
- Related to media bias detection methods that examine framing, entity selection, and sentiment; LLM bias follows similar patterns
- Complements broader work on bias in language models including gender and racial biases, extending to political ideological bias
- Contributes to understanding misinformation potential in LLM-generated content by revealing how models' political stances shape content generation
- Relevant to LLM safety and alignment concerns, particularly how training and instruction-tuning create ideological biases
- Related to framing analysis as a communication and media bias technique, extending framing measurement to LLM outputs
Notes¶
Strengths: - Novel two-tiered framework decomposing bias into measurable stance and framing components rather than monolithic left-right spectrum - Comprehensive evaluation across 11 diverse open-source models and 14 politically salient topics with 14,000 generated samples - Methodologically rigorous: uses established NLP tools (SentenceBERT, frame dimensions, NER) and statistical testing (p-value thresholds for neutrality claims) - Granular analysis revealing nuance: models show context-dependent bias (conservative on some issues, liberal on others) contradicting simplified narratives - Practical framing extraction methods (Boydstun frames, entity-based) are explainable and interpretable for stakeholders - Open-sources codebase enabling reproducibility and application to other topics/models
Weaknesses: - Limited to 14 topics; generalization to other political issues untested - News headline generation task may not reflect bias in other LLM output formats (open-ended completions, question-answering, dialogue) - Anchor-based stance approach requires manual design of opposed stance prompts; potential bias in prompt engineering not analyzed - Reference distributions obtained from same models being evaluated (circular ground truth); lack external human annotation of stance baselines - Multilingual analysis limited to two models (Jais, Yi); insufficient for strong claims about language-bias interactions - No analysis of instruction-tuning recipe or training data effects; unclear which factors drive observed biases - Style bias measurement relies on target sentiment classifier; robustness to classifier errors not assessed
Follow-up opportunities: - Extend topic set beyond 14; investigate whether certain political dimensions (economic vs. social; US-centric vs. global) show consistent bias signatures - Test bias in other task formats: open-ended completions, dialogue, question-answering where constraints are lighter than headline generation - Investigate effect of different prompting strategies and whether adversarial prompting can mitigate detected biases - Analyze instruction-tuning datasets and base models to trace bias origins - Compare with human annotator stance judgments to validate anchor-based methodology - Develop bias mitigation techniques leveraging decomposition into content and style components - Extend to additional languages and multilingual models to understand global vs. US-centric bias patterns