Automated identification of media bias in news articles: an interdisciplinary literature review

Authors: Felix Hamborg, Karsten Donnay, Bela Gipp

Venue: International Journal on Digital Libraries, 2018

TL;DR

This paper surveys automated and manual methods for identifying media bias in news articles, bridging social science and computer science approaches. It defines nine distinct forms of media bias and maps social science research onto computational techniques, finding that NLP-based methods are promising for automated detection but remain underexplored compared to manual analysis approaches.

Contributions

Establishes a shared conceptual framework mapping social science media bias research to computer science approaches
Categorizes nine distinct forms of media bias: event selection, source selection, commission/omission, labeling and word choice, story placement, size allocation, picture selection, picture explanation, and spin
Reviews manual approaches from social sciences (content analysis, frame analysis, meta-analysis) for analyzing media bias
Surveys computational and automated methods from computer science, primarily in NLP, suitable for identifying each bias form
Identifies gaps between social science understanding and computational capabilities, highlighting opportunities for future NLP research

Method

The paper employs a systematic interdisciplinary literature review. The authors:

Define media bias in the context of news production and consumption, distinguishing between intentional bias (reflecting conscious choice) and systematic bias (reflecting tendencies rather than isolated incidents)
Decompose the news production process into stages where bias can arise (gathering, writing, editing) and identify distinct bias forms
Map each bias form to the target object (news outlet, article, text element, or picture) and stage of emergence
For each bias form, survey both manual analysis methods from social sciences and automated approaches from computer science
Discuss the applicability, limitations, and computational feasibility of methods for detecting each form

The conceptual framework distinguishes between:

- Gathering stage biases: event selection, source selection, commission/omission
- Writing stage biases: labeling and word choice
- Editing stage biases: story placement, size allocation, picture selection, picture explanation
- Cross-stage bias: spin (overall bias across the entire article)
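
The stage-to-form grouping above can be written down as a small lookup structure. This is only an illustrative sketch, not something taken from the paper: the names `Stage`, `BIAS_FORMS_BY_STAGE`, and `stage_of` are hypothetical, and labeling and word choice are treated as a single form so that the total matches the nine forms listed under Contributions.

```python
from enum import Enum


class Stage(Enum):
    """News production stages as named in this summary (enum names are assumptions)."""
    GATHERING = "gathering"
    WRITING = "writing"
    EDITING = "editing"
    CROSS_STAGE = "cross-stage"


# Bias forms grouped by the stage in which the summary says they arise.
BIAS_FORMS_BY_STAGE = {
    Stage.GATHERING: ["event selection", "source selection", "commission/omission"],
    Stage.WRITING: ["labeling and word choice"],
    Stage.EDITING: [
        "story placement",
        "size allocation",
        "picture selection",
        "picture explanation",
    ],
    Stage.CROSS_STAGE: ["spin"],
}


def stage_of(bias_form: str) -> Stage:
    """Return the stage a bias form is assigned to, or raise KeyError if unknown."""
    for stage, forms in BIAS_FORMS_BY_STAGE.items():
        if bias_form in forms:
            return stage
    raise KeyError(f"unknown bias form: {bias_form}")


total_forms = sum(len(forms) for forms in BIAS_FORMS_BY_STAGE.values())
```

A structure like this could serve as the label schema for an annotation or classification task, with one label per bias form and the stage as a coarser grouping.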

Results

The review finds that social science has developed a comprehensive theoretical understanding of media bias but relies heavily on manual, labor-intensive analysis methods. Key findings include:

Nine distinct bias forms are documented in the literature with varying prevalence in social science research (Table 1)
Manual approaches (content analysis, frame analysis, meta-analysis) are well-developed in social sciences but require human interpretation and are not fully automatable
NLP methods are available or could be adapted for most bias forms:
Event selection analysis can leverage news aggregation and matrix factorization techniques
Source selection can use link-based analysis and plagiarism detection
Labeling/word choice is amenable to sentiment analysis and word embedding methods
Story placement, size allocation, and picture selection can be identified through document structure analysis
Spin is the most challenging form to detect, since it requires analysis across multiple bias forms
Computer science research on automated media bias detection is limited relative to social science literature, particularly in systematically analyzing all forms of bias
Interdisciplinary opportunities exist where established CS techniques (NLP, document clustering, graph analysis) can be applied to previously manual analysis tasks
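
As a rough illustration of the labeling/word choice item above, the sketch below compares how two outlets describe the same entity using a tiny sentiment lexicon. It is not the paper's method: the lexicon `TOY_LEXICON`, the function `labeling_score`, and the example sentences are all hypothetical, and a real study would rely on an established lexicon, word embeddings, or a trained classifier.

```python
import re
from typing import List

# Hypothetical toy lexicon mapping words to polarity (+1 positive, -1 negative).
TOY_LEXICON = {
    "freedom": 1, "hero": 1, "reform": 1,
    "regime": -1, "terrorist": -1, "crisis": -1,
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def labeling_score(sentences: List[str], target: str) -> float:
    """Mean lexicon polarity of words that share a sentence with `target`.

    Returns 0.0 when no lexicon word co-occurs with the target.
    """
    target = target.lower()
    polarities = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if target not in tokens:
            continue  # only sentences mentioning the target entity count
        polarities.extend(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)
    return sum(polarities) / len(polarities) if polarities else 0.0


# Made-up example sentences from two fictional outlets covering the same event.
outlet_a = ["The rebels were hailed as freedom fighters.", "Markets rose today."]
outlet_b = ["The rebels were described as a terrorist group backed by the regime."]

score_a = labeling_score(outlet_a, "rebels")
score_b = labeling_score(outlet_b, "rebels")
```

A large gap between the two scores for the same entity and event would flag a candidate difference in labeling for closer manual inspection; on its own it does not establish bias.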

Connections

Related to media profiling approaches for news outlets and source credibility assessment
Complements surveys of NLP for fake news detection by focusing specifically on bias rather than binary veracity
Cites Zhou's fake news survey for foundational taxonomy
Cited by Nakov et al. 2021 on media profiling and factuality-bias correlation

Notes

This is a foundational interdisciplinary review that bridges a significant gap: social science researchers had developed a rich understanding of media bias, while computer scientists were applying automated methods without systematically mapping them to the bias forms studied in the social sciences. The nine-form taxonomy and production-stage decomposition provide a structured framework for future computational research. The paper's main limitation is that it reviews the state of the art as of 2018; subsequent advances in transformer-based NLP, particularly BERT and GPT-family models, may enable stronger automated approaches to several bias forms. The distinction between intentional and systematic bias is important but underexplored in the computational literature.