Source reliability
Source reliability is the degree to which information published by a particular source (a news outlet, a user, or a website) can be trusted. Estimating it is critical for both manual and automated fact-checking, because the credibility of evidence retrieved during verification depends on the trustworthiness of the source it comes from.
Levels of analysis
User-level: Assessing individual social media users or commenters based on their posting history, follower relationships, and interaction patterns. Includes detecting paid trolls, sockpuppets, and inauthentic accounts.
Source-level (media outlet): Profiling entire news organizations based on their publishing patterns, audience composition, and infrastructure. Because one outlet-level judgment carries over to every article the outlet publishes, this scales better than checking individual claims and allows likely unreliable content to be flagged as soon as it appears.
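As a minimal sketch of how a source-level judgment can be reused for individual articles, the snippet below maps an article URL to its outlet and looks up a rating. The domains, the rating labels, and the `SOURCE_RATINGS` table are hypothetical placeholders, not data from any real resource.

```python
from urllib.parse import urlparse

# Hypothetical outlet-level ratings (illustrative only). In practice these
# would come from a manually labelled resource, not a hand-written dict.
SOURCE_RATINGS = {
    "reliable-news.example": "high",
    "mixed-daily.example": "mixed",
    "rumor-mill.example": "low",
}


def source_of(url):
    """Return the host of a URL, lower-cased and without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def rate_article(url, ratings=SOURCE_RATINGS):
    """Look up the outlet-level rating for an article; 'unknown' if unrated."""
    return ratings.get(source_of(url), "unknown")


if __name__ == "__main__":
    for url in (
        "https://www.rumor-mill.example/2024/story",
        "https://unrated-site.example/post/1",
    ):
        print(url, "->", rate_article(url))
```

The lookup is deliberately coarse: every article inherits its outlet's rating, which is what makes the approach fast, and also why it cannot capture variation within a single outlet.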
Prediction methods
Source reliability can be inferred from:
- Content analysis: Linguistic features, factual accuracy of past articles, tone and sentiment
- Audience signals: Composition and ideology of followers/readers; who interacts with the outlet
- Network features: Shared links, citations, and relationships with other outlets; co-citation patterns with known reliable sources
- Infrastructure: Domain registration practices, hosting patterns, SSL certificate characteristics
- Editorial patterns: Consistency of fact-checking labels, corrections published, update frequency
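The feature groups above can be combined into a single score in many ways. The sketch below uses one illustrative signal per group and a logistic function over a weighted sum. The feature names, the weights, and the bias are assumptions chosen for illustration; a real system would learn them from labelled outlets and would use far richer features.

```python
import math
from dataclasses import dataclass


@dataclass
class SourceFeatures:
    """One hypothetical signal per feature group, each scaled to 0..1."""
    past_accuracy: float        # content: share of past checked articles rated accurate
    audience_overlap: float     # audience: overlap with audiences of known reliable outlets
    reliable_cocitation: float  # network: share of outbound links to known reliable outlets
    domain_age: float           # infrastructure: domain age, normalised to 0..1
    correction_rate: float      # editorial: share of known errors that were corrected


# Hand-set, illustrative weights (assumptions, not learned values).
WEIGHTS = {
    "past_accuracy": 2.5,
    "audience_overlap": 1.0,
    "reliable_cocitation": 1.5,
    "domain_age": 0.5,
    "correction_rate": 1.0,
}
BIAS = -3.0


def reliability_score(features):
    """Map the weighted feature sum to a score in (0, 1) with a sigmoid."""
    z = BIAS + sum(w * getattr(features, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    established = SourceFeatures(0.9, 0.8, 0.7, 1.0, 0.6)
    dubious = SourceFeatures(0.2, 0.1, 0.1, 0.05, 0.0)
    print(f"established: {reliability_score(established):.2f}")
    print(f"dubious:     {reliability_score(dubious):.2f}")
```

Keeping the groups as separate named inputs makes it easy to drop a group and see how much each kind of evidence contributes.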
Challenges
- Temporal drift: Source reliability changes over time; outlets may improve or degrade their practices
- Issue-specific variation: An outlet may be reliable on some topics but biased on others
- Ground truth scarcity: Manually verified labels of source reliability are limited, so work in this area relies on resources such as Media Bias/Fact Check or AllSides
- Class imbalance: Most outlets fall into moderate categories; extremes are rare
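Two of these challenges lend themselves to a short illustration. The sketch below shows one common way to counter class imbalance (inverse-frequency class weights) and one simple way to account for temporal drift (an exponentially time-decayed average of past assessments). The function names, the label values, and the two-year half-life are assumptions made for the example, not methods taken from the papers listed on this page.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Rare classes (e.g. the extremes) get weights above 1, common ones below 1.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def time_decayed_reliability(assessments, now_year, half_life_years=2.0):
    """Weighted mean of (year, score) pairs where older scores count less.

    An assessment that is `half_life_years` old has half the weight of a
    current one. Scores are assumed to lie in 0..1.
    """
    if not assessments:
        raise ValueError("at least one assessment is required")
    weights = [0.5 ** ((now_year - year) / half_life_years) for year, _ in assessments]
    weighted_sum = sum(w * score for w, (_, score) in zip(weights, assessments))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    labels = ["mixed"] * 6 + ["high"] * 2 + ["low"] * 2
    print(balanced_class_weights(labels))

    # An outlet rated well in 2018 but poorly in 2024 (hypothetical scores).
    history = [(2018, 0.9), (2024, 0.3)]
    print(time_decayed_reliability(history, now_year=2024))
```

With the decayed average, a recent drop in quality pulls the estimate down faster than a plain mean would, at the cost of one more parameter (the half-life) to choose.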
Key papers
- Predicting Factuality of Reporting and Bias of News Media Sources: pioneering work on media-level factuality and bias prediction using multiple information sources (text, Wikipedia, Twitter, URLs, web traffic); introduces 1,066-website dataset
- A Survey on Predicting the Factuality and the Bias of News Media: survey of source reliability prediction using textual, multimedia, audience, and infrastructure features
Related topics
- Media profiling: predicting factuality and bias of entire outlets
- Factuality assessment: determining whether information is true or false
- Fact-checking and corrections: manual or automated verification of claims, often informed by source credibility