Source reliability

Source reliability is the degree to which information published by a particular source can be trusted, whether that source is a news outlet, a user, or a website. Estimating it is critical for both manual and automated fact-checking, because the credibility of evidence retrieved during verification depends on the trustworthiness of the source it comes from.

Levels of analysis

User-level: Assessing individual social media users or commenters based on their posting history, follower relationships, and interaction patterns. This includes detecting paid trolls, sockpuppets, and other inauthentic accounts.

Source-level (media outlet): Profiling entire news organizations based on their publishing patterns, audience composition, and infrastructure. Because a single outlet profile applies to everything that outlet publishes, this scales better than checking individual claims and allows content from unreliable outlets to be flagged quickly.

Prediction methods

Source reliability can be inferred from:

  • Content analysis: Linguistic features, factual accuracy of past articles, tone and sentiment
  • Audience signals: Composition and ideology of followers/readers; who interacts with the outlet
  • Network features: Shared links, citations, and relationships with other outlets; co-citation patterns with known reliable sources
  • Infrastructure: Domain registration practices, hosting patterns, SSL certificate characteristics
  • Editorial patterns: Consistency of fact-checking labels, corrections published, update frequency
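One common way to combine signals like those above is to turn each family into a numeric feature and fit a classifier on outlets with known labels. The sketch below is illustrative only: it assumes each signal family has already been condensed into a single hypothetical score in [0, 1], uses made-up toy outlets, and fits a plain logistic regression by gradient descent. The feature names, data, and hyperparameters are assumptions, not values from any published system.

```python
import math

# Hypothetical feature names, one per signal family listed above.
FEATURES = ["content", "audience", "network", "infrastructure", "editorial"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, outlet):
    """Return the estimated probability that an outlet is reliable."""
    z = bias + sum(w * outlet[name] for w, name in zip(weights, FEATURES))
    return sigmoid(z)


def train(outlets, labels, lr=0.5, epochs=500):
    """Fit logistic regression with batch gradient descent."""
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    n = len(outlets)
    for _ in range(epochs):
        grad_w = [0.0] * len(FEATURES)
        grad_b = 0.0
        for outlet, label in zip(outlets, labels):
            err = predict(weights, bias, outlet) - label
            for i, name in enumerate(FEATURES):
                grad_w[i] += err * outlet[name]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy, made-up training outlets: 1 = reliable, 0 = unreliable.
TRAIN_X = [
    {"content": 0.9, "audience": 0.7, "network": 0.8, "infrastructure": 0.9, "editorial": 0.8},
    {"content": 0.8, "audience": 0.8, "network": 0.7, "infrastructure": 0.8, "editorial": 0.9},
    {"content": 0.2, "audience": 0.3, "network": 0.2, "infrastructure": 0.1, "editorial": 0.2},
    {"content": 0.1, "audience": 0.2, "network": 0.3, "infrastructure": 0.2, "editorial": 0.1},
]
TRAIN_Y = [1, 1, 0, 0]

WEIGHTS, BIAS = train(TRAIN_X, TRAIN_Y)

if __name__ == "__main__":
    for outlet, label in zip(TRAIN_X, TRAIN_Y):
        print(label, round(predict(WEIGHTS, BIAS, outlet), 3))
```

In practice each family would contribute many features rather than one score, and the model would be evaluated on held-out outlets rather than on its own training data.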

Challenges

  • Temporal drift: Source reliability changes over time; outlets may improve or degrade their practices
  • Issue-specific variation: An outlet may be reliable on some topics but biased on others
  • Ground truth scarcity: Manually verified labels of source reliability are limited, so work in this area relies on ratings from sites such as Media Bias/Fact Check or AllSides
  • Class imbalance: Most outlets fall into moderate categories; extremes are rare
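Two of the challenges above have simple, widely used mitigations that can be sketched in a few lines. This is a minimal illustration, not a recommended pipeline: the first function computes inverse-frequency class weights (the same "balanced" heuristic, n_samples / (n_classes * class_count), used by several libraries) so that rare extreme classes are not drowned out; the second down-weights older reliability observations with an exponential decay to account for temporal drift. The function names, the label names, and the 180-day half-life are all hypothetical choices made for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def decayed_score(observations, now, half_life_days=180.0):
    """Time-weighted mean of (day, score) pairs; older pairs count less.

    The half-life is an assumed value for illustration only.
    Returns None when there are no observations.
    """
    num = 0.0
    den = 0.0
    for day, score in observations:
        weight = 0.5 ** ((now - day) / half_life_days)
        num += weight * score
        den += weight
    return num / den if den else None


if __name__ == "__main__":
    # Made-up label distribution: most outlets sit in the middle category.
    labels = ["low"] + ["mixed"] * 8 + ["high"]
    print(balanced_class_weights(labels))

    # Made-up outlet whose score dropped from 0.9 to 0.3 over 360 days.
    print(decayed_score([(0, 0.9), (360, 0.3)], now=360))
```

With the decay applied, the recent low score dominates the estimate, whereas an unweighted mean would treat the year-old observation as equally informative.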

Key papers