
Factuality assessment

Factuality assessment involves determining whether a claim or article is truthful and whether a source is reliable. It can be performed at three granularities: claim-level (fact-checking individual assertions), article-level (fake news detection), or source-level (predicting the reliability of a media outlet).

Levels of analysis

Claim-level fact-checking: Verifying specific factual assertions by retrieving evidence from reliable sources (Wikipedia, fact-checking websites, news articles, academic papers). Requires NLP both to match claims with relevant evidence and to judge whether that evidence supports or refutes the claim.
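The retrieve-then-judge pipeline can be sketched as below. This is a toy illustration under stated assumptions. `retrieve` ranks passages by plain word overlap, whereas real systems use BM25 or dense retrievers. `toy_stance` is a hypothetical stand-in for a trained stance or natural-language-inference model. The verdict labels and the majority-vote rule are illustrative choices, not a fixed standard.

```python
import re
from collections import Counter

NEGATIONS = {"not", "no", "never"}

def tokens(text: str) -> set[str]:
    """Lower-cased set of word tokens; punctuation is dropped."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(claim: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy retriever: rank passages by word overlap with the claim, keep up to k that overlap at all."""
    claim_words = tokens(claim)
    ranked = sorted(corpus, key=lambda p: len(claim_words & tokens(p)), reverse=True)
    return [p for p in ranked[:k] if claim_words & tokens(p)]

def toy_stance(claim: str, passage: str) -> str:
    """Hypothetical stand-in for a stance/NLI model: a negation mismatch counts as a refutation."""
    claim_negated = bool(tokens(claim) & NEGATIONS)
    passage_negated = bool(tokens(passage) & NEGATIONS)
    return "REFUTES" if claim_negated != passage_negated else "SUPPORTS"

def check_claim(claim: str, corpus: list[str], stance=toy_stance, k: int = 3) -> str:
    """Retrieve evidence, judge each passage, and return the majority verdict."""
    evidence = retrieve(claim, corpus, k)
    votes = Counter(stance(claim, p) for p in evidence)
    if votes["SUPPORTS"] > votes["REFUTES"]:
        return "SUPPORTED"
    if votes["REFUTES"] > votes["SUPPORTS"]:
        return "REFUTED"
    return "NOT ENOUGH INFO"  # no evidence retrieved, or a tie

corpus = [
    "The Eiffel Tower is in Paris.",
    "Paris is the capital of France.",
    "Bananas are yellow.",
]
print(check_claim("The Eiffel Tower is in Paris", corpus))  # SUPPORTED
print(check_claim("Bananas are not yellow", corpus))        # REFUTED
```

Swapping `toy_stance` for a real model leaves the pipeline structure unchanged. Retrieval quality and stance accuracy are the two places where errors enter.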

Article-level fake news detection: Classifying whether an entire article contains false information. Uses linguistic features (deceptive language, sensationalism), network signals (who shares it), and source credibility.
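One simple way to combine the three kinds of signals is a logistic score over a handful of features. The sketch assumes each signal has already been reduced to a number between 0 and 1. The feature names, weights, and bias are hypothetical placeholders that a real system would learn from labeled articles.

```python
import math
from dataclasses import dataclass

@dataclass
class ArticleSignals:
    sensational_ratio: float      # linguistic: share of tokens from a sensationalism lexicon (0..1)
    low_cred_sharer_ratio: float  # network: share of sharing accounts previously flagged (0..1)
    source_reliability: float     # source: outlet reliability score (0..1)

# Hypothetical weights and bias; a real system would fit these on labeled articles.
WEIGHTS = {"sensational_ratio": 3.0, "low_cred_sharer_ratio": 2.0, "source_reliability": -4.0}
BIAS = 1.0

def fake_probability(signals: ArticleSignals) -> float:
    """Logistic combination of the signals; higher means more likely to contain false information."""
    z = BIAS + sum(w * getattr(signals, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

print(fake_probability(ArticleSignals(0.2, 0.8, 0.1)))  # about 0.94
```

The negative weight on `source_reliability` means a reputable outlet lowers the score even when the text is somewhat sensational. A learned model would decide how much weight that deserves.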

Source-level factuality prediction: Assessing whether a news outlet publishes reliable information based on its track record. Once an outlet is rated, its new articles can be flagged immediately without analyzing each one. The approach works best for outlets with many published articles to learn from.
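Here is a minimal sketch of source-level aggregation. It assumes each of the outlet's articles already carries a binary factual/non-factual label. The rating is the posterior mean of the factual share under a Beta prior, which pulls outlets with few articles toward 0.5. The prior and the "high"/"mixed"/"low" thresholds are hypothetical choices, not values used by any rating service.

```python
def outlet_reliability(labels: list[bool], prior_factual: float = 1.0, prior_nonfactual: float = 1.0) -> float:
    """Posterior mean share of factual articles under a Beta(prior_factual, prior_nonfactual) prior.

    labels: one entry per rated article, True if it was judged factual.
    """
    factual = sum(labels)
    return (factual + prior_factual) / (len(labels) + prior_factual + prior_nonfactual)

def rating(score: float, low: float = 0.4, high: float = 0.8) -> str:
    """Map a score to a coarse label; the thresholds are hypothetical."""
    if score >= high:
        return "high"
    if score >= low:
        return "mixed"
    return "low"

print(outlet_reliability([True, True]), rating(outlet_reliability([True, True])))  # 0.75 mixed
print(outlet_reliability([True] * 98), rating(outlet_reliability([True] * 98)))    # 0.99 high
```

With only two factual articles the estimate stays at 0.75. After 98 factual articles it reaches 0.99, which reflects why the approach works best for outlets with a long publication record.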

Challenges

Temporal variation: An outlet's factuality rating is not static. Sources may improve through corrections or degrade through changed editorial standards.
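One way to handle temporal variation is to down-weight old evidence. The sketch below is a hypothetical helper, not an established method from a specific paper. It weights each labeled article by exponential decay with an assumed half-life of 180 days.

```python
def decayed_reliability(observations: list[tuple[float, int]], now: float, half_life_days: float = 180.0) -> float:
    """Time-weighted share of factual articles.

    observations: (day, label) pairs, with day measured in days and label 1 (factual) or 0.
    An article half_life_days old counts half as much as one published at `now`.
    """
    weighted_factual = total_weight = 0.0
    for day, label in observations:
        weight = 0.5 ** ((now - day) / half_life_days)
        weighted_factual += weight * label
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no observations")
    return weighted_factual / total_weight

print(decayed_reliability([(0, 0), (360, 1)], now=360))  # 0.8
```

Suppose one article was labeled non-factual 360 days ago and another factual today. The plain average is 0.5, but the decayed estimate is 0.8 because the old article carries only a quarter of the weight.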

Ground truth scarcity: Labeled datasets are limited; most work relies on manually verified labels from sources such as Media Bias/Fact Check, PolitiFact, or Snopes.

Annotation vs. automation gap: Human annotators judge factuality using criteria not always accessible to automated systems (e.g., external expert knowledge, domain-specific facts).

Issue-specificity: Outlets may be reliable on some topics but unreliable on others, making global factuality ratings imperfect.
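To make issue-specificity concrete, the sketch below computes a separate factual rate for each topic instead of one global number. It assumes articles arrive as hypothetical (topic, label) pairs, with 1 for factual and 0 for non-factual.

```python
from collections import defaultdict

def per_topic_reliability(articles: list[tuple[str, int]]) -> dict[str, float]:
    """Share of factual articles per topic; labels are 1 (factual) or 0 (non-factual)."""
    counts = defaultdict(lambda: [0, 0])  # topic -> [factual count, total count]
    for topic, label in articles:
        counts[topic][0] += label
        counts[topic][1] += 1
    return {topic: factual / total for topic, (factual, total) in counts.items()}

articles = [("sports", 1)] * 9 + [("sports", 0)] + [("health", 1)] * 2 + [("health", 0)] * 8
global_rate = sum(label for _, label in articles) / len(articles)
print(global_rate)                      # 0.55
print(per_topic_reliability(articles))  # {'sports': 0.9, 'health': 0.2}
```

The global rate of 0.55 describes neither topic well. This outlet is mostly reliable on sports and mostly unreliable on health.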

Measurement bias: Outlet factuality is inferred from a sample of articles; non-uniform sampling across topics introduces bias in the estimated label.
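One standard correction for this bias is post-stratification. First estimate the factual rate within each topic. Then weight each rate by the topic's share of the outlet's full output rather than its share of the sample. The sketch assumes those output shares are known; in practice they may themselves need to be estimated.

```python
def post_stratified_rate(topic_rates: dict[str, float], topic_shares: dict[str, float]) -> float:
    """Weight each topic's observed factual rate by the topic's share of the outlet's full output.

    topic_rates: factual rate measured in the sample, per topic.
    topic_shares: fraction of the outlet's total articles in each topic (must sum to 1).
    """
    if abs(sum(topic_shares.values()) - 1.0) > 1e-9:
        raise ValueError("topic shares must sum to 1")
    missing = set(topic_shares) - set(topic_rates)
    if missing:
        raise ValueError(f"no sampled articles for topics: {sorted(missing)}")
    return sum(topic_rates[t] * share for t, share in topic_shares.items())

rates = {"sports": 0.9, "health": 0.2}  # measured on an evenly split sample
print(post_stratified_rate(rates, {"sports": 0.8, "health": 0.2}))  # roughly 0.76
```

Suppose the sample splits evenly between sports and health, giving a naive estimate of 0.55. If the outlet actually publishes 80% sports, the reweighted estimate is 0.9 × 0.8 + 0.2 × 0.2 = 0.76.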

Prediction approaches

Textual features: Linguistic markers of deceptiveness (hedging, negation, subjectivity); readability; claim-to-evidence language matching.
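Here is a sketch of lexicon-based textual features. It assumes tiny hand-written word lists can stand in for curated lexicons, and it uses average sentence length as a crude readability proxy. The lexicons and feature names are hypothetical. The resulting rates would normally feed a classifier rather than be used directly.

```python
import re

# Tiny illustrative lexicons (hypothetical); real systems use curated resources.
HEDGES = {"may", "might", "possibly", "reportedly", "allegedly", "perhaps"}
NEGATIONS = {"not", "no", "never", "nobody", "nothing"}
SUBJECTIVE = {"shocking", "amazing", "terrible", "unbelievable", "outrageous"}

def text_features(text: str) -> dict[str, float]:
    """Per-token rates of hedging, negation, and subjective words, plus average sentence length."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(words) or 1  # avoid division by zero on empty input
    return {
        "hedge_rate": sum(w in HEDGES for w in words) / n,
        "negation_rate": sum(w in NEGATIONS for w in words) / n,
        "subjective_rate": sum(w in SUBJECTIVE for w in words) / n,
        "avg_sentence_len": len(words) / max(len(sentences), 1),
    }

print(text_features("This shocking report may not be true. Officials reportedly deny it."))
```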

Multimedia signals: Image forensics (detecting manipulated or deepfake images); reverse image search to verify where a photo originated.
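As one illustration of image comparison, the sketch below implements average hashing (aHash). This simple perceptual hash stays stable under uniform brightness shifts and can flag a reused photo. It assumes the image has already been converted to grayscale and downscaled to a small grid such as 8×8, for example with an imaging library. The 5-bit distance threshold is a hypothetical choice. Commercial reverse image search services use their own, more robust methods.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """aHash: one bit per pixel, set where the pixel is brighter than the grid's mean.

    pixels: grayscale values (0-255) of an image already downscaled to a small grid, e.g. 8x8.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def likely_same_image(a: list[list[int]], b: list[list[int]], max_distance: int = 5) -> bool:
    """Treat two images as the same photo if their hashes differ in at most max_distance bits."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance

original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]  # synthetic 8x8 gradient
brightened = [[p + 3 for p in row] for row in original]             # uniform brightness shift
inverted = [[252 - p for p in row] for row in original]             # visually different image
print(likely_same_image(original, brightened))  # True
print(likely_same_image(original, inverted))    # False
```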

Social signals: Propagation speed and reach; retweet patterns; verification scores from crowdsourcing platforms.
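Propagation features can be computed from the list of shares a story receives. The sketch assumes a hypothetical `Share` record holding the sharer's id, follower count, and minutes since the original post. The one-hour window and the follower-sum definition of potential reach are illustrative choices.

```python
from dataclasses import dataclass

@dataclass
class Share:
    user: str                  # id of the sharing account
    followers: int             # its follower count at share time
    minutes_after_post: float  # delay since the original post

def cascade_features(shares: list[Share], window_minutes: float = 60.0) -> dict[str, float]:
    """Simple speed and reach features for one story's sharing cascade."""
    early = [s for s in shares if s.minutes_after_post <= window_minutes]
    followers_by_user = {s.user: s.followers for s in shares}  # count each account once
    return {
        "total_shares": len(shares),
        "unique_sharers": len(followers_by_user),
        "early_share_rate": len(early) / window_minutes,  # shares per minute within the window
        "potential_reach": sum(followers_by_user.values()),
    }

shares = [Share("a", 100, 1), Share("b", 50, 10), Share("a", 100, 30), Share("c", 1000, 120)]
print(cascade_features(shares))
```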

Evidence retrieval: Matching article claims to fact-checked claims in databases; retrieving supporting evidence from Wikipedia, news, or scientific sources.
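Claim matching against a database of already fact-checked claims can be sketched with TF-IDF vectors and cosine similarity. This version uses only the standard library. The 0.5 similarity threshold is a hypothetical cutoff. Production systems often use stronger sentence-embedding models, but the retrieval structure is the same.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

def tfidf_vectors(docs: list[str]):
    """Return sparse TF-IDF vectors (term -> weight) for docs, plus the smoothed IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def match_claim(claim: str, fact_checks: list[str], threshold: float = 0.5):
    """Return (best matching fact-checked claim, similarity), or None if nothing clears the threshold."""
    if not fact_checks:
        return None
    vectors, idf = tfidf_vectors(fact_checks)
    # Terms never seen in the database have no IDF weight and are dropped from the query.
    query = {t: tf * idf[t] for t, tf in Counter(tokenize(claim)).items() if t in idf}
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return (fact_checks[best], scores[best]) if scores[best] >= threshold else None

fact_checks = [
    "5G towers spread the coronavirus",
    "Drinking bleach cures COVID-19",
    "The moon landing was staged",
]
print(match_claim("Do 5G towers really spread coronavirus?", fact_checks))
```

The threshold matters. A query that shares only the word "the" with a fact-check still scores about 0.36, and a lower cutoff would accept it as a match.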

Source credibility: Estimating factuality from source reputation rather than article content.
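Once outlets have ratings, source-based scoring reduces to a lookup. The sketch assigns each article its outlet's score after normalizing the URL's hostname. The `OUTLET_SCORES` table and the 0.5 default for unrated outlets are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical outlet ratings on a 0 (unreliable) to 1 (reliable) scale,
# e.g. produced by a source-level model or taken from a curated list.
OUTLET_SCORES = {"example-news.com": 0.9, "dubious-daily.net": 0.2}

def article_score_from_source(url: str, default: float = 0.5) -> float:
    """Score an article by its outlet's rating alone, ignoring the article text."""
    host = urlparse(url).hostname or ""  # urlparse already lower-cases the hostname
    if host.startswith("www."):
        host = host[len("www."):]
    return OUTLET_SCORES.get(host, default)

print(article_score_from_source("https://www.example-news.com/2024/05/story.html"))  # 0.9
print(article_score_from_source("https://brand-new-site.org/post"))                  # 0.5
```

The default for unseen outlets is the weak point of this approach. A newly registered domain gets a neutral score regardless of what it publishes, which is one reason source-level and content-level signals are often combined.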

Key papers