Factuality assessment
Factuality assessment involves determining whether a claim, article, or source is truthful. It can be performed at multiple granularities: claim-level (fact-checking individual assertions), article-level (fake news detection), or source-level (media outlet reliability prediction).
Levels of analysis
Claim-level fact-checking: Verifying specific factual assertions by retrieving evidence from reliable sources (Wikipedia, fact-checking websites, news articles, academic papers). Requires NLP for claim-evidence matching and truth judgment.
Article-level fake news detection: Classifying whether an entire article contains false information. Uses linguistic features (deceptive language, sensationalism), network signals (who shares it), and source credibility.
Source-level factuality prediction: Assessing whether a news outlet publishes reliable information based on its history. Enables rapid detection without analyzing every article; useful when the outlet has many published articles.
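As a rough illustration of the source-level idea, the sketch below rolls hypothetical per-article verdicts up into a coarse outlet label on the low/mixed/high scale. The function name, the thresholds, and the `min_articles` cutoff are assumptions chosen for illustration; they are not taken from any of the cited work.

```python
from collections import defaultdict


def outlet_factuality(verdicts, min_articles=3, high=0.8, low=0.5):
    """Map (outlet, is_accurate) pairs to a coarse outlet label.

    The thresholds and the min_articles cutoff are illustrative assumptions.
    """
    counts = defaultdict(lambda: [0, 0])  # outlet -> [accurate, total]
    for outlet, is_accurate in verdicts:
        counts[outlet][0] += int(is_accurate)
        counts[outlet][1] += 1

    labels = {}
    for outlet, (accurate, total) in counts.items():
        if total < min_articles:
            # Too little history to say anything about this outlet.
            labels[outlet] = "insufficient data"
            continue
        share = accurate / total
        if share >= high:
            labels[outlet] = "high"
        elif share >= low:
            labels[outlet] = "mixed"
        else:
            labels[outlet] = "low"
    return labels


# Hypothetical verdicts for three made-up outlets.
verdicts = [
    ("a.example", True), ("a.example", True),
    ("a.example", True), ("a.example", True),
    ("b.example", True), ("b.example", False),
    ("b.example", False), ("b.example", False),
    ("c.example", True),
]
labels = outlet_factuality(verdicts)
```

The `min_articles` guard reflects the point above: an outlet-level rating is only meaningful once the outlet has enough published, checked articles.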
Challenges
Temporal variation: An outlet's factuality rating is not static. Sources may improve through corrections or degrade through changed editorial standards.
Ground truth scarcity: Labeled datasets are limited; most work relies on manually verified labels from sources like Media Bias/Fact Check, PolitiFact, or Snopes.
Annotation vs. automation gap: Human annotators judge factuality using criteria not always accessible to automated systems (e.g., external expert knowledge, domain-specific facts).
Issue-specificity: Outlets may be reliable on some topics but unreliable on others, making global factuality ratings imperfect.
Measurement bias: Outlet factuality is inferred from a sample of articles; non-uniform sampling across topics introduces bias in the estimated label.
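The measurement-bias point can be made concrete with a small sketch. It compares a naive accuracy rate over a topic-skewed sample with a rate reweighted by each topic's share of the outlet's output. The sample, the topic shares, and both function names are hypothetical, and the reweighting shown is plain post-stratification, used here only as one possible correction.

```python
def naive_rate(sample):
    """Fraction of sampled articles judged accurate, ignoring topic."""
    return sum(int(ok) for _, ok in sample) / len(sample)


def reweighted_rate(sample, topic_share):
    """Per-topic accuracy weighted by each topic's share of the outlet's output.

    Assumes every sampled topic appears in topic_share (hypothetical input).
    """
    by_topic = {}  # topic -> (accurate, total)
    for topic, ok in sample:
        accurate, total = by_topic.get(topic, (0, 0))
        by_topic[topic] = (accurate + int(ok), total + 1)

    # Renormalise over the topics that were actually sampled.
    covered = sum(topic_share[t] for t in by_topic)
    weighted = sum(
        topic_share[t] * accurate / total
        for t, (accurate, total) in by_topic.items()
    )
    return weighted / covered


# Hypothetical outlet: mostly sports coverage, but fact-checkers
# sampled mostly its politics articles.
sample = (
    [("politics", True)] * 4 + [("politics", False)] * 4
    + [("sports", True)] * 2
)
topic_share = {"politics": 0.2, "sports": 0.8}

naive = naive_rate(sample)
adjusted = reweighted_rate(sample, topic_share)
```

In this made-up case the naive rate is 0.6, while the reweighted rate is about 0.9, so the outlet-level label could change depending only on which topics were sampled.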
Prediction approaches
Textual features: Linguistic markers of deceptiveness (hedging, negation, subjectivity); readability; claim-to-evidence language matching.
Multimedia signals: Image forensics (detection of manipulated or deepfaked images); reverse image search to check authenticity of photo sources.
Social signals: Propagation speed and reach; retweet patterns; crowd verification scores from crowdsourcing platforms.
Evidence retrieval: Matching article claims to fact-checked claims in databases; retrieving supporting evidence from Wikipedia, news, or scientific sources.
Source credibility: Estimating factuality from source reputation rather than article content.
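To illustrate the evidence-retrieval approach, the sketch below matches an incoming claim against a small store of previously fact-checked claims using bag-of-words cosine similarity. This is a deliberately simple lexical baseline, not the method of any cited paper; the record layout, the `best_match` helper, and the 0.3 threshold are all assumptions.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_match(claim, fact_checks, threshold=0.3):
    """Return (record, score) for the closest fact-checked claim.

    The record is None when nothing scores at or above the threshold
    (an illustrative cutoff, not a tuned value).
    """
    if not fact_checks:
        return None, 0.0
    query = Counter(tokens(claim))
    scored = [
        (cosine(query, Counter(tokens(fc["claim"]))), fc) for fc in fact_checks
    ]
    score, match = max(scored, key=lambda pair: pair[0])
    return (match, score) if score >= threshold else (None, score)


# Hypothetical fact-check store.
fact_checks = [
    {"claim": "The city council banned all bicycles downtown", "verdict": "false"},
    {"claim": "Unemployment fell to four percent last year", "verdict": "true"},
]

match, score = best_match("City council has banned bicycles in downtown", fact_checks)
no_match, _ = best_match("Completely unrelated sentence about weather", fact_checks)
```

A lexical match like this misses paraphrases that share few words, which is one reason a stronger similarity measure might be swapped in for `cosine` while keeping the same retrieve-then-threshold structure.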
Key papers
- Predicting Factuality of Reporting and Bias of News Media Sources: source-level factuality prediction (low/mixed/high) using article text, Wikipedia, Twitter, URL structure, and web traffic; shows article features most predictive; introduces 1,066-website dataset
- Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media: ordinal regression for jointly predicting outlet trustworthiness (3-point scale) and political ideology (7-point scale); shows multi-task learning with auxiliary tasks reduces prediction error
- What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context: source-level factuality prediction via article text, YouTube features, and audience demographics; factuality harder than bias to predict
- A Survey on Predicting the Factuality and the Bias of News Media: survey of source-level factuality prediction; shows factuality harder than bias to predict because it requires external ground truth
- A Survey on Multimodal Disinformation Detection: multimodal approaches to factuality detection covering text, images, video, and network signals
Related topics
- Fact-checking and corrections: systematic verification of claims, often informed by source factuality
- Source reliability: assessing whether a source is trustworthy; factuality is one dimension
- Media profiling: predicting both factuality and bias of news outlets
- Rumour verification: determining the truth of claims circulating on social media