Credibility assessment for fake news detection
Credibility assessment frames fake news detection as an evaluation of source reliability and content quality, rather than a pure text classification problem. The core insight is that fake and credible news differ not only in what they say but in who produces them and how they are written.
Two complementary perspectives dominate:
Source credibility concerns the origin of news — author identity, number of authors, author history, coauthorship networks, and organizational affiliation. Key empirical findings show that fake news articles disproportionately lack bylines or list a single author, while true news tends to have multiple named authors affiliated with recognized outlets. Author publication history is highly consistent (84% of authors publish exclusively fake or exclusively true articles), making past behavior a strong predictor of article credibility.
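As a minimal sketch of how such source signals could be turned into features, the Python snippet below derives an author count and per-author fake/true history from bylines. The `Article` class, the function names, and the exact feature definitions are assumptions made for illustration; they are not taken from any of the papers discussed here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Article:
    """Hypothetical article record holding only byline metadata."""
    authors: List[str]              # empty list when the article has no byline
    is_fake: Optional[bool] = None  # known label for past articles, None if unlabeled


def build_author_history(labeled: List[Article]) -> Dict[str, Dict[str, int]]:
    """Count how many fake and true articles each author has published."""
    history = defaultdict(lambda: {"fake": 0, "true": 0})
    for article in labeled:
        if article.is_fake is None:
            continue  # skip articles without a known label
        key = "fake" if article.is_fake else "true"
        for author in article.authors:
            history[author][key] += 1
    return dict(history)


def source_features(article: Article, history: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Author count plus summed past fake/true counts over the byline (illustrative)."""
    past_fake = sum(history.get(a, {}).get("fake", 0) for a in article.authors)
    past_true = sum(history.get(a, {}).get("true", 0) for a in article.authors)
    return {
        "author_count": len(article.authors),
        "past_fake": past_fake,
        "past_true": past_true,
    }
```

An article with no byline yields all-zero features, which is itself informative given that fake news disproportionately lacks bylines.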
Content credibility concerns characteristics of the article text — sentiment patterns, domain vocabulary, use of numerical evidence, readability, article length, and typographical errors. Content signals are generally weaker discriminators than source signals: with balanced datasets, content-only features reach F1-macro ~0.68 while source-only features reach ~0.77–0.83.
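A few of the content signals listed above can be approximated with simple text statistics. The sketch below computes article length, the share of numeric tokens (a rough proxy for numerical evidence), and average sentence length (a rough proxy for readability). The function name, the tokenization rules, and the choice of proxies are assumptions for illustration only; sentiment, domain vocabulary, and typo detection would need lexicons or external tools and are left out.

```python
import re
from typing import Dict


def content_features(text: str) -> Dict[str, float]:
    """Compute a few illustrative content-credibility features from raw text."""
    # Tokens are either words (with optional apostrophe part) or numbers like 3.5, 1,000, 40%
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:[.,]\d+)*%?", text)
    # Split on sentence-final punctuation followed by whitespace or end of text
    sentences = [s for s in re.split(r"[.!?]+\s+|[.!?]+$", text) if s.strip()]

    if not tokens:
        return {"word_count": 0, "numeric_ratio": 0.0, "avg_sentence_len": 0.0}

    numeric = sum(1 for t in tokens if t[0].isdigit())
    return {
        "word_count": len(tokens),                      # article length
        "numeric_ratio": numeric / len(tokens),         # use of numerical evidence
        "avg_sentence_len": len(tokens) / max(len(sentences), 1),  # readability proxy
    }
```

Features like these would typically be fed to a standard classifier and compared against source-only features using a macro-averaged F1 score.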
The credibility framework complements propagation-based and user-profile-based approaches by requiring only byline and textual metadata, without needing access to social network data or tweet histories.
Key papers
- Shu, Bernard & Liu (2018) — Studying Fake News via Network Analysis: introduces credibility networks (undirected graphs where edges represent supporting or opposing relationships between posts); models viewpoint-level credibility scores and propagates them via belief propagation to infer overall news veracity; discusses user credibility inference from stance information.
- Popat et al. (2018) — DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning: end-to-end neural network for evidence-aware credibility assessment; combines claim and evidence article representations using bidirectional LSTMs with claim-specific attention; achieves 78.96% accuracy on Snopes without hand-crafted features.
- Shu et al. (2019) — Beyond News Contents: The Role of Social Context for Fake News Detection (TriFN): tri-relationship embedding incorporating publisher partisan bias and user credibility via clustering-inferred scores; shows both publisher-level and user-level credibility signals are complementary to content and improve detection jointly.
- Castillo et al. (2011) — Information Credibility on Twitter: foundational work framing Twitter credibility assessment via user reputation and propagation signals; shows user features (registration age, followers, activity) and propagation structure (retweet tree patterns) are stronger predictors of content credibility than text features alone; achieves 86% accuracy on binary credibility classification.
- Zhou et al. (2019) — WSDM Tutorial on Fake News Detection: presents credibility-based detection as one of four unified perspectives (alongside knowledge, style, and propagation); frames detection as assessing credibility of headlines (clickbait), sources (publishers), comments (spam), and users (account profiles).
- Sitaula et al. (2019) — Credibility-based Fake News Detection: defines the source vs. content credibility taxonomy; shows 3 source features (author count, past fake/true history) outperform 23 content features across PolitiFact and BuzzFeed data.
- Zhou et al. (2020) — ReCOVery: operationalizes credibility at the publisher level using NewsGuard + MBFC dual-authority labeling (thresholds >90/very high for reliable, <30/low for unreliable); constructs a 2,029-article COVID-19 corpus where publisher credibility is inherited by individual articles as a scalability-accuracy trade-off.
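The publisher-level labeling described in the ReCOVery entry can be sketched as a simple rule. The snippet below is an illustration under stated assumptions: it treats the thresholds as requiring both authorities to agree (NewsGuard score above 90 and an MBFC rating of "very high" for reliable; score below 30 and "low" for unreliable), and the function names, label strings, and input shapes are hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Optional, Tuple


def label_publisher(newsguard_score: float, mbfc_factual: str) -> Optional[str]:
    """Return 'reliable', 'unreliable', or None when the publisher falls in between.

    Assumes both authorities must agree; the combination rule is an assumption.
    """
    level = mbfc_factual.strip().lower()
    if newsguard_score > 90 and level == "very high":
        return "reliable"
    if newsguard_score < 30 and level == "low":
        return "unreliable"
    return None  # ambiguous publishers are left unlabeled


def label_articles(
    articles: List[Tuple[str, str]],            # (article_id, publisher) pairs
    publisher_labels: Dict[str, Optional[str]],  # publisher -> label or None
) -> Dict[str, str]:
    """Each article inherits its publisher's label; unlabeled publishers are dropped."""
    labeled = {}
    for article_id, publisher in articles:
        label = publisher_labels.get(publisher)
        if label is not None:
            labeled[article_id] = label
    return labeled
```

Inheriting labels this way scales to thousands of articles without per-article fact-checking, at the cost of mislabeling individual articles that deviate from their publisher's usual reliability.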
Connections
- Feature engineering covers the broader landscape of hand-crafted feature representations, of which credibility features are a subset.
- Social-context detection and user profiles share the motivation of going beyond article text, though they rely on propagation data rather than byline metadata.