Credibility-based Fake News Detection¶

Authors: Niraj Sitaula, Chilukuri K. Mohan, Jennifer Grygiel, Xinyi Zhou, Reza Zafarani Venue: arXiv:1911.00643 [cs.CL], November 2019

TL;DR¶

Rather than relying solely on news text or propagation graphs, this paper frames fake news detection as a credibility assessment problem and operationalizes credibility through source-level signals (author count, coauthorship homophily, author publication history) and content-level signals (sentiments, domain vocabulary, argumentation, readability, surface statistics). Evaluated on combined PolitiFact + BuzzFeed data, source-credibility features alone — using only 3 features — achieve F1-macro 0.77 (0.83 on PolitiFact), significantly outperforming 23 content-credibility features (F1-macro 0.68), and combining all 26 features reaches 0.80 with logistic regression.

Contributions¶

Hierarchical taxonomy of news credibility aspects: Source (Authors, Coauthorships) vs. Content (Sentiments, Domain Expertise, Argumentation, Readability, Character/Word/Sentence statistics, Typos).
Empirical finding that fake news articles have significantly fewer authors (mean 0.66) than true news articles (mean 1.97; Pearson correlation with label = 0.406; Mann-Whitney p < 0.001).
Coauthorship homophily analysis: fake-news authors predominantly co-author with other fake-news authors; only 12.7% of the 87 multi-article authors crossed categories.
Historical author credibility feature: 84% of authors exhibit consistent fake-only or true-only behavior over time, making past publication history a reliable credibility signal.
Systematic comparison of 7 classifiers on source-only, content-only, and combined feature sets, showing source features dominate content features in discriminative power.

Method¶

Dataset. Two publicly available datasets merged for most experiments: PolitiFact (240 news items, 120 fake / 120 true; 23,865 users, 32,791 news–user links) and BuzzFeed (182 items, 91/91; 15,257 users, 22,779 links). Both include news labels and user social network information. For historical analysis, 289 articles with publication dates were retained, yielding 69 authors with ≥2 articles (163 articles total).

Source credibility features (3 total).

Number of authors: zero-author articles are disproportionately fake; the distribution is non-normal (Shapiro-Wilk p < 0.05) and significantly different between classes (Mann-Whitney p < 0.001). Articles with more than one author skew true.
Past fake news count: number of fake news articles previously authored, sorted chronologically. Only 11 of 69 multi-article authors ever contradicted their prior trajectory (e.g., shifted from fake to true or vice versa), making prior history a reliable predictor.
Past true news count: analogous to the above for true news history.

Coauthorship network construction classifies authors as True-news (≥2 true, 0 fake), Fake-news (≥2 fake, 0 true), or Fake+True. Homophily is confirmed: fake-news authors cluster together, true-news authors cluster together, and cross-category collaboration is rare. Author affiliations with known credible organizations (ABC News, AP, CNN, Politico) further signal credibility; implausible author names ("Fed up," "About the Potatriot," "About Stryker") are associated with fake news.

Content credibility features (23 total).

Sentiments: VADER via NLTK scores each sentence as positive, negative, or neutral; features are the three fractional counts and all 9 bigram-type transition proportions. True news has slightly more neutral sentences; fake news slightly more negative — but the signal is weak and sentiment sequences show near-uniform distributions across classes.
Domain expertise: count of NCSL (National Conference of State Legislatures) legislative vocabulary words. True news uses more domain terms (mean 7.46 vs. 4.37; Mann-Whitney p < 0.05). Distinctive fake-news words: petition, legislator, impeachment, adhere; true-news words: fiscal, bipartisan, nonpartisan, caucus, statute, veto, repeal.
Argumentation: count of numeric digits in article text; true news contains significantly more digits (mean 739.33 vs. 490.82; Mann-Whitney p = 0.011), suggesting more data-backed argumentation.
Readability: Flesch-Kincaid reading ease; fake news scores higher (mean 67.32 vs. 65.30 for true; Mann-Whitney p = 0.02) — counterintuitively, lower readability correlates with credibility, possibly because accurate reporting of facts involves complex sentence structures.
Characters, words, sentences: true news articles are longer (mean 739 vs. 490 words), have more sentences (26.59 vs. 19.44), and more special characters. Fake news has longer titles (mean 14.16 vs. 12.09 words).
Typos: normalized against NLTK word list; unexpectedly, true news has slightly more typos (mean 0.22) than fake news (0.19; Mann-Whitney p < 0.001).

Classification. 26 features fed to seven scikit-learn classifiers with 10-fold cross-validation. A combined filter + wrapper feature selection reduces to 13 features with equal or better performance: number of authors, past history, presence of domain words, readability, words/characters/sentences/digits, and typos. Sentiments and digit count were the least important features.

Results¶

All results are average F1 scores (micro = macro = weighted on balanced datasets):

Feature set	Best classifier	F1-macro
All 26 features	Logistic Regression	0.80
13 selected features	LR / Linear SVM	0.80
Source-only (3 features)	AdaBoost	0.77
Content-only (23 features)	Linear SVM	0.68

On individual datasets (13 features):

Classifier	PolitiFact F1-macro	BuzzFeed F1-macro
Linear SVM	0.82	0.77
Logistic Regression	0.82	0.76

With only 3 source-credibility features on PolitiFact, best classifier achieves F1-macro 0.83; content-only achieves 0.66. The source–content gap is consistent across both datasets, establishing that authorship signals dominate textual ones for this task.

Connections¶

Shares two authors (Xinyi Zhou, Reza Zafarani) with Shu et al. (2019); both use FakeNewsNet and show that source-side signals outperform content baselines.
The feature engineering topic covers classical hand-crafted feature approaches; this paper contributes a source-credibility feature family not explored in earlier feature-engineering work.
The credibility assessment topic synthesizes this and related work on evaluating information reliability.
The coauthorship homophily finding complements the propagation-graph literature (cf. social-context detection) by showing author collaboration networks also encode credibility.
Zhou et al. (2020) — SAFE is a subsequent paper from the same group that pursues neural content-based detection via text-image multi-modal similarity, where the content features are learned end-to-end rather than hand-crafted.

Notes¶

The dominance of source features over content features reinforces the finding in Shu et al. (2019) that social/source signals outperform textual ones. However, unlike the UPF approach (which needs Twitter account histories for sharers), the features here only require byline information — substantially more deployable in low-resource settings.

The counterintuitive readability and typo results (fake news is more readable, has fewer typos) deserve replication on larger datasets. Both findings contradict prior assumptions and may reflect characteristics of the specific BuzzFeed/PolitiFact news domains (US political news in 2016) rather than general properties of fake news.

The 87-author coauthorship network is small and the homophily analysis is observational. The 12.7% cross-category rate could be compatible with noise given the dataset size; a null model comparison would strengthen the claim.

Source-credibility features are gameable: a determined fake news producer can add co-author names, use professional-sounding affiliations, and build a history of credible articles before pivoting.