Fake news detection methods

Fake news detection encompasses a range of computational approaches operating at different levels of analysis, from low-level linguistic features to high-level propagation patterns and user behavioral signals.

Taxonomy

Approaches broadly divide into five families based on what signal they exploit:

  • Content-based detection: Text-only features (linguistic style, semantic coherence, readability), visual features, or multimodal combinations. Early work used hand-engineered features (LIWC, word n-grams); recent work uses neural representations (BERT, CNNs).
  • Propagation-based detection: Temporal and structural patterns in how claims spread through social networks. Fake news often spreads faster, wider, and deeper than true news, exhibits distinct bot participation patterns, and shows characteristic engagement signatures.
  • User-based detection: Demographic, network, and behavioral features of users who share false claims. Key finding: account age, follower count, and activity patterns help predict how likely a user is to share false claims.
  • External knowledge-based detection: Fact-checking claims against knowledge bases or retrieving related verified claims.
  • AI-generated content detection: Identifying text produced by language models and other generative systems to detect synthetic disinformation and fraudulent content. Methods include statistical analysis, watermarking, rewriting-based detection, and neural classifiers.
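
As an illustration of the propagation-based family, the structural notions of "wider" and "deeper" can be made concrete as features of a reshare cascade. The sketch below is a minimal example, not a method from any particular paper: the function name `cascade_features`, the `(parent, child)` edge format, and the choice of three features are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def cascade_features(edges, root):
    """Compute simple structural features of a reshare cascade.

    edges: iterable of (parent, child) pairs, each meaning that `child`
           reshared the claim from `parent` (hypothetical input format).
    root:  identifier of the original post.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    # Breadth-first traversal from the original post, counting nodes per level.
    level_counts = defaultdict(int)
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        level_counts[depth] += 1
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))

    return {
        "size": len(seen),                          # total nodes reached
        "depth": max(level_counts),                 # longest reshare chain
        "max_breadth": max(level_counts.values()),  # widest single level
    }


# Toy cascade: "a" is the original post.
example_edges = [("a", "b"), ("a", "c"), ("b", "d"),
                 ("b", "e"), ("b", "f"), ("d", "g")]
features = cascade_features(example_edges, "a")
```

Features like these would typically be combined with temporal signals (for example, time to reach a given size) before being passed to a classifier; that step is omitted here.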

Within these families, methods differ in scope: early detection (hours after posting) vs. full lifecycle, single-domain (trained and tested on one topic, e.g., politics) vs. cross-domain (generalization across topics), and supervised (requires labeled training data) vs. unsupervised (domain discovery, anomaly detection).
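
The single-domain vs. cross-domain distinction comes down to how the evaluation split is built. One common way to test cross-domain generalization is to hold out one topic at a time; the sketch below assumes a hypothetical record schema with a `domain` key, and the helper name `leave_one_domain_out` is likewise illustrative.

```python
def leave_one_domain_out(records):
    """Yield (held_out_domain, train, test) triples, one per domain.

    records: list of dicts, each with a "domain" key (hypothetical schema).
    Training data excludes the held-out domain entirely, so test performance
    reflects generalization to an unseen topic.
    """
    domains = sorted({r["domain"] for r in records})
    for held_out in domains:
        train = [r for r in records if r["domain"] != held_out]
        test = [r for r in records if r["domain"] == held_out]
        yield held_out, train, test


# Toy dataset with two domains.
toy_records = [
    {"text": "claim 1", "label": 1, "domain": "politics"},
    {"text": "claim 2", "label": 0, "domain": "health"},
    {"text": "claim 3", "label": 0, "domain": "politics"},
]
splits = list(leave_one_domain_out(toy_records))
```

A single-domain setup, by contrast, would draw both the training and test sets from records sharing one `domain` value.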

Key papers

Core references by family:

Specific methodological innovations:

Connections