Cross-domain learning for fake news

A critical practical challenge in fake news detection is that models trained on one domain (e.g., politics) often degrade sharply when tested on others (e.g., entertainment, COVID-19, health). Cross-domain learning addresses this with methods that either (1) generalize across domains without domain-specific retraining, or (2) adapt efficiently to new domains using limited labeled data.

Problem statement

Real-world news streams cover diverse topics, each with its own terminology, sources, and linguistic style. A model that reaches 85% accuracy on politics may drop to 65% on COVID-19 or entertainment. Two core causes:

  • Domain-specific language use: Politics emphasizes policy and partisan actors; entertainment covers celebrities; COVID-19 discussions use medical terminology. Vocabulary overlap is limited.
  • Topic-specific propagation patterns: Political rumors may go viral quickly with high engagement; entertainment rumors may spread more slowly; COVID-19 misinformation may show distinct patterns of bot participation.
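
The limited vocabulary overlap mentioned above can be made concrete with a Jaccard similarity between the token sets of two domains. The following is a minimal sketch: the headlines are made-up toy examples, the helper names `vocabulary` and `jaccard_overlap` are hypothetical, and whitespace tokenization is a simplifying assumption (a real analysis would use a proper tokenizer and a much larger corpus).

```python
def vocabulary(texts):
    """Return the set of lowercase, whitespace-separated tokens in a list of texts."""
    return {token for text in texts for token in text.lower().split()}


def jaccard_overlap(texts_a, texts_b):
    """Jaccard similarity |A & B| / |A | B| between the vocabularies of two domains."""
    vocab_a, vocab_b = vocabulary(texts_a), vocabulary(texts_b)
    union = vocab_a | vocab_b
    # Two empty vocabularies are treated as having no overlap.
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


# Made-up toy headlines, for illustration only.
politics = ["senate passes new budget bill", "governor vetoes tax bill"]
health = ["new vaccine trial shows strong immune response",
          "hospital reports new variant"]

overlap = jaccard_overlap(politics, health)
print(f"politics vs. health vocabulary overlap: {overlap:.3f}")
```

A low score on real data would suggest that lexical features learned on one domain have few counterparts in the other, which is one plausible reason a text-only classifier transfers poorly.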

Unsupervised domain transfer (e.g., pretraining on a large unlabeled corpus) helps but does not fully solve the problem, because the signals that separate fake from real news are themselves domain-dependent.

Key papers

  • Nan et al. (2021) — MDFEND: Multi-domain fake news detection using mixture-of-experts with a domain gate to adaptively aggregate representations across domains; introduces Weibo21, the first multi-domain dataset from a single platform with 9 domains; achieves 0.9137 F₁ on Weibo21, outperforming single-domain and mixed-domain baselines; directly addresses domain shift in terminology and propagation patterns.

  • Wang et al. (2018) — EANN: Event Adversarial Neural Networks, the first to frame fake news detection as a domain adaptation problem; uses a minimax game between a feature extractor and an event discriminator to learn event-invariant representations; reports 71.5% accuracy on Twitter and 82.7% on Weibo.

  • Silva et al. (2021) — Cross-domain Multimodal Detection: proposes unsupervised domain discovery using propagation networks to identify domain clusters without manual labels, then trains a supervised classifier that explicitly preserves both domain-specific and cross-domain knowledge; uses instance selection based on locality-sensitive hashing (LSH) for cost-effective labeling; reports a 7.55% F₁ improvement on rarely appearing domains.

  • Zhou et al. (2020) — SAFE: Primarily a multimodal method, but its cross-modal similarity design may transfer well across domains, since a mismatch between text and image is a fake-news signal that does not depend on topic.
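
To illustrate the mixture-of-experts idea with a domain gate described for MDFEND, here is a minimal NumPy sketch of the forward pass only. It is not the authors' implementation: the class name `DomainGatedMoE`, the layer sizes, and the single-layer tanh experts are all assumptions made for brevity, and the weights are random rather than trained. The gate here is fed a learned domain embedding concatenated with the input features, and its softmax output weights the experts.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


class DomainGatedMoE:
    """Toy mixture-of-experts layer whose gate is conditioned on the domain."""

    def __init__(self, n_domains, n_experts, in_dim, hid_dim, dom_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        # One embedding vector per domain (hypothetical size dom_dim).
        self.domain_emb = rng.normal(size=(n_domains, dom_dim))
        # Each expert is a single linear map followed by tanh (a simplification).
        self.expert_w = rng.normal(size=(n_experts, in_dim, hid_dim)) * 0.1
        # The gate maps [domain embedding ; input features] to one score per expert.
        self.gate_w = rng.normal(size=(dom_dim + in_dim, n_experts)) * 0.1

    def forward(self, x, domain_ids):
        # x: (batch, in_dim) text features; domain_ids: (batch,) integer labels.
        expert_out = np.tanh(np.einsum("bi,eih->beh", x, self.expert_w))
        gate_in = np.concatenate([self.domain_emb[domain_ids], x], axis=1)
        gate = softmax(gate_in @ self.gate_w)               # (batch, n_experts)
        mixed = np.einsum("be,beh->bh", gate, expert_out)   # weighted sum of experts
        return mixed, gate


model = DomainGatedMoE(n_domains=9, n_experts=5, in_dim=16, hid_dim=4)
features = np.random.default_rng(1).normal(size=(3, 16))
mixed, gate = model.forward(features, np.array([0, 4, 8]))
print(mixed.shape, gate.shape)
```

Because the gate depends on the domain, samples from different domains can weight the same shared experts differently, which is the sense in which representations are aggregated adaptively across domains. In a trained model the mixed representation would feed a classifier head, omitted here.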

Connections

  • Multimodal fake news detection — Some multimodal approaches (e.g., EANN) also address cross-domain transfer.
  • Fake news detection methods — Cross-domain learning is a sub-problem within the broader detection landscape.
  • Domain adaptation — Closely related, but domain adaptation emphasizes fine-tuning to specific target domains, whereas cross-domain learning emphasizes zero-shot or few-shot generalization.
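
Zero-shot generalization of the kind described above is often measured with a leave-one-domain-out protocol: train on every domain except one, then test only on the held-out domain. The sketch below shows only the data split. The function name `leave_one_domain_out`, the `domain` and `label` keys, and the toy samples are assumptions for illustration and are not taken from any of the papers listed here.

```python
from collections import defaultdict


def leave_one_domain_out(samples):
    """Yield (held_out_domain, train, test); test holds only the held-out domain."""
    by_domain = defaultdict(list)
    for sample in samples:
        by_domain[sample["domain"]].append(sample)
    for held_out in sorted(by_domain):
        # Training data is everything from the remaining domains.
        train = [s for domain, items in by_domain.items()
                 if domain != held_out for s in items]
        test = list(by_domain[held_out])
        yield held_out, train, test


# Made-up toy samples, for illustration only.
samples = [
    {"text": "senate passes budget bill", "domain": "politics", "label": 0},
    {"text": "secret law bans all cash", "domain": "politics", "label": 1},
    {"text": "actor announces new film", "domain": "entertainment", "label": 0},
    {"text": "garlic cures the virus", "domain": "health", "label": 1},
]

splits = list(leave_one_domain_out(samples))
for held_out, train, test in splits:
    print(f"held out: {held_out:13s} train={len(train)} test={len(test)}")
```

A few-shot variant would move a small number of labeled samples from the held-out domain into the training set before evaluating on the rest.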