COVID-19 misinformation and the infodemic
The COVID-19 pandemic was accompanied by what the WHO termed an "infodemic" — a surge in low-credibility information that spread rapidly through social media alongside the disease itself. The infodemic included false cures (e.g., drinking chlorine dioxide, eating boiled garlic), conspiracy theories about the virus's origin, and politically motivated misrepresentations of epidemiological data.
This topic covers research that targets COVID-19 misinformation as a domain object in its own right, as distinct from general fake news detection work that merely happens to be evaluated on COVID-19 data. Four features distinguish the COVID-19 infodemic from prior misinformation research contexts:
- Rapid temporal evolution: the volume of COVID-19 news grew exponentially from January to May 2020, creating a non-stationary detection environment.
- High-stakes consequences: false health guidance (bleach cures, anti-mask content) carried real harm, distinguishing COVID-19 misinformation from entertainment gossip or political spin.
- Cross-domain spread: low-credibility content was produced by outlets spanning healthcare, politics, and conspiracy media, making source-agnostic content classification more important than in prior work focused on political fake news.
- Multi-language, multi-country origin: COVID-19 content came from US, Russian, UK, Iranian, Cypriot, and Canadian publishers even in English-language corpora.
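One practical consequence of the non-stationarity noted above is that a random train/test split can leak later vocabulary and topics into training. The sketch below shows a chronological split as one way to evaluate under temporal drift. The records, field order, and cutoff date are invented for illustration and do not come from any of the datasets on this page.

```python
from datetime import date


def chronological_split(items, cutoff):
    """Split (published_date, text, label) records at a cutoff date.

    Everything published before the cutoff goes to training; everything
    on or after it goes to testing, so the test set is strictly later.
    """
    train = [it for it in items if it[0] < cutoff]
    test = [it for it in items if it[0] >= cutoff]
    return train, test


# Hypothetical records, invented for illustration only.
records = [
    (date(2020, 1, 20), "article a", 0),
    (date(2020, 2, 14), "article b", 1),
    (date(2020, 3, 30), "article c", 0),
    (date(2020, 4, 22), "article d", 1),
    (date(2020, 5, 5), "article e", 0),
]

train, test = chronological_split(records, cutoff=date(2020, 4, 1))
print(len(train), "training items;", len(test), "test items")
```

A detector scored this way is tested only on content that appeared after everything it was trained on, which is closer to how it would be deployed during a fast-moving event.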
Key papers
- van der Linden, Roozenbeek & Compton (2020) — Inoculating Against Fake News About COVID-19: perspective on applying psychological inoculation to COVID-19 misinformation; documents the scope of the infodemic (46–48% exposure in UK/US; 25%+ of top YouTube videos misleading) and connects conspiracy belief to vaccine hesitancy and reduced health-guideline compliance. Proposes prebunking via the Bad News game and Go Viral! as scalable alternatives to reactive fact-checking.
- Cui & Lee (2020) — CoAID: COVID-19 healthcare misinformation dataset with 4,251 news items: 3,769 news articles (204 fake, 3,565 true) and 482 claims (28 false, 454 true), plus 926 social platform posts with user engagement data (296,000 tweets and replies). Integrates diverse sources and multimodal engagement features; benchmarks detection methods including SVM, CNN, BiGRU, and state-of-the-art contextual models.
- Li et al. (2020) — MM-COVID: Multilingual and multimodal COVID-19 dataset with 3,981 fake news pieces and 7,192 pieces of trustworthy content in six languages (English, Spanish, Portuguese, Hindi, French, Italian). Demonstrates that social context enables cross-lingual transfer; dEFEND achieves 0.91–0.96 accuracy with full training data and 0.85 accuracy in zero-resource Portuguese transfer.
- Krause et al. (2020) — Fact-checking as risk communication: applies decades of risk communication research to the COVID-19 "misinfodemic," arguing that misinformation should be viewed as a multi-layered risk interacting with pandemic risk perception. Shows why simple fact-checking fails: publics differ in how they define misinformation risk, trust in fact-checkers is low (especially media-affiliated), and uncertainty is inherent to COVID-19 science.
- Roozenbeek et al. (2020) — Susceptibility to misinformation about COVID-19 around the world: international survey of 3,750 adults across five countries examining susceptibility to COVID-19 misinformation and its link to vaccine hesitancy and health-guidance compliance. Key finding: higher trust in scientists and numeracy are robust protective factors.
- Pennycook et al. (2020) — Accuracy-nudge intervention: two experiments (1,700+ participants) demonstrating that prompting people to consider accuracy nearly triples their ability to discriminate true from false COVID-19 headlines when deciding what to share on social media.
- Zhou et al. (2020) — ReCOVery: constructs the primary multimodal COVID-19 news credibility dataset (2,029 articles, 140,820 tweets); establishes publisher-level credibility labeling methodology and benchmarks LIWC, RST, Text-CNN, and SAFE on the corpus.
- Yang et al. (2020) — CHECKED: introduces the first Chinese-language COVID-19 fake news dataset — 2,104 Weibo microblogs with per-item expert labels, multimedia, and full propagation graphs; TextCNN achieves macro F₁ = 0.938.
- Du et al. (2021) — Cross-lingual COVID-19 Fake News Detection: addresses COVID-19 misinformation in low-resource languages (Chinese) by training on English COVID-19 news and applying via machine translation. Proposes CrossFake, which uses BERT with sub-text slicing to preserve long-document information; achieves 75% accuracy on 200 Chinese news articles (86 fake, 114 real), outperforming monolingual and cross-lingual baselines.
- A Heuristic-driven Uncertainty based Ensemble Framework for Fake News Detection in Tweets and News Articles: ensemble of seven pre-trained language models (including BERT, RoBERTa, XLNet, DeBERTa, ERNIE 2.0, and ELECTRA) evaluated on the CONSTRAINT COVID-19 Fake News dataset (10,700 tweets and articles); combines soft voting, a Statistical Feature Fusion Network that adds statistical features (URL domains, usernames), and Monte Carlo Dropout for uncertainty quantification; achieves F1 = 0.9892.
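The soft-voting and uncertainty ideas in the ensemble entry above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the probabilities are made up, the class order (index 0 = real, index 1 = fake) is an assumption, and entropy of the averaged distribution is used as one common uncertainty score. In a Monte Carlo Dropout setting the same averaging would be applied to repeated stochastic forward passes of one model; no dropout is run here.

```python
import math


def soft_vote(per_model_probs):
    """Average class probabilities across models for a single item."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [
        sum(p[c] for p in per_model_probs) / n_models
        for c in range(n_classes)
    ]


def predictive_entropy(probs):
    """Shannon entropy (nats) of a probability vector; higher = less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Made-up outputs of three hypothetical models, as [P(real), P(fake)].
confident_item = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
ambiguous_item = [[0.3, 0.7], [0.6, 0.4], [0.45, 0.55]]

for name, outputs in [("confident", confident_item), ("ambiguous", ambiguous_item)]:
    avg = soft_vote(outputs)
    label = max(range(len(avg)), key=avg.__getitem__)
    print(name, "label:", label, "entropy:", round(predictive_entropy(avg), 3))
```

Items with high entropy are the ones where the models disagree or hedge, so they are natural candidates for manual review.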
Key datasets
- CoAID — 4,251 news items: 3,769 news articles (204 fake, 3,565 true) and 482 claims (28 false, 454 true), plus 926 social platform posts and 296,000 user engagements (tweets/replies); multiplatform (websites, Twitter, Facebook, Instagram, YouTube, TikTok) with a healthcare focus.
- MM-COVID — 3,981 fake news pieces and 7,192 pieces of trustworthy content in six languages (English, Spanish, Portuguese, Hindi, French, Italian), with associated social context; enables multilingual and cross-lingual COVID-19 detection research.
- ReCOVery — 2,029 COVID-19 news articles with NewsGuard/MBFC credibility labels and 140,820 associated tweets; the primary benchmark for COVID-19 news credibility prediction.
- CHECKED — 2,104 Weibo microblogs (344 fake, 1,760 real) from December 2019–August 2020; Chinese-language; includes images, video URLs, and 1.87M repost/1.19M comment propagation threads.
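The fake/real counts listed for CoAID and CHECKED are heavily skewed, which affects how reported scores should be read. The sketch below uses only the counts stated above to compute the fake share and the accuracy of a trivial classifier that always predicts the majority class. The function and dictionary names are illustrative.

```python
# Fake/real counts as stated in the dataset descriptions above.
datasets = {
    "CoAID (news articles)": {"fake": 204, "real": 3565},
    "CHECKED": {"fake": 344, "real": 1760},
}


def majority_baseline(counts):
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


for name, counts in datasets.items():
    total = sum(counts.values())
    fake_share = counts["fake"] / total
    print(
        f"{name}: {total} items, fake share {fake_share:.3f}, "
        f"majority-class accuracy {majority_baseline(counts):.3f}"
    )
```

Because a do-nothing classifier already scores well on accuracy under this imbalance, class-averaged metrics such as the macro F1 reported for CHECKED are more informative than raw accuracy.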
Connections
- Credibility assessment provides the broader framework; ReCOVery operationalizes it at publisher level specifically for COVID-19.
- Multimodal detection is the dominant method family tested on COVID-19 data; ReCOVery includes both text and image data, enabling multimodal baselines.
- Social-context detection is an open direction: ReCOVery includes propagation data (tweets, user graphs) that existing baselines do not exploit.
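Publisher-level labeling, as referenced in the first connection above, means that an article takes its credibility label from its publisher instead of from a per-article fact-check. The following is a minimal sketch of that idea only. The domains and labels are hypothetical placeholders, and ReCOVery's actual selection criteria and use of NewsGuard/MBFC ratings are not reproduced here.

```python
from urllib.parse import urlparse

# Hypothetical publisher ratings, invented for illustration. In practice
# these would be derived from an external rater such as NewsGuard or MBFC.
PUBLISHER_LABELS = {
    "reliable-news.example": "reliable",
    "unreliable-news.example": "unreliable",
}


def label_article(url, publisher_labels):
    """Return the publisher's label for an article URL, or None if unrated."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]  # treat www.example and example as the same publisher
    return publisher_labels.get(host)


urls = [
    "https://www.reliable-news.example/covid/story-1",
    "https://unreliable-news.example/miracle-cure",
    "https://unknown-site.example/post",
]
for url in urls:
    print(url, "->", label_article(url, PUBLISHER_LABELS))
```

The trade-off is scale versus noise: labels come cheaply for every article from a rated publisher, but an individual article can differ from its publisher's overall record.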