A Survey on Stance Detection for Mis- and Disinformation Identification

Authors: Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein
Venue: arXiv, 2021 (arXiv:2103.00242)

TL;DR

Comprehensive survey of stance detection for misinformation and disinformation. Stance detection, i.e., determining whether a text supports, denies, questions, or merely comments on a claim, plays two roles: a standalone fact-checking tool and a component in broader verification pipelines. The paper reviews task formulations, datasets, methods, and applications across multiple languages and platforms.
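
To make the four-way task concrete, here is a toy sketch in Python. The label names follow the four stances above; the cue-word lists and the function name `cue_stance` are hypothetical and only illustrate the input/output contract. This heuristic is not a method from the survey, and it ignores the claim entirely, which a real target-aware model cannot afford to do.

```python
import re
from enum import Enum


class Stance(Enum):
    SUPPORT = "support"
    DENY = "deny"
    QUESTION = "question"
    COMMENT = "comment"


# Hypothetical cue words, chosen only for illustration.
DENY_CUES = {"fake", "false", "hoax", "debunked", "untrue"}
SUPPORT_CUES = {"confirmed", "true", "verified", "official"}


def cue_stance(reply: str) -> Stance:
    """Toy lexical baseline: label a reply by cue words (the claim is ignored)."""
    tokens = set(re.findall(r"[a-z]+", reply.lower()))
    if reply.strip().endswith("?"):
        return Stance.QUESTION
    if tokens & DENY_CUES:
        return Stance.DENY
    if tokens & SUPPORT_CUES:
        return Stance.SUPPORT
    return Stance.COMMENT  # default when no cue word is found
```

For example, `cue_stance("Police confirmed it")` returns `Stance.SUPPORT`, but so does `cue_stance("This is not true")`: a bag of cue words cannot see negation, one of the implicit-stance problems listed under Results.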

Contributions

  • Systematic review of stance detection literature with explicit focus on applications to misinformation and disinformation
  • Taxonomy of stance detection formulations: standalone fact-checking, component in multi-stage pipelines, and application to rumour verification
  • Comprehensive catalog of existing stance detection datasets in English and non-English languages, with source, target, context, and evidence characteristics
  • Overview of approaches from early lexical and feature-based methods to modern neural and pre-trained language models (BERT, RoBERTa, GPT)
  • Discussion of stance detection challenges specific to mis- and disinformation: class imbalance, implicit stance, cross-platform variation
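
One common remedy for class imbalance, not specific to this survey, is to weight each class in the loss inversely to its frequency. A minimal Python sketch, assuming string labels such as FNC-1's `unrelated`/`agree`/`disagree`; the helper name is hypothetical, and the formula mirrors the "balanced" heuristic used by scikit-learn's `class_weight="balanced"`.

```python
from collections import Counter


def inverse_frequency_weights(labels: list[str]) -> dict[str, float]:
    """Return n_samples / (n_classes * count) for each class label.

    Rare classes get larger weights, so every class contributes the same
    total weight (n_samples / n_classes) to a weighted loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}
```

With 8 `unrelated`, 1 `agree`, and 1 `disagree` example, `unrelated` gets 10/24 ≈ 0.42 and each minority class gets 10/3 ≈ 3.33, so all three classes contribute the same total weight.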

Method

The survey organizes stance detection literature along multiple dimensions:

  • Stance formulations: (i) Direct fact-checking, where the predicted stance itself serves as the veracity label; (ii) Component task within fact-checking pipelines that also require evidence retrieval and justification; (iii) Rumour stance detection in social media threads
  • Stance definitions: From Pang & Lee's (2007) speaker standpoint definition to Kucuk & Can's (2020) classification-task framing with support/deny/question/comment/neutral categories
  • Datasets: Overview of 15+ English datasets (Rumour Has It, PHEME, Emergent, FNC-1, RumourEval, FEVER, Snopes, TibFact) and non-English resources (Arabic FC, DART, AraStance)
  • Approaches: Feature engineering (lexical, topic models, graph features) → LSTM/CNN → BERT-based fine-tuning → cross-lingual transfer and pre-training strategies (RoBERTa fine-tuning, pattern-exploiting training, adversarial robustness)
  • Evidence handling: Methods for single vs. multiple evidence documents, retrieval-then-classification pipelines (FNC-1, FEVER)
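
The retrieval-then-classification pipeline can be sketched in three stages: retrieve evidence, classify the stance of each piece against the claim, and aggregate the stances into a verdict. This Python sketch rests on loud assumptions: Jaccard token overlap stands in for a real retriever (e.g. BM25 or dense retrieval), `classify_stance` is a hypothetical keyword placeholder for a trained stance model, and majority-vote aggregation is one simple choice among many. Only the verdict strings follow FEVER's label names.

```python
import re
from collections import Counter


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def retrieve(claim: str, corpus: list[str], k: int = 2) -> list[str]:
    """Stage 1 (assumed): rank documents by token overlap with the claim."""
    return sorted(corpus, key=lambda doc: jaccard(claim, doc), reverse=True)[:k]


def classify_stance(claim: str, evidence: str) -> str:
    """Stage 2 (hypothetical placeholder for a trained stance classifier)."""
    words = tokens(evidence)
    if words & {"not", "no", "false", "denied"}:
        return "refutes"
    if tokens(claim) & words:
        return "supports"
    return "unrelated"


def verify(claim: str, corpus: list[str], k: int = 2) -> str:
    """Stage 3 (assumed): majority vote over per-evidence stances."""
    votes = Counter(classify_stance(claim, doc) for doc in retrieve(claim, corpus, k))
    if votes["supports"] > votes["refutes"]:
        return "SUPPORTS"
    if votes["refutes"] > votes["supports"]:
        return "REFUTES"
    return "NOT ENOUGH INFO"
```

With `k=1`, the claim "The moon is made of cheese." checked against "No, the moon is not made of cheese." yields `REFUTES`. Raising `k` to 2 also pulls in the topically related "Cheese is made from milk.", which the placeholder scores as supporting, so the vote ties and the verdict becomes `NOT ENOUGH INFO`. This shows how much naive aggregation depends on what retrieval returns.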

Results

The survey documents significant progress but persistent challenges:

  • Modern pre-trained models (BERT, RoBERTa, GPT) substantially improve over feature-based baselines on most benchmarks
  • Best published results on FEVER reach ~70 F1 (Zhou et al. 2020, using graph neural networks); on FNC-1, careful feature engineering brings systems close to the upper bound
  • Transfer and zero-shot learning strategies work across languages: Arabic stance models using mBERT exceed 76 F1 on the ANS dataset
  • Key remaining challenges: class imbalance (unrelated posts dominate), implicit stance (sarcasm, negation), cross-platform variation, need for multi-hop reasoning on multiple evidence documents
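
Class imbalance also distorts evaluation, which is one reason many stance benchmarks report macro-averaged F1 rather than accuracy. A small Python sketch (helper names are illustrative) comparing both metrics for a majority-class baseline that always predicts `unrelated`:

```python
def accuracy(gold: list[str], pred: list[str]) -> float:
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_for_class(gold: list[str], pred: list[str], label: str) -> float:
    """Binary F1 treating `label` as positive; 0.0 when it has no true positives."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over the classes present in `gold`."""
    classes = sorted(set(gold))
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


gold = ["unrelated"] * 8 + ["agree", "disagree"]
majority = ["unrelated"] * len(gold)  # majority-class baseline
```

On these gold labels the baseline scores 0.80 accuracy but only 8/27 ≈ 0.30 macro-F1, because it never predicts either minority class.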

Connections

Notes

This is a foundational and comprehensive reference for anyone working on automated misinformation detection. Strengths: a systematic taxonomy of formulations, breadth across task variants and languages, and clear positioning of stance within broader verification pipelines. The paper captures the state of the art circa 2021 and reflects the NLP community's shift from feature engineering toward pre-trained contextualized embeddings. It is useful for grounding design choices in new stance detection work and for understanding the historical context of the shift from FNC-1 to FEVER and beyond.