Shared tasks and benchmarks

Shared tasks are organized evaluation campaigns where researchers develop and submit systems for a standardized problem using a common dataset, evaluation protocol, and set of metrics. They play a critical role in advancing the field by enabling direct comparison of approaches and identifying state-of-the-art performance.

Role in misinformation research

Shared tasks have become central to misinformation and rumour detection research, providing:

  • Benchmark datasets that enable reproducible research
  • Standardized evaluation metrics for fair comparison
  • Community participation from researchers across institutions and countries
  • Published results and analysis of participating systems
  • Reusable resources for follow-up research

Key characteristics

  • Common dataset: Participants develop systems on the same training and test data
  • Defined task(s): Clear problem formulation and evaluation criteria
  • Blind evaluation: Test set not accessible to participants during development
  • Shared results: Organizers publish leaderboards and comparative analysis
  • Workshop: Presentation and discussion of approaches at a venue (e.g., ACL, SemEval)
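
The blind-evaluation and shared-results steps above can be illustrated with a minimal sketch. It assumes the organizers hold the gold test labels privately and score each submitted prediction file with macro-averaged F1; the metric choice, the label names, the team names, and the helper functions `macro_f1` and `leaderboard` are all hypothetical and not taken from any specific shared task.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over the labels that occur in the gold data."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    scores = []
    for label in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # Per-label F1 = 2TP / (2TP + FP + FN); defined as 0 when empty.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def leaderboard(gold, submissions):
    """Rank submissions (team name -> predictions) by macro-F1, best first."""
    scored = [(team, macro_f1(gold, preds)) for team, preds in submissions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical hidden test labels, visible only to the organizers.
GOLD = ["rumour", "non-rumour", "rumour", "non-rumour"]

# Hypothetical system outputs submitted by participants.
SUBMISSIONS = {
    "team_a": ["rumour", "non-rumour", "rumour", "rumour"],
    "team_b": ["rumour", "non-rumour", "rumour", "non-rumour"],
}

if __name__ == "__main__":
    for rank, (team, score) in enumerate(leaderboard(GOLD, SUBMISSIONS), start=1):
        print(f"{rank}. {team}: macro-F1 = {score:.3f}")
```

Because every team is scored by the same function against the same hidden labels, the resulting ranking is directly comparable across systems, which is the property the characteristics above are meant to guarantee.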

Notable shared task venues

  • SemEval: Annual series of NLP shared tasks (Semantic Evaluation workshops)
  • CLEF: Conference and Labs of the Evaluation Forum (formerly the Cross-Language Evaluation Forum), with specialized fact-checking and rumour tracks
  • FEVER: Fact Extraction and VERification shared task series

Contribution to the field

Shared tasks establish benchmarks that:

  • Enable progress in a subfield to be tracked over a decade or more
  • Lower barriers to entry for new researchers
  • Consolidate diverse approaches into comparable results
  • Guide future research directions by exposing the challenges that remain unsolved

Key benchmarks in misinformation detection