Veracity prediction

Veracity prediction is the task of automatically determining whether a claim or statement is true, false, or unverifiable based on available evidence and external knowledge.

Problem formulation

Given a claim (e.g., a tweet, news headline, or other statement), assign one of three labels:

  • True: The claim is factually accurate
  • False: The claim is factually incorrect
  • Unverifiable/Unknown: The claim cannot be verified with available information

Systems may also return a confidence score between 0 and 1 indicating how certain the prediction is.
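The output described above can be represented as a small record type. The following is a minimal sketch, not a prescribed schema: the names `Veracity` and `VeracityPrediction` and the example claim are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Veracity(Enum):
    """The three labels from the problem formulation (names are illustrative)."""
    TRUE = "true"
    FALSE = "false"
    UNVERIFIED = "unverified"


@dataclass(frozen=True)
class VeracityPrediction:
    claim: str
    label: Veracity
    confidence: float  # certainty in the label, expected in [0, 1]

    def __post_init__(self) -> None:
        # Reject scores outside the documented 0-1 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# Hypothetical example claim and score.
pred = VeracityPrediction("The bridge reopened on Monday.", Veracity.UNVERIFIED, 0.4)
print(pred.label.value, pred.confidence)
```

Validating the range at construction time keeps malformed scores from reaching the confidence-aware evaluation step.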

Approaches

Text-based (closed variant)

Predict veracity using only the claim text itself:

  • Linguistic cues (hedging, certainty markers, temporal language)
  • Lexical patterns associated with misinformation
  • Pre-trained language models with fine-tuning
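As a rough illustration of the linguistic-cue idea, the sketch below counts cue words in a claim. The three word lists are assumptions made up for this example, not an established lexicon, and a real system would feed such counts into a trained classifier rather than use them directly.

```python
import re

# Hypothetical cue lexicons, chosen only for illustration.
HEDGES = {"reportedly", "allegedly", "possibly", "might", "may", "unconfirmed"}
CERTAINTY = {"definitely", "confirmed", "certainly", "proven", "undeniably"}
TEMPORAL = {"today", "yesterday", "now", "breaking", "tonight"}


def cue_features(claim: str) -> dict:
    """Count hedging, certainty, and temporal cue words in a claim."""
    tokens = re.findall(r"[a-z']+", claim.lower())
    return {
        "hedges": sum(t in HEDGES for t in tokens),
        "certainty": sum(t in CERTAINTY for t in tokens),
        "temporal": sum(t in TEMPORAL for t in tokens),
        "length": len(tokens),
    }


features = cue_features("BREAKING: officials reportedly confirmed the outage today")
print(features)
```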

Context-augmented (open variant)

Use additional external information:

  • Wikipedia articles and knowledge bases
  • Archived web content and linked URLs
  • Community responses and stance labels
  • Temporal metadata and event context

Community-informed

Leverage collective intelligence:

  • Aggregating community stance (support/deny/query) to infer veracity
  • User credibility and comment patterns
  • Conversational signals and debate outcomes
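One simple way to combine stance and credibility is a credibility-weighted vote. The sketch below is a heuristic under stated assumptions: the function name `aggregate_stance`, the `margin` threshold, and the credibility weights are all hypothetical, and the returned score is an ad hoc confidence, not a calibrated probability.

```python
from collections import Counter


def aggregate_stance(replies, margin=0.2):
    """Infer a veracity label from (stance, credibility) pairs.

    stance is one of "support", "deny", "query"; credibility is a
    non-negative weight for the replying user (assumed to be given).
    """
    weights = Counter()
    for stance, credibility in replies:
        weights[stance] += credibility
    total = sum(weights.values())
    if total == 0:
        return "unverified", 0.0
    # Net support in [-1, 1]; "query" replies dilute it toward zero.
    net = (weights["support"] - weights["deny"]) / total
    if net > margin:
        return "true", net
    if net < -margin:
        return "false", -net
    return "unverified", 1.0 - abs(net)


# Hypothetical reply thread.
label, confidence = aggregate_stance(
    [("support", 0.9), ("support", 0.6), ("deny", 0.3), ("query", 0.2)]
)
print(label, round(confidence, 2))
```

Treating a near-balanced thread as "unverified" reflects the idea that unresolved debate is itself a signal; the margin would need tuning on labeled data.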

Challenges

  • AI-hard problem: Veracity often requires domain expertise, event knowledge, and real-world reasoning
  • Bias and subjectivity: Determining ground truth for controversial claims is difficult
  • Temporal sensitivity: Claims may be true in one time period and false in another
  • Information gaps: Necessary evidence may not be available at prediction time
  • Class imbalance: True, false, and unverifiable claims rarely occur in equal proportions, so a model can score well by favoring the majority class

Evaluation metrics

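  • Accuracy: Fraction of all predictions that are correct (a simple baseline metric that can be misleading under class imbalance)
  • Macro-averaged accuracy: Mean of the per-class accuracies, so each class counts equally regardless of its size (addresses imbalance)
  • Confidence-aware metrics: RMSE between predicted and reference confidence scores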
  • F1 / Precision / Recall: Per-class performance metrics
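The metrics above can be computed in a few lines of plain Python. This is a minimal sketch with made-up labels and scores; the function names are illustrative, and "macro-averaged accuracy" is read here as the mean over classes of the fraction of each class's items that were predicted correctly.

```python
import math
from collections import defaultdict


def accuracy(gold, pred):
    """Fraction of predictions that match the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_accuracy(gold, pred):
    """Mean over gold classes of the per-class fraction predicted correctly."""
    hits, counts = defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        counts[g] += 1
        hits[g] += int(g == p)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


def precision_recall_f1(gold, pred, label):
    """Precision, recall, and F1 for a single class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def confidence_rmse(reference, predicted):
    """Root mean squared error between reference and predicted confidences."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up example data.
gold = ["true", "true", "true", "false", "unverified"]
pred = ["true", "true", "false", "false", "true"]

print(accuracy(gold, pred))
print(macro_accuracy(gold, pred))
print(precision_recall_f1(gold, pred, "false"))
print(confidence_rmse([1.0, 0.0], [0.8, 0.4]))
```

On this toy data the plain accuracy is higher than the macro-averaged figure because the majority class ("true") is predicted well while the single "unverified" item is missed.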

Key papers and benchmarks