Trustworthy AI

Trustworthy AI encompasses a set of design principles for building systems that are safe, fair, transparent, and subject to human oversight. In the context of fake news detection, trustworthiness is essential: if they are not designed carefully, automated detection systems deployed at scale can suppress legitimate speech, discriminate against certain populations, or propagate their own errors to millions of users.

Key dimensions of trustworthy AI systems:

  • Explainability: users and operators understand how and why decisions are made
  • Robustness: systems degrade gracefully under distribution shift, adversarial input, or uncertainty
  • Fairness: systems do not discriminate; error rates and outcomes are equitable across demographic groups and topics
  • Controllability: humans can intervene, correct, or guide system behavior without requiring retraining
  • Accountability: designers and operators can be held responsible for harms
  • Security: systems resist manipulation, poisoning, or unauthorized access
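As one concrete reading of the fairness dimension, the sketch below computes the false positive rate (genuine items wrongly flagged as fake) separately for each group and reports the largest gap between groups. This is only one possible fairness metric, chosen here for illustration; the group labels, toy data, and function names are hypothetical.

```python
from collections import defaultdict


def false_positive_rates(y_true, y_pred, groups):
    """Per-group false positive rate: share of genuine items (label 0) predicted fake (1)."""
    negatives = defaultdict(int)  # genuine items seen per group
    false_pos = defaultdict(int)  # genuine items flagged as fake per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:
            negatives[group] += 1
            if pred == 1:
                false_pos[group] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}


def max_fpr_gap(rates):
    """Largest difference in false positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: 1 = fake, 0 = genuine.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

rates = false_positive_rates(y_true, y_pred, groups)
print(rates)               # {'A': 0.25, 'B': 0.75}
print(max_fpr_gap(rates))  # 0.5
```

A large gap means genuine posts from one group are flagged, and potentially suppressed, more often than those from another. The metric only surfaces the disparity; deciding which gap is acceptable remains a policy question.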

Key papers

  • TELLER: A Trustworthy Framework For Explainable, Generalizable and Controllable Fake News Detection — operationalizes three core principles for trustworthy fake news detection: explainability (decomposed questions + symbolic rules), generalizability (cross-domain transfer), and controllability (human intervention on rules)
  • [[2019-jobin-trustworthy-ai-principles|Jobin et al. (2019)]] — surveys AI ethics guidelines published worldwide; finds convergence around five principles (transparency, justice and fairness, non-maleficence, responsibility, privacy)
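The TELLER entry pairs explainability (symbolic rules over answers to decomposed questions) with controllability (human intervention on those rules). The following is a minimal sketch of that idea under simplifying assumptions, not TELLER's actual implementation. Question answers are hard-coded confidence scores; in TELLER they come from an LLM. Rules are crisp boolean conjunctions combined in disjunctive normal form; TELLER learns its rules from data, whereas this sketch hard-codes them. The question names, rules, and 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A conjunction of logic atoms; the prediction is the OR over enabled rules (DNF)."""
    name: str
    atoms: tuple[str, ...]  # every atom must be true for the rule to fire
    enabled: bool = True    # operators can switch a rule off without retraining


def to_atoms(answers: dict[str, float], threshold: float = 0.5) -> dict[str, bool]:
    """Turn per-question confidence scores into boolean logic atoms (hypothetical threshold)."""
    return {q: score >= threshold for q, score in answers.items()}


def predict(atoms: dict[str, bool], rules: list[Rule]) -> tuple[bool, list[str]]:
    """Return (is_fake, names of fired rules) so each decision carries its own explanation."""
    fired = [r.name for r in rules
             if r.enabled and all(atoms.get(a, False) for a in r.atoms)]
    return bool(fired), fired


rules = [
    Rule("unsupported_and_sensational", ("claims_unsupported", "sensational_language")),
    Rule("contradicts_sources", ("contradicts_reliable_sources",)),
]

# Hypothetical answers to decomposed questions, e.g. scores produced by an LLM.
answers = {
    "claims_unsupported": 0.8,
    "sensational_language": 0.9,
    "contradicts_reliable_sources": 0.2,
}
atoms = to_atoms(answers)
print(predict(atoms, rules))  # (True, ['unsupported_and_sensational'])

# Human intervention: disable a rule judged to over-flag, with no retraining.
rules[0].enabled = False
print(predict(atoms, rules))  # (False, [])
```

Because the decision is an OR over named rules, the list of fired rules doubles as the explanation, and editing the rule set changes behavior immediately. This is the property the controllability principle asks for.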

Connections