Trustworthy AI

Trustworthy AI encompasses a set of design principles for building systems that are safe, fair, transparent, and subject to human oversight. Trustworthiness is essential in fake news detection. Automated detection systems deployed at scale can suppress legitimate speech, discriminate against particular populations, or propagate their own errors across many decisions if they are not designed carefully.

Key dimensions of trustworthy AI systems:

Explainability: users and operators understand how and why decisions are made
Robustness: systems degrade gracefully under distribution shift, adversarial input, or uncertainty
Fairness: systems do not discriminate; treatment is equitable across demographic groups and topics
Controllability: humans can intervene, correct, or guide system behavior without requiring retraining
Accountability: designers and operators can be held responsible for harms
Security: systems resist manipulation, poisoning, or unauthorized access
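The fairness dimension can be made concrete with one common check: compare a detector's error rates across groups. The minimal sketch below computes the false positive rate for each group, meaning the share of genuine items wrongly flagged as fake. It then reports the largest gap between any two groups. The function names, the toy labels, and the topic groups are hypothetical. Using the false positive rate gap as the metric is also an illustrative assumption; the sources cited here do not prescribe it.

```python
from collections import defaultdict

def false_positive_rates(y_true, y_pred, groups):
    """Per-group FPR: genuine items (label 0) predicted as fake (pred 1).

    Groups with no genuine items are omitted, since their FPR is undefined.
    """
    negatives = defaultdict(int)
    false_pos = defaultdict(int)
    for label, pred, group in zip(y_true, y_pred, groups):
        if label == 0:
            negatives[group] += 1
            if pred == 1:
                false_pos[group] += 1
    return {g: false_pos[g] / n for g, n in negatives.items()}

def max_fpr_gap(rates):
    """Largest difference in FPR between any two groups (0.0 means parity)."""
    return max(rates.values()) - min(rates.values())

# Hypothetical toy data: 1 = fake, 0 = genuine; groups are illustrative topics.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
groups = ["politics"] * 5 + ["health"] * 5

rates = false_positive_rates(y_true, y_pred, groups)
print(rates)               # {'politics': 0.25, 'health': 0.75}
print(max_fpr_gap(rates))  # 0.5
```

In this toy data, genuine health articles are flagged three times as often as genuine political ones. This is the kind of disparity the fairness principle asks designers to detect and correct before deployment.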

Key papers

TELLER: A Trustworthy Framework For Explainable, Generalizable and Controllable Fake News Detection — operationalizes three core principles for trustworthy fake news detection: explainability (decomposed questions + symbolic rules), generalizability (cross-domain transfer), and controllability (human intervention on rules)
[[2019-jobin-trustworthy-ai-principles|Jobin et al. (2019)]] — surveys AI ethics guidelines published worldwide; finds convergence around five principles (transparency, justice and fairness, non-maleficence, responsibility, privacy)
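The TELLER entry pairs explainability (decomposed questions plus symbolic rules) with controllability (human intervention on rules). The sketch below shows the general idea under several assumptions. It is not TELLER's actual implementation. Each decomposed question yields a truth value in [0, 1]. In a real system a model would answer the questions; here the values are hard-coded. A rule is a conjunction of question atoms, scored with a fuzzy AND (min). The verdict is a fuzzy OR (max) over enabled rules, so the winning rule doubles as the explanation. An operator can disable a rule without any retraining. The question ids, rule names, threshold, and min/max aggregation are all hypothetical choices.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    atoms: list[str]       # hypothetical question ids that must all hold
    enabled: bool = True   # operators can switch a rule off without retraining

@dataclass
class RuleDetector:
    rules: list[Rule]
    threshold: float = 0.5

    def rule_scores(self, answers: dict[str, float]) -> dict[str, float]:
        # Fuzzy AND: a rule is only as true as its weakest atom.
        return {r.name: min(answers[a] for a in r.atoms)
                for r in self.rules if r.enabled}

    def predict(self, answers: dict[str, float]):
        scores = self.rule_scores(answers)
        if not scores:
            return False, None, 0.0
        # Fuzzy OR over enabled rules; the strongest rule is the explanation.
        best = max(scores, key=scores.get)
        return scores[best] >= self.threshold, best, scores[best]

    def disable(self, name: str) -> None:
        # Human intervention: edit the rule base directly, no retraining.
        for r in self.rules:
            if r.name == name:
                r.enabled = False

# Hypothetical answers to decomposed questions about one article.
answers = {"unverified_source": 0.9, "emotional_language": 0.8,
           "contradicts_facts": 0.2}
detector = RuleDetector([
    Rule("sensational_unsourced", ["unverified_source", "emotional_language"]),
    Rule("factual_conflict", ["contradicts_facts"]),
])

print(detector.predict(answers))  # (True, 'sensational_unsourced', 0.8)
detector.disable("sensational_unsourced")
print(detector.predict(answers))  # (False, 'factual_conflict', 0.2)
```

Keeping the rules as explicit, named objects lets an operator read why an item was flagged and change that behavior directly. A purely neural classifier with the same accuracy would not offer that.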

Connections

Explainable AI — explainability is one critical pillar of trustworthiness
Neural-symbolic AI — neural-symbolic systems expose explicit rules that humans can inspect and edit, which makes oversight and intervention more direct than in purely neural models
Fake news detection methods — trustworthiness must be designed into detection systems from the start, not retrofitted