Trustworthy AI
Trustworthy AI encompasses a set of design principles for building systems that are safe, fair, transparent, and subject to human oversight. In the context of fake news detection, trustworthiness is essential: automated detection systems deployed at scale can suppress legitimate speech, discriminate against particular populations, or compound their own errors if they are not designed carefully.
Key dimensions of trustworthy AI systems:
- Explainability: users and operators understand how and why decisions are made
- Robustness: systems degrade gracefully under distribution shift, adversarial input, or uncertainty
- Fairness: systems do not discriminate; treatment is equitable across demographic groups and topics
- Controllability: humans can intervene, correct, or guide system behavior without requiring retraining
- Accountability: designers and operators can be held responsible for harms
- Security: systems resist manipulation, poisoning, or unauthorized access
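The fairness dimension above can be made concrete with a simple audit. The following is a minimal sketch, assuming a detector's predictions are available as `(group, true_label, predicted_label)` tuples where `1` means "flagged as fake"; the function names, group names, and sample records are hypothetical and chosen only for illustration. It compares false positive rates across groups, which is one of several possible fairness checks, not a complete definition of fairness.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Return {group: FPR}, where FPR is the share of genuine items flagged as fake.

    records: iterable of (group, true_label, predicted_label), with 1 = fake.
    Groups with no genuine items are omitted because their FPR is undefined.
    """
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 0:  # only genuine items can become false positives
            negatives[group] += 1
            if y_pred == 1:
                false_pos[group] += 1
    return {g: false_pos[g] / n for g, n in negatives.items()}


def fpr_gap(records):
    """Largest difference in false positive rate between any two groups."""
    rates = false_positive_rate_by_group(records)
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())


# Hypothetical audit data: (group, true_label, predicted_label)
records = [
    ("topic_health", 0, 1),
    ("topic_health", 0, 0),
    ("topic_health", 1, 1),
    ("topic_politics", 0, 0),
    ("topic_politics", 0, 0),
    ("topic_politics", 1, 0),
]

print(false_positive_rate_by_group(records))
print(fpr_gap(records))
```

A large gap suggests that genuine content from one group or topic is wrongly flagged more often than content from another, which is the kind of unequal treatment the fairness dimension is meant to rule out.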
Key papers
- TELLER: A Trustworthy Framework For Explainable, Generalizable and Controllable Fake News Detection: operationalizes three core principles for trustworthy fake news detection: explainability (decomposed questions and symbolic rules), generalizability (cross-domain transfer), and controllability (human intervention on rules)
- [[2019-jobin-trustworthy-ai-principles|Jobin et al. (2019)]]: surveys AI ethics guidelines issued worldwide; finds convergence around five principles (transparency, justice and fairness, non-maleficence, responsibility, and privacy)
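To illustrate what "human intervention on rules" can look like in practice, here is a minimal sketch of a detector whose decision is a weighted sum over answers to decomposed questions. It is not TELLER's implementation: the class names, question names, weights, and answer probabilities are all hypothetical, and the weights are set by hand rather than learned. The point is only that an operator can switch off or reweight a rule and change the outcome without retraining any model.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    question: str         # hypothetical decomposed question
    weight: float         # positive pushes toward "fake", negative toward "real"
    enabled: bool = True  # operators can switch a rule off without retraining


@dataclass
class RuleBasedDetector:
    rules: dict = field(default_factory=dict)
    threshold: float = 0.0

    def predict(self, answers):
        """Return (label, trace).

        answers maps rule name -> probability in [0, 1] that the answer is "yes".
        trace lists each rule's contribution, which serves as the explanation.
        """
        score = 0.0
        trace = []
        for name, rule in self.rules.items():
            if not rule.enabled or name not in answers:
                continue
            # Map the probability to [-1, 1] so an uncertain answer (0.5) adds nothing.
            contribution = rule.weight * (2 * answers[name] - 1)
            score += contribution
            trace.append((name, contribution))
        label = "fake" if score > self.threshold else "real"
        return label, trace


detector = RuleBasedDetector(rules={
    "sensational_tone": Rule("Does the headline use sensational language?", 1.0),
    "cites_source": Rule("Does the article cite a named source?", -1.5),
})

# Hypothetical answer probabilities, e.g. produced by an upstream model.
answers = {"sensational_tone": 0.75, "cites_source": 0.75}

before, trace_before = detector.predict(answers)

# Human intervention: an operator decides the source-citation rule is unreliable.
detector.rules["cites_source"].enabled = False
after, trace_after = detector.predict(answers)

print(before, trace_before)
print(after, trace_after)
```

Because each rule's contribution is recorded in the trace, the same structure serves explainability and controllability together: the operator can see which rule drove a decision and then adjust that rule directly.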
Connections
- Explainable AI — explainability is one critical pillar of trustworthiness
- Neural-symbolic AI — neural-symbolic systems better support human oversight and intervention
- Fake news detection methods — trustworthiness must be designed into detection systems from the start, not retrofitted