Truthfulness
Truthfulness is the property of statements matching reality. In the context of AI, truthfulness standards aim to prevent AI systems from asserting false claims, whether through carelessness (negligent falsehoods, which the system should have recognized as likely false) or strategically (lies). This is distinct from honesty (whether a system's statements match its own beliefs) and from transparency or explainability.
Key papers
- [[2022-burns-latent-knowledge|Burns et al. (2022) — Discovering Latent Knowledge in Language Models Without Supervision]] — Contrast-Consistent Search (CCS), an unsupervised method that recovers a truth-tracking direction in a language model's activations using a logical-consistency constraint: a statement and its negation should receive probabilities that sum to one
- Evans et al. (2021) — Truthful AI: Developing and Governing AI That Does Not Lie — foundational framework for AI truthfulness standards, the institutions that could govern them, and technical approaches to meeting them
- Rashkin et al. (2017) — Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking — linguistic analysis of graded truthfulness ratings in news and political statements
- Oshikawa et al. (2020) — A Survey on Natural Language Processing for Fake News Detection — NLP methods for fact verification and false-claim detection
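To make the CCS idea from Burns et al. (2022) concrete, here is a minimal NumPy sketch of its objective and a linear probe trained on it. It assumes you have already extracted, from some model layer, hidden states for each statement phrased with a "yes"-type answer (`acts_pos`) and a "no"-type answer (`acts_neg`); these names and all function names below are hypothetical, and this is not the authors' reference implementation. The loss has a consistency term, (p(x+) - (1 - p(x-)))², and a confidence term, min(p(x+), p(x-))², which rules out the trivial answer p = 0.5. Each contrast set is normalized separately so the probe cannot simply separate "yes" prompts from "no" prompts. Training uses plain full-batch gradient descent from a single random initialization, which is a simplification.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def normalize(acts: np.ndarray) -> np.ndarray:
    # Center and scale one contrast set (all "yes" or all "no" prompts) per feature,
    # so the probe cannot separate the two sets by a surface "yes vs. no" feature.
    return (acts - acts.mean(axis=0)) / (acts.std(axis=0) + 1e-8)


def ccs_loss(p_pos: np.ndarray, p_neg: np.ndarray) -> float:
    # Consistency: p(x+) should equal 1 - p(x-).
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: discourage the degenerate answer p(x+) = p(x-) = 0.5.
    confidence = np.minimum(p_pos, p_neg) ** 2
    return float(np.mean(consistency + confidence))


def train_ccs_probe(acts_pos, acts_neg, lr=0.5, steps=1000, seed=0):
    """Fit p(x) = sigmoid(w.x + b) by full-batch gradient descent on ccs_loss (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x_pos, x_neg = normalize(acts_pos), normalize(acts_neg)
    n, d = x_pos.shape
    w, b = rng.normal(scale=0.1, size=d), 0.0
    for _ in range(steps):
        p_pos = sigmoid(x_pos @ w + b)
        p_neg = sigmoid(x_neg @ w + b)
        s = p_pos + p_neg - 1.0       # consistency residual
        pos_is_min = p_pos <= p_neg   # which probability the confidence term penalizes
        # dLoss/dp for each member of the contrast pair (before averaging over n)
        g_pos = 2.0 * s + 2.0 * np.where(pos_is_min, p_pos, 0.0)
        g_neg = 2.0 * s + 2.0 * np.where(pos_is_min, 0.0, p_neg)
        # Chain rule through the sigmoid, averaged over examples
        g_zpos = g_pos * p_pos * (1.0 - p_pos) / n
        g_zneg = g_neg * p_neg * (1.0 - p_neg) / n
        w -= lr * (x_pos.T @ g_zpos + x_neg.T @ g_zneg)
        b -= lr * (g_zpos.sum() + g_zneg.sum())
    return w, b


def ccs_predict(w, b, acts_pos, acts_neg):
    # Average the two views of each statement; True means "predicted true".
    p_pos = sigmoid(normalize(acts_pos) @ w + b)
    p_neg = sigmoid(normalize(acts_neg) @ w + b)
    return 0.5 * (p_pos + (1.0 - p_neg)) > 0.5
```

Because flipping the probe's sign leaves the loss unchanged, CCS recovers the truth direction only up to sign; a few labeled examples (or one statement known to be true) are needed to decide which side means "true".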
Related topics
- AI Safety (broader concern with beneficial AI)
- AI Governance (institutional mechanisms)
- Content moderation (platform practice)
- Misinformation and fake news detection (detection of false claims)