Truthfulness

Truthfulness refers to the property of statements matching reality. In the context of AI, truthfulness standards aim to prevent AI systems from generating false claims, either accidentally (negligent falsehoods) or strategically (lies). This is distinct from honesty (whether a system's statements match its own beliefs) and from transparency or explainability.

Key papers

[[2022-burns-latent-knowledge|Burns et al. (2022) — Discovering Latent Knowledge in Language Models Without Supervision]] — unsupervised method that recovers latent knowledge about truth from language model activations using logical consistency constraints
Evans et al. (2021) — Truthful AI: Developing and Governing AI That Does Not Lie — foundational governance framework covering AI truthfulness standards, institutions, and technical approaches
Rashkin et al. (2017) — Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking — linguistic analysis of gradations of truth and deception
Oshikawa et al. (2020) — A Survey on Natural Language Processing for Fake News Detection — survey of NLP methods for fact verification and false-claim detection

Related topics

AI Safety (broader concern with beneficial AI)
AI Governance (institutional mechanisms)
Content moderation (platform practice)
Misinformation and fake news detection (detection of false claims)