Jacob Steinhardt
Professor at UC Berkeley working on AI safety, adversarial robustness, and the ways machine learning systems can fail. His research focuses on identifying fundamental failure modes and developing safer learning methods.
Sources in this wiki
- Discovering Latent Knowledge in Language Models Without Supervision
- Jailbroken: How Does LLM Safety Training Fail?