Sam Ringer
Researcher at Anthropic working on language model evaluation, behavior discovery, and AI safety.
Sources in this wiki
- Discovering Language Model Behaviors with Model-Written Evaluations: co-author. The paper proposes using language models to generate evaluations that test LM behaviors, including inverse scaling and RLHF side effects.