Deep Ganguli
Deep Ganguli is a research scientist at Anthropic focusing on large language model behavior, interpretability, and alignment. His work investigates how to train and evaluate language models to be helpful, harmless, and honest.
Sources in this wiki
- Discovering Language Model Behaviors with Model-Written Evaluations (co-author): contributes to the discovery of inverse scaling phenomena and RLHF side effects
- [[2023-ganguli-moral-self-correction]]