AI Safety & Interpretability Lab

We make advanced AI systems safer and more transparent by developing methods to understand model behavior, model internals, interactions between AI agents, and collective action in populations of agents.
