In the AI Safety & Interpretability Lab at SDU, we develop interpretability-informed control methods to ensure the safe and beneficial deployment of advanced AI systems. As AI systems grow more capable and autonomous, understanding how and why they behave as they do becomes increasingly crucial for retaining human control, from single models to populations of agents.
Contact us via galke@imada.sdu.dk