AI Safety & Interpretability Lab

In the AI Safety & Interpretability Lab at SDU, we investigate how and why advanced AI systems behave the way they do and how we can use these insights to make them more reliable, interpretable, and safe.


From The Probe

Jun 30, 2026

The Arbiter Agent

Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment
by Filippo Tonini

As AI systems built from multiple language-model agents become more common, they are increasingly tasked with collaborative decision-making: discussing,...

Jun 25, 2026

Auditability-Accuracy Tradeoff

The Auditability-Accuracy Tradeoff
by Lukas Galke Poech

Monitoring reasoning traces of large language models is currently one of the most promising methods to detect when language models do not behave...