The Probe | AI Safety & Interpretability Lab
The Probe

The Probe is the blog of the AI Safety & Interpretability Lab at the University of Southern Denmark. The posts highlight selected topics from our research and updates from the lab.

Jun 30, 2026

The Arbiter Agent

Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment
by Filippo Tonini
As AI systems built from multiple language-model agents become more common, they are increasingly tasked with collaborative decision-making: discussing,...

Jun 25, 2026

Auditability-Accuracy Tradeoff

The Auditability-Accuracy Tradeoff
by Lukas Galke Poech
Monitoring reasoning traces of large language models is currently one of the most promising methods to detect when language models do not behave...

Contact us via galke@imada.sdu.dk