The Probe

The Probe is the blog of the AI Safety & Interpretability Lab at the University of Southern Denmark. The posts highlight selected topics from our research and updates from the lab.

Jun 30, 2026

The Arbiter Agent

Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

By Filippo Tonini

As AI systems built from multiple language-model agents become more common, they are increasingly tasked with collaborative decision-making: discussing,...

Jun 25, 2026

Auditability-Accuracy Tradeoff

The Auditability-Accuracy Tradeoff

By Lukas Galke Poech

Monitoring reasoning traces of large language models is currently one of the most promising methods to detect when language models do not behave...