The Probe
The Probe is the blog of the AI Safety & Interpretability Lab at the University of Southern Denmark. The posts highlight selected topics from our research and updates from the lab.
Jun 30, 2026
The Arbiter Agent
Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment, by Filippo Tonini. As AI systems built from multiple language-model agents become more common, they are increasingly tasked with collaborative decision-making: discussing,...
Jun 25, 2026
Auditability-Accuracy Tradeoff
The Auditability-Accuracy Tradeoff, by Lukas Galke Poech. Monitoring reasoning traces of large language models is currently one of the most promising methods to detect when language models do not behave...