AI safety tools can't tell if a chatbot is actually changing your mind

What happened

Researchers built a new way to measure whether a chatbot actually changes a person's mind, and found that current AI safety tools cannot predict when it does.

Why it matters

AI developers have been building safety tools that check for manipulative language. This paper shows that those tools miss the actual problem: whether the chatbot makes someone believe something new. In other words, current safety checks are not measuring real-world impact on users.

The signal

Watch whether AI companies start using this new dataset and task to evaluate their models or stick with older methods.