AI safety tools can't tell if a chatbot is actually changing your mind
What happened
Researchers built a new dataset and task for measuring whether a chatbot actually changes a person's mind. They found that current AI safety tools cannot predict when this happens.
Why it matters
AI developers have been building safety tools that check for manipulative language. The paper shows that these tools miss the actual problem: whether the chatbot leads someone to believe something new. In other words, current safety checks do not measure the real-world impact on users.
The signal
Watch whether AI companies start using this new dataset and task to evaluate their models or stick with older methods.