Language models get tricked by weird sentences in ways humans don't

What happened

Researchers tested how language models and humans judge whether a sentence makes sense and, when it doesn't, whether it is figurative or simply wrong. The models failed at that second, more nuanced step: given an implausible sentence, they assumed it was figurative instead of recognizing it as actually wrong, while humans could tell the difference.

Why it matters

This measures a real gap in how language models process meaning. Humans can tell when something is genuinely implausible and when it is poetic or intentionally weird, a distinction that matters if you're using these models to catch errors, detect lies, or understand what someone actually said. The models' shortcut (implausible equals figurative) would be a problem in any domain where you need to know whether a statement is actually false or just creatively phrased.

The signal

Watch whether follow-up work finds this bias in real-world tasks such as content moderation, fact-checking, or medical text understanding, where mistaking 'false' for 'figurative' would have real consequences.