What happened
Large language models lose 2-10% accuracy on math problems when the questions are wrapped in emotional language such as frustration or urgency, even though every number and relationship stays identical. The emotional framing acts as noise that the models fail to separate from the underlying problem, and this suggests a lightweight fix: neutralizing the emotional tone before feeding the problem to the model recovers most of the lost performance.
Why it matters
This is a robustness failure, and it matters because real users don't ask questions in clean, emotionally flat language. A student asking for help while stressed, a worker submitting a rushed request, a customer complaining about a calculation: all of these are emotional framings, and the model's accuracy drops measurably because of the wrapper alone. The practical implication is that deployed AI doing quantitative work (financial calculations, medical dosing, engineering specs) may perform worse on the problems where emotional pressure is highest, which is exactly when accuracy matters most. A neutralization step at inference time could fix this cheaply, but nobody is doing it yet.
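
As a rough illustration of what an inference-time neutralization step might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `llm` callable stands in for whatever model call a deployment already uses, and the instruction wording and the function names (`neutralize`, `solve`, `numbers_in`) are hypothetical, not taken from the work summarized above. The sketch asks the model to rewrite the problem in flat language, then checks that the rewrite kept the same numeric literals before using it.

```python
import re
from typing import Callable, List

# Hypothetical rewrite instruction; the wording is an assumption.
NEUTRALIZE_INSTRUCTION = (
    "Rewrite the problem below in neutral, emotionally flat language. "
    "Keep every number, unit, and relationship exactly as stated. "
    "Return only the rewritten problem.\n\n"
)

# Matches integers and simple decimals such as 12 or 3.5.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_in(text: str) -> List[str]:
    """Return all numeric literals in the text, sorted for comparison."""
    return sorted(_NUMBER.findall(text))


def neutralize(problem: str, llm: Callable[[str], str]) -> str:
    """Ask the model for a neutral rewrite of the problem.

    Falls back to the original text if the rewrite is empty or if its
    numeric literals differ from the original's. The check is deliberately
    conservative: a number that appears only in the emotional wrapper
    also triggers the fallback.
    """
    rewritten = llm(NEUTRALIZE_INSTRUCTION + problem).strip()
    if not rewritten or numbers_in(rewritten) != numbers_in(problem):
        return problem
    return rewritten


def solve(problem: str, llm: Callable[[str], str]) -> str:
    """Neutralize the framing first, then pass the result to the model."""
    return llm(neutralize(problem, llm))
```

The number check is there because a rewrite that silently alters a quantity would be worse than the emotional framing it removes. The cost of this design is one extra model call per question, and whether that trade is worthwhile would need to be measured on the target workload.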