The world is being quietly rearranged by people who write very long documents.


The title they went with:

Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty

Noisy translates that to:

Large language models fail at learning from mistakes — they double down instead


Researchers tested three major LLMs on a task where the right answer keeps changing, and found they get worse at learning from losses while staying stubborn after reversals. In practice, this means LLMs can't adapt well when the world changes — they'll keep repeating failed strategies instead of pivoting like humans do.
LLMs are increasingly deployed in real environments where conditions shift: chatbots that need to adjust to user feedback, AI systems managing supply chains or trading, autonomous agents making sequential decisions. This paper shows a basic limitation: these models learn fine from positive feedback (winning) but poorly from negative feedback (losing). They perseverate, sticking with a losing strategy longer than humans would. The implication is blunt: if you build a system where an LLM has to adapt in real time when its assumptions break, it will be slower and more rigid than a human or a simpler algorithm. That matters for deployment questions nobody's asking yet.
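To make "learns from wins, not from losses" concrete, here is a toy sketch of a reversal task. It is not the paper's method or model: it assumes a generic two-option learner with separate learning rates for good and bad surprises, a greedy choice rule, and rewards that are fully deterministic (real reversal tasks are usually noisy). The function name and all parameters are made up for illustration.

```python
def perseverative_errors(alpha_win, alpha_loss, pre_trials=50, max_trials=1000):
    """Count how many times a greedy learner repeats the old choice
    after the correct option silently flips from 0 to 1.

    Illustrative toy only (hypothetical names and parameters):
    alpha_win  = learning rate when the outcome is better than expected
    alpha_loss = learning rate when the outcome is worse than expected
    """
    q = [0.5, 0.5]  # current value estimate for each option

    def play(correct):
        # Greedy choice: pick the option currently valued highest (ties go to 0).
        choice = 0 if q[0] >= q[1] else 1
        reward = 1.0 if choice == correct else 0.0
        delta = reward - q[choice]  # prediction error
        alpha = alpha_win if delta >= 0 else alpha_loss
        q[choice] += alpha * delta
        return choice

    # Phase 1: option 0 is correct, so the learner settles on it.
    for _ in range(pre_trials):
        play(correct=0)

    # Phase 2: the rule reverses. Count wrong picks until the first switch.
    errors = 0
    for _ in range(max_trials):
        if play(correct=1) == 1:
            break
        errors += 1
    return errors


balanced = perseverative_errors(alpha_win=0.3, alpha_loss=0.3)
rigid = perseverative_errors(alpha_win=0.3, alpha_loss=0.03)
print("balanced learner, errors after reversal:", balanced)
print("loss-insensitive learner, errors after reversal:", rigid)
```

In this toy setup, the only difference between the two learners is how much weight a loss gets, and the loss-insensitive one keeps picking the old option for many more trials before it switches. That is the shape of the rigidity described above, under these simplified assumptions.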
Watch whether researchers start building reversal-learning tests into standard LLM benchmarks, or whether real-world deployments (trading systems, supply-chain AI, content moderation) start documenting cases where LLMs fail to adapt when the task structure shifts.
