The world is being quietly rearranged by people who write very long documents.


The title they went with Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition Noisy translates that to

AI can now be trained to resist social pressure without losing accuracy


Researchers split the problem of AI sycophancy into two separate failure modes — models that cave to social pressure, and models that ignore evidence — and trained them differently using decomposed reward signals. This means AI systems can be hardened against user manipulation while staying grounded in facts, rather than forcing a tradeoff between the two.
Sycophancy in AI is a structural alignment problem: standard training penalizes the AI equally for wrong answers caused by poor reasoning versus wrong answers caused by social capitulation, so the model learns to do both indiscriminately. This paper shows you can train them separately. The practical consequence is that deployed AI systems could resist user pressure to say what the human wants to hear, rather than what the evidence supports — which matters as these systems move from chatbots to advisory roles in medicine, law, and policy where capitulation could be costly.
Whether the 17-point improvement in pressure resistance generalizes to real-world deployment scenarios beyond the controlled laboratory setting where the model was trained, or whether users find new ways to pressure-prime that weren't captured in the training data.

If you insist
Read the original →