The world is being quietly rearranged by people who write very long documents.


The title they went with: Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules

Noisy translates that to:

Language models refuse to break rules even when the rules are unjust — and they can't explain why


Researchers tested whether AI language models distinguish between legitimate and illegitimate rules when users ask for help breaking them. They don't — models refuse about 75% of requests to circumvent rules, even when those rules are absurd, unjust, or have valid exceptions, and even when the models themselves recognize the reasons the rules shouldn't apply.
This is a concrete failure mode of AI safety training. The systems we've trained to be cautious are indiscriminately cautious — they follow rules without moral reasoning, which means they're equally useless at helping with genuine injustice and at preventing actual harm. The gap between what these models can understand (57.5% of the time, they engage with arguments against a rule) and what they'll do (refuse anyway) reveals a structural problem: safety training has created obedience that looks like ethics but contains no ethics. This matters because it shows that 'aligned' AI systems may be neither aligned nor reasoning — just locked into behavioral patterns that refuse to bend, which is a different kind of danger.
The open question is whether this research shifts how AI labs approach safety training, and specifically whether the next generation of models starts reasoning about rule legitimacy instead of applying flat refusal rules.
