The world is being quietly rearranged by people who write very long documents.


The title they went with:
CRISP: Compressed Reasoning via Iterative Self-Policy Distillation

Noisy translates that to:

AI reasoning models now think faster without getting dumber — a compression technique cuts token use by half while improving accuracy


Researchers built a method that teaches AI reasoning models to think out loud less and think more efficiently instead. By having models teach themselves to be concise, they cut the number of tokens (the chunks of text a model generates as it reasons) by 40–60% while actually improving accuracy on math and planning tasks.
Reasoning models are expensive to run because they work by talking through problems step-by-step, generating thousands of tokens per query. This technique forces models to compress their reasoning without sacrificing correctness — which means the same hardware can handle more requests, or smaller models can solve harder problems. The finding that self-compression works better than explicit token targets suggests the future of efficient reasoning isn't about brute-force constraints, but teaching models to naturally think more carefully about what matters.
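To put rough numbers on "the same hardware can handle more requests", here is a back-of-envelope sketch in Python. It assumes serving cost scales linearly with the number of generated tokens, which is a simplification and not something the paper claims. The function name `throughput_multiplier` is made up for illustration.

```python
def throughput_multiplier(token_reduction: float) -> float:
    """Requests served per fixed token budget, relative to the baseline.

    Assumption (not from the paper): cost is proportional to the number
    of generated tokens, so using a fraction (1 - r) of the tokens per
    request lets the same budget cover 1 / (1 - r) times as many requests.
    """
    if not 0.0 <= token_reduction < 1.0:
        raise ValueError("token_reduction must be in [0, 1)")
    return 1.0 / (1.0 - token_reduction)


# The 40-60% range quoted above, plus the midpoint.
for reduction in (0.40, 0.50, 0.60):
    multiplier = throughput_multiplier(reduction)
    print(f"{reduction:.0%} fewer tokens -> {multiplier:.2f}x requests")
```

Under that assumption, a 40–60% token cut works out to roughly 1.7x to 2.5x as many requests on the same budget. Real deployments will differ, since prompt processing and memory limits also shape throughput.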
The thing to watch is whether deployed reasoning systems (Claude with extended thinking, OpenAI o1, Qwen) adopt self-distillation methods within the next 12 months, which would show this isn't just a research trick but a production improvement that companies actually need.

If you insist
Read the original →