The world is being quietly rearranged by people who write very long documents.


The title they went with: "TriAttention: Efficient Long Reasoning with Trigonometric KV Compression." Noisy translates that to:

AI can now reason longer without running out of memory — 10x less storage needed


A new compression method lets AI models think through harder problems without maxing out their memory. Instead of trying to remember everything the model has seen recently, it uses trigonometric math (hence the title) to predict which stored information the model will actually need. That lets it run longer tasks on cheaper hardware.
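To make the "keep only what will be needed" idea concrete, here is a minimal sketch of generic budgeted KV-cache eviction. It is not TriAttention's method. The paper's trigonometric scoring is replaced with a simple stand-in: softmax attention of the current query over the cached keys. The function names `attention_scores` and `evict_to_budget` and the toy data are hypothetical.

```python
import math


def attention_scores(query, keys):
    """Softmax over scaled dot products between the current query and each cached key.

    Hypothetical stand-in importance score; TriAttention's trigonometric scoring is NOT shown.
    """
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def evict_to_budget(keys, values, scores, budget):
    """Keep the `budget` highest-scoring cache entries, in their original token order."""
    if len(keys) <= budget:
        return list(keys), list(values), list(range(len(keys)))
    # Rank positions by score (highest first), keep the top `budget`, restore token order
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])
    return [keys[i] for i in keep], [values[i] for i in keep], keep


# Toy cache of 4 tokens with 2-dimensional keys (values are labels for readability)
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = ["a", "b", "c", "d"]
query = [1.0, 0.5]

scores = attention_scores(query, keys)
kept_keys, kept_values, kept_positions = evict_to_budget(keys, values, scores, budget=2)
print(kept_positions, kept_values)  # [0, 2] ['a', 'c']
```

Setting `budget` to a tenth of the cache length would mirror the headline's 10x figure. How well the compressed model answers then hinges on the importance score, which is exactly the piece this sketch fakes.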
Right now, running AI models on long reasoning tasks hits a hard wall: the memory needed to store what the model has already seen (its key-value, or KV, cache) grows with every token it reads or writes, so even expensive hardware runs out of space. This method eases that bottleneck by cutting memory use by 10x while keeping the quality of the model's answers largely intact. That means smaller, cheaper computers could handle the kinds of complex reasoning problems that used to require massive servers, which could shift where these models get deployed.
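The memory wall is plain arithmetic. A standard transformer keeps one key vector and one value vector per token in every attention layer, so the cache grows linearly with the length of the reasoning trace. This sketch computes that size for a hypothetical model shape (32 layers, 8 key-value heads, 128-dimensional heads, 2-byte fp16 values), chosen for illustration and not taken from the paper; `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_value=2):
    """Size of a standard KV cache for one sequence.

    The leading 2 counts one key and one value vector per token, per head, per layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_value


GIB = 2**30

# Hypothetical model shape (not from the paper): 32 layers, 8 KV heads, head_dim 128, fp16
per_token = kv_cache_bytes(32, 8, 128, num_tokens=1)
full = kv_cache_bytes(32, 8, 128, num_tokens=32_768)  # one long reasoning trace
compressed = full / 10  # the headline's 10x reduction

print(per_token)                      # 131072 (128 KiB per token)
print(f"{full / GIB:.1f} GiB")        # 4.0 GiB
print(f"{compressed / GIB:.1f} GiB")  # 0.4 GiB
```

Batching multiplies these figures again, since every sequence in a batch keeps its own cache.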
Watch whether open-source models start using this method in the next 6 months, and whether the accuracy gap between compressed and full models keeps shrinking in real-world use, beyond the paper's test problems.
