The world is being quietly rearranged by people who write very long documents.


The title they went with
MAC-Attention: a Match-Amend-Complete Scheme for Fast and Accurate Attention Computation

Noisy translates that to

New algorithm cuts the cost of running large AI models on long texts by up to 60 percent


Researchers created a faster way for large language models to process very long documents: reuse calculations from similar previous sentences instead of recomputing everything from scratch. The same AI model can now answer questions about 128,000-word documents in significantly less time and at lower computing cost, while maintaining the same accuracy as before.
Long-context AI is expensive because the model has to reread everything it has seen so far in the conversation or document every time it generates a new word, the computational equivalent of re-reading an entire book before writing each sentence. This technique removes that bottleneck by recognizing when a new sentence is similar to one just processed, copying the old calculation, and only fixing up the small differences. That matters because it directly affects the cost and speed of commercial AI services: cheaper processing means companies can cut prices, serve more users with the same hardware, or make long-document analysis feasible for use cases that are currently too expensive to run at scale.
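For readers who want to see the reuse idea in code, here is a toy sketch in Python. It is not the paper's algorithm. It only illustrates the general pattern described above: look for an earlier query that is nearly identical, copy the attention result it already produced, and compute exactly only the tokens that arrived since then. Every name here (attend_with_reuse, the cache layout, the 0.99 similarity threshold) is made up for illustration.

```python
import math

def partial_attention(q, keys, values):
    """Softmax-attention state of q over keys: (max score, normalizer, weighted sum)."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    acc = [sum(w * v[d] for w, v in zip(weights, values))
           for d in range(len(values[0]))]
    return m, sum(weights), acc

def merge(state_a, state_b):
    """Combine two partial states with the standard log-sum-exp rescaling."""
    m_a, l_a, acc_a = state_a
    m_b, l_b, acc_b = state_b
    m = max(m_a, m_b)
    s_a, s_b = math.exp(m_a - m), math.exp(m_b - m)
    return m, l_a * s_a + l_b * s_b, [x * s_a + y * s_b for x, y in zip(acc_a, acc_b)]

def finalize(state):
    """Turn a partial state into the final attention output vector."""
    _, l, acc = state
    return [x / l for x in acc]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def attend_with_reuse(q, keys, values, cache, threshold=0.99):
    """Hypothetical helper. cache holds (old_query, prefix_len, state) tuples."""
    best = None
    for old_q, n, state in cache:
        sim = cosine(q, old_q)
        if n <= len(keys) and sim >= threshold and (best is None or sim > best[0]):
            best = (sim, n, state)
    if best is None:
        # No similar earlier query: pay full price and remember the exact result.
        state = partial_attention(q, keys, values)
        cache.append((q, len(keys), state))
        return finalize(state), False
    _, n, state = best
    if n < len(keys):
        # Reuse the old result for the first n tokens (an approximation unless
        # the queries are identical) and compute only the newer tokens exactly.
        state = merge(state, partial_attention(q, keys[n:], values[n:]))
    return finalize(state), True
```

The shortcut only skips work when a close match exists, and the reused part is exact only when the matched query is identical to the new one. How a real system corrects the remaining gap is the part this sketch leaves out.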
Watch whether major AI inference platforms (like those used by Claude, ChatGPT, or open-source model providers) integrate this method into production systems within the next 6–12 months. Adoption by at least one major vendor would suggest that the speedup holds up outside the lab.
