The world is being quietly rearranged by people who write very long documents.


The title they went with:
LPC-SM: Local Predictive Coding and Sparse Memory for Long-Context Language Modeling

Noisy translates that to:

New language model architecture splits the job between fast local attention and slower long-term memory


Researchers built a language model that replaces the traditional all-purpose attention mechanism with specialized components: fast local attention for nearby context, persistent memory for distant information, and a predictive correction layer. In tests on 4,096-token sequences, the model stayed stable and did better on long-range reasoning tasks without being scaled up to a larger size.
Every major language model today uses attention for everything, tracking nearby words and distant context with the same mechanism. That is computationally expensive, because the cost grows with every pair of tokens the model compares. The paper shows that splitting these jobs into separate parts produces measurably better results, which means future models might process long documents faster and with less memory. If this approach works at scale, it could make long-context AI cheaper to run.
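To put a rough number on that cost argument, here is a minimal sketch that counts how many token pairs a model has to score under full causal attention versus a sliding local window. It is an illustration only, not the paper's implementation: the function names are made up for this example, and the window size of 256 is an assumption, since the summary gives only the 4,096-token sequence length.

```python
def causal_pairs(seq_len: int) -> int:
    """Count (query, key) pairs when every token attends to itself and all earlier tokens."""
    return seq_len * (seq_len + 1) // 2


def local_causal_pairs(seq_len: int, window: int) -> int:
    """Count (query, key) pairs when each token attends only to the
    most recent `window` tokens, itself included."""
    return sum(min(i + 1, window) for i in range(seq_len))


def local_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a mask where mask[i][j] is True if query i may attend to key j."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


SEQ_LEN = 4096  # sequence length mentioned in the summary
WINDOW = 256    # hypothetical window size, NOT taken from the paper

full = causal_pairs(SEQ_LEN)
local = local_causal_pairs(SEQ_LEN, WINDOW)
print(f"full attention pairs:  {full}")
print(f"local attention pairs: {local}")
print(f"reduction factor:      {full / local:.1f}x")
```

Under these assumed numbers the local window scores roughly an eighth of the pairs. Whatever the window cannot see has to be carried some other way, which is the role the summary assigns to the persistent memory component.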
The open question is whether larger models (billions of parameters, not millions) built on this architecture outperform attention-only models on real-world long-document tasks like summarizing research papers or analyzing full codebases.
