The world is being quietly rearranged by people who write very long documents.


The title they went with: Internalized Reasoning for Long-Context Visual Document Understanding. Noisy translates that to:

Smaller AI models now reason through long documents as well as models 7 times their size


Researchers built a method to teach smaller AI vision models to reason step-by-step through long documents by generating synthetic training examples that score pages for relevance and rank evidence. A 32-billion-parameter model trained this way now outperforms a 235-billion-parameter model on document understanding benchmarks, and produces 12 times fewer output tokens while doing it.
Document processing is expensive at scale: larger models cost more to run, and longer outputs mean slower, pricier inference. This work suggests you can compress document reasoning into smaller, faster models without losing accuracy on these benchmarks. That changes the economics of enterprise document systems: legal firms, insurance companies, and research organizations could run document AI on cheaper hardware, process documents faster, and cut inference costs.
Watch whether enterprise document AI products start shipping smaller models in production over the next year, and whether inference costs per document drop as a result.
