The world is being quietly rearranged by people who write very long documents.


The title they went with: "Towards the AI Historian: Agentic Information Extraction from Primary Sources"

Noisy translates that to:

Historians can now query AI to extract data from old documents — no fixed pipeline required


A new AI tool lets historians convert scans of primary sources into structured data by talking to an AI agent in plain language, rather than forcing all documents through one fixed extraction process. This means researchers can adapt the tool to different types of historical sources and test whether the AI is actually working on their specific materials.
For decades, historians working with large document collections have either hired teams to manually transcribe and code materials or used software designed for modern digital text. Neither fits the messy reality of old papers, handwriting, and degraded scans. This tool lets historians use AI without surrendering control to a pre-built system. The real question is whether it actually works: can historians trust the extractions enough to build arguments on them, or does the AI introduce enough errors to make the output useless for scholarship?
Track whether historians actually adopt this on substantial archival projects in the next 18 months, and whether published historical scholarship starts citing datasets extracted this way as reliable evidence.
