The world is being quietly rearranged by people who write very long documents.


The title they went with: "HCRE: LLM-based Hierarchical Classification for Cross-Document Relation Extraction with a Prediction-then-Verification Strategy." Noisy translates that to:

Researchers test whether large language models can extract facts from multiple documents at once


A research paper shows that when large language models are asked to find connections between information scattered across separate documents, they don't automatically do better than smaller models: they get confused by having too many possible answers to choose from. The researchers built a system, HCRE, that breaks the problem into smaller steps, narrowing the choices hierarchically and then checking each prediction before accepting it. This reduces confusion and improves accuracy.
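The paper's title names two ideas: hierarchical classification and a prediction-then-verification strategy. Below is a minimal sketch of how those two ideas can fit together. It assumes a hypothetical two-level relation hierarchy and uses toy keyword functions as stand-ins for LLM calls; `HIERARCHY`, `KEYWORDS`, `keyword_chooser`, and `keyword_verifier` are illustrative names, not anything from the paper, and this is not HCRE's actual implementation.

```python
from typing import Callable, Dict, List

# Hypothetical two-level label hierarchy (the paper's real relation set is not shown here).
HIERARCHY: Dict[str, List[str]] = {
    "person-org": ["employee_of", "founded_by"],
    "org-location": ["headquartered_in", "operates_in"],
}
NO_RELATION = "no_relation"

# Toy keyword cues standing in for what an LLM would read from the documents.
KEYWORDS: Dict[str, List[str]] = {
    "person-org": ["works", "founded"],
    "org-location": ["based", "offices"],
    "employee_of": ["works"],
    "founded_by": ["founded"],
    "headquartered_in": ["based"],
    "operates_in": ["offices"],
}

# A chooser picks one option from a list; a verifier says yes/no to a proposed relation.
Chooser = Callable[[str, List[str]], str]
Verifier = Callable[[str, str], bool]


def keyword_chooser(context: str, options: List[str]) -> str:
    # Stand-in for an LLM choice: first option whose cue appears, else the last option.
    text = context.lower()
    for option in options:
        if any(cue in text for cue in KEYWORDS.get(option, [])):
            return option
    return options[-1]


def keyword_verifier(context: str, relation: str) -> bool:
    # Stand-in for an LLM yes/no check of the predicted relation.
    text = context.lower()
    return any(cue in text for cue in KEYWORDS.get(relation, []))


def classify_hierarchically(context: str, choose: Chooser, verify: Verifier) -> str:
    # Step 1: choose a coarse category from a short list, not from every relation at once.
    coarse = choose(context, list(HIERARCHY) + [NO_RELATION])
    if coarse == NO_RELATION:
        return NO_RELATION
    # Step 2: choose a fine-grained relation only among that category's children.
    fine = choose(context, HIERARCHY[coarse])
    # Step 3: verify the prediction and fall back to "no relation" if it is rejected.
    return fine if verify(context, fine) else NO_RELATION


print(classify_hierarchically(
    "Doc A: Alice works at Acme. Doc B: Acme is based in Oslo.",
    keyword_chooser, keyword_verifier))
```

The point of the structure is that each step only sees a handful of options (three coarse labels, then two fine ones) instead of the full relation list, and a wrong guess can still be caught by the verification step.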
This is a narrow technical problem in one corner of AI research: how to make language models reliable when they need to link information across documents. The paper shows that more parameters and broader general knowledge don't automatically solve harder classification tasks; sometimes you have to change the system's architecture itself. The work is incremental rather than foundational: it addresses a specific failure mode rather than unlocking a new capability.
The open question is whether this hierarchical approach generalizes to other domains where large language models struggle with many classification options, or remains specific to cross-document relation extraction.
