The world is being quietly rearranged by people who write very long documents.


The title they went with: "CALRK-Bench: Evaluating Context-Aware Legal Reasoning in Korean Law"

Noisy translates that to:

New test exposes how AI struggles with real-world legal reasoning


Researchers built a benchmark that tests whether AI language models can handle the messier parts of law — when rules change over time, when information is incomplete, and when the same facts lead to different outcomes depending on context. Most AI systems fail these tests badly, performing far worse than they do on simpler legal tasks that just require memorizing rules.
This reveals a real gap between what AI can do (memorize legal text) and what lawyers actually do (reason about how rules apply when circumstances shift). If courts or law firms deploy AI on the assumption that it handles context the way humans do, they will get wrong answers in cases where timing, missing information, or competing norms matter.
