What happened
Researchers built a benchmark that tests whether AI language models can handle the messier parts of law: when rules change over time, when information is incomplete, and when the same facts lead to different outcomes depending on context. Most AI systems fail these tests badly, performing far worse than they do on simpler legal tasks that only require memorizing rules.
Why it matters
This reveals a real gap between what AI can do (memorize legal text) and what lawyers actually do (reason about how rules apply when circumstances shift). If courts or law firms start deploying AI on the assumption that it handles context the way humans do, they'll get wrong answers in cases where timing, missing information, or competing norms matter.