New test exposes how AI struggles with real-world legal reasoning

What happened

Researchers built a benchmark that tests whether AI language models can handle the messier parts of law: when rules change over time, when information is incomplete, and when the same facts lead to different outcomes depending on context. Most AI systems perform far worse on these tests than on simpler legal tasks that only require memorizing rules.

Why it matters

This reveals a real gap between what AI can do (memorize legal text) and what lawyers actually do (reason about how rules apply when circumstances shift). If courts or law firms deploy AI on the assumption that it handles context the way humans do, they will get wrong answers in cases where timing, missing information, or competing norms matter.