The world is being quietly rearranged by people who write very long documents.


The title they went with: Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy

Noisy translates that to:

LLMs hit a hard wall on formal logic — and it's not getting better with scale


Researchers built a test suite that measures how well AI language models handle structured logical reasoning at different levels of complexity. Current models fail badly at moderately difficult tasks and would need absurd amounts of computing power to become reliable at them. Even then, they would be much slower than traditional software tools designed for the same work.

Everyone building AI-for-coding tools assumes that bigger models and better training will eventually solve formal reasoning. This paper shows that's backwards: the problem isn't missing capability, it's that language models are fundamentally inefficient at tasks that need step-by-step logical verification. You can't fix that by scaling up. This means the boundary between where AI actually helps (writing prose, finding patterns) and where humans still need traditional tools (compilers, constraint solvers, formal proof checkers) is probably permanent, not temporary.

Watch whether teams actually building code-generation tools start shipping hybrid systems that use traditional symbolic solvers for verification instead of betting on pure LLM reasoning, or whether they keep pretending the LLM can do it.
