Large language models solve planning puzzles better than expected, apparently by doing math rather than mimicking humans

What happened

Researchers tested whether advanced AI language models can solve block-stacking puzzles optimally, not just successfully. The reported answer is yes: the models find near-optimal solutions even in complex cases where traditional planning algorithms fail, apparently by carrying out hidden mathematical reasoning rather than semantic pattern-matching.
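To make the difference between a successful plan and an optimal one concrete, here is a minimal Python sketch. It is not the researchers' evaluation method. It assumes that "optimal" means "fewest moves", that the table has unlimited room, and that a move takes the top block of one stack and places it on the table or on another stack. The function names (`normalize`, `successors`, `optimal_length`, `apply_plan`) are made up for illustration. Breadth-first search is used as the reference because it finds a shortest plan by construction, though it is only practical for a handful of blocks.

```python
from collections import deque


def normalize(stacks):
    """Canonical state: a set of stacks (bottom to top), ignoring table position."""
    return frozenset(tuple(s) for s in stacks if s)


def successors(state):
    """Yield every state reachable by moving one top block (assumed move rules)."""
    stacks = list(state)
    for i, src in enumerate(stacks):
        block, rest = src[-1], src[:-1]
        others = stacks[:i] + stacks[i + 1:]
        # Move the block to the table, unless it is already sitting on it.
        if rest:
            yield normalize(others + [rest, (block,)])
        # Move the block onto the top of any other stack.
        for j, dst in enumerate(others):
            remaining = others[:j] + others[j + 1:]
            yield normalize(remaining + [rest, dst + (block,)])


def optimal_length(start, goal):
    """Fewest moves from start to goal via breadth-first search, or None."""
    start, goal = normalize(start), normalize(goal)
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == goal:
            return depth
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


def apply_plan(start, plan):
    """Apply (block, target) moves; target is a block name or 'table'."""
    stacks = [list(s) for s in start if s]
    for block, target in plan:
        src = next((s for s in stacks if s[-1] == block), None)
        if src is None:
            raise ValueError(f"{block} is not on top of any stack")
        if target == "table":
            src.pop()
            stacks.append([block])
        else:
            dst = next((s for s in stacks if s[-1] == target), None)
            if dst is None or dst is src:
                raise ValueError(f"cannot place {block} on {target}")
            src.pop()
            dst.append(block)
        stacks = [s for s in stacks if s]
    return normalize(stacks)


# Hypothetical instance: reverse a three-block tower (A at the bottom, C on top).
start = [("A", "B", "C")]
goal = [("C", "B", "A")]

# A plan that reaches the goal but wastes a move by putting B on the table first.
detour = [("C", "table"), ("B", "table"), ("B", "C"), ("A", "B")]

best = optimal_length(start, goal)
succeeds = apply_plan(start, detour) == normalize(goal)
print(f"plan succeeds: {succeeds}, moves: {len(detour)}, optimal: {best}")
```

Here the detour plan counts as a success under a pass/fail metric, yet it is one move longer than the three-move optimum. An optimality test measures that gap, which a success rate hides.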

Why it matters

The finding bears on what actually happens inside language models when they solve problems. For years, the working assumption was that these models pattern-match from their training data: if they have seen similar puzzles solved, they repeat the strategy. This paper points to something stranger. The models appear to do real algorithmic work, solving novel problems through what looks like internal simulation or geometric reasoning. That makes this a question about capability, not just performance: the models may have a latent problem-solving ability that success rates alone did not reveal.

The signal

Test whether this capability transfers to real planning domains — scheduling, logistics, supply chains — where optimal solutions matter more than just finding any solution, and where the cost difference between good and mediocre plans is measurable in dollars.