The world is being quietly rearranged by people who write very long documents.


The title they went with: "Analysis of Optimality of Large Language Models on Planning Problems". Noisy translates that to:

Large language models solve planning puzzles better than humans thought — by doing math, not mimicking humans


Researchers tested whether advanced AI language models can solve block-stacking puzzles optimally, not just successfully. It turns out they can: they find near-perfect solutions even in complex cases where traditional planning algorithms fail, apparently by executing hidden mathematical reasoning rather than semantic pattern-matching.
This is about what's actually happening inside language models when they solve problems. For years, the assumption was that these models just pattern-match from their training data — if they've seen similar puzzles solved, they repeat the strategy. But this paper shows something weirder: they're doing actual algorithmic work, solving novel problems via what looks like internal simulation or geometric reasoning. That's a capability question, not a performance question. It suggests these models have a latent problem-solving ability that wasn't obvious from just measuring success rates.
Test whether this capability transfers to real planning domains — scheduling, logistics, supply chains — where optimal solutions matter more than just finding any solution, and where the cost difference between good and mediocre plans is measurable in dollars.
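To make the "optimal, not just successful" distinction concrete, here is a minimal sketch of a toy block-stacking puzzle. It is not the paper's benchmark or method. The state encoding, the one-move-costs-one rule, and the helper names (canon, successors, optimal_length, naive_length) are all assumptions made for illustration. Breadth-first search gives the shortest possible plan, and the naive strategy (flatten everything onto the table, then rebuild) gives a plan that works but wastes moves.

```python
from collections import deque

def canon(stacks):
    """Order-independent, hashable form of a state (empty stacks dropped)."""
    return tuple(sorted(tuple(s) for s in stacks if s))

def successors(state):
    """Yield every state reachable by moving one top block."""
    for i, src in enumerate(state):
        block = src[-1]
        remainder = list(src[:-1])
        rest = [list(s) for j, s in enumerate(state) if j != i]
        # Put the block on the table (pointless if it is already alone there).
        if remainder:
            yield canon(rest + [remainder, [block]])
        # Put the block on top of any other stack.
        for k in range(len(rest)):
            moved = [list(s) for s in rest]
            moved[k].append(block)
            yield canon(moved + [remainder])

def optimal_length(start, goal):
    """Shortest number of moves from start to goal, found by breadth-first search."""
    start, goal = canon(start), canon(goal)
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == goal:
            return depth
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None  # goal unreachable (e.g. different set of blocks)

def naive_length(start, goal):
    """Moves used by a valid but wasteful plan: unstack everything, then rebuild."""
    unstack = sum(len(s) - 1 for s in start if s)
    restack = sum(len(s) - 1 for s in goal if s)
    return unstack + restack

# Stacks are listed bottom to top. D only needs to go on top of C.
start = [["A", "B", "C"], ["D"]]
goal = [["A", "B", "C", "D"]]
print(optimal_length(start, goal))  # 1
print(naive_length(start, goal))    # 5
```

Both plans solve the puzzle, but one takes a single move and the other takes five. The gap between those two numbers is the thing an optimality test measures and a plain success rate hides.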

If you insist
Read the original →