Asking an AI to think longer doesn't help — the first answer is usually the best one

What happened

Researchers tested large reasoning models and found that letting them explore multiple solution paths introduces more errors rather than producing better answers. For these systems, the standard assumption about AI reasoning, that more computation time means better results, runs backwards.

Why it matters

For the past year, the AI industry has assumed that letting reasoning models spend more compute exploring alternatives would improve accuracy, much as thinking longer about a problem usually helps humans. This paper shows the opposite: extra reasoning paths accumulate errors in a specific structural pattern. The practical implication is immediate: companies paying for extended inference to improve accuracy may be wasting that money. Rather than exploring alternatives, the approach the researchers propose suppresses error growth in the first solution path, cutting compute by 37 to 70 percent while improving accuracy by up to 19 percent.

The signal

Watch whether major AI labs adopt similar 'first solution only' approaches in their production reasoning models or keep betting on test-time scaling, and, if they keep betting, whether accuracy actually improves or stalls.