AI forecasting works best when markets are uncertain — then gets worse as consensus builds

What happened

Researchers tested 10 AI language models on real prediction markets and found they perform well early in a market's lifecycle when outcomes are uncertain, but deteriorate as the market approaches resolution and consensus hardens. This means AI forecasts are most useful when humans are genuinely undecided, but become less reliable as information accumulates and the crowd converges on an answer.

Why it matters

Most AI benchmarks test models on static datasets or artificial tasks where nothing changes. This study tests AI forecasters against real markets where the information environment shifts, so it measures something close to how these systems would perform if deployed. The results are blunt: AI is good at weighing competing signals when they conflict, but worse at knowing when to defer to consensus. That's the opposite of what you'd want in a forecasting tool meant to augment human judgment in high-stakes decisions.

The signal

Watch whether prediction market platforms start using multi-model AI ensembles in live markets, and whether those ensembles reduce errors compared with raw market prices or individual human forecasters.
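
One way to check the "reduce errors" question is to score ensemble forecasts and market prices against resolved outcomes. The sketch below uses the Brier score (mean squared error between a probability and the 0/1 outcome, lower is better) and a simple unweighted average as the ensemble. Both choices are assumptions for illustration, not the method used in the research, and every number is made up.

```python
from statistics import mean


def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return mean((p - o) ** 2 for p, o in zip(forecasts, outcomes))


def ensemble_forecast(model_probs):
    """Unweighted average of per-model probabilities for one question."""
    return mean(model_probs)


# Hypothetical data: three questions, three models each.
# These are illustrative values, not results from the study.
model_probs_per_question = [
    [0.60, 0.70, 0.80],
    [0.20, 0.30, 0.10],
    [0.50, 0.40, 0.60],
]
market_prices = [0.90, 0.10, 0.30]  # hypothetical market-implied probabilities
outcomes = [1, 0, 0]                # hypothetical resolutions (1 = yes, 0 = no)

ensemble_probs = [ensemble_forecast(p) for p in model_probs_per_question]
ensemble_brier = brier_score(ensemble_probs, outcomes)
market_brier = brier_score(market_prices, outcomes)

print(f"ensemble Brier: {ensemble_brier:.4f}")
print(f"market Brier:   {market_brier:.4f}")
```

Running the same comparison separately on early and late snapshots of each market would show whether any ensemble advantage shrinks as consensus builds.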