The world is being quietly rearranged by people who write very long documents.


The title they went with: "TimeSeek: Temporal Reliability of Agentic Forecasters." Noisy translates that to:

AI forecasting works best when markets are uncertain, then gets worse as consensus builds


Researchers tested 10 AI language models on real prediction markets and found they perform well early in a market's lifecycle when outcomes are uncertain, but deteriorate as the market approaches resolution and consensus hardens. This means AI forecasts are most useful when humans are genuinely undecided, but become less reliable as information accumulates and the crowd converges on an answer.
Most AI benchmarks test models on static datasets or artificial tasks where nothing changes. This study instead tests AI forecasters against real markets where the information environment shifts over time, so it measures something close to how these systems would perform if deployed. The results are blunt: AI is good at weighing signals when they conflict, but worse at knowing when to defer to consensus. That's the opposite of what you'd want in a forecasting tool meant to augment human judgment in high-stakes decisions.
Watch whether prediction market platforms actually start using these multi-model ensembles in live markets, and whether they reduce errors compared to raw market prices or individual human forecasters.
