AI trained to predict what happens next on a screen learns to automate tasks 16% better than before

What happened

Researchers found that teaching AI agents to predict future screen states works better than teaching them to copy human actions. This means AI that controls computers and websites can now be trained on cheap synthetic data instead of expensive human demonstrations, making it faster and cheaper to build automation software.

Why it matters

For years, the bottleneck in building AI that can control a computer has been collecting human demonstrations, which is expensive and slow and limits scale. This paper shows that if you train the AI to understand the physics of how interfaces respond to actions (forward modeling), it scales much better and works across different websites and applications. It's the difference between teaching a driver to follow roads by showing them hours of dashcam footage and teaching them to predict where the car will go if they turn the wheel.
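To make the forward-modeling idea concrete, here is a minimal toy sketch in Python. It is not the paper's method: the four-screen "interface", the action names, and the helpers `collect_synthetic`, `fit_forward_model`, and `plan` are all hypothetical, and a lookup table stands in for a neural network. The point is only that the training data comes from random clicking rather than human demonstrations, and that a model which predicts the next screen can then be searched for a route to a goal.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy interface: (screen, action) -> next screen.
TRANSITIONS = {
    ("home", "click_login"): "login_form",
    ("home", "click_help"): "help_page",
    ("login_form", "submit"): "dashboard",
    ("login_form", "click_back"): "home",
    ("help_page", "click_back"): "home",
}
SCREENS = ["home", "login_form", "help_page", "dashboard"]
ACTIONS = ["click_login", "click_help", "submit", "click_back"]


def step(screen, action):
    """Toy environment: an action with no effect leaves the screen unchanged."""
    return TRANSITIONS.get((screen, action), screen)


def collect_synthetic(num_samples, seed=0):
    """Generate (screen, action, next_screen) triples by random interaction.

    No human demonstrations are involved; this is the "cheap synthetic data".
    """
    rng = random.Random(seed)
    data = []
    for _ in range(num_samples):
        screen = rng.choice(SCREENS)
        action = rng.choice(ACTIONS)
        data.append((screen, action, step(screen, action)))
    return data


def fit_forward_model(data):
    """Tabular forward model: most frequently observed next screen per (screen, action)."""
    counts = defaultdict(Counter)
    for screen, action, next_screen in data:
        counts[(screen, action)][next_screen] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def plan(model, start, goal, max_depth=5):
    """Breadth-first search over the model's predictions for a route to the goal."""
    if start == goal:
        return []
    frontier = [(start, [])]
    seen = {start}
    for _ in range(max_depth):
        next_frontier = []
        for screen, path in frontier:
            for action in ACTIONS:
                predicted = model.get((screen, action))
                if predicted is None or predicted in seen:
                    continue
                if predicted == goal:
                    return path + [action]
                seen.add(predicted)
                next_frontier.append((predicted, path + [action]))
        frontier = next_frontier
    return None


if __name__ == "__main__":
    model = fit_forward_model(collect_synthetic(500))
    print(plan(model, "home", "dashboard"))
```

A copy-the-human approach would instead need recorded examples of someone choosing `click_login` and then `submit`. In this sketch the agent is never shown that sequence; it finds it by searching its own predictions of what each action does.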

The signal

Watch whether real software companies start shipping web automation tools trained this way in the next 18 months, and whether those tools work reliably on websites they haven't seen before.