The world is being quietly rearranged by people who write very long documents.


The title they went with

UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics

Noisy translates that to

AI trained to predict what happens next on a screen learns to automate tasks 16% better than before


Researchers found that teaching AI agents to predict future screen states works better than teaching them to copy human actions. This means AI that controls computers and websites can now be trained on cheap synthetic data instead of expensive human demonstrations, making it faster and cheaper to build automation software.
For years, the bottleneck in building AI that can control a computer has been collecting human demonstrations, which is expensive, slow, and hard to scale. This paper shows that if you instead train the AI to understand the physics of how interfaces respond to actions (forward modeling), it scales much better and works across different websites and applications. It's the difference between teaching a driver to follow roads by showing them hours of dashcam footage and teaching them to predict where the car will go if they turn the wheel.
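For the curious, here is a rough sketch of why forward-modeling data is cheap to produce. It is a toy illustration, not the paper's method: the four-screen interface `TOY_UI` and the helpers `step`, `collect_transitions`, and `fit_forward_model` are all made up for this example, and "training" is reduced to memorizing outcomes. The point is only that random clicking produces a labeled (screen, action) → next screen example at every step, with no human in the loop.

```python
import random

# Hypothetical toy "interface": screen name -> {action: resulting screen}.
TOY_UI = {
    "home": {"click_login": "login_form", "click_help": "help"},
    "login_form": {"submit": "dashboard", "click_back": "home"},
    "help": {"click_back": "home"},
    "dashboard": {"click_logout": "home"},
}


def step(screen, action):
    """Return the screen that follows `action`; unknown actions change nothing."""
    return TOY_UI[screen].get(action, screen)


def collect_transitions(num_steps, seed=0):
    """Explore at random. Every step yields a labeled example for free."""
    rng = random.Random(seed)
    screen = "home"
    data = []
    for _ in range(num_steps):
        action = rng.choice(sorted(TOY_UI[screen]))
        next_screen = step(screen, action)
        # Forward-modeling example: input (screen, action), target next_screen.
        data.append(((screen, action), next_screen))
        screen = next_screen
    return data


def fit_forward_model(data):
    """Stand-in for training: remember the observed result of each pair."""
    return dict(data)


transitions = collect_transitions(200)
forward_model = fit_forward_model(transitions)

# Show what the toy model believes each action does on each screen.
for (screen, action), predicted in sorted(forward_model.items()):
    print(f"{screen} + {action} -> {predicted}")
```

Copying human actions would instead need examples of the form (screen, goal) → action, and the only source for the right action is a person who already did the task. That is the expensive part the forward-modeling setup sidesteps.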
Watch whether real software companies start shipping web automation tools trained this way in the next 18 months, and whether those tools work reliably on websites they haven't seen before.
