The world is being quietly rearranged by people who write very long documents.


The title they went with: "InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement"
Noisy translates that to:

AI model generates realistic human movements in cluttered rooms without requiring expensive manual training data


Researchers built a system that creates realistic videos of people moving around and interacting with objects in real rooms, without needing thousands of hand-labeled examples. The model learns by combining existing human-object data with synthetic scene information, then uses physics rules to avoid collisions and unnatural movements.
Embodied AI systems (robots, simulations, video games) need to understand how humans move in real spaces, but annotating thousands of videos is expensive and slow. This approach eases the annotation bottleneck by synthesizing training data from cheaper sources. The practical promise: animation studios and robotics teams could generate realistic human movements for new environments without shooting and labeling custom footage.
Watch whether game engines and robotics simulators adopt this model in the next 12–18 months, or whether it remains confined to research settings because the quality gap from real video is still too visible.
