The world is being quietly rearranged by people who write very long documents.


The title they went with
ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs

Noisy translates that to

AI systems fail at controlling two arms together — a problem for robot learning


Researchers tested 30 multimodal large language models (MLLMs) on a simple task: controlling a two-armed robot to pick up and move objects. The models could plan what to do but could not execute the physical movements. That gap between reasoning and doing widens when the task requires real coordination.
This is the first time someone has measured a specific failure mode in AI systems that attempt to control physical robots: the models can think through a plan but can't convert that plan into precise, synchronized motor commands. The practical problem is that today's leading AI systems excel at language and logic but struggle with the continuous, high-dimensional control signals that bodies require, which means companies building embodied AI assistants (robots that do physical tasks) will hit this wall immediately. The bottleneck isn't intelligence. It's the translation from thought to motion.
The open question is whether the gap between planning and execution narrows as models get larger or persists regardless of scale. If it persists, the problem is likely structural rather than a matter of capacity.
