The world is being quietly rearranged by people who write very long documents.


The title they went with ARM: Advantage Reward Modeling for Long-Horizon Manipulation Noisy translates that to

Robot learns to fold towels by learning what progress looks like, not what the end goal is


Researchers built a system that teaches robots long-term tasks by having humans label whether each step moved forward, backward, or stayed stuck — instead of requiring humans to label every single correct action. This cuts the human annotation work dramatically while letting robots learn from messy, incomplete demonstrations.
For years, teaching robots to do complex multi-step tasks required either massive amounts of human labeling (expensive and fragile) or perfect demonstrations (rare in practice). This approach sidesteps both problems by asking humans to judge only the direction of progress, which is cognitively easy and consistent across different people. The practical shift: robots can now learn from real, imperfect data people actually have, not pristine synthetic data someone has to build.
Watch whether other robot learning teams adopt this labeling approach and whether it scales beyond the towel-folding benchmark to real warehouse or manufacturing tasks where incomplete or failed attempts are the norm.

If you insist
Read the original →