The world is being quietly rearranged by people who write very long documents.


The title they went with: "Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks". Noisy translates that to:

AI training method now assigns credit to individual words instead of rating whole responses


Researchers created a technique that tells an AI which specific tokens in its output were good or bad, rather than just scoring the entire answer. This means the training signal is granular enough to reward or penalize individual parts of a response instead of applying one score to the whole thing.
Until now, training language models on complex tasks meant giving them a single grade per output, like marking a 500-word essay with one number. That's noisy and slow. This method identifies which tokens (pieces of words) caused success or failure in a response, so for that 500-word essay the training signal becomes on the order of 500 times denser. The practical effect, if it holds up: training cycles get tighter feedback loops, models learn faster from fewer examples, and you can steer behavior at the token level instead of judging entire responses as a block. This is an incremental efficiency gain, not a capability leap. But in the race to train larger models on instruction-following, efficiency at this scale compounds.
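For the curious, here is a toy Python sketch of the difference between one grade per response and a reward per token. It is not the paper's algorithm: the function names, the +1/-1 scoring, and the idea that each rubric check arrives already mapped to a token span are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical rubric check: (start, end, passed), with `end` exclusive.
# Assumption: something upstream has already decided which tokens each check is about.
SpanCheck = Tuple[int, int, bool]


def response_level_rewards(num_tokens: int, score: float) -> List[float]:
    """One grade for the whole response: every token gets the same number."""
    return [score] * num_tokens


def token_level_rewards(num_tokens: int, checks: List[SpanCheck]) -> List[float]:
    """Spread each rubric verdict only over the tokens it was attributed to."""
    rewards = [0.0] * num_tokens
    for start, end, passed in checks:
        delta = 1.0 if passed else -1.0
        for i in range(start, min(end, num_tokens)):
            rewards[i] += delta
    return rewards


tokens = ["Dear", "team", ",", "the", "launch", "is", "delayed", "lol"]

# Illustrative checks: "opens with a greeting" passes on tokens 0-2,
# "keeps a formal tone" fails on token 7.
checks = [(0, 3, True), (7, 8, False)]

coarse = response_level_rewards(len(tokens), 0.5)
dense = token_level_rewards(len(tokens), checks)

for tok, c, d in zip(tokens, coarse, dense):
    print(f"{tok:>8}  response-level={c:+.1f}  token-level={d:+.1f}")
```

In the coarse version, "lol" gets the same credit as "Dear". In the dense version, only the tokens tied to a rubric verdict move, which is the kind of granularity the blurb describes.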
Check whether models trained with this method show faster convergence on instruction-following benchmarks with fewer training tokens, or whether the method requires so much extra computational overhead to track token-level attribution that the gains vanish in practice.
