The world is being quietly rearranged by people who write very long documents.


The title they went with:

Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

Noisy translates that to:

The best fine-tuning method for AI language models depends on model strength


Researchers found that the standard way to train language models after pre-training, minimizing a loss function called negative log likelihood, performs worse than alternative objectives when the model is already fairly capable. No single objective is universally best: which one works better depends on how strong the model is. Practitioners may therefore need to pick their training objective based on the specific model they're working with, rather than using the same approach for everything.
For years, language model training has used the same mathematical objective regardless of context. This research suggests the best choice actually depends on the model's existing capability level, which could help engineers squeeze more generalization out of expensive fine-tuning runs, but only if they match the method to the model.
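
For the curious, here is a minimal sketch of what "a different objective" can mean in practice. It compares the standard negative log likelihood with one simple probability-based alternative, 1 - p, where p is the probability the model assigns to the correct token. The 1 - p form and the function names are assumptions chosen for illustration; the summary above does not say which alternatives the researchers actually tested.

```python
import math

def nll_loss(p: float) -> float:
    """Standard objective: negative log likelihood of the correct token."""
    return -math.log(p)

def prob_loss(p: float) -> float:
    """Illustrative probability-based objective (an assumption, not the paper's stated method)."""
    return 1.0 - p

def nll_grad(p: float) -> float:
    """Derivative of -log(p) with respect to p; grows without bound as p approaches 0."""
    return -1.0 / p

def prob_grad(p: float) -> float:
    """Derivative of 1 - p with respect to p; constant for every p."""
    return -1.0

if __name__ == "__main__":
    # Compare how hard each objective pushes on confident vs. unconfident predictions.
    for p in (0.01, 0.5, 0.99):
        print(
            f"p={p:.2f}  nll={nll_loss(p):.3f} (grad {nll_grad(p):.1f})  "
            f"1-p={prob_loss(p):.3f} (grad {prob_grad(p):.1f})"
        )
```

The difference is in how each objective treats tokens the model considers unlikely: negative log likelihood pushes on them far harder than 1 - p does. Whether that extra pressure helps or hurts is exactly the kind of thing that, per the research, depends on how capable the model already is.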

If you insist
Read the original →