The world is being quietly rearranged by people who write very long documents.


The title they went with: "Disposition Distillation at Small Scale: A Three-Arc Negative Result". Noisy translates that to:

Small AI models cannot be taught to be careful, only to pretend


Researchers tried to teach small AI models to be more careful and honest. They found that the models either failed to learn these traits or just mimicked them without actually improving.
Everyone wants AI to be more reliable and less prone to making things up. This paper shows that for smaller AI models, simply training them on 'good behavior' doesn't work. It suggests that making AI models genuinely cautious might require more fundamental changes than just adding a layer of politeness.
Watch for new research that tries different, more fundamental approaches to building caution into small AI models, rather than just training them on examples of careful behavior.
