

The title they went with: "Backdoor Attacks on Decentralised Post-Training". Noisy translates that to:

One bad actor can trick an AI model trained in pieces


Researchers have found a new way to tamper with large AI models. The attack works even when the model is trained by many different groups, each handling only a small part of it. A single bad actor among them can secretly plant a "backdoor": hidden behavior that makes the model give wrong answers whenever a specific trigger word appears, and that survives even after safety checks.
Training large AI models takes more computing power than many organizations have on their own, so some models are trained with decentralized methods, where different teams or organizations handle different parts of the training. This paper shows that even an attacker who controls only a small, intermediate step in this process can still corrupt the final model. In other words, the security of these systems is only as strong as their weakest link, and that link may be much smaller than expected.
Watch for new security standards or architectural changes in decentralized AI training pipelines that specifically address vulnerabilities in these intermediate steps.
