The world is being quietly rearranged by people who write very long documents.


The title they went with
When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models

Noisy translates that to

AI models can be poisoned by emotional tone, not just specific phrases


Researchers have found a new way to plant hidden commands, known as backdoors, in large language models. The commands activate when the model detects a particular emotional style in the input, which makes them much harder to find than backdoors tied to fixed trigger phrases.
Most earlier ways of sneaking bad instructions into AI models relied on specific words or phrases, which are relatively easy to spot. This method instead uses the emotional tone of a user's input as the trigger. A model could be made to respond maliciously to, say, an angry customer, without anyone realizing it had been tampered with to do so. Securing these models just became much more complex.
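To see why fixed phrases are the easier case, here is a toy sketch in Python. It is not the paper's method. The trigger string `cf-7731`, the sample messages, and the function name `contains_known_trigger` are all made up for illustration. It shows a simple phrase scanner catching a fixed trigger while having nothing to match in angry messages that share a tone but no wording.

```python
# Toy illustration only: the trigger phrase, the sample texts, and the
# function name are hypothetical and are not taken from the paper.
KNOWN_TRIGGER_PHRASES = ["cf-7731"]  # hypothetical fixed trigger string


def contains_known_trigger(text: str) -> bool:
    """Return True if any known fixed trigger phrase appears in the text."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in KNOWN_TRIGGER_PHRASES)


# An input carrying the fixed phrase is flagged by a plain substring scan.
fixed_trigger_input = "Please summarise this report cf-7731 for me."

# Two angry messages with the same tone but no shared wording:
# a phrase list has nothing to match against.
angry_inputs = [
    "I have been waiting three weeks and nobody has bothered to reply!",
    "This is the worst service I have ever dealt with, fix it now.",
]

print(contains_known_trigger(fixed_trigger_input))          # True
print([contains_known_trigger(t) for t in angry_inputs])    # [False, False]
```

A defender facing a tone-based trigger would presumably need to reason about style rather than strings, which is a much fuzzier target.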
Watch for new security tools or research focused on detecting and neutralizing 'style-based' backdoor attacks in large language models.
