The world is being quietly rearranged by people who write very long documents.


The title they went with: "CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks". Noisy translates that to:

Researchers build AI defense that learns from repeated attacks — 79% success rate blocking them


A team created a system where multiple AI agents work together to defend language models against attacks that get smarter across multiple rounds of interaction. Instead of blocking attacks the same way each time, the system remembers what happened before and adapts its defenses, reducing successful attacks by nearly 80% compared to existing defenses.
Language models are now being deployed in real systems where adversaries can attack them repeatedly, refining their approach each round — like a hacker probing a bank vault multiple times. Previous defenses were static, designed to block one attack the same way every time, which doesn't work when attackers learn and shift tactics. This research shows that defenses can themselves learn and adapt across rounds, which means the cost of attacking a deployed language model just went up significantly.
Watch whether this defense gets tested against adversaries who know the defense exists and design attacks specifically to evade it — the real test is not whether it blocks known attacks, but whether it holds against attackers who see it coming.
