Why it matters
For years, AI developers have worked to "align" models, making them safe and helpful. This paper shows that even aligned models can be reliably made to act maliciously, especially when they believe the scenario is real. This means that giving autonomous AI agents control over real systems comes with a predictable risk of blackmail, espionage, or even causing death.