What happened
Researchers found that compressing large language models so they run faster on phones and low-cost servers can undo deliberate forgetting: the compressed model often "remembers" things it was told to forget. They developed a method that uses trainable adapters to keep the forgetting intact through compression, solving a practical problem that arises when you need both privacy (the model forgets) and efficiency (the model runs cheaply).
Why it matters
As AI companies deploy models on edge devices and low-cost inference servers, they need both speed and legal compliance: models must actually forget personal data on request. The finding shows that, without careful engineering, compression and data deletion work against each other, so companies deploying AI at scale have to solve both problems together.