What happened
Researchers created a benchmark to measure whether multimodal AI models (like CLIP) can actually forget sensitive information they've learned, and whether they forget it cleanly, without erasing unrelated knowledge. Right now, existing methods either fail to forget the sensitive information or accidentally erase too much along the way. The benchmark gives the field a way to measure both failures precisely.
Why it matters
As multimodal AI models become embedded in real products, the ability to remove learned associations (like links between demographics and stereotypes) matters legally and practically. But there was no standard way to test whether deletion actually worked or what side effects it caused. This benchmark makes both measurable.