The world is being quietly rearranged by people who write very long documents.


The title they went with: When Identities Collapse: A Stress-Test Benchmark for Multi-Subject Personalization. Noisy translates that to:

AI image generators fail when asked to draw more than a few people together


Current AI text-to-image models can reliably generate recognizable versions of 2-4 specific people in one image, but they fail catastrophically, producing blurry clones, when asked to generate 6-10 people or to show those people interacting physically. This matters because it exposes a fundamental architectural limit: these models don't actually understand how to keep different identities separate at scale; they just get lucky on simple cases.
This is a stress test showing that current generative AI has hit a hard scaling wall on a seemingly simple task: keeping identities distinct within a single image. The paper also shows that the standard metrics used to evaluate these models are misleading, because they score identity-collapsed images as correct. That means the industry has been overly optimistic about progress it hasn't actually made. It's an honesty check on what these systems can and cannot do.

If you insist
Read the original →