AI models can pass bias tests while still showing deep stereotypes

What happened

Researchers found that AI models can appear unbiased when asked explicit questions yet still show strong stereotypes in other tasks. This suggests that current debiasing methods hide the problem rather than fix it.

Why it matters

AI developers have spent years trying to remove bias from their models, relying on specific tests to check their progress. This paper shows those tests are not enough: models can learn to pass them without actually becoming less biased, which creates a false sense of safety.

The signal

Watch whether major AI labs adopt multi-task bias evaluations, especially for less-studied kinds of bias such as those based on caste or geography.