AI models fail basic women's health questions; a new test shows how

What happened

Researchers built a new test for AI models that give medical advice on women's health. Even the best models fail more than a quarter of the time, and some of their errors are unsafe.

Why it matters

For years, it was hard to tell how well AI models handled sensitive medical topics like women's health. This new test gives developers and regulators a specific way to find where models make mistakes, including unsafe omissions and dosing errors. As a result, AI health tools can now be held to a higher, more specific standard.

The signal

Watch whether AI developers start using this benchmark to improve their models, and whether health regulators cite its findings in new guidance.