Medical AI for eye disease needs better datasets — this review maps where the gaps are

What happened

Diabetic retinopathy detection using AI has hit a hard ceiling: the datasets researchers use to train and test these systems are fragmented, regionally narrow, and poorly labeled. As a result, an AI trained on one hospital's images may fail when it meets a different patient population, different camera equipment, or a different annotation style, which makes these tools unreliable for actual clinical use.

Why it matters

For the past five years, researchers have built diabetic retinopathy AI systems and released them as production-ready. But these systems were trained on datasets that are geographically isolated, inconsistently labeled, and too small to cover real-world variation. This review catalogs the problem in detail and argues that, without standardized, large, multiethnic datasets with consistent lesion-level annotation, any AI system trained on these fragmented collections is essentially untested on the populations it will actually see. That's the gap between 'works in the lab' and 'works in the clinic.'

The signal

Track whether the next round of FDA-cleared diabetic retinopathy AI systems disclose their training data sources and validation performance across different ethnic groups and camera types — that would signal whether the field is actually responding to this standardization problem.
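
For readers who want to see what "validation performance across different ethnic groups and camera types" means in practice, here is a minimal sketch of a stratified evaluation. It is not taken from the review: the function name stratified_metrics, the record fields (label, pred, camera), and the toy records are all hypothetical, chosen only to illustrate the idea of reporting sensitivity and specificity per subgroup rather than as a single pooled number.

```python
from collections import defaultdict


def stratified_metrics(records, group_key):
    """Compute sensitivity and specificity separately for each subgroup.

    Each record is a dict with:
      'label' - ground truth (1 = disease present, 0 = absent)
      'pred'  - model output (1 or 0)
      plus a metadata field named by `group_key` (hypothetical, e.g. 'camera').
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for r in records:
        c = counts[r[group_key]]
        if r["label"] == 1:
            c["tp" if r["pred"] == 1 else "fn"] += 1
        else:
            c["tn" if r["pred"] == 0 else "fp"] += 1

    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            "n": positives + negatives,
            # None when a subgroup has no positive (or negative) cases at all
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return results


# Made-up toy records, for illustration only (not real clinical data).
records = [
    {"label": 1, "pred": 1, "camera": "camera_A"},
    {"label": 1, "pred": 1, "camera": "camera_A"},
    {"label": 0, "pred": 0, "camera": "camera_A"},
    {"label": 1, "pred": 0, "camera": "camera_B"},
    {"label": 1, "pred": 1, "camera": "camera_B"},
    {"label": 0, "pred": 1, "camera": "camera_B"},
    {"label": 0, "pred": 0, "camera": "camera_B"},
]

by_camera = stratified_metrics(records, "camera")
for group, m in sorted(by_camera.items()):
    print(group, m)
```

A pooled score over all seven toy records would hide the fact that the second camera group performs worse than the first. A per-subgroup table like this one, with the sample size shown next to each figure, is the kind of disclosure the paragraph above suggests watching for.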