Medical AI papers mostly describe what LLMs could do, not what they actually do in hospitals

What happened

This is a review paper surveying how large language models are being applied across medical specialties like cancer care, dermatology, and mental health. It describes potential uses and existing limitations, but presents no deployment data, no measurement of actual patient outcomes, and no evidence that any of these applications are working in real clinical settings.

Why it matters

This paper is a symptom of a wider pattern in medical AI research: capabilities that look promising in controlled settings but never turn into working tools that hospitals actually use. The review canvasses six medical domains and finds potential in every one, but potential is not evidence. What matters is whether any of these applications shortens time to diagnosis, improves accuracy on real patients, or saves staff labor, and this paper reports none of those numbers. It is a catalog of what researchers think LLMs could do, not proof that they do it.

The signal

Watch whether any of the specific clinical applications cited in this review (diagnostic support in cancer care, dermatology screening, mental health assessment) show up in hospital procurement data or published deployment studies within the next 18 months. If they don't, the review is a record of research aspirations, not of emerging medical practice.