The world is being quietly rearranged by people who write very long documents.


The title they went with This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA Noisy translates that to

Medical AI changes its mind if you ask the same question in a different tone


Large language models used for medical questions give different answers depending on how the question is phrased, even when given the same source material. This means patients asking for medical advice might get contradictory information from the same AI, just by changing a few words.
People are already using AI for health questions. This paper shows that the AI's answer can flip from "yes, it works" to "no, it doesn't" based on subtle wording. This makes it impossible to trust AI for critical medical advice without robust testing for phrasing sensitivity.
Watch for medical AI developers to publish new evaluation metrics that specifically test for phrasing sensitivity and consistency.

If you insist
Read the original →