The world is being quietly rearranged by people who write very long documents.


The title they went with: "When Chain-of-Thought Backfires: Evaluating Prompt Sensitivity in Medical Language Models". Noisy translates that to:

Medical AI performs worse when it explains its reasoning


Common AI techniques used to make models 'think' step by step actually make medical AI models less accurate. This means medical AI developers must use simpler methods or risk building tools that give wrong answers more often.
Many AI developers assumed that making models explain their reasoning would always improve performance. This paper shows that for medical AI, the opposite is true. It means medical AI tools built with these 'thinking' steps could be less reliable than simpler versions, potentially leading to more errors in clinical settings.
Watch for medical AI developers to stop using 'Chain-of-Thought' in their models and instead highlight simpler, more robust methods in their performance claims.
