Developers of AI models for high-stakes jobs now have a way to measure how a model's confidence degrades over long conversations
What happened
A new paper finds that AI models become less confident the longer a conversation runs, especially when the user tries to persuade them of something. That means current reliability checks for critical uses, such as finance and healthcare, do not reflect how models behave in real, multi-turn conversations.
Why it matters
Companies building AI for sensitive tasks, like medical advice or financial guidance, have assumed their models stay reliable throughout a conversation. The paper shows that a model's confidence can drop as the conversation goes on, which makes its later answers less trustworthy. Developers now need systems that re-evaluate how sure the model is throughout the conversation, not just at the start.
The signal
Watch for AI developers in finance and healthcare to start using the paper's metrics and methods in their product development and safety checks.