Your AI might be scoring high on its own tests but still losing you money

What happened

A new study finds that common ways of evaluating conversational AI often miss what actually drives revenue. Companies using AI for sales or customer service may be optimizing for the wrong things, even when the AI performs well on internal checks.

Why it matters

Companies have been building conversational AI and then judging its quality with multi-part scorecards. The study shows that many of those scorecard items do not matter for actual sales: an AI agent can follow a sales script perfectly and still fail to build the trust needed to close a deal. The takeaway is that companies should stop relying on internal AI scores alone and start measuring what customers actually do.
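
For teams that want to try this on their own logs, here is a minimal sketch of one way to check a scorecard against outcomes. It is not the study's method. The records, the item names (followed_script, built_trust), and the conversion_lift helper are all hypothetical and made up for illustration. The idea is simply to compare the conversion rate of conversations that passed a scorecard item with the rate of those that failed it.

```python
from collections import defaultdict

# Hypothetical, made-up records for illustration only (not data from the study).
# Each conversation has pass/fail scorecard items and an observed sales outcome.
conversations = [
    {"scores": {"followed_script": 1, "built_trust": 1}, "converted": True},
    {"scores": {"followed_script": 1, "built_trust": 0}, "converted": False},
    {"scores": {"followed_script": 1, "built_trust": 1}, "converted": True},
    {"scores": {"followed_script": 0, "built_trust": 0}, "converted": False},
    {"scores": {"followed_script": 1, "built_trust": 0}, "converted": False},
    {"scores": {"followed_script": 0, "built_trust": 1}, "converted": True},
]


def conversion_lift(records):
    """Per scorecard item: conversion rate when passed minus rate when failed.

    Returns None for an item if either group (passed or failed) is empty.
    """
    # item -> {1: [converted, total], 0: [converted, total]}
    tallies = defaultdict(lambda: {1: [0, 0], 0: [0, 0]})
    for rec in records:
        for item, passed in rec["scores"].items():
            bucket = tallies[item][1 if passed else 0]
            bucket[0] += int(rec["converted"])
            bucket[1] += 1

    lift = {}
    for item, groups in tallies.items():
        won_pass, n_pass = groups[1]
        won_fail, n_fail = groups[0]
        if n_pass == 0 or n_fail == 0:
            lift[item] = None  # cannot compare without both groups
        else:
            lift[item] = won_pass / n_pass - won_fail / n_fail
    return lift


lift = conversion_lift(conversations)
for item, value in lift.items():
    print(item, value)
```

In this toy data, passing followed_script makes no difference to conversion, while built_trust separates the converted conversations from the rest. A real analysis would need far more conversations and a check for confounding factors before dropping or reweighting any scorecard item.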

The signal

Watch for companies to start building their AI evaluation systems around real sales data, rather than just internal quality scores.