AI still can't automate how software developers estimate work: a RAG system performs no better than basic baselines

What happened

Researchers tested whether a retrieval-augmented generation (RAG) system could replace the manual process of estimating software task effort in story points. The system performed no better than simpler existing methods, and the results showed no meaningful improvement across the projects or embedding models tested.

Why it matters

This is a narrow technical failure, but it suggests that even RAG-based systems struggle with a task that looks automatable and may not be: converting vague task descriptions into effort estimates appears to require judgment that depends on team context rather than on information retrieval. The paper matters because it is honest about the gap between what looks like a reasonable automation target and what can actually be automated. The implicit signal is that estimation meetings, however tedious, may be capturing something real that documents alone don't contain.

The signal

Watch whether organizations stop trying to automate story-point estimation, or whether someone figures out what the RAG approach was missing. Which of those outcomes plays out will indicate whether this is an engineering tuning problem or a structural limitation.