What happened
Researchers created a benchmark that tests AI systems on messy, real-world data analytics tasks (spreadsheets, databases, financial reports) that humans themselves find hard. The best AI models today complete only about 59% of these tasks correctly, revealing a significant gap between lab performance and what businesses actually need.
Why it matters
For the first time, we have measured evidence of where AI systems actually struggle in practice rather than on simplified academic tests. Companies buying these tools now have concrete data on what the tools can and cannot do, instead of having to rely on vendor claims.