First real-world benchmark shows AI still struggles with complex data work

What happened

Researchers created a benchmark that tests AI systems on messy, real-world data analytics tasks (spreadsheets, databases, financial reports) that humans themselves find hard. The best AI models today complete only about 59% of these tasks correctly, revealing a significant gap between lab performance and what businesses actually need.

Why it matters

For the first time, there is measured evidence of what AI systems struggle with in practice rather than on simplified academic tests. Companies buying these tools now have concrete data on what the tools can and cannot do, instead of relying on vendor claims.