The world is being quietly rearranged by people who write very long documents.


The title they went with: "AIDABench: AI Data Analytics Benchmark". Noisy translates that to:

First real-world benchmark shows AI still struggles with complex data work


Researchers built a benchmark that tests AI systems on messy, real-world data analytics tasks (spreadsheets, databases, financial reports) that humans themselves find hard. Today's best AI models complete only about 59% of these tasks correctly, revealing a significant gap between lab performance and what businesses actually need.
For the first time, there is measured evidence of where AI systems struggle in practice, not just on simplified academic tests. Companies buying these tools now have concrete data on what the tools can and cannot do, instead of relying on vendor claims.
