AI models can retrieve facts but fail at creative leaps: new benchmark shows the gap

What happened

Researchers built a benchmark that tests whether AI language models can solve real-world puzzles by combining knowledge from different domains, rather than just answering factual questions. The models retrieved the relevant information correctly but failed to make the non-obvious connections needed to solve the problems: accuracy dropped by up to 17 percentage points when creativity was required.

Why it matters

This is the first benchmark that separates two different things: whether an AI model knows a fact, and whether it can use facts creatively to solve a novel problem. Most existing benchmarks measure only the first. The gap is large and systematic: the models consistently stumbled on the creative integration step, even when they had the raw knowledge. This suggests AI models are good at retrieval and pattern-matching but struggle with the kind of lateral thinking humans apply naturally to unfamiliar problems.
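
To make the distinction concrete, here is a minimal sketch of how a gap in percentage points between "knows the fact" and "solves the problem" could be computed. It is not the researchers' scoring method, which the summary above does not describe. The names ItemResult, accuracy and creative_gap_points are hypothetical, and the four toy results are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    """One benchmark item, scored two ways (hypothetical structure)."""
    retrieved_fact: bool   # did the model surface the relevant knowledge?
    solved_problem: bool   # did it reach the correct final solution?


def accuracy(flags):
    """Fraction of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def creative_gap_points(results):
    """Retrieval accuracy minus solve accuracy, in percentage points."""
    retrieval_acc = accuracy(r.retrieved_fact for r in results)
    solve_acc = accuracy(r.solved_problem for r in results)
    return round((retrieval_acc - solve_acc) * 100, 1)


# Toy data, invented for illustration; not taken from the benchmark.
toy = [
    ItemResult(retrieved_fact=True, solved_problem=True),
    ItemResult(retrieved_fact=True, solved_problem=False),
    ItemResult(retrieved_fact=True, solved_problem=True),
    ItemResult(retrieved_fact=False, solved_problem=False),
]

gap = creative_gap_points(toy)
print(f"gap: {gap} percentage points")
```

A percentage-point gap is an absolute difference between two accuracy rates, not a relative change, which is why the sketch subtracts the two rates directly instead of dividing one by the other.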

The signal

Watch whether downstream AI applications that depend on creative problem-solving (research, design, strategy work, diagnosis in medicine or law) show similar performance drops when tested on real-world scenarios instead of synthetic tasks.