What happened
Researchers tested whether large language models can genuinely learn to predict molecular properties from examples shown in conversation, or whether they're simply reciting values they've already seen in their training data. The finding matters because if these models are just memorizing rather than reasoning, they can't be trusted for discovering new molecules or materials — which is what scientists hoped to use them for.
Why it matters
This exposes a fundamental honesty problem: if AI systems perform well on molecular prediction tasks only because they've memorized the answers during training rather than learned a generalizable skill, then their apparent capabilities vanish when you give them genuinely novel problems. That distinction determines whether these tools can accelerate drug discovery or just give confident-sounding wrong answers.