The world is being quietly rearranged by people who write very long documents.


The title they went with: Evaluating Developmental Cognition Capabilities of LLMs. Noisy translates that to:

Larger AI models consistently produce 'higher-stage' thinking on a new test


Researchers built a new text-based test to measure how sophisticated an AI's thinking appears, based on a theory of human development. It turns out larger AI models consistently produce answers that score higher on this "developmental" scale.
Assessing how people interpret reality, or how an AI might adapt to that, used to require long interviews conducted by experts. This paper introduces a short text test that can do it quickly. AI developers can now evaluate how "grown up" their models' thinking appears, and could potentially design AI to match a user's perceived level of understanding.
Watch whether AI developers start using this new Developmental Sentence Completion Test (DSCT) to evaluate how their models interact with users.
