Larger AI models consistently produce 'higher-stage' thinking on a new test

What happened

Researchers built a new text-based test, grounded in a theory of human development, to measure how sophisticated an AI's thinking appears. Larger AI models consistently produce answers that score higher on this "developmental" scale.

Why it matters

Measuring how people interpret reality, or how an AI might adapt to it, used to require long interviews conducted by experts. This paper introduces a short text test that can do it quickly. AI developers can now evaluate how "grown up" their models' thinking appears, and potentially design AI to match a user's perceived level of understanding.

The signal

Watch whether AI developers start using this new Developmental Sentence Completion Test (DSCT) to evaluate how their models interact with users.