The world is being quietly rearranged by people who write very long documents.


The title they went with: "When simulations look right but causal effects go wrong: Large language models as behavioral simulators"

Noisy translates that to:

AI can describe how people think but cannot predict whether interventions actually work


Researchers tested whether large language models can predict whether climate interventions will change people's behavior, using data from 59,508 people across 62 countries. The models described observed attitudes accurately but failed to predict which interventions actually move people. The fit looked good on paper, but the causal predictions were wrong, especially for interventions that rely on emotional engagement rather than simple information.
This matters because researchers and policymakers increasingly use AI to simulate how populations will respond to interventions before running expensive trials. The trap: an AI can look right on descriptive metrics while being fundamentally wrong about causation, and you can't see the difference just by checking the fit. The second problem is worse. The simulations appeared equally accurate across countries with different wealth levels, but when checked against actual causal effects, accuracy varied wildly by country. In other words, the simulations masked exactly the disparities that matter most for fairness.
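To make the fit-versus-causation gap concrete, here is a minimal Python sketch. Every number in it is invented for illustration and none comes from the study; the four interventions "A" to "D" and the 0-100 outcome scale are hypothetical. It compares two checks on the same simulated data: how well simulated outcome levels track observed levels (descriptive fit), and how well simulated treatment effects (treated minus control) track observed effects (causal fit).

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data, made up for illustration only (NOT from the paper).
# name: (observed control, observed treated, simulated control, simulated treated)
interventions = {
    "A": (40, 46, 41, 42),
    "B": (55, 56, 54, 60),
    "C": (70, 78, 71, 72),
    "D": (85, 86, 84, 91),
}

# Descriptive fit: compare outcome levels in every arm.
observed_levels, simulated_levels = [], []
# Causal fit: compare treatment effects (treated minus control).
observed_effects, simulated_effects = [], []

for obs_c, obs_t, sim_c, sim_t in interventions.values():
    observed_levels += [obs_c, obs_t]
    simulated_levels += [sim_c, sim_t]
    observed_effects.append(obs_t - obs_c)
    simulated_effects.append(sim_t - sim_c)

level_fit = pearson(observed_levels, simulated_levels)
effect_fit = pearson(observed_effects, simulated_effects)

print(f"descriptive fit (levels):  r = {level_fit:.2f}")
print(f"causal fit (effects):      r = {effect_fit:.2f}")
```

In this toy setup the level correlation comes out high while the effect correlation is negative: the simulator gets the baselines right and the ranking of interventions backwards. The differences between arms are small next to the differences between baselines, so a level-based fit metric barely registers them.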
Watch whether intervention studies start explicitly testing causal prediction accuracy (not just descriptive fit) before deploying simulations at scale, or whether institutions begin discovering misaligned effects only after running real-world pilots.
