AI language models are built almost entirely on American English — and researchers can now measure the cost
What happened
Researchers analyzed six major AI language models and found that they are systematically trained on American English rather than British English or other varieties, even though those varieties are standard forms used by hundreds of millions of people. As a result, the models handle British spellings and word choices less efficiently and actively prefer American English in their outputs. This is a structural bias built into the training data, the tokenization process, and the models' behavior itself.
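To make the tokenization point concrete, here is a toy sketch. It is not any real model's tokenizer or the researchers' method. It uses a greedy longest-match subword tokenizer over an invented vocabulary, `HYPOTHETICAL_VOCAB`, which is assumed to contain whole-word entries for American spellings only. Under that assumption, each British spelling is split into more pieces, and longer token sequences mean more computation and more of the context window used for the same word. Real tokenizers (BPE and similar) learn their vocabularies from training data, so their actual splits depend on what that data contained.

```python
# Toy illustration only: HYPOTHETICAL_VOCAB is invented for this example and
# does not reflect any real model's tokenizer or vocabulary.
HYPOTHETICAL_VOCAB = {
    "color", "favorite", "center",           # whole-word entries (American spellings)
    "col", "our", "fav", "ite", "cent", "re",  # smaller subword pieces
}


def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split a word by repeatedly taking the longest prefix found in vocab.

    Characters with no matching vocabulary entry fall back to
    single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until one matches.
        for end in range(len(word), i, -1):
            piece = word[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            # No vocabulary entry starts here: emit one character and move on.
            tokens.append(word[i])
            i += 1
    return tokens


if __name__ == "__main__":
    pairs = [("color", "colour"), ("favorite", "favourite"), ("center", "centre")]
    for us, uk in pairs:
        us_tokens = greedy_tokenize(us, HYPOTHETICAL_VOCAB)
        uk_tokens = greedy_tokenize(uk, HYPOTHETICAL_VOCAB)
        print(f"{us}: {us_tokens}  vs  {uk}: {uk_tokens}")
```

With this invented vocabulary, "colour" becomes `["col", "our"]` and "favourite" becomes `["fav", "our", "ite"]`, while each American spelling stays a single token. The gap comes entirely from which strings the vocabulary happens to contain, which is the kind of training-data imbalance the study describes.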
Why it matters
Language models deployed globally are quietly standardizing how English speakers communicate. When British users write in their own variety of English, the model has to work harder to process it and will often 'correct' them toward American forms. The deeper problem: this isn't neutral. It reflects whose data was easiest to collect for training (American), whose digital infrastructure was most dominant (American), and whose language gets treated as the default. In countries where English is official or widely used, AI tools subtly push users toward one dialect, eroding linguistic diversity and treating entire regions' English as an inferior variant of a 'correct' American standard.
The signal
Watch whether AI companies begin publishing dialect distribution metrics for their training data and tokenizers, or whether this remains invisible in model documentation.