Multilingual AI models organize around how languages look, not how they work

What happened

Researchers analyzed how multilingual language models represent different languages internally and found that the representations cluster primarily by writing system (script and spelling) rather than by deeper linguistic structure. In practice, a multilingual model places romanized Chinese closer to English than to Chinese written in its native script, even though the underlying language is identical.
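
To make the comparison concrete, here is a minimal Python sketch of the kind of check involved: given some embedding function, is a romanized sentence closer to its English counterpart than to the same sentence in native script? This is not the researchers' method. The names `closer_to_english` and `script_only_embed` are hypothetical, and the toy embedding deliberately sees nothing but script, so it only shows what a purely orthographic representation would do. A real probe would pass in a function that returns a model's hidden-state vector instead.

```python
import math
import unicodedata
from typing import Callable, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def script_only_embed(text: str) -> List[float]:
    """Toy stand-in for a model: counts Latin vs. CJK letters, nothing else."""
    latin = cjk = 0
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            latin += 1
        elif name.startswith("CJK"):
            cjk += 1
    return [float(latin), float(cjk)]


def closer_to_english(
    embed: Callable[[str], List[float]],
    native: str,
    romanized: str,
    english: str,
) -> bool:
    """True if the romanized text sits nearer the English text than the native-script text."""
    rom = embed(romanized)
    return cosine(rom, embed(english)) > cosine(rom, embed(native))


if __name__ == "__main__":
    # Illustrative sentence triple: same meaning, three surface forms.
    print(closer_to_english(
        script_only_embed,
        native="我喜欢这本书",
        romanized="wo xihuan zhe ben shu",
        english="I like this book",
    ))
```

With the script-only toy embedding the answer is trivially yes. The reported finding is that real multilingual models behave more like this toy than one might expect.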

Why it matters

This reveals a blind spot in how current multilingual AI systems actually work. Engineers often assume these models learn abstract linguistic patterns, but the models are instead latching onto surface-level orthographic cues. That suggests multilingual models may fail systematically on tasks where the writing system and the language do not line up, such as code-switching between scripts or processing the same language written in more than one way. It also hints that claims about AI developing a unified understanding across languages are premature: what looks like linguistic abstraction may just be script recognition.

The signal

Test whether multilingual models fail predictably when the same language is presented in different scripts, or when scripts switch mid-sentence. Either would be an observable failure mode, and measuring it would show whether this orthographic bias creates real performance gaps in production systems.
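
One way such a test could be set up is sketched below in Python: score the same labeled examples once per script and report the accuracy gap. Everything here is an assumption made for illustration. `accuracy_by_script` and `script_gap` are hypothetical helper names, the four sentiment examples are invented, and `toy_predict` is a keyword rule standing in for a real model call.

```python
from typing import Callable, Dict, List, Tuple

# One example = the same sentence in several scripts, plus its gold label.
Example = Tuple[Dict[str, str], str]


def accuracy_by_script(
    predict: Callable[[str], str], examples: List[Example]
) -> Dict[str, float]:
    """Accuracy of `predict`, computed separately for each script variant."""
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for variants, gold in examples:
        for script, text in variants.items():
            total[script] = total.get(script, 0) + 1
            correct[script] = correct.get(script, 0) + int(predict(text) == gold)
    return {script: correct[script] / total[script] for script in total}


def script_gap(scores: Dict[str, float], baseline: str, variant: str) -> float:
    """Accuracy lost when moving from the baseline script to the variant."""
    return scores[baseline] - scores[variant]


# Invented toy data: Chinese sentiment sentences in Hanzi and in pinyin.
EXAMPLES: List[Example] = [
    ({"hanzi": "这部电影很好", "pinyin": "zhe bu dianying hen hao"}, "pos"),
    ({"hanzi": "服务太差了", "pinyin": "fuwu tai cha le"}, "neg"),
    ({"hanzi": "味道很好", "pinyin": "weidao hen hao"}, "pos"),
    ({"hanzi": "质量很差", "pinyin": "zhiliang hen cha"}, "neg"),
]


def toy_predict(text: str) -> str:
    """Stand-in for a model that only recognizes a native-script cue."""
    return "pos" if "好" in text else "neg"


if __name__ == "__main__":
    scores = accuracy_by_script(toy_predict, EXAMPLES)
    print(scores)
    print(script_gap(scores, "hanzi", "pinyin"))
```

A large, consistent gap on real data, with a real model plugged in for `toy_predict`, would be the kind of predictable failure described above. Mixed-script inputs could be added as a third variant key in each example.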