LLMs explain model behavior less well in other languages than in English, even when they translate

What happened

Researchers tested how well large language models generate counterfactual explanations (minimally edited inputs that change a model's prediction) in six languages. Generating counterfactuals directly in non-English languages produced lower-quality explanations than generating them in English. Translation-based workarounds did not close the gap: they required more edits and still underperformed. The upshot is that AI systems cannot yet explain their behavior equally well across languages.

Why it matters

AI systems are being deployed globally, but the tools we use to understand and debug them work better in English than in any other language. This creates a structural asymmetry: if you are auditing an AI system for bias or failures in Portuguese or Hindi, you get worse explanations than someone auditing the same system in English. The gap is not small, either. Translated counterfactuals needed substantially more edits before they changed the model's prediction at all. This matters because explainability is not a luxury feature; regulators and companies rely on it to decide whether a system is trustworthy enough to deploy.

The signal

Watch whether multilingual debugging tools are built into the next generation of model evaluation software, or whether English remains the only language where model behavior is actually explainable.