The world is being quietly rearranged by people who write very long documents.


The title they went with: "IndoBERT-Relevancy: A Context-Conditioned Relevancy Classifier for Indonesian Text"
Noisy translates that to:

Indonesian-language AI now detects topic relevance: the first relevancy classifier for the language


Researchers built the first AI system that can judge whether Indonesian text is relevant to a given topic, trained on 31,000 labeled examples across 188 different topics. This matters because most language AI tools exist only for English and a handful of other languages — Indonesian, spoken by 200 million people, had essentially no such tool until now.
Language AI capabilities are clustering around wealthy, English-speaking markets; this is one small signal that tooling for non-English languages is starting to arrive, which affects what kinds of text processing become possible for the vast majority of the world's languages.
