The world is being quietly rearranged by people who write very long documents.


The title they went with: "XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts." Noisy translates that to:

Researchers solve the watermarking problem that blocks AI detection in short texts


A new technique embeds a hidden multi-bit tracking code in text generated by large language models while keeping that text high-quality and readable. This matters because existing watermarking methods fail on short outputs, the most common real-world case, which makes it hard to trace where AI-generated text actually came from.
Right now, watermarking LLM outputs is theoretically interesting but practically broken for the cases that happen most: a single paragraph, a product description, a customer service response. Existing methods either trash the text quality to embed the tracking code or lose the code when the text is short. This paper shows a path to watermarking that survives practical conditions. The structural problem is that the harder AI text is to detect and trace, the harder accountability becomes, and detection gaps have already become a bottleneck for deployment in regulated industries like finance and healthcare.
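To make the short-text problem concrete, here is a toy sketch of a generic multi-bit watermark in the spirit of the well-known "green list" schemes. It is not XMark's algorithm, which this summary does not describe; the uniform stand-in "model", the 200-token vocabulary, the key, and the names `embed`, `decode`, `step_seed` and `partition` are all hypothetical. At each position, the previous token seeds a choice of which message bit to carry and a random split of the vocabulary; sampling is nudged toward the half that matches that bit, and the detector recovers each bit by majority vote.

```python
"""Toy multi-bit text watermark: a generic green-list-style sketch, NOT XMark."""
from __future__ import annotations

import hashlib
import math
import random

VOCAB_SIZE = 200      # hypothetical toy vocabulary: tokens are ints 0..199
KEY = "demo-key"      # hypothetical secret key shared by embedder and detector
START = 0             # hypothetical fixed start token that seeds the first position


def step_seed(prev_token: int) -> int:
    """Derive a per-position seed from the secret key and the previous token."""
    digest = hashlib.sha256(f"{KEY}:{prev_token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def partition(seed: int) -> list[int]:
    """Pseudo-randomly assign every vocabulary token to side 0 or side 1."""
    rng = random.Random(seed)
    return [rng.randrange(2) for _ in range(VOCAB_SIZE)]


def embed(message: list[int], length: int, delta: float, rng: random.Random) -> list[int]:
    """Sample `length` tokens from a uniform stand-in 'model', nudged to carry `message`."""
    tokens = [START]
    for _ in range(length):
        seed = step_seed(tokens[-1])
        bit_index = seed % len(message)   # which message bit this position carries
        target = message[bit_index]       # the vocabulary side this position should favour
        # Soft bias: tokens on the matching side get weight e^delta, the rest weight 1.
        weights = [math.exp(delta) if side == target else 1.0 for side in partition(seed)]
        tokens.append(rng.choices(range(VOCAB_SIZE), weights=weights)[0])
    return tokens[1:]


def decode(tokens: list[int], num_bits: int) -> tuple[list[int | None], list[int]]:
    """Majority-vote each bit; a bit with no votes (or a tie) comes back as None."""
    votes = [[0, 0] for _ in range(num_bits)]
    prev = START
    for tok in tokens:
        seed = step_seed(prev)
        # Recompute which bit this position carried and which side the token landed on.
        votes[seed % num_bits][partition(seed)[tok]] += 1
        prev = tok
    bits = [None if v0 == v1 else int(v1 > v0) for v0, v1 in votes]
    counts = [v0 + v1 for v0, v1 in votes]
    return bits, counts


if __name__ == "__main__":
    msg = [1, 0, 1, 1, 0, 0, 1, 0]  # a hypothetical 8-bit source ID to hide in the text
    for n in (4, 40, 400):
        text = embed(msg, length=n, delta=2.0, rng=random.Random(0))
        bits, counts = decode(text, num_bits=len(msg))
        print(f"{n:>3} tokens: decoded={bits} votes per bit={counts}")
```

Because each bit is decoded only from the tokens that happened to be assigned to it, reliability in this toy scales with length: at 400 tokens each of the 8 bits collects dozens of votes, while a 4-token reply cannot reach more than 4 of the 8 bits at all. Raising `delta` makes each vote more dependable but pulls sampling further from what the model would have chosen, which is the quality-versus-length bind described above.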
Watch whether production LLM companies adopt this method in their APIs, and whether watermarked text actually withstands removal attacks in adversarial testing. The code is public, so bad actors will test it immediately.
