Researchers measure what AI loses when writing academic papers: a trade-off between sounding good and making things up

What happened

Computer scientists built a test to see how well AI coding agents can write research papers from just an outline. They found that one AI model (Claude) writes better-sounding papers but fabricates roughly 10 facts per paper, while another (Codex) makes fewer things up but produces messier writing. This matters because AI-written papers are already entering peer review, and nobody yet knows how to catch AI-invented citations, data, or claims before they get published.

Why it matters

For the first time, we have a measurable picture of the specific ways AI fails at academic writing. Until now, the risk has been abstract: 'AI might write papers with false information.' This paper shows the actual failure mode: the better-looking papers hide more fabrications, and the model that fabricates less writes worse, so there's no easy setting that optimizes for both. Research institutions and journals now have to choose between polished papers with hidden factual errors and rougher papers with fewer invented claims. Neither is acceptable, and there's no clear technical fix coming soon.

The signal

Watch whether journals add AI-detection or fact-checking steps to their submission process in the next 12 months, and whether the first highly cited papers written by these AI agents are eventually retracted for hallucinated citations or data.