The world is being quietly rearranged by people who write very long documents.


The title they went with: "How AI Aggregation Affects Knowledge." Noisy translates that to:

AI trained on AI summaries learns worse than AI trained on raw data, and the speed of retraining matters


When AI systems' summaries are fed back into the training data for future AI systems, learning degrades unless retraining happens slowly enough. Fast-moving aggregation systems that synthesize and re-synthesize outputs therefore create a degradation loop, a kind of intellectual inbreeding, unless you deliberately slow down the feedback cycle.
Every AI model trained on internet text is now also being trained, in part, on AI-generated text. As this compounds, the question becomes: does the feedback loop corrupt what the next model learns, and if so, how fast? The paper's answer is that it depends on the speed of the cycle. But a second finding may matter more: local aggregators (systems trained on specific communities or topics) improve learning robustly across the conditions studied, while global aggregators trained on everything do worse in at least some cases. This suggests that a single centralized AI system trying to capture all knowledge may be structurally worse than many smaller systems trained on specific domains.
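To make the local-versus-global tradeoff concrete, here is a static bias/variance toy in Python. It is not the paper's model and ignores the feedback loop entirely; the `Community` type, the idea of a bias shared within each community, and the specific sizes and biases are all hypothetical. Each member holds one signal equal to the truth plus their community's bias plus independent noise. Averaging within a community (local) cancels noise but keeps that community's bias; pooling everyone (global) cancels more noise but imports the majority's bias into every community.

```python
from dataclasses import dataclass


@dataclass
class Community:
    size: int    # number of members, each holding one noisy signal
    bias: float  # hypothetical systematic error shared by the whole community


def expected_sq_error(bias: float, noise_var: float, n_pooled: int) -> float:
    # Averaging n independent signals divides the noise variance by n
    # but leaves any bias they share untouched.
    return bias ** 2 + noise_var / n_pooled


def compare(communities: list[Community], noise_var: float = 1.0) -> list[dict[str, float]]:
    total = sum(c.size for c in communities)
    # A global aggregator pools everyone, so its bias is the size-weighted mean bias,
    # and every community receives that same global estimate.
    global_bias = sum(c.size * c.bias for c in communities) / total
    return [
        {
            "individual": expected_sq_error(c.bias, noise_var, 1),
            "local": expected_sq_error(c.bias, noise_var, c.size),
            "global": expected_sq_error(global_bias, noise_var, total),
        }
        for c in communities
    ]


# Hypothetical setup: a large biased community and a small unbiased one.
rows = compare([Community(size=90, bias=0.5), Community(size=10, bias=0.0)])
```

In this setup the local aggregator beats individual signals in both communities, while the global aggregator helps the large community (about 0.21 versus 0.26 expected squared error) and hurts the small one (about 0.21 versus 0.10).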
Watch whether large language models trained on data that includes previous LLM outputs show measurable divergence from models trained only on human-generated text, and whether that divergence accelerates or stabilizes over retraining cycles.
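One way to make "accelerates or stabilizes" concrete is a toy in the spirit of model-collapse experiments, swapped in here for illustration rather than taken from the paper. Assume each cycle refits a Gaussian by maximum likelihood to `n` samples, a fraction `fresh_frac` drawn from the true distribution N(0, 1) and the rest from the previous model, and track only the expected variance (a deterministic simplification that ignores random drift in the mean). Divergence is measured as KL(truth || model). The function and parameter names are hypothetical, and the toy varies the share of fresh human data per cycle rather than the retraining speed the paper studies.

```python
import math


def kl_to_truth(v: float) -> float:
    """KL(N(0, 1) || N(0, v)): divergence of a zero-mean model with variance v from the truth."""
    return 0.5 * (math.log(v) + 1.0 / v - 1.0)


def divergence_over_cycles(fresh_frac: float, n: int = 100, cycles: int = 50) -> list[float]:
    """Return KL(truth || model) after each retraining cycle under the expected-variance recurrence."""
    shrink = (n - 1) / n  # E[MLE variance] = (n - 1) / n * population variance
    v = 1.0  # assumed: generation 0 matches the truth exactly
    history = []
    for _ in range(cycles):
        # Pooled training data: fresh_frac from N(0, 1), the rest from the current model N(0, v).
        v = shrink * (fresh_frac + (1.0 - fresh_frac) * v)
        history.append(kl_to_truth(v))
    return history


pure = divergence_over_cycles(fresh_frac=0.0)   # each model trained only on the previous model's output
mixed = divergence_over_cycles(fresh_frac=0.1)  # 10% fresh human data in every cycle
```

With no fresh data the per-cycle increase in divergence keeps growing (the variance shrinks geometrically toward zero), while with a steady 10% of human data the divergence settles near a small fixed value.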
