The world is being quietly rearranged by people who write very long documents.


The title they went with: "Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus"

Noisy translates that to:

Asking several AI models the same question and keeping the consensus cut hallucinations by about a third in tests


A team built a system that sends the same question to several different AI models in parallel, then uses a fourth model to synthesize their answers and flag where they agree or disagree. Testing shows this approach cuts hallucination rates by 36% on one benchmark and improves factual accuracy by 7.8 points on another, while reducing bias variance across domains.
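The fan-out-then-synthesize shape described above can be sketched in a few lines. This is a toy illustration, not the team's implementation: the model functions are hypothetical stand-ins for real API clients, and the synthesizer here is a plain majority vote over exact-match answers, an assumption swapped in for the separate synthesizing model the paper uses.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-in for a real model client: question in, answer out.
Model = Callable[[str], str]


def ask_council(question: str, models: Dict[str, Model]) -> Dict[str, str]:
    """Send the same question to every model in parallel; collect answers by name."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, question) for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}


def synthesize(answers: Dict[str, str]) -> dict:
    """Toy synthesizer (assumption): majority vote on normalized answer strings.

    Reports the most common answer, the share of models that gave it,
    and which models disagreed.
    """
    normalized = {name: text.strip().lower() for name, text in answers.items()}
    top, votes = Counter(normalized.values()).most_common(1)[0]
    return {
        "consensus": top,
        "agreement": votes / len(normalized),
        "dissenters": sorted(n for n, a in normalized.items() if a != top),
    }


# Usage with three fake models, one of which is wrong.
models = {
    "model_a": lambda q: "Canberra",
    "model_b": lambda q: "canberra ",
    "model_c": lambda q: "Sydney",
}
report = synthesize(ask_council("What is the capital of Australia?", models))
print(report)
```

A real system would need a fuzzier notion of agreement than string equality, since free-text answers rarely match word for word. The agreement share is the part worth surfacing: a low value is the signal to flag the answer rather than trust it.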
The core problem with current AI systems is that they sound confident whether they're right or wrong, and that confidence is hard to distinguish from actual knowledge. This work suggests a simple mitigation exists: redundancy. If you route uncertain queries to multiple models and only trust the consensus, you catch more errors before they reach a user. The catch is obvious: it's slower and more expensive than using one model. Whether that tradeoff is worth it depends entirely on the use case. A medical diagnosis system probably says yes; a chatbot probably says no.
Watch whether commercial AI products start implementing consensus-style checking for high-stakes queries (legal documents, medical recommendations, financial advice), or whether this stays confined to research settings where cost doesn't matter.
