Abstract: Large language models (LLMs) have achieved impressive results in complex reasoning and knowledge tasks, yet they often struggle with factual accuracy and logical consistency. Prior work has improved LLM performance using prompt-based techniques (e.g., chain-of-thought prompting and self-consistency) and post-hoc self-refinement, but these methods typically operate on a single model instance. Recently, multi-agent debate frameworks have emerged as a complementary approach, in which multiple LLM agents propose answers and critique each other’s reasoning to reach consensus. Such a “society of minds” approach has been shown to significantly improve mathematical reasoning and reduce factual hallucinations. However, existing debate methods use homogeneous agents with simple majority voting, which limits their effectiveness. In this work, we propose Adaptive Heterogeneous Multi-Agent Debate (A-HMAD), a novel framework that extends multi-agent debate with (i) diverse specialized agents, (ii) dynamic debate routing, and (iii) a learned consensus mechanism. Each agent in A-HMAD is assigned a distinct role or expertise (e.g., logical reasoning, factual verification, strategic planning), enabling more comprehensive error-checking and greater perspective diversity than identical agents provide. A coordination policy dynamically selects which agents contribute at each round based on the question’s domain and the evolving debate state. To aggregate viewpoints, we introduce a consensus optimizer that learns to weight each agent’s vote according to its reliability and the confidence of its arguments. On six challenging benchmarks, including arithmetic QA, grade-school math (GSM8K), multitask question answering (MMLU), factual biography generation, and chess strategy, A-HMAD consistently outperforms prior single-model methods and the original multi-agent debate baseline. Notably, A-HMAD achieves 4–6% absolute accuracy gains over standard debate on these tasks and reduces factual errors in generated biographies by over 30%. We provide extensive ablations demonstrating the benefits of agent heterogeneity, additional debate rounds, and the learned consensus module. Our findings suggest that an adaptive, role-diverse debating ensemble can drive significant advances in LLM-based educational reasoning, paving the way for safer, more interpretable, and pedagogically reliable AI systems.
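The abstract contrasts simple majority voting with a consensus step that weights each agent's vote by reliability and argument confidence. The sketch below illustrates that contrast only; it is not the authors' implementation. The function names, the role labels, the fixed `reliability` weights (which the paper describes as learned), and the scoring rule (reliability times confidence, summed per answer) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(votes):
    """Baseline: every agent's answer counts equally."""
    return Counter(answer for _, answer, _ in votes).most_common(1)[0][0]


def weighted_consensus(votes, reliability):
    """Score each answer by the sum of (agent reliability * stated confidence).

    votes: list of (role, answer, confidence) tuples, confidence in [0, 1].
    reliability: dict mapping role -> weight. Hypothetical fixed values here;
    the paper describes these weights as learned.
    """
    scores = defaultdict(float)
    for role, answer, confidence in votes:
        # Unknown roles fall back to a neutral weight of 1.0 (an assumption).
        scores[answer] += reliability.get(role, 1.0) * confidence
    return max(scores, key=scores.get)


# Hypothetical debate round: three role-specialized agents answer one question.
votes = [
    ("logic", "42", 0.9),
    ("facts", "41", 0.6),
    ("planning", "41", 0.5),
]
reliability = {"logic": 1.5, "facts": 1.0, "planning": 0.8}

print(majority_vote(votes))                    # two of three agents say "41"
print(weighted_consensus(votes, reliability))  # the confident, reliable agent can outweigh them
```

In this toy round the two aggregation rules disagree: the majority picks "41", while the weighted score favors "42" (1.35 against 1.0), which shows how a reliable, confident minority agent can change the outcome.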
Zhou et al. (Mon,) studied this question.
www.synapsesocial.com/papers/69403b822d562116f290bf86 — DOI: https://doi.org/10.1007/s44443-025-00353-3
Yan Zhou
Yanguang Chen
Journal of King Saud University - Computer and Information Sciences
South China Agricultural University
Shanghai University of Finance and Economics