November 24, 2025 | Open Access

Adaptive heterogeneous multi-agent debate for enhanced educational and factual reasoning in large language models

Abstract

Large language models (LLMs) have achieved impressive results in complex reasoning and knowledge tasks, yet they often struggle with factual accuracy and logical consistency. Prior work has improved LLM performance using prompt-based techniques (e.g., chain-of-thought prompting and self-consistency) and post-hoc self-refinement, but these typically operate on a single model instance. Recently, multi-agent debate frameworks have emerged as a complementary approach, wherein multiple LLM agents propose answers and critique each other’s reasoning to reach consensus. Such a “society of minds” approach has been shown to significantly improve mathematical reasoning and reduce factual hallucinations. However, existing debate methods use homogeneous agents with simple majority voting, limiting their effectiveness. In this work, we propose Adaptive Heterogeneous Multi-Agent Debate (A-HMAD), a novel framework that extends multi-agent debate with (i) diverse specialized agents, (ii) dynamic debate routing, and (iii) a learned consensus mechanism. Each agent in A-HMAD is assigned a distinct role or expertise (e.g., logical reasoning, factual verification, strategic planning), enabling more comprehensive error-checking and greater perspective diversity than identical agents allow. A coordination policy dynamically selects which agents contribute at each round based on the question’s domain and the evolving debate state. To aggregate viewpoints, we introduce a consensus optimizer that learns to weight each agent’s vote according to its reliability and the confidence of its arguments. On six challenging benchmarks, including arithmetic QA, grade-school math (GSM8K), multifact question answering (MMLU), factual biography generation, and chess strategy, A-HMAD consistently outperforms prior single-model methods and the original multi-agent debate baseline. Notably, A-HMAD achieves 4–6% absolute accuracy gains over standard debate on these tasks and reduces factual errors in generated biographies by over 30%. We provide extensive ablations demonstrating the benefits of agent heterogeneity, additional debate rounds, and the learned consensus module. Our findings suggest that an adaptive, role-diverse debating ensemble can drive significant advances in LLM-based educational reasoning, paving the way for safer, more interpretable, and pedagogically reliable AI systems.
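
The abstract contrasts simple majority voting with a consensus step that weights each agent's vote by reliability and confidence. The sketch below illustrates that contrast only. It is not the authors' consensus optimizer: the reliability weights are fixed by hand here rather than learned, and the names `AgentVote`, `weighted_consensus`, and `majority_vote`, the role labels, and the scoring rule (sum of reliability times confidence) are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentVote:
    role: str           # hypothetical role label, e.g. "logic" or "fact-check"
    answer: str         # the agent's final answer after debate
    reliability: float  # assumed fixed weight; the paper learns this instead
    confidence: float   # assumed self-reported confidence in [0, 1]


def majority_vote(votes):
    """Baseline: return the answer proposed by the most agents."""
    counts = defaultdict(int)
    for v in votes:
        counts[v.answer] += 1
    return max(counts, key=counts.get)


def weighted_consensus(votes):
    """Return the answer with the largest sum of reliability * confidence."""
    if not votes:
        raise ValueError("at least one vote is required")
    scores = defaultdict(float)
    for v in votes:
        scores[v.answer] += v.reliability * v.confidence
    return max(scores, key=scores.get)


# Illustrative votes (made-up numbers): two weaker agents agree on "41",
# one stronger agent answers "42".
votes = [
    AgentVote("logic", "42", reliability=0.9, confidence=0.8),
    AgentVote("fact-check", "41", reliability=0.5, confidence=0.6),
    AgentVote("planning", "41", reliability=0.4, confidence=0.5),
]

print(majority_vote(votes))       # "41": two of the three agents agree
print(weighted_consensus(votes))  # "42": 0.72 outweighs 0.30 + 0.20
```

With these made-up numbers the two rules disagree, which is the situation where a weighting scheme matters. How the weights would be learned is not described in the abstract and is left out of this sketch.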

Cite this study

Zhou et al. (2025) studied this question.

www.synapsesocial.com/papers/69403b822d562116f290bf86 — DOI: https://doi.org/10.1007/s44443-025-00353-3

Authors

Yan Zhou

Yanguang Chen

Journals

Journal of King Saud University - Computer and Information Sciences

Institutions

South China Agricultural University

Shanghai University of Finance and Economics
