What type of study is this?

This is a quantitative study.

October 19, 2025 | Open Access

Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplifications and Resistance in Multi-Agent Based LLM-as-Judge

Key Points

The debate framework amplifies biases sharply after the initial debate, and the increased bias persists in subsequent rounds, while meta-judge approaches show greater resistance.
Four bias types (position, verbosity, chain-of-thought, and bandwagon) are analyzed systematically across both frameworks.
Adding PINE as a bias-free agent reduces bias effectively in debate settings but provides less benefit in meta-judge settings.
The study characterizes bias behavior in multi-agent LLM-as-Judge systems and highlights the need for targeted mitigation strategies.

Abstract

LLM-as-Judge has emerged as a scalable alternative to human evaluation, enabling large language models (LLMs) to provide reward signals during training. While recent work has explored multi-agent extensions such as multi-agent debate and meta-judging to enhance evaluation quality, the question of how intrinsic biases manifest in these settings remains underexplored. In this study, we conduct a systematic analysis of four diverse bias types: position bias, verbosity bias, chain-of-thought bias, and bandwagon bias. We evaluate these biases across two widely adopted multi-agent LLM-as-Judge frameworks: Multi-Agent-Debate and LLM-as-Meta-Judge. Our results show that the debate framework amplifies biases sharply after the initial debate, and this increased bias is sustained in subsequent rounds, while meta-judge approaches exhibit greater resistance. We further investigate the incorporation of PINE, a leading single-agent debiasing method, as a bias-free agent within these systems. The results reveal that this bias-free agent effectively reduces biases in debate settings but provides less benefit in meta-judge scenarios. Our work provides a comprehensive study of bias behavior in multi-agent LLM-as-Judge systems and highlights the need for targeted bias mitigation strategies in collaborative evaluation settings.
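To make the notion of position bias concrete, the sketch below shows one common way to probe it in a pairwise judge: present the same two answers in both orders and count how often the preferred answer changes. This is an illustrative assumption, not necessarily the protocol used in the study; the `Judge` signature, the function `position_flip_rate`, and the two toy judges are hypothetical names introduced here.

```python
from typing import Callable, List, Tuple

# Hypothetical judge interface (an assumption for illustration):
# takes (question, answer_a, answer_b) and returns "A" or "B".
Judge = Callable[[str, str, str], str]


def position_flip_rate(judge: Judge, items: List[Tuple[str, str, str]]) -> float:
    """Return the fraction of items whose preferred answer changes
    when the two answers are shown in swapped order."""
    if not items:
        raise ValueError("items must be non-empty")
    flips = 0
    for question, first, second in items:
        original = judge(question, first, second)
        swapped = judge(question, second, first)
        # Translate the swapped verdict back into the original labelling.
        swapped_mapped = "A" if swapped == "B" else "B"
        if original != swapped_mapped:
            flips += 1
    return flips / len(items)


def always_first(question: str, a: str, b: str) -> str:
    """Toy judge with maximal position bias: always picks the first slot."""
    return "A"


def longer_wins(question: str, a: str, b: str) -> str:
    """Toy judge that ignores position and prefers the longer answer."""
    return "A" if len(a) >= len(b) else "B"


items = [
    ("q1", "a long detailed answer", "short"),
    ("q2", "ok", "a much longer reply"),
]
rate_first = position_flip_rate(always_first, items)
rate_longer = position_flip_rate(longer_wins, items)
```

A judge that always prefers the first slot flips on every item, while the length-based toy judge never flips here. The second judge is consistent under reordering yet driven entirely by length, which is a toy illustration of why verbosity bias needs a separate probe.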

Authors

Chiyu Ma

Evangelina Zhang

Yilun Zhao