What question did this study set out to answer?

The aim is to evaluate and improve role-playing agents through a new benchmark for multi-character interactions in a text-speech context.

May 8, 2026

OmniCharacter++: Towards Comprehensive Benchmark for Realistic Role-Playing Agents

Key Points

Introduced the OmniCharacter++ benchmark, whose dataset comprises 10,287 characters, 118,017 multi-turn dialogues, and over one million audio responses across 8 open-world topics and 31 subfields.
Developed UniCharacter-7B, a unified text-speech model that handles multi-character dynamics while maintaining role-specific vocal fidelity and cross-participant semantic alignment.
Assessed state-of-the-art models with an evaluation suite covering dialogue understanding, generation quality, and perceptual naturalness.
In the reported experiments, UniCharacter-7B produces more realistic role-playing responses, as measured by attractiveness and consistency.
OmniCharacter++ remains substantially challenging for current state-of-the-art models, which points to areas for future improvement.

Abstract

Existing Role-Playing Agents (RPAs), powered by large language models, are predominantly evaluated on static, text-only, dyadic conversations, which inadequately reflect the complexity of realistic human interactions involving multiple interlocutors and multi-modal communication. To bridge this gap, we propose OmniCharacter++, the first benchmark for evaluating multi-character interactions in a joint text-speech context. Specifically, OmniCharacter++ contributes: (1) a large-scale dataset comprising 10,287 characters, 118,017 multi-turn dialogues, and over one million audio responses across 8 open-world topics and 31 subfields, covering diverse multi-modal role-playing scenarios; (2) a comprehensive evaluation suite for dialogue understanding, generation quality, and perceptual naturalness; and (3) UniCharacter-7B, a unified text-speech model trained on this dataset to manage complex multi-character dynamics, ensuring both role-specific vocal fidelity and cross-participant semantic alignment. Experimental results demonstrate that UniCharacter-7B achieves more realistic and consistent role-playing responses in terms of both attractiveness and consistency, while also highlighting that OmniCharacter++ poses substantial challenges for state-of-the-art models, charting a clear path for future research. The code is publicly available at: https://github.com/zchoi/OmniCharacter-plus.

Cite this study

Zhang et al. studied this question.

www.synapsesocial.com/papers/69fd7ddcbfa21ec5bbf0620e — DOI: https://doi.org/10.1109/tpami.2026.3690447

Authors

Haonan Zhang

Pengpeng Zeng

J Q Zhang

Journals

IEEE Transactions on Pattern Analysis and Machine Intelligence

Institutions

Tongji University

University of Trento

University of Electronic Science and Technology of China
