March 3, 2026 (Open Access)

Training Dialogue Systems Using Entire Dialogue Context Evaluation

Key Points

The aim is to improve user satisfaction in dialogue systems by optimizing overall dialogue impressions.
Constructed multiple reward models that assess entire-dialogue impressions across 12 evaluation metrics.
Applied reinforcement learning from AI feedback (RLAIF) to fine-tune the dialogue model.
Compared prompt-based and supervised fine-tuning (SFT) approaches to building the reward models.
Improved scores on entire-dialogue impression metrics by fine-tuning with the most effective reward model.
Also improved the naturalness of individual responses after fine-tuning.
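The key points mention comparing prompt-based and SFT reward models, but this summary does not say how agreement with human judgments was measured. One common choice is Spearman rank correlation between a reward model's scores and human ratings, computed per metric over the same set of dialogues. The sketch below assumes that setup; the function names (`compare_reward_models`, `best_model`) and the data layout are hypothetical illustrations, not the authors' code.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # convert 0-based positions to a 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def compare_reward_models(human, candidates):
    """Per-metric agreement of each candidate reward model with human ratings.

    human:      {metric: [human rating per dialogue]}
    candidates: {model_name: {metric: [model score per dialogue]}}
    (Hypothetical layout chosen for this sketch.)
    """
    return {
        name: {metric: spearman(scores[metric], human[metric]) for metric in human}
        for name, scores in candidates.items()
    }


def best_model(results):
    """Pick the candidate with the highest mean correlation across metrics."""
    return max(results, key=lambda name: mean(list(results[name].values())))
```

Ranking by mean correlation weights every metric equally; if some metrics matter more for user satisfaction, a weighted mean or a per-metric choice of reward model would be a reasonable alternative.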

Abstract

To enhance user satisfaction in dialogue with a system, it is essential not only to ensure that individual responses are natural but also to improve the entire dialogue impression, including consistency, personality, and empathy. However, methods for optimizing dialogue systems for such entire dialogue impressions remain unclear. In recent studies on large language model (LLM)-based dialogue systems, Reinforcement Learning from AI Feedback (RLAIF) has emerged as a promising approach for improving the consistency and quality of entire dialogue impressions. When RLAIF is applied to language models, LLM-based reward models guide the adaptation of the dialogue model. However, even with the capabilities of today's high-performing language models, it remains extremely challenging to derive accurate reward signals from zero-shot or few-shot prompts. To address this issue, we first construct multiple reward models that assess entire dialogue impressions based on 12 evaluation metrics. These reward models are built using both prompt-based approaches and supervised fine-tuning (SFT), and their respective capabilities are empirically compared. The most effective reward model is then used to fine-tune the dialogue model to improve the entire dialogue impression. Both automatic and human evaluations demonstrate that leveraging a reward model trained to assess entire dialogue impressions improves not only the evaluation metrics for entire dialogue impressions but also the naturalness of the responses.
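The abstract says a reward model scoring the entire dialogue on 12 metrics guides RLAIF fine-tuning, but this summary does not state how those scores become a single reward or which RL algorithm is used. A minimal sketch, assuming 1 to 5 Likert-style metric scores averaged with equal weight, plus the per-sequence KL penalty against a frozen reference model that is common in RLHF/RLAIF setups. The function `dialogue_reward`, its parameters, and the default `beta` are hypothetical; only consistency, personality, and empathy are metric names taken from the abstract.

```python
from statistics import mean


def dialogue_reward(metric_scores, policy_logprobs, reference_logprobs,
                    beta=0.1, score_range=(1, 5)):
    """Scalar RL reward for a dialogue containing one sampled response (hypothetical).

    metric_scores:      {metric_name: score} from a reward model that rated the
                        entire dialogue, e.g. consistency, personality, empathy.
    policy_logprobs:    log-probabilities of the sampled response tokens under
                        the model being fine-tuned.
    reference_logprobs: log-probabilities of the same tokens under the frozen
                        starting model.
    """
    lo, hi = score_range
    if hi <= lo:
        raise ValueError("score_range must be (low, high) with high > low")
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("log-probability sequences must align token by token")
    # Assumption: every metric is weighted equally after rescaling to [0, 1].
    task_reward = mean((s - lo) / (hi - lo) for s in metric_scores.values())
    # Single-sample estimate of KL(policy || reference) for this response;
    # penalizing it discourages the policy from drifting far from the reference.
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, reference_logprobs))
    return task_reward - beta * kl_estimate
```

For example, scores of 5, 3, and 1 on three metrics rescale to 1.0, 0.5, and 0.0, giving a task reward of 0.5 before the KL penalty is subtracted. Larger `beta` values keep the fine-tuned model closer to its starting behavior at the cost of slower improvement on the impression metrics.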

Cite this study

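Yoshida et al. (2026). Training Dialogue Systems Using Entire Dialogue Context Evaluation. Transactions of the Japanese Society for Artificial Intelligence.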

www.synapsesocial.com/papers/69a67e0ef353c071a6f09fcd — DOI: https://doi.org/10.1527/tjsai.41-2_ids26-b

Authors

Kai Yoshida

Masahiro Mizukami

Seiya Kawano

Journals

Transactions of the Japanese Society for Artificial Intelligence

Institutions

Nara Institute of Science and Technology

NTT (Japan)

Kyoto Institute of Technology
