Conversational agents are transforming digital interactions across domains including healthcare, education, and customer service, thanks to advances in large language models (LLMs). As these systems become more autonomous and ubiquitous, understanding what constitutes high-quality interaction from the user's perspective is increasingly critical. Despite growing empirical research, the field lacks a unified framework for defining, measuring, and designing user-perceived interaction quality in human–artificial intelligence (AI) dialogue. Here, we present an integrative review of 125 empirical studies published between 2017 and 2025, spanning text-, voice-, and LLM-powered systems. Our synthesis identifies three consistent layers of user judgment: a pragmatic core (usability, task effectiveness, and conversational competence), a social–affective layer (social presence, warmth, and synchronicity), and an accountability and inclusion layer (transparency, accessibility, and fairness). These insights are formalised into a four-layer interpretive framework (Capacity, Alignment, Levers, and Outcomes), operationalised via a Capacity × Alignment matrix that maps distinct success and failure regimes. The framework also identifies design levers such as anthropomorphism, role framing, and onboarding strategies. It consolidates constructs, positions inclusion and accountability as central to quality, and offers actionable guidance for evaluation and design. This research redefines interaction quality as a dialogic construct, shifting the focus from system performance to co-orchestrated, user-centred dialogue quality.
Marconi et al. studied this question.