Abstract Voice-based virtual assistants enable hands-free operation, allowing users to perform tasks, access information, and control smart home devices through simple voice commands. Their growing ubiquity in smartphones, smart speakers, and other devices has led to a proliferation of apps that take advantage of a Voice User Interface (VUI). VUI testing is far from trivial because of the wide variability in human speech (e.g., different accents, dialects, speech patterns) and because users can express the same command in numerous ways, using different (but semantically equivalent) wordings and phrases. For this reason, specialized techniques have been proposed to support VUI testing. The basic idea behind these approaches is to generate paraphrases of the voice commands for which developers implemented support in the VUI. Preliminary results from a recent study suggest that specialized models can outperform a general-purpose LLM (ChatGPT). However, that study adopted only a simple prompt and interaction strategy with ChatGPT. In other words, it is still unknown whether optimizing the use of the LLM yields better results. In this paper, we thoroughly study to what extent LLMs (ChatGPT, specifically) can be adopted to test VUIs. We focus on optimizing the prompt and the interaction with the model. Our results show that an optimized use of LLMs achieves new state-of-the-art performance for VUI testing in terms of the number of correct and bug-revealing paraphrases. While introducing the generated paraphrases into the Voice Interaction Models of the skills fixes some bugs, we observe that many bugs remain, and some are even introduced by the generated paraphrases. Our results call for specialized approaches for fixing bugs in VUIs.
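The paraphrase-based testing idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function `find_bug_revealing`, the toy resolver `toy_resolve`, and the intent names and utterances are all hypothetical, chosen only to show how a paraphrase that the VUI fails to map to the expected intent would be flagged as bug-revealing.

```python
from typing import Callable, Dict, List, Optional


def find_bug_revealing(
    paraphrases: Dict[str, List[str]],
    resolve: Callable[[str], Optional[str]],
) -> Dict[str, List[str]]:
    """Return, per expected intent, the paraphrases the VUI did not map to it."""
    failures: Dict[str, List[str]] = {}
    for intent, utterances in paraphrases.items():
        missed = [u for u in utterances if resolve(u) != intent]
        if missed:
            failures[intent] = missed
    return failures


# Hypothetical stand-in for a real VUI: it only recognizes exact sample utterances.
SAMPLES = {
    "turn on the light": "LightOnIntent",
    "what is the weather": "WeatherIntent",
}


def toy_resolve(utterance: str) -> Optional[str]:
    """Look up the normalized utterance; None means the command was not understood."""
    return SAMPLES.get(utterance.lower().strip())


# Hypothetical paraphrases, e.g. as an LLM or a specialized model might generate them.
PARAPHRASES = {
    "LightOnIntent": ["turn on the light", "switch the light on"],
    "WeatherIntent": ["What is the weather", "how is the weather today"],
}

failures = find_bug_revealing(PARAPHRASES, toy_resolve)
for intent, missed in failures.items():
    print(intent, "missed:", missed)
```

In a real setting the resolver would query the deployed voice application rather than a dictionary, and each generated paraphrase would first need to be checked for semantic equivalence with the original command, since an incorrect paraphrase does not reveal a bug.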
Emanuela Guglielmi
Angelica Spina
Gabriele Bavota
Empirical Software Engineering
Guglielmi et al. (Sat,) studied this question.
www.synapsesocial.com/papers/69fd7ee0bfa21ec5bbf071f5 — DOI: https://doi.org/10.1007/s10664-026-10859-7