March 18, 2024 | Open Access

Controllable Speaking Styles Using A Large Language Model

Abstract

Reference-based Text-to-Speech (TTS) models can generate multiple, prosodically different renditions of the same target text. Such models jointly learn a latent acoustic space during training, which can be sampled from during inference. Controlling these models during inference typically requires finding an appropriate reference utterance, which is non-trivial.

Large generative language models (LLMs) have shown excellent performance in various language-related tasks. Given only a natural language query text (the 'prompt'), such models can be used to solve specific, context-dependent tasks. Recent work in TTS has attempted similar prompt-based control of novel speaking style generation. Those methods do not require a reference utterance and can, under ideal conditions, be controlled with only a prompt. However, existing methods typically require a prompt-labelled speech corpus for jointly training a prompt-conditioned encoder.

In contrast, we employ an LLM to directly suggest prosodic modifications for a controllable TTS model, using contextual information provided in the prompt. The prompt can be designed for a multitude of tasks. Here, we give two demonstrations: control of speaking style, and prosody appropriate for a given dialogue context. The proposed method is rated most appropriate in 50% of cases vs. 31% for a baseline model.
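The idea of letting an LLM suggest prosodic modifications can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the control names (`pitch`, `energy`, `duration`), the JSON reply format, the value range of -1 to 1, and the helper names `build_prompt`, `parse_suggestion` and `suggest_prosody`. The LLM is passed in as a plain callable, and `fake_llm` is a stand-in rather than a real model.

```python
import json
from typing import Callable, Dict

# Hypothetical control names; the abstract does not list the actual prosodic features.
CONTROLS = ("pitch", "energy", "duration")


def build_prompt(text: str, style: str) -> str:
    """Compose a natural-language query asking for prosodic modifications."""
    return (
        f"Utterance: {text!r}\n"
        f"Desired speaking style: {style}\n"
        "Reply with a JSON object with keys "
        + ", ".join(CONTROLS)
        + ", each a number between -1 and 1 giving the change relative to neutral speech."
    )


def parse_suggestion(reply: str) -> Dict[str, float]:
    """Read control values from the reply; invalid or missing values become 0."""
    try:
        # Take the outermost {...} span so surrounding chatter is ignored.
        start, end = reply.index("{"), reply.rindex("}") + 1
        data = json.loads(reply[start:end])
    except ValueError:  # no braces found, or malformed JSON
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {}
    for name in CONTROLS:
        value = data.get(name, 0.0)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            value = 0.0
        # Clamp to the assumed valid range of the TTS controls.
        result[name] = max(-1.0, min(1.0, float(value)))
    return result


def suggest_prosody(text: str, style: str, llm: Callable[[str], str]) -> Dict[str, float]:
    """Query the LLM and return control values for a controllable TTS model."""
    return parse_suggestion(llm(build_prompt(text, style)))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; returns a fixed reply for demonstration.
    return 'Sure: {"pitch": 0.4, "energy": 1.7, "duration": -0.2}'


if __name__ == "__main__":
    print(suggest_prosody("I won the race!", "excited", fake_llm))
```

In this sketch the parsed values would then be handed to whatever control inputs the TTS model exposes. Clamping and defaulting to zero keep a malformed reply from producing extreme or undefined prosody.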

Cite this study

Sigurgeirsson and King (2024) studied this question.

www.synapsesocial.com/papers/68e7398bb6db6435876b2fda — DOI: https://doi.org/10.1109/icassp48485.2024.10448400

Authors

Atli Sigurgeirsson

Simon King

Institutions

University of Edinburgh