Pathological voice synthesis is a critical challenge in biomedical signal processing, because generated speech must replicate phoneme-specific distortions such as hoarseness, breathiness, and strain with high fidelity. Existing synthesis models often oversimplify voice quality by treating it as a global property, neglecting the segmental, phoneme-dependent nature of pathological manifestations. This study introduces a phoneme-specific quality assessment framework that formulates evaluation as a multiobjective optimization task. Using Mordukhovich subdifferential analysis, the framework traces Pareto fronts for different acoustic metrics across Lithuanian vowels, consonants, and complex phonemes. Synthetic voice samples are then classified in a semi-supervised manner based on their proximity to the Pareto front, providing both a holistic quality score and phoneme-level diagnostic feedback. Experimental results on a corpus of 5,200 synthetic Lithuanian alaryngeal cancer substitution voices demonstrate that the proposed approach achieves robust convergence, strong cluster separation (mean silhouette score of 0.70), and reliable classification performance (mean F1 of 0.88 against experts), outperforming conventional assessment methods.
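To make the idea of classifying samples by their proximity to a Pareto front concrete, the following is a minimal sketch only. It does not reproduce the study's Mordukhovich subdifferential tracing; it swaps in a brute-force non-dominated filter and a Euclidean distance. The two objectives per sample, the function names, and the threshold are all hypothetical, and every objective is assumed to be an error term that should be minimized.

```python
from math import dist

# Hypothetical setup: each sample is a tuple of acoustic error metrics,
# and lower is better on every objective.

def pareto_front(points):
    """Return the non-dominated points by brute force (not the paper's method)."""
    front = []
    for p in points:
        # q dominates p if q is no worse on every objective and differs from p.
        dominated = any(
            q != p and all(q_i <= p_i for q_i, p_i in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

def distance_to_front(sample, front):
    """Euclidean distance from a sample to the nearest point of the front."""
    return min(dist(sample, f) for f in front)

def label_by_proximity(points, threshold):
    """Label each sample 'near' or 'far' using an assumed distance threshold."""
    front = pareto_front(points)
    return [
        "near" if distance_to_front(p, front) <= threshold else "far"
        for p in points
    ]

# Illustrative values only, not data from the study.
samples = [(0.1, 0.9), (0.4, 0.4), (0.9, 0.1), (0.5, 0.5), (0.9, 0.9)]
print(pareto_front(samples))
print(label_by_proximity(samples, threshold=0.2))
```

In this toy setting the distance to the front plays the role of a holistic quality score; a phoneme-level variant would presumably compute a separate front per phoneme class, though the abstract does not specify how that is done.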
Maskeliūnas et al. studied this question.
www.synapsesocial.com/papers/69a76073c6e9836116a2d331 — DOI: https://doi.org/10.1109/taslpro.2026.3661281
Rytis Maskeliūnas
Robertas Damasevicius
Kipras Pribusis
University of Health Science
Real-Time Innovations (United States)