Pathological voice synthesis is a critical challenge in biomedical signal processing, as generated speech must replicate phoneme-specific distortions such as hoarseness, breathiness, and strain with high fidelity. Existing synthesis models often oversimplify voice quality by treating it as a global property, neglecting the segmental and phoneme-dependent nature of pathological manifestations. This study introduces a novel phoneme-specific quality assessment framework that formulates the evaluation problem as a multiobjective optimization task. Using Mordukhovich subdifferential analysis, the framework traces Pareto fronts for different acoustic metrics across Lithuanian vowels, consonants, and complex phonemes. Synthetic voice samples are then classified in a semi-supervised manner based on their proximity to the Pareto front, providing both a holistic quality score and phoneme-level diagnostic feedback. Experimental results on a corpus of 5,200 synthetic Lithuanian alaryngeal cancer substitution voices demonstrate that the proposed approach achieves robust convergence, strong cluster separation (mean silhouette score of 0.70), and reliable classification performance (mean F1 of 0.88 against experts), outperforming conventional assessment methods.
Maskeliūnas et al. studied this question.
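The idea of scoring a sample by its proximity to a Pareto front can be illustrated with a minimal sketch. It is not the authors' method: the paper traces the front with Mordukhovich subdifferential analysis, whereas the code below substitutes a brute-force dominance filter over a finite set of samples. The function names, the two-metric example data, the lower-is-better convention, and the distance threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def pareto_front(scores):
    """Return the non-dominated rows of `scores`.

    Assumption: every column is an acoustic distortion metric where lower is better.
    """
    scores = np.asarray(scores, dtype=float)
    keep = np.ones(len(scores), dtype=bool)
    for i, s in enumerate(scores):
        # s is dominated if another row is <= s in all metrics and < s in at least one.
        dominated = np.all(scores <= s, axis=1) & np.any(scores < s, axis=1)
        keep[i] = not dominated.any()
    return scores[keep]

def distance_to_front(scores, front):
    """Euclidean distance from each sample to its nearest point on the front."""
    scores = np.asarray(scores, dtype=float)
    diffs = scores[:, None, :] - front[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1)

def label_by_proximity(scores, threshold):
    """Label samples by distance to the front (hypothetical two-class rule)."""
    front = pareto_front(scores)
    dist = distance_to_front(scores, front)
    labels = np.where(dist <= threshold, "near_front", "far_from_front")
    return labels, dist

# Hypothetical metric values for four synthetic samples of a single phoneme.
samples = np.array([[0.1, 0.9], [0.5, 0.5], [0.9, 0.1], [0.8, 0.8]])
labels, dist = label_by_proximity(samples, threshold=0.2)
print(labels, dist)
```

In this toy example the first three samples are mutually non-dominated and lie on the front, while the fourth is dominated by `[0.5, 0.5]` and is labeled by its distance to that point. Running the same routine separately per phoneme would yield phoneme-level feedback in the spirit described above, under the stated assumptions.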