March 3, 2026

Lithuanian Phoneme-Specific Assessment of Synthetic Pathological Voices Using Mordukhovich Subdifferential Analysis

Key Points

A phoneme-specific framework assesses synthetic pathological voices segment by segment rather than as a single global property, yielding both a holistic quality score and phoneme-level diagnostic feedback.
A mean silhouette score of 0.70 indicates strong cluster separation among the classified voice samples.
The framework formulates quality assessment as a multiobjective optimization task and uses Mordukhovich subdifferential analysis to trace Pareto fronts of acoustic metrics across Lithuanian vowels, consonants, and complex phonemes.
On a corpus of 5,200 synthetic Lithuanian voices, classification reaches a mean F1 score of 0.88 against expert assessments, outperforming conventional assessment methods.

Abstract

Pathological voice synthesis represents a critical challenge in biomedical signal processing, as generated speech must replicate phoneme-specific distortions such as hoarseness, breathiness, and strain with high fidelity. Existing synthesis models often oversimplify voice quality by treating it as a global property, neglecting the segmental and phoneme-dependent nature of pathological manifestations. This study introduces a novel phoneme-specific quality assessment framework that formulates the evaluation problem as a multiobjective optimization task. Using Mordukhovich subdifferential analysis, the framework traces Pareto fronts for different acoustic metrics across Lithuanian vowels, consonants, and complex phonemes. Synthetic voice samples are then classified in a semi-supervised manner based on their proximity to the Pareto front, providing both a holistic quality score and phoneme-level diagnostic feedback. Experimental results on a corpus of 5,200 synthetic Lithuanian alaryngeal cancer substitution voices demonstrate that the proposed approach achieves robust convergence, strong cluster separation (mean silhouette score of 0.70), and reliable classification performance (mean F1 of 0.88 against experts), outperforming conventional assessment methods.
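
To make the idea of "proximity to the Pareto front" concrete, the following is a minimal sketch and not the study's method. It assumes every acoustic metric is expressed as an error to be minimized, finds the front by brute-force dominance checks (standing in for the Mordukhovich subdifferential tracing, which is not reproduced here), and uses plain Euclidean distance as the proximity measure. The function names and the two-metric sample values are hypothetical.

```python
from math import dist


def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points (all objectives are minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def distance_to_front(point, front):
    """Euclidean distance from a sample to its nearest point on the front."""
    return min(dist(point, f) for f in front)


# Hypothetical per-phoneme errors for five synthetic samples:
# (error on metric A, error on metric B), lower is better for both.
samples = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6), (1.0, 1.0)]

front = pareto_front(samples)
proximity = [distance_to_front(s, front) for s in samples]

for sample, d in zip(samples, proximity):
    print(sample, round(d, 4))
```

Samples lying on the front get a distance of zero, and larger distances mark samples that trade off the metrics less efficiently. Repeating the computation per phoneme would give phoneme-level feedback, while aggregating the distances would give one overall score; how the study actually aggregates or clusters these values is not stated in the abstract.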
