What question did this study set out to answer?

To create an objective method for evaluating singing quality by extracting the singer's voice from mixed audio with SNN-SpEx+, a model that combines a Siamese neural network with contrastive learning.

April 16, 2026 · Open Access

Siamese network contrastive learning model for the objective evaluation of singing sound quality

Key Points

Aims to evaluate singing quality objectively by extracting the target singer's voice from mixed audio.
Developed SNN-SpEx+, which integrates a Siamese neural network with contrastive learning.
Used a parameter-sharing twin architecture to align the feature spaces of the reference and mixed audio.
Conducted experiments on MUSDB18-HQ and NSynth-Singer datasets.
SNN-SpEx+ outperformed SpEx++ by 0.85 dB in SI-SDRi and 0.17 in PESQ.
For short references under 2 seconds, the SI-SDRi drop was only 3.16 dB, which is 3.4 dB smaller than the drop for SpEx++.
Provides an automatic, standardized evaluation tool for music education and singer selection.
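
The parameter-sharing and contrastive-learning points above can be illustrated with a minimal sketch. It is not the paper's implementation: the single-layer `shared_encoder`, the InfoNCE-style `info_nce` loss, the temperature value and the toy feature sizes are all assumptions chosen for illustration. What it shows is one set of weights embedding both the reference and the mixture features, and a loss that is lower when each reference is paired with the mixture containing the same singer.

```python
import numpy as np

def shared_encoder(features, weights):
    """Embed features with ONE weight matrix (hypothetical single-layer encoder).

    Using the same `weights` for reference and mixture inputs is what puts
    both kinds of input in the same feature space.
    """
    hidden = np.tanh(features @ weights)
    # L2-normalise so dot products between embeddings are cosine similarities.
    return hidden / (np.linalg.norm(hidden, axis=1, keepdims=True) + 1e-8)

def info_nce(ref_emb, mix_emb, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed choice, not taken from the paper).

    Row i of ref_emb and row i of mix_emb are treated as a positive pair;
    every other row in the batch acts as a negative.
    """
    logits = ref_emb @ mix_emb.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy data: 4 singers, 16-dimensional features (sizes are illustrative only).
rng = np.random.default_rng(0)
weights = 0.1 * rng.standard_normal((16, 8))
reference = rng.standard_normal((4, 16))
mixture = reference + 0.05 * rng.standard_normal((4, 16))

ref_emb = shared_encoder(reference, weights)
mix_emb = shared_encoder(mixture, weights)

loss_matched = info_nce(ref_emb, mix_emb)
# Rolling the rows pairs each reference with a different singer's mixture.
loss_mismatched = info_nce(ref_emb, np.roll(mix_emb, 1, axis=0))
print(loss_matched, loss_mismatched)
```

In this toy setup the correctly paired batch gives the lower loss, which is the signal a contrastive objective would use to pull a reference and its matching mixture together in the shared space.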

Abstract

This study extracts target singers' voices from mixed audio containing background music and noise, addressing the subjectivity, instability and lack of objective standards in traditional singing evaluation. It proposes SNN-SpEx+, a method that combines a Siamese neural network (SNN) with a contrastive learning-based SpEx+. Its parameter-sharing twin architecture unifies the feature space for reference and mixed speech, removing the feature-space misalignment that limits traditional dual-network structures. Contrastive learning is integrated into vocal extraction to build a 'separation is learning' joint optimisation framework, which improves adaptability to unknown singers and short reference voices. Experiments on MUSDB18-HQ and NSynth-Singer show that SNN-SpEx+ outperforms SpEx++ by 0.85 dB in SI-SDRi and 0.17 in PESQ. For short references (<2 s), its SI-SDRi drops by only 3.16 dB, a drop 3.4 dB smaller than that of SpEx++. The method provides an automatic, standardised evaluation tool for music education and singer selection.
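
The headline metric, SI-SDRi, is the scale-invariant signal-to-distortion ratio of the extracted voice minus that of the unprocessed mixture, both measured against the clean target. The sketch below uses the standard definition of SI-SDR; the function names, the synthetic sine-plus-noise signals and the 16 kHz sample rate are assumptions for illustration and are not taken from the paper's evaluation code.

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between a 1-D estimate and a 1-D target."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Remove DC offset so the measure ignores constant bias.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target; this makes the score
    # independent of the estimate's overall gain.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)

def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)

# Synthetic example: a 220 Hz tone stands in for the clean vocal.
sample_rate = 16000  # assumed for illustration
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 220 * t)
interference = np.random.default_rng(0).standard_normal(sample_rate)

mixture = clean + interference
estimate = clean + 0.1 * interference  # pretend the extractor removed 90% of the interference amplitude

improvement = si_sdr_improvement(estimate, mixture, clean)
print(f"SI-SDRi: {improvement:.2f} dB")
```

Because the interference amplitude is cut by a factor of ten, the improvement in this toy case comes out close to 20 dB. A reported gap of 0.85 dB between two systems is a difference in this same quantity, averaged over a test set.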

Authors

Xiaochen Liang

Journals

International Journal of Information and Communication Technology
