This study extracts a target singer's voice from mixed audio containing background music and noise, addressing the subjectivity, instability, and lack of objective standards in traditional evaluation. It proposes SNN-SpEx+, a method that combines a Siamese neural network (SNN) with contrastive learning-based SpEx+. The parameter-sharing twin architecture maps reference and mixed speech into a single feature space, removing the feature-space misalignment that limits traditional dual-network structures. Contrastive learning is integrated into vocal extraction to form a 'separation is learning' joint optimisation framework, which improves adaptability to previously unseen singers and short reference recordings. In experiments on MUSDB18-HQ and NSynth-Singer, SNN-SpEx+ outperforms SpEx++ by 0.85 dB in SI-SDRi and 0.17 in PESQ. With short references (<2 s), its SI-SDRi drops by only 3.16 dB, which is 3.4 dB less than the drop for SpEx++. The method offers an automatic, standardised evaluation tool for music education and singer selection.
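The headline metric, SI-SDRi, is the gain in scale-invariant signal-to-distortion ratio of the extracted voice over the unprocessed mixture, both measured against the clean target. The sketch below shows one common way to compute it. It is an illustration under stated assumptions, not the paper's evaluation code: the function names `si_sdr` and `si_sdr_improvement`, the mean-removal step, the `eps` constant, and the synthetic test signals are all assumptions introduced here.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)
    # Assumption: remove the mean of each signal first (a common convention).
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the scaled target part.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]
    # Everything not explained by the scaled reference counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Hypothetical toy signals: a sine "voice" plus an orthogonal cosine "accompaniment".
reference = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
interference = [math.cos(2 * math.pi * 13 * t / 100) for t in range(100)]
mixture = [r + i for r, i in zip(reference, interference)]
# Pretend an extractor reduced the interference amplitude by a factor of 10.
estimate = [r + 0.1 * i for r, i in zip(reference, interference)]

# Close to 20 dB by construction (interference energy reduced 100-fold).
print(round(si_sdr_improvement(estimate, mixture, reference), 2))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the extractor output does not change the score, which is why the metric suits systems whose output gain is arbitrary.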
Xiaochen Liang
International Journal of Information and Communication Technology
www.synapsesocial.com/papers/69e07d1d2f7e8953b7cbe2ed — DOI: https://doi.org/10.1504/ijict.2026.152862