To analyze “slight variation of melody”, a core feature of ethnic music, accurately and automatically, and to overcome the dependence of existing computational models on massive labeled data, this study proposes and verifies a new analysis framework named PESTO-Unsupervised Clustering (PESTO-UC). The framework tightly integrates the self-supervised pitch estimation algorithm PESTO (Pitch Estimation with Self-supervised Transposition-equivariant Objective) with unsupervised clustering. First, PESTO extracts high-precision fundamental frequency (F0) curves directly from the raw audio, which avoids the need for massive labeled data. Second, the K-Means clustering algorithm analyzes the extracted pitch data to automatically discover the underlying scale structure of the music. Experimental evaluation on Indian classical music datasets shows that the model performs well, with a Raw Pitch Error (RPE) as low as 7.81 and a Normalized Mutual Information (NMI) score as high as 0.941 on the scale discovery task. The proposed model therefore provides an efficient, objective, and widely applicable end-to-end solution, and offers an extensible computational approach for the digital preservation and in-depth theoretical study of ethnic music heritage for which resources are scarce.
Rong Li (Wed,) studied this question.
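The clustering and evaluation stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an F0 curve in Hz is already available (the PESTO extraction step is replaced here by synthetic data), that the tonic frequency and the number of scale degrees k are known, and that NMI is normalized by the arithmetic mean of the two entropies. All function names, the five-note scale, and the jitter level are hypothetical, and octave folding is omitted for simplicity.

```python
import math
import random
from collections import Counter


def hz_to_cents(f0_hz, tonic_hz):
    """Pitch in cents relative to the tonic (no octave folding, for simplicity)."""
    return 1200.0 * math.log2(f0_hz / tonic_hz)


def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j])) for v in values]


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's k-means on scalars with a deterministic quantile initialisation."""
    ordered = sorted(values)
    centers = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(n_iter):
        labels = assign(values, centers)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)


def nmi(labels_a, labels_b):
    """Normalized mutual information (arithmetic-mean normalization assumed)."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mi = sum(
        c / n * math.log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    denom = (h_a + h_b) / 2.0
    return mi / denom if denom > 0 else 1.0


def synthetic_f0(degrees_cents, tonic_hz=220.0, frames_per_note=40,
                 jitter_cents=10.0, seed=0):
    """Stand-in for an extracted F0 curve: noisy frames around each scale degree."""
    rng = random.Random(seed)
    f0, truth = [], []
    for idx, degree in enumerate(degrees_cents):
        for _ in range(frames_per_note):
            cents = degree + rng.gauss(0.0, jitter_cents)
            f0.append(tonic_hz * 2.0 ** (cents / 1200.0))
            truth.append(idx)
    return f0, truth


degrees = [0.0, 200.0, 400.0, 700.0, 900.0]  # hypothetical five-note scale
f0_hz, truth = synthetic_f0(degrees)
cents = [hz_to_cents(f, 220.0) for f in f0_hz]
centers, labels = kmeans_1d(cents, k=len(degrees))
print("estimated scale degrees (cents):", sorted(round(c) for c in centers))
print("NMI against ground truth:", round(nmi(truth, labels), 3))
```

In this sketch the cluster centers play the role of the discovered scale degrees, and NMI compares the cluster assignment with the known note labels. Real recordings would additionally need unvoiced-frame removal and a way to choose k, neither of which is shown here.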