This paper proposes a model that integrates YOLO V11 with the high-resolution network (HRNet) for real-time detection and error correction of pipa fingering actions. The object detector YOLO V11 rapidly and accurately locates the left and right hands playing the pipa in video footage. The detected hand regions are then cropped and fed into the pose estimation algorithm HRNet. By checking whether the estimated finger positions deviate from standard angles and coordinates by more than specified thresholds, the model identifies errors such as finger bending or wrist collapse. After training, the proposed model achieves an average accuracy of 85% at an intersection over union (IoU) threshold of 0.75 on the dual-hand playing detection task. On the hand pose estimation task, it attains an average accuracy of 88% at an object keypoint similarity (OKS) threshold of 0.75.
Jingyi Xiong studied this question.
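The threshold-based error check described in the abstract could look roughly like the following. This is a minimal sketch, not the paper's implementation: it assumes 2D keypoints given as `(x, y)` tuples, and the function names `joint_angle` and `is_fingering_error`, the standard angle of 180 degrees, and the 15 degree threshold are all hypothetical values chosen for illustration.

```python
import math


def joint_angle(a, b, c):
    """Return the angle at point b (in degrees) formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must be distinct to define an angle")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def is_fingering_error(keypoints, standard_angle_deg, threshold_deg):
    """Flag an error when the joint angle deviates from the standard by more than the threshold.

    `keypoints` is a triple of (x, y) points, e.g. three consecutive finger
    joints produced by a pose estimator. The standard angle and threshold
    are hypothetical here; the paper does not state its values.
    """
    angle = joint_angle(*keypoints)
    return abs(angle - standard_angle_deg) > threshold_deg


# Illustrative keypoints: a straight finger and one bent at a right angle.
straight_finger = ((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))
bent_finger = ((1.0, 0.0), (0.0, 0.0), (0.0, 1.0))

print(is_fingering_error(straight_finger, standard_angle_deg=180.0, threshold_deg=15.0))
print(is_fingering_error(bent_finger, standard_angle_deg=180.0, threshold_deg=15.0))
```

The same pattern would extend to coordinate deviations (for example, a wrist keypoint falling below a reference line) by comparing a distance, rather than an angle, against its own threshold.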