Recent developments in computer vision have significantly enhanced the automation and objectivity of sports analytics. This paper proposes a novel deep learning-based framework for estimating football possession directly from broadcast video, eliminating the reliance on manual annotations or event-based data, which are often labor-intensive, subjective, and temporally coarse. The framework incorporates two structurally improved object detection models: YOLOv8-P2S3A for football detection and YOLOv8-HWD3A for player detection. These models are more accurate than the baseline detectors, achieving 79.4% and 71.1% validation average precision, respectively, while maintaining low computational latency. Team identification is accomplished through unsupervised DBSCAN clustering on jersey color features, enabling robust, label-free team assignment across diverse match scenarios. Object trajectories are maintained via the Norfair multi-object tracking algorithm, and a temporally aware refinement module ensures accurate estimation of ball possession durations. Extensive experiments were conducted on a dataset comprising 20 full-match video clips. The proposed system achieved a root mean square error (RMSE) of 4.87 in possession estimation, outperforming all evaluated baselines, including YOLOv10n (RMSE: 5.12) and YOLOv11 (RMSE: 5.17), with a substantial improvement over YOLOv6n (RMSE: 12.73). These results substantiate the effectiveness of the proposed framework in enhancing the precision, efficiency, and automation of football analytics, offering practical value for coaches, analysts, and sports scientists in professional settings.
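To make the reported metric concrete, the following is a minimal sketch of how per-frame possession labels could be turned into possession percentages and scored with RMSE. It is not the authors' implementation: the function names `possession_percentages` and `rmse`, the team labels `"A"` and `"B"`, the use of `None` for frames with no ball owner, and the choice of percentage points as the unit are all assumptions made for illustration.

```python
import math
from collections import Counter


def possession_percentages(frame_labels):
    """Return each team's share of possession in percent.

    frame_labels: one entry per video frame, holding a team label
    (hypothetical "A"/"B") or None when no team controls the ball.
    Frames labeled None are excluded from the total (an assumption).
    """
    counts = Counter(label for label in frame_labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {team: 100.0 * n / total for team, n in counts.items()}


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example: five frames, one of them with no ball owner.
frames = ["A", "A", None, "B", "A"]
shares = possession_percentages(frames)

# Toy comparison of estimated vs. reference possession for two clips.
error = rmse([75.0, 40.0], [70.0, 45.0])
```

In this toy run, team "A" holds the ball in three of the four owned frames, and both clip estimates are off by five percentage points, so the RMSE is 5.0. Under the same assumed unit, the reported RMSE of 4.87 would correspond to a typical deviation of just under five percentage points per clip.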
Rong Guo
Yucheng Zeng
Rong Deng
Guo et al. studied this question.
www.synapsesocial.com/papers/699405bb4e9c9e835dfd691b — DOI: https://doi.org/10.3390/s26041252