What question did this study set out to answer?

Can a semi-supervised approach reduce the reliance of depth-based 3D hand pose estimation on labeled training data?

February 8, 2026

Advancing depth-based semi-supervised three-dimensional hand pose estimation with consistency training

Key Points

Aims to reduce reliance on labeled training data for depth-based 3D hand pose estimation through semi-supervised learning.
Proposes a semi-supervised Deep Learning method with two identical, jointly trained networks: a teacher and a student.
The teacher network is trained on both labeled and unlabeled samples, using a loss that encourages its estimates to be equivariant under a set of affine transformations.
The student network is trained on unlabeled samples with pseudo-labels provided by the teacher; only the student is used at test time.
Achieves accuracy comparable to, and sometimes exceeding, fully supervised state-of-the-art methods while using only 25% of the available labels.
Remains effective with just 1% of labels, surpassing prior semi-supervised techniques.
Reports mean distance errors of 6.94 mm on ICVL and 8.71 mm on NYU with 1% of labels.
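
The mean distance error reported for ICVL and NYU is commonly computed as the average Euclidean distance between predicted and ground-truth joint positions. The sketch below illustrates that metric in Python with NumPy, together with one plausible way to draw a 25% or 1% label budget. The function names `mean_distance_error` and `sample_labeled_indices`, the array layout, and uniform random subsampling are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def mean_distance_error(pred, gt):
    """Average Euclidean distance between predicted and true joints.

    pred, gt: arrays of shape (num_samples, num_joints, 3), assumed to be
    in millimetres so that the result is also in millimetres.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    per_joint = np.linalg.norm(pred - gt, axis=-1)  # (num_samples, num_joints)
    return float(per_joint.mean())


def sample_labeled_indices(num_samples, fraction, seed=0):
    """Pick a random subset of indices to treat as labeled (assumed protocol)."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(num_samples * fraction)))
    return np.sort(rng.choice(num_samples, size=k, replace=False))


# Toy usage: every joint is off by the vector (3, 4, 0) mm.
gt = np.zeros((2, 21, 3))
pred = gt + np.array([3.0, 4.0, 0.0])
error_mm = mean_distance_error(pred, gt)

# Keep labels for a quarter of a hypothetical 1000-sample training set.
labeled = sample_labeled_indices(1000, 0.25)
```

In a semi-supervised run, the samples outside `labeled` would have their annotations withheld and be used only as unlabeled data.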

Abstract

Despite the significant progress that depth-based three-dimensional (3D) hand pose estimation methods have made in recent years, thanks to advancements in Deep Learning, they still require a large amount of labeled training data to achieve high accuracy. However, collecting such data is both costly and time-consuming. To tackle this issue, we propose a semi-supervised Deep Learning method to significantly reduce the dependence on labeled training data. The proposed method consists of two identical networks trained jointly: a teacher network and a student network. The teacher network is trained using both the available labeled and unlabeled samples. It leverages the unlabeled samples via a loss formulation that encourages estimation equivariance under a set of affine transformations. The student network is trained using the unlabeled samples with their pseudo-labels provided by the teacher network. For inference at test time, only the student network is used. Extensive experiments on challenging benchmarks (ICVL, NYU, MSRA) demonstrate the proposed method’s effectiveness. Notably, our approach significantly outperforms state-of-the-art semi-supervised methods across all datasets. Crucially, using only 25% of the available labeled data, our method achieves accuracy comparable to, and sometimes exceeding, fully supervised state-of-the-art methods trained on 100% of the labels. Even with just 1% of labels, our method surpasses prior semi-supervised techniques, achieving a mean distance error of 6.94 mm and 8.71 mm on ICVL and NYU, respectively. These results signify a substantial reduction in annotation requirements, making high-accuracy 3D hand pose estimation more practical and accessible.
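
The two unlabeled-data signals described in the abstract can be illustrated with a small sketch. Equivariance means that estimating the pose of a transformed input should give the same result as transforming the pose estimated from the original input, so a consistency loss can penalise the gap between the two. The code below is a simplified illustration, not the authors' implementation: it operates on 3D point sets rather than depth images, uses a squared L2 penalty, and stands in toy functions for the networks. All names (`apply_affine`, `consistency_loss`, `pseudo_label_loss`, `centroid_predictor`, `biased_predictor`) are hypothetical.

```python
import numpy as np


def apply_affine(points, A, t):
    """Apply x -> A @ x + t to points of shape (..., 3)."""
    return points @ A.T + t


def consistency_loss(predict, cloud, A, t):
    """Mean squared gap between predict(T(x)) and T(predict(x))."""
    pred_of_transformed = predict(apply_affine(cloud, A, t))
    transformed_pred = apply_affine(predict(cloud), A, t)
    diff = pred_of_transformed - transformed_pred
    return float(np.mean(np.sum(diff ** 2, axis=-1)))


def pseudo_label_loss(student_pred, teacher_pred):
    """Student regresses toward teacher outputs, treated as fixed targets."""
    diff = student_pred - teacher_pred
    return float(np.mean(np.sum(diff ** 2, axis=-1)))


def centroid_predictor(cloud):
    """Toy 'network': one joint at the centroid (affine-equivariant)."""
    return cloud.mean(axis=0, keepdims=True)


def biased_predictor(cloud):
    """Toy 'network' with a constant offset, which breaks equivariance."""
    return cloud.mean(axis=0, keepdims=True) + np.array([1.0, 0.0, 0.0])


# An example affine map: rotate 90 degrees about z, scale by 1.1, translate.
theta = np.pi / 2
A = 1.1 * np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
t = np.array([5.0, -2.0, 10.0])

cloud = np.random.default_rng(0).normal(size=(256, 3))  # unlabeled sample

loss_equivariant = consistency_loss(centroid_predictor, cloud, A, t)
loss_biased = consistency_loss(biased_predictor, cloud, A, t)

# Teacher output on the unlabeled sample serves as the student's pseudo-label.
teacher_pred = centroid_predictor(cloud)
student_loss = pseudo_label_loss(biased_predictor(cloud), teacher_pred)
```

Here the centroid predictor incurs essentially zero consistency loss because the centroid commutes with any affine map, while the offset predictor is penalised. In training, a term of this kind would be minimised on unlabeled samples alongside the supervised loss on labeled ones.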

Cite this study

Rezaei et al. studied this question.

www.synapsesocial.com/papers/6987eb5df6bacdd2fe8fca6c — DOI: https://doi.org/10.1007/s11042-026-21305-7

Also consider

Synapse has enriched 4 closely related papers on similar questions. Consider them for comparative context:

TriHorn-Net: A model for accurate depth-based 3D hand pose estimation · 2023 · 47 citations
Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation · 2020 · 121 citations
SO-HandNet: Self-Organizing Network for 3D Hand Pose Estimation With Semi-Supervised Learning · 2019 · 94 citations
LSMVC: Low-rank Semi-supervised Multi-view Clustering for Special Equipment Safety Warning · 2021 · 9 citations

Authors

Mohammad Rezaei

Farnaz Farahanipad

Alex Dillhoff

Journals

Multimedia Tools and Applications
