Semi-supervised learning (SSL) reduces reliance on large-scale annotated datasets by leveraging unlabeled data. However, existing SSL methods often struggle with semantic ambiguity, especially under limited supervision. Recent studies have incorporated textual information to provide contextual guidance, yet most focus on feature fusion rather than on the target semantics that are critical for segmentation. In this paper, we propose a novel Text-anchored Visual Decoupling (TeViD) framework for semi-supervised medical image segmentation. TeViD is built upon a teacher-student architecture with a dual-decoder design that explicitly disentangles target and background representations using both labeled and unlabeled data. For unlabeled data, a reversed cross-supervision mechanism is introduced to enhance decoder diversity and semantic separation. Furthermore, two contrastive learning objectives are proposed: a teacher-guided visual contrastive loss and a text-anchored contrastive loss, which reinforce semantic disentanglement from the visual and textual perspectives, respectively. Extensive experiments on five public datasets (covering X-ray, pathology, ultrasound, MRI, and CT) demonstrate that TeViD consistently outperforms both standard SSL and text-enhanced SSL methods, achieving average improvements of 5.72% in Dice and 8.15% in mIoU over the second-best competitor. The code is available at: https://github.com/jgfiuuuu/TeViD.
Zeng et al. studied this question.
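The abstract reports gains in Dice and mIoU. For readers unfamiliar with these overlap metrics, the following is a minimal sketch of how Dice and IoU are commonly computed for one pair of binary masks. The function name `dice_and_iou`, the flat 0/1 list representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; they are not taken from the TeViD code, and the paper's exact evaluation protocol (for example, how IoU is averaged into mIoU) is not specified in this passage.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two flat binary masks given as 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # Count pixels that are foreground in both masks, and in each mask alone.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy example: 6-pixel masks that agree on 2 foreground pixels.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

mIoU is then typically obtained by averaging IoU over classes or over images. Dice is never lower than IoU for the same prediction, which is why the two metrics are usually reported side by side rather than compared directly.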