Abstract: Surgical video workflow analysis has advanced rapidly in computer-assisted surgery through the use of deep learning models, with the aim of improving surgical scene analysis and decision-making. However, previous research has focused primarily on coarse-grained analysis of surgical videos, e.g., phase recognition, instrument recognition, and triplet recognition that considers only relationships within surgical triplets. To provide a more comprehensive, fine-grained analysis of surgical videos, this work focuses on accurately identifying triplets from surgical videos. Specifically, we propose a vision-language deep learning framework that incorporates intra- and inter-triplet modeling, termed I²TM, to explore the relationships among triplets and leverage the model's understanding of the entire surgical process, thereby enhancing the accuracy and robustness of recognition. We also develop a new surgical triplet semantic enhancer (TSE) to establish semantic relationships, both intra- and inter-triplet, across visual and textual modalities. Extensive experimental results on surgical video benchmark datasets demonstrate that our approach captures finer semantics and achieves effective surgical video understanding and analysis, with potential for widespread medical applications.
Li et al. studied this question.
www.synapsesocial.com/papers/69dbe1d6eb8801008ea3c196 — DOI: https://doi.org/10.1038/s44401-024-00010-3
Pengpeng Li
Xiangbo Shu
Chun-Mei Feng
Agency for Science, Technology and Research
Harbin Institute of Technology
Nanjing Medical University