Automated segmentation of kidney tumors from computed tomography (CT) scans is critical for diagnosis, treatment planning, and monitoring of renal cell carcinoma (RCC). While recent deep learning models report high Dice scores (>0.97), their clinical utility remains questionable because of false positive predictions that misclassify healthy tissue as tumor and computational constraints that limit real-world deployment. Unlike existing studies that emphasise quantitative metrics, this work investigates the gap between high segmentation accuracy and clinical applicability. We systematically evaluate six diverse architectures, ranging from 2D CNNs (U-Net, MedSAM) to 3D volumetric models (nnU-Net, UNETR, TotalSegmentator, MIScnn), on the KiTS19 dataset, emphasising false positive analysis, boundary delineation accuracy, and computational feasibility. Key findings: (1) MONAI U-Net achieves a Dice score of 0.98 but exhibits excessive false positives, undermining clinical trust; (2) nnU-Net provides balanced performance (Dice: 0.82) with consistent results but demands 16 GB of VRAM; (3) MedSAM achieves state-of-the-art accuracy (Dice: 0.99) with minimal false positives but requires high-end GPUs; (4) computational constraints prevented full training of UNETR. This study shows that high Dice scores do not guarantee clinical utility and provides actionable insights for developing clinically feasible segmentation tools for renal oncology applications, including treatment planning, longitudinal monitoring, and risk assessment.
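To make the central claim concrete, the following minimal sketch shows how a prediction can score a very high Dice value while still containing a spurious tumor region. It is an illustration only, not the authors' evaluation code: the function names `dice_score` and `false_positive_voxels`, the 32x32x32 volume, and the toy masks are all assumptions introduced here.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Both masks empty: treat as perfect agreement (a common convention).
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def false_positive_voxels(pred, truth):
    """Number of voxels predicted as tumor that are not tumor in the ground truth."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    return int(np.logical_and(pred, ~truth).sum())


# Hypothetical toy volume: one true tumor of 10 x 10 x 10 = 1000 voxels.
truth = np.zeros((32, 32, 32), dtype=bool)
truth[5:15, 5:15, 5:15] = True

# Prediction: the tumor is recovered exactly, plus a separate 2 x 2 x 2 blob
# placed in healthy tissue (8 false positive voxels).
pred = truth.copy()
pred[25:27, 25:27, 25:27] = True

dice = dice_score(pred, truth)            # 2 * 1000 / (1008 + 1000), about 0.996
fp = false_positive_voxels(pred, truth)   # 8

print(f"Dice = {dice:.4f}, false positive voxels = {fp}")
```

In this toy case the Dice score is about 0.996 even though the prediction contains an isolated region that a clinician could read as a second lesion, which is why a volume-overlap metric alone may not reflect clinical trustworthiness.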
Rahul Lalwani
Akshada Telang
Vibha Tiwari
Journal of Medical Engineering & Technology
Artificial Intelligence in Medicine (Canada)
Lalwani et al. (Wed,) studied this question.
www.synapsesocial.com/papers/69e1cecc5cdc762e9d857c1e — DOI: https://doi.org/10.1080/03091902.2026.2651078