What question did this study set out to answer?

This work aims to evaluate the effectiveness of deep learning models in segmenting kidney tumors while highlighting discrepancies between accuracy and clinical usability.

April 17, 2026

A multi-paradigm evaluation spanning pixels to voxels for deep learning-based kidney tumor segmentation

Key Points

Systematic evaluation of six diverse architectures from 2D CNNs to 3D models.
Utilization of the KiTS19 dataset for performance analysis.
Focus on false positive rates, boundary delineation accuracy, and computational demands.
MONAI U-Net achieved a Dice score of 0.98 but had excessive false positives.
nnU-Net displayed balanced performance with a Dice score of 0.82, requiring 16GB VRAM.
MedSAM achieved state-of-the-art accuracy with a Dice score of 0.99 but necessitated high-end GPUs.
Computational constraints hindered the full training of the UNETR model.

Abstract

Automated segmentation of kidney tumors from computed tomography (CT) scans is critical for diagnosis, treatment planning, and monitoring of renal cell carcinoma (RCC). While recent deep learning models report high Dice scores (>0.97), their clinical utility remains questionable due to false positive predictions that misclassify healthy tissue as tumors and computational constraints limiting real-world deployment. Unlike existing studies that emphasise quantitative metrics, this work investigates the critical gap between high segmentation accuracy and clinical applicability. We systematically evaluate six diverse architectures spanning 2D CNNs (U-Net, MedSAM) to 3D volumetric models (nnU-Net, UNETR, TotalSegmentator, MIScnn) on the KiTS19 dataset, emphasising false positive analysis, boundary delineation accuracy, and computational feasibility. Key findings: (1) MONAI U-Net achieves a Dice score of 0.98 but exhibits excessive false positives, undermining clinical trust; (2) nnU-Net provides balanced performance (Dice: 0.82) with consistent results but demands 16GB VRAM; (3) MedSAM achieves state-of-the-art accuracy (Dice: 0.99) with minimal false positives but requires high-end GPUs; (4) computational constraints prevented full training of UNETR. This study identifies that high Dice scores do not guarantee clinical utility and provides actionable insights for developing clinically feasible segmentation tools for renal oncology applications including treatment planning, longitudinal monitoring, and risk assessment.
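The abstract's central claim, that a high Dice score can coexist with false positives, can be illustrated with a small sketch. The code below is not the authors' evaluation pipeline. It assumes flattened binary masks (1 = tumor, 0 = healthy), uses the standard Dice definition 2|P ∩ T| / (|P| + |T|), and runs on made-up toy data. The helper names `dice_score` and `false_positive_count` are hypothetical.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def false_positive_count(pred: Sequence[int], truth: Sequence[int]) -> int:
    """Number of voxels predicted as tumor that are healthy in the ground truth."""
    return sum(1 for p, t in zip(pred, truth) if p and not t)


# Toy data (not from the study): 1000 voxels, 100 of them tumor.
truth = [1] * 100 + [0] * 900
# The prediction recovers every tumor voxel but also flags 4 healthy voxels.
pred = [1] * 100 + [1] * 4 + [0] * 896

print(round(dice_score(pred, truth), 3))   # 0.98
print(false_positive_count(pred, truth))   # 4
```

In this toy case the Dice score rounds to 0.98 even though four healthy voxels are labelled as tumor. This is why a voxel-overlap score alone says little about where the errors fall or whether they matter clinically.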

Authors

Rahul Lalwani

Akshada Telang

Vibha Tiwari

Journals

Journal of Medical Engineering & Technology

Institutions

Artificial Intelligence in Medicine (Canada)
