Malignant melanoma (MM) is the most aggressive form of skin cancer, and early detection is critical and strongly associated with improved survival. Recent advances in large language models (LLMs), such as ChatGPT and Gemini, present promising opportunities to support early melanoma screening and clinical decision-making. However, despite increasing interest in LLM-based dermatologic applications, their diagnostic reliability across different populations remains insufficiently characterized. In this study, we systematically evaluated the performance of GPT-5.2 across skin pigmentation groups using Milk10K, a clinically curated, publicly available dermatology dataset comprising paired dermoscopic and clinical close-up images with histopathology-confirmed diagnoses and standardized skin tone annotations. GPT-5.2 was assessed on two clinically relevant tasks: binary malignancy discrimination and top-3 differential diagnosis. A balanced subset of 460 lesions (92 per skin tone class, i.e., five classes) was randomly selected for evaluation. Across both tasks and imaging conditions, GPT-5.2 showed moderate diagnostic performance, with broadly consistent accuracy, F1 score, and Cohen’s κ across skin tone groups and no evidence of a systematic performance decline in darker skin tones. Adding clinical close-up images modestly improved overall performance while maintaining similar behavior across pigmentation classes. These findings suggest that GPT-5.2 exhibits stable melanoma-related diagnostic performance across diverse skin tones on this dataset. The study’s limitations and implications for future development are also discussed.
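To make the per-group comparison concrete, the following is a minimal sketch of how accuracy, F1 score, and Cohen's κ could be computed separately for each skin tone class in the binary malignancy task. It is an illustration under stated assumptions, not the authors' evaluation code: the function names, the record layout `(tone, truth, prediction)`, the group labels, and the toy data are all hypothetical, and labels are assumed to be coded as 1 (malignant) and 0 (benign).

```python
from collections import defaultdict


def binary_metrics(y_true, y_pred):
    """Accuracy, F1 (positive class = 1), and Cohen's kappa for binary labels."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / n
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    # Chance agreement: product of the marginal rates, summed over both classes.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "f1": f1, "kappa": kappa}


def metrics_by_group(records):
    """Group (tone, truth, prediction) records by tone and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for tone, truth, pred in records:
        grouped[tone][0].append(truth)
        grouped[tone][1].append(pred)
    return {tone: binary_metrics(t, p) for tone, (t, p) in grouped.items()}


# Toy data only (hypothetical group labels, not results from the study).
toy_records = [
    ("tone_a", 1, 1), ("tone_a", 1, 0), ("tone_a", 0, 0), ("tone_a", 0, 0),
    ("tone_b", 1, 1), ("tone_b", 0, 0), ("tone_b", 1, 1), ("tone_b", 0, 0),
]
for tone, scores in sorted(metrics_by_group(toy_records).items()):
    print(tone, scores)
```

Reporting the three metrics side by side for each group, rather than a single pooled score, is what allows a systematic drop in one pigmentation class to be seen; κ additionally corrects for agreement expected by chance given each group's class balance.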
Katie L. Frederickson
Samuel E. Adunyah
Qingguo Wang
Frontiers in Medicine
Meharry Medical College
Frederickson et al. studied this question.
synapsesocial.com/papers/6a160b7fc4bcdd6cffc5bb9b — DOI: https://doi.org/10.3389/fmed.2026.1816102