Background Intracranial hemorrhage is a potentially fatal neurological emergency that requires rapid diagnosis to guide management. Non-contrast computed tomography (NCCT) is the primary imaging method for detecting acute intracranial bleeding because of its speed and accessibility. Recent advances in artificial intelligence (AI), including large language models such as ChatGPT (OpenAI, San Francisco, CA, USA), offer new opportunities to support radiological interpretation.
Objective This study evaluated ChatGPT's ability to detect intracranial hemorrhage on NCCT brain images and compared its diagnostic performance with that of radiologists.
Methods A retrospective case-control study analyzed 276 computed tomography (CT) brain scans obtained between December 2025 and February 2026. The dataset comprised 138 cases with confirmed intracranial hemorrhage and 138 control cases without hemorrhage. CT images were evaluated by ChatGPT using a structured prompt, and radiologist reports served as the reference standard. Diagnostic performance was measured by sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV). Statistical analyses included chi-squared tests, Cohen's kappa coefficient, and McNemar's test.
Results ChatGPT correctly classified 124 of 138 hemorrhage-positive cases and 117 of 138 hemorrhage-negative cases, corresponding to a sensitivity of 89.9%, a specificity of 84.8%, and a diagnostic accuracy of 87.3%. By subtype, sensitivity was highest for intraparenchymal hemorrhage (88.2%), followed by subarachnoid (73.8%), epidural (66.7%), and subdural (61.5%) hemorrhage. ChatGPT predictions were significantly associated with radiologist diagnoses (χ² = 154.1; p < 0.001), and agreement between ChatGPT and radiologist interpretations was good (κ = 0.75). McNemar's test showed no statistically significant difference between ChatGPT and radiologist diagnoses (p = 0.31).
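The headline figures in the Results can be re-derived from the reported counts. The minimal Python sketch below assumes the 2×2 table TP = 124, FN = 14, FP = 21, TN = 117, with radiologist reports as the reference standard. It also assumes an uncorrected Pearson chi-squared test and McNemar's test with continuity correction; the abstract does not name these variants, so they are assumptions chosen because they are consistent with the reported χ² (about 154.15 before rounding) and p = 0.31. The function names `diagnostic_summary` and `chi2_sf_df1` are hypothetical.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom."""
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def diagnostic_summary(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Hypothetical helper: 2x2 diagnostic metrics against a reference standard.

    tp/fn: reference-positive cases the index test called positive/negative.
    fp/tn: reference-negative cases the index test called positive/negative.
    """
    n = tp + fn + fp + tn
    test_pos, test_neg = tp + fp, fn + tn
    ref_pos, ref_neg = tp + fn, fp + tn

    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement is computed from the row and column marginals.
    p_exp = (test_pos * ref_pos + test_neg * ref_neg) / n**2
    kappa = (accuracy - p_exp) / (1 - p_exp)

    # Pearson chi-squared test of association, no continuity correction (assumption).
    chi2 = n * (tp * tn - fn * fp) ** 2 / (ref_pos * ref_neg * test_pos * test_neg)

    # McNemar's test on the discordant pairs (fn, fp), with continuity
    # correction (assumption). Requires at least one discordant pair.
    mcnemar = max(abs(fn - fp) - 1, 0) ** 2 / (fn + fp)

    return {
        "sensitivity": tp / ref_pos,
        "specificity": tn / ref_neg,
        "accuracy": accuracy,
        "ppv": tp / test_pos,
        "npv": tn / test_neg,
        "kappa": kappa,
        "chi2": chi2,
        "chi2_p": chi2_sf_df1(chi2),
        "mcnemar": mcnemar,
        "mcnemar_p": chi2_sf_df1(mcnemar),
    }


if __name__ == "__main__":
    # Counts from the Results: 124/138 positives and 117/138 negatives correct.
    summary = diagnostic_summary(tp=124, fn=14, fp=21, tn=117)
    for name, value in summary.items():
        print(f"{name:12s} {value:.4f}")
```

Running the script prints each metric to four decimal places. The PPV and NPV it reports are arithmetic consequences of the abstract's counts, not values stated by the study.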
Conclusion ChatGPT showed promising sensitivity for detecting intracranial hemorrhage on NCCT brain scans. However, its moderate specificity suggests that it should serve as an adjunct to, rather than a substitute for, radiologist interpretation. Further research with larger datasets and model optimization is needed before clinical implementation.
Shankar et al. studied this question.