April 17, 2026 · Open Access

Diagnostic Role of ChatGPT in the Detection of Intracranial Hemorrhage on Non-contrast Computed Tomography: A Retrospective Case-Control Study

Key Points

Evaluated ChatGPT's ability to detect intracranial hemorrhage on non-contrast CT scans and compared its performance with that of radiologists
Retrospective case-control design analyzing 276 CT brain scans obtained from December 2025 to February 2026
Dataset consisted of 138 confirmed intracranial hemorrhage cases and 138 control cases without hemorrhage
CT images evaluated by ChatGPT using a structured prompt, with radiologist reports serving as the reference standard
Measured diagnostic performance through sensitivity, specificity, accuracy, PPV, and NPV
Applied chi-squared tests and Cohen's kappa coefficient for statistical analyses
ChatGPT identified 124 of 138 hemorrhage-positive cases, achieving 89.9% sensitivity
Achieved 84.8% specificity by identifying 117 of 138 hemorrhage-negative cases
Overall diagnostic accuracy was 87.3%
Among hemorrhage subtypes, highest sensitivity observed for intraparenchymal hemorrhage (88.2%)
Good agreement with radiologist interpretations (κ = 0.75) and no statistically significant difference between ChatGPT and radiologist diagnoses on McNemar's test (p = 0.31)
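
The headline figures can be re-derived from a 2x2 confusion matrix. The sketch below is illustrative only and is not the authors' analysis code: the `ConfusionMatrix` class is a hypothetical helper, radiologist reports are treated as ground truth (as the study describes), and the false-negative and false-positive counts (14 and 21) are assumed to follow by subtracting the reported detections from the 138 cases in each group. PPV and NPV are named as outcome measures but not quoted in this summary, so the values the sketch computes for them are derived, not reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical helper: ChatGPT calls scored against radiologist reports."""

    tp: int  # hemorrhage on the reference read, flagged by ChatGPT
    fn: int  # hemorrhage on the reference read, missed by ChatGPT
    fp: int  # no hemorrhage on the reference read, flagged by ChatGPT
    tn: int  # no hemorrhage on the reference read, cleared by ChatGPT

    def sensitivity(self) -> float:
        # Share of true hemorrhage cases that ChatGPT flagged.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of hemorrhage-free controls that ChatGPT cleared.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Share of all scans classified the same way as the reference.
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def ppv(self) -> float:
        # Share of ChatGPT-positive scans that truly contain hemorrhage.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Share of ChatGPT-negative scans that are truly hemorrhage-free.
        return self.tn / (self.tn + self.fn)


# Assumed counts: 124 of 138 positives detected, 117 of 138 negatives cleared.
cm = ConfusionMatrix(tp=124, fn=138 - 124, fp=138 - 117, tn=117)

for name, value in {
    "sensitivity": cm.sensitivity(),
    "specificity": cm.specificity(),
    "accuracy": cm.accuracy(),
    "PPV (derived)": cm.ppv(),
    "NPV (derived)": cm.npv(),
}.items():
    print(f"{name}: {value:.1%}")
```

Under these assumptions the sketch reproduces 89.9% sensitivity, 84.8% specificity, and 87.3% accuracy, and gives a derived PPV of about 85.5% and NPV of about 89.3%. Because the case-control design fixes hemorrhage prevalence at 50% (138 of 276 scans), those derived PPV and NPV values describe this balanced sample only; at a different clinical prevalence they would change even if sensitivity and specificity held.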

Abstract

Background Intracranial hemorrhage is a potentially fatal neurological emergency that requires rapid diagnosis to guide management. Non-contrast computed tomography (NCCT) is the primary imaging method for detecting acute intracranial bleeding because of its speed and accessibility. Recent advances in artificial intelligence (AI), including large language models such as ChatGPT (OpenAI, San Francisco, CA, USA), offer new opportunities to support radiological interpretation.
Objective This study evaluated ChatGPT's ability to detect intracranial hemorrhage on NCCT brain images and compared its diagnostic performance with that of radiologists.
Methods A retrospective case-control study analyzed 276 computed tomography (CT) brain scans obtained from December 2025 to February 2026. The dataset comprised 138 cases with confirmed intracranial hemorrhage and 138 control cases without hemorrhage. CT images were evaluated using ChatGPT with a structured prompt, and radiologist reports served as the reference standard. Diagnostic performance was measured by sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV). Statistical analyses included chi-squared tests, Cohen's kappa coefficient, and McNemar's test.
Results ChatGPT identified 124 of 138 hemorrhage-positive cases and 117 of 138 hemorrhage-negative cases, yielding 89.9% sensitivity, 84.8% specificity, and 87.3% diagnostic accuracy. Subtype analysis showed the highest sensitivity for intraparenchymal hemorrhage (88.2%), followed by subarachnoid (73.8%), epidural (66.7%), and subdural (61.5%) hemorrhage. ChatGPT predictions were significantly associated with radiologist diagnoses (χ² = 154.1; p < 0.001), and agreement between ChatGPT and radiologist interpretations was good (κ = 0.75). McNemar's test showed no statistically significant difference between ChatGPT and radiologist diagnoses (p = 0.31).
Conclusion ChatGPT exhibited promising sensitivity in detecting intracranial hemorrhage on NCCT brain scans. However, its moderate specificity suggests it should serve as an adjunct to, rather than a substitute for, radiologist interpretation. Additional research involving larger datasets and model optimization is necessary prior to clinical implementation.
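
As a consistency check, the agreement statistics in the Results can be recomputed from the same four counts. This is a minimal standard-library sketch under stated assumptions: the helper names are hypothetical, the counts (14 missed hemorrhages, 21 false alarms) are inferred by subtraction, the association test is assumed to be Pearson's chi-squared without Yates' correction, and McNemar's test is assumed to use the continuity-corrected statistic. Those variants are chosen because they reproduce the reported values (the sketch gives χ² of about 154.15 against the reported 154.1, and McNemar p of about 0.31, whereas the uncorrected McNemar statistic would give roughly 0.24); the summary does not say which variants or software the authors used.

```python
import math

# Assumed 2x2 layout for every helper below (hypothetical names):
#                    ChatGPT +   ChatGPT -
#   Radiologist +        a           b
#   Radiologist -        c           d


def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement: sum over classes of reference marginal x ChatGPT marginal.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)


def pearson_chi2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic for a 2x2 table, without Yates' correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def mcnemar_chi2(b: int, c: int) -> float:
    """Continuity-corrected McNemar statistic; uses only the discordant cells."""
    return (abs(b - c) - 1) ** 2 / (b + c)


def chi2_sf_1df(x: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


# Counts inferred from the reported results (124/138 positives, 117/138 negatives).
a, b, c, d = 124, 138 - 124, 138 - 117, 117

kappa = cohen_kappa(a, b, c, d)
chi2 = pearson_chi2(a, b, c, d)
mcn = mcnemar_chi2(b, c)

print(f"Cohen's kappa = {kappa:.2f}")
print(f"Pearson chi2  = {chi2:.2f}, p = {chi2_sf_1df(chi2):.1e}")
print(f"McNemar chi2  = {mcn:.2f}, p = {chi2_sf_1df(mcn):.2f}")
```

With balanced groups and ChatGPT calling 145 of 276 scans positive, chance agreement works out to exactly 0.5, so κ reduces to 2 × 0.873 − 1 ≈ 0.75. McNemar's test looks only at the 35 discordant scans, which is why it can find no significant difference (14 misses versus 21 false alarms) even though the chi-squared test shows a very strong association.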
