May 3, 2026 | Open Access

Beyond Black-Box AI: A Quantitative Grad-CAM Analysis of Convolutional Neural Network Interpretability in COVID-19 Chest X-Ray Classification

Key Points

This research aims to quantify the interpretability of CNNs using Grad-CAM in COVID-19 chest X-ray classification.
Evaluated six CNN models: VGG16, VGG19, ResNet-101, NASNet-Mobile, NASNet-Large, and Xception.
Implemented an automated pipeline for generating objective metrics from Grad-CAM heatmaps and lung masks.
Classification performance was measured with accuracy, precision, recall, and F1-score; model interpretability was measured with IoU and Dice scores.
Xception achieved the highest accuracy at 95.90% with an F1-score of 95.92%.
VGG19 reached the highest precision of 98.89%.
Classification accuracies across models ranged from 90% to 96% with notable anatomical interpretability scores.
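
The four classification metrics named above follow directly from confusion-matrix counts. The sketch below shows the standard definitions for a binary case; the function name and the example counts are hypothetical and chosen only for illustration. They are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Zero denominators fall back to 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only (assumed, not taken from the paper).
metrics = classification_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A model can score high precision with lower recall (few false positives, more misses), which is why the key points report precision and F1-score separately.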

Abstract

Modern AI models use deep architectures that obscure how predictions are made. Without understanding how models reach their predictions, it is difficult to verify their reasoning, identify biases, or trust their reliability in high-stakes domains such as healthcare. Many COVID-19 chest X-ray (CXR) studies report high accuracy and present qualitative gradient-weighted class activation mapping (Grad-CAM) heatmaps, but they provide no quantitative evidence of alignment with lung anatomy and rely on manual, subjective inspection. We introduce an automated quantitative pipeline that converts interpretability into objective, anatomy-grounded metrics computed between Grad-CAM heatmaps and lung masks. We evaluate six convolutional neural networks (CNNs), namely VGG16, VGG19, ResNet-101, NASNet-Mobile, NASNet-Large, and Xception, for both classification performance and anatomical interpretability in COVID-19 CXR detection. Classification accuracies ranged from 90% to 96%, with Xception achieving the highest accuracy (95.90%) and a balanced precision, recall, and F1-score of 95.92%. NASNet-Large and VGG19 followed at 94.87%, with VGG19 reaching the highest precision (98.89%). To assess model transparency, we automated the interpretability analysis by thresholding the Grad-CAM outputs and comparing them to radiologist-annotated lung masks using the Intersection-over-Union (IoU) and Dice score metrics.
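
The thresholding and overlap step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 0.5 threshold, the function names, and the tiny 3x4 arrays are all assumptions, and plain nested lists are used to keep the example dependency-free (array libraries would be the usual choice for real images).

```python
def binarize(heatmap, threshold=0.5):
    """Turn a heatmap with values in [0, 1] into a 0/1 mask.

    The 0.5 threshold is an assumed value; the abstract does not state one.
    """
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]


def iou_and_dice(pred_mask, lung_mask):
    """Return (IoU, Dice) for two binary masks of identical shape."""
    inter = sum(p & m
                for pred_row, lung_row in zip(pred_mask, lung_mask)
                for p, m in zip(pred_row, lung_row))
    pred_area = sum(map(sum, pred_mask))
    lung_area = sum(map(sum, lung_mask))
    union = pred_area + lung_area - inter
    # IoU = |A ∩ B| / |A ∪ B|; Dice = 2|A ∩ B| / (|A| + |B|)
    iou = inter / union if union else 0.0
    dice = 2 * inter / (pred_area + lung_area) if (pred_area + lung_area) else 0.0
    return iou, dice


# Toy Grad-CAM heatmap and lung mask (hypothetical values for illustration).
heatmap = [
    [0.1, 0.8, 0.9, 0.2],
    [0.0, 0.7, 0.6, 0.1],
    [0.0, 0.2, 0.3, 0.0],
]
lung_mask = [
    [0, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]

iou, dice = iou_and_dice(binarize(heatmap), lung_mask)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```

In this toy case the thresholded heatmap lies entirely inside the lung mask but covers only part of it, so both scores are below 1. Dice is never lower than IoU for the same pair of masks, which is worth keeping in mind when comparing the two columns across models.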

Authors

Aiman Abd Saeed

Rasber Dhahir Rashid

Journals

SHILAP Revista de lepidopterología

Institutions

Salahaddin University-Erbil
