Multi-label chest X-ray classification faces three critical challenges: (i) inadequate modeling of inter-pathology dependencies despite clinical co-occurrence patterns, (ii) severe class imbalance (11.2–47.6%) causing minority-class underperformance, and (iii) limited interpretability hindering clinical trust. Existing methods address these challenges independently; no current framework jointly models pathology dependencies, imbalance-aware training, and interpretable attention. We propose a Hierarchical Pathology-aware Vision Transformer (HP-ViT), which jointly addresses these limitations in a unified architecture by employing: Hierarchical Pathology-Aware Attention (HPAA) for explicit disease co-occurrence modeling through two-stage token refinement, Multi-Scale Feature Aggregation (MSFA) for detecting localized and diffuse abnormalities across four hierarchical scales, and Balanced Adaptive Focal Loss (BAFL) implementing curriculum-scheduled focal modulation that progressively transitions from class-balanced to difficulty-focused training. Evaluated on COVIDx, ChestX-ray14, and BIMCV-COVID19+ (N = 36,904 images), HP-ViT achieves macro-F1 of 0.924, exact match ratio of 0.842, and PPV of 0.925, representing 1.76%, 1.32%, and 1.5% improvements over state-of-the-art, with statistical significance (p < 0.001, McNemar’s test on per-sample exact-match correctness). HP-ViT requires only 12.6 M parameters (85% reduction vs. ViT-B/16) with 29.8 ms inference time, enabling real-time clinical deployment. Interpretability evaluation yields 83.7% mean SSIM between attention maps and radiologist annotations, confirming pathology-aligned localization.
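For readers unfamiliar with the reported metrics, the following is a minimal sketch of how macro-F1, exact match ratio, and an exact two-sided McNemar test on per-sample exact-match correctness are commonly computed for multi-label predictions. It is not the authors' evaluation code: the function names, the toy labels, and the choice of the exact binomial form of McNemar's test are assumptions made purely for illustration.

```python
from math import comb


def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1; a label with no TP, FP, or FN scores 0."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value from paired per-sample correctness flags."""
    # Only discordant pairs (one model right, the other wrong) carry information.
    b = sum(bool(a) and not bool(c) for a, c in zip(correct_a, correct_b))
    c = sum(bool(c_) and not bool(a) for a, c_ in zip(correct_a, correct_b))
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under H0 the discordant counts follow Binomial(n, 0.5).
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)


# Hypothetical toy data: 4 images, 3 pathology labels (not from the paper).
toy_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
toy_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

emr = exact_match_ratio(toy_true, toy_pred)
mf1 = macro_f1(toy_true, toy_pred)

# Per-sample exact-match correctness of two hypothetical models on 8 images.
model_a_correct = [1, 1, 1, 1, 1, 1, 0, 1]
model_b_correct = [0, 0, 0, 0, 0, 1, 1, 1]
p_value = mcnemar_exact(model_a_correct, model_b_correct)
```

Exact match ratio is strict, since a single wrong label on an image counts the whole image as incorrect, whereas macro-F1 weights every pathology equally regardless of prevalence. That equal weighting is why macro-F1 is often the preferred summary under class imbalance.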
Muneeb A. Khan
Heemin Park
Khurelbaatar Zagarzusem
Sangmyung University
Mongolian University of Science and Technology
Khan et al. studied this question.
www.synapsesocial.com/papers/69f19ff5edf4b46824806af2 — DOI: https://doi.org/10.1007/s10791-026-10127-8