Introduction
In response to the growing demand for clinically interpretable and reliable diagnostic tools in medical informatics, this study introduces a novel computational framework for the early diagnosis and progression prediction of complex diseases, built on the integration of imaging omics and molecular omics. In line with the research scope of advanced data-driven solutions in computer science for healthcare applications, the work targets key challenges in multimodal biomedical data analysis, including feature heterogeneity, data imbalance, and diagnostic uncertainty. Traditional diagnostic models, which often rely on single-modal data or shallow machine learning methods, struggle to capture non-linear dependencies across heterogeneous feature spaces and lack reliable uncertainty quantification for clinical decision-making. As a result, they generalize poorly and offer limited interpretability on real-world, imbalanced clinical datasets.

Methods
To address these limitations, we propose a transformer-based diagnostic encoder, CervixFormer, coupled with a Domain-Aware Calibration Strategy (DACS). CervixFormer uses hierarchical attention mechanisms and cross-modality feature fusion to extract comprehensive diagnostic representations from high-dimensional imaging and omics data. The framework incorporates imbalance-aware embedding layers and stochastic uncertainty modeling to improve robustness against noisy and unevenly distributed samples. DACS adds domain-guided probabilistic recalibration: it integrates clinical priors and uncertainty estimates to better align predicted confidence with true diagnostic risk.

Results and discussion
Extensive experiments on large-scale multimodal datasets demonstrate that the proposed framework significantly outperforms conventional machine learning and deep learning baselines in diagnostic accuracy, robustness, and calibration reliability. The results indicate substantial improvements in handling data imbalance and uncertainty while maintaining strong predictive performance across heterogeneous modalities. These findings highlight the effectiveness of combining transformer-based multimodal representation learning with domain-aware uncertainty calibration for clinical diagnostics. The proposed framework not only improves predictive accuracy but also improves confidence reliability and interpretability, both of which are critical for real-world clinical decision support. Overall, this study underscores the potential of multimodal learning architectures to advance data-driven healthcare applications and points to a promising direction for reliable, clinically applicable diagnostic systems.
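
The notion of "calibration reliability" used above, meaning agreement between predicted confidence and the observed rate of correct diagnoses, can be made concrete with a small sketch. The code below is not the DACS procedure, which the abstract does not specify. It is a standard binned expected calibration error (ECE), shown only as one common way such alignment might be measured. The function name, the equal-width binning, and the default of 10 bins are all illustrative assumptions.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[int],
    n_bins: int = 10,  # assumption: equal-width bins over [0, 1]
) -> float:
    """Binned ECE: sample-weighted mean of |accuracy - mean confidence| per bin.

    Illustrative metric only; not the paper's DACS recalibration.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins  # sum of predicted confidences per bin
    hit_sum = [0.0] * n_bins   # number of correct predictions per bin

    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += 1.0 if ok else 0.0

    # |hits - conf_sum| / n equals (bin_count / n) * |bin accuracy - bin mean confidence|.
    return sum(abs(h - c) for h, c in zip(hit_sum, conf_sum)) / n


if __name__ == "__main__":
    # Hypothetical predictions: four cases at 90% confidence, three of them correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A value near zero means the stated confidences match the observed accuracy within each bin. Larger values indicate over- or under-confidence, which is the gap a recalibration step would aim to reduce.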
Sun et al. studied this question.