Abstract
Prostate cancer is the most common non-cutaneous malignancy and the second leading cause of cancer-related death among men in the United States. Its clinical and biological heterogeneity poses major challenges for accurate outcome prediction and treatment planning. Traditional single-modality models often fail to accurately capture the risk of disease progression. Recent advances in artificial intelligence, particularly foundation models, enable the integration of heterogeneous data into unified representations of tumor biology, which can in turn uncover better predictive and prognostic features than is possible with a single modality.
We develop a multimodal model that integrates histopathology whole-slide images (WSIs), transcriptomics, and clinical data to predict prostate cancer molecular and clinical phenotypes, enhancing both predictive accuracy and interpretability. The framework employs separate encoders for each modality: an image encoder adapted from foundation models CONCHv1.5 and TITAN, a text encoder based on ClinicalBERT for clinical data, and a transcriptomic encoder combining Gene2Vec embeddings with expression-level features projected into a shared latent space. The encoders are contrastively fine-tuned using pairwise alignment, with WSIs serving as the anchor modality based on paired data availability.
The multimodal representations are evaluated on several clinically significant downstream prediction tasks, including Gleason grading, metastasis prediction, biochemical recurrence, and molecular alteration status such as TMPRSS2:ERG fusion and PTEN loss. The framework also provides interpretability by quantifying the contribution of each modality and visualizing attention distributions, highlighting the joint feature representations that are important to each task.
This study uses paired data from 632 patients drawn from public and institutional repositories.
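As a rough illustration of the pairwise contrastive alignment described above, the sketch below computes a symmetric InfoNCE-style loss between WSI (anchor) embeddings and embeddings from one other modality. The abstract does not state the exact loss, temperature, or embedding dimensions, so the function name `symmetric_contrastive_loss`, the temperature of 0.07, and the toy 2-D vectors are all assumptions made for illustration only.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax_row(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_contrastive_loss(anchor_embs, other_embs, temperature=0.07):
    """Hypothetical InfoNCE-style loss: row i of each input is one paired sample.

    anchor_embs: embeddings of the anchor modality (here, WSIs).
    other_embs:  embeddings of the paired modality (e.g. clinical text).
    The temperature value is an assumption, not taken from the abstract.
    """
    a = [l2_normalize(v) for v in anchor_embs]
    b = [l2_normalize(v) for v in other_embs]
    n = len(a)
    # Pairwise cosine similarities scaled by temperature.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Anchor -> other: each anchor should pick out its own pair (the diagonal).
    loss_a2b = -sum(log_softmax_row(logits[i])[i] for i in range(n)) / n
    # Other -> anchor: same objective on the transposed similarity matrix.
    logits_t = [list(col) for col in zip(*logits)]
    loss_b2a = -sum(log_softmax_row(logits_t[j])[j] for j in range(n)) / n
    return 0.5 * (loss_a2b + loss_b2a)

# Toy data (made up): three "patients" with 2-D embeddings per modality.
wsi = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
text_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
# Rotate the pairing so every text embedding sits next to the wrong WSI.
text_shuffled = [text_aligned[1], text_aligned[2], text_aligned[0]]

loss_aligned = symmetric_contrastive_loss(wsi, text_aligned)
loss_shuffled = symmetric_contrastive_loss(wsi, text_shuffled)
print(loss_aligned < loss_shuffled)
```

In this toy setup the correctly paired batch yields a lower loss than the mismatched one, which is the property contrastive fine-tuning exploits. With WSIs as the anchor, the same loss would be applied separately to each WSI-modality pairing.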
Inter-modality fine-tuning between WSI and clinical text data yielded promising alignment results, with paired image-text samples achieving significantly higher similarity ranks than unpaired samples (mean p-value ≈ 10⁻²⁹). The downstream Gleason grade multiclass classification task achieved an overall test accuracy of 81.4% and a weighted F1 score of 79.6%.
Citation Format: Priyanka Vasanthakumari, Mohamed Omar. Multimodal deep learning framework for predicting clinically significant phenotypes in prostate cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 4184.
Priyanka Vasanthakumari
Mohamed Omar
Cancer Research
Cedars-Sinai Medical Center
www.synapsesocial.com/papers/69d1fceba79560c99a0a2b27 — DOI: https://doi.org/10.1158/1538-7445.am2026-4184