Variation in medical practices and reporting standards across healthcare systems limits the transferability of prediction models based on structured electronic health record data. Prior studies have demonstrated that embedding medical codes into a shared semantic space can help address these discrepancies, but real-world applications remain limited. Here, we show that leveraging embeddings from a large language model alongside a transformer-based prediction model provides an effective and scalable solution to enhance generalizability. We call this approach GRASP and apply it to predict the onset of 21 diseases and all-cause mortality in over one million individuals. Trained on the UK Biobank (UK) and evaluated in FinnGen (Finland) and Mount Sinai (USA), GRASP achieved an average ΔC-index that was 88% higher in FinnGen and 47% higher in Mount Sinai than that of language-unaware models. GRASP also showed significantly higher correlations with polygenic risk scores for 62% of diseases and maintained robust performance even when datasets were not harmonized to the same data model.
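The headline results are reported as a ΔC-index, which builds on the concordance index (C-index): the fraction of comparable patient pairs in which the patient with the earlier event also received the higher predicted risk. The following is a minimal sketch of Harrell's C-index for right-censored data, not the authors' evaluation code. The function name `concordance_index` and the toy data are hypothetical, tied event times are skipped for simplicity, and the abstract does not state how the Δ is computed.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : observed follow-up time per individual
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Skip tied times (a simplification) and pairs where the earlier
        # individual is censored, since their true ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk for the earlier event
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four individuals, the third is censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
print(f"C-index on toy data: {toy_c:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so differences between models are usually read against that scale.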
Kirchler et al. studied this question.
www.synapsesocial.com/papers/6975b2c8feba4585c2d6e4ff — DOI: https://doi.org/10.1038/s41746-026-02363-5
Matthias Kirchler
Matteo Ferro
Veronica Lorenzini
Massachusetts General Hospital
Icahn School of Medicine at Mount Sinai
University of Helsinki