March 3, 2026

Extending BEHRT to UK Biobank: Assessing Transformer Model Performance in Clinical Prediction

Key points

Transformer models achieved high predictive performance across diverse clinical scenarios, supporting their utility for clinical prediction.
The larger BEHRT model achieved an AUROC of 0.874 for 5-year prediction, outperforming the smaller model at 0.858.
Model performance was sensitive to data characteristics, including the medical terminology used and the resulting vocabulary size.
Results underscore the need for careful modelling decisions, especially for long-term prediction tasks involving clinical data.

Summary

Transformer-based models have shown strong potential for clinical prediction using electronic health record data, yet their performance can vary depending on modelling decisions and data characteristics. In this study, we trained a BEHRT model on hospital-based UK Biobank data and evaluated its performance across four clinical prediction tasks, including next-visit diagnosis and longer-term diagnosis prediction up to five years. We exhaustively assessed the impact of model size, medical terminology (CALIBER vs ICD-10), and data split strategies. The large model consistently outperformed the smaller one in long-term prediction tasks (AUROC = 0.874 vs 0.858 at 5 years), while differences were marginal in 6-month prediction tasks. Performance was also sensitive to vocabulary size, with the CALIBER model yielding higher average precision scores than the ICD-10 model (0.773 vs 0.678). Our results show that transformer models can achieve high predictive performance across diverse clinical scenarios, but outcomes vary considerably depending on modelling choices, particularly in long-term prediction tasks.
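
The two metrics reported above, AUROC and average precision, can be illustrated with a minimal sketch. This is not the study's evaluation code: the function names `auroc` and `average_precision` and the toy labels and scores are assumptions made purely for illustration, and the average precision function assumes there are no tied scores.

```python
# Illustrative only: toy labels/scores, not data or code from the study.

def auroc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of the precision values at the rank of each positive example.

    Assumes no tied scores; with ties the result depends on sort order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Hypothetical predictions for four patients (1 = diagnosis occurred).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(f"AUROC: {auroc(labels, scores):.3f}")
print(f"Average precision: {average_precision(labels, scores):.3f}")
```

AUROC depends only on how positives rank relative to negatives, whereas average precision also depends on how common the positive class is, so the two metrics can diverge on tasks with rare outcomes.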
