Abstract

Background: Crohn’s disease (CD) has a non-negligible prevalence worldwide, especially in developed countries. Because it affects a large population and imposes a heavy medical burden once the disease progresses, many studies have focused on risk factors and prediction models for CD. However, CD is a chronic, dynamically progressive disease, and most previous models were conventional models that used only baseline data and ignored medical events during the disease course, which resulted in low predictive performance and no capacity for real-time prediction. Here we developed a Transformer-based prediction framework, the Time-Aware CD Progression Prediction Model (TACDPPM), which uses all of a patient’s longitudinal electronic health records (EHRs) to predict future CD-related events, including disease behavior progression, surgery and medication usage. We compared its predictive performance with that of a Long Short-Term Memory (LSTM) model, a Gated Recurrent Unit (GRU) model, a logistic regression model and a time-varying Cox regression model.

Methods: Based on the Transformer architecture, we developed TACDPPM with an input-aware module, an expert knowledge module and a multi-scale time-aware module. For the datasets, we collected 66 static and dynamic variables from 761 CD patients from Peking Union Medical College Hospital as the internal training/validation cohort, 74 CD patients from Guizhou Provincial People’s Hospital and Zunyi Medical University Affiliated Hospital as external validation cohort 1, and 170 CD patients from Nanjing Jing Hospital as external validation cohort 2. TACDPPM forecasts the risk of the three types of events above over the 1, 3 and 5 years following an assumed starting time point. In addition, the SHAP value of each variable was calculated to show its association with the outcome.
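The abstract does not say how the 1-, 3- and 5-year targets were built from the EHR timeline. The sketch below shows one plausible labeling scheme for a single patient and a single event type. It assumes that "1 year, 3 years, and 5 years after the starting time point" means "within that window", and that patients whose follow-up ends before a window closes are left unlabeled. The function name `horizon_labels` and its arguments are hypothetical and are not taken from TACDPPM.

```python
from datetime import date

HORIZONS_YEARS = (1, 3, 5)  # prediction windows named in the abstract


def horizon_labels(start, event_date, last_follow_up):
    """Return {horizon_in_years: 1, 0 or None} for one patient and one event type.

    1    -> the event occurred after `start` and within the horizon
    0    -> the patient was followed for the full horizon without the event
    None -> follow-up ended before the horizon closed, so the label is unknown
    Events on or before `start` are not counted as future events (assumption).
    """
    labels = {}
    for years in HORIZONS_YEARS:
        window_days = round(365.25 * years)
        days_to_event = None if event_date is None else (event_date - start).days
        if days_to_event is not None and 0 < days_to_event <= window_days:
            labels[years] = 1
        elif (last_follow_up - start).days >= window_days:
            labels[years] = 0
        else:
            labels[years] = None
    return labels


# Hypothetical patient: starting point in 2015, surgery in mid-2017, followed until 2021.
example = horizon_labels(date(2015, 1, 1), date(2017, 6, 1), date(2021, 1, 1))
```

Leaving censored windows as `None` rather than `0` avoids treating incomplete follow-up as evidence that no event occurred. Whether the authors handled censoring this way is not stated.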
Results: In internal validation, TACDPPM achieved the following AUROCs for 1-, 3- and 5-year prediction: 0.910-0.979 for disease behavior progression (LSTM: 0.835-0.955; GRU: 0.821-0.955), with a Macro-AUROC of 0.811 for disease behavior progression; 0.729-0.823 for surgery (LSTM: 0.698-0.715; GRU: 0.704-0.724); 0.813-0.930 for glucocorticoid usage (LSTM: 0.738-0.884; GRU: 0.748-0.876); 0.796-0.901 for immunosuppressant usage (LSTM: 0.735-0.855; GRU: 0.738-0.854); and 0.939-0.943 for biologics usage (LSTM: 0.844-0.873; GRU: 0.831-0.874).

Conclusion: TACDPPM predicted CD-related medical events better than the LSTM, GRU and conventional models. However, heterogeneity among CD patients, limited EHR data and differing treatment practices may reduce the performance of TACDPPM.

Conflict of interest: Dr. Wang, Beiming: No conflict of interest; Yang, Yingliang: No conflict of interest; Liu, Honglei: No conflict of interest; Yang, Hong: No conflict of interest; Bai, Xiaoyin: No conflict of interest; Xu, Hui: No conflict of interest; Ruan, Gechong: No conflict of interest; Xu, Zhiwei: No conflict of interest; Cui, Dejun: No conflict of interest; Yan, Fang: No conflict of interest
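The Results report both AUROC and Macro-AUROC without defining the latter. As a rough illustration, the sketch below computes a binary AUROC from pairwise rankings and then takes the unweighted mean of one-vs-rest AUROCs across classes, which is the usual meaning of macro-averaging. That the abstract's Macro-AUROC was computed this way is an assumption, and the class names and scores are invented for illustration only.

```python
def auroc(y_true, scores):
    """Binary AUROC: probability that a positive is scored above a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(y_true, class_scores):
    """Unweighted mean of one-vs-rest AUROCs.

    y_true:       list of class labels, one per sample
    class_scores: {class_label: list of predicted scores for that class}
    """
    per_class = [
        auroc([int(y == label) for y in y_true], scores)
        for label, scores in class_scores.items()
    ]
    return sum(per_class) / len(per_class)


# Invented toy data: four samples, three hypothetical outcome classes.
toy_labels = ["a", "a", "b", "c"]
toy_scores = {
    "a": [0.9, 0.6, 0.7, 0.1],
    "b": [0.05, 0.3, 0.2, 0.1],
    "c": [0.05, 0.1, 0.1, 0.8],
}
toy_macro = macro_auroc(toy_labels, toy_scores)
```

Because the mean is unweighted, a rare class contributes as much as a common one, which can pull the macro figure below the per-horizon binary AUROCs.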
Wang et al. (Thu,) studied this question.