Administrative health data models predicted breast cancer risk, achieving a 58.9% concordance index with random survival forest and a 0.0068 Brier score with multitask logistic regression.
Can machine learning models using administrative health data predict individual time until breast cancer onset in women?
Machine learning models applied to population-level administrative health data show feasibility for personalized breast cancer risk prediction.
Abstract
Breast cancer screening programs are one-size-fits-most approaches with suboptimal participation rates. Population-level administrative health databases provide a unique opportunity to build scalable, data-driven risk assessment tools capable of identifying women who may benefit from more personalized screening strategies. We assembled nearly two decades of longitudinal health data, including mammographic screening history, medication use, physician visits, and hospital discharge abstracts, for 1.74 million women in British Columbia, among whom 39,211 incident breast cancers were diagnosed. Our team is developing new breast cancer risk assessment models to predict each woman's individual time until Breast Cancer Onset (BCo) using administrative health data from Canada's publicly funded healthcare system. We are applying machine learning Individual Survival Distribution (ISD) models, which associate each subject x with a distribution S(t | x), giving the probability that x's time until BCo is at least t more years, for all t ≥ 0. We can then use these models to estimate each woman's expected time until BCo, as well as her risk score. In preliminary models using 25 features with known/suspected links to breast cancer, random survival forest (RSF) achieved the highest concordance index (CI = 58.9%), while multitask logistic regression (MTLR) achieved a competitive 5-year Brier score (BS = 0.0068) and a low mean absolute error (MAE = 30.4 months). These early results demonstrate the feasibility of leveraging administrative health data for personalized breast cancer risk prediction. Ongoing work will substantially expand the feature sets to improve model discrimination.

Citation Format: Fidela Mushashi, Shi-ang Qi, Parveen Bhatti, Andrew Roth, Russell Greiner, Rachel A. Murphy. Personalized risk assessment of breast cancer using administrative health data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 7593.
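
The abstract says an ISD model yields a curve S(t | x) from which an expected time until onset and a risk score can be derived. The sketch below illustrates one common way to do that for a single curve sampled on a time grid. It is not the authors' implementation: the function names, the time grid, and the survival probabilities are all hypothetical, and the "expected time" here is a restricted mean (area under the curve up to the last grid point), which is an assumption about how the quantity might be computed.

```python
import numpy as np


def expected_time(times, surv):
    """Restricted mean time until onset: area under S(t | x) on the grid.

    Uses the trapezoid rule; the result is bounded by the last grid point.
    """
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    widths = np.diff(times)                  # interval lengths
    heights = (surv[:-1] + surv[1:]) / 2.0   # mean curve height per interval
    return float(np.sum(widths * heights))


def risk_at(times, surv, horizon):
    """Risk score at `horizon`: 1 - S(horizon | x), linearly interpolated."""
    return float(1.0 - np.interp(horizon, times, surv))


# Hypothetical curve for one woman (years since baseline, made-up values).
times = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
surv = np.array([1.0, 0.99, 0.97, 0.94, 0.90])

print(expected_time(times, surv))  # restricted mean over 20 years
print(risk_at(times, surv, 5.0))   # 5-year risk of onset
```

Because breast cancer onset is rare over the observation window, most of the curve stays near 1, so a restricted mean sits close to the end of the grid; a horizon-specific risk such as the 5-year value is often the more interpretable summary.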