70,075 12-lead ECGs from four Danish cohorts and the UK Biobank
Convolutional neural network (CNN) model trained using 5-fold cross-validation to identify the origin of each ECG
Accuracy and F1 score for identifying the origin of each ECG
Deep learning models can identify the origin cohort of an ECG with high accuracy, highlighting a potential source of confounding when pooling datasets for AI analysis.
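The two reported metrics can diverge when cohorts differ in size. The sketch below is only an illustration of that effect, not the study's evaluation code: the toy labels are hypothetical, and macro-averaged F1 is an assumption, since the averaging scheme is not stated here.

```python
def accuracy(y_true, y_pred):
    """Fraction of recordings assigned to the correct cohort."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-cohort F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: a large cohort "A" (90 ECGs) and a small cohort "B" (10 ECGs).
y_true = ["A"] * 90 + ["B"] * 10
# A classifier that labels every "A" correctly but recovers only 4 of the 10 "B" ECGs.
y_pred = ["A"] * 90 + ["B"] * 4 + ["A"] * 6

acc = accuracy(y_true, y_pred)  # dominated by the large cohort
f1 = macro_f1(y_true, y_pred)   # each cohort counts equally

print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

In this toy case accuracy stays high because the large cohort dominates the count, while macro F1 drops because the poorly recognised small cohort carries the same weight as the large one.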
• When pooling cohorts, bias can underlie seemingly good results.
• Bias requires a difficult-to-predict outcome unevenly distributed between cohorts.
• Strength of bias can be directly obtained to estimate unbiased results.

Deep learning models can isolate device characteristics in medical images, enabling the models to identify the origin of a medical image, which creates a bias. It is unknown whether such bias can arise with raw medical signals such as electrocardiograms (ECGs), so we aimed to quantify to what extent deep learning models can identify the study cohort and site from which an ECG originates.

We used 70,075 12-lead ECGs from four Danish cohorts and the UK Biobank. We trained a convolutional neural network (CNN) model using 5-fold cross-validation to identify the origin of each ECG. We also tested the effect of easy vs. difficult to predict outcomes (sex vs. diabetes) and equal vs. skewed outcome distributions. We reported accuracy and F1 score, which is less sensitive to unequal sample sizes.

The CNN model was able to distinguish ECGs from five different cohorts with an accuracy of 93.4% and an F1 score of 77.1%. We found no bias with easily detected outcomes or equal distributions. We were unable to ascribe the CNN performance to any known factors, including ECG device, software, or population characteristics, including sex, age, and comorbidities.

Deep learning models can identify the origin of an ECG. Combining studies can introduce confounding to the model if the rate of the outcome varies between studies and the outcome is difficult to identify. In deep learning models with potential cohort confounding, we recommend training a baseline model to separate these cohorts to assess the strength of confounding.
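The recommended baseline check, training a model to tell the cohorts apart, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' pipeline: a nearest-centroid classifier stands in for the CNN, the two-dimensional features are synthetic, and all function names (`k_fold_indices`, `fit_centroids`, `cohort_baseline_accuracy`) are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Mean feature vector per cohort label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids, x):
    """Cohort whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def cohort_baseline_accuracy(X, cohort, k=5, seed=0):
    """k-fold cross-validated accuracy of predicting the cohort label."""
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        centroids = fit_centroids([X[i] for i in train], [cohort[i] for i in train])
        correct += sum(predict(centroids, X[i]) == cohort[i] for i in held_out)
    return correct / len(X)


# Synthetic stand-in for per-ECG features (assumption: two features per recording).
rng = random.Random(42)


def make(n, offset):
    return [[rng.gauss(offset, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]


cohort = ["cohort_1"] * 200 + ["cohort_2"] * 200

# Case 1: cohorts differ systematically in the first feature.
separable = cohort_baseline_accuracy(make(200, 0.0) + make(200, 3.0), cohort)
# Case 2: cohorts are drawn from the same distribution.
indistinct = cohort_baseline_accuracy(make(200, 0.0) + make(200, 0.0), cohort)

print(f"separable cohorts: {separable:.2f}, indistinct cohorts: {indistinct:.2f}")
```

A cross-validated cohort accuracy well above chance indicates that the cohort label is recoverable from the signal, so an outcome that is unevenly distributed between cohorts could be predicted partly through cohort identity. Accuracy near chance suggests little room for this kind of confounding.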
Marie Mørch-Pedersen
Malene Nørregaard
Claus Graff
Biomedical Signal Processing and Control
Harvard University
University of California, San Francisco
University of Copenhagen
Mørch-Pedersen et al. (Fri,) studied this question.
www.synapsesocial.com/papers/69eefc6dfede9185760d36f2 — DOI: https://doi.org/10.1016/j.bspc.2026.110424