What question did this study set out to answer?

This research aims to quantify to what extent deep learning models can identify the cohort of origin of electrocardiograms (ECGs) drawn from pooled cohorts.

April 27, 2026 (Open Access)

Pooling cohorts for deep learning analysis: a potential source of bias for electrocardiogram analysis

Key Points

Aimed to quantify to what extent deep learning models can identify the cohort of origin of electrocardiograms (ECGs) drawn from pooled cohorts.
Analyzed 70,075 12-lead ECGs from four Danish cohorts and the UK Biobank.

Structured PICO

Population

70,075 12-lead ECGs from four Danish cohorts and the UK Biobank

Intervention

Convolutional neural network (CNN) model trained using 5-fold cross-validation to identify the origin of each ECG

Outcome

Accuracy and F1 score for identifying the origin of each ECG

Deep learning models can identify the cohort of origin of an ECG with high accuracy (93.4% accuracy and a 77.1% F1 score across five cohorts), which highlights a potential source of confounding when datasets are pooled for AI analysis.
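The Outcome above pairs accuracy with the F1 score, which the abstract describes as less sensitive to unequal sample sizes. The summary does not say how the F1 score was averaged over the five cohorts. The minimal sketch below assumes macro averaging (the unweighted mean of per-cohort F1 scores), one common choice, and shows how the two metrics can diverge when cohorts differ in size. The function name and the confusion-matrix counts are hypothetical and are not the study's data.

```python
def accuracy_and_macro_f1(confusion):
    """Return (accuracy, macro-averaged F1) from a square confusion matrix.

    confusion[i][j] counts ECGs from cohort i that the model assigned to cohort j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[k][k] for k in range(n)) / total

    per_cohort_f1 = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # other cohorts predicted as k
        fn = sum(confusion[k]) - tp                        # cohort k predicted as another cohort
        denom = 2 * tp + fp + fn
        per_cohort_f1.append(2 * tp / denom if denom else 0.0)

    # Macro averaging (assumed): every cohort counts equally, regardless of its size.
    macro_f1 = sum(per_cohort_f1) / n
    return accuracy, macro_f1


# Hypothetical counts (not from the study): one large, one medium, one small cohort.
confusion = [
    [890, 5, 5],   # cohort A, 900 ECGs
    [10, 60, 10],  # cohort B, 80 ECGs
    [10, 5, 5],    # cohort C, 20 ECGs
]
acc, f1 = accuracy_and_macro_f1(confusion)
print(f"accuracy={acc:.3f}  macro F1={f1:.3f}")  # accuracy=0.955  macro F1=0.678
```

The large cohort A is classified almost perfectly and dominates accuracy, while the poorly separated small cohort C pulls the macro F1 down. This is a gap of the same kind as the one between the reported 93.4% accuracy and 77.1% F1 score, although the real cause in the study may differ.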

Abstract

• When pooling cohorts, bias can underlie seemingly good results.
• Bias requires a difficult-to-predict outcome that is unevenly distributed between cohorts.
• The strength of bias can be obtained directly and used to estimate unbiased results.

Deep learning models can isolate device characteristics in medical images. This allows the models to identify the origin of a medical image, which creates a bias. It is unknown whether such bias can arise with raw medical signals such as electrocardiograms (ECGs), so we aimed to quantify to what extent deep learning models can identify the study cohort and site from which an ECG originates.

We used 70,075 12-lead ECGs from four Danish cohorts and the UK Biobank. We trained a convolutional neural network (CNN) model using 5-fold cross-validation to identify the origin of each ECG. We also tested the effect of easy- vs. difficult-to-predict outcomes (sex vs. diabetes) and of equal vs. skewed outcome distributions. We reported accuracy and the F1 score, which is less sensitive to unequal sample sizes.

The CNN model distinguished ECGs from five different cohorts with an accuracy of 93.4% and an F1 score of 77.1%. We found no bias with easily detected outcomes or with equal outcome distributions. We were unable to ascribe the CNN's performance to any known factor, including ECG device, software, or population characteristics such as sex, age, and comorbidities.

Deep learning models can identify the origin of an ECG. Combining studies can introduce confounding into the model if the outcome rate varies between studies and the outcome is difficult to identify. For deep learning models with potential cohort confounding, we recommend training a baseline model to separate these cohorts to assess the strength of confounding.
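The abstract argues that confounding needs an outcome whose rate differs between cohorts, and it recommends a baseline model to gauge the strength of confounding. One way to picture that mechanism is a "cohort-only" predictor that scores every ECG by its cohort's outcome prevalence and ignores the signal entirely. The Python sketch below assumes cohort membership is known perfectly (an upper bound on what a cohort-identifying model could exploit) and scores the predictor with AUROC. It is an illustration under those assumptions, not necessarily the authors' procedure. All helper names and data are hypothetical.

```python
from collections import defaultdict


def cohort_prevalence(cohorts, outcomes):
    """Hypothetical helper: outcome prevalence (fraction of 1s) within each cohort."""
    counts = defaultdict(lambda: [0, 0])  # cohort -> [positives, total]
    for cohort, y in zip(cohorts, outcomes):
        counts[cohort][0] += y
        counts[cohort][1] += 1
    return {c: pos / total for c, (pos, total) in counts.items()}


def auroc(scores, labels):
    """AUROC as the Mann-Whitney probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cohort_only_auroc(cohorts, outcomes):
    """Score each ECG by its cohort's prevalence alone, i.e. with no ECG signal at all.

    Prevalence is estimated and scored on the same data for brevity; in practice
    it would be estimated on training folds and scored on held-out folds.
    """
    prevalence = cohort_prevalence(cohorts, outcomes)
    return auroc([prevalence[c] for c in cohorts], outcomes)


# Invented data: 100 ECGs per cohort.
cohorts = ["A"] * 100 + ["B"] * 100
skewed = [1] * 5 + [0] * 95 + [1] * 40 + [0] * 60   # 5% vs 40% prevalence
equal = [1] * 20 + [0] * 80 + [1] * 20 + [0] * 80   # 20% in both cohorts

skewed_auc = cohort_only_auroc(cohorts, skewed)
equal_auc = cohort_only_auroc(cohorts, equal)
print(f"skewed: {skewed_auc:.3f}, equal: {equal_auc:.3f}")  # skewed: 0.751, equal: 0.500
```

With equal prevalence, cohort identity carries no outcome information and the baseline stays at chance (0.5). With skewed prevalence, it reaches about 0.75 without looking at a single ECG. If a model trained on the ECGs scores only slightly above such a baseline, much of its apparent performance may reflect cohort identity. For an outcome that is easy to read from the ECG, such as sex, the model should clearly exceed the baseline, which fits the study's finding of no bias in that case.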


Authors

Marie Mørch-Pedersen

Malene Nørregaard

Claus Graff

Journals

Biomedical Signal Processing and Control


Institutions

Harvard University

University of California, San Francisco

University of Copenhagen
