What question did this study set out to answer?

The study investigates how missingness in eligibility criteria affects causal effect estimation when electronic health records are used to emulate a target trial.

April 10, 2026 · Open Access

Missingness in Eligibility Criteria for Target Trial Emulation in EHR With Survival Outcomes

Key Points

The study examines how missing eligibility-defining variables affect causal effect estimation in target trial emulation with EHR data.
It emulates a target trial of two treatments for advanced breast cancer using oncology EHR data.
It applies multiple imputation (MI) strategies to address missingness in eligibility criteria.
It compares the bias of alternative strategies: imputing before versus after excluding ineligible individuals, and complete case analysis.
In most settings with high proportions of missingness, imputing before excluding ineligible individuals gave lower bias than complete case analysis or imputing after exclusion.
This lower bias was observed when imputation used a flexible model, such as a random forest.

Abstract

In settings where conducting a randomized trial would be infeasible, electronic health records (EHR) can be used to emulate a target trial and estimate the causal effects of an intervention. This process involves specifying the elements of a hypothetical trial protocol and applying them to the design of an observational study conducted with EHR data (or another observational data source). One element of target trial specification is defining eligibility criteria. However, defining the eligible population with EHR data can be complicated by missingness in eligibility-defining variables. Multiple imputation (MI) is one common approach to missingness in EHR data, but it is unclear whether imputation of eligibility criteria should occur before or after excluding ineligible individuals. Motivated by a target trial emulation of two treatments for advanced breast cancer, we explore this question when estimating the average causal effect under a target trial framework with survival outcomes. We illustrate how alternative MI strategies perform using simulated data and in a real-world analysis of oncology EHR data. We found that in most settings with high proportions of missingness in eligibility-defining variables, imputing missing data using a flexible imputation model, such as a random forest, prior to excluding ineligible individuals resulted in lower bias than complete case analysis or imputation after excluding ineligible individuals. Choices about how to handle practical challenges such as this in the application of target trial emulation to messy, real-world data sources can have substantial effects on causal parameter estimation and should be carefully considered to ensure that the results of observational studies are as rigorous as possible.
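
The ordering question in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' implementation: the eligibility rule (`score <= 1`), the field names `stage` and `score`, and the toy records are all hypothetical, and a simple within-stratum hot-deck draw stands in for the flexible imputation model (no random forest is fitted). It also assumes one reading of "imputation after excluding ineligible individuals": records observed to be ineligible are dropped first, and the remaining missing values are imputed from the retained records. Only cohort construction is shown; the survival analysis and causal effect estimation are out of scope.

```python
import random

def is_eligible(score):
    # Hypothetical eligibility rule, chosen only for illustration.
    return score <= 1

def hot_deck_impute(records, donors, rng):
    """Fill missing 'score' with a random draw from donors in the same 'stage'.

    A stand-in for a real imputation model; assumes at least one donor exists.
    """
    pools = {}
    for d in donors:
        pools.setdefault(d["stage"], []).append(d["score"])
    all_scores = [d["score"] for d in donors]
    completed = []
    for r in records:
        r = dict(r)  # copy so the input records are not modified
        if r["score"] is None:
            r["score"] = rng.choice(pools.get(r["stage"], all_scores))
        completed.append(r)
    return completed

def complete_case(records):
    # Keep only records whose eligibility variable is observed and eligible.
    return [r for r in records if r["score"] is not None and is_eligible(r["score"])]

def impute_then_exclude(records, rng):
    # Impute using every observed value (eligible or not), then apply the rule.
    donors = [r for r in records if r["score"] is not None]
    completed = hot_deck_impute(records, donors, rng)
    return [r for r in completed if is_eligible(r["score"])]

def exclude_then_impute(records, rng):
    # Drop records observed to be ineligible, then impute among those that remain.
    kept = [r for r in records if r["score"] is None or is_eligible(r["score"])]
    donors = [r for r in kept if r["score"] is not None]
    completed = hot_deck_impute(kept, donors, rng)
    return [r for r in completed if is_eligible(r["score"])]

# Toy records (hypothetical): None marks a missing eligibility variable.
records = [
    {"id": 1, "stage": "A", "score": 0},
    {"id": 2, "stage": "A", "score": 2},
    {"id": 3, "stage": "A", "score": None},
    {"id": 4, "stage": "B", "score": 1},
    {"id": 5, "stage": "B", "score": 3},
    {"id": 6, "stage": "B", "score": None},
]

# Multiple imputation: repeat each strategy over several seeded draws.
M = 5
sizes_before = [len(impute_then_exclude(records, random.Random(m))) for m in range(M)]
sizes_after = [len(exclude_then_impute(records, random.Random(m))) for m in range(M)]
print("complete case cohort size:", len(complete_case(records)))
print("impute-then-exclude cohort sizes:", sizes_before)
print("exclude-then-impute cohort sizes:", sizes_after)
```

In this toy setup, excluding first leaves only eligible donors, so every record with a missing value is imputed as eligible and retained. Imputing first lets the donor pool include ineligible values, so some records with missing values can be imputed as ineligible and excluded. In a full analysis, the effect estimate would be computed within each imputed cohort and the results pooled across imputations.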

Authors

Jenny I. Shen

Kristin A. Linn

Amy S. Clark

Journals

Statistics in Medicine

Institutions

University of Pennsylvania

Brown University

Hospital of the University of Pennsylvania
