Abstract
Synthetic data generation is increasingly proposed as an alternative to classical anonymization for sharing health data. We compared concrete applications of both approaches on a small, high-dimensional health claims dataset, assessing their impact on fidelity, reproducibility of study outcomes, and privacy risks. To reflect different sharing contexts, we considered a context-independent, higher-risk scenario with no assumptions about potential attacks, and a context-dependent, lower-risk scenario informed by threat modeling. Analyses on anonymized and synthetic data yielded results similar to those from the original study data, but at the cost of higher uncertainty when estimating hazard ratios. As expected, higher data utility and fidelity were related to higher privacy risks. Our findings provide a reusable workflow and comparative insights into anonymization and synthetization. They show that both methods are valuable means of lowering privacy risks in data sharing scenarios, but results should be verified on the original data whenever possible.
Mehmed Halilovic
Thierry Meurers
Marco Alibone
npj Digital Medicine
Berlin Institute of Health at Charité - Universitätsmedizin Berlin
Federal Institute for Drugs and Medical Devices
IGES Institut
Halilovic et al. studied this question.
www.synapsesocial.com/papers/69df2b65e4eeef8a2a6b0650 — DOI: https://doi.org/10.1038/s41746-026-02622-5