What question did this study set out to answer?

To develop a comprehensive framework for quantifying and stratifying the reidentification risk of medical data.

April 14, 2026

Open Access

Proposal of a procedure to stratify the reidentification risk of medical data: RIMEDA

Key Points

Aimed to develop a comprehensive framework for quantifying and stratifying the reidentification risk of medical data.
Conducted a systematic literature review to identify critical risk factors.
Developed evaluation criteria for grading risk factors by data controllers.
Utilized entity‒relationship models to visualize factor interactions.
Created an influence matrix to quantify risk directions and magnitudes.
Tested the framework using published reidentification attacks.
Identified key factors influencing reidentification risk across multiple perspectives.
Generated a risk score and indices to evaluate data privacy risks.
Demonstrated functionality of the method through experimental validation.
Confirmed feasibility of quantitative assessment of reidentification risks in medical datasets.
Proposed further development of a software application based on the prototype.

Abstract

Routine medical data are highly valuable for secondary use, and data sharing is a prerequisite for pioneering research. Furthermore, since the advent of artificial intelligence and its application in various medical fields, such as decision-making and pharmacovigilance, the demand for real-world training data has steadily increased. However, the associated privacy risk, especially concerning reidentification, is extremely sensitive, and we are currently unaware of any standardised method to quantify it comprehensively.

Assessing the reidentification risk of a data collection under examination requires the consideration and analysis of a complex system. To develop a holistic framework for stratifying this risk, an integrative approach is followed where the risk of deanonymisation is not considered mono-causally but includes various aspects. On the basis of a systematic literature review, factors and corresponding risks that are decisive in reidentification attacks are identified. These factors are grouped into overarching perspectives, and evaluation criteria are developed, facilitating the systematic grading of each risk factor by a data controller. Interactions between factors are visualised in entity‒relationship models (ERMs), and their direction and supposed magnitude are quantified in an influence matrix. Finally, on the basis of this matrix, a risk score and different indices are generated to evaluate the reidentification risk and facilitate possible countermeasures.

The reidentification risk comprises four general perspectives regarding data, knowledge, potential attackers, and technical/organisational aspects. The ERMs represent a complex system of clear interconnections between the factors of the respective perspectives. The final calculation is performed in the influence matrix based on the assessment of the data controller. The derivable indices and visualisations provide indications of particularly risk-driving components of a dataset and thus for targeted safety measures, such as generalisation, suppression and randomisation approaches. Experiments to determine the functionality of the method via published and verified reidentification attacks confirm the plausibility and selectivity of risk stratification.

A quantitative assessment of the reidentification risk of a medical dataset, including the identification of risk drivers, is necessary and feasible. The proposed prototype must be further evaluated and will serve as the basis for the development of a software application.
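The abstract does not state how the influence matrix is turned into a risk score, so the following Python sketch is only one plausible reading, not the RIMEDA calculation itself. It assumes four hypothetical factors (one per perspective), controller grades from 0 to 3, and an invented matrix in which entry `[i][j]` is the supposed strength with which factor `i` influences factor `j`. The weighting rule (a factor counts more the more it drives other factors) is likewise an assumption made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical factors, one per perspective named in the abstract.
FACTORS: List[str] = ["data", "knowledge", "attacker", "safeguards"]

# Assumed grading scale: 0 = no risk contribution, 3 = highest.
MAX_GRADE = 3

# Hypothetical grades a data controller might assign.
GRADES: Dict[str, int] = {"data": 3, "knowledge": 2, "attacker": 1, "safeguards": 2}

# Hypothetical influence matrix: INFLUENCE[i][j] is the assumed strength
# with which factor i (row) influences factor j (column).
INFLUENCE: List[List[int]] = [
    [0, 1, 1, 0],  # data
    [1, 0, 1, 0],  # knowledge
    [0, 1, 0, 0],  # attacker
    [1, 0, 1, 0],  # safeguards
]


def active_passive_sums(matrix: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Row sums (how strongly a factor drives others) and column sums
    (how strongly it is driven), using absolute magnitudes."""
    n = len(matrix)
    active = [sum(abs(v) for v in row) for row in matrix]
    passive = [sum(abs(matrix[i][j]) for i in range(n)) for j in range(n)]
    return active, passive


def factor_contributions(grades: Dict[str, int],
                         matrix: List[List[int]]) -> Dict[str, int]:
    """Grade of each factor, weighted by 1 + its active sum (assumed rule)."""
    active, _ = active_passive_sums(matrix)
    return {name: grades[name] * (1 + active[i]) for i, name in enumerate(FACTORS)}


def risk_score(grades: Dict[str, int], matrix: List[List[int]],
               max_grade: int = MAX_GRADE) -> float:
    """Normalised score in [0, 1]: weighted grades over the maximum attainable."""
    active, _ = active_passive_sums(matrix)
    weight_total = sum(1 + a for a in active)
    return sum(factor_contributions(grades, matrix).values()) / (max_grade * weight_total)


if __name__ == "__main__":
    contributions = factor_contributions(GRADES, INFLUENCE)
    print("score:", round(risk_score(GRADES, INFLUENCE), 3))
    print("main driver:", max(contributions, key=contributions.get))
```

In this toy setup, the per-factor contributions play the role of a risk-driver index: the factor with the largest contribution would be the first candidate for a countermeasure such as generalisation or suppression.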

Authors

Sebastian Behre

Dorothea Kesztyüs

Sarah Schnabel

Journals

BMC Medical Informatics and Decision Making

Institutions

University of Göttingen

Klinikum Links der Weser
