Routine medical data are highly valuable for secondary use, and data sharing is a prerequisite for pioneering research. Furthermore, since the advent of artificial intelligence and its application in medical fields such as decision-making and pharmacovigilance, the demand for real-world training data has steadily increased. However, the associated privacy risk, especially that of reidentification, is a highly sensitive issue, and we are currently unaware of any standardised method to quantify it comprehensively. Assessing the reidentification risk of a given data collection requires the analysis of a complex system.
To develop a holistic framework for stratifying this risk, an integrative approach is followed in which the risk of deanonymisation is not treated as mono-causal but as arising from various aspects. On the basis of a systematic literature review, factors and corresponding risks that are decisive in reidentification attacks are identified. These factors are grouped into overarching perspectives, and evaluation criteria are developed that allow a data controller to grade each risk factor systematically. Interactions between factors are visualised in entity-relationship models (ERMs), and their direction and presumed magnitude are quantified in an influence matrix. Finally, on the basis of this matrix, a risk score and several indices are generated to evaluate the reidentification risk and to support possible countermeasures.
The reidentification risk comprises four general perspectives: data, knowledge, potential attackers, and technical/organisational aspects. The ERMs represent a complex system of clear interconnections between the factors of the respective perspectives. The final calculation is performed in the influence matrix on the basis of the data controller's assessment. The derivable indices and visualisations indicate which components of a dataset drive the risk most strongly and thus point to targeted safety measures, such as generalisation, suppression and randomisation approaches. Experiments that test the method against published and verified reidentification attacks confirm the plausibility and selectivity of the risk stratification.
A quantitative assessment of the reidentification risk of a medical dataset, including the identification of risk drivers, is necessary and feasible. The proposed prototype must be further evaluated and will serve as the basis for the development of a software application.
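To make the role of the influence matrix more concrete, the following is a minimal sketch of how graded factors and pairwise influences could be combined into a single score. It is not the authors' calculation, which the abstract does not specify. The factor names, the 0 to 3 influence scale, the matrix values, the active/passive sums, and the weighting rule in `risk_score` are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical factors, one per perspective named in the abstract.
FACTORS: List[str] = ["data", "knowledge", "attacker", "technical_organisational"]

# Hypothetical influence matrix: INFLUENCE[i][j] is the assumed strength (0-3)
# with which factor i drives factor j. The diagonal is zero (no self-influence).
INFLUENCE: List[List[int]] = [
    [0, 2, 1, 0],
    [1, 0, 3, 0],
    [0, 1, 0, 1],
    [2, 1, 2, 0],
]


def active_sums(matrix: List[List[int]]) -> List[int]:
    """Row sums: how strongly each factor influences the others."""
    return [sum(row) for row in matrix]


def passive_sums(matrix: List[List[int]]) -> List[int]:
    """Column sums: how strongly each factor is influenced by the others."""
    return [sum(col) for col in zip(*matrix)]


def risk_score(grades: Dict[str, float], matrix: List[List[int]]) -> float:
    """Weighted mean of the controller's grades (each in [0, 1]).

    Assumed weighting rule: weight = 1 + active sum, so that factors which
    drive many others count more. This rule is illustrative only.
    """
    weights = [1 + a for a in active_sums(matrix)]
    total = sum(w * grades[name] for w, name in zip(weights, FACTORS))
    return total / sum(weights)


if __name__ == "__main__":
    # Hypothetical grading by a data controller.
    grades = {
        "data": 0.8,
        "knowledge": 0.5,
        "attacker": 0.2,
        "technical_organisational": 0.4,
    }
    print("active sums: ", active_sums(INFLUENCE))
    print("passive sums:", passive_sums(INFLUENCE))
    print("risk score:  ", round(risk_score(grades, INFLUENCE), 3))
```

In this sketch, a factor with a high active sum would be a candidate risk driver, which is one plausible reading of how indices derived from the matrix could point to targeted countermeasures.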
Sebastian Behre
Dorothea Kesztyüs
Sarah Schnabel
BMC Medical Informatics and Decision Making
University of Göttingen
Klinikum Links der Weser
Behre et al. studied this question.
www.synapsesocial.com/papers/69ddd8eee195c95cdefd66c6 — DOI: https://doi.org/10.1186/s12911-026-03475-4