What type of study is this?

This is a Mixed-Methods study (also classified as: Validation Study).

What question did this study set out to answer?

To develop and validate a competency-based evaluation framework for medical trainees in the context of artificial intelligence.

February 14, 2026 · Open Access

Development and validation of a competency-based evaluation framework for clinical medical trainees in the era of artificial intelligence: a mixed-methods study in China

Key Points

Aimed to develop and validate a competency-based evaluation framework for medical trainees in the context of artificial intelligence.
Conducted a sequential mixed-methods study using a three-round Delphi process with 16 medical education experts.
Developed a competency assessment matrix derived from national and international standards.
Evaluated the instrument in a sample of 276 residents, postgraduate students, and clinical educators.
Achieved strong reliability (Cronbach's α = 0.928); exploratory factor analysis yielded a three-component solution explaining 74.5% of the variance.
Radar plots indicated role-dependent differences in competency emphasis among faculty, residents, and postgraduates.
Reached consensus on 72 evaluation items rated for Importance, Feasibility, and Clarity.
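For readers unfamiliar with the reliability figure above, Cronbach's α compares the summed variances of the individual items with the variance of respondents' total scores: α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜₒₜₐₗ), where k is the number of items. The sketch below is a minimal Python implementation of that textbook formula; the function name `cronbach_alpha` and the toy data are hypothetical illustrations, not the authors' analysis code or study data.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix (hypothetical helper).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item (column) across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's summed score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 3 respondents x 2 items (not from the study)
toy = [[1, 1], [2, 3], [3, 2]]
print(round(cronbach_alpha(toy), 3))  # prints 0.667
```

Using sample variances for both numerator and denominator keeps the ratio consistent; values near 1, such as the reported 0.928, indicate that items vary together across respondents.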

Abstract

The integration of artificial intelligence (AI) into clinical practice is reshaping the competency requirements for medical trainees, yet validated evaluation instruments aligned with outcome-based education (OBE) frameworks remain scarce. We conducted a sequential mixed-methods study to develop and preliminarily evaluate an OBE-based competency assessment matrix for clinical medical trainees in China. The framework was derived from national and international competency standards and refined through a three-round Delphi process with 16 medical education experts. For the empirical evaluation, 276 respondents (residents, postgraduate students, and clinical educators) completed the finalized 72-item instrument on a digital assessment platform. Reliability and exploratory structure were examined using Cronbach's α, exploratory factor analysis (EFA), and inter-item correlation matrices; subgroup differences were examined descriptively and visualized with radar plots. The Delphi panel reached consensus on 72 items rated across three domains (Importance, Feasibility, and Clarity), with progressive convergence (Kendall's W increasing from 0.65 in Round 1 to 0.74 in Round 3). The resulting scale showed excellent internal consistency (Cronbach's α = 0.928) and strong sampling adequacy (Kaiser-Meyer-Olkin measure, KMO = 0.884). Bartlett's test of sphericity was highly significant (χ² = 421.35, df = 28, p < 0.001), confirming that the data were suitable for structural exploration. EFA of aggregated domain scores yielded a three-component pattern that cumulatively explained 74.5% of the variance, and the resulting loading profile suggested meaningful contributions from Importance, Feasibility, and Clarity, offering exploratory support for the proposed domain-level structure. Radar plots revealed systematic but role-dependent differences: faculty emphasized Importance, residents prioritized Feasibility, and postgraduates rated Clarity slightly higher.
This study provides a context-sensitive evaluation matrix with encouraging initial psychometric evidence, tailored to the evolving demands of AI-informed clinical education. The framework offers a promising platform for competency assessment and curriculum development in Chinese teaching hospitals and may serve as a reference model for other medical education systems that are integrating AI. Confirmatory factor analysis in independent samples is still needed to establish its dimensional structure more definitively.

Cite This Study

Li et al. (2026) studied this question.

synapsesocial.com/papers/698fd276306598e8538deb06 https://doi.org/10.1186/s12909-026-08779-7