What question did this study set out to answer?

The aim is to validate a generative AI framework designed for personalized education assessment in programming courses.

March 5, 2026

Open Access

Empirical validation of a generative AI framework for personalized education assessment

Key Points

Aimed to validate a generative AI framework for personalized education assessment in programming courses.
Developed a five-layer hierarchical architecture for the assessment framework.
Used ChatGLM3-6B, fine-tuned on expert-curated feedback, to generate assessments.
Conducted empirical validation with 449 undergraduate students in introductory Python courses.
Performed ablation experiments, which showed that knowledge graph integration contributed most to accuracy.
Achieved a correlation of 0.847 between framework assessments and expert consensus.
Reduced generation time by over 99% compared to manual evaluation.
Observed significantly higher learning gains (Cohen’s d = 0.56), especially among initially lower-performing students.
Enhanced learner engagement and satisfaction relative to conventional assessment approaches.

Abstract

The tension between personalized learning demands and standardized evaluation mechanisms presents a persistent challenge in contemporary education. This study proposes a comprehensive personalized education assessment framework driven by generative artificial intelligence technologies. The framework adopts a five-layer hierarchical architecture integrating data collection, processing, intelligent analysis, assessment generation, and feedback optimization components. ChatGLM3-6B, fine-tuned on 50,000 expert-curated programming feedback instances, enables contextually responsive feedback generation, while dynamic learner profiling and knowledge graph modeling support precise diagnostic assessment. The fine-tuning instances were assembled through a human-in-the-loop process combining authentic instructor records, newly authored examples, and AI-assisted, human-verified content. In an empirical validation involving 449 undergraduate students in introductory Python programming courses, the framework achieved assessment accuracy correlating at 0.847 with expert consensus (Fleiss’ κ = 0.74 for inter-rater reliability) while reducing generation time by over 99% compared to manual evaluation. Ablation experiments confirmed that knowledge graph integration contributed most substantially to accuracy improvements: removing this component reduced the correlation by 0.055. Experimental participants exhibited significantly higher learning gains (Cohen’s d = 0.56), with particularly pronounced effects among initially lower-performing students. The framework also enhanced learner engagement and satisfaction compared to conventional assessment approaches. These findings suggest that generative AI can effectively operationalize personalized assessment at scale while maintaining pedagogical quality and transparency.
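For readers less familiar with the statistics quoted in the abstract, the following is a minimal Python sketch of how such quantities are conventionally computed. It is not the authors' analysis code. It assumes that the reported correlation is a Pearson correlation and that Cohen's d uses the pooled sample standard deviation; the abstract states neither. The function names and all numbers in the usage lines are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cohens_d(treatment, control):
    """Cohen's d with the pooled sample standard deviation (an assumption)."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def fleiss_kappa(counts):
    """Fleiss' kappa. counts[i][j] is the number of raters who assigned
    item i to category j; every item must have the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Share of all ratings that fall into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = mean(p_i)                 # mean observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Illustrative, made-up data (not from the study).
model_scores = [1, 2, 3]
expert_scores = [2, 4, 6]
r = pearson_r(model_scores, expert_scores)
d = cohens_d([2, 4, 6], [1, 3, 5])
kappa = fleiss_kappa([[3, 0], [0, 3]])  # three raters, full agreement
```

In this reading, the 0.847 figure would correspond to `pearson_r` applied to framework scores and expert consensus scores, the 0.56 figure to `cohens_d` applied to learning gains in the experimental and comparison groups, and the 0.74 figure to `fleiss_kappa` applied to the raters' category counts.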

Authors

Meina Qian

Jilin International Studies University

Hualei Ji

Jilin International Studies University

Lianzhi Li

Jilin International Studies University

Journals

Scientific Reports

Institutions

Jilin International Studies University
