What question did this study set out to answer?

This research aims to improve the accuracy and relevance of AI-generated educational content for K–12 learners by integrating knowledge graphs into large language models.

April 7, 2026 | Open Access

Design and Evaluation of a Question-Answering System Based on Knowledge Graph-Augmented Large Language Models in K–12 Artificial Intelligence Curriculum

Key Points

Constructed a knowledge graph of the K–12 AI curriculum.
Developed a question-answering system utilizing KG-augmented LLMs.
Evaluated system performance on a dataset of 1098 AI curriculum questions across three difficulty levels.
Used the G-Eval framework with reference-free metrics for assessment.
Scored answers from three mainstream LLMs along five dimensions, with DeepSeek-V3 as the scoring model.
Integration of the curriculum KG enhanced the factual accuracy and relevance of LLM-generated answers.
Performance improvements varied across LLMs: Qwen and Baichuan showed the strongest gains, particularly on complex tasks.
Incorporating non-declarative knowledge can reduce linguistic fluency and coherence.
The system provides a reliable framework for developing AI teaching assistants.
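
As a rough illustration of the evaluation step described in the key points, the sketch below shows one way reference-free judge ratings could be turned into scores and averaged per difficulty level. It assumes the probability-weighted scoring commonly associated with G-Eval; whether this study weighted ratings that way is not stated here. The function names, the record fields, and the sample values are hypothetical and are not results from the paper.

```python
from collections import defaultdict
from statistics import mean


def expected_score(rating_probs):
    """Probability-weighted rating, e.g. {4: 0.5, 5: 0.5} -> 4.5.

    rating_probs maps each candidate rating to the probability the
    scoring model assigned to it (assumed to be available).
    """
    total = sum(rating_probs.values())
    if total <= 0:
        raise ValueError("rating probabilities must sum to a positive value")
    return sum(rating * p for rating, p in rating_probs.items()) / total


def mean_by_group(records):
    """Average scores per (difficulty, dimension) pair.

    Each record is a dict with hypothetical keys
    'difficulty', 'dimension', and 'score'.
    """
    groups = defaultdict(list)
    for record in records:
        groups[(record["difficulty"], record["dimension"])].append(record["score"])
    return {key: mean(scores) for key, scores in groups.items()}


# Made-up sample records, for illustration only.
sample = [
    {"difficulty": "easy", "dimension": "factual accuracy", "score": 4.0},
    {"difficulty": "easy", "dimension": "factual accuracy", "score": 5.0},
    {"difficulty": "hard", "dimension": "fluency", "score": 3.0},
]
summary = mean_by_group(sample)
```

Grouping by difficulty and dimension keeps trade-offs visible, such as accuracy rising while fluency falls, which a single overall average would hide.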

Abstract

Digital transformation is reshaping the education sector, fostering an AI-enabled, learner-centered ecosystem. This shift is characterized by the adoption of large language models (LLMs) in education, which is forging a new paradigm for intelligent teaching. However, the integration of LLMs into K–12 AI education is often hindered by their tendency to generate factually inaccurate and pedagogically misaligned content. To address this, we constructed a knowledge graph (KG) of the K–12 AI curriculum and developed a question-answering system based on KG-augmented LLMs. The system was evaluated on a dedicated AI curriculum dataset comprising 1098 questions categorized into three difficulty levels. The evaluation employed the G-Eval with no-reference metrics. Using DeepSeek-V3 as the scoring model, the system performance was assessed across three mainstream LLMs and measured along five distinct dimensions. Results indicated that the integration of curriculum KG significantly enhanced the factual accuracy and relevance of LLM-generated answers in K–12 AI education. However, this enhancement involves a trade-off, as the incorporation of non-declarative knowledge can negatively affect linguistic fluency and coherence. Performance gains varied across LLMs: Qwen and Baichuan demonstrated the strongest improvements, particularly in complex tasks. This study provides a scalable, knowledge-anchored framework for developing reliable AI teaching assistants, demonstrating a practical pathway to mitigate domain-specific hallucinations in educational applications.
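
To make the idea of KG-augmented question answering more concrete, here is a minimal sketch of how curriculum facts might be retrieved and placed in a prompt before it is sent to an LLM. It is not the authors' implementation: the toy triples, the keyword-matching retrieval, the prompt wording, and all names (`Triple`, `TOY_KG`, `retrieve_triples`, `build_prompt`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


# Hypothetical toy triples; the real curriculum KG is not reproduced here.
TOY_KG = [
    Triple("machine learning", "is_a_branch_of", "artificial intelligence"),
    Triple("supervised learning", "is_a_type_of", "machine learning"),
    Triple("supervised learning", "requires", "labeled data"),
    Triple("neural network", "is_used_in", "deep learning"),
]


def retrieve_triples(question, kg, limit=5):
    """Return triples whose head or tail entity appears in the question.

    Plain substring matching stands in for whatever entity linking
    a production system would use.
    """
    q = question.lower()
    hits = [t for t in kg if t.head in q or t.tail in q]
    return hits[:limit]


def build_prompt(question, triples):
    """Prepend retrieved facts to the question; fall back to a bare prompt."""
    if not triples:
        return f"Question: {question}\nAnswer:"
    facts = "\n".join(
        f"- {t.head} {t.relation.replace('_', ' ')} {t.tail}" for t in triples
    )
    return (
        "Use the curriculum facts below when answering.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}\nAnswer:"
    )


question = "What is supervised learning?"
prompt = build_prompt(question, retrieve_triples(question, TOY_KG))
```

The resulting `prompt` string would then be passed to whichever LLM is being evaluated. Anchoring the answer in retrieved facts is what is expected to improve factual accuracy, while the extra inserted text is one plausible source of the fluency trade-off the abstract reports.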

Cite this study

Huang et al. studied this question.

www.synapsesocial.com/papers/69d49fa9b33cc4c35a22821c — DOI: https://doi.org/10.3390/app16073552

Authors

Jingxiu Huang

Feiyu Lai

Zixuan Zheng

Journals

Applied Sciences

Institutions

South China Normal University
