Abstract
This article addresses the domain knowledge gaps that general-purpose large language models show in historical text analysis, in the context of computational humanities and AI-generated content (AIGC) technology. We propose the GraphRAG framework, which combines chain-of-thought prompting, self-instruction generation, and process supervision to build a character relationship dataset for “The First Four Histories” with minimal manual annotation. The dataset supports automated historical knowledge extraction and reduces labor costs. In the graph-augmented generation phase, we introduce a collaborative mechanism between knowledge graphs and retrieval-augmented generation that improves the alignment of general models with historical knowledge. Experiments show that the domain-specific model Xunzi-Qwen1.5-14B, given Simplified Chinese input and chain-of-thought prompting, achieves the best relation extraction performance (F1 = 0.68). On the open-domain C-CLUE relation extraction dataset, the DeepSeek-R1 model integrated with GraphRAG achieves an absolute F1 increase of 0.11 (0.08 → 0.19), surpassing Xunzi-Qwen1.5-14B (F1 = 0.12) while alleviating “hallucinations” and improving interpretability. The framework offers a low-resource solution for knowledge extraction from classical texts, advancing historical knowledge services and humanities research.
Fan Yang
Qi Zhang
Wenqian Xing
Digital Scholarship in the Humanities
Nanjing Agricultural University
Shanxi University
Shanxi University of Finance and Economics
Yang et al. studied this question.
www.synapsesocial.com/papers/69a67eb2f353c071a6f0a0fa — DOI: https://doi.org/10.1093/llc/fqag006