When general-purpose large language models (LLMs) handle legal question answering, they often match legal provisions inaccurately and fail to give targeted recommendations, and existing datasets cover only a limited range of scenarios. To address these problems, this study constructs a high-quality fine-tuning dataset for LLMs in the legal QA domain. The data are drawn mainly from authentic consultation records collected from provincial legal service websites across China and cover high-frequency legal scenarios faced by the public, including civil and commercial disputes, labor disputes, traffic accidents, and criminal cases. After automated collection, multi-level cleaning and noise reduction, SimHash-based deduplication, privacy de-identification, and format standardization, the final dataset contains 77,703 structured data pairs in JSON format. To evaluate the dataset, four mainstream foundation models (Llama-3.1-8B, Qwen3-8B, Hunyuan-7B, and InternLM3-8B) were fine-tuned with the parameter-efficient method LoRA. After fine-tuning, all four models improved markedly on automatic evaluation metrics such as BLEU and ROUGE, with InternLM3-8B performing particularly well. The dataset helps fill the gap in high-quality structured data for routine legal consultation, strengthens LLMs' abilities in legal intent understanding, statutory citation, and logical reasoning, and provides important data support for the practical application of legal artificial intelligence.
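The abstract names SimHash as the deduplication technique but does not give its parameters. What follows is a minimal, hypothetical sketch of SimHash near-duplicate filtering. It assumes unweighted character 3-gram features (Chinese text has no word boundaries), a 64-bit fingerprint derived from MD5, and a Hamming-distance threshold of 3. None of these choices comes from the paper, and the helper names (`simhash`, `hamming`, `dedup`) are illustrative only.

```python
import hashlib
import re


def _features(text: str, n: int = 3) -> list[str]:
    """Character n-grams (assumption: n=3, whitespace removed)."""
    text = re.sub(r"\s+", "", text)
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def simhash(text: str, bits: int = 64) -> int:
    """64-bit SimHash over unweighted features (assumed weighting: 1 per n-gram)."""
    counts = [0] * bits
    for feat in _features(text):
        # Stable 64-bit hash of the feature: first 8 bytes of its MD5 digest.
        h = int.from_bytes(hashlib.md5(feat.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            # Each feature votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def dedup(records: list[str], max_distance: int = 3) -> list[str]:
    """Keep a record only if its fingerprint is farther than max_distance
    (assumed threshold) from every record already kept. Pairwise, O(n^2)."""
    kept: list[str] = []
    fingerprints: list[int] = []
    for rec in records:
        fp = simhash(rec)
        if all(hamming(fp, other) > max_distance for other in fingerprints):
            kept.append(rec)
            fingerprints.append(fp)
    return kept
```

The pairwise scan keeps the sketch short. At the scale of tens of thousands of records, a practical pipeline would more likely split each fingerprint into blocks and index them, so that only records sharing a block are compared.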
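LoRA, the fine-tuning method used in the evaluation, freezes each pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ and trains only a low-rank update. The standard formulation is shown here. The rank $r$ and scaling $\alpha$ used in this study are not stated, so the numbers in the last line ($d = k = 4096$, $r = 8$) are chosen purely for illustration.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\text{trainable parameters per adapted matrix: } r(d + k) \text{ instead of } dk

d = k = 4096,\; r = 8:\quad 8 \times 8192 = 65{,}536 \text{ vs. } 4096^2 = 16{,}777{,}216 \;(\approx 0.39\%)
```

Only $A$ and $B$ receive gradient updates. $B$ is conventionally initialized to zero, so $\Delta W = 0$ at the start and fine-tuning begins from the unchanged base model.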
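BLEU and ROUGE are reported as automatic metrics, but the abstract does not say how the Chinese text was tokenized or which scorer was used. As an illustration only, the following computes a sentence-level ROUGE-L F1 score from the longest common subsequence (LCS). It assumes character-level tokens and equal weighting of precision and recall ($\beta = 1$), and it is not necessarily the scorer the authors used.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Character-level ROUGE-L F1 (assumption: each character is a token, beta = 1)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("abcd", "abxd")` gives an LCS of 3 ("abd"), so precision and recall are both 3/4 and F1 is 0.75.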
Zijian HUANG, Shibiao SHI, Xu OUYANG
China Scientific Data
HUANG et al. studied this question.
DOI: https://doi.org/10.11922/11-6035.csd.2025.0250.zh