Large language models hold vast application potential across diverse fields such as healthcare, law, and finance. However, these domains impose higher requirements on the models specialization, accuracy, explainability, and security. Most existing public datasets primarily focus on conclusive answers and lack explainable reasoning that reflects the expert decision-making process within complex consultation scenarios. Consequently, they are insufficient for effectively supporting conducting long-context, multi-turn interactive reasoning in large language models. To address this, this study constructs the MPCCD-MLF dataset (dataset of multi-round professional consultation conversations in the medical, legal and financial domains), comprising multi-round conversation corpora across medical, legal, and financial domains. Data sources include professional platforms such as Haodf.com, China Legal Service Network (12348), and Xueqiu.com, covering the period from January 2023 to December 2024. The dataset was constructed through a pipeline including web crawling, prompt engineering, and structural reorganization. Using specially designed multi-dimensional constrained prompt templates, it anchors to factual judgements and conclusive information within experts’ original responses, thereby generating structured and interpretable reasoning expressions that unfold across multiple conversation rounds. After cleaning and anonymization, the final dataset contains 31,745 three-round question-answer interactions (approximately 181 MB) stored in JSON format. Each conversation follows a multi-round interaction pattern comprising user query, expert response, user follow-up, and expert follow-up response. To ensure dataset quality, a double-blind evaluation strategy combining automated model scoring and expert manual verification was adopted, yielding an overall dataset quality score of 4.75 (out of 5). This dataset provides high-quality, and highly interpretable corpora for large language models in specialized domains, supporting research on complex logical reasoning and long-context multi-round interactions, and offering valuable data resources for the development of domain-specific intelligent consultation systems.
Building similarity graph...
Analyzing shared references across papers
Loading...
Congfei Luo
Qidong YAN
Dejie Wang
China Scientific Data
Building similarity graph...
Analyzing shared references across papers
Loading...
Luo et al. (Sun,) studied this question.
www.synapsesocial.com/papers/69bf86ecf665edcd009e9142 — DOI: https://doi.org/10.11922/11-6035.csd.2025.0247.zh