The International Classification of Diseases (ICD) coding system standardizes clinical texts to improve information extraction in healthcare and research. However, most existing automated coding systems operate as “black boxes”, which highlights the need for explainable approaches that offer transparent, interpretable solutions. In this work, we present an end-to-end, three-phase system that provides explainable clinical coding predictions. In Phase 1, text spans potentially describing ICD codes are detected using different Named Entity Recognition (NER) models. Phase 2 applies a supervised text classification model with a confidence threshold, and low-confidence cases are then classified in Phase 3 using a semantic similarity model built from ICD code descriptions and related keyphrases. The system is evaluated on four corpora in Spanish and English, annotated with codes from three ICD variants (ICD-10-CM, ICD-10-PCS, and ICD-O-3) and their corresponding textual mentions. Overall, the system proves robust and competitive with state-of-the-art approaches, outperforming most of them with an average F1-score improvement of 3.42%. This study presents one of the most comprehensive evaluations of an explainable clinical coding system across languages and ICD variants. The proposed approach demonstrates strong robustness and generalization, effectively handling unseen codes as well as discontinuous and overlapping entities.
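The confidence-threshold routing between Phase 2 and Phase 3 can be illustrated with a minimal sketch. It assumes Phase 1 has already produced text spans. The classifier `toy_classifier`, the 0.8 threshold, and the three-entry `CODE_DESCRIPTIONS` table are hypothetical. Bag-of-words cosine similarity stands in for the semantic similarity model described above, whose exact form is not given here. This is not the authors' implementation.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical Phase 3 knowledge base: ICD-10-CM code -> description.
# The described system also uses related keyphrases; these entries are illustrative only.
CODE_DESCRIPTIONS = {
    "E11.9": "type 2 diabetes mellitus without complications",
    "I10": "essential primary hypertension",
    "J45.909": "unspecified asthma uncomplicated",
}


@dataclass
class Prediction:
    span: str
    code: str
    score: float
    phase: int  # 2 = confident classifier output, 3 = similarity fallback


def tokenize(text: str) -> Counter:
    """Lowercased alphanumeric token counts (a crude stand-in for embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_fallback(span: str) -> tuple[str, float]:
    """Phase 3 stand-in: return the code whose description is most similar to the span."""
    vec = tokenize(span)
    return max(
        ((code, cosine(vec, tokenize(desc))) for code, desc in CODE_DESCRIPTIONS.items()),
        key=lambda pair: pair[1],
    )


def code_spans(spans, classify, threshold=0.8):
    """Keep the classifier's code when its confidence reaches the threshold;
    otherwise route the span to the similarity fallback."""
    predictions = []
    for span in spans:
        code, prob = classify(span)
        if prob >= threshold:
            predictions.append(Prediction(span, code, prob, phase=2))
        else:
            fb_code, sim = similarity_fallback(span)
            predictions.append(Prediction(span, fb_code, sim, phase=3))
    return predictions


def toy_classifier(span: str) -> tuple[str, float]:
    """Hypothetical Phase 2 model: confident only on spans mentioning hypertension."""
    if "hypertension" in span.lower():
        return "I10", 0.95
    return "I10", 0.30


if __name__ == "__main__":
    # Assumed Phase 1 (NER) output for a clinical note.
    detected = ["arterial hypertension", "asthma, uncomplicated"]
    for p in code_spans(detected, toy_classifier):
        print(p.span, p.code, p.phase)
```

With these toy inputs, the first span keeps the classifier's label (Phase 2). The second falls below the threshold and is matched to `J45.909` by description similarity (Phase 3). Because each prediction records which phase produced it and the span it came from, this routing is one way to keep outputs traceable. It also lets a code absent from the classifier's training labels still be reached through its description.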
Alicia Ramirez-Arrabe
Juan Martinez-Romo
Andrés Duque
BMC Medical Informatics and Decision Making
Universidad Nacional de Educación a Distancia
Escuela Nacional de Sanidad
Ramirez-Arrabe et al. studied this question.
www.synapsesocial.com/papers/69e07dfe2f7e8953b7cbef77 — DOI: https://doi.org/10.1186/s12911-026-03473-6