April 16, 2026 | Open Access

An end-to-end system for explainable clinical coding across languages and diverse medical data sources

Key Points

Aimed to develop an explainable clinical coding system that improves the transparency and accuracy of the coding process.
Developed a three-phase system for clinical coding: text span detection, supervised classification, and handling of low-confidence cases.
Used Named Entity Recognition (NER) models to detect text spans that potentially describe ICD codes.
Applied a supervised text classification model with a confidence threshold, then classified low-confidence instances with a semantic similarity model.
Achieved an average F1-score improvement of 3.42% over existing methods.
Demonstrated robust performance on Spanish and English corpora and across three ICD variants (ICD-10-CM, ICD-10-PCS, and ICD-O-3).
Handled unseen codes as well as discontinuous and overlapping entities.

Abstract

The International Classification of Diseases (ICD) coding system standardizes clinical texts to improve information extraction in healthcare and research. However, most existing automated coding systems operate as “black boxes”, which highlights the need for explainable approaches that offer transparent and interpretable solutions. In this work, we present an end-to-end, three-phase system that provides explainable clinical coding predictions. In Phase 1, text spans potentially describing ICD codes are detected using different Named Entity Recognition (NER) models. Phase 2 applies a supervised text classification model with a confidence threshold, and low-confidence cases are classified in Phase 3 using a semantic similarity model built from ICD code descriptions and related keyphrases. The system is evaluated on four corpora in Spanish and English, annotated with ICD codes from three variants (ICD-10-CM, ICD-10-PCS, and ICD-O-3) and their corresponding textual mentions. Overall, the system proves robust and competitive with state-of-the-art approaches, outperforming most of them and achieving an average F1-score improvement of 3.42%. This study presents one of the most comprehensive evaluations of an explainable clinical coding system across languages and ICD variants. The proposed approach demonstrates strong robustness and generalization, effectively handling unseen codes as well as discontinuous and overlapping entities.
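The hand-off between Phase 2 and Phase 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes Phase 1 has already produced the text spans, replaces the supervised classifier with a hypothetical lookup table (`toy_classifier`), and replaces the semantic similarity model with plain token-overlap (Jaccard) similarity. The threshold value of 0.8, the three-entry code table, and all function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative subset of ICD-10-CM code descriptions (assumption: a real
# system would load the full terminology plus related keyphrases).
CODE_DESCRIPTIONS = {
    "E11.9": "type 2 diabetes mellitus without complications",
    "I10": "essential primary hypertension",
    "J45.909": "unspecified asthma uncomplicated",
}


@dataclass
class Prediction:
    span: str     # text evidence that supports the code
    code: str     # predicted ICD code
    score: float  # classifier confidence or similarity score
    phase: int    # 2 = supervised classifier, 3 = similarity fallback


def toy_classifier(span: str) -> Tuple[str, float]:
    """Hypothetical stand-in for the Phase 2 supervised classifier."""
    known = {"hypertension": ("I10", 0.97)}
    return known.get(span.lower(), ("", 0.0))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, used here in place of a semantic model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def most_similar_code(span: str) -> Tuple[str, float]:
    """Return the code whose description is closest to the span."""
    scored = ((code, jaccard(span, desc)) for code, desc in CODE_DESCRIPTIONS.items())
    return max(scored, key=lambda pair: pair[1])


def code_span(span: str, threshold: float = 0.8) -> Prediction:
    """Keep the classifier's answer if confident, otherwise fall back."""
    code, confidence = toy_classifier(span)
    if confidence >= threshold:
        return Prediction(span, code, confidence, phase=2)
    code, similarity = most_similar_code(span)
    return Prediction(span, code, similarity, phase=3)


if __name__ == "__main__":
    for mention in ["hypertension", "type 2 diabetes mellitus"]:
        print(code_span(mention))
```

Because each prediction keeps the span that triggered it and the phase that produced it, a reviewer can trace every code back to its text evidence. A fallback driven by code descriptions is also one plausible way to reach codes the classifier never saw during training, since it needs only a description, not labeled examples.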

Authors

Alicia Ramirez-Arrabe

Juan Martinez-Romo

Andrés Duque

Journals

BMC Medical Informatics and Decision Making

Institutions

Universidad Nacional de Educación a Distancia

Escuela Nacional de Sanidad
