March 3, 2026 | Open Access

Understanding Clinical Reasoning Variability in Medical Large Language Models: A Mechanistic Interpretability Study

Key Points

Models exhibit significant reasoning instability, shifting staging accuracy by over 50% based solely on prompt format.
Sparse autoencoder analysis reveals hierarchical encoding in MedGemma and uniform encoding in OpenBioLLM, differences that affect how the models interpret clinical cases and answer questions.
Evaluation of 355 systematic perturbations shows that benchmark equivalence does not guarantee functional equivalence across model architectures.
Findings highlight the need for architecture-specific safety validation in healthcare AI applications, because different architectures respond differently to the same interventions.
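As an illustration of what a prompt-format accuracy shift means, the sketch below groups graded responses by prompt format and reports the largest accuracy gap between formats. It assumes the shift is an absolute difference in percentage points (the page does not say whether the reported "over 50%" is absolute or relative), and the format labels, toy data, and function names are hypothetical, not the authors' evaluation code.

```python
from collections import defaultdict


def accuracy_by_format(results):
    """Compute staging accuracy per prompt format.

    results: iterable of (prompt_format, is_correct) pairs (hypothetical schema).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for fmt, ok in results:
        totals[fmt] += 1
        correct[fmt] += int(ok)
    return {fmt: correct[fmt] / totals[fmt] for fmt in totals}


def max_format_shift(results):
    """Largest accuracy gap between any two prompt formats, in percentage points."""
    acc = accuracy_by_format(results)
    return 100 * (max(acc.values()) - min(acc.values()))


# Hypothetical toy data: the same staging case posed in two formats, four times each.
toy = [("free_text", True)] * 4 + [("multiple_choice", True)] + [("multiple_choice", False)] * 3
print(accuracy_by_format(toy))  # {'free_text': 1.0, 'multiple_choice': 0.25}
print(max_format_shift(toy))    # 75.0
```

Reporting the max-minus-min gap makes a single fragile format visible even when average accuracy across formats looks acceptable.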

Abstract

Medical large language models (LLMs) that achieve high benchmark accuracy exhibit unexplained variability on clinical tasks, producing errors that clinicians cannot safeguard against. We evaluated clinical reasoning stability in GPT-5, MedGemma-27B-Text-IT, and OpenBioLLM-Llama3-70B using 355 systematic perturbations of physician-validated oncology cases, and trained sparse autoencoders on 1 billion tokens from 50,000 MIMIC-IV clinical notes to decompose the models' internal representations. We find that the models exhibit dramatic reasoning instability, shifting staging accuracy by over 50% based solely on prompt format and generating definitive staging in scenarios where the available clinical information is insufficient. Sparse autoencoder analysis revealed hierarchical encoding in MedGemma, where high-magnitude features encode lexical identity and low-magnitude features encode contextual meaning; by contrast, OpenBioLLM distributes information uniformly. We demonstrate that these internal encoding structures differentially affect retrieval interventions, suggesting that interventions effective for one architecture may harm another. Because benchmark equivalence does not imply functional equivalence, we recommend that healthcare institutions implement architecture-specific safety validation; these findings have implications for AI safety beyond healthcare.
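To make the sparse autoencoder (SAE) decomposition concrete, here is a minimal pure-Python sketch of a common SAE formulation: a ReLU encoder, a linear decoder, and a reconstruction loss plus an L1 sparsity penalty. It also includes a helper that splits active features into high- and low-magnitude bands, mirroring the distinction drawn for MedGemma. The exact SAE architecture, dictionary size, L1 coefficient, and the threshold separating "high" from "low" magnitude are not stated here, so those choices, along with the names `SparseAutoencoder` and `split_by_magnitude`, are hypothetical. Training (gradient descent over the activation corpus) is omitted.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


class SparseAutoencoder:
    """Hypothetical single-layer SAE:
    f = ReLU(W_enc (x - b_dec) + b_enc), x_hat = W_dec f + b_dec.
    """

    def __init__(self, d_model, d_features, seed=0):
        rng = random.Random(seed)
        # Encoder maps a d_model activation vector to d_features feature activations.
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_features)]
        self.b_enc = [0.0] * d_features
        # Decoder maps feature activations back to the d_model activation space.
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(d_features)] for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        return [relu(sum(w * c for w, c in zip(row, centered)) + b)
                for row, b in zip(self.W_enc, self.b_enc)]

    def decode(self, f):
        return [sum(w * fj for w, fj in zip(row, f)) + b
                for row, b in zip(self.W_dec, self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        # f is non-negative after ReLU, so sum(f) equals its L1 norm.
        return mse + l1_coeff * sum(f)


def split_by_magnitude(f, threshold):
    """Partition active feature indices into high- and low-magnitude bands."""
    high = [i for i, v in enumerate(f) if v >= threshold]
    low = [i for i, v in enumerate(f) if 0.0 < v < threshold]
    return high, low
```

In use, one would encode each residual-stream activation vector from a model, then inspect which tokens or contexts drive the high-band versus low-band features; a hierarchical encoding would show the high band tracking lexical identity and the low band tracking context.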

Authors

Mirage Modi

Jordan E. Krull

Donte Johnson

Institutions

The Ohio State University

The Ohio State University Comprehensive Cancer Center – Arthur G. James Cancer Hospital and Richard J. Solove Research Institute
