August 9, 2025

Leveraging LLMs for Early Diagnosis in the Emergency Department: Comparing ClinicalBERT and GPT-4.

Key Points

ClinicalBERT outperforms GPT models in diagnosing respiratory system diseases with an F1-score of 0.952.
The highest F1-score among the GPT models is 0.784, achieved with the binary voting method; the GPT models mainly excel in recall.
The analysis used early clinical notes from the MIMIC-III dataset to identify diagnoses across four conditions: circulatory system diseases, respiratory system diseases, septicemia, and pneumonia.
Understanding model performance in diagnostics can enhance decision-making in emergency medical settings.

Abstract

This study explored whether LLMs, such as ClinicalBERT and GPT-4, can identify likely diagnoses from early clinical notes in the MIMIC-III dataset. We compared these models across four conditions: circulatory system diseases, respiratory system diseases, septicemia, and pneumonia. ClinicalBERT consistently outperformed the GPT models, reaching its highest F1-score, 0.952, on respiratory system diseases. The GPT models showed high recall but lower precision; their highest F1-score, 0.784, was achieved by the GPT binary voting method. Overall, ClinicalBERT demonstrated strong precision and F1-scores, while GPT-4 excelled in recall.
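
The relationship between precision, recall, and F1-score reported above can be illustrated with a minimal Python sketch. The abstract does not define the "binary voting method"; the sketch assumes it means a majority vote over several yes/no outputs per clinical note, which may differ from the authors' procedure. The function names and the toy labels are hypothetical and are not data from the study.

```python
def majority_vote(votes):
    """Return 1 if more than half of the binary votes are 1, else 0.

    Assumption: this stands in for the "binary voting method"; the
    source does not describe how the votes are actually combined.
    """
    return 1 if sum(votes) * 2 > len(votes) else 0


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = condition present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (illustrative only): six notes, three hypothetical model runs per note.
y_true = [1, 0, 1, 0, 0, 1]
runs = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [1, 1, 1],
]
y_pred = [majority_vote(v) for v in runs]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case every true positive is caught (recall 1.0) but two false positives lower precision to 0.6, so F1 lands at 0.75. This is the same pattern the abstract describes for the GPT models: high recall, lower precision, and a moderate F1-score.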

Cite this study

Cui et al. studied this question.

www.synapsesocial.com/papers/689dfe97d61984b91e13c0ce — DOI: https://doi.org/10.3233/shti251241

Authors

Wanting Cui

Joseph Finkelstein

Institutions

University of Utah
