This study explored whether large language models (LLMs), such as ClinicalBERT and GPT-4, can identify likely diagnoses from early clinical notes in the MIMIC-III dataset. We compared these models across four conditions: circulatory system diseases, respiratory system diseases, septicemia, and pneumonia. ClinicalBERT consistently outperformed the GPT models, reaching its highest F1-score of 0.952 for respiratory system diseases. The GPT models showed high recall but lower precision; their highest F1-score, 0.784, was achieved by the GPT binary voting method. Overall, ClinicalBERT demonstrated strong precision and F1-scores, while GPT-4 excelled in recall.
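The contrast between high recall and lower precision can be made concrete with a minimal sketch of how the three metrics relate. The helper `precision_recall_f1` and the toy labels below are hypothetical and invented for illustration; they are not the study's code or data.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# A classifier that over-predicts the positive class: it finds every true
# case (recall 1.0) but also raises three false alarms (precision 4/7).
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a model with perfect recall but many false positives still ends up with a moderate F1, which is the pattern the abstract reports for the GPT models.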
Cui et al. studied this question.
www.synapsesocial.com/papers/689dfe97d61984b91e13c0ce — DOI: https://doi.org/10.3233/shti251241
Wanting Cui
Joseph Finkelstein
University of Utah