Background: Poor documentation quality can significantly affect healthcare operations, but the feedback process that helps clinicians improve their clinical notes is time-consuming and often insufficient. Large language models (LLMs) such as Generative Pre-trained Transformer 4 (GPT-4) have the potential to streamline this process.

Objectives: To determine whether an LLM can generate feedback to improve the medical contingency and discharge planning (MCDP) component of clinical documentation that is non-inferior to feedback from physicians.

Methods: A cross-sectional study of GPT-4 feedback and physician feedback on inpatient progress notes was conducted. A random sample of 64 inpatient progress notes, identified by the validated AI Audit Tool as having a low likelihood of containing MCDP, was drawn from adult general medicine patients hospitalized at New York University Langone Health (NYULH) in December 2023. Both the GPT-4 model and attending physicians generated feedback on these notes. A/B testing was then conducted on four measures: understandability, usefulness, acceptability, and impartiality. Evaluations employed 5-point Likert scales that were converted to 10-point bidirectional interval scales for interpretability, ranging from –10 (human suggestions significantly better) to +10 (GPT-4 suggestions significantly better), with a non-inferiority threshold of –1 for the primary endpoint.

Results: A total of 64 inpatient progress notes were included, representing 55% female patients with a median age of 73. GPT-4 feedback was non-inferior to physician feedback on all measures: understandability (mean 1.27, 95% CI 0.73 to 1.80, P < 0.001), usefulness (mean 2.09, 95% CI 1.27 to 2.91, P < 0.001), acceptability (mean 2.07, 95% CI 1.33 to 2.81, P < 0.001), and impartiality (mean –0.20, 95% CI –0.52 to 0.12, P < 0.001).

Conclusions: This study shows that an LLM can be leveraged to generate note quality feedback that is non-inferior to expert clinician feedback.
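
The non-inferiority criterion described in the Methods can be illustrated with the summary statistics reported in the Results. The sketch below is not the authors' analysis: the abstract does not name the statistical test used, so this version assumes a normal approximation, recovers a standard error from each reported 95% CI, and computes a one-sided P value for the hypothesis that the mean difference exceeds the –1 threshold. The function name `check_noninferiority` and the dictionary `REPORTED` are hypothetical names introduced for illustration.

```python
import math

MARGIN = -1.0  # non-inferiority threshold stated in the abstract

# (mean, 95% CI lower, 95% CI upper), copied from the abstract's Results
REPORTED = {
    "understandability": (1.27, 0.73, 1.80),
    "usefulness": (2.09, 1.27, 2.91),
    "acceptability": (2.07, 1.33, 2.81),
    "impartiality": (-0.20, -0.52, 0.12),
}


def check_noninferiority(mean, lower, upper, margin=MARGIN):
    """Return (is_noninferior, one_sided_p) under a normal approximation.

    Assumption: the reported CI is symmetric and equals mean +/- 1.96 * SE.
    """
    se = (upper - lower) / (2 * 1.96)  # back out the standard error
    z = (mean - margin) / se           # distance of the mean above the margin
    # Upper-tail probability of the standard normal distribution at z
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    # Non-inferior when the whole CI lies above the margin
    return lower > margin, p_one_sided


results = {name: check_noninferiority(*stats) for name, stats in REPORTED.items()}

for name, (ok, p) in results.items():
    print(f"{name}: non-inferior={ok}, one-sided p={p:.2e}")
```

Under these assumptions, a measure counts as non-inferior when the lower bound of its confidence interval lies above –1. This is why impartiality, whose mean is slightly negative, still meets the criterion: its lower bound of –0.52 is above the threshold.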
Chris Kim
Joseph Gelfinbein
Nihan Gencerliler
Applied Clinical Informatics
New York University
NYU Langone Health
Winthrop-University Hospital
Kim et al. studied this question.
www.synapsesocial.com/papers/69e1d0715cdc762e9d85932e — DOI: https://doi.org/10.1055/a-2851-0739