March 6, 2024 · Open Access

Systematic analysis of ChatGPT, Google search and Llama 2 for clinical decision support tasks

Abstract

It is likely that individuals are turning to large language models (LLMs) to seek health advice, much as they search for diagnoses on Google. We evaluate the clinical accuracy of GPT-3.5 and GPT-4 in suggesting the initial diagnosis, examination steps, and treatment for 110 medical cases across diverse clinical disciplines. In addition, two model configurations of the open-source Llama 2 LLM are assessed in a sub-study. To benchmark the diagnostic task, we conduct a naïve Google search for comparison. Overall, GPT-4 performed best, outperforming GPT-3.5 on diagnosis and examination and outperforming Google on diagnosis. Except for treatment, all three approaches performed better on frequent than on rare diseases. The sub-study indicates slightly lower performance for the Llama models. In conclusion, the commercial LLMs show growing potential for medical question answering across two successive major releases. However, some weaknesses underscore the need for robust and regulated AI models in health care. Open-source LLMs can be a viable option to address specific needs regarding data privacy and transparency of training.

Cite this study

Sandmann et al. (2024) studied this question.

www.synapsesocial.com/papers/68e75683b6db6435876ce2ed — DOI: https://doi.org/10.1038/s41467-024-46411-8

Also consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

ChatGPT: The transformative influence of generative AI on science and healthcare · 2023 · 88 citations
ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports · 2023 · 503 citations
Implications of large language models such as ChatGPT for dental medicine · 2023 · 276 citations
Should You Search the Internet for Information About Your Acute Symptom? · 2012 · 47 citations

Authors

Sarah Sandmann

Sarah Riepenhausen

Lucas Plagwitz

Journals

Nature Communications

Institutions

University of Münster
