March 7, 2024 · Open Access

Development of a liver disease–specific large language model chat interface using retrieval-augmented generation

Abstract

Background and Aims: Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows the embedding of customized data into LLMs. This approach “specializes” the LLMs and is thought to reduce hallucinations.

Approach and Results: We developed “LiVersa,” a liver disease–specific LLM, by using our institution’s protected health information-compliant text embedding and LLM platform, “Versa.” We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases guidance documents to be incorporated into LiVersa. We evaluated LiVersa’s performance by conducting 2 rounds of testing. First, we compared LiVersa’s outputs versus those of trainees from a previously published knowledge assessment. LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI’s ChatGPT 4, and Meta’s Large Language Model Meta AI 2. LiVersa’s outputs were more accurate but were rated less comprehensive and safe than those of ChatGPT 4.

Conclusions: In this demonstration, we built disease-specific and protected health information-compliant LLMs using RAG. While LiVersa demonstrated higher accuracy in answering questions related to hepatology, there were some deficiencies due to limitations set by the number of documents used for RAG. LiVersa will likely require further refinement before potential live deployment. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical use cases.
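
The retrieval step that RAG adds in front of an LLM can be illustrated with a minimal sketch. This is not the LiVersa or Versa implementation, which the abstract does not describe in detail. The bag-of-words `embed` function stands in for a real text-embedding model, and every name here (`embed`, `cosine`, `retrieve`, `build_prompt`, `CHUNKS`) and the placeholder passages are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding' (assumption): lowercase token counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Prepend the retrieved chunks to the question before it goes to an LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


# Hypothetical placeholder passages, not text from any real guidance document.
CHUNKS = [
    "placeholder passage about hepatocellular carcinoma surveillance",
    "placeholder passage about hepatitis B treatment",
    "placeholder passage about ascites management",
]

if __name__ == "__main__":
    print(build_prompt("How is ascites management handled?", CHUNKS, k=1))
```

In a sketch like this, the model can only draw on passages that retrieval surfaces, which is one way to read the abstract's remark that deficiencies stemmed from the number of documents used for RAG.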

Cite this study

Ge et al. (2024) studied this question.

www.synapsesocial.com/papers/68e7541bb6db6435876cbd52 — DOI: https://doi.org/10.1097/hep.0000000000000834

Also consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

Latent bias and the implementation of artificial intelligence in medicine · 2020 · 197 citations
Health system-scale language models are all-purpose prediction engines · 2023 · 432 citations
Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma · 2023 · 619 citations
Survey of Hallucination in Natural Language Generation · 2022 · 3,253 citations

Authors

Jin Ge

Steve Sun

Joseph F. Owens

Journals

Hepatology

Institutions

University of California, San Francisco
