March 3, 2026

Large language models in healthcare and biomedical informatics: A comprehensive review

Key Points

The review aims to synthesize empirical evidence regarding the applications of large language models in healthcare and biomedical informatics.
The authors conducted a systematic review of existing literature on LLM applications in healthcare.
The analysis focuses on the capabilities, evaluation practices, and reported outcomes of LLMs.
Studies are grouped by application theme, including genomic interpretation and clinical documentation.
Promising research applications include genomic interpretation and multiomics analysis.
Clinical uses under evaluation include electronic health record summarization and medical question answering.
Hallucination, bias, and data privacy remain barriers to effective clinical deployment.

Abstract

Large language models (LLMs) are rapidly advancing natural language processing and driving increasing interest across biomedical research, clinical care, and healthcare operations. This systematic review synthesizes current empirical evidence on LLM applications in healthcare and biomedical informatics, focusing on their capabilities, evaluation practices, and reported outcomes. Existing studies highlight promising uses in research, where LLMs assist literature-informed genomic interpretation, functional annotation, and biological hypothesis generation, as well as support tasks related to protein and multiomics analysis. In clinical contexts, LLMs are primarily evaluated for natural language-driven tasks, including electronic health record summarization, clinical documentation support, medical question answering, and information extraction, rather than autonomous diagnostic or therapeutic decision-making. Early investigations also describe potential value in healthcare operations, patient communication, clinician education, and drug discovery workflows largely through knowledge retrieval, text generation, and semantic search. However, current evidence remains preliminary: most studies are retrospective, benchmark-based, simulation-driven, or limited to controlled research settings. Key challenges include hallucination and unreliable reasoning, bias and inequitable performance across populations, data privacy and security constraints, reproducibility limitations, and lack of prospective clinical validation and regulatory guidance. Emerging strategies, such as domain-specific pretraining, retrieval augmentation, multi-modal architectures, and privacy-preserving learning, aim to improve reliability, safety, and real-world applicability. 
This review concludes by outlining methodological, infrastructural, and governance requirements needed to responsibly integrate LLMs into biomedical workflows, emphasizing that clinical deployment remains exploratory and must be rigorously evaluated.

Authors

Andrew Hornback

Harinishree Sathu

Kyungbeom Kim

Journals

Innovation and Emerging Technologies

Institutions

Georgia Institute of Technology

The Wallace H. Coulter Department of Biomedical Engineering
