March 3, 2026 | Open Access

Large language models for clinical artificial intelligence in healthcare: a systematic review

Key Points

Integration of multimodal LLMs enhances data representation and processing in healthcare, making systems more effective.
The review covers 90 studies, highlighting advancements that address gaps in understanding LLM applications in clinical settings.
The review assesses challenges such as bias and privacy risks and provides actionable guidelines for improved regulatory compliance.
The findings serve as a comprehensive guide for integrating generative AI into medical workflows, bridging technology and policy.

Abstract

Large Language Models (LLMs) have demonstrated the capacity to process, reason over, and generate extensive volumes of data, providing a novel paradigm for integrating generative artificial intelligence (GenAI) into the medical field. Multimodal LLMs (MLLMs) extend these capabilities by incorporating diverse data modalities into unified representations, including genomics, medical imaging, and clinical text. This systematic review synthesizes advancements from 246 records identified between January 2020 and September 2025, of which 90 studies were included after full-text screening, to address critical gaps in understanding the clinical role of LLMs and MLLMs in healthcare. We trace the evolution from classical natural language processing (NLP) approaches to modern transformer-based architectures, summarize their technical foundations, and examine their construction, evaluation, and deployment in medical workflows. Key contributions include highlighting multimodal integration (e.g., imaging-genomics-text fusion), ethical governance frameworks, and validated domain-specific fine-tuning in clinical settings. We also highlight advances in prompting, retrieval-augmented generation (RAG), and multi-agent (agentic) workflows, providing a critical assessment of their benefits and limitations. In addition, we analyze challenges such as hallucinations, bias, and privacy risks, while providing actionable guidelines for clinicians, developers, and policymakers to improve regulatory compliance. By consolidating the nomenclature and systematically evaluating GenAI in medicine, this review offers evidence-based recommendations and directions for the safe and effective integration of generative AI into healthcare. The findings are intended as an authoritative guide for researchers and practitioners, bridging principles, clinical applications, and policy considerations for LLMs and MLLMs.
