Abstract: Large Language Models (LLMs) have demonstrated the capacity to process, reason about, and generate extensive volumes of data, providing a novel paradigm for integrating generative artificial intelligence (GenAI) into the medical field. Multimodal LLMs (MLLMs) extend these capabilities by incorporating diverse data modalities, including genomics, medical imaging, and clinical text, into unified representations. This systematic review synthesizes advancements from 246 records identified between January 2020 and September 2025, of which 90 studies were included after full-text screening, to address critical gaps in understanding the clinical role of LLMs and MLLMs in healthcare. We trace the evolution from classical natural language processing (NLP) approaches to modern transformer-based architectures, summarize their technical foundations, and examine their construction, evaluation, and deployment in medical workflows. Key contributions include highlighting multimodal integration (e.g., imaging-genomics-text fusion), ethical governance frameworks, and validated domain-specific fine-tuning in clinical settings. We also highlight advances in prompting, Retrieval-Augmented Generation (RAG), and multi-agent (agentic) workflows, providing a critical assessment of their benefits and limitations. In addition, we analyze challenges such as hallucinations, bias, and privacy risks, and provide actionable guidelines for clinicians, developers, and policymakers to improve regulatory compliance. By consolidating the nomenclature and systematically evaluating GenAI in medicine, this review offers evidence-based recommendations and directions for the safe and effective integration of generative AI into healthcare. The findings are intended as an authoritative guide for researchers and practitioners, bridging principles, clinical applications, and policy considerations for LLMs and MLLMs.
Ghnemat et al. studied this question.