April 19, 2026 | Open Access

Automating SDTM Mapping with Artificial Intelligence and Large Language Models: A Narrative Review

Key Points

This review critically analyzes the evolution and performance of SDTM mapping automation across three technology generations.
The author conducted a targeted, non-systematic literature search across sources including peer-reviewed journals and conference proceedings.
A narrative design was adopted because of the heterogeneity of source types relevant to SDTM automation.
The analytical methodology comprises a three-generation classification framework and an evidence-quality assessment rubric.
First-generation metadata-driven approaches achieved 40–60% specification reuse but lacked adaptability.
Second-generation machine learning systems demonstrated 75.1% domain prediction accuracy and greater than 90% overall field classification accuracy.
Third-generation LLM systems report 50–70% time reductions and automated generation of approximately 75% of initial mappings, at approximately 70% initial accuracy.

Abstract

Background. Data preparation conforming to SDTM (Study Data Tabulation Model, the FDA’s required format for clinical trial data) is mandatory for every new drug, biologic, and abbreviated new drug application submitted to the U.S. Food and Drug Administration (FDA), directly affecting the pace at which new therapeutics reach the American public. The Clinical Data Interchange Standards Consortium (CDISC) SDTM is the mandated standard for regulatory submissions of clinical trial data. Manual SDTM mapping, the process of transforming raw study data into SDTM-conformant datasets, remains a resource-intensive bottleneck, with industry estimates placing effort at six to eight weeks and 35–45 programmer hours per dataset. As regulatory submission timelines face increasing commercial and public health pressure, automation of this process has become a strategic priority.

Objective. The author presents a critical analytical framework examining the evolution of SDTM mapping automation across three technology generations: (1) metadata-driven tools, (2) supervised machine learning classifiers, and (3) large language model (LLM) and agentic AI architectures, evaluating reported accuracy metrics, efficiency gains, evidence quality, and readiness for regulatory-grade deployment.

Methods. The author conducted a targeted, non-systematic literature search of peer-reviewed journals (PLoS ONE, Journal of Biomedical Informatics), conference proceedings (PharmaSUG, PHUSE, WUSS), FDA regulatory notices, CDISC standards documentation, vendor white papers, preprint servers (medRxiv), and open-source project repositories through March 2026. A narrative rather than systematic design was adopted for two reasons: the source types are extremely heterogeneous (peer-reviewed research, conference proceedings, vendor marketing materials, regulatory notices, and open-source repositories), and SDTM automation studies lack standardized reporting formats. Together these preclude the application of systematic review protocols designed for homogeneous clinical evidence. Sources were identified through database searches, citation tracking, and the author’s domain expertise and professional experience in the field. No formal inclusion/exclusion criteria were applied; this review is therefore subject to selection bias. The author independently designed the analytical methodology, including the three-generation classification framework, evidence-quality assessment rubric, and benchmark requirements synthesis.

Results. First-generation metadata-driven approaches achieved 40–60% specification reuse but lacked adaptability. Second-generation machine learning systems demonstrated 75.1% domain prediction accuracy (Galiker et al., 2023) and greater than 90% overall field classification accuracy (Yang et al., 2024), with precision reaching 91.4% at calibrated confidence thresholds. Third-generation LLM systems report 50–70% time reductions and automated generation of approximately 75% of initial mappings, though at only approximately 70% initial accuracy. Critically, a medRxiv preprint review found that only 2.3% of published clinical programming validation studies provide quantitative effectiveness data (Patel and Gupta, 2025), highlighting a significant evidence gap.

Conclusions. AI-driven SDTM automation is progressing from experimental prototypes toward increasingly operational systems. However, the field lacks standardized benchmark datasets, head-to-head comparisons, and regulatory-specific validation frameworks. Human-in-the-loop oversight remains essential. The analytical framework and evidence-gap analysis presented in this review are applicable across pharmaceutical companies, contract research organizations, and regulatory agencies nationwide, contributing to the national effort to modernize clinical trial data infrastructure.
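
The Results mention precision at calibrated confidence thresholds, and the Conclusions stress human-in-the-loop oversight. The abstract does not say how any of the reviewed systems implement either, so the following is only a minimal Python sketch of the general idea: predictions at or above a confidence threshold are accepted automatically, and the rest are routed to a human reviewer. The `MappingPrediction` class, the `route` and `precision` helpers, the raw field names, and every number in the toy data are hypothetical and are not taken from any cited study.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MappingPrediction:
    source_field: str   # raw study field name (hypothetical)
    sdtm_target: str    # predicted SDTM variable, written DOMAIN.VARIABLE
    confidence: float   # model confidence in [0, 1]
    correct: bool       # toy ground-truth label from a reviewer


def route(
    predictions: List[MappingPrediction], threshold: float
) -> Tuple[List[MappingPrediction], List[MappingPrediction]]:
    """Split predictions into auto-accepted and human-review queues."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    review = [p for p in predictions if p.confidence < threshold]
    return accepted, review


def precision(accepted: List[MappingPrediction]) -> Optional[float]:
    """Fraction of auto-accepted mappings that are correct (None if empty)."""
    if not accepted:
        return None
    return sum(p.correct for p in accepted) / len(accepted)


# Toy data for illustration only; not drawn from the reviewed literature.
PREDICTIONS = [
    MappingPrediction("SEX", "DM.SEX", 0.98, True),
    MappingPrediction("BRTHDT", "DM.BRTHDTC", 0.93, True),
    MappingPrediction("AETERM_RAW", "AE.AETERM", 0.88, True),
    MappingPrediction("VISITDT", "SV.SVSTDTC", 0.72, False),
    MappingPrediction("DOSE_AMT", "EX.EXDOSE", 0.65, True),
    MappingPrediction("COMMENT", "CO.COVAL", 0.41, False),
]

if __name__ == "__main__":
    for threshold in (0.0, 0.85):
        accepted, review = route(PREDICTIONS, threshold)
        print(
            f"threshold={threshold:.2f} "
            f"accepted={len(accepted)} review={len(review)} "
            f"precision={precision(accepted)}"
        )
```

In this toy data, raising the threshold improves the precision of the auto-accepted set while sending more mappings to manual review. That trade-off is the reason a threshold would need to be calibrated against held-out, reviewer-verified mappings rather than chosen by hand.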
