What type of study is this?

This is a Systematic Review study.

September 5, 2025 | Open Access

Large Language Model Analysis of Reporting Quality of Randomized Clinical Trial Articles

Key Points

The large language model achieved 91.7% agreement with experts in evaluating CONSORT compliance.
Mean CONSORT compliance improved significantly, from 27.3% in 1966-1990 to 57.0% in 2010-2024.
The systematic review included 21,041 RCTs and revealed persistent gaps in key reporting elements and across disciplines.
Compliance varied widely by discipline, from 35.2% in pharmacology to 63.4% in urology, indicating the need for targeted interventions.

Abstract

Importance: Incomplete reporting in randomized clinical trials (RCTs) obscures bias and limits reproducibility. Manual audits for adherence to the Consolidated Standards of Reporting Trials (CONSORT) guideline cannot keep pace with publication volume.

Objectives: To build and validate a zero-shot large language model (LLM) pipeline for automated CONSORT assessment and to map reporting quality across time, biomedical disciplines, and trial features.

Design, Setting, and Participants: This systematic review included RCTs that were indexed on PubMed, available in English, open access, conducted with human participants, and published between MONTH 1966 and MONTH 2024. PubMed PDFs were converted to XML and linked with Semantic Scholar and ClinicalTrials.gov metadata. Chat GPT-4o-mini was tested on the 50-article CONSORT-Text Classification Model (CONSORT-TM) benchmark, checked by experts in 70 randomly sampled RCTs, and then applied to the full sample.

Exposures: Publication year, biomedical discipline, funding source, trial phase, US Food and Drug Administration regulation, and oversight features.

Main Outcomes and Measures: The LLM judged whether each of 21 CONSORT items was met. Primary outcomes were (1) model performance vs expert review (precision, recall, and macro F1 score) and (2) proportion of items reported.

Results: Of 53,137 screened PDFs, 21,041 RCTs (median [IQR] publication year, 2014 [2003-2020]; 30 disciplines) were included, with a registry-linked subset of 1790 RCTs that had a median (IQR) planned enrollment of 210 (95-440) participants. In the 70-article validation set (2210 decisions), LLM outputs matched experts 91.7% of the time (2026 of 2210 decisions); the macro F1 score on CONSORT-TM was 0.86 (95% CI, 0.84-0.87). Mean CONSORT compliance increased from 27.3% (95% CI, 27.0%-27.6%) in 1966 to 1990 to 57.0% (95% CI, 56.8%-57.2%) in 2010 to 2024. However, reporting of critical elements remained uncommon, such as the allocation-concealment mechanism (16.1% [95% CI, 15.6%-16.6%]) and external-validity discussion (1.6% [95% CI, 1.5%-1.8%]). Compliance varied across disciplines from 35.2% (95% CI, 34.8%-35.6%) in pharmacology to 63.4% (95% CI, 62.1%-64.7%) in urology and showed only negligible associations with clinical trial characteristics (all Cramér V <0.10).

Conclusions and Relevance: In this systematic review of RCTs, a zero-shot LLM audited CONSORT adherence at scale, uncovering persistent reporting gaps and wide variation across biomedical disciplines, underscoring the need for targeted editorial action to boost transparency and reproducibility.
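
The agreement, macro F1, and Cramér V statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the binary item labels (1 = reported, 0 = not reported), and the toy inputs are assumptions made for illustration, and the Cramér V helper assumes every row and column of the table has a nonzero total.

```python
from math import sqrt


def percent_agreement(model, expert):
    """Share of decisions where the model label equals the expert label."""
    if len(model) != len(expert):
        raise ValueError("label lists must have the same length")
    matches = sum(m == e for m, e in zip(model, expert))
    return matches / len(model)


def macro_f1(model, expert):
    """Unweighted mean of per-class F1 scores, treating expert labels as truth."""
    classes = sorted(set(model) | set(expert))
    scores = []
    for c in classes:
        tp = sum(m == c and e == c for m, e in zip(model, expert))
        fp = sum(m == c and e != c for m, e in zip(model, expert))
        fn = sum(m != c and e == c for m, e in zip(model, expert))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cramers_v(table):
    """Cramér V for a contingency table given as a list of rows of counts.

    Assumes no row or column total is zero (hypothetical helper, no
    bias correction applied).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))


# Toy labels (invented for illustration): 5 item-level decisions.
model_labels = [1, 1, 0, 0, 1]
expert_labels = [1, 0, 0, 0, 1]
agreement = percent_agreement(model_labels, expert_labels)
f1 = macro_f1(model_labels, expert_labels)

# The validation counts reported in the abstract: 2026 of 2210 decisions.
reported_agreement = round(100 * 2026 / 2210, 1)
print(agreement, f1, reported_agreement)
```

Macro averaging weights each class equally, so a rarely reported item class counts as much as a common one. A Cramér V near 0 indicates almost no association between two categorical variables and a value of 1 indicates perfect association, which is why values below 0.10 are described as negligible.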

Cite this study

Srinivasan et al. (2025) studied this question.

www.synapsesocial.com/papers/68bb3edf2b87ece8dc956d6f — DOI: https://doi.org/10.1001/jamanetworkopen.2025.29418

Authors

Apoorva Srinivasan

Jacob Berkowitz

Nadine A. Friedrich

Journals

JAMA Network Open

Institutions

Cedars-Sinai Medical Center
