Importance
Incomplete reporting in randomized clinical trials (RCTs) obscures bias and limits reproducibility. Manual audits of adherence to the Consolidated Standards of Reporting Trials (CONSORT) guideline cannot keep pace with publication volume.

Objectives
To build and validate a zero-shot large language model (LLM) pipeline for automated CONSORT assessment and to map reporting quality across time, biomedical disciplines, and trial features.

Design, Setting, and Participants
This systematic review included RCTs that were indexed on PubMed, available in English, open access, conducted in human participants, and published from MONTH 1966 to MONTH 2024. PubMed PDFs were converted to XML and linked with Semantic Scholar and ClinicalTrials.gov metadata. GPT-4o-mini was tested on the 50-article CONSORT–Text Classification Model (CONSORT-TM) benchmark, checked by experts on 70 randomly sampled RCTs, and then applied to the full sample.

Exposure
Publication year, biomedical discipline, funding source, trial phase, US Food and Drug Administration regulation, and oversight features.

Main Outcomes and Measures
The LLM judged whether each of 21 CONSORT items was met. Primary outcomes were (1) model performance vs expert review (precision, recall, and macro F1 score) and (2) the proportion of items reported.

Results
Of 53 137 screened PDFs, 21 041 RCTs (median [IQR] publication year, 2014 [2003-2020]; 30 disciplines) were included, with a registry-linked subset of 1790 RCTs that had a median (IQR) planned enrollment of 210 (95-440) participants. In the 70-article validation set (2210 decisions), LLM outputs matched expert judgments 91.7% of the time (2026 of 2210 decisions); the macro F1 score on CONSORT-TM was 0.86 (95% CI, 0.84-0.87). Mean CONSORT compliance increased from 27.3% (95% CI, 27.0%-27.6%) in 1966-1990 to 57.0% (95% CI, 56.8%-57.2%) in 2010-2024. However, critical elements remained rarely reported, such as the allocation-concealment mechanism (16.1% [95% CI, 15.6%-16.6%]) and discussion of external validity (1.6% [95% CI, 1.5%-1.8%]). Compliance varied across disciplines, from 35.2% (95% CI, 34.8%-35.6%) in pharmacology to 63.4% (95% CI, 62.1%-64.7%) in urology, and showed only negligible associations with clinical trial characteristics (all Cramér V < 0.10).

Conclusions and Relevance
In this systematic review of RCTs, a zero-shot LLM audited CONSORT adherence at scale, uncovering persistent reporting gaps and wide variation across biomedical disciplines. These findings underscore the need for targeted editorial action to improve transparency and reproducibility.
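The validation step compares LLM verdicts with expert verdicts using percent agreement, precision, recall, and macro F1. The abstract does not say how the macro average was taken, so the sketch below is only an assumption. It treats each CONSORT item as a binary "met / not met" decision with "met" as the positive class, and it computes macro F1 as the unweighted mean of per-item F1 scores. The `Decision` record and all function names are hypothetical and are not the authors' code. Under this convention, an item on which neither source ever marks "met" scores an F1 of 0.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One item-level judgment on one article (hypothetical record layout)."""
    article_id: str
    item_id: str      # CONSORT item label, e.g. "9"
    llm_met: bool     # LLM says the item is reported
    expert_met: bool  # expert says the item is reported


def percent_agreement(decisions):
    """Share of decisions where the LLM and the expert give the same verdict."""
    matches = sum(d.llm_met == d.expert_met for d in decisions)
    return matches / len(decisions)


def per_item_prf(decisions):
    """Precision, recall, F1 per item, treating 'met' as the positive class."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for d in decisions:
        c = counts[d.item_id]  # touching the key registers the item even for true negatives
        if d.llm_met and d.expert_met:
            c["tp"] += 1
        elif d.llm_met:
            c["fp"] += 1
        elif d.expert_met:
            c["fn"] += 1
    scores = {}
    for item, c in counts.items():
        precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[item] = (precision, recall, f1)
    return scores


def macro_f1(decisions):
    """Assumed macro average: unweighted mean of per-item F1 scores."""
    scores = per_item_prf(decisions)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy, made-up decisions for two articles and two items.
    toy = [
        Decision("a1", "8b", True, True),
        Decision("a1", "9", False, True),
        Decision("a2", "8b", True, False),
        Decision("a2", "9", False, False),
    ]
    print(percent_agreement(toy))      # 0.5
    print(round(macro_f1(toy), 3))     # 0.333
    # Arithmetic check of the reported agreement: 2026 of 2210 decisions.
    print(f"{2026 / 2210:.1%}")        # 91.7%
```

A library such as scikit-learn could replace the hand-rolled counts. The explicit version is shown here to make the assumed averaging choice visible.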
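The results report item-level proportions with 95% CIs and summarize associations with trial characteristics using Cramér V. The abstract does not state which interval method was used. The sketch below assumes the Wilson score interval, which is one common choice for a binomial proportion. It computes Cramér V from a contingency table of counts as sqrt(chi2 / (n * (min(r, c) - 1))). The function names and any counts passed to them are hypothetical. The sketch also assumes the table has no all-zero rows or columns.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def cramers_v(table):
    """Cramér V for an r x c table of counts (list of rows, no empty rows/columns)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


if __name__ == "__main__":
    # Hypothetical counts: an item reported in 161 of 1000 articles.
    lo, hi = wilson_ci(161, 1000)
    print(f"16.1% (95% CI, {lo:.1%}-{hi:.1%})")
    # Perfect association in a 2 x 2 table gives V = 1; independence gives V = 0.
    print(cramers_v([[100, 0], [0, 100]]))  # 1.0
    print(cramers_v([[50, 50], [50, 50]]))  # 0.0
```

With a sample of about 21 000 articles, even weak dependence can be statistically significant. An effect-size measure such as Cramér V, where values below 0.10 are conventionally read as negligible, is therefore the more informative summary.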
Srinivasan et al. studied this question.
www.synapsesocial.com/papers/68bb3edf2b87ece8dc956d6f — DOI: https://doi.org/10.1001/jamanetworkopen.2025.29418
Apoorva Srinivasan
Jacob Berkowitz
Nadine A. Friedrich
JAMA Network Open
Cedars-Sinai Medical Center