What question did this study set out to answer?

The aim is to address issues related to protocol adherence and the importance of correct statistical interpretation in systematic reviews.

April 22, 2026 | Open Access

Ensuring Methodological Rigor: The Essential Role of Protocol Adherence and Correct Statistical Interpretation

Key Points

Reviewed discrepancies between PROSPERO registration and final publication in systematic reviews.
Evaluated the use of statistical methods and analytical tools in the meta-analysis.
Conducted independent replication of the meta-analysis with different models.
Identified significant inconsistencies in search strategy documentation and statistical analyses.
Confirmed that the pooled mean difference is statistically significant when correctly interpreted.
Highlighted concerns regarding the methodological rigor and transparency of systematic reviews.

Abstract

Dear Editor:

Prospective registration of systematic reviews, for example in PROSPERO, was established to safeguard methodological transparency, prevent selective reporting, and reduce the risk of post hoc decision-making. When deviations from the registered protocol occur without clear justification or documentation, the reliability of a review’s conclusions is undermined. In the systematic review by Atia et al. (2025),[1] several discrepancies between the PROSPERO record (CRD42022371033) and the final publication raise concerns about protocol adherence and methodological validity.

First, while the PROSPERO record specifies an electronic search of PubMed, Scopus, Web of Science, and Cochrane CENTRAL, it provides only a single, generic search strategy rather than database-specific queries. This omission impairs reproducibility and contravenes PRISMA guidance, which recommends detailed, database-specific structured searches (item #7).[2] The manuscript similarly lacks comprehensive search strings or supplementary materials beyond a list of drug synonyms, leaving the completeness of the search process unverifiable.

Second, the authors registered RevMan (Review Manager, version 5.3; The Cochrane Collaboration, London, United Kingdom) as their planned analysis software in PROSPERO but ultimately used R with the packages “meta” and “dmetar.” While switching to a more flexible platform is not inherently problematic, the change is undocumented and unjustified. Analytical tools influence the available modeling approaches and output structure, and modifying them post hoc without transparency raises legitimate concerns about analytical consistency and selective reporting.

Third, the decision to apply a fixed-effect model appears to be based solely on an I² value below 50%, as stated both in the PROSPERO registration and in the article itself.
In the PROSPERO record, the authors explicitly state: “We are planning to perform the fixed-effect model meta-analysis, but if Cochrane Q test P value 0.1 and Higgin’s I² 50% indicates a significant heterogeneity between studies, we will perform the random-effects model.” However, in the main manuscript, under the “Synthesis Methods” section, they state: “In a random effect meta-analysis model, data from the CIMT assessment were pooled as the standardized MD (SMD).” Yet in the abstract, the authors report: “In the fixed-effect model, treatment with allopurinol (n = 195) showed higher efficacy than the control group (n = 187) in terms of mean change of CIMT (mean difference −0.05 −0.06, −0.04).” This inconsistency between the registered protocol, the methodological description, and the actual analyses raises substantial concerns regarding methodological transparency, protocol adherence, and the credibility of the reported findings. Moreover, this threshold-driven approach to model selection is overly simplistic and neglects the conceptual foundation of random-effects models, which are specifically designed to account for clinical and methodological heterogeneity beyond what the I² statistic captures. Given the heterogeneity among included studies, particularly the inclusion of a nonrandomized trial, a random-effects model (or, at minimum, a parallel presentation of both models) would have been more appropriate, irrespective of the I² value. The Cochrane Handbook (v6.4, Ch. 10.10.4) explicitly warns against such mechanical application of heterogeneity thresholds: “The choice between a fixed-effect and a random-effects meta-analysis should never be made on the basis of a statistical test for heterogeneity.”[3]

Fourth, the authors report a pooled mean difference (MD) of −0.05 (95% confidence interval: −0.06 to −0.04) but ambiguously associate P = 0.24 with this result in the abstract.
However, this P value corresponds to Cochran’s Q test for heterogeneity, not to the statistical significance of the pooled effect estimate. My independent replication, using both fixed- and random-effects models in Stata (Stata Statistical Software: Release 19. College Station, TX: StataCorp LLC; 2019), confirms that the pooled MD is statistically significant (P < 0.0001) under both models, with consistent directionality (Figure 1). Presenting the heterogeneity P value in this way is a serious statistical misinterpretation, as conflating it with the overall effect P value can mislead readers, obscure the true strength of the findings, and undermine the credibility of the meta-analytic result.

Figure 1: Replication of the meta-analysis conducted by Atia et al. (2025), using the mean differences (MDs) and standard errors reported by the review authors. Both analyses were performed in Stata 19, applying an inverse-variance fixed-effect model and a DerSimonian-Laird random-effects model. The upper panel displays the fixed-effect model, which yields a statistically significant pooled reduction in carotid intima-media thickness after allopurinol treatment (MD = −0.049; 95% confidence interval [CI]: −0.058 to −0.040; P < 0.0001). Notably, under this model, the study by Liu et al. contributes 91.73% of the overall weight, resulting in a disproportionate influence from a single trial. According to the authors’ own risk of bias assessment (RoB 2), Liu et al. was classified as having an overall high risk of bias, which further undermines the validity of conclusions derived from a fixed-effect model dominated by this study. The lower panel presents the random-effects model, which confirms the significance and direction of the effect (MD = −0.046; 95% CI: −0.068 to −0.024; P < 0.0001), while providing a more balanced distribution of study weights (Liu et al.: 63.58%). Importantly, although both models yield statistically significant results, the magnitude of the Z-statistic differs substantially: −10.405 for the fixed-effect model versus −4.088 for the random-effects model. This illustrates the inherently more conservative nature of random-effects modeling when accounting for between-study heterogeneity. It is also crucial to clarify that the P = 0.24 reported in the original review corresponds to heterogeneity (Cochran’s Q), not to the statistical significance of the pooled effect estimate (Z-statistic), which was unequivocally significant. By definition, when the 95% CI of the pooled estimate (represented by the diamond in the forest plot) does not cross the null line, the result is statistically significant at the 5% level.

Fifth, the leave-one-out sensitivity analysis is performed under a fixed-effect model, again justified solely by the I² threshold. While such analyses can be conducted under either fixed- or random-effects models, they are most informative when performed under random-effects assumptions, as this better accounts for between-study heterogeneity and provides a more conservative assessment of the influence of individual studies. This is particularly relevant here given the inclusion of a nonrandomized trial with a high risk of bias.

Sixth, although the review included only controlled trials, its analytical strategy appears to rely predominantly on within-group pre–post changes rather than on true between-group comparisons. This is evident in the data extraction and synthesis: for Liu et al. and Andrews et al., the authors appear to have simply subtracted the mean change in the placebo group from the mean change in the allopurinol group, without accounting for the variance of the difference or baseline imbalances. In the case of Andrews et al., the meta-analysis even appears to pool final values directly without considering pre–post differences. Only Higgins et al. reported a properly adjusted between-group difference with its standard error and confidence interval, which the review appears to have incorporated without distinction. As a result, the pooled estimate blends valid and invalid comparisons, undermining the ability to draw reliable conclusions about treatment efficacy. What purports to be a comparative meta-analysis thus devolves into an observational synthesis of within-arm changes.

In conclusion, PROSPERO registration is not a mere administrative formality. It exists to safeguard methodological rigor, ensure reproducibility, and foster trust in systematic reviews and their findings. Crucially, meta-analysis is not simply a process of aggregating numerical data; it requires statistical competence, strict adherence to predefined protocols, and a clear understanding of the implications of each analytical decision. In this case, fundamental statistical principles were neither fully understood nor appropriately applied, undermining both the credibility of the findings and adherence to the registered methodology. These shortcomings underscore the need for journals and peer reviewers to view PROSPERO registrations not as optional declarations, but as binding methodological commitments.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.
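
Illustrative note: the distinction drawn in the fourth point, between the P value of the pooled effect and the P value of the heterogeneity test, can be checked from the published numbers alone. The Python sketch below is not the Stata code used for the replication; it is a minimal illustration under a normal approximation. The function names `z_and_p_from_ci` and `pool` are hypothetical, and the three-study data set passed to `pool` is invented purely to show how weights shift between models; it is not taken from the review. Because the reported confidence limits are rounded, the back-calculated Z-statistics are close to, but not identical to, the values in Figure 1.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def z_and_p_from_ci(estimate, lower, upper):
    """Back-calculate SE, Z, and the two-sided P value of a pooled estimate
    from its 95% confidence interval (normal approximation)."""
    se = (upper - lower) / (2 * Z_95)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal
    return se, z, p


def pool(effects, ses):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects
    pooling of study effects with their standard errors."""
    w = [1 / s ** 2 for s in ses]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity; it is not a test of the pooled effect
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                      # between-study variance
    w_re = [1 / (s ** 2 + tau2) for s in ses]          # random-effects weights
    swr = sum(w_re)
    random_est = sum(wi * yi for wi, yi in zip(w_re, effects)) / swr
    return {
        "fixed": (fixed, math.sqrt(1 / sw)),
        "random": (random_est, math.sqrt(1 / swr)),
        "Q": q,
        "df": df,
        "tau2": tau2,
        "fixed_weights": [wi / sw for wi in w],
        "random_weights": [wi / swr for wi in w_re],
    }


if __name__ == "__main__":
    # Pooled estimates and 95% CIs as reported in Figure 1
    for label, est, lo, hi in [("fixed", -0.049, -0.058, -0.040),
                               ("random", -0.046, -0.068, -0.024)]:
        se, z, p = z_and_p_from_ci(est, lo, hi)
        print(f"{label}: SE={se:.4f}, Z={z:.2f}, P={p:.2e}")

    # HYPOTHETICAL data (not from the review): one very precise study
    result = pool([-0.05, -0.02, -0.06], [0.005, 0.015, 0.02])
    print("fixed weights :", [round(x, 3) for x in result["fixed_weights"]])
    print("random weights:", [round(x, 3) for x in result["random_weights"]])
```

The P value for heterogeneity would instead be obtained by referring Q to a chi-square distribution with k − 1 degrees of freedom, which is a separate calculation from the Z-based P value of the pooled effect. In the hypothetical data, the most precise study dominates the fixed-effect weights and loses part of that dominance once the between-study variance is added, the same qualitative pattern described for Liu et al. in Figure 1.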
