values were generated under a multivariate model that incorporated a binary moderator variable, which could represent either a study-level or an effect size-level characteristic. When the moderator referred to an effect size-level characteristic, its effect was allowed to vary across studies. Factors manipulated in the simulation included the number of studies, the number of outcomes per study, and the distribution of effect sizes across the categories of the moderator variable, which ranged from balanced to highly unbalanced. The methods were compared in terms of bias, Type I error, and power. The results showed that all methods had lower power to detect effects when the moderator variable referred to a study-level characteristic and the distribution of effect sizes was very unbalanced. Methods based on RVE (correlated-effects models with RVE or with RVE and CWB, and three-level models with RVE) effectively controlled Type I error rates but tended to be overconservative. In contrast, three-level models achieved higher power, but at the cost of inflated Type I error rates. The best balance between Type I error control and power was observed when three-level models were combined with RVE.
Fernández-Castilla et al. studied this question.