The absence of differential item functioning (DIF) is an important piece of evidence supporting inferences based on group comparisons of test results. We illustrate how to tailor the DIF identification process, following the guiding questions proposed by Sireci and Rios, to the specific characteristics of a large-scale assessment (LSA) by examining DIF between two forms of the Mathematics test of a Colombian LSA (SABER 11). We investigate the performance of the non-compensatory DIF (NCDIF) index and the Mantel–Haenszel (MH) DIF procedure under large sample sizes and large sample size ratios (up to 1:25), as well as the performance of effect size guidelines under these conditions. These simulations were needed to adequately address the guiding questions for the DIF analyses of SABER 11, and the analyses of the SABER 11 test forms were conducted in light of their results. For both procedures, the Type I error rate is affected by the sample size, the sample size ratio, and the magnitude of impact between the groups. The joint use of the effect size guidelines helps mitigate this issue without much loss of power, given the large sample sizes involved. The DIF analyses of the Mathematics test forms of SABER 11 provide robust evidence that the inferences derived from score comparisons are fair. Beyond the immediate implications for the use of SABER 11 tests, the case study may help guide practitioners in the assessment of DIF by illustrating how to perform several of the steps involved. Moreover, the simulation studies offer new insights into the frequentist behavior of the two DIF indices under conditions that had not been previously explored but that apply to many LSAs. Additionally, the results indicate that simulation studies examining the performance of NCDIF, MH, and possibly any DIF statistic should use realistic item parameter pools and not only sanitized, well-distributed sets of item parameters.
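To make the interplay between significance and effect size concrete, the following is a minimal Python sketch of the textbook Mantel–Haenszel DIF statistic for a single item, paired with a simplified joint flagging rule. It is not the authors' implementation. The function names, the 1.5 threshold on the delta scale, and the 3.84 critical value (chi-square, 1 df, alpha = 0.05) are assumptions chosen for illustration; the effect size guidelines actually evaluated in the paper may differ.

```python
import math

def mantel_haenszel_dif(strata):
    """Textbook MH DIF statistics for one item.

    strata: iterable of (a, b, c, d) counts per matched score level, where
    a/b = reference group correct/incorrect, c/d = focal group correct/incorrect.
    Returns (alpha_mh, delta_mh, chi2) with a continuity-corrected chi-square.
    """
    num = den = 0.0
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # stratum too small to contribute to the variance term
        n_ref, n_foc = a + b, c + d
        m1, m0 = a + c, b + d
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += n_ref * m1 / n  # expected value of a under no DIF
        sum_v += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    if den == 0 or sum_v == 0:
        raise ValueError("MH statistics are undefined for these tables")
    alpha_mh = num / den                   # common odds ratio
    delta_mh = -2.35 * math.log(alpha_mh)  # delta-scale effect size
    chi2 = max(0.0, abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
    return alpha_mh, delta_mh, chi2

def flag_item(delta_mh, chi2, delta_threshold=1.5, chi2_crit=3.84):
    """Hypothetical joint rule: flag only if significant AND large in effect size."""
    return chi2 > chi2_crit and abs(delta_mh) >= delta_threshold

# With very large groups, a negligible difference is statistically
# significant, yet the effect size criterion keeps it from being flagged.
alpha, delta, chi2 = mantel_haenszel_dif([(51000, 49000, 50000, 50000)])
print(chi2 > 3.84, flag_item(delta, chi2))
```

The final lines use a single artificial stratum with 100,000 examinees per group. The data are made up solely to show why a significance test alone can over-flag items at large sample sizes.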
John Alexander Calderón
Nelson Andrés Rodríguez
Víctor H. Cervantes
Large-scale Assessments in Education
University of Illinois Urbana-Champaign
Fundación para la Educación y el Desarrollo Social
Calderón et al. studied this question.
www.synapsesocial.com/papers/69df2a99e4eeef8a2a6af912 — DOI: https://doi.org/10.1186/s40536-026-00294-x