Synthetic datasets are increasingly used in education research for methodological validation, privacy-preserving data sharing, and reproducible equity analysis; however, most generative approaches prioritize marginal distributional similarity without ensuring preservation of multilevel inferential properties. This limitation is consequential for repeated-measures data analyzed using intersectionality-focused hierarchical models, where conclusions depend on variance partitioning, partial pooling, and stratum-level heterogeneity. We introduce MAI-GAN, a hybrid generative framework that implements a structure–residual decomposition approach combining Bayesian longitudinal MAIHDA with conditional GAN-based residual generation. Inferential fidelity is operationalized with respect to multilevel intersectional models by explicitly targeting the preservation of fixed effects, variance components, and variance partitioning coefficients, while baseline composition is maintained via stratified bootstrap resampling. Applied to a six-semester undergraduate biology dataset (N = 2669 students), MAI-GAN was evaluated across multiple independent random seeds and consistently reproduced baseline-dependent residual structure and key inferential quantities. These results demonstrate that model-aligned generative strategies can produce synthetic longitudinal datasets that remain coherent under intersectionality-focused multilevel analysis, offering a principled foundation for equity-oriented synthetic data generation.
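Two of the quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `stratified_bootstrap` and `vpc`, the record layout, and the `stratum` field are hypothetical. The sketch assumes the variance partitioning coefficient takes its usual two-level form, the between-stratum variance divided by the total variance, and that baseline composition is preserved by resampling with replacement within each intersectional stratum.

```python
import random
from collections import defaultdict


def stratified_bootstrap(records, stratum_key, seed=0):
    """Resample with replacement within each stratum, keeping stratum sizes fixed.

    `records` is a list of dicts; `stratum_key` names the field holding the
    (hypothetical) intersectional stratum label.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec[stratum_key]].append(rec)

    sample = []
    # Iterate strata in sorted order so results are reproducible for a given seed.
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        sample.extend(rng.choices(members, k=len(members)))
    return sample


def vpc(between_var, within_var):
    """Variance partitioning coefficient: share of total variance between strata."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Toy baseline data (invented for illustration only).
students = [
    {"id": 1, "stratum": "A"},
    {"id": 2, "stratum": "A"},
    {"id": 3, "stratum": "A"},
    {"id": 4, "stratum": "B"},
    {"id": 5, "stratum": "B"},
]
resampled = stratified_bootstrap(students, "stratum", seed=42)
```

Because sampling happens inside each stratum, the resampled baseline has the same number of students per stratum as the original, which is the sense in which composition is maintained. Comparing `vpc` values estimated from real and synthetic data is one simple way to check whether variance partitioning has been preserved.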
Benjamin Hechtman
Ross H. Nehm
Wei Zhu
Stats
Stony Brook University
State University of New York
Hechtman et al. studied this question.
www.synapsesocial.com/papers/69db36e64fe01fead37c4dbb — DOI: https://doi.org/10.3390/stats9020042