Large-scale population datasets are rarely generated by simple random sampling; instead, they reflect complex designs involving stratification, clustering, and unequal inclusion probabilities. Although survey weights are provided to recover population-representative estimates, standard Bayesian Kernel Machine Regression (BKMR), a flexible nonlinear model for high-dimensional exposure mixtures, does not explicitly accommodate these design features. We present a simulation-based framework that evaluates BKMR performance under complex sampling by comparing two analytic strategies applied to identical survey-like data: (i) a naïve, unweighted BKMR implementation and (ii) a design-aware workflow that can be executed with existing software, without modifying the BKMR algorithm itself. Finite populations are generated with correlated exposures and a known nonlinear data-generating function. Stratified two-stage cluster samples are then drawn under both non-informative and exposure-dependent (informative) selection mechanisms, with controlled intra-class correlation (ICC). The design-aware approach incorporates sampling weights by resampling the dataset while preserving primary sampling unit (PSU) structure, followed by standard BKMR fitting. Both strategies are evaluated using bias, interval width, and empirical 95% coverage relative to the known truth. Across simulation scenarios, naïve BKMR exhibits bias and systematic under-coverage under informative sampling, with empirical 95% coverage often dropping to approximately 0–40%, whereas the design-aware workflow raises coverage to approximately 40–60%, closer to the nominal level. This work provides a practical, implementation-ready strategy for integrating survey design considerations into BKMR analyses and delineates the conditions under which accounting for the sampling design affects inference.
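The simulation design described above can be illustrated with a minimal, standard-library Python sketch. Everything concrete here is a hypothetical choice for illustration rather than the authors' settings: the surface `true_h`, the numbers of strata and PSUs, `base_rate`, the selection strength `gamma`, and the ICC imposed on the outcome through a PSU random effect. The function `weighted_psu_resample` shows one plausible weight-aware resampling scheme that keeps PSUs intact. It resamples PSUs with replacement within each stratum, then redraws units within each PSU with probability proportional to their weights. This is not necessarily the exact procedure used in the study. The BKMR fit that would follow on each pseudo-sample (for example with the R package `bkmr`) is not shown.

```python
import math
import random

def true_h(z1, z2):
    # Hypothetical nonlinear exposure-response surface (the "known truth").
    return math.sin(z1) + 0.5 * z1 * z2

def make_population(n_strata=4, psus_per_stratum=20, units_per_psu=50,
                    rho=0.5, icc=0.1, seed=1):
    """Finite population with two correlated exposures and a PSU random effect."""
    rng = random.Random(seed)
    sigma_e2 = 1.0
    sigma_u2 = icc / (1.0 - icc) * sigma_e2  # so that s_u^2 / (s_u^2 + s_e^2) = icc
    pop = []
    for h in range(n_strata):
        for c in range(psus_per_stratum):
            u = rng.gauss(0.0, math.sqrt(sigma_u2))  # effect shared by all units in the PSU
            for _ in range(units_per_psu):
                z1 = rng.gauss(0.0, 1.0)
                z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
                y = true_h(z1, z2) + u + rng.gauss(0.0, math.sqrt(sigma_e2))
                pop.append({"stratum": h, "psu": (h, c), "z1": z1, "z2": z2, "y": y})
    return pop

def group_by_psu(units):
    """Nested dict: stratum -> PSU label -> list of units."""
    out = {}
    for unit in units:
        out.setdefault(unit["stratum"], {}).setdefault(unit["psu"], []).append(unit)
    return out

def two_stage_sample(pop, psus_sampled=5, base_rate=0.2, gamma=0.0, seed=2):
    """Stage 1: simple random sample of PSUs within each stratum.
    Stage 2: Poisson sampling of units inside each selected PSU.
    gamma = 0 gives non-informative selection; gamma != 0 ties selection to z1."""
    rng = random.Random(seed)
    sample = []
    for psus in group_by_psu(pop).values():
        p_psu = psus_sampled / len(psus)  # stage-1 inclusion probability
        for label in rng.sample(sorted(psus), psus_sampled):
            for unit in psus[label]:
                p_unit = min(0.95, base_rate * math.exp(gamma * unit["z1"]))
                if rng.random() < p_unit:
                    # Design weight = inverse of the overall inclusion probability.
                    sample.append(dict(unit, weight=1.0 / (p_psu * p_unit)))
    return sample

def weighted_psu_resample(sample, seed=3):
    """One design-aware pseudo-sample: resample PSUs with replacement within each
    stratum, then redraw units inside each PSU with probability proportional to weight."""
    rng = random.Random(seed)
    pseudo = []
    for h, psus in group_by_psu(sample).items():
        labels = sorted(psus)
        for k, label in enumerate(rng.choices(labels, k=len(labels))):
            units = psus[label]
            weights = [u["weight"] for u in units]
            for u in rng.choices(units, weights=weights, k=len(units)):
                # Relabel so a PSU drawn twice is treated as two distinct clusters.
                pseudo.append(dict(u, psu=(h, k)))
    return pseudo

def mean_z1(rows):
    return sum(r["z1"] for r in rows) / len(rows)

population = make_population()
informative = two_stage_sample(population, gamma=1.0)
pseudo = weighted_psu_resample(informative)
# The informative sample over-represents high z1; the pseudo-sample should not.
print(round(mean_z1(informative), 2), round(mean_z1(pseudo), 2))
```

Redrawing units in proportion to `weight` undoes the exposure-dependent tilt introduced in stage 2. Resampling whole PSUs keeps the within-cluster correlation that a naïve unit-level bootstrap would discard.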
Although the proposed approach improves inferential performance relative to naïve BKMR, it does not fully achieve nominal coverage, indicating that further methodological development is required for valid uncertainty quantification under complex survey designs.
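The evaluation criteria used here are bias, interval width, and empirical 95% coverage against the known truth. The minimal Python sketch below shows how these criteria are typically computed for a single estimand across simulation replicates. The helper `simulate_intervals` is a hypothetical stand-in that produces normal estimates with nominal 95% Wald intervals. It does not produce BKMR posterior intervals. In an actual study, the estimates and intervals would come from models fitted at fixed exposure profiles, compared with the known value of h at those profiles.

```python
import random
from statistics import NormalDist, fmean

def evaluate(estimates, lowers, uppers, truth):
    """Bias, mean interval width, and empirical coverage for one estimand.

    `estimates`, `lowers`, and `uppers` hold one value per simulation replicate;
    `truth` is the known value of the estimand from the data-generating model.
    """
    bias = fmean(estimates) - truth
    width = fmean(hi - lo for lo, hi in zip(lowers, uppers))
    coverage = fmean(1.0 if lo <= truth <= hi else 0.0 for lo, hi in zip(lowers, uppers))
    return {"bias": bias, "width": width, "coverage": coverage}

def simulate_intervals(truth, se, shift, n_rep, seed=0):
    """Hypothetical stand-in for a fitted model: normally distributed estimates
    centred at truth + shift, each paired with a nominal 95% Wald interval."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)
    estimates = [rng.gauss(truth + shift, se) for _ in range(n_rep)]
    lowers = [e - z * se for e in estimates]
    uppers = [e + z * se for e in estimates]
    return estimates, lowers, uppers

unbiased = evaluate(*simulate_intervals(1.0, 0.1, 0.0, 2000), truth=1.0)
biased = evaluate(*simulate_intervals(1.0, 0.1, 0.2, 2000, seed=1), truth=1.0)
print(unbiased)
print(biased)
```

In the biased case the interval width is unchanged, but a systematic shift of two standard errors lowers expected coverage to about 48%. This is the same mechanism by which bias under informative sampling pushes coverage well below the nominal 95%.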
Doreen Jehu-Appiah
Emmanuel Obeng-Gyasi
Stats
North Carolina Agricultural and Technical State University
Jehu-Appiah et al. studied this question.
synapsesocial.com/papers/69ec5b8a88ba6daa22dad0cb — DOI: https://doi.org/10.3390/stats9030046