What question did this study set out to answer?

The aim is to develop a framework for integrating complex survey designs into Bayesian Kernel Machine Regression (BKMR) analyses.

April 25, 2026 | Open Access

A Practical Framework for Incorporating Complex Survey Design in Bayesian Kernel Machine Regression

Key Points

The aim is to develop a framework for integrating complex survey designs into Bayesian Kernel Machine Regression (BKMR) analyses.
Simulation of finite populations with correlated exposures under stratified two-stage cluster sampling.
Comparison of naïve BKMR and a design-aware workflow using sampling weights and existing software.
Evaluation of methods based on bias, interval width, and empirical 95% coverage.
Under informative sampling, naïve BKMR is biased and under-covers, with empirical 95% coverage often dropping to approximately 0–40%.
The design-aware workflow improves coverage to approximately 40–60%, closer to but still below the nominal 95% level.
Further methodological improvements are needed to achieve fully valid uncertainty quantification under complex survey designs.
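
The three evaluation criteria named in the key points (bias, interval width, and empirical 95% coverage) can be illustrated with a minimal sketch. The function name `summarize_intervals` and the toy numbers are hypothetical and not taken from the paper; the sketch assumes each simulation replicate yields a point estimate and a 95% interval for a quantity whose true value is known.

```python
import numpy as np

def summarize_intervals(estimates, lowers, uppers, truth):
    """Summarize replicate-level results against a known true value.

    Hypothetical helper for illustration; not the authors' code.
    """
    estimates = np.asarray(estimates, dtype=float)
    lowers = np.asarray(lowers, dtype=float)
    uppers = np.asarray(uppers, dtype=float)
    # An interval "covers" when the truth lies inside it (endpoints included).
    covered = (lowers <= truth) & (truth <= uppers)
    return {
        "bias": float(np.mean(estimates - truth)),       # mean signed error
        "mean_width": float(np.mean(uppers - lowers)),   # average interval width
        "coverage": float(np.mean(covered)),             # share of covering intervals
    }

# Toy replicates (made-up values, for illustration only).
result = summarize_intervals(
    estimates=[1.0, 1.5, 2.5, 3.0],
    lowers=[0.5, 1.0, 2.25, 2.5],
    uppers=[1.5, 2.0, 2.75, 3.5],
    truth=2.0,
)
print(result)
```

In this toy case only one of the four intervals contains the truth, so the empirical coverage is 0.25 even though the average bias is zero, which shows why coverage is reported alongside bias.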

Abstract

Large-scale population datasets are rarely generated via simple random sampling; instead, they reflect complex designs involving stratification, clustering, and unequal inclusion probabilities. While survey weights are provided to recover population-representative estimates, standard Bayesian Kernel Machine Regression (BKMR), a flexible nonlinear model for high-dimensional exposure mixtures, does not explicitly accommodate these design features. We present a simulation-based framework that evaluates performance under complex sampling by comparing two analytic strategies applied to identical survey-like data: (i) a naïve, unweighted BKMR implementation and (ii) a design-aware workflow that can be executed using existing software without modifying the BKMR algorithm itself. Finite populations are generated with correlated exposures and a known nonlinear data-generating function. Stratified two-stage cluster samples are then drawn under both non-informative and exposure-dependent (informative) selection mechanisms, with controlled intra-class correlation (ICC). The design-aware approach incorporates sampling weights through resampling of the dataset while preserving primary sampling unit structure, followed by standard BKMR fitting. Methods are evaluated using bias, interval width, and empirical 95% coverage relative to the known truth. Across simulation scenarios, naïve BKMR exhibits bias and systematic under-coverage under informative sampling, with empirical 95% coverage often dropping to approximately 0–40%, whereas the design-aware workflow improves coverage to approximately 40–60%, moving results closer to nominal levels. These findings provide a practical, implementation-ready strategy for integrating survey design considerations into BKMR analyses and delineate conditions under which accounting for sampling design affects inference. 
While the proposed approach improves inferential performance relative to naïve BKMR, it does not fully achieve nominal coverage, indicating that further methodological development is required for fully valid uncertainty quantification under complex survey designs.
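
The abstract describes the design-aware step only as weighted resampling that preserves primary sampling unit (PSU) structure, followed by standard BKMR fitting. The sketch below shows one possible reading of that step, not the authors' procedure: within each stratum, whole PSUs are drawn with replacement with probability proportional to their total sampling weight. The function name `resample_psus`, the selection rule, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def resample_psus(strata, psu, weights, rng):
    """Return row indices for one pseudo-sample.

    Assumption (not from the paper): within each stratum, PSUs are drawn
    with replacement with probability proportional to their total sampling
    weight, and every row of a drawn PSU is kept together.
    """
    strata = np.asarray(strata)
    psu = np.asarray(psu)
    weights = np.asarray(weights, dtype=float)
    picked = []
    for s in np.unique(strata):
        in_stratum = strata == s
        psu_ids = np.unique(psu[in_stratum])
        # Total sampling weight carried by each PSU in this stratum.
        totals = np.array(
            [weights[in_stratum & (psu == p)].sum() for p in psu_ids]
        )
        # Draw as many PSUs as the stratum originally contained.
        draws = rng.choice(
            psu_ids, size=len(psu_ids), replace=True, p=totals / totals.sum()
        )
        for p in draws:
            # Keep all rows of the drawn PSU so clustering is preserved.
            picked.append(np.flatnonzero(in_stratum & (psu == p)))
    return np.concatenate(picked)

# Toy data (hypothetical): 2 strata, 2 PSUs per stratum, 2 rows per PSU.
strata = np.array([0, 0, 0, 0, 1, 1, 1, 1])
psu = np.array([0, 0, 1, 1, 2, 2, 3, 3])
weights = np.array([1.0, 1.0, 3.0, 3.0, 2.0, 2.0, 2.0, 2.0])
rng = np.random.default_rng(0)
idx = resample_psus(strata, psu, weights, rng)
```

Under this reading, the rows selected by `idx` would then be passed unchanged to an ordinary BKMR fit, and the resample-and-fit cycle could be repeated to pool results across pseudo-samples. The fitting step is omitted here because the abstract does not specify the software or its settings.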

Authors

Doreen Jehu-Appiah

Emmanuel Obeng-Gyasi

Journals

Stats

Institutions

North Carolina Agricultural and Technical State University
