Data integration (the analysis of two or more observational datasets in a single statistical model) is on the rise in species distribution modelling. Recent papers showcase the usefulness of data integration, but few highlight cases where data integration produces equal or worse outcomes compared to single-dataset modelling. Here, we offer a decision-making framework to assess whether data integration may provide improvements over simpler modelling approaches. We focus on joint likelihood data integration, in which two or more datasets are linked to a single shared process model. We highlight three considerations for analysts deciding whether to use data integration: (1) the practical costs associated with developing and validating an integrated model; (2) the marginal benefits to model performance, which vary depending on data volume and coverage; and (3) the concordance (or compatibility) of the two datasets. Using a simulation study, we illustrate modelling outcomes under a variety of conditions of data volume and bias, showing consistent patterns across three distinct formulations of joint likelihood models. We explore a priori and a posteriori tests of data concordance, but we find that such tests fail to usefully differentiate between cases where joint modelling produces better or worse outcomes. Ultimately, we outline a decision-making workflow and illustrate its application to the joint modelling of real data.
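As an illustration of what "linked to a single shared process model" can mean in practice, the following is a minimal sketch of a joint log-likelihood, not a reproduction of any of the three formulations compared in the paper. It assumes a hypothetical log-linear intensity, lambda = exp(beta0 + beta1 * x), shared by a count dataset (Poisson observations) and a detection/non-detection dataset (probability of detection 1 - exp(-lambda)). The function names, the observation models, and the toy data are all assumptions made for illustration.

```python
import math


def poisson_log_lik(beta0, beta1, x, counts):
    """Log-likelihood of counts under Poisson(lambda), lambda = exp(beta0 + beta1 * x)."""
    total = 0.0
    for xi, yi in zip(x, counts):
        log_lam = beta0 + beta1 * xi
        total += yi * log_lam - math.exp(log_lam) - math.lgamma(yi + 1)
    return total


def detection_log_lik(beta0, beta1, x, detected):
    """Log-likelihood of detection/non-detection data, assuming
    P(detected) = 1 - exp(-lambda) with the same shared lambda."""
    total = 0.0
    for xi, di in zip(x, detected):
        lam = math.exp(beta0 + beta1 * xi)
        if di:
            total += math.log(-math.expm1(-lam))  # log(1 - exp(-lam))
        else:
            total += -lam  # log(exp(-lam))
    return total


def joint_log_lik(beta0, beta1, count_data, detection_data):
    """Joint log-likelihood: both datasets inform the same (beta0, beta1),
    so, assuming independence given the shared process, the terms add."""
    return (poisson_log_lik(beta0, beta1, *count_data)
            + detection_log_lik(beta0, beta1, *detection_data))


# Hypothetical toy data: (covariate values, observations) for each dataset.
count_data = ([-1.0, 0.0, 1.0, 2.0], [0, 1, 2, 5])
detection_data = ([-1.5, -0.5, 0.5, 1.5], [0, 0, 1, 1])

print(joint_log_lik(0.0, 0.8, count_data, detection_data))
```

Because the two terms share parameters, a dataset whose observation process is biased relative to the other pulls the shared estimates toward its own bias. This is one way to see why the concordance of the datasets matters when deciding whether to integrate.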
Goldstein et al. studied this question.
www.synapsesocial.com/papers/69a75bb7c6e9836116a23911 — DOI: https://doi.org/10.1111/1365-2656.70210
Benjamin R. Goldstein
Jeffrey Doser
Brent S. Pease
Journal of Animal Ecology
North Carolina State University
Southern Illinois University Carbondale
North Carolina Museum of Natural Sciences