Northwest Fisheries Science Center

Document Type: Journal Article
Center: NWFSC
Document ID: 8222
Title: Using Inverse Probability Bootstrap Sampling to Eliminate Sample Induced Bias in Model Based Analysis of Unequal Probability Samples
Author: Matthew Nahorniak, D. P. Larsen, Carol Volk, Chris E. Jordan
Publication Year: 2015
Journal: PLoS ONE
Volume: 10
Issue: 6
DOI: 10.1371/journal.pone.0131765
Keywords: sampling design, weights, model based inference
Abstract:

In ecology, as in other research fields, there is growing awareness that a disturbingly high proportion of reported results are irreproducible, often as a result of unrecognized bias in the design and analysis of research studies. In ecological research, a problematic and widespread source of bias is the improper analysis of data from unequal probability sampling, which is used extensively in ecological studies and monitoring programs. While design-based analysis tools can be utilized to properly account for unequal sample inclusion probabilities, exploration of complex ecological relationships may require model-based statistical analysis tools dependent on a critical assumption of uniform sample inclusion probabilities. Model-based analyses that ignore sample inclusion probabilities are highly susceptible to bias. Unfortunately, most model-based tools do not enable incorporation of sample inclusion probabilities into the analysis. To address this, we leveraged bootstrap techniques developed for complex survey designs, and suggest inverse probability bootstrapping (IPB) for obtaining equal probability re-samples from a probability sample, from which unbiased model-based estimates can be made. We demonstrated the potential for bias in model-based analyses that ignore sample inclusion probabilities, and the effectiveness of IPB sampling in eliminating this bias, using both simulated and actual ecological data. For illustration, we considered three model-based analysis tools: linear regression, quantile regression, and cluster analysis. In all models, using both simulated and actual ecological data, we found inferences to be biased, sometimes severely, when sample inclusion probabilities were ignored, while IPB sampling effectively produced unbiased parameter estimates.
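
Illustrative sketch (not part of the published abstract): the abstract does not spell out the resampling procedure, so the following Python fragment shows only one plausible reading of IPB. It assumes that units are redrawn with replacement, with selection weight proportional to the inverse of their inclusion probability, and that an ordinary estimator is then applied to each re-sample and averaged. The function names (ipb_resample, ipb_estimate, ols_slope), the number of bootstrap replicates, and the toy data are hypothetical and are not taken from the article.

```python
import random
from statistics import mean


def ipb_resample(sample, incl_probs, size, rng):
    """Draw `size` units with replacement, weighting each unit by
    1 / inclusion probability (assumed reading of IPB)."""
    weights = [1.0 / p for p in incl_probs]
    return rng.choices(sample, weights=weights, k=size)


def ipb_estimate(sample, incl_probs, estimator, n_boot=200, size=None, seed=0):
    """Apply `estimator` to n_boot inverse-probability re-samples
    and return the average of the resulting estimates."""
    rng = random.Random(seed)
    size = size or len(sample)
    estimates = [
        estimator(ipb_resample(sample, incl_probs, size, rng))
        for _ in range(n_boot)
    ]
    return mean(estimates)


def ols_slope(pairs):
    """Least-squares slope for (x, y) pairs; one example of an estimator
    that could be passed to ipb_estimate (assumes x is not constant)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx


# Hypothetical toy data: a population that is half 0s and half 10s
# (true mean 5), sampled so that 10s are nine times as likely to be
# included as 0s.
sample = [0] * 10 + [10] * 90
incl_probs = [0.02] * 10 + [0.18] * 90

naive = mean(sample)                          # ignores inclusion probabilities: 9
ipb = ipb_estimate(sample, incl_probs, mean)  # should land near the true mean of 5

print("naive mean:", naive)
print("IPB mean:  ", ipb)
```

In this toy case the naive sample mean overstates the population mean because high values were oversampled, while the averaged re-sample estimate should fall close to the true value. A regression fit such as ols_slope could be passed as the estimator in the same way, given (x, y) pairs as the sample.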