Non-probability Sampling for Finite Population Inference
Jill A Dever & Richard Valliant
RTI International & Universities of Michigan and Maryland
AAPOR Webinar (October 18, 2016)
New Sources, New Problems
Webinar Goals
Understand the different types of non-probability samples
currently in use
Understand how non-probability samples can be affected by
errors such as coverage and nonresponse
Understand what methods of estimation can be used for
non-probability samples and the arguments used to justify them
Motivation for Non-probability Sampling
Low response rates for many probability samples (Kohut et al. 2012)
Ever increasing costs with ever decreasing funds
Nonsampling errors
The need for speed
Data are everywhere just waiting to be analyzed!!!
Examples of “New-ish” Sources of Data
Mechanical Turk
Pop-up Surveys
Data warehouses
Probabilistic matching of multiple sources
New Sources of Data: Example Studies
Analysis of medical records including text to predict heart disease (Giles &
Wilcox 2011)
Correlates of local climate & temperature with spread of infectious disease
(Global Pandemic Initiative)
MIT’s Billion Prices Project: price indexes for 22 countries from web-scraped data
Marketing of e-cigarettes (Kim et al. 2015)
Political polls and political issues (e.g., Clement 2016; Conway et al. 2015;
Dropp & Nyhan 2016)
Prediction of social stability (e.g., Kleinman 2014)
Public health events, outbreaks (Harris et al. 2014; Kim et al. 2012)
Research on subscribers to
Ad-hoc surveys via Amazon’s Mechanical Turk
Google flu and dengue fever trends (defunct)
Probability vs. Non-probability Samples
Probability sampling:
Presence of a sampling frame linked to population
Every unit has a known probability of being selected
Design-based theory focuses on random selection mechanism
Probability samples became the touchstone in surveys after Neyman's
(JRSS 1934) article
Non-probability sampling:
Investigator does not randomly pick units with KNOWN probabilities
No population sampling frame available/desired
Underlying population model is important
Differing opinions on reporting estimates of error
Probability vs. Non-probability Samples
We focus on surveys whose goal is to use the sample to make
estimates for the entire finite population (external validity)
Many applications of big data analysis use non-probability
samples. Population may not be well defined.
Many probability surveys have such low RRs they basically are
non-probability samples
Pew Research response rates in typical telephone surveys dropped from 36%
in 1997 to 9% in 2012 (Kohut et al. 2012)
Recommendations for using non-probability samples:
AAPOR task force reports on non-probability samples (2013) & online
samples (2010)
Perils and potentials of self-selected entry (Keiding & Louis 2016)
Three Categories of Non-probability Samples
Convenience—units at hand selected; notion overlaps with
accidental, availability, opportunity, haphazard or unrestricted
Matched—units are drawn into study (panel) based on
characteristics, i.e., controlled selection
Network—a set of units form starting seeds, which sequentially
lead to additional units selected (aka snowball, respondent-driven sampling)
(Note: Sirken network sampling is an exception)
Types of Convenience Samples
Volunteer sampling—recruitment at events (e.g. sports, music, etc.)
and other locations (e.g. mall intercept, street recruitment), limited (if
any) refusal conversion
River sampling—general or study-specific invitation through
banner/pop-up web ads, etc.
Mail-in surveys—type of volunteer sampling with paper-and-pencil
questionnaires, distributed as leaflets at public locations (e.g. hotels,
restaurants) or enclosed in magazines, journals, newspapers, etc.
Tele-voting (text message)—type of volunteer sampling where people
are invited to express their vote by calling-in or by sending a text (TV
shows, contests)
Observational—“you get what you see”
Types of Matched Samples
Purposive sampling—selection follows some judgment or
arbitrary ideas of looking for a “representative” sample
Expert selection—subject experts pick the units, e.g., two most
typical settlements selected from a region
Quota sampling—sample “improved” by obtaining targeted
socio-demographic quotas (e.g. region, gender, age) to reflect
population distribution
Balanced sampling
Comments on Balanced Sampling
Samples selected until means or other quantities match the
population (Särndal et al. 2003)
Estimates are either unweighted (e.g., average) or via a model
Quota sampling is a subset and focuses only on observable
Shown to protect against misspecified inferential models
(Royall & Herson 1973; Valliant et al. 2000)
For probability-based balanced sampling
Survey weights are required (e.g., Horvitz-Thompson estimation)
Cube method randomly chooses from a set of balanced samples
(Deville & Tillé 2004)
Survey Errors
Selection bias
Coverage and/or selection bias is a problem if the seen (sample) part of the
population differs from the unseen (nonsample) in such a way that the sample
cannot be projected to the full population
(some unknown nonresponse for non-probability surveys)
Measurement error (e.g., satisficing—provide an acceptable
answer instead of the correct one)
Non-probability Electoral Polls: Many Failures
Early failure of a non-probability sample
1936 Literary Digest mail survey
2.3 million subscribers plus automobile and telephone owners
Predicted landslide win by Alf Landon over FDR
Excluded core lower-income supporters of FDR
More recent failures
British parliamentary election May 2015
Sturgis et al. (2016) is a post mortem
Israeli Knesset election March 2015
Scottish independence referendum, Sep 2014
US state of Michigan Democratic primary, 2016
Non-probability Electoral Polls: One that Worked
Xbox gamers: 345,000 people surveyed in opt-in poll for 45 days
continuously before 2012 US presidential election
Xboxers much different from overall electorate:
18- to 29-year-olds were 65% of dataset, compared to 19% in
national exit poll
93% male vs. 47% in electorate
Unadjusted data suggested landslide for Romney
Wang et al. (2015) used multilevel regression and
poststratification to get good estimates with covariates
sex, race, age, education, state, party ID, political ideology, and
who they voted for in the 2008 presidential election
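The adjustment idea can be sketched with plain poststratification on a single covariate. This is a simplified stand-in, not the multilevel regression and poststratification used by Wang et al. (2015). The 93% and 47% male shares come from the slide; the within-cell support rates and the female shares implied by a two-category split are hypothetical values chosen for illustration.

```python
def poststratify(cell_means, pop_shares):
    """Weight each cell's sample mean by that cell's population share."""
    if abs(sum(pop_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(cell_means[cell] * pop_shares[cell] for cell in pop_shares)

# Sample vs. electorate composition (male shares taken from the slide).
sample_shares = {"male": 0.93, "female": 0.07}
pop_shares = {"male": 0.47, "female": 0.53}

# Hypothetical share supporting candidate A within each cell.
cell_means = {"male": 0.40, "female": 0.60}

# Unadjusted estimate: cells weighted by their share of the sample.
unadjusted = sum(cell_means[c] * sample_shares[c] for c in sample_shares)

# Poststratified estimate: cells weighted by their share of the population.
adjusted = poststratify(cell_means, pop_shares)

print(round(unadjusted, 3), round(adjusted, 3))
```

With these made-up inputs the unadjusted figure is pulled toward the over-represented cell, while the poststratified figure reflects the population mix. The correction is only as good as the assumption that respondents within a cell resemble nonrespondents in that cell.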
Comparing Probability and Non-probability Samples
Mixed results
Kennedy et al. (2016)—compared 9 non-probability and 1
probability sample
Dutwin & Buskirk (2016)—some techniques show benefits (e.g.,
sample matching) but ....
Tourangeau et al. (2013)—examined wt adjustments for 8 opt-in
web panels, with mixed results
Yeager et al. (2011)—compared RDD and non-probability internet
survey with results varying by type of variable
Valliant & Dever (2011)—effective propensity scores are possible
with weighted reference survey cases
Universe & Sample
For example ...
$U$ = adult population
$U - F_{pc}$ = adults without internet access
$F_{pc}$ = adults with internet access
$F_c$ = adults with internet access who visit some webpage(s)
$s$ = adults who volunteer for a panel
Illustration of a Coverage Problem
Volunteer web panel surveyed about voting intentions
Support for 2 candidates differs by age group
Suppose the panel has no one in older groups
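A small arithmetic sketch of this situation, with entirely hypothetical age groups, population shares, and support rates: when the panel has no one in the oldest group, even perfect reweighting of the covered groups cannot recover the population figure.

```python
# Hypothetical population shares and support for candidate A by age group.
pop_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
support_a = {"18-34": 0.60, "35-54": 0.50, "55+": 0.30}

# The panel contains no one aged 55+.
covered = ["18-34", "35-54"]

# True population support: all groups weighted by population share.
true_support = sum(pop_share[g] * support_a[g] for g in pop_share)

# Best case for the panel: covered groups reweighted to their correct
# relative sizes, but the uncovered group contributes nothing.
covered_share = sum(pop_share[g] for g in covered)
panel_estimate = sum(pop_share[g] * support_a[g] for g in covered) / covered_share

coverage_bias = panel_estimate - true_support
print(round(true_support, 3), round(panel_estimate, 3), round(coverage_bias, 3))
```

The bias here comes only from the missing group differing from the covered groups. No weight can be attached to a cell with zero respondents, so a model is needed to project to that group.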
Inference Problem
Correcting for Sample Imbalance
Quota sampling or other type of controlled recruiting
(YouGov/Polimetrix); no weights needed
Weights to correct imbalance of sample compared to pop
Two approaches to weighting
Quasi-randomization weighting
Superpopulation modeling of $y$'s
Both involve modeling
Flavors of Missing Data
MCAR (Missing completely at random): every unit has same probability of appearing in sample
MAR (Missing at random): probability of appearing depends on covariates known for
sample and nonsample cases
NMAR (Not missing at random): probability of appearing depends on covariates and $y$'s
Population Inference: Estimating a Total
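Pop total: $t = \sum_{i \in s} y_i + \sum_{i \in F_c - s} y_i + \sum_{i \in F_{pc} - F_c} y_i + \sum_{i \in U - F_{pc}} y_i$
($s$ = sample, $F_c$ = covered population, $F_{pc}$ = potentially covered population, $U$ = full population)
To estimate $t$, predict 2nd, 3rd, and 4th sums
What if non-covered units are much different from covered?
Difference from a bad probability sample with a good frame but
low RR:
No unit in $U - F_{pc}$ or $F_{pc} - F_c$ had any chance of appearing in the sample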
Population Inference: Quasi-randomization Approach
Model probability of appearing in sample
Probabilities are sometimes estimated with special Reference
(probability) sample or an existing sample (ACS, NHIS, etc.)
Population Inference: Quasi-randomization Approach
Propensity score method:
Put non-probability sample and reference sample together
Estimate pseudo-inclusion probability via binary regression
Use inverse of the estimated probability as a weight
Model covariates:
demographic items
webographic (attitudinal) items
–mixed results (Schonlau et al. 2007; Lee et al. 2009)
covariates highly correlated with $y$'s (Lee 2006; Dever et al. 2015)
Population Inference: Quasi-randomization Approach
Binary regression to estimate propensity scores:
Code non-probability cases = 1; reference cases = 0
Regression weight = 1 for non-probability sample cases
Regression weight = survey weight for reference survey cases
Propensities estimate probability of being in non-prob sample
within whatever pop the reference weights to. Cases:
adult pop with internet access
adult pop regardless of internet access
Caveats—reference survey weighting must correct for any
coverage and nonresponse error
Poststratification, raking, or other calibration often applied after
getting pseudo-inclusion probabilities
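A minimal sketch of the weighted binary regression idea, assuming a single categorical covariate. In that special case a saturated logistic model fitted with weights (1 for non-probability cases, the survey weight for reference cases) reduces to a weighted proportion within each cell, so no regression software is needed. The function name, cell labels, and all data values are hypothetical.

```python
from collections import defaultdict

def cell_propensities(nonprob_cells, ref_cells, ref_weights):
    """Weighted share of non-probability cases in each covariate cell.

    Non-probability cases count with weight 1 and reference cases with
    their survey weight, mirroring the weighted binary regression.
    """
    n_nonprob = defaultdict(float)
    ref_total = defaultdict(float)
    for cell in nonprob_cells:
        n_nonprob[cell] += 1.0
    for cell, weight in zip(ref_cells, ref_weights):
        ref_total[cell] += weight
    propensities = {}
    for cell, count in n_nonprob.items():
        # Without reference cases in a cell there is no common support.
        if ref_total.get(cell, 0.0) <= 0.0:
            raise ValueError(f"no reference cases in cell {cell!r}")
        propensities[cell] = count / (count + ref_total[cell])
    return propensities

# Hypothetical stacked data: 10 volunteers, 4 weighted reference cases.
nonprob_cells = ["young"] * 8 + ["old"] * 2
ref_cells = ["young", "young", "old", "old"]
ref_weights = [100.0, 100.0, 150.0, 150.0]

propensities = cell_propensities(nonprob_cells, ref_cells, ref_weights)

# Pseudo-weight = inverse of the estimated pseudo-inclusion probability.
pseudo_weights = [1.0 / propensities[c] for c in nonprob_cells]

old_weight = sum(w for w, c in zip(pseudo_weights, nonprob_cells) if c == "old")
weighted_old_share = old_weight / sum(pseudo_weights)
print(round(weighted_old_share, 3))
```

In this toy data the volunteers are 20% "old" unweighted, while the pseudo-weighted share moves close to the 60% implied by the reference weights. With continuous or many covariates a fitted regression would replace the cell proportions, and calibration could follow as a separate step.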
Population Inference: Quasi-randomization Approach
Assumptions important for propensity score methods (Valliant & Dever 2011):
Surveys are disjoint (no respondent overlap)
Nonparticipants in non-probability survey are MAR
Large reference survey from target population
Identical key items on covariates in both questionnaires
Propensity scores:
have common support in reference and non-probability samples (distributions overlap)
estimated with reference survey weights
Population Inference: Superpopulation “Prediction”
Use a model to predict the value for each nonsample unit (Valliant
et al. 2000)
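Linear model: $E_M(y_i) = \mathbf{x}_i^T \boldsymbol{\beta}$
If this model holds, then $\hat{t} = \sum_{i \in s} y_i + \sum_{i \in U - s} \mathbf{x}_i^T \hat{\boldsymbol{\beta}} = \sum_{i \in s} y_i + \mathbf{t}_{(U-s)x}^T \hat{\boldsymbol{\beta}}$
where $\hat{\boldsymbol{\beta}}$ is estimated from the sample and $\mathbf{t}_{(U-s)x}$ is the vector of $x$ totals for nonsample units
Note: Nonlinear models require individual $\mathbf{x}$'s for nonsample units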
Population Inference: Superpopulation (Prediction)
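$\hat{\boldsymbol{\beta}} = (\mathbf{X}_s^T \mathbf{X}_s)^{-1} \mathbf{X}_s^T \mathbf{y}_s$, where
$\mathbf{X}_s$ is the $n \times p$ matrix of covariates for the sample units
$\mathbf{y}_s$ is the $n$-vector of sample $y$'s
Resulting weight: $w_i = 1 + \mathbf{t}_{(U-s)x}^T (\mathbf{X}_s^T \mathbf{X}_s)^{-1} \mathbf{x}_i$
where $\mathbf{t}_{(U-s)x}$ = vector of $x$ totals for nonsample units
Note: With this $\hat{\boldsymbol{\beta}}$, weights do not depend on $y$'s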
Similar structure to generalized regression estimation (GREG)
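A sketch of prediction weights for the simplest case, an intercept plus one covariate, so that the 2 x 2 matrix inverse can be written out by hand. It computes one plus the nonsample covariate totals times the inverse cross-product matrix times each unit's covariates. The function name and all data values are hypothetical, and ordinary least squares is assumed.

```python
def prediction_weights(x_sample, n_nonsample, x_total_nonsample):
    """Weights 1 + t' (X'X)^{-1} x_i for the model E(y) = b0 + b1 * x.

    t = (n_nonsample, x_total_nonsample) holds the nonsample totals of
    the intercept column and of x. Only x's are needed, never y's.
    """
    n = len(x_sample)
    sx = sum(x_sample)
    sxx = sum(x * x for x in x_sample)
    det = n * sxx - sx * sx  # determinant of X'X = [[n, sx], [sx, sxx]]
    if det == 0:
        raise ValueError("x must take at least two distinct values")
    # Row vector t' (X'X)^{-1}, using the closed-form 2 x 2 inverse.
    a0 = (n_nonsample * sxx - x_total_nonsample * sx) / det
    a1 = (x_total_nonsample * n - n_nonsample * sx) / det
    return [1.0 + a0 + a1 * x for x in x_sample]

# Hypothetical sample of 4 units from a population of 10.
x_sample = [1.0, 2.0, 3.0, 4.0]
y_sample = [2.0 + 3.0 * x for x in x_sample]  # y exactly linear in x
n_nonsample = 6
x_total_nonsample = 18.0

weights = prediction_weights(x_sample, n_nonsample, x_total_nonsample)
estimated_total = sum(w * y for w, y in zip(weights, y_sample))

# Total implied by the same linear relation: sample sum plus nonsample part.
true_total = sum(y_sample) + 2.0 * n_nonsample + 3.0 * x_total_nonsample
print(weights, estimated_total, true_total)
```

Because y is constructed to follow the linear model exactly, the weighted sample sum reproduces the population total, and the weights sum to the population size. With real data the match depends on how well the model holds for the nonsample units.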
Methods of Inference
Model for $y$'s & Covariates
If $y$ is binary, a linear model is being used to predict a 0-1 variable
Done routinely in surveys without thinking explicitly about a model
Every $y$ may have a different model: pick a set of $x$'s good for
many $y$'s
Same thinking as done for GREG and other calibration
Undercoverage: use $x$'s associated with coverage
Also done routinely in surveys
Modeling Considerations
Good modeling should consider how to predict $y$'s and how to
correct for coverage errors
Covariate selection: LASSO, CART, random forest, boosting,
other machine learning methods
Covariates: an extensive set of covariates needed
(Dever, Rafferty & Valliant 2008; Valliant & Dever 2011; Wang et al. 2015)
Model fit with sample needs to hold for nonsample (difficult
[impossible?] to prove)
Methods of Inference
Pros and Cons with Quasi-Randomization and Superpopulation Modeling
Quasi-randomization
Pro = general weights for estimating any $y$
Con = possible bias with respect to the superpopulation model for $y$
Superpopulation modeling
Pro = model-specific estimators with lower variance than quasi-randomization
Con = possible bias with respect to the superpopulation model for $y$
Notes: Model misspecification a worry for both
Bayesian variations available for each
See review paper by Elliott & Valliant (forthcoming)
Software
Quasi-randomization
Propensity classes: pclass function in R PracTools package (Valliant et al.
WTADJUST and WTADJX in SUDAAN 11 (Kott 2016; RTI 2012)
Custom-written software in SAS, Stata, R, etc.
Superpopulation modeling
calibrate function in R survey package (Lumley 2014)
ReGenesees in R (Zardetto 2015)
WTADJUST and WTADJX in SUDAAN 11 (Kott 2016; RTI 2012)
ipfraking in Stata (Kolenikov 2014)
sreweight in Stata (Pacifico 2014)
svycal in future version of Stata
Set weights to 1 in design-based calibration routines
Simulation Study: Set-up (Valliant & Dever 2011)
Data: 2003 Michigan Behavioral Risk Factor Surveillance Survey
2,845 sample persons bootstrapped to study
10,000 simulation runs with two samples:
Volunteer sample
Volunteers selected by Poisson sampling; $n = 500$ (expected)
Logistic regression for volunteering; probabilities based on having
internet access
Volunteering probabilities generated with logistic regression with
covariates: age, race, gender, wireless phone, education, income
Reference sample: srswor of $n = 500$ from non-volunteers
Simulation Comparisons
individual propensity weights (1: propensity wts)
average propensity weights in each of five subclasses (2: avg
propensity wts)
propensity-poststratified estimator (3: propensity PS)
calibration to population totals of covariates (no propensity
adjustment) using a regression estimator (4: calibration to X);
example of a prediction estimator
10,000 simulations with 500 in each volunteer & reference samples
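One possible reading of the "average propensity weights in each of five subclasses" estimator, sketched under assumptions: cases are sorted by estimated propensity, split into five equal-sized classes, and every case in a class receives the inverse of the class's mean propensity. The function name and input values are hypothetical, and the study's exact class construction may differ.

```python
def subclass_weights(propensities, n_classes=5):
    """Inverse of the mean propensity within equal-sized propensity classes."""
    n = len(propensities)
    # Indices of cases ordered from lowest to highest propensity.
    order = sorted(range(n), key=lambda i: propensities[i])
    weights = [0.0] * n
    for k in range(n_classes):
        members = order[k * n // n_classes:(k + 1) * n // n_classes]
        if not members:
            continue  # fewer cases than classes
        mean_propensity = sum(propensities[i] for i in members) / len(members)
        for i in members:
            weights[i] = 1.0 / mean_propensity
    return weights

# Hypothetical estimated propensities for 10 volunteer cases.
propensities = [0.1, 0.3, 0.2, 0.2, 0.4, 0.6, 0.5, 0.5, 0.8, 0.8]
weights = subclass_weights(propensities)
print([round(w, 3) for w in weights])
```

Averaging within classes smooths extreme individual propensities, at the cost of treating all cases in a class as equally likely to volunteer.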
Simulation Comparisons
Simulation Study: Statistical Results
Simulation Study: Key Findings
Reference survey weights need to be used to estimate
propensities of volunteering
Estimates with individual propensities or average propensity
weights within classes are biased with unweighted propensity
estimates, but less so with weighted
Propensity-poststratification poor with unweighted or weighted
propensity estimates
GREG and estimate with individual propensity weights generally
have smallest biases
If probability of volunteering depends on analysis variables, all
estimators are biased
Other Research
Desire to compare estimates from non-probability against “the
truth” leads researchers to contrast probability and non-probability samples
Quasi-randomization techniques do not always work
(e.g., Dever & Brown 2016; Willis et al. 2015; Rothschild & Goel 2014; Valliant &
Dever 2011; Yeager et al. 2011; Lee & Valliant 2009; Schonlau et al. 2009;
Rivers 2007; Duffy et al. 2005)
Limited comparisons with model-based estimation
Lingering concerns
Were right covariates available?
Were they used correctly—multiway interactions?
Poor modeling leads to biased estimators
Variance estimation
Quasi-randomization
Treat pseudo-inclusion probabilities in same way as design-based selection
Design-based variance estimators apply. Justification is consistency under
quasi-randomization distribution
Linearization or replication can be used
Replication shows most promise (Lee & Valliant 2009)
Need to decide whether strata and clusters are appropriate
Superpopulation modeling
Compute variance under model used for point estimates with variance based on
squared residuals
Replication estimators also justified (Valliant et al. 2000)
Bayesian models, e.g., credibility interval (Santos, Buskirk & Gelman 2012)
with(out) applying survey design effects
Justification is consistency under superpopulation model
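As a rough illustration of replication, here is a delete-one jackknife for a weighted mean. It is a simplification: the weights are held fixed across replicates, whereas a fuller treatment would re-estimate the propensities or model in each replicate, and strata and clusters are ignored. The function names and data are hypothetical.

```python
def weighted_mean(y, w):
    """Weighted mean of y with weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def jackknife_variance(y, w):
    """Delete-one jackknife variance of the weighted mean.

    Each replicate drops one case; weights of the remaining cases are
    left unchanged (an assumption made to keep the sketch short).
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two cases")
    replicates = [
        weighted_mean(y[:i] + y[i + 1:], w[:i] + w[i + 1:]) for i in range(n)
    ]
    center = sum(replicates) / n
    return (n - 1) / n * sum((r - center) ** 2 for r in replicates)

# Hypothetical data; equal weights make the result easy to check by hand.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0] * 5
variance = jackknife_variance(y, w)
print(variance)
```

With equal weights the result matches the usual sample variance divided by n, which is a convenient sanity check before applying the same routine with pseudo-weights.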
Non-probability samples do not have the (false?) assurance of
complete population coverage that probability samples do
Inference to finite populations is possible but only with either
correct modeling of
Chance of being in sample, or
Dependence of analysis variables on covariates
Convincing users that a non-probability sample represents
nonsample part of population will always be an issue
(true for low RR probability samples, too)
Work needed on diagnostics for "representativity"
Are non-probability estimates aiming at desired target population?
Distance measure
= set of estimates from non-probability sample
= values from some reliable data source (ACS, NHIS, CPS,
census, etc.)
Compare to a chi-square distribution or
Validation items in are not used in non-probability weight
calculation; may not be of direct interest in the survey
The Future .....
Quasi-randomization: model pseudo-inclusion probabilities
Superpopulation models: model the $y$'s
Which is better???
