YouGov Sampling Methodology
Sampling and Sample Matching
Sample matching is a methodology for selection of representative samples from non-randomly
selected pools of respondents. It is ideally suited for Web access panels, but could also be used
for other types of surveys, such as phone surveys. Sample matching starts with an enumeration
of the target population. For general population studies, the target population is all adults, and
can be enumerated using the decennial Census or a high-quality survey, such as
the American Community Survey. In other contexts, this enumeration is known as the sampling
frame, though, unlike in conventional sampling, the sample of respondents is not drawn from the
frame. Traditional sampling, by contrast, selects individuals from the sampling frame at random
for participation in the study. This may not be feasible or economical, because contact
information (especially email addresses) is not available for all individuals in the frame, and
refusals to participate increase the cost of sampling in this way.
Sample selection using the matching methodology is a two-stage process. First, a random
sample is drawn from the target population. We call this sample the target sample. Details on
how the target sample is drawn are provided below, but the essential idea is that this sample is
a true probability sample and thus representative of the frame from which it was drawn.
Second, for each member of the target sample, we select one or more matching members from
our pool of opt-in respondents. This is called the matched sample. Matching is accomplished
using a large set of variables that are available in consumer and voter databases for both the
target population and the opt-in panel.
The purpose of matching is to find an available respondent who is as similar as possible to the
selected member of the target sample. The result is a sample of respondents who have the
same measured characteristics as the target sample. Under certain conditions, described
below, the matched sample will have similar properties to a true random sample. That is, the
matched sample mimics the characteristics of the target sample. It is, as far as we can tell,
“representative” of the target population (because it is similar to the target sample).
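As a rough illustration of the two-stage process described above, the following Python sketch
draws a target sample from a toy frame and then assigns each target member the most similar
unused member of a toy opt-in panel. The record fields, the mismatch-count distance, and the
greedy one-to-one assignment are all assumptions made for illustration; they are not the
procedure or the variables actually used.

```python
import random

# Hypothetical matching variables; the real set is described as much larger.
KEYS = ("age_group", "gender", "education")

def draw_target_sample(frame, n, rng):
    """Stage 1: simple random sample, without replacement, from the frame."""
    return rng.sample(frame, n)

def mismatch_count(a, b, keys=KEYS):
    """Illustrative distance: number of matching variables on which a and b differ."""
    return sum(a[k] != b[k] for k in keys)

def match_sample(target, panel, keys=KEYS):
    """Stage 2: give each target member the closest panelist not yet used."""
    available = list(panel)
    matched = []
    for t in target:
        best = min(available, key=lambda p: mismatch_count(t, p, keys))
        available.remove(best)  # each panelist is matched at most once
        matched.append(best)
    return matched

rng = random.Random(0)

# Toy frame: 12 demographic cells with 50 people in each.
frame = [
    {"age_group": a, "gender": g, "education": e}
    for a in ("18-34", "35-54", "55+")
    for g in ("F", "M")
    for e in ("HS", "College")
] * 50

# Toy opt-in panel that over-represents the youngest age group.
weights = [3 if r["age_group"] == "18-34" else 1 for r in frame]
panel = [dict(r, panel_id=i)
         for i, r in enumerate(rng.choices(frame, weights=weights, k=300))]

target = draw_target_sample(frame, 30, rng)   # the target sample
matched = match_sample(target, panel)         # the matched sample
```

Although the toy panel is skewed toward younger respondents, the matched sample follows the
composition of the target sample wherever close matches are available. The greedy assignment is
only one simple choice; it depends on the order in which target members are processed and does
not minimize the total distance across all pairs.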
The Distance Function
When choosing the matched sample, it is necessary to find the closest matching respondent in
the panel of opt-ins to each member of the target sample. Various types of matching could be
employed: exact matching, propensity score matching, and proximity matching. Exact matching
is impossible if the set of characteristics used for matching is large and, even for a small set of
characteristics, requires a very large panel (to find an exact match). Propensity score matching
has the disadvantage of requiring estimation of the propensity score. Either a propensity score
needs to be estimated for each individual study, so the procedure is automatic, or a single
propensity score must be estimated for all studies. If large numbers of variables are used, the
estimated propensity scores can become unstable and lead to poor samples.
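To see why exact matching breaks down as the set of characteristics grows, it may help to count
the distinct combinations ("cells") that an exact match would have to hit. The sketch below
assumes a handful of categorical variables with made-up numbers of categories; the variable
names and level counts are hypothetical and chosen only to show the multiplicative growth.

```python
from math import prod

# Hypothetical number of categories per matching variable (illustrative only).
levels = {
    "age_group": 6,
    "gender": 2,
    "race": 5,
    "education": 5,
    "region": 4,
    "party_id": 3,
}

def n_cells(levels, variables):
    """Number of distinct value combinations across the given variables."""
    return prod(levels[v] for v in variables)

names = list(levels)
for k in range(1, len(names) + 1):
    # Cell count when matching exactly on the first k variables.
    print(k, n_cells(levels, names[:k]))
```

With these assumed level counts, six variables already produce 3,600 cells. A panel of N members
can occupy at most N cells, so once the number of cells exceeds the panel size, some cells must
be empty, and any target sample member who falls in an empty cell has no exact match.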