will be a family of possible conditional distributions corresponding to the different possible
values of θ ∈ Θ. However, it may happen that for each possible value of t, the conditional
joint distribution of $X_1, \dots, X_n$ given that T = t is the same for all the values of θ ∈ Θ
and therefore does not actually depend on the value of θ. In this case, we say that T is a
sufficient statistic for the parameter θ.
Formally, a statistic $T(X_1, \dots, X_n)$ is said to be sufficient for θ if the conditional distribution
of $X_1, \dots, X_n$, given T = t, does not depend on θ for any value of t.
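For discrete data, the definition can be restated in symbols. The following is a sketch; the subscript notation $P_\theta$, meaning probability computed under the parameter value θ, is an assumption introduced here for illustration. For every t with $P_\theta(T = t) > 0$ and every possible sample $x_1, \dots, x_n$:

```latex
P_{\theta}(X_1 = x_1, \dots, X_n = x_n \mid T = t)
  = P_{\theta'}(X_1 = x_1, \dots, X_n = x_n \mid T = t)
  \qquad \text{for all } \theta, \theta' \in \Theta .
```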
In other words, given the value of T, we can gain no more knowledge about θ from knowing
more about the probability distribution of $X_1, \dots, X_n$. We could envision keeping only T
and throwing away all the $X_i$ without losing any information!
The concept of sufficiency arises as an attempt to answer the following question: Is there
a statistic, i.e. a function $T(X_1, \dots, X_n)$, that contains all the information in the sample
about θ? If so, a reduction or compression of the original data to this statistic without loss
of information is possible. For example, consider a sequence of independent Bernoulli trials
with unknown probability of success, θ. We may have the intuitive feeling that the total
number of successes contains all the information about θ that is in the sample, and that the
order in which the successes occurred, for example, does not give any additional information
about θ.
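The intuition can be checked on a small case. The sketch below assumes two made-up samples (`sample_a` and `sample_b`) with the same number of successes in a different order, and uses a hypothetical helper `sequence_probability` to compute the probability of each exact sequence. For each trial value of θ the two probabilities should agree, which suggests that the order adds nothing beyond the total.

```python
from fractions import Fraction

def sequence_probability(xs, theta):
    """Probability of observing the exact 0/1 sequence xs when each trial
    succeeds independently with probability theta."""
    prob = Fraction(1)
    for x in xs:
        prob *= theta if x == 1 else 1 - theta
    return prob

# Two illustrative samples: same number of successes (3), different order.
sample_a = [1, 1, 0, 0, 1]
sample_b = [0, 1, 0, 1, 1]

for theta in (Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
    p_a = sequence_probability(sample_a, theta)
    p_b = sequence_probability(sample_b, theta)
    # Both should equal theta**3 * (1 - theta)**2, whatever the order.
    print(theta, p_a, p_b, p_a == p_b)
```

Exact fractions are used instead of floats so that the equality comparison is not affected by rounding.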
Example 1: Let $X_1, \dots, X_n$ be a sequence of independent Bernoulli trials with $P(X_i = 1) = \theta$.
We will verify that $T = \sum_{i=1}^{n} X_i$ is sufficient for θ.
Proof: We have, for any $x_1, \dots, x_n \in \{0, 1\}$ with $\sum_{i=1}^{n} x_i = t$,
$$P(X_1 = x_1, \dots, X_n = x_n \mid T = t) = \frac{P(X_1 = x_1, \dots, X_n = x_n)}{P(T = t)}.$$
Bearing in mind that the $X_i$ can take on only the values 0 or 1, the probability in the
numerator is the probability that some particular set of t of the $X_i$ are equal to 1 and the other
n − t are equal to 0. Since the $X_i$ are independent, the probability of this is $\theta^t (1 - \theta)^{n-t}$. To find
the denominator, note that the distribution of T, the total number of ones, is binomial with
n trials and probability of success θ. Therefore the ratio in the above equation is
$$\frac{\theta^t (1 - \theta)^{n-t}}{\binom{n}{t} \theta^t (1 - \theta)^{n-t}} = \frac{1}{\binom{n}{t}}.$$
The conditional distribution thus does not involve θ at all. Given the total number of ones,
the probability that they occur on any particular set of t trials is the same for any value of
θ, so that set of trials contains no additional information about θ.
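The result of Example 1 can also be checked by brute force. The sketch below assumes small illustrative values of n and t and a few trial values of θ strictly between 0 and 1; the helper name `conditional_given_total` is hypothetical. It enumerates every 0/1 sequence whose sum is t, computes the conditional probability of each directly from the joint probabilities, and compares the result with $1/\binom{n}{t}$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_given_total(n, t, theta):
    """Return {sequence: P(X = sequence | T = t)} for n independent
    Bernoulli(theta) trials, by enumerating all sequences with sum t."""
    joint = {}
    for xs in product((0, 1), repeat=n):
        if sum(xs) == t:
            p = Fraction(1)
            for x in xs:
                p *= theta if x == 1 else 1 - theta
            joint[xs] = p
    p_total = sum(joint.values())  # this is P(T = t)
    return {xs: p / p_total for xs, p in joint.items()}

n, t = 5, 2  # illustrative sample size and observed number of successes
for theta in (Fraction(1, 10), Fraction(1, 2), Fraction(4, 5)):
    cond = conditional_given_total(n, t, theta)
    # Every sequence with t ones should have probability 1 / C(n, t).
    assert all(p == Fraction(1, comb(n, t)) for p in cond.values())
    print(theta, set(cond.values()))
```

The enumeration is exponential in n, so this is only meant as a sanity check on small cases, not as a way to compute with realistic sample sizes.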