SOCI208 Module 9 - Sampling Distribution of the Sample Mean
0. Introduction
This module is beyond the shadow of a doubt the most important one in the
course, because it introduces the crucial notion of the sampling
distribution of an estimator. The whole of statistical inference (confidence
intervals, hypothesis testing) is based on this notion. Without the concept
of sampling distribution none of this makes durable sense: one can learn to
construct confidence intervals and test hypotheses by rote, just applying the
formulas, but without understanding the underlying principle, which flows
from the notion of sampling distribution of an estimator, this knowledge
does not stick; it fades away without a trace over X-mas break.
1. Point Estimation
1. Definition
The process of estimating a population parameter by a sample statistic
-- a single number derived from the sample -- is called point estimation.
The sample statistic is called an estimate of the population
parameter.
Example: use the GSS 1998 to estimate the average prestige score of
the occupations of adults in the U.S.
2. Features of Point Estimation
Features of point estimation include
- an unknown population parameter, denoted $\theta$ (Greek theta), is to be estimated
- a sample statistic, denoted S, is calculated from a sample of n observations $X_1, X_2, \ldots, X_n$ as an estimate of the parameter $\theta$
- prior to the selection of the actual sample, the sample observations $X_1, X_2, \ldots, X_n$ are random variables (RVs), and thus the statistic S (since it is calculated from the observations) is also an RV; the probability distribution of a sample statistic S is called the sampling distribution of S
The subtle status of S as a RV prior to sample selection and as a number
after sample selection is captured in the following terminology
- prior to sample selection, a sample statistic is a RV and is called a point estimator of a population parameter
- after sample selection, a sample statistic is a number and is called a point estimate of the population parameter
3. Alternative Point Estimators
A population parameter can usually be estimated by several alternative estimators;
for example, location can be estimated as
- the sample mean $\bar{X}$
- the sample median Md
- the sample midrange $(X_{min} + X_{max})/2$
The choice of estimator is made based on the properties of the sampling
distribution of each estimator.
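As a rough illustration of what "properties of the sampling distribution" means here, the sketch below draws repeated samples and summarizes the sampling distribution of each of the three estimators. The normal population with mean 50 and standard deviation 10, the sample size, and the function names are assumptions made for the illustration; they do not come from the course data.

```python
import random
import statistics


def midrange(sample):
    """Sample midrange: the average of the smallest and largest observation."""
    return (min(sample) + max(sample)) / 2


def simulate_estimators(n=10, replications=2000, mu=50.0, sigma=10.0, seed=1):
    """Draw repeated samples from a hypothetical normal population and return,
    for each estimator, the mean and SD of its simulated sampling distribution."""
    rng = random.Random(seed)
    estimators = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "midrange": midrange,
    }
    values = {name: [] for name in estimators}
    for _ in range(replications):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        for name, estimator in estimators.items():
            values[name].append(estimator(sample))
    return {
        name: (statistics.fmean(v), statistics.stdev(v))
        for name, v in values.items()
    }


summary = simulate_estimators()
for name, (center, spread) in summary.items():
    print(f"{name:9s} mean of estimates = {center:6.2f}   SD of estimates = {spread:5.2f}")
```

For this symmetric population all three estimators should be centered near 50, but their spreads differ. That difference in spread is the kind of property used to choose among them; the ranking can change for other population shapes.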
2. Experimental Study of $\bar{X}$
1. Setup of the Experiment
The population is defined as the 256 cases in the data set survey2.syd
provided with the SYSTAT program. The parameter to be estimated is
the mean income of the respondents.
First the mean and standard deviation of income are calculated for the
whole population of 256 cases, and the population distribution of income
is plotted using the kernel density estimator.
Then 600 samples of n = 3 elements are drawn from the population and
the mean calculated for each sample; the mean and the standard deviation
of the 600 sample means are then calculated, and the distribution of the
600 means is plotted using the kernel density estimator.
Then the same procedure is carried out for 600 samples of n = 10 elements.
Then the same procedure is carried out for 600 samples of n = 100 elements.
Exhibit: SYSTAT Program for Income Sampling
Experiment - 600 Samples With n=3, 10, 100 [m9002.txt]
Exhibit: SYSTAT Output for Income Sampling Experiment
- 600 Samples With n=3, 10, 100 [m9001.htm]
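For readers without SYSTAT, the experiment described above can be sketched in Python. This is not the course program: the file survey2.syd is not read, so a hypothetical right-skewed population of 256 values (lognormal, with parameters chosen only to put it on a broadly similar scale) stands in for the income data, and the samples are assumed to be drawn without replacement. The kernel density plots are omitted.

```python
import random
import statistics

rng = random.Random(208)

# Hypothetical stand-in for the 256 incomes in survey2.syd (right-skewed).
population = [rng.lognormvariate(2.9, 0.6) for _ in range(256)]


def sample_means(population, n, replications=600):
    """Means of `replications` simple random samples of size n,
    each drawn without replacement from the population."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(replications)]


pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
print(f"population        mean = {pop_mean:7.3f}   SD = {pop_sd:7.3f}")

results = {}
for n in (3, 10, 100):
    means = sample_means(population, n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"600 samples n={n:<3d} mean = {results[n][0]:7.3f}   SD = {results[n][1]:7.3f}")
```

The numbers will not match Table 1, since the population is made up, but the same pattern should appear: the mean of the sample means stays near the population mean while their standard deviation shrinks as n grows.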
2. Sampling Distribution of $\bar{X}$
The results of the income sampling experiment are summarized in Table 1
and the four kernel density plots that follow.
Table 1. Income Sampling Experiment
| Data | Mean | Standard Deviation |
| Population | 22.172 | 15.635 |
| 600 samples, n=3 | 22.584 | 9.376 |
| 600 samples, n=10 | 21.955 | 4.916 |
| 600 samples, n=100 | 22.176 | 1.193 |
Income - Population Distribution
Mean Income - 600 Samples With n = 3
Mean Income - 600 Samples With n = 10
Mean Income - 600 Samples With n = 100
3. Empirical Conclusions
The following conjectures are suggested by the experimental results
- the distribution of values of $\bar{X}$ for simple random sampling is centered around the population mean, regardless of sample size
- the standard deviation of the values of $\bar{X}$ decreases with increasing sample size; that is, the distribution of $\bar{X}$ values becomes more concentrated around the population mean as the sample size becomes larger
- the distribution of $\bar{X}$ values becomes more symmetrical as the sample size becomes larger and is approximately normal for large sample sizes
3. Theoretical Results About $\bar{X}$
The theoretical results apply to simple random sampling from
- infinite populations
- finite populations whenever the sample size n is small relative to the population size N, specifically whenever the sampling fraction n/N is less than 5%. (What happens when the sample size is large relative to N is not exactly a "problem" and is discussed in Subsection 5 - Sampling Finite Populations.)
1. Sampling Distribution of $\bar{X}$
The probability distribution associated with $\bar{X}$ (in advance
of sampling) is called the sampling distribution of $\bar{X}$.
2. Expected Value of $\bar{X}$
The expected value of (the sampling distribution of) $\bar{X}$ is equal
to the mean of the population, i.e.
$E\{\bar{X}\} = \mu$
Q - Can you derive this from formulas in Module 5?
3. Variance of $\bar{X}$
The variance of (the sampling distribution of) $\bar{X}$ and the standard
deviation of (the sampling distribution of) $\bar{X}$ are
$\sigma^2\{\bar{X}\} = \sigma^2/n$
$\sigma\{\bar{X}\} = \sigma/\sqrt{n}$
where $\sigma^2$ refers to the population variance of X.
The standard deviation of the sampling distribution of $\bar{X}$, $\sigma\{\bar{X}\}$,
is also called the standard error of the mean.
Q - Can you derive these formulas from those in Module 5?
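Derivation of $E\{\bar{X}\} = \mu$ and $\sigma^2\{\bar{X}\} = \sigma^2/n$
1. Derivation of $E\{\bar{X}\} = \mu$
$$
\begin{aligned}
E\{\bar{X}\} &= E\{(X_1 + X_2 + \cdots + X_n)/n\} \\
&= (1/n)\,E\{X_1 + X_2 + \cdots + X_n\} \\
&= (1/n)\,(E\{X_1\} + E\{X_2\} + \cdots + E\{X_n\}) \\
&= (1/n)\,n\mu = \mu
\end{aligned}
$$
2. Derivation of $\sigma^2\{\bar{X}\} = \sigma^2/n$
The derivation is carried out in the case of a simple random sample
from an infinite population; then the observations are i.i.d., so that
$$
\begin{aligned}
\sigma^2\{\bar{X}\} &= \sigma^2\{(X_1 + X_2 + \cdots + X_n)/n\} \\
&= (1/n^2)\,\sigma^2\{X_1 + X_2 + \cdots + X_n\} \\
&= (1/n^2)\,(\sigma^2\{X_1\} + \sigma^2\{X_2\} + \cdots + \sigma^2\{X_n\}) \\
&= (1/n^2)\,n\sigma^2 = \sigma^2/n
\end{aligned}
$$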
Table 2 repeats Table 1 with several columns added containing the expected
value and the standard error of $\bar{X}$, without and with the finite
population correction.
Table 2. Income Sampling Experiment Results and Theoretical
Values Compared
| Data | Mean | Standard Deviation | $E\{\bar{X}\} = \mu$ | $\sigma\{\bar{X}\} = \sigma/\sqrt{n}$ | $\sqrt{1 - n/N}$ (see note 1) | $\sigma_c\{\bar{X}\}$ (see note 2) |
| Population | 22.172 | 15.635 | -- | -- | -- | -- |
| 600 samples, n=3 | 22.584 | 9.376 | 22.172 | 9.027 | 0.994 | 8.974 |
| 600 samples, n=10 | 21.955 | 4.916 | 22.172 | 4.944 | 0.980 | 4.846 |
| 600 samples, n=100 | 22.176 | 1.193 | 22.172 | 1.564 | 0.781 | 1.221 |
NOTES:
(1) $\sqrt{1 - n/N}$ is the finite population
correction factor for the standard error (see Subsection 5 below) to
be used when the sampling fraction n/N is more than 5%
(2) $\sigma_c\{\bar{X}\} = (\sigma/\sqrt{n})\sqrt{1 - n/N}$
is the standard error with finite population correction for N = 256 and
n = 3, 10, or 100 (see Subsection 5 below)
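The three theoretical columns of Table 2 follow directly from the population standard deviation (15.635) and the population size (N = 256). The short sketch below recomputes them; the function and variable names are illustrative only.

```python
import math

SIGMA = 15.635  # population standard deviation of income (Table 1)
N = 256         # population size


def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def fpc(n, N):
    """Finite population correction factor, sqrt(1 - n/N)."""
    return math.sqrt(1 - n / N)


rows = {}
for n in (3, 10, 100):
    se = standard_error(SIGMA, n)
    factor = fpc(n, N)
    rows[n] = (se, factor, se * factor)
    print(f"n = {n:3d}: SE = {se:.3f}   fpc = {factor:.3f}   corrected SE = {se * factor:.3f}")
```

The computed values should agree with the table entries up to rounding in the third decimal place.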
1. Effect of Sample Size
The standard error (i.e. the standard deviation of the sampling distribution
of $\bar{X}$) decreases in inverse proportion to the square root of
the sample size.
2. Effect of Population Variability
For any given sample size, the greater the population variability, the
greater the standard error.
4. The Central Limit Theorem
Exhibit: Laplace's 1810 statement
of the central limit theorem (Stigler 1986 F4.2 p. 144) [m9013.gif]
Exhibit: Gauss's 1809 derivation of the normal
density (Stigler 1986 F4.1 p. 142) [m9014.gif]
The "Central Limit Theorem" expresses as a theorem a very important and
pervasive natural phenomenon.
Central Limit Theorem (CLT). For almost all populations,
the sampling distribution of $\bar{X}$ is approximately normal when
the size of the simple random sample is sufficiently large.
NOTES:
- Q - What is the qualification "almost all populations" about? A - All the CLT requires is that the population standard deviation $\sigma$ be finite.
- Q - What does "sufficiently large" mean? A - The sample size required for the CLT to apply depends on the skewness of the population distribution: a more skewed distribution requires a larger sample; symmetrical distributions require a smaller n.
- Q - When all is said and done, what does "sufficiently large" mean? A - 30.
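The dependence on skewness noted above can be seen in a small simulation. The sketch below assumes a strongly right-skewed population (exponential with mean 1, chosen only for illustration) and measures the skewness of the simulated sampling distribution of the mean for several sample sizes; the helper names are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Moment coefficient of skewness; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def skewness_of_sample_means(n, replications=4000, seed=0):
    """Skewness of the simulated sampling distribution of the mean for samples
    of size n from a hypothetical exponential population."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]
    return skewness(means)


skew_by_n = {n: skewness_of_sample_means(n) for n in (3, 10, 30, 100)}
for n, value in skew_by_n.items():
    print(f"n = {n:3d}: skewness of sample means = {value:5.2f}")
```

The skewness should shrink toward zero as n grows, but for a population this skewed it is still visibly positive around n = 30, which is why the "30" rule is best read as a rule of thumb.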
5. Sampling Finite Populations
The sampling fraction is the ratio n/N of the sample size
n to the population size N.
When the sampling fraction is more than 5 percent, the formula for the
standard error gives a value that is pessimistic (too large). A more
accurate (smaller) estimate of the standard error is obtained using the
finite population correction.
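The variance of $\bar{X}$ and the standard error with finite population
correction are
$\sigma_c^2\{\bar{X}\} = (\sigma^2/n)(1 - n/N)$
$\sigma_c\{\bar{X}\} = (\sigma/\sqrt{n})\sqrt{1 - n/N}$
In practice $\sigma$ is replaced by the sample standard deviation, which gives
the estimated variance and the estimated standard error.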
The last two columns of Table 2 show the finite population correction factor
and the corrected standard errors for sample sizes n = 3, 10, 100.
See how the finite population correction takes care of the apparent discrepancy
between theoretical (1.564) and empirical (1.193) standard errors for n
= 100: the corrected theoretical standard error is 1.221, much closer to
the empirical value. n = 100 represents a substantial sample size
for this population with N = 256.
- Q - So what kind of "problem" is that, when the sample size is large relative to population size? A - It is no problem at all since the estimate of the mean is actually more precise (the variance of the sampling distribution is less) than if the sample represented only a small fraction of the population!
- Q - What happens when n = N? A - You figure it out!
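The effect of the finite population correction can also be checked by simulation. The sketch below is an illustration under assumptions, not the course experiment: the population of N = 256 values is made up (normal draws), samples of n = 100 are taken without replacement, and the population standard deviation is computed with divisor N - 1, the convention under which the corrected formula is exact.

```python
import math
import random
import statistics

rng = random.Random(5)
N, n, replications = 256, 100, 3000

# Hypothetical finite population; any fixed list of N numbers would do.
population = [rng.gauss(20.0, 15.0) for _ in range(N)]
sd = statistics.stdev(population)  # divisor N - 1

uncorrected = sd / math.sqrt(n)
corrected = uncorrected * math.sqrt(1 - n / N)

# random.sample draws without replacement, as in simple random sampling
# from a finite population.
means = [statistics.fmean(rng.sample(population, n)) for _ in range(replications)]
empirical = statistics.stdev(means)

print(f"uncorrected SE = {uncorrected:.3f}")
print(f"corrected SE   = {corrected:.3f}")
print(f"empirical SD of {replications} sample means = {empirical:.3f}")
```

With a sampling fraction of about 39%, the empirical standard deviation of the sample means should land close to the corrected value and clearly below the uncorrected one.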
4. Normal Approximation of the Sampling Distribution of $\bar{X}$
We now know that, when the CLT applies, $\bar{X}$ is approximately normally
distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$,
i.e., $\bar{X} \sim N(\mu, \sigma^2/n)$.
What use is this knowledge concerning the sampling distribution of
$\bar{X}$?
A principal use is to estimate an interval around the population mean
in which a certain percentage (say 95%) of the sample means are expected
to fall; this interval gives us a measure of the precision of the estimate
$\bar{X}$.
Example: Consider the income sampling experiment with n = 10,
assuming the CLT applies even though n is only 10. (This is called
living dangerously!) In this case (because we have the population)
we know that $E\{\bar{X}\} = 22.172$ and $\sigma\{\bar{X}\} = 4.944$.
The question is: in what interval around the mean will 95% of the estimates
$\bar{X}$ fall? The solution can be found, using the assumption
that the sampling distribution is normal, in two steps.
Step 1: for Z ~ N(0,1) the value z such that $P(|Z| \le z) = 0.95$
is given by (SYSTAT)
>calc xif(0.975)
1.96
so that the interval [-1.96, +1.96] contains 95% of the probability
on the Z scale.
Step 2: convert the Z-scale interval [-1.96, +1.96] back to the
$\bar{X}$ scale using the formula $\bar{X} = 22.172 + 4.944\,z$ as
[22.172 - (1.96)(4.944), 22.172 + (1.96)(4.944)]
= [12.482, 31.862]
Thus the interval [12.482, 31.862] is expected to contain 95% of the estimates
$\bar{X}$.
We can compare this interval with the actual numbers from the 600 replications
of the sampling experiment with n = 10. Out of the 600 estimates
of the mean 573 (or 95.5%) fall in the interval [12.482, 31.862].
That's amazingly good performance for the normal approximation, given that
n is only 10.
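The two steps of the example can be reproduced with Python's standard library, used here in place of SYSTAT. The inputs 22.172 and 4.944 are the values from Table 2; the variable names are illustrative.

```python
from statistics import NormalDist

mean_of_xbar = 22.172  # E{X-bar}, the population mean (Table 2)
se_of_xbar = 4.944     # sigma{X-bar} for n = 10 (Table 2)

# Step 1: z such that P(|Z| <= z) = 0.95, i.e. the 0.975 quantile of N(0, 1).
z = NormalDist().inv_cdf(0.975)

# Step 2: convert the Z-scale interval back to the X-bar scale.
lower = mean_of_xbar - z * se_of_xbar
upper = mean_of_xbar + z * se_of_xbar

# Check: probability that a N(mean, se) variable falls inside the interval.
approx = NormalDist(mean_of_xbar, se_of_xbar)
coverage = approx.cdf(upper) - approx.cdf(lower)

print(f"z = {z:.2f}, interval = [{lower:.3f}, {upper:.3f}], coverage = {coverage:.3f}")
```

The observed share from the experiment, 573/600 = 95.5%, can then be compared with the nominal 95% coverage.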
A similar logic will be used to derive confidence intervals for the
population mean in Module 10. (The only difference there will be that
we do not know the population mean and variance, so we must estimate these
quantities from the sample.)
5. Optional - Exact Sampling Distribution of $\bar{X}$
One easy result is that, when the population is normal, the sampling distribution
of $\bar{X}$ is normal for any sample size n.
See NWW 10.5 pp. 271-272.
6. Criteria for Choosing Point Estimators
A "good" estimator is one that gives estimates close to the value of the
population parameter that is being estimated.
Criteria capture desirable properties of estimators. These criteria
are useful for any kind of estimators and will be used extensively in more
advanced statistics courses.
1. Unbiasedness
An estimator S of a parameter $\theta$ is unbiased
if
$E\{S\} = \theta$
i.e., the mean of the sampling distribution of S is equal to the population
parameter $\theta$ to be estimated.
If an estimator S is biased the amount of bias is
$\text{Bias} = E\{S\} - \theta$
Exhibit: (NWW Figure 10.3 p. 273) [m9007.gif]
2. Efficiency
When two estimators are unbiased, the one with the smaller standard error
is preferable and is called more efficient.
- An estimator $S_1$ is relatively more efficient than an alternative estimator $S_2$ in estimating $\theta$ if
$\sigma^2\{S_1\} < \sigma^2\{S_2\}$ and $E\{S_1\} = E\{S_2\} = \theta$
i.e., the variance of $S_1$ is less than the variance of $S_2$
and both $S_1$ and $S_2$ are unbiased.
Exhibit: (NWW Figure 10.4 p. 275) [m9008.gif]
3. Consistency
S is a consistent estimator of a population parameter $\theta$
if for any small positive value $\varepsilon$ (Greek epsilon)
$\lim_{n \to \infty} P(|S - \theta| < \varepsilon) = 1$
i.e., (in words) as the sample size increases the sampling distribution
of S becomes increasingly concentrated around the value of the population
parameter $\theta$.
NOTE: a biased estimator can be consistent if the amount of bias
decreases as n increases.
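The definition can be made concrete for the sample mean with a short simulation that estimates $P(|\bar{X} - \mu| < \varepsilon)$ at increasing sample sizes. The normal population (mean 0, standard deviation 5), the choice of epsilon = 1, and the function name are assumptions made for the illustration.

```python
import random
import statistics


def prob_within(n, epsilon=1.0, mu=0.0, sigma=5.0, replications=1000, seed=3):
    """Estimate P(|X-bar - mu| < epsilon) by simulation for samples of size n
    drawn from a hypothetical normal population."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replications):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if abs(xbar - mu) < epsilon:
            hits += 1
    return hits / replications


probabilities = {n: prob_within(n) for n in (5, 25, 100, 400)}
for n, p in probabilities.items():
    print(f"n = {n:3d}: estimated P(|X-bar - mu| < 1) = {p:.3f}")
```

The estimated probability should rise toward 1 as n increases, which is what consistency asserts for any fixed epsilon.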
Exhibit: (NWW Figure 10.5 p. 275) [m9009.gif]
4. Mean Squared Error
The mean squared error criterion combines the criteria of unbiasedness
and efficiency; it is useful for comparing estimators, at least one of
which is biased.
- The mean squared error of an estimator S of a population parameter $\theta$ is
$\text{Mean squared error} = \sigma^2\{S\} + (E\{S\} - \theta)^2$
i.e., the mean squared error is equal to the variance of S plus the bias
of S squared.
The mean squared error criterion is useful when one compares an unbiased
estimator with large variance with a slightly biased estimator with smaller
variance. An instance of this situation arises in the context of
a multiple regression model with highly correlated (collinear) independent
variables. In that case the ridge estimator, which is biased
but has smaller variance, may be preferred to the standard (OLS) regression
estimator. This is discussed in SOCI209.
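The trade-off between bias and variance can be illustrated without regression. The sketch below compares the sample mean with a hypothetical "shrunken" mean (the sample mean multiplied by 0.8), which is biased but less variable; it is a stand-in chosen for simplicity, not the ridge estimator itself. The population (normal, mean 5, standard deviation 6) and n = 5 are assumptions.

```python
import random
import statistics


def shrunken_mean(sample):
    """Hypothetical biased estimator: the sample mean pulled 20% toward zero."""
    return 0.8 * statistics.fmean(sample)


def simulate(estimator, theta=5.0, sigma=6.0, n=5, replications=5000, seed=9):
    """Return (variance, bias, mean squared error) of an estimator of theta,
    computed over simulated samples from a hypothetical normal population."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(theta, sigma) for _ in range(n)])
        for _ in range(replications)
    ]
    variance = statistics.pvariance(estimates)
    bias = statistics.fmean(estimates) - theta
    mse = statistics.fmean([(s - theta) ** 2 for s in estimates])
    return variance, bias, mse


results = {
    "sample mean": simulate(statistics.fmean),
    "shrunken mean": simulate(shrunken_mean),
}
for name, (variance, bias, mse) in results.items():
    print(f"{name:13s} variance = {variance:.3f}  bias = {bias:+.3f}  "
          f"variance + bias^2 = {variance + bias ** 2:.3f}  MSE = {mse:.3f}")
```

In this setup the biased estimator should show the smaller mean squared error, and for both estimators the variance plus squared bias reproduces the mean squared error. The advantage depends on the assumed value of the parameter: with a population mean far from zero, the squared bias of the shrunken mean would outweigh its smaller variance.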
Exhibit: (NKNW Figure 10.2 p. 411) [m9010.gif]
7. Optional - Maximum Likelihood Estimation
See NWW Section 10.7 pp. 277-281.
Exhibit: (NWW Figure 10.6 p. 278) [m9011.gif]
Exhibit: (NWW Figure 10.7 p. 280) [m9012.gif]
Last modified 6 Oct 2002