SOCI208 Module 9 - Sampling Distribution of the Sample Mean

0.  Introduction

This module is beyond the shadow of a doubt the most important one in the course, because it introduces the crucial notion of the sampling distribution of an estimator.  The whole of statistical inference (confidence intervals, hypothesis testing) is based on the notion of sampling distribution.  Without this concept none of it makes durable sense: one can learn to compute confidence intervals and test hypotheses by rote, just applying the formulas, but without understanding the underlying principle (which flows from the notion of sampling distribution of an estimator) this knowledge does not stick; it fades away without a trace over Christmas break.

1.  Point Estimation

1.  Definition

The process of estimating a population parameter by a sample statistic -- a single number derived from the sample -- is called point estimation.
The sample statistic is called an estimate of the population parameter.

Example: use the GSS 1998 to estimate the average prestige score of the occupations of adults in the U.S.

2.  Features of Point Estimation

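A key feature of point estimation is the subtle status of the statistic S: it is a random variable (RV) prior to sample selection and a number after sample selection.  This distinction is captured in the terminology: S is called an estimator when viewed as a random variable, and an estimate once it has taken a specific value in a selected sample.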

3.  Alternative Point Estimators

A population parameter may usually be estimated using alternative estimators; for example, location can be estimated by the sample mean or by the sample median.  The choice of estimator is made based on the properties of the sampling distribution of each estimator.

2.  Experimental Study of $\bar{X}$

1.  Setup of the Experiment

The population is defined as the 256 cases in the data set survey2.syd provided with the SYSTAT program.  The parameter to be estimated is the mean income of the respondents.
First the mean and standard deviation of income are calculated for the whole population of 256 cases, and the population distribution of income is plotted using the kernel density estimator.
Then 600 samples of n = 3 elements are drawn from the population and the mean calculated for each sample; the mean and the standard deviation of the 600 sample means are then calculated, and the distribution of the 600 means is plotted using the kernel density estimator.
Then the same procedure is carried out for 600 samples of n = 10 elements.
Then the same procedure is carried out for 600 samples of n = 100 elements.
Exhibit: SYSTAT Program for Income Sampling Experiment - 600 Samples With n=3, 10, 100 [m9002.txt]
Exhibit: SYSTAT Output for Income Sampling Experiment - 600 Samples With n=3, 10, 100 [m9001.htm]
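
The same experiment can be sketched outside SYSTAT.  The Python code below is a minimal illustration under stated assumptions, not a translation of the program in m9002.txt: since survey2.syd is not reproduced here, it uses a synthetic right-skewed population of 256 incomes (the gamma distribution and its parameters are arbitrary choices), and the function name sampling_experiment is hypothetical.  For each sample size it draws 600 simple random samples without replacement and summarizes the 600 sample means.

```python
import numpy as np

def sampling_experiment(population, n, reps=600, seed=0):
    """Return the means of `reps` simple random samples of size n."""
    rng = np.random.default_rng(seed)
    means = np.empty(reps)
    for i in range(reps):
        # Simple random sampling without replacement from the finite population
        sample = rng.choice(population, size=n, replace=False)
        means[i] = sample.mean()
    return means

# ASSUMPTION: synthetic stand-in for the 256 incomes in survey2.syd
pop_rng = np.random.default_rng(42)
population = pop_rng.gamma(shape=2.0, scale=11.0, size=256)

print(f"population: mean={population.mean():.3f} sd={population.std(ddof=1):.3f}")
for n in (3, 10, 100):
    means = sampling_experiment(population, n)
    print(f"n={n:3d}: mean of means={means.mean():.3f} "
          f"sd of means={means.std(ddof=1):.3f}")
```

With any such population one would expect the mean of the sample means to stay close to the population mean, and their standard deviation to shrink as n grows; the exact numbers depend on the synthetic data and will not match the SYSTAT output.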

2.  Sampling Distribution of $\bar{X}$

The results of the income sampling experiment are summarized in Table 1 and the four kernel density plots that follow.
 
Table 1.  Income Sampling Experiment

Data                  Mean      Standard Deviation
Population            22.172    15.635
600 samples, n=3      22.584     9.376
600 samples, n=10     21.955     4.916
600 samples, n=100    22.176     1.193

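Exhibit: Income - Population Distribution (kernel density plot)
Exhibit: Mean Income - 600 Samples With n = 3 (kernel density plot)
Exhibit: Mean Income - 600 Samples With n = 10 (kernel density plot)
Exhibit: Mean Income - 600 Samples With n = 100 (kernel density plot)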

3.  Empirical Conclusions

The following conjectures are suggested by the experimental results:
  1. the distribution of values of $\bar{X}$ for simple random sampling is centered around the population mean, regardless of sample size
  2. the standard deviation of the values of $\bar{X}$ decreases with increasing sample size; that is, the distribution of $\bar{X}$ values becomes more concentrated around the population mean as the sample size becomes larger
  3. the distribution of $\bar{X}$ values becomes more symmetrical as the sample size becomes larger and is approximately normal for large sample sizes

3.  Theoretical Results About $\bar{X}$

The theoretical results apply to simple random sampling from

1.  Sampling Distribution of $\bar{X}$

The probability distribution associated with $\bar{X}$ (in advance of sampling) is called the sampling distribution of $\bar{X}$.

2.  Expected Value of $\bar{X}$

The expected value of (the sampling distribution of) $\bar{X}$ is equal to the mean of the population, i.e.
$E\{\bar{X}\} = \mu$
Q - Can you derive this from formulas in Module 5?

3.  Variance of $\bar{X}$

The variance of (the sampling distribution of) $\bar{X}$ and the standard deviation of (the sampling distribution of) $\bar{X}$ are
$\sigma^2\{\bar{X}\} = \sigma^2/n$
$\sigma\{\bar{X}\} = \sigma/\sqrt{n}$
where $\sigma^2$ refers to the population variance of X.
The standard deviation of the sampling distribution of $\bar{X}$, $\sigma\{\bar{X}\}$, is also called the standard error of the mean.

Q - Can you derive these formulas from those in Module 5?
 
 
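Derivation of $E\{\bar{X}\} = \mu$ and $\sigma^2\{\bar{X}\} = \sigma^2/n$
1.  Derivation of $E\{\bar{X}\} = \mu$
$$
\begin{aligned}
E\{\bar{X}\} &= E\{(X_1 + X_2 + \dots + X_n)/n\} \\
&= (1/n)\,E\{X_1 + X_2 + \dots + X_n\} \\
&= (1/n)\,(E\{X_1\} + E\{X_2\} + \dots + E\{X_n\}) \\
&= (1/n)\,n\mu = \mu
\end{aligned}
$$
2.  Derivation of $\sigma^2\{\bar{X}\} = \sigma^2/n$
The derivation is carried out in the case of a simple random sample from an infinite population; then the observations are i.i.d., so that the variance of their sum is the sum of their variances and
$$
\begin{aligned}
\sigma^2\{\bar{X}\} &= \sigma^2\{(X_1 + X_2 + \dots + X_n)/n\} \\
&= (1/n^2)\,\sigma^2\{X_1 + X_2 + \dots + X_n\} \\
&= (1/n^2)\,(\sigma^2\{X_1\} + \sigma^2\{X_2\} + \dots + \sigma^2\{X_n\}) \\
&= (1/n^2)\,n\sigma^2 = \sigma^2/n
\end{aligned}
$$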

Table 2 repeats Table 1 with several columns added containing the expected value and the standard error of $\bar{X}$, the latter both without and with the finite population correction.
 
Table 2.  Income Sampling Experiment Results and Theoretical Values Compared

Data                  Mean      Standard    $E\{\bar{X}\}$   $\sigma\{\bar{X}\}$    $(1 - n/N)^{1/2}$   $\sigma_c\{\bar{X}\}$
                                Deviation   $= \mu$          $= \sigma/\sqrt{n}$    See note (1)        See note (2)
Population            22.172    15.635      --               --                     --                  --
600 samples, n=3      22.584     9.376      22.172           9.027                  0.994               8.974
600 samples, n=10     21.955     4.916      22.172           4.944                  0.980               4.846
600 samples, n=100    22.176     1.193      22.172           1.564                  0.781               1.221

NOTES:

(1)  $(1 - n/N)^{1/2}$ is the finite population correction factor for the standard error (see Subsection 5 below), to be used when the sampling fraction n/N is more than 5%
(2)  $\sigma_c\{\bar{X}\} = (\sigma/\sqrt{n})(1 - n/N)^{1/2}$ is the standard error with finite population correction for N = 256 and n = 3, 10, or 100 (see Subsection 5 below)
1.  Effect of Sample Size
The standard error (i.e., the standard deviation of the sampling distribution of $\bar{X}$) decreases in inverse proportion to the square root of the sample size.
2.  Effect of Population Variability
For any given sample size, the greater the population variability, the greater the standard error.
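
Both effects can be checked numerically.  The short Python sketch below (the function name standard_error is illustrative) applies the formula $\sigma/\sqrt{n}$ to the population standard deviation from Table 1, which should reproduce the uncorrected standard errors in Table 2 up to rounding, and then shows the two proportionality claims.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 15.635  # population standard deviation from Table 1
for n in (3, 10, 100):
    print(f"n={n:3d}  standard error={standard_error(sigma, n):.3f}")

# Quadrupling the sample size halves the standard error (ratio about 0.5)
print(standard_error(sigma, 40) / standard_error(sigma, 10))

# Doubling the population standard deviation doubles it (ratio about 2)
print(standard_error(2 * sigma, 10) / standard_error(sigma, 10))
```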

4.  The Central Limit Theorem

Exhibit:  Laplace's 1810 statement of the central limit theorem  (Stigler 1986 F4.2 p. 144) [m9013.gif]
Exhibit:  Gauss's 1809 derivation of the normal density  (Stigler 1986 F4.1 p. 142) [m9014.gif]
The "Central Limit Theorem" expresses as a theorem a very important and pervasive natural phenomenon.
Central Limit Theorem (CLT).  For almost all populations, the sampling distribution of $\bar{X}$ is approximately normal when the size of the simple random sample is sufficiently large.

5.  Sampling Finite Populations

The sampling fraction is the ratio n/N of the sample size n to the population size N.

When the sampling fraction is more than 5 percent, the formula for the standard error gives a value that is pessimistic (too large).  A more accurate (smaller) estimate of the standard error is obtained using the finite population correction.

The estimated variance of $\bar{X}$ and the estimated standard error with finite population correction are

$\sigma_c^2\{\bar{X}\} = (\sigma^2/n)(1 - n/N)$
$\sigma_c\{\bar{X}\} = (\sigma/\sqrt{n})(1 - n/N)^{1/2}$
The last two columns of Table 2 show the finite population correction factor and the corrected standard errors for sample sizes n = 3, 10, 100.  See how the finite population correction takes care of the apparent discrepancy between theoretical (1.564) and empirical (1.193) standard errors for n = 100: the corrected theoretical standard error is 1.221, much closer to the empirical value.  n = 100 represents a substantial sample size for this population with N = 256.
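
The correction is easy to recompute.  The following Python sketch (the function name corrected_standard_error is illustrative) uses the population values from Table 2 and should reproduce the correction factors and corrected standard errors shown there, up to rounding in the last digit.

```python
import math

def corrected_standard_error(sigma, n, N):
    """Return (correction factor, standard error with finite population correction)."""
    fpc = math.sqrt(1 - n / N)                 # finite population correction factor
    return fpc, (sigma / math.sqrt(n)) * fpc

sigma, N = 15.635, 256  # population standard deviation and size from Table 2
for n in (3, 10, 100):
    fpc, se_c = corrected_standard_error(sigma, n, N)
    print(f"n={n:3d}  n/N={n / N:.3f}  factor={fpc:.3f}  corrected SE={se_c:.3f}")
```

Only n = 10 and n = 100 exceed the 5 percent sampling fraction, and the correction matters most for n = 100, where n/N is about 39 percent.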

4.  Normal Approximation of the Sampling Distribution of $\bar{X}$

We now know that, when the CLT applies, $\bar{X}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, i.e., approximately $\bar{X} \sim N(\mu, \sigma^2/n)$.
What use is this knowledge concerning the sampling distribution of $\bar{X}$?
A principal use is to estimate an interval around the population mean in which a certain percentage (say 95%) of the sample means are expected to fall; this interval gives us a measure of the precision of the estimator $\bar{X}$.
Example:  Consider the income sampling experiment with n = 10, assuming that the CLT applies even though n is only 10.  (This is called living dangerously!)  In this case (because we have the population) we know that $E\{\bar{X}\} = 22.172$ and $\sigma\{\bar{X}\} = 4.944$.
The question is: in what interval around the mean will 95% of the estimates $\bar{X}$ fall?  The solution can be found, using the assumption that the sampling distribution is normal, in two steps.

Step 1: for $Z \sim N(0,1)$ the value $z$ such that $P(|Z| \le z) = 0.95$ is given by (SYSTAT)
>calc xif(0.975)
        1.96
so that the interval [-1.96, +1.96] contains 95% of the probability on the Z scale.

Step 2: convert the Z-scale interval [-1.96, +1.96] back to the $\bar{X}$ scale using the formula $\bar{X} = 22.172 + 4.944z$ as

[22.172 - (1.96)(4.944), 22.172 + (1.96)(4.944)]
= [12.482, 31.862]
Thus the interval [12.482, 31.862] is expected to contain 95% of the estimates $\bar{X}$.
We can compare this interval with the actual numbers from the 600 replications of the sampling experiment with n = 10.  Out of the 600 estimates of the mean 573 (or 95.5%) fall in the interval [12.482, 31.862].  That's amazingly good performance for the normal approximation, given that n is only 10.
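
The two steps can also be reproduced with Python's standard library, as a sketch alongside the SYSTAT calculation.  The mean and standard error are the n = 10 values from Table 2; the helper coverage is hypothetical and only shows how the share of sample means inside the interval would be counted if the 600 means were available as a list.

```python
from statistics import NormalDist

mean, se = 22.172, 4.944   # expected value and standard error of the mean, n = 10

# Step 1: z such that P(|Z| <= z) = 0.95 for Z ~ N(0, 1)
z = NormalDist().inv_cdf(0.975)

# Step 2: convert the Z-scale interval back to the scale of the sample mean
lower, upper = mean - z * se, mean + z * se
print(f"z = {z:.2f}, interval = [{lower:.3f}, {upper:.3f}]")

def coverage(sample_means, lower, upper):
    """Share of sample means that fall inside [lower, upper]."""
    inside = [lower <= m <= upper for m in sample_means]
    return sum(inside) / len(inside)
```

Using the unrounded value of z rather than 1.96 changes the endpoints only beyond the third decimal place.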

A similar logic will be used to derive confidence intervals for the population mean in Module 10.  (The only difference there will be that we do not know the population mean and variance, so we must estimate these quantities from the sample.)

5.  Optional - Exact Sampling Distribution of $\bar{X}$

One easy result is that, when the population is normal, the sampling distribution of $\bar{X}$ is normal for any sample size n.
See NWW 10.5 pp. 271-272.

6.  Criteria for Choosing Point Estimators

A "good" estimator is one that gives estimates close to the value of the population parameter that is being estimated.
Criteria capture desirable properties of estimators.  These criteria are useful for any kind of estimator and will be used extensively in more advanced statistics courses.

1.  Unbiasedness

  • An estimator S of a parameter $\theta$ is unbiased if
    $E\{S\} = \theta$
    i.e., the mean of the sampling distribution of S is equal to the population parameter $\theta$ to be estimated.
  • If an estimator S is biased, the amount of bias is
    $\text{Bias} = E\{S\} - \theta$
Exhibit: (NWW Figure 10.3 p. 273) [m9007.gif]

2.  Efficiency

When two estimators are unbiased, the one with the smaller standard error is preferable and is called more efficient.
Exhibit: (NWW Figure 10.4 p. 275) [m9008.gif]

3.  Consistency

  • S is a consistent estimator of a population parameter $\theta$ if for any small positive value $\epsilon$ (Greek epsilon)
    $\lim_{n \to \infty} P(|S - \theta| < \epsilon) = 1$
    i.e., (in words) as the sample size increases the sampling distribution of S becomes increasingly concentrated around the value of the population parameter $\theta$.

NOTE: a biased estimator can be consistent if the amount of bias decreases toward zero as n increases.

Exhibit: (NWW Figure 10.5 p. 275) [m9009.gif]
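
As an illustration of the definition, the Python sketch below evaluates the probability that the sample mean falls within a fixed distance of the population mean for increasing n.  It rests on two assumptions that are not part of the definition: the sampling distribution of the sample mean is taken to be normal with standard deviation $\sigma/\sqrt{n}$ (no finite population correction), and the tolerance 1.0 is an arbitrary choice.  The function name prob_within is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(eps, sigma, n):
    """P(|sample mean - mu| < eps), ASSUMING the sample mean is N(mu, sigma^2 / n)."""
    z = eps * sqrt(n) / sigma
    return 2 * NormalDist().cdf(z) - 1

sigma, eps = 15.635, 1.0   # sigma from Table 1; eps is an arbitrary tolerance
for n in (10, 100, 1000, 10000):
    print(f"n={n:5d}  P(|mean - mu| < {eps}) = {prob_within(eps, sigma, n):.4f}")
```

The printed probabilities increase toward 1 as n grows, which is the behavior the definition of consistency requires.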

4.  Mean Squared Error

The mean squared error criterion combines the criteria of unbiasedness and efficiency; it is useful for comparing estimators, at least one of which is biased.  In particular, it is useful when one compares an unbiased estimator with large variance with a slightly biased estimator with smaller variance.  An instance of this situation arises in the context of a multiple regression model with highly correlated (collinear) independent variables.  In that case the ridge estimator, which is biased but has smaller variance, may be preferred to the standard (OLS) regression estimator.  This is discussed in SOCI209.
Exhibit: (NKNW Figure 10.2 p. 411) [m9010.gif]
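
The mean squared error of an estimator S of $\theta$ is $E\{(S - \theta)^2\}$, which equals the variance of S plus the square of its bias.  The Python simulation below is a sketch of the trade-off using a simpler case than ridge regression: two estimators of a population variance, one dividing by n - 1 (unbiased) and one dividing by n (biased).  The normal population, the sample size n = 5, and the number of replications are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, reps = 1.0, 5, 100_000   # ASSUMED: normal population with variance 1

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
estimates = {
    "divide by n-1 (unbiased)": samples.var(axis=1, ddof=1),
    "divide by n (biased)": samples.var(axis=1, ddof=0),
}

results = {}
for name, s in estimates.items():
    bias = s.mean() - sigma2              # simulated bias of the estimator
    variance = s.var()                    # variance across replications
    mse = np.mean((s - sigma2) ** 2)      # equals variance + bias**2
    results[name] = (bias, variance, mse)
    print(f"{name:26s} bias={bias:+.3f} var={variance:.3f} mse={mse:.3f}")
```

Under these assumptions the biased estimator has the smaller mean squared error, because its reduction in variance outweighs its squared bias.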

7.  Optional - Maximum Likelihood Estimation

See NWW Section 10.7 pp. 277-281.
Exhibit: (NWW Figure 10.6 p. 278) [m9011.gif]
Exhibit: (NWW Figure 10.7 p. 280) [m9012.gif]




    Last modified 6 Oct 2002