SOCI208 Module 8 - Statistical Sampling

An important new distinction is that data sets can be
• considered a population, and/or
• considered a sample
Example: the 1998 General Social Survey is a sample of the adult population in the U.S.; but for purposes of sampling experiments the 1998 GSS data set can be treated as the population, from which one draws a random sample of size n = 100, say.  (So you might truthfully say that the population/sample distinction is socially constructed!!!)

1.  Populations

A population (or universe) is the total set of elements of interest for a given study.

1.  Finite Populations

Example:
• active graduate students in the sociology department

2.  Infinite Populations

An infinite population usually has elements that consist of all the outcomes of a process if the process were to operate indefinitely under the same conditions.  The infinite population is represented by an RV with its associated probability distribution.
Example:
• all items produced by a manufacturing process - interest in the proportion of defective items
• all Gala apples - interest in the average weight
• all earthquakes in California - interest in the distribution of magnitudes

2.  Censuses and Samples

1.  Census

A census is a study of a finite population that includes every element of the population.
Example:
• look at all the titles in the fiction section of Barnes & Noble and note the sex of the first author
A census is not possible with an infinite population.

2.  Sample

A sample is a part of the population selected so that inferences can be drawn from it about the population.
The process of designing and executing a study based on a sample is called a sample survey.
NOTE: "survey" does not necessarily imply the use of a questionnaire.

Example

• look at the titles on every other shelf in the fiction section of Barnes & Noble and note the sex of the first author

3.  Reasons for Sampling

Reasons for using sampling rather than a census with a finite population are
• lower cost
• greater timeliness
• greater accuracy
• more detailed information
• sampling must be used when testing is destructive (EX: dating the Shroud of Turin or the Dead Sea Scrolls with carbon dating techniques)

4.  Sampling and Nonsampling Errors

• A sampling error is the difference between the result obtained from a sample and the result that would be obtained from a census conducted by using the same procedures as in the sample.
• Nonsampling errors are those present in data irrespective of whether data are obtained from a sample or a census.
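
The definition of sampling error above can be illustrated with a small simulation. This is only a sketch: the population of 1,000 incomes is made up for illustration, and the names `population`, `census_mean`, and `sampling_error` are assumptions, not part of the course materials.

```python
import random
import statistics

# Hypothetical finite population: 1,000 made-up incomes (illustrative only).
rng = random.Random(208)
population = [rng.gauss(50_000, 12_000) for _ in range(1_000)]

# Result of a census: the mean computed from every element.
census_mean = statistics.mean(population)

# Result of a sample: the same procedure applied to n = 100 elements
# selected at random without replacement.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

# Sampling error = sample result minus census result.
sampling_error = sample_mean - census_mean

print(f"census mean    = {census_mean:,.0f}")
print(f"sample mean    = {sample_mean:,.0f}")
print(f"sampling error = {sampling_error:,.0f}")
```

Drawing a different sample (for example, by changing the seed) gives a different sampling error, while the census result stays fixed.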

5.  Probability and Judgement Samples

• A sample in which the selection of elements from the population is made according to known probabilities is a probability sample.
• A sample for which judgement is used to select representative elements from the population or to infer that it is representative of the population is a judgement sample.
Example:
• judgement samples ("quota samples") were widely used in early days of political opinion polling
• convenience samples may or may not be judgement samples

3.  Simple Random Sampling From a Finite Population

1.  Definition

A (simple) random sample from a finite population is a sample selected so that each possible sample combination of the specified size has equal probability of being chosen.

NOTE:

• the number of different possible samples of n from a population of N elements is given by the formula N!/(n!(N - n)!)  (it may be a BIG number!!!)
• the definition of a simple random sample implies that each element of the population has an equal probability of being selected; but equal probability of selection of elements is not a sufficient condition for a simple random sample - Q - Why?  (See NWW p. 241 bottom.)
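
The counting formula and the equal-probability property in the notes above can be checked by brute force on a tiny population. The following is a sketch only; the five-element population and the variable names are hypothetical.

```python
from itertools import combinations
from math import comb

# Number of distinct samples of size n from N elements: N! / (n! (N - n)!)
print(comb(5, 2))                     # 10
print(len(str(comb(1000, 100))))      # number of digits when N = 1000, n = 100

# Enumerate every possible sample of size 2 from a hypothetical population.
population = ["A", "B", "C", "D", "E"]
n = 2
samples = list(combinations(population, n))

# Under simple random sampling each of these samples is equally likely,
# so an element's selection probability is the share of samples containing it.
inclusion = {
    element: sum(element in s for s in samples) / len(samples)
    for element in population
}
print(inclusion)                      # every element: 0.4, which is n/N
```

Each element appears in 4 of the 10 possible samples, so its selection probability is 4/10 = n/N.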

2.  Selection of Simple Random Sample

Selecting a simple random sample requires a frame.
• A frame is a listing of all elements of the finite population.
The general procedure for selecting a simple random sample is to select elements sequentially without replacement:
• select first element with probability 1/N
• select the second element with probability 1/(N - 1)
• etc., until n elements are selected
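
The sequential procedure above can be sketched in a few lines of Python. The function name `simple_random_sample` and the frame of 200 numbered IDs are hypothetical and serve only as an illustration.

```python
import random

def simple_random_sample(frame, n, rng=random):
    """Select n elements from the frame sequentially, without replacement."""
    if n > len(frame):
        raise ValueError("n cannot exceed the population size N")
    remaining = list(frame)   # copy, so the frame itself is left unchanged
    chosen = []
    for _ in range(n):
        # Every remaining element is equally likely to be picked:
        # 1/N on the first draw, 1/(N - 1) on the second, and so on.
        index = rng.randrange(len(remaining))
        chosen.append(remaining.pop(index))
    return chosen

# Hypothetical frame: a listing of N = 200 element IDs.
frame = [f"id_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 10, random.Random(8))
print(sample)
```

In practice the standard-library call `random.sample(frame, n)` does the same job; the loop is written out here only to mirror the steps listed above.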
1.  Sample Selection Using a Table of Random Numbers
See the following exhibit (county lawyers)
Exhibit: (NWW Table 9.1 p. 244) [m8001.gif]
2.  Sample Selection Using Computer-Generated Numbers
<show example of programs in SYSTAT and STATA>

4.  Simple Random Sampling From an Infinite Population

1.  Definition

The n random variables X1, X2, ..., Xn generated by a process constitute a simple random sample from an infinite population if
1. they are independent and
2. they come from the same probability distribution (i.e., they are "identically distributed"); the common probability distribution for X1, X2, ..., Xn is the infinite population.

2.  Diagnostic Procedures for Checking Randomness of Data

See the following exhibits:
Exhibit: (NWW Figure 9.1 p. 246) [m8002.gif]
Exhibit: (NWW Figure 9.2 p. 247) [m8003.gif]
Exhibit: (NWW Figure 9.3 p. 248) [m8004.gif]
NOTE: in the social sciences, an important example of a population treated as infinite is the error term in a statistical model, such as a regression model of Y as a function of variables Xk and an error term. Diagnostic procedures for checking randomness may then be applied to the residuals, which serve as estimates of the error term.
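
One simple numerical check on independence is the lag-1 autocorrelation of a sequence of observations or residuals. This is offered as a sketch of one possible diagnostic, not necessarily the procedure shown in the NWW exhibits; the function name `lag1_autocorrelation` and both simulated series are assumptions made for illustration.

```python
import random

def lag1_autocorrelation(x):
    """Correlation between each observation and the one that follows it."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((xi - mean) ** 2 for xi in x)
    return num / den

rng = random.Random(1998)

# Independent, identically distributed draws from one distribution.
iid = [rng.gauss(0, 1) for _ in range(500)]

# A series that is NOT independent: each value is the previous one plus noise.
drifting = [0.0]
for _ in range(499):
    drifting.append(drifting[-1] + rng.gauss(0, 1))

r_iid = lag1_autocorrelation(iid)
r_drifting = lag1_autocorrelation(drifting)
print(round(r_iid, 3))        # expected to be near 0
print(round(r_drifting, 3))   # expected to be near 1
```

A value far from zero suggests that successive observations are related, which is evidence against treating the data as a simple random sample from an infinite population.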

5.  Sample Statistics and Population Parameters

• A characteristic of a population is referred to as a population parameter, or parameter for short.
• A characteristic of a sample is referred to as a sample statistic, or statistic for short.

1.  Sample Statistics

Sample statistics consist of summary measures calculated from a sample of n observations X1, X2, ..., Xn from a population, such as
• the sample mean $\bar{X}$
• the sample variance $s^2$, etc.

2.  Population Parameters

Definitions of population parameters differ depending on whether the population is finite or infinite.

Parameter | Finite population with observations X1, X2, ..., XN | Infinite population represented by RV X
Population mean | $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$ | $\mu = E\{X\}$
Population variance | $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$ | $\sigma^2 = \sigma^2\{X\}$
Population standard deviation | $\sigma = (\sigma^2)^{1/2}$ | $\sigma = (\sigma^2)^{1/2}$
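
The finite-population definitions can be computed directly and contrasted with the corresponding sample statistics. This is a sketch: the eight-element population is hypothetical, and the sample variance is computed with the usual n - 1 divisor, which is an assumption here because these notes do not spell out its formula.

```python
import random
import statistics

# Hypothetical finite population, N = 8.
population = [2, 4, 4, 4, 5, 5, 7, 9]

# Population parameters (the sums run over all N elements, divisor N).
mu = statistics.mean(population)           # population mean, equals 5
sigma2 = statistics.pvariance(population)  # population variance, equals 4
sigma = statistics.pstdev(population)      # population standard deviation, equals 2

# Sample statistics from a simple random sample of n = 4.
sample = random.Random(8).sample(population, 4)
x_bar = statistics.mean(sample)            # sample mean
s2 = statistics.variance(sample)           # sample variance, divisor n - 1

print(mu, sigma2, sigma)
print(sample, x_bar, s2)
```

The parameters are fixed numbers for a given population, while the statistics change from one sample to the next.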

NOTE: See also box in NWW p. 251 for demonstration that "the population mean $\mu$ and variance $\sigma^2$ for a finite population correspond, respectively, to the expected value and variance of the RV associated with the equal-probability selection of one population element."
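
The correspondence stated in the note can be verified in two lines. The following is a sketch of the argument, not a reproduction of the NWW box; it assumes X denotes the value of one element selected so that each of the N elements has probability 1/N.

```latex
\begin{align*}
E\{X\} &= \sum_{i=1}^{N} X_i \cdot \frac{1}{N}
        = \frac{1}{N}\sum_{i=1}^{N} X_i = \mu \\
\sigma^2\{X\} &= \sum_{i=1}^{N} \bigl(X_i - E\{X\}\bigr)^2 \cdot \frac{1}{N}
        = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2 = \sigma^2
\end{align*}
```

The first line is the definition of the expected value of a discrete RV applied to the outcomes X1, ..., XN; the second line applies the definition of variance and substitutes the result of the first.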

3.  Definition of Statistical Inference

Statistical inference is the use of probability theory to make inferences about population parameters using information obtained from a sample.