Sampling Counties for Air Quality 

Statistical Topic:
Good research and data collection depend on selecting a sample that is representative of the population you want to study.  Every observation in a data set should be chosen so that the result is a truly random sample, meaning that every element of the population has an equal chance of being included in the sample.
Student Issue:
What is the air quality for counties in your state? How does your county's air quality compare with the entire state's?
Data Set:
Use a table of emissions for your state, broken down by county.  The example used here is the 1999 emissions report from the Kentucky Division of Air Quality, which reports emissions for both counties and regions of the state.  The emission variables are the amounts, in tons, of the following EPA-regulated emissions:  sulfur dioxide, nitrous oxide, carbon dioxide, volatile organic compounds, and particulate matter.  Using the Internet sites shown below, your instructor will find a data set for your state listing emissions for each county.
Goal of Data Analysis Lab:
Determine the importance of random sampling in testing research questions.  Compare a sample mean created through random sampling with other means generated by students in your class.  Also compare these means with those obtained through stratified random sampling. 
Statistical Techniques:
  1. Identify each county by giving it a number from 1 to the number of counties.  Using a random number generator, select 15 counties from all the counties for the variable sulfur dioxide.  Compute the sample mean and standard deviation for this emission.  Place your sample mean on the board.  How does your sample mean compare with the sample means generated by your classmates?
  2. Compute the total number of samples you could generate by selecting 15 counties from all the counties in the state.  Use the combination function on your calculator or use the formula given by your instructor.
  3. Renumber each county according to the regions of the state (Kentucky has 9 regions as designated by the Environmental Protection Agency).  Again using a random number generator, select three counties from each region (if possible), and compute the sample mean and standard deviation for the selected counties.  Place your stratified sample mean on the board.  How does this mean compare with other stratified means?  How does it compare with the randomly sampled means that were not stratified?
  4. Draw two histograms: one for the means from simple random sampling and one for the means from random sampling stratified by region.  How do these two histograms compare?
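If you would rather use a computer than a calculator, steps 1 through 3 can be sketched in Python as shown below.  This is only an illustration under stated assumptions: the sulfur dioxide values are made-up numbers, the assignment of counties to regions is arbitrary, and the names counties, simple_random_sample, stratified_sample, and summarize are hypothetical.  The count of 120 counties matches Kentucky; replace it and the data with the figures from your own state's report.  The number of possible samples in step 2 is the combination C(N, n) = N! / (n! (N - n)!).

```python
import math
import random
import statistics

# ASSUMPTION: made-up data for illustration only. Replace the "so2" values
# and the region assignments with those from your state's emissions report.
rng = random.Random(1999)   # fixed seed so the sketch is reproducible
N_COUNTIES = 120            # Kentucky has 120 counties
N_REGIONS = 9               # number of regions in the Kentucky example

counties = [
    {"id": i, "region": (i - 1) % N_REGIONS + 1, "so2": rng.uniform(0, 5000)}
    for i in range(1, N_COUNTIES + 1)
]


def simple_random_sample(population, n, rng):
    """Step 1: draw n counties without replacement, all equally likely."""
    return rng.sample(population, n)


def stratified_sample(population, per_region, rng):
    """Step 3: draw up to per_region counties from every region."""
    by_region = {}
    for county in population:
        by_region.setdefault(county["region"], []).append(county)
    chosen = []
    for region in sorted(by_region):
        members = by_region[region]
        # "if possible": a small region contributes all of its counties
        chosen.extend(rng.sample(members, min(per_region, len(members))))
    return chosen


def summarize(sample):
    """Return the sample mean and sample standard deviation of SO2 (tons)."""
    values = [county["so2"] for county in sample]
    return statistics.mean(values), statistics.stdev(values)


srs = simple_random_sample(counties, 15, rng)
strat = stratified_sample(counties, 3, rng)
srs_mean, srs_sd = summarize(srs)
strat_mean, strat_sd = summarize(strat)

# Step 2: number of distinct samples of 15 counties chosen from all counties
n_samples = math.comb(N_COUNTIES, 15)

print(f"Simple random sample: mean = {srs_mean:.1f}, sd = {srs_sd:.1f}")
print(f"Stratified sample:    mean = {strat_mean:.1f}, sd = {strat_sd:.1f}")
print(f"Possible samples of 15 from {N_COUNTIES}: {n_samples}")
```

Running the script with a different seed plays the role of a different student in the class: each run gives a new pair of means to add to the two histograms in step 4.  Note that the stratified mean here is the plain average of the selected counties, as in step 3; it is not weighted by the number of counties in each region.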
Social Commentary:
  1. How does your county compare with others in the state for all types of emissions?
  2. What is your county doing to lower these emissions and improve air quality?
Web Sites for Environmental Data:
  1. EPA Center for Environmental Information and Statistics:
    • http://www.epa.gov/ceis
  2. EPA ZipCode Search (Map of Your Neighborhood):
    • http://www.epa.gov/enviro/zipcode_js.html
  3. Scorecard Provided by the Environmental Defense Fund:
    • http://www.scorecard.org