Randomness of Ages of U.S. Residents 

Statistical Topic: 
Understanding the concept of randomness is crucial to theories on how to draw inferences about a population by analyzing only a single, smaller sample.  Random number generators such as dice and random number tables produce results that appear to be controlled entirely by chance.  Students will learn the difference between theoretical, empirical and experimental probability and how to present results using a probability distribution histogram.  Differences in randomness with small numbers and large numbers will also be considered.
Student Issue: 
Understanding the age structure of U.S. residents is key to understanding the labor market, number of college students, number of children in schools, consumer markets and almost every aspect of the U.S. economy.  The pattern of violent victimizations (rape, sexual assault, robbery with or without injury, aggravated assault with injury and simple assault with minor injury) across age groups is not random: teenagers ages 12-19 are victims of these crimes almost 42 times more often than senior citizens 65 years and older.
Data Set:
To estimate the number of U.S. residents in 1997 by age group, consult Table 1. Resident Population.
Research Questions: 
What are the different types of probability: theoretical, empirical and experimental?
What are random numbers and how are they generated?
What happens to probabilities when small numbers are involved? Large numbers? 
Statistical Techniques: 
  1. The ages of U.S. residents 5 and over are presented in Table 1 in class intervals of equal width (15 years), starting with age 5.  If the number of residents were equal for each of the five age intervals, we would expect the probability for each interval to equal 1/5.  Draw a probability distribution with the age intervals on the x-axis and the probability for each class on the y-axis.  This is a theoretical probability distribution and has a shape called uniform or rectangular.
  2. Compute the actual percent or probability for each age interval, based on the number of U.S. residents reported by the Bureau of the Census, and place this value in Table 1.  Draw a probability distribution with the age class intervals on the x-axis and the empirical (based on real observations or numbers) probabilities on the y-axis.  Compare this probability distribution with the one drawn in question 1. Can you think of factors that would cause the empirical probabilities to be different from the theoretical probabilities?
  3. Suppose we decided to simulate an experiment in which we repeatedly generated a person whose age comes from one of the five groups with equal (theoretical) probability. We could use a random number generator such as a single die, which has an equal chance of landing on 1, 2, 3, 4, 5 or 6. We could number the age intervals 1, 2, 3, 4 and 5.  Next we could throw the die until we have 20 random numbers from 1 to 5, ignoring any 6 because no age interval has that number.  Use the numbers generated from the die throws to determine the age group for each of the 20 fictitious people you have created.  Compute the experimental probabilities (based on the randomness of tossing the die) for each age group.  Draw a histogram (probability distribution) for your twenty people.  How does this histogram compare with the distributions drawn in questions 1 and 2? Note: if you do not have a die, you can use a random number table, which appears in the back of most statistics texts. Select any column or row of one-digit numbers and choose only those numbers ranging from 1 to 5.
  4. Have your instructor collect the counts for all age groups from each student and draw a histogram for all the experimental data collected.  How does this probability distribution compare with those drawn previously, especially the one in question 3? Why do you think these distributions are different? If you did this 248,488,000 times (number of people in the U.S.), which probability distribution should your results most closely reflect?
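
The die-throwing experiment in questions 3 and 4 can also be tried on a computer. The following is a minimal sketch in Python, not part of the original exercise. The function names, the choice to reroll any 6, the seed and the sample sizes are all assumptions made for illustration. It simulates 20 people, then larger groups, and reports how far the experimental probabilities fall from the theoretical value of 1/5.

```python
import random
from collections import Counter


def roll_age_group(rng):
    """Throw a six-sided die, rerolling any 6, and return a group number 1-5.

    Rerolling a 6 is an assumption that mirrors the random number table
    note, where only the digits 1 to 5 are kept.
    """
    while True:
        face = rng.randint(1, 6)  # each face 1..6 is equally likely
        if face <= 5:
            return face


def experimental_probabilities(n_people, rng):
    """Simulate n_people and return the share that lands in each age group."""
    counts = Counter(roll_age_group(rng) for _ in range(n_people))
    return [counts[group] / n_people for group in range(1, 6)]


def empirical_probabilities(counts):
    """Turn observed counts (for example, the five Table 1 counts) into probabilities."""
    total = sum(counts)
    return [count / total for count in counts]


if __name__ == "__main__":
    rng = random.Random(1997)   # fixed seed (arbitrary) so a run can be repeated
    theoretical = [1 / 5] * 5   # uniform distribution from question 1
    for n_people in (20, 2000, 200000):  # hypothetical sample sizes
        probs = experimental_probabilities(n_people, rng)
        largest_gap = max(abs(p - t) for p, t in zip(probs, theoretical))
        print(n_people, [round(p, 3) for p in probs], round(largest_gap, 3))
```

As the number of simulated people grows, the largest gap from 1/5 should tend to shrink, which is the small-number versus large-number contrast that question 4 asks about. To compare against question 2, pass the five resident counts from Table 1 to empirical_probabilities.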
Social Commentary:
  1. The next time you visit a mall or restaurant, look around and mentally determine the age groups of about 10-20 people. Which one of the probability distributions you have computed in this exercise best fits this group of people?  Why do you think you got the results you did?