[final version, Spring 2007]
Sociology 709
Final Exam
Take home version
You have 5 hours to complete the exam once you start work on it. You can look at the questions and study your notes before starting the clock. You can write out sample syntax so you are all ready to go.
The exam is open book, open notes, and closed person; i.e., once you start, you may not ask anyone for help.
[It is fine to email me questions about clarification. I will cc my responses to the class email.]
If something goes wrong (Stata doesn’t work, your computer crashes, a question doesn’t make sense or is simply wrong), don’t panic! This is intended as a low-stress exam. Send me an email (or call my cellphone # 923-5555 during the day) and leave a callback number. You can stop the clock on the exam while we solve your problem.
I will put a practice data set and tables online, and you can ask for help with the practice data and tables before you start the actual exam.
Once you begin to write answers and/or open the data set, your 5 hours have started.
If you run out of time: stop all work [this is for your benefit and psychological well being. I don’t want you to spend more than 5 hours working on this].
If you score less than 70% after 5 hours, you will take a make-up exam.
(You can start and stop multiple times, but the total time must be less than 5 hours)
Start time:
Finish time:
Table 1: OLS estimates of wages, 50% sample of 2005 CPS data
Notes:
Variable codes
Race-ethnicity (re): 1 White, 2 Black, 3 Asian, 4 Hispanic
Sex: 1 Male, 2 Female
Dependent variable (Models 1 and 2): inflation-adjusted wage per hour

|                                           | Model 1   | Model 2   |
|-------------------------------------------|-----------|-----------|
| Race-ethnicity (excluded category re==1)  |           |           |
| re==2                                     | -3.782    | -5.250    |
|                                           | (0.170)** | (0.253)** |
| re==3                                     | 2.713     | 2.832     |
|                                           | (0.248)** | (0.339)** |
| re==4                                     | -6.681    | -7.914    |
|                                           | (0.173)** | (0.220)** |
| sex==2                                    | -3.732    | -4.252    |
|                                           | (0.100)** | (0.113)** |
| Race-ethnicity & gender interaction terms |           |           |
| re==2 & sex==2                            |           | 2.700     |
|                                           |           | (0.342)** |
| re==3 & sex==2                            |           | -0.278    |
|                                           |           | (0.496)   |
| re==4 & sex==2                            |           | 3.137     |
|                                           |           | (0.354)** |
| Constant                                  | 20.316    | 20.570    |
|                                           | (0.075)** | (0.079)** |
| Observations                              | 54001     | 54001     |
| R-squared                                 | 0.06      | 0.06      |

Standard errors in parentheses
* significant at 5%; ** significant at 1%
1. These questions refer to Table 1 (1 point each)
a. Test the hypothesis that the coefficient on “female” (sex==2) in Model 1 is equal to -2.
Note on parts b and c (interaction terms): the questions refer to the predicted values based on the regression coefficients.
b. In Model 1, how much more do Asian men earn than Hispanic women?
c. In Model 2, how much more do Asian men earn than Hispanic women?
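A minimal Python sketch of the arithmetic behind these parts, assuming the usual t-test for a single coefficient, t = (b − b₀)/SE(b), and the dummy coding shown in Table 1 (re==1 and sex==1 are the excluded categories). The helper names (`t_stat`, `predicted_wage`) and dictionary keys are illustrative choices, not Stata output; in Stata the corresponding tools after `regress` would be `test` and `lincom`.

```python
# Illustrative arithmetic for the Table 1 questions (hypothetical helpers).


def t_stat(coef, se, null_value):
    """t statistic for H0: coefficient == null_value."""
    return (coef - null_value) / se


def predicted_wage(coefs, re, sex):
    """Predicted hourly wage for one race-ethnicity (re) by sex group.

    Assumes re==1 (White) and sex==1 (male) are the excluded categories.
    Interaction keys are optional, so the same function works for Model 1.
    """
    value = coefs["_cons"]
    if re != 1:
        value += coefs[f"re{re}"]
    if sex == 2:
        value += coefs["female"]
        if re != 1:
            # Model 1 has no interaction terms, so missing keys count as 0.
            value += coefs.get(f"re{re}_female", 0.0)
    return value


# Coefficients transcribed from Table 1.
model1 = {"_cons": 20.316, "re2": -3.782, "re3": 2.713, "re4": -6.681,
          "female": -3.732}
model2 = {"_cons": 20.570, "re2": -5.250, "re3": 2.832, "re4": -7.914,
          "female": -4.252,
          "re2_female": 2.700, "re3_female": -0.278, "re4_female": 3.137}

# Example: gap between Black women and Black men implied by each model.
for name, coefs in (("Model 1", model1), ("Model 2", model2)):
    gap = predicted_wage(coefs, re=2, sex=2) - predicted_wage(coefs, re=2, sex=1)
    print(name, round(gap, 3))
```

Comparing the two models this way shows why the interaction terms matter: in Model 1 the female gap is the same for every race-ethnicity group, while in Model 2 it shifts by the relevant interaction coefficient.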
2. Explain, in words, where uncertainty about the estimate of B comes from in the model Y = A + Bx + e. (2 points)
How do we calculate the standard error of B? (Describe in words.)
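As a concrete reference point, the standard formula for a one-predictor regression is SE(B) = sqrt[ (Σe² / (n − 2)) / Σ(x − x̄)² ]: the spread of the residuals around the fitted line, relative to how much x varies. The Python sketch below computes it from raw data; the function name and the toy data are hypothetical, and Stata's `regress` reports the same quantity in its standard error column.

```python
import math


def ols_slope_and_se(x, y):
    """OLS slope B and its standard error for Y = A + Bx + e (one predictor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual variance uses n - 2 degrees of freedom (A and B are estimated).
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = sse / (n - 2)
    # Larger residual variance -> larger SE; more spread in x -> smaller SE.
    se_b = math.sqrt(sigma2 / sxx)
    return b, se_b


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se_b = ols_slope_and_se(x, y)
print(round(b, 3), round(se_b, 3))
```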
Table 2: Logit and probit results for models of the probability that Y=1
Dependent variable: Y (0 or 1)

| Variable | Probit model results | Logit model results | Case #1 values |
|----------|----------------------|---------------------|----------------|
| X        | .5                   | .5                  | 1              |
| Z        | 1                    | 1                   | 2              |
| W        | -.5                  | -.5                 | 1              |
| Constant | -1                   | -1                  |                |
3. These questions refer to Table 2 (2a: probit results; 2b: logit results) and the values for case #1. (1 point each)
Note: In Stata, you can calculate the probability for a logit model by using the following syntax. If your predicted log odds are, for example, 3, then the probability is
di exp(3)/(1+exp(3))
For a probit model with a Z-score of 3, the probability is
di normprob(3)
a. Predict the Z-score and the probability that Y=1 for case 1 from Table 2a.
b. Predict the log odds and the probability that Y=1 for case 1 from Table 2b (logit).
c. What would the probability that Y=1 be if the value for X were increased by 1 in case 1? (use either the probit or logit results to find the answer).
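The Stata note above translates into a short Python sketch: compute the linear predictor (constant plus coefficient × value), then pass it through the logistic function for the logit model or the standard normal CDF for the probit model. Here `math.erf` stands in for Stata's `normprob()`; the helper names are illustrative, and because the probit and logit coefficients in Table 2 are identical, one dictionary serves both. The case values in the example are hypothetical, not case #1.

```python
import math

# Coefficients from Table 2 (identical for the probit and logit columns).
table2 = {"X": 0.5, "Z": 1, "W": -0.5, "Constant": -1}


def linear_predictor(coefs, values):
    """Constant plus the sum of coefficient * value over the predictors."""
    return coefs["Constant"] + sum(coefs[k] * values[k] for k in values)


def logit_prob(log_odds):
    """Same quantity as Stata's exp(x)/(1+exp(x))."""
    return math.exp(log_odds) / (1 + math.exp(log_odds))


def probit_prob(z):
    """Standard normal CDF, the quantity Stata's normprob(z) returns."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical case, chosen so the linear predictor is exactly 0.
xb = linear_predictor(table2, {"X": 2, "Z": 0, "W": 0})
print(xb, logit_prob(xb), probit_prob(xb))  # → 0.0 0.5 0.5
```

For part c, add 1 to X in the case values and recompute; because both links are nonlinear, the resulting change in probability depends on the starting value of the linear predictor, not just on the coefficient.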
The next series of questions refers to your personal data set.
Write down the id# of your data (in Stata, type “notes” to get the id).
Tip for doing this in Stata: With the do-file editor window open, you can highlight a section of your do-file, then click “Tools” → “Do Selection”. This will help you move through your do-file step by step.
| Certified Statistical Analyst ($350/hr consultant fee) | Data Set # (see link below) |
|--------------------------------------------------------|-----------------------------|
| Aldrich, Rebecca                                       | 3                           |
|                                                        | 4                           |
| Bailliard, Antoine                                     | 5                           |
| Baird, Timothy                                         | 3                           |
| Choemprayong, Songphan                                 | 4                           |
| Combs, Tab                                             | 5                           |
| Crosby,                                                | 3                           |
| Garrett-Peters, Raymond                                | 4                           |
| Gordon, Brady                                          | 5                           |
| Knauer, Stefanie                                       | 3                           |
| Leung, May May                                         | 4                           |
| Levy, Jessica                                          | 5                           |
| McFarland, Katherine                                   | 3                           |
| McRee, Annie                                           | 4                           |
| Stutzman, Frederic D                                   | 5                           |
| Wagner,                                                | 3                           |
| Weng, Jui-Hsun                                         | 4                           |
Download Data Set: #3, #4, #5
The data set contains imaginary student test scores.
4. Cross-sectional analysis: (1 point each)
Estimate the effect of student effort on math scores, controlling for other relevant variables in the data (hours of TV watched and gender).
Check for and/or deal with each of the following [see the sample syntax on the review sheet for each of these issues]. Remember to interpret the results and explain why you did things the way you did them:
a. influential cases
* For the exam, you can assume that any “extremely” influential cases are due to coding errors and can be deleted if their Cook’s D is greater than .3.
b. heteroskedasticity
c. multicollinearity
d. missing data
e. sampling probability
f. interpret the results, including the implications of any problems and assumptions that are violated (3 points)
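For intuition about the Cook's D rule, here is a minimal Python sketch for a regression with a single predictor, using D_i = (e_i² / (p·s²)) · h_i / (1 − h_i)², where h_i is the leverage of case i, s² is the residual variance, and p is the number of estimated parameters. This is a simplification: the exam model has several predictors, and in Stata you would obtain Cook's D after `regress` with `predict` and its `cooksd` option. The function name and toy data are hypothetical.

```python
def cooks_distance(x, y):
    """Cook's D for each case in a one-predictor OLS regression (p = 2)."""
    n = len(x)
    p = 2  # intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance
    d = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage of this case
        d.append((e ** 2 / (p * s2)) * h / (1 - h) ** 2)
    return d


# Toy data: the last case looks like a coding error (extreme x, off-trend y).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
y = [2, 4, 5, 8, 10, 12, 13, 16, 18, 5]
d = cooks_distance(x, y)

# Keep cases at or below the exam's cutoff of .3.
keep = [i for i, di in enumerate(d) if di <= 0.3]
print([round(di, 2) for di in d])
print(keep)
```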
5. (2 points) Omitted variable bias: you don’t have a good measure of social class in your data. If social class is positively correlated with student effort and also positively correlated with math scores, how might this bias your results?
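The textbook omitted-variable-bias result makes the direction of the bias explicit. As a sketch, assume the true model includes social class S, the estimated model omits it, M is the math score, and E is effort (other controls are left out for simplicity; with controls, δ₁ becomes the partial slope of S on E holding them constant):

```latex
\begin{aligned}
\text{True model:}\quad & M = \beta_0 + \beta_1 E + \beta_2 S + u \\
\text{Estimated model:}\quad & M = \tilde\beta_0 + \tilde\beta_1 E + v \\
\text{Auxiliary regression:}\quad & S = \delta_0 + \delta_1 E + w \\
\Rightarrow\quad & E[\tilde\beta_1] = \beta_1 + \beta_2\,\delta_1
\end{aligned}
```

The sign of the bias is the sign of the product β₂δ₁, so it depends on how social class relates both to math scores and to effort.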
6. (2 points each) Longitudinal analysis (You have cross-sectional data. This question asks you to consider what you would do if you had longitudinal data): Provided that social class has a constant effect on math scores, how could you use longitudinal data (if you had it) to estimate an alternative model of student effort on math scores? Explain how the model would work and how it might be able to overcome the potential for bias inherent in the cross-sectional models.
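One common way to formalize this is a first-difference model (closely related to a fixed-effects model). As a sketch, assume students i are observed in years t, M_it is the math score, E_it is effort, social class S_i does not change over time, and its effect γ is constant, as the question stipulates:

```latex
\begin{aligned}
M_{it} &= \alpha + \beta E_{it} + \gamma S_i + \varepsilon_{it} \\
M_{i,t-1} &= \alpha + \beta E_{i,t-1} + \gamma S_i + \varepsilon_{i,t-1} \\
M_{it} - M_{i,t-1} &= \beta\,(E_{it} - E_{i,t-1}) + (\varepsilon_{it} - \varepsilon_{i,t-1})
\end{aligned}
```

Differencing removes γS_i, so the estimate of β no longer depends on measuring social class; it is identified from within-student changes in effort over time.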
7. (2 points) Instrumental variables: You have a variable in your data, “Rocky” which is an index of how much the student likes the movie Rocky (about an underdog boxer who never gives up, and then wins, and then repeats the process several times). How could you justify this as an instrumental variable? Are the assumptions valid (it’s a matter of opinion…explain your view)? Obtain IV estimates using the syntax from Lecture Y.
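With one instrument, one endogenous regressor, and no other controls, the IV estimate of the slope reduces to cov(Z, Y) / cov(Z, X). The Python sketch below illustrates that ratio on hypothetical toy data in which an unobserved confounder u raises both X and Y; it leaves out the TV and gender controls, which a two-stage least squares command in Stata (for example `ivregress 2sls`) would include.

```python
# Simple IV (Wald) estimator versus OLS on toy data (hypothetical values).


def covariance(a, b):
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)


def ols_slope(y, x):
    return covariance(x, y) / covariance(x, x)


def iv_slope(y, x, z):
    """IV slope with a single instrument z: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


z = [1, 2, 3, 4]                                # instrument, uncorrelated with u by construction
u = [1, -1, -1, 1]                              # unobserved confounder
x = [zi + ui for zi, ui in zip(z, u)]           # endogenous regressor
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]   # true slope on x is 2

print("OLS:", round(ols_slope(y, x), 3))        # biased upward by u
print("IV: ", round(iv_slope(y, x, z), 3))      # recovers the true slope here
```

The IV estimate is only as good as the assumptions that the instrument affects the regressor and is unrelated to the error term; the toy data satisfy both by construction, which real data never guarantee.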
8. (2 points) Maximum likelihood. Evaluate the likelihood of obtaining the sequence HTT for values of p = .2, .8, and .9, where p is the probability of obtaining heads (H). Of these three values, which one would be the “maximum likelihood estimate”?
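Assuming the flips are independent, the likelihood of a particular sequence is the product of p for each H and (1 − p) for each T, so for HTT it is p(1 − p)². A short Python sketch (the function name is illustrative) evaluates it for the three candidate values:

```python
def sequence_likelihood(sequence, p):
    """Probability of an exact sequence of independent flips, with P(H) = p."""
    likelihood = 1.0
    for flip in sequence:
        # Multiply by p for heads and by (1 - p) for tails.
        likelihood *= p if flip == "H" else (1 - p)
    return likelihood


for p in (0.2, 0.8, 0.9):
    print(p, round(sequence_likelihood("HTT", p), 3))
```

Whichever of the three values gives the largest likelihood is the maximum likelihood estimate among those candidates; over all values of p between 0 and 1, the likelihood would peak at the sample proportion of heads.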