[preliminary version]
Sociology 709
Final Exam
Take home version
You have 5 hours to complete the exam once you start work on it. You can look at the questions and study your notes before starting the clock. You can write out sample syntax so you are all ready to go.
The exam is open book, open notes, and closed person…i.e., once you start you may not ask for help.
I will put a practice data set and tables online, and you can ask for help using the practice data/tables…before you start the actual exam.
Once you begin to write answers and/or open up the data set, then your 5 hours have started.
If you run out of time, stop all work. [This is for your benefit and psychological well-being; I don’t want you to spend more than 5 hours working on this.]
If you get less than 70% correct after 5 hours, you will take a make-up exam during the regularly scheduled exam time for the class.
(You can start and stop multiple times, but the total time must not exceed 5 hours.)
Start time:
Finish time:
1. These questions refer to Table 1 (1 point each)
a. Test the hypothesis that the coefficient on ___ is equal to ____.
Interaction terms:
b. How much more do men earn than women? Is this difference significant at the p=.05 level?
c. How much more do Hispanic men earn than Hispanic women? Is this difference significant at the p=.05 level?
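As sample syntax for part (a), the test of a coefficient against a hypothesized value can be sketched as follows. This is a minimal Python illustration, not the course's Stata syntax. The function names and the numbers are hypothetical and are not taken from Table 1. The p-value uses the standard normal as a large-sample approximation to the t distribution, which is an assumption of this sketch.

```python
import math

def t_statistic(coef, se, hypothesized=0.0):
    """Distance between the estimate and the hypothesized value, in standard errors."""
    return (coef - hypothesized) / se

def two_sided_p_normal(t):
    """Two-sided p-value using the standard normal (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# Hypothetical numbers, NOT from Table 1: estimate 2.5, standard error 0.5,
# hypothesized value 1.0, so t = (2.5 - 1.0) / 0.5 = 3.0.
t = t_statistic(coef=2.5, se=0.5, hypothesized=1.0)
p = two_sided_p_normal(t)
print("t =", round(t, 2), "two-sided p =", round(p, 4))
```

The same ratio applies to a sum of coefficients, but the standard error of a sum also depends on the covariance between the estimates, which a coefficient table alone does not report.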
2. Explain, in words, where uncertainty about the estimate of B comes from in the model (2 points)
Y=A+Bx+e
How do we calculate the standard error of B? (Describe in words).
3. These questions refer to Table 2a (probit results), Table 2b (logit results), and the values for case #1. (1 point each)
a. Predict the Z-score and the probability that Y=1 for case 1 from Table 2a.
b. Predict the log odds and the probability that Y=1 for case 1 from Table 2b.
c. What would the probability that Y=1 be if the value for ____ were increased by 1? (use either the probit or logit results to find the answer).
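The conversions behind question 3 can be sketched as follows: a probit model turns the linear predictor (a Z-score) into a probability with the standard normal CDF, and a logit model turns the linear predictor (the log odds) into a probability with the logistic function. This is a minimal Python sketch. The function names, variable names, and numbers are hypothetical and are deliberately not the Table 2 values.

```python
import math

def linear_predictor(coefs, constant, values):
    """Constant plus the sum of coefficient * value over all variables."""
    return constant + sum(coefs[name] * values[name] for name in coefs)

def probit_probability(z):
    """Standard normal CDF evaluated at the Z-score."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_probability(log_odds):
    """Logistic function: exp(log_odds) / (1 + exp(log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients and case values, NOT those in Table 2.
coefs = {"a": 0.5, "b": -0.25}
values = {"a": 2.0, "b": 4.0}
xb = linear_predictor(coefs, constant=0.0, values=values)

print("linear predictor:", xb)
print("probit P(Y=1):", probit_probability(xb))
print("logit  P(Y=1):", logit_probability(xb))
```

For part (c), the same functions can be called again after adding 1 to one of the case values. Note that the change in probability depends on where the case starts, because both link functions are nonlinear.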
The next series of questions refer to your personal data set.
Write down the id# of your data (in Stata, type “notes” to get the id).
The data are an imaginary set of student test scores.
4. Cross sectional analysis: (1 point each)
Estimate the effect of student effort on math scores, controlling for other relevant variables in the data.
Check for and/or deal with each of the following issues [see the sample syntax from the review sheet for each]. Remember to interpret the results and explain why you did things the way you did:
a. influential cases
* For the exam, you can assume that any “extremely” influential cases (Cook’s D greater than .3) are due to coding errors and can be deleted.
b. heteroskedasticity
c. multicollinearity
d. missing data
e. sampling probability
f. interpret the results, including the implications of any problems and assumptions that are violated (3 points)
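The influential-case rule in question 4 uses Cook's D with a cutoff of .3. The sketch below shows how that statistic is built for a regression with a single predictor: each case's squared residual is scaled by the residual variance and weighted by its leverage. It is a Python illustration only, not the review-sheet Stata syntax. The function name `cooks_distance` and the data are hypothetical, and the last y value is deliberately miscoded for the example.

```python
def cooks_distance(x, y):
    """Cook's D for each case in a one-predictor OLS regression of y on x."""
    n = len(x)
    p = 2  # number of estimated parameters: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    leverage = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [
        (e * e / (p * s2)) * (h / (1.0 - h) ** 2)
        for e, h in zip(resid, leverage)
    ]

# Hypothetical data: y = 2x, except the last case, which is "miscoded" as 60.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 60]

d = cooks_distance(x, y)
flagged = [i for i, di in enumerate(d) if di > 0.3]  # cutoff from the exam rule
print("cases with Cook's D > .3 (0-based index):", flagged)
```

After dropping flagged cases, the model should be re-estimated, since the remaining coefficients and diagnostics all change.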
5. (2 points) Omitted variable bias: you don’t have a good measure of social class in your data. If social class is positively correlated with student effort and also positively correlated with math scores, how might this bias your results?
6. (2 points each) Longitudinal analysis: Provided that social class has a constant effect on math scores, how could you use longitudinal data (if you had it) to estimate an alternative model of student effort on math scores? Explain how the model would work and how it might be able to overcome the potential for bias inherent in the cross-sectional models.
7. (2 points) Instrumental variables: You have a variable in your data, “underdog,” which indicates how many times the student stood up and cheered during a school showing of the movie Rocky (about an underdog boxer who never gives up, and then wins, and then repeats the process several times). How could you justify this as an instrumental variable? Are the assumptions valid (it’s a matter of opinion…explain your view)? Obtain IV estimates using the syntax from Lecture Y.
8. (2 points) Maximum likelihood. Evaluate the likelihood of obtaining the sequence HTHHH for p=.2, .8, and .9, where p is the probability of obtaining heads (H). Of these three values, which one would be the “maximum likelihood estimate”?
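The calculation in question 8 can be sketched as follows, assuming independent flips: the likelihood of a sequence is p raised to the number of heads times (1 - p) raised to the number of tails, and the maximum likelihood estimate among a set of candidates is the one with the largest likelihood. This is a minimal Python sketch. The function names are hypothetical, and the example uses a different sequence and different candidate values from the exam question.

```python
def sequence_likelihood(sequence, p):
    """Likelihood of a string of 'H' and 'T' flips when P(heads) = p."""
    heads = sequence.count("H")
    tails = sequence.count("T")
    return p ** heads * (1.0 - p) ** tails

def best_candidate(sequence, candidates):
    """Candidate value of p with the highest likelihood for the sequence."""
    return max(candidates, key=lambda p: sequence_likelihood(sequence, p))

# Hypothetical example (not the exam's sequence or candidate values).
example = "HTT"
for p in (0.25, 0.5):
    print("p =", p, "likelihood =", sequence_likelihood(example, p))
print("best of the candidates:", best_candidate(example, (0.25, 0.5)))
```

The order of the flips does not change the likelihood, only the counts of heads and tails do.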
9. A simple multilevel model (for 2008).
Table 1:
. * Model 4
. xi: reg wage_re i.re*i.sex [w=weight]
i.re              _Ire_1-4            (naturally coded; _Ire_1 omitted)
i.sex             _Isex_1-2           (naturally coded; _Isex_1 omitted)
i.re*i.sex        _IreXsex_#_#        (coded as above)
(analytic weights assumed)
(sum of wgt is   2.2412e+08)

      Source |       SS       df       MS              Number of obs =  108288
-------------+------------------------------           F(  7,108280) =  992.39
       Model |  922855.374      7 131836.482           Prob > F      =  0.0000
    Residual |  14384670.8 108280 132.846979           R-squared     =  0.0603
-------------+------------------------------           Adj R-squared =  0.0602
       Total |  15307526.2 108287 141.360701           Root MSE      =  11.526

------------------------------------------------------------------------------
     wage_re |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      _Ire_2 |  -5.150555   .1787742   -28.81   0.000     -5.50095   -4.800161
      _Ire_3 |    2.32965   .2482191     9.39   0.000     1.843144    2.816156
      _Ire_4 |  -7.976564   .1542988   -51.70   0.000    -8.278987   -7.674141
     _Isex_2 |  -4.229605   .0798763   -52.95   0.000    -4.386161   -4.073048
_IreXsex_2_2 |   2.601923   .2412023    10.79   0.000      2.12917    3.074676
_IreXsex_3_2 |   .4760114   .3585717     1.33   0.184    -.2267841    1.178807
_IreXsex_4_2 |   2.988753   .2490093    12.00   0.000     2.500699    3.476808
       _cons |   20.58432    .055977   367.73   0.000     20.47461    20.69403
------------------------------------------------------------------------------

. tab sex, sum(wage_re) nost

            |  Summary of inflation
            | adjusted wage per hour
        Sex |        Mean       Freq.
------------+------------------------
     1 male |   19.429552       54058
   2 female |   15.771149       54230
------------+------------------------
      Total |   17.597445      108288

. tab re, sum(wage_re) nost

            |  Summary of inflation
            | adjusted wage per hour
         re |        Mean       Freq.
------------+------------------------
    1 white |   18.230697       88778
    2 black |   14.737052        8032
    3 asian |   19.980944        3808
  4hispanic |   12.079773        7670
------------+------------------------
      Total |   17.597445      108288

Table 2:
Logit and Probit results for models of the probability that Y=1
Dependent variable: Y (0 or 1)
Variable | Probit model results | Logit model results | Case #1 values
X        | 1                    | 1                   | 1
Z        | 2                    | 2                   | 1
W        | -1                   | -1                  | 1
Constant | 1                    | 1                   |