Sociology 709
Problem set #8
1. If you were arguing with someone who said “heteroskedasticity is not really a problem … the coefficient estimates are unbiased,” how would you respond?
2. Explain the logic behind why multicollinearity is a problem.
Questions 3 & 4: I have created two data sets, ps8_a.dta and ps8_b.dta. Each of these data sets may be plagued by any number of the following problems: influential cases, omitted variable bias, heteroskedasticity, and multicollinearity. I want you to use the diagnostic procedures we have learned to figure out what is wrong with each data set.
Notes: Theory suggests that the correct relationship is a regression of y on x, z, and w. So, you want to estimate this regression equation.
However, your intuition suggests that q may be an omitted factor. Fortunately, q is included in both data sets. Check whether the regression “reg y x z w” is biased because q is omitted.
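One possible way to set up the omitted-variable comparison is sketched below. It is not the lecture syntax, just standard Stata commands; the stored-estimate names without_q and with_q are made up for illustration.

```stata
* Sketch only: compare the theoretical model with and without q.
use ps8_a.dta, clear              // repeat with ps8_b.dta

regress y x z w                   // model suggested by theory
estimates store without_q         // hypothetical name for the stored results

regress y x z w q                 // same model plus the suspected omitted factor
estimates store with_q            // hypothetical name for the stored results

* Coefficients and standard errors side by side
estimates table without_q with_q, b(%9.3f) se(%9.3f) stats(N r2)

* Omitting q can only bias the other coefficients if q is related
* both to y and to the included regressors, so check the correlations.
correlate x z w q
```

If the coefficients on x, z, or w shift noticeably once q enters and q has a nonzero coefficient, that is consistent with omitted variable bias in the shorter regression.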
Next, you suspect that one of the data entry employees made some errors. Can you identify any cases that seem to be extreme outliers? If so, what impact do they have on the results? [Use syntax from lecture p]
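A minimal sketch of an influence check follows, assuming the data set is already in memory. The lecture syntax may differ; the variable names rstud, lev, and cooksd are illustrative, and the 4/n cutoff for Cook's distance is a common rule of thumb, not a requirement of the assignment.

```stata
* Sketch only: flag cases with unusual residuals, leverage, or influence.
regress y x z w

predict rstud, rstudent           // studentized residuals
predict lev, leverage             // hat values
predict cooksd, cooksd            // Cook's distance
dfbeta                            // one DFBETA variable per regressor

* Rule-of-thumb cutoff (an assumption): Cook's D greater than 4/n
local cutoff = 4/e(N)
list y x z w rstud lev cooksd if cooksd > `cutoff' & !missing(cooksd)

* Re-estimate without the flagged cases to see how much the results move
regress y x z w if cooksd <= `cutoff'
```

Comparing the two sets of estimates shows what impact the flagged cases have; whether to drop them is a separate judgment.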
Do you find evidence of heteroskedasticity? If so, is there justification for using weighted least squares with a particular variable as the weight? Or, do you decide to run a robust regression? [Use syntax & intuition from lecture r]
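The sketch below shows one way to look for heteroskedasticity and two possible responses, assuming the data set is already in memory. It is not the lecture syntax. The choice of x as the weighting variable is purely hypothetical, and "robust regression" is read here as robust standard errors, which is an assumption.

```stata
* Sketch only: detect heteroskedasticity, then try two remedies.
regress y x z w

rvfplot, yline(0)                 // residuals vs. fitted values
estat hettest                     // Breusch-Pagan / Cook-Weisberg test
estat hettest x z w               // same test against the regressors
estat imtest, white               // White's general test

* Remedy 1: heteroskedasticity-robust standard errors
regress y x z w, vce(robust)

* Remedy 2: weighted least squares. ASSUMPTION: the error variance is
* proportional to x and x is strictly positive; substitute whichever
* variable the diagnostics actually point to.
regress y x z w [aweight = 1/x]
```

Weighted least squares is only justified when the residual spread clearly tracks one variable; otherwise robust standard errors are the more cautious choice. Stata's rreg command is a different tool, aimed at outliers and not at unequal error variance.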
Is there evidence of multicollinearity? [Use syntax from lecture s]
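A short sketch of a multicollinearity check, using standard Stata commands that may not match the lecture syntax:

```stata
* Sketch only: variance inflation factors and pairwise correlations.
regress y x z w
estat vif                         // VIF and 1/VIF (tolerance) per regressor

correlate x z w                   // pairwise correlations among regressors
```

A common rule of thumb treats a VIF above about 10 as a warning sign, but high pairwise correlations combined with large standard errors tell the same story.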
3. Diagnosis for ps8_a.dta
4. Diagnosis for ps8_b.dta
5. Do step 2 of the paper. [Note: this step includes a “basic descriptive analysis”; you only have to start this step by producing a few tables, following the example in the do file on the paper web page.]