[preliminary version]

Sociology 709

Final Exam

Take home version

You have 5 hours to complete the exam once you start work on it.  Before starting the clock, you may look at the questions, study your notes, and write out sample syntax so that you are ready to go.

 

The exam is open book, open notes, and closed person, i.e., once you start, you may not ask for help.

 

I will put a practice data set and tables online, and you can ask for help with the practice data and tables before you start the actual exam.

 

Once you begin to write answers or open the data set, your 5 hours have started.

If you run out of time, stop all work.  [This is for your benefit and psychological well-being; I don’t want you to spend more than 5 hours working on this.]

If you get less than 70% correct after 5 hours, you will take a make-up exam during the regularly scheduled exam time for the class.

 

(You can start and stop multiple times, but the total time must not exceed 5 hours.)

Start time:

Finish time:

 

1. These questions refer to Table 1 (1 point each)

a.  Test the hypothesis that the coefficient on ___ is equal to ____.

 

interaction terms

b.  How much more do men earn than women?  Is this difference significant at the p=.05 level?

c.  How much more do Hispanic men earn than Hispanic women?  Is this difference significant at the p=.05 level?

 

 

2.  Explain, in words, where uncertainty about the estimate of B comes from in the model (2 points)

Y=A+Bx+e

How do we calculate the standard error of B?  (Describe in words).

 

 

3.  These questions refer to Table 2a (probit analysis) and Table 2b (logit analysis), and to the values for case #1.  (1 point each)

 

a.  Predict the Z-score and the probability that Y=1 for case 1 from Table 2a.

b.  Predict the log odds and the probability that Y=1 for case 1 from Table 2b.

c.  What would the probability that Y=1 be if the value for ____ were increased by 1? (use either the probit or logit results to find the answer).

 

The next series of questions refer to your personal data set. 

Write down the id# of your data (in Stata, type “notes” to get the id).

 

The data are an imaginary set of student test scores.

 

4.  Cross sectional analysis: (1 point each)

Estimate the effect of student effort on math scores, controlling for other relevant variables in the data.

Check for and/or deal with each of the issues below [see the sample syntax from the review sheet for each of them].  Remember to interpret the results and explain why you did things the way you did them:

a. influential cases

* For the exam, you can assume that any “extremely” influential cases are due to coding errors, and can be deleted, if their Cook’s D is greater than .3

b. heteroskedasticity

c. multicollinearity

d. missing data

e. sampling probability

f. interpret the results, including the implications of any problems and assumptions that are violated (3 points)

 

5.  (2 points) Omitted variable bias: you don’t have a good measure of social class in your data.  If social class is positively correlated with student effort and also positively correlated with math scores, how might this bias your results?

 

6.  (2 points each) Longitudinal analysis:  Provided that social class has a constant effect on math scores, how could you use longitudinal data (if you had it) to estimate an alternative model of student effort on math scores?   Explain how the model would work and how it might be able to overcome the potential for bias inherent in the cross-sectional models.

 

7.  (2 points) Instrumental variables:  You have a variable in your data, “underdog”, which indicates how many times the student stood up and cheered during a school showing of the movie Rocky (about an underdog boxer who never gives up, and then wins, and then repeats the process several times).  How could you justify this as an instrumental variable?  Are the assumptions valid (it’s a matter of opinion; explain your view)?  Obtain IV estimates using the syntax from Lecture Y.

 

 

8.  (2 points) Maximum likelihood.  Evaluate the likelihood of obtaining the sequence HTHHH for values of p=.2, .8, and .9, where p is the probability of obtaining heads (H).  Of these three values, which one would be the “maximum likelihood estimate”?
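As a reminder of the setup, and assuming the flips are independent with the same p on every flip, the likelihood of one particular sequence can be written generically as below. The symbols L, h, and t are labels introduced here for illustration; they do not appear elsewhere in the exam.

```latex
% h = number of heads in the sequence, t = number of tails (assumed labels)
\[
L(p) = p^{h}\,(1 - p)^{t}
\]
```

Among a set of candidate values of p, the one with the largest L(p) is the maximum likelihood estimate within that set.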

 

 

9.  A simple multilevel model (for 2008).

 

 

Table 1:

. * Model 4
. xi: reg wage_re i.re*i.sex [w=weight]
i.re              _Ire_1-4            (naturally coded; _Ire_1 omitted)
i.sex             _Isex_1-2           (naturally coded; _Isex_1 omitted)
i.re*i.sex        _IreXsex_#_#        (coded as above)
(analytic weights assumed)
(sum of wgt is   2.2412e+08)

      Source |       SS       df       MS              Number of obs =  108288
-------------+------------------------------           F(  7,108280) =  992.39
       Model |  922855.374     7  131836.482           Prob > F      =  0.0000
    Residual |  14384670.8 108280 132.846979           R-squared     =  0.0603
-------------+------------------------------           Adj R-squared =  0.0602
       Total |  15307526.2 108287 141.360701           Root MSE      =  11.526

------------------------------------------------------------------------------
     wage_re |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      _Ire_2 |  -5.150555   .1787742   -28.81   0.000     -5.50095   -4.800161
      _Ire_3 |    2.32965   .2482191     9.39   0.000     1.843144    2.816156
      _Ire_4 |  -7.976564   .1542988   -51.70   0.000    -8.278987   -7.674141
     _Isex_2 |  -4.229605   .0798763   -52.95   0.000    -4.386161   -4.073048
_IreXsex_2_2 |   2.601923   .2412023    10.79   0.000      2.12917    3.074676
_IreXsex_3_2 |   .4760114   .3585717     1.33   0.184    -.2267841    1.178807
_IreXsex_4_2 |   2.988753   .2490093    12.00   0.000     2.500699    3.476808
       _cons |   20.58432    .055977   367.73   0.000     20.47461    20.69403
------------------------------------------------------------------------------

 

. tab sex, sum(wage_re) nost

            |  Summary of inflation
            | adjusted wage per hour
        Sex |        Mean       Freq.
------------+------------------------
  1    male |   19.429552       54058
  2  female |   15.771149       54230
------------+------------------------
      Total |   17.597445      108288

 

. tab re, sum(wage_re) nost

            |  Summary of inflation
            | adjusted wage per hour
         re |        Mean       Freq.
------------+------------------------
  1   white |   18.230697       88778
  2   black |   14.737052        8032
  3   asian |   19.980944        3808
  4hispanic |   12.079773        7670
------------+------------------------
      Total |   17.597445      108288

 

Table 2:

Logit and Probit results for models of the probability that Y=1

Dependent variable: Y (0 or 1)

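Variable | Probit model results | Logit model results | Case #1 values
---------+----------------------+---------------------+---------------
X        |          1           |          1          |       1
Z        |          2           |          2          |       1
W        |         -1           |         -1          |       1
Constant |          1           |          1          |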
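For reference when working with Table 2, the two models turn the same kind of linear combination into a probability in different ways. The sketch below uses generic notation: the symbols a, b_X, b_Z, b_W, and the name eta for the linear predictor are labels introduced here, not names taken from the table.

```latex
% eta is the linear predictor; a is the constant, b_X, b_Z, b_W the coefficients (assumed labels)
\[
\eta = a + b_X X + b_Z Z + b_W W
\]
% Probit: eta is a Z-score, and Phi is the standard normal cumulative distribution function
\[
\Pr(Y = 1) = \Phi(\eta)
\]
% Logit: eta is the log odds, which is inverted to recover the probability
\[
\ln\frac{\Pr(Y = 1)}{1 - \Pr(Y = 1)} = \eta
\quad\Longrightarrow\quad
\Pr(Y = 1) = \frac{e^{\eta}}{1 + e^{\eta}}
\]
```

In both cases the coefficients come from the relevant column of the table and the X, Z, W values come from the case being predicted.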