Sociology 709

 

Study guide for the midterm

 

Remember, if you score less than 70% on the midterm, you can take a makeup and earn up to 70%.

 

If you don’t want to take the exam in class on 3/1, you can take it on Monday 3/5 at either 9:00 am or 3:00 pm; just come to my office.

 

We will go over this review sheet during our review session.  Look at it beforehand and I will go over any questions that you have.

 

The exam questions will be very similar to questions on this sheet (I will not try to surprise you).   I will aim for something that will take around 45-50 minutes to finish.

 

Things to memorize:

 

You may be asked to make calculations using these formulas, which you will need to memorize.  If there are any other calculations that involve formulas, I will give you the formulas.

 

The equation for a confidence interval and a t-test. 

Calculate the probability of a normal variable with mean x and standard deviation y falling in the interval (a,b)

How do you estimate the variance of the error term?  What does the error term “mean”?
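A minimal sketch for checking practice answers to the memorized calculations. It is written in Python rather than Stata, the function names and example numbers are made up for illustration, and the confidence interval uses the normal critical value, which is an assumption that only suits large samples (with small samples you would use the t table instead).

```python
from statistics import NormalDist

def interval_probability(mean, sd, a, b):
    """P(a < X < b) for X normal with the given mean and standard deviation."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(b) - dist.cdf(a)

def confidence_interval(estimate, se, level=0.95):
    """estimate +/- critical value * SE, using the normal critical value
    (an assumption that is reasonable only for large samples)."""
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - crit * se, estimate + crit * se

def t_statistic(estimate, se, null_value=0.0):
    """(estimate - hypothesized value) / SE."""
    return (estimate - null_value) / se

# Illustrative numbers only (not from the course materials).
print(interval_probability(mean=10, sd=2, a=8, b=12))  # about 0.68
print(confidence_interval(estimate=5.0, se=1.0))
print(t_statistic(estimate=5.0, se=1.0))
```

The interval probability is just the difference between two values of the cumulative distribution function, which is the same thing you do by hand with a z table.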

 

 

Things to review:

 

Lecture B

*9. p. 6 Why does Fox worry about the shorthand use of “effect” to describe the result of a statistical model being taken too literally?

*10.  p. 7  What is the difference between experimental and non-experimental data?  Why is randomization key?

*11.  p. 7-8  How could we use randomization to test for a causal effect of specific judges on verdicts in Table 1.1?

*12.  If someone wanted to argue that it wasn’t the judges per se but some omitted factor that produced the results in Table 1.1, what might the omitted factor be?  In the case of income and education, what might an omitted factor be?

13.  p.9.  Why would you want to use a “double-blind” test of a treatment effect? 

*14. p.10.  Refer to Figure 1.1.  What does it mean to say that education “explains away” some of the effect of income on prestige?

 PS1, #1

Lecture E

Questions about this table:

1.  What is the effect of education on prestige in Model 1?  Looking just at Model 1, what alternative explanations could we have involving income and gender composition?

2.  How do we test those other explanations in the remaining models?

3.  What is the effect of gender composition on prestige, in this data?  What might one conclude about gender effects based on this?  What alternative mechanisms about the effect of gender on prestige might be operating? 

4.  Could the effect of gender be mediated through income? 

Q. p.20  Is multiple regression as good as an experiment?

Let’s take Allison’s example of the effect of an SAT training class on SAT scores.   How could we use OLS to find an answer?

What if someone says “you should have controlled for grades”?

…think of another possible thing to control for…and another…and another…and another

PS2, #2 c-g

 

Lecture D

 

What is the central limit theorem?
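One way to see the central limit theorem at work is a small simulation. The sketch below is an illustration in Python, not something from the lecture: the uniform population, the sample size, and the number of repetitions are all arbitrary choices. It draws many samples from a clearly non-normal population and compares the spread of the sample means with the predicted value, sigma divided by the square root of n.

```python
import random
import statistics

random.seed(709)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n independent draws from a Uniform(0, 1) population."""
    return statistics.fmean(random.random() for _ in range(n))

n = 30        # size of each sample
reps = 5000   # number of repeated samples
means = [sample_mean(n) for _ in range(reps)]

population_sd = (1 / 12) ** 0.5          # sd of a Uniform(0, 1) variable
predicted_se = population_sd / n ** 0.5  # CLT prediction: sigma / sqrt(n)

print("mean of sample means:", statistics.fmean(means))  # close to 0.5
print("sd of sample means:  ", statistics.stdev(means))
print("sigma / sqrt(n):     ", predicted_se)
```

A histogram of `means` would look approximately bell-shaped even though the individual draws are flat between 0 and 1.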

 

Lecture F

 

Explain how I got the results in Graph F1 and what it means.

 

The standard error of B is given by SE(B) = S_E / sqrt( Σ (x_i − x̄)² ), where S_E is the estimated standard deviation of the errors.

 

Why does the equation for SE(B) make sense?

 

Be able to construct the x% (i.e. 95% or some other number) confidence interval for B and test B at the y-level of significance (i.e. .05 or some other number)

 

Explain in words what we are doing when we test the null hypothesis that B=0.
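The pieces of the Lecture F questions can be traced through by hand on a tiny data set. The sketch below is in Python and assumes the standard textbook form for simple regression, SE(B) = S_E / sqrt(Σ(x − x̄)²) with S_E² = Σe² / (n − 2); the data are made up purely for illustration.

```python
import math

# Made-up data for illustration (not from the course).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)  # spread of x around its mean
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx          # slope B
a = y_bar - b * x_bar  # intercept A

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
error_variance = sum(e ** 2 for e in residuals) / (n - 2)  # S_E squared
se_b = math.sqrt(error_variance / sxx)                     # SE(B)

t = b / se_b  # test statistic for H0: beta = 0
print(f"B = {b:.3f}, SE(B) = {se_b:.3f}, t = {t:.1f}")
```

Notice the two things that shrink SE(B): smaller residuals (the numerator) and more spread in x (the denominator). A confidence interval would then be B plus or minus the appropriate t value, with n − 2 degrees of freedom, times SE(B).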

 

Lecture G

 

Fox p.120, Equation 6.2

 

(I will give you this formula if I ask questions about it)

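This is similar to the equation for simple regression, except that the variance is inflated by the factor 1/(1 − R_j²), where R_j² is the R-squared from regressing X_j on the other independent variables.

*Q1. Why does it make sense that the variance is inflated by 1/(1 − R_j²)?  What happens if the independent variables are strongly correlated with each other?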

 

*Q2.  How does this relate to the Venn diagram we talked about last class from Kennedy?
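To get a feel for how quickly collinearity hurts, the sketch below tabulates the inflation factor for a few values. It assumes the factor in question is the usual variance inflation factor, 1/(1 − R_j²), where R_j² is the R-squared from regressing one independent variable on the others; the function name is made up for illustration.

```python
def variance_inflation(r_squared_j):
    """1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other X's."""
    return 1.0 / (1.0 - r_squared_j)

for r2 in (0.0, 0.5, 0.75, 0.9, 0.99):
    vif = variance_inflation(r2)
    # The standard error grows by the square root of the variance inflation.
    print(f"R_j^2 = {r2:4.2f}  variance x {vif:6.1f}  SE x {vif ** 0.5:5.2f}")
```

With uncorrelated independent variables the factor is 1 and the formula collapses to the simple regression case; as R_j² approaches 1 there is almost no independent variation in X_j left to estimate its coefficient from.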

 

Lecture H1

 

Given this formula for the coefficients in multiple regression, b = (X'X)⁻¹X'y,

Explain in words how you would calculate the coefficients.  When you come to technical terms involving matrix algebra (i.e., transpose, inverse matrix etc.) briefly explain what those terms mean.
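The steps you would describe in words can also be spelled out one matrix operation at a time. The sketch below assumes the formula in question is the usual least-squares solution b = (X'X)⁻¹X'y. It is plain Python with hand-written helpers (all hypothetical names) so that the transpose, the matrix product, and the inverse are each visible; the data are made up so that the answer is known in advance.

```python
def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Matrix product: row i of a times column j of b."""
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def inverse(m):
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(X, y):
    """b = (X'X)^(-1) X'y, with y given as a plain list."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = matmul(Xt, [[v] for v in y])
    return [row[0] for row in matmul(inverse(XtX), Xty)]

# Made-up data: the first column of 1s is the constant, then two predictors.
X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 6]]
y = [1 + 2 * x1 - 1 * x2 for _, x1, x2 in X]  # exact relationship, no error term
print(ols(X, y))  # approximately [1.0, 2.0, -1.0]
```

Because y was built with no error term, the recovered coefficients match the ones used to construct it. If two columns of X were perfectly collinear, X'X would have no inverse and the `inverse` helper would raise an error.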

 

 

Lecture H2

 

*Q1:  If we have a normal variable X with mean 0 and standard deviation 2 (i.e., X~N(0,2) ), what is the probability that |X| (i.e., the absolute value of X, or the “magnitude” of X) will be greater than 1?

*Q2.  If X~N(0,5) what is the probability that |X|>10?

 Q3.  In a regression of income on education with 2000 cases, the estimated coefficient, B, is 4 with standard error 3. 

 The null hypothesis, H0, is that the actual coefficient is 0. 

 If H0 were true, what is the chance that |B| >4?  In other words, what is the chance of observing a coefficient at least as big in magnitude as the one we estimated?

 Q4.  In a regression of income on education with 2000 cases, the estimated coefficient, B, is 6 with standard error 3. 

Test H0: β = 0
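All four questions above use the same two-tailed calculation, which can be checked with a short sketch. It is in Python, the function name is made up, and for Q3 and Q4 it assumes that with 2000 cases the t distribution can be treated as normal.

```python
from statistics import NormalDist

def two_tailed_probability(cutoff, sd):
    """P(|X| > cutoff) for a normal X with mean 0 and the given standard deviation."""
    z = cutoff / sd                       # distance from the mean in sd units
    return 2 * (1 - NormalDist().cdf(z))  # add both tails

# Q1: sd = 2, cutoff = 1.  Q2: sd = 5, cutoff = 10.
print(round(two_tailed_probability(1, sd=2), 3))
print(round(two_tailed_probability(10, sd=5), 3))

# Q3 and Q4: under H0 the estimate B is centered on 0 with a standard
# deviation equal to its standard error.
print(round(two_tailed_probability(4, sd=3), 3))
print(round(two_tailed_probability(6, sd=3), 3))
```

Dividing the cutoff by the standard deviation is the same step as computing a t statistic, which is why the regression questions reduce to the normal-variable questions.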

 

Q5:  Describe in words where, in simple regression, the standard error of B comes from.

 

PS4, #3

 

Lecture I

 

. xi: reg happy i.type*i.size sugar

i.type            _Itype_1-4          (naturally coded; _Itype_1 omitted)

i.size            _Isize_0-1          (naturally coded; _Isize_0 omitted)

i.type*i.size     _ItypXsiz_#_#       (coded as above)

 

      Source |       SS       df       MS              Number of obs =    2000

-------------+------------------------------           F(  8,  1991) = 1042.28

       Model |  852077.437     8   106509.68           Prob > F      =  0.0000

    Residual |  203457.573  1991  102.188635           R-squared     =  0.8072

-------------+------------------------------           Adj R-squared =  0.8065

       Total |  1055535.01  1999  528.031521           Root MSE      =  10.109

 

------------------------------------------------------------------------------

       happy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

    _Itype_2 |   9.937232   .8944368    11.11   0.000     8.183102    11.69136

    _Itype_3 |   19.45203   .9320253    20.87   0.000     17.62419    21.27988

    _Itype_4 |   28.51886   .9173906    31.09   0.000     26.71971      30.318

    _Isize_1 |   9.359119   .9046301    10.35   0.000     7.584998    11.13324

_ItypXsi~2_1 |   6.397666   1.281947     4.99   0.000     3.883569    8.911764

_ItypXsi~3_1 |   30.98568   1.281288    24.18   0.000     28.47287    33.49848

_ItypXsi~4_1 |   12.75011    1.27919     9.97   0.000     10.24142     15.2588

       sugar |   .2162254   .0077933    27.75   0.000     .2009415    .2315092

       _cons |   9.835219   .7751546    12.69   0.000      8.31502    11.35542

------------------------------------------------------------------------------

 

 

Type: 1 "red" 2 "green" 3 "purple"  4 "blue"

Size: 0 = small, 1= big

Sugar: 0 to 100, continuous variable

* Q:  What does the coefficient on  _Isize_1 indicate?

* Q:  What does the coefficient _ItypXsi~3_1 indicate?

* Q:  What is the effect of size for blue jellybeans using these regression results?
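With an interaction model, the effect of one variable depends on the level of the other, so effects are sums of coefficients. The sketch below copies the coefficients from the Stata output above into Python (the dictionary layout and function names are made up for illustration) and adds them up for each type.

```python
# Coefficients copied from the Stata output above
# (type 1 = red and size 0 = small are the omitted categories).
coef = {
    "_cons": 9.835219,
    "type": {1: 0.0, 2: 9.937232, 3: 19.45203, 4: 28.51886},
    "size_1": 9.359119,
    "type_x_size_1": {1: 0.0, 2: 6.397666, 3: 30.98568, 4: 12.75011},
    "sugar": 0.2162254,
}
type_names = {1: "red", 2: "green", 3: "purple", 4: "blue"}

def predicted_happy(jelly_type, size, sugar):
    """Predicted value of happy for one combination of type, size, and sugar."""
    value = coef["_cons"] + coef["type"][jelly_type] + coef["sugar"] * sugar
    if size == 1:
        value += coef["size_1"] + coef["type_x_size_1"][jelly_type]
    return value

def size_effect(jelly_type):
    """Big minus small for one type, holding sugar fixed."""
    return coef["size_1"] + coef["type_x_size_1"][jelly_type]

for t, name in type_names.items():
    print(f"{name:7s} size effect = {size_effect(t):.3f}")
```

The coefficient on _Isize_1 by itself is the size effect only for the omitted type (red); for every other type the matching interaction coefficient has to be added.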

 

For the type*sugar interaction model, what is the effect of sugar for green jellybeans?

 

Let’s see what the regression results look like:

 

 

                  (1)          (2)          (3)
                 happy        happy        happy
type==2          22.789       21.649        0.502
                 (3.709)**    (3.713)**    (0.635)
type==3          35.012       32.732        0.298
                 (3.709)**    (3.755)**    (0.649)
type==4          57.948       54.024        0.671
                 (3.709)**    (3.863)**    (0.687)
size==1                       13.258        9.581
                              (3.764)**    (0.639)**
Sugar                                       1.994
                                           (0.008)**
Constant        130.511      121.178        0.712
                 (2.623)**    (3.723)**    (0.784)
Observations    2000         2000         2000
R-squared          0.11         0.12         0.97

Standard errors in parentheses
* significant at 5%; ** significant at 1%

*Q: What happens when we go from model 1 to model 3?  Why?

 

PS5, #2

 

 

Lecture J

 

What does Graph 1 indicate?  Why?

 

What do we learn from the following models?  (On the exam I will give you the Stata output.)

 

(re is the categorical variable for race and ethnicity)

 

Model 1:

xi: reg wage i.sex

Model 2:

xi: reg wage i.sex i.re

Model 3:

xi: reg wage i.sex*i.re

 

In Model 3, what is the gender gap among Asian Americans?  Is it different from the gender gap among whites?  How would we test whether the gender gap among Asians is equal to 0?

 

What is the predicted hourly wage for Hispanic females?
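In an interaction model like Model 3, the gender gap for a group other than the omitted one is a sum of two coefficients, and testing whether that sum is 0 needs the standard error of the sum. The sketch below shows the arithmetic in Python. Every number in it is HYPOTHETICAL (the real values come from the Model 3 output and its coefficient covariance matrix), and it assumes whites are the omitted race/ethnicity category. In Stata, the `lincom` command carries out this kind of calculation after a regression.

```python
import math

# HYPOTHETICAL numbers for illustration only; they are not course results.
b_female = -3.0          # gender gap in the omitted (reference) group
b_female_x_asian = 1.0   # how the gap differs for Asian Americans
var_female = 0.25        # squared standard error of b_female
var_interaction = 1.00   # squared standard error of the interaction term
cov_female_interaction = -0.20

def combined_effect(b_main, b_inter):
    """Gap in a non-reference group = main effect + its interaction term."""
    return b_main + b_inter

def combined_se(var_main, var_inter, cov):
    """SE of a sum of two coefficients: sqrt(Var1 + Var2 + 2 * Cov)."""
    return math.sqrt(var_main + var_inter + 2 * cov)

gap = combined_effect(b_female, b_female_x_asian)
se = combined_se(var_female, var_interaction, cov_female_interaction)
print(f"gap = {gap:.2f}, SE = {se:.3f}, t = {gap / se:.2f}")
```

The interaction coefficient alone answers a different question: whether the gap among Asian Americans differs from the gap in the omitted group.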

 

Lecture K

 

When we have a dichotomous dependent variable, what is the problem with using OLS?

 

Why is the cumulative density function useful for the probit model?

 

The index of a probit model (i.e., what we predict with the data and coefficients) is a “latent” variable: something we don’t actually observe.  Using the example of a survey question about happiness discussed in class, explain the difference between the data we observe and the underlying latent variable, and why a probit model is useful in this context.

 

Imagine that I ran the following model:

 

probit  happy x

and got a coefficient estimate for x of 1 and 0 for the constant term.

Is the effect of a 1-unit increase in x on the probability of being happy constant for all levels of x?  Give me an example to make your point.
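For practice, the probit question can be worked through numerically. The sketch below is in Python and uses the coefficients stated above (constant 0, coefficient on x of 1), so the predicted probability is the standard normal cumulative distribution function evaluated at x; the function names and the x values tried are arbitrary choices for illustration.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal cumulative distribution function

def prob_happy(x, constant=0.0, slope=1.0):
    """P(happy = 1 | x) = Phi(constant + slope * x)."""
    return phi(constant + slope * x)

def one_unit_change(x):
    """Change in P(happy = 1) when x rises from x to x + 1."""
    return prob_happy(x + 1) - prob_happy(x)

for x in (-3, -1, 0, 2):
    print(f"x = {x:2d}: P = {prob_happy(x):.3f}, change for +1 = {one_unit_change(x):.3f}")
```

The coefficient is constant on the latent index, but the change in probability is largest where the index is near 0 and shrinks toward the tails, where the probability is already close to 0 or 1.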

 

PS6, #1-2