Sociology 709
Study guide for the midterm
Remember, if you score less than 70% on the midterm, you can take a makeup and earn up to 70%.
If you don’t want to take the exam in class on 3/1, you can take it on Monday 3/5 at either 9:00 am or 3:00 pm; just come to my office.
We will go over this review sheet during our review session. Look at it beforehand and I will go over any questions that you have.
The exam questions will be very similar to questions on this sheet (I will not try to surprise you). I will aim for something that will take around 45-50 minutes to finish.
Things to memorize:
You may be asked to make calculations using these formulas, which you will need to memorize. If there are any other calculations that involve formulas, I will give you the formulas.
[The four formulas to memorize were images and are missing from this copy.]
The equation for a confidence interval and a t-test.
Calculate the probability that a normal variable with mean x and standard deviation y falls in the interval (a, b).
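As a rough sketch of how this calculation works: standardize both endpoints, then take the difference of the standard normal CDF at the two z-scores. The function names and the example numbers (mean 100, standard deviation 15) are made up for illustration; on the exam you would read the CDF values from a z-table rather than compute them.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval_prob(mean, sd, a, b):
    """P(a < X < b) for a normal X with the given mean and standard deviation."""
    z_a = (a - mean) / sd   # z-score of the lower endpoint
    z_b = (b - mean) / sd   # z-score of the upper endpoint
    return normal_cdf(z_b) - normal_cdf(z_a)

# Hypothetical example: mean 100, sd 15, interval (85, 130),
# i.e. from 1 sd below the mean to 2 sd above it.
p = normal_interval_prob(100, 15, 85, 130)
print(round(p, 4))
```

The result is roughly 0.82, which matches the table lookup of Φ(2) minus Φ(-1).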
How do you estimate the variance of the error term? What does the error term “mean”?
Things to review:
Lecture B
*9. p. 6 Why does Fox worry about the shorthand use of “effect” to describe the result of a statistical model being taken too literally?
*10. p. 7 What is the difference between experimental and non-experimental data? Why is randomization key?
*11. p. 7-8 How could we use randomization to test for a causal effect of specific judges on verdicts in Table 1.1?
*12. If someone wanted to argue that it wasn’t the judges per se but some omitted factor that produced the results in Table 1.1, what might the omitted factor be? In the case of income and education, what might an omitted factor be?
13. p.9. Why would you want to use a “double-blind” test of a treatment effect?
*14. p.10. Refer to Figure 1.1. What does it mean to say that education “explains away” some of the effect of income on prestige?
PS1, #1
Lecture E
Questions about this table:
1. What is the effect of education on prestige in Model 1? Looking just at Model 1, what alternative explanations could we have involving income and gender composition?
2. How do we test those other explanations in the remaining models?
3. What is the effect of gender composition on prestige, in this data? What might one conclude about gender effects based on this? What alternative mechanisms about the effect of gender on prestige might be operating?
4. Could the effect of gender be mediated through income?
Q. p.20 Is multiple regression as good as an experiment?
Let’s take Allison’s example of the effect of an SAT training class on SAT scores. How could we use OLS to find an answer?
What if someone says “you should have controlled for grades”?
…think of another possible thing to control for…and another…and another…and another
PS2, #2 c-g
Lecture D
What is the central limit theorem?
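One way to see what the theorem claims is a small simulation. This is only an illustrative sketch: the uniform(0, 1) population, the sample size of 30, the 5000 replications, and the seed are all arbitrary choices, not anything from lecture. The point is that the sample means pile up around the population mean with a spread close to the population standard deviation divided by the square root of n, even though the population itself is flat rather than bell-shaped.

```python
import random
import statistics

random.seed(709)  # arbitrary seed so the run is repeatable

n = 30        # size of each sample
reps = 5000   # number of samples drawn

# Each entry is the mean of one sample of n uniform(0, 1) draws.
sample_means = [
    sum(random.random() for _ in range(n)) / n
    for _ in range(reps)
]

pop_mean = 0.5
pop_sd = (1 / 12) ** 0.5   # standard deviation of a uniform(0, 1) variable

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print("mean of sample means:", mean_of_means, "vs population mean:", pop_mean)
print("sd of sample means:  ", sd_of_means, "vs sigma/sqrt(n):", pop_sd / n ** 0.5)
```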
Lecture F
Explain how I got the results in Graph F1 and what it means.
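The standard error of B is given by
$$SE(B) = \frac{S_E}{\sqrt{\sum_i (x_i - \bar{x})^2}}$$
where $S_E$ is the standard error of the regression (the estimated standard deviation of the errors).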
Why does the equation for SE(B) make sense?
Be able to construct the x% (i.e. 95% or some other number) confidence interval for B and test B at the y-level of significance (i.e. .05 or some other number)
Explain in words what we are doing when we test the null hypothesis that B=0.
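The following sketch walks through these steps for simple regression on a tiny made-up data set. It assumes the usual textbook formulas: the error variance is estimated as the residual sum of squares divided by n - 2, SE(B) is the square root of that variance divided by the sum of squared deviations of x, and the test statistic is B / SE(B). The data, the function name, and the critical value 3.182 (the two-tailed 5% value from a t-table with 3 degrees of freedom) are illustrative assumptions.

```python
import math

def simple_regression(x, y, t_crit):
    """Slope, error variance, SE(B), t statistic, and confidence interval for B."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept

    # Estimated error variance: residual sum of squares over n - 2.
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)

    se_b = math.sqrt(s2 / sxx)   # standard error of the slope
    t = b / se_b                 # test statistic for H0: slope = 0
    ci = (b - t_crit * se_b, b + t_crit * se_b)
    return {"a": a, "b": b, "s2": s2, "se_b": se_b, "t": t, "ci": ci}

# Made-up data with n = 5, so df = 3 and the 95% critical value is about 3.182.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = simple_regression(x, y, t_crit=3.182)
print(fit)
```

Here the slope is 0.6 with a standard error of about 0.28, so t is about 2.12. That is smaller than 3.182, and the 95% interval (about -0.30 to 1.50) includes 0, so we would not reject the null hypothesis at the .05 level.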
Lecture G
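Fox p.120, Equation 6.2 (I will give you this formula if I ask questions about it)
$$V(B_j) = \frac{1}{1 - R_j^2} \times \frac{\sigma_\varepsilon^2}{\sum_i (x_{ij} - \bar{x}_j)^2}$$
where $R_j^2$ is the squared multiple correlation from regressing $X_j$ on the other independent variables. This is similar to the equation for simple regression, except that the variance is inflated by $1/(1 - R_j^2)$.
*Q1. Why does it make sense that the variance is inflated by $1/(1 - R_j^2)$? What happens if the independent variables are strongly correlated with each other?
*Q2. How does this relate to the Venn diagram we talked about last class from Kennedy?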
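To get a feel for the size of the inflation, here is a minimal sketch. It assumes the inflation term is the usual variance-inflation factor, 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other independent variables. The function name and the R-squared values are made up for illustration.

```python
def variance_inflation(r2_j):
    """Factor by which the sampling variance of B_j is inflated.

    r2_j is the R-squared from regressing X_j on the other
    independent variables (assumed to lie in [0, 1)).
    """
    if not 0 <= r2_j < 1:
        raise ValueError("r2_j must be at least 0 and less than 1")
    return 1.0 / (1.0 - r2_j)

# The more X_j is predictable from the other X's, the larger the inflation.
for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, round(variance_inflation(r2), 1))
```

With no correlation among the X's the factor is 1 (no inflation); at R_j^2 = 0.9 the variance is about ten times larger, so the standard error is about three times larger.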
Lecture H1
Given this formula for the coefficients in multiple regression,
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
Explain in words how you would calculate the coefficients. When you come to technical terms involving matrix algebra (i.e., transpose, inverse matrix etc.) briefly explain what those terms mean.
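The sketch below spells out each step in plain Python, assuming the formula in question is the usual least-squares solution b = (X'X)^(-1) X'y. The helper names and the tiny data set are made up for illustration, and in practice Stata does all of this for you.

```python
def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Matrix product: each entry is a row of a times a column of b."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def inverse(m):
    """Inverse by Gauss-Jordan elimination: the matrix that, multiplied by m, gives I."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(X, y):
    """Coefficients b = (X'X)^(-1) X'y."""
    Xt = transpose(X)
    XtX_inv = inverse(matmul(Xt, X))
    Xty = matmul(Xt, [[v] for v in y])   # y written as a column vector
    return [row[0] for row in matmul(XtX_inv, Xty)]

# Made-up data: a column of 1s for the constant, then one x variable.
X = [[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]
y = [2, 4, 5, 4, 5]
b = ols(X, y)
print(b)   # constant first, then the slope
```

For this data the constant is 2.2 and the slope is 0.6, up to floating-point rounding.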
Lecture H2
*Q1: If we have a normal variable X with mean 0 and standard deviation 2 (i.e., X~N(0,2)), what is the probability that |X| (i.e., the absolute value of X, or the “magnitude” of X) will be greater than 1?
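A sketch of the calculation, taking the 2 to be the standard deviation as stated and using a table value of about 0.6915 for Φ(0.5):

```latex
P(|X| > 1) = P(X > 1) + P(X < -1)
           = 2\left[1 - \Phi\!\left(\tfrac{1 - 0}{2}\right)\right]
           = 2\,[1 - \Phi(0.5)]
           \approx 2\,(1 - 0.6915)
           \approx 0.617
```

The two tails are equal because the distribution is symmetric around 0, which is why the upper-tail probability is simply doubled.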
Test H0: β = 0
Q4: Describe in words where, in simple regression, the standard error of B comes from.
PS4, #3
Lecture I
. xi: reg happy i.type*i.size sugar
i.type            _Itype_1-4          (naturally coded; _Itype_1 omitted)
i.size            _Isize_0-1          (naturally coded; _Isize_0 omitted)
i.type*i.size     _ItypXsiz_#_#       (coded as above)

      Source |       SS       df       MS              Number of obs =    2000
-------------+------------------------------           F(  8,  1991) = 1042.28
       Model |  852077.437     8   106509.68           Prob > F      =  0.0000
    Residual |  203457.573  1991  102.188635           R-squared     =  0.8072
-------------+------------------------------           Adj R-squared =  0.8065
       Total |  1055535.01  1999  528.031521           Root MSE      =  10.109

------------------------------------------------------------------------------
       happy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Itype_2 |   9.937232   .8944368    11.11   0.000     8.183102    11.69136
    _Itype_3 |   19.45203   .9320253    20.87   0.000     17.62419    21.27988
    _Itype_4 |   28.51886   .9173906    31.09   0.000     26.71971      30.318
    _Isize_1 |   9.359119   .9046301    10.35   0.000     7.584998    11.13324
_ItypXsi~2_1 |   6.397666   1.281947     4.99   0.000     3.883569    8.911764
_ItypXsi~3_1 |   30.98568   1.281288    24.18   0.000     28.47287    33.49848
_ItypXsi~4_1 |   12.75011    1.27919     9.97   0.000     10.24142     15.2588
       sugar |   .2162254   .0077933    27.75   0.000     .2009415    .2315092
       _cons |   9.835219   .7751546    12.69   0.000      8.31502    11.35542
------------------------------------------------------------------------------
Type: 1 "red" 2 "green" 3 "purple" 4 "blue"
Size: 0 = small, 1 = big
Sugar: 0 to 100, continuous variable
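To practice reading an interaction model, the sketch below plugs the coefficients from the type*size output above into a prediction function. The dictionary keys and function names are made up for readability (they are not Stata's variable names), and the example jellybean (big, green, sugar = 50) is arbitrary.

```python
# Coefficients copied from the xi: reg happy i.type*i.size sugar output.
coef = {
    "_cons": 9.835219,
    "type2": 9.937232,
    "type3": 19.45203,
    "type4": 28.51886,
    "size1": 9.359119,
    "type2_x_size1": 6.397666,
    "type3_x_size1": 30.98568,
    "type4_x_size1": 12.75011,
    "sugar": 0.2162254,
}

def predict_happy(jb_type, size, sugar):
    """Predicted happiness; type 1 (red) and size 0 (small) are the omitted categories."""
    y = coef["_cons"] + coef["sugar"] * sugar
    if jb_type != 1:
        y += coef[f"type{jb_type}"]
    if size == 1:
        y += coef["size1"]
        if jb_type != 1:
            y += coef[f"type{jb_type}_x_size1"]   # interaction term
    return y

def size_effect(jb_type):
    """Big-minus-small difference for a given type, holding sugar fixed."""
    return predict_happy(jb_type, 1, 0) - predict_happy(jb_type, 0, 0)

print(predict_happy(2, 1, 50))   # big green jellybean with sugar = 50
for t in (1, 2, 3, 4):
    print(t, round(size_effect(t), 3))
```

The effect of size is the main effect alone for red (about 9.36) and the main effect plus the interaction term for the other types (about 15.76 for green, 40.34 for purple, and 22.11 for blue). The sugar coefficient is the same for every type because this model does not interact sugar with anything.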
For the type*sugar interaction model, what is the effect of sugar for green jellybeans?
Let’s see what the regression results look like:
| | (1) | (2) | (3) |
|---|---|---|---|
| | happy | happy | happy |
| type==2 | 22.789 | 21.649 | 0.502 |
| | (3.709)** | (3.713)** | (0.635) |
| type==3 | 35.012 | 32.732 | 0.298 |
| | (3.709)** | (3.755)** | (0.649) |
| type==4 | 57.948 | 54.024 | 0.671 |
| | (3.709)** | (3.863)** | (0.687) |
| size==1 | | 13.258 | 9.581 |
| | | (3.764)** | (0.639)** |
| sugar | | | 1.994 |
| | | | (0.008)** |
| Constant | 130.511 | 121.178 | 0.712 |
| | (2.623)** | (3.723)** | (0.784) |
| Observations | 2000 | 2000 | 2000 |
| R-squared | 0.11 | 0.12 | 0.97 |

Standard errors in parentheses
\* significant at 5%; \*\* significant at 1%
PS5, #2
Lecture J
What does Graph 1 indicate? Why?
What do we learn from the following models? (In the exam I will give you the Stata output.)
(re is the categorical variable for race and ethnicity)
Model 1:
xi: reg wage i.sex
Model 2:
xi: reg wage i.sex i.re
Model 3:
xi: reg wage i.sex*i.re
In Model 3, what is the gender gap among Asian Americans? Is it different from the gender gap among whites? How would we test whether the gender gap among Asian Americans is equal to 0?
What is the predicted hourly wage for Hispanic females?
Lecture K
When we have a dichotomous dependent variable, what is the problem with using OLS?
Why is the cumulative distribution function useful for the probit model?
The index of a probit model (i.e., what we predict with the data and coefficients) is a “latent” variable: something we don’t actually observe. Using the example of a survey question about happiness discussed in class, explain the difference between the data we observe and the underlying latent variable, and why a probit model is useful in this context.
Imagine that I ran the following model:
probit happy x
and got a coefficient estimate for x of 1 and 0 for the constant term.
Is the effect of a 1-unit increase in x on the probability of being happy constant for all levels of x? Give me an example to make your point.
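A short sketch can help check your reasoning here. It assumes the standard probit form, in which the predicted probability is the standard normal CDF evaluated at the index (constant + coefficient times x). The function names and the three starting values of x are arbitrary choices for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_happy(x, constant=0.0, b_x=1.0):
    """Predicted P(happy = 1) from a probit with the given constant and coefficient."""
    return normal_cdf(constant + b_x * x)

# Change in predicted probability for a 1-unit increase in x,
# starting from three different values of x.
changes = {}
for start in (-2, 0, 2):
    changes[start] = prob_happy(start + 1) - prob_happy(start)
    print(f"x: {start} -> {start + 1}, change in P(happy) = {changes[start]:.3f}")
```

The same 1-unit increase moves the probability by about 0.34 when starting at x = 0 but only about 0.02 when starting at x = 2, because the normal CDF is steepest near an index of 0 and flattens out in the tails.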
PS6, #1-2