Sociology 709
Lecture H2
Revised 1/27/2007 10:59 PM
1. T-tests and confidence intervals, take II
2. Calculating the standard errors of the coefficients, by hand using matrices.
3. Maximum likelihood estimation.
1. T-tests and confidence intervals, take II
(Hand out cumulative normal table)
Note: To convert a normal variable X with mean $\mu$ and standard deviation $\sigma$ to a standard normal: $Z = (X - \mu)/\sigma$
*Q1: If we have a normal variable X with mean 0 and standard deviation 2 (i.e., X~N(0,2)), what is the probability that |X| (i.e., the absolute value of X, or the “magnitude” of X) will be greater than or equal to 1?
*Q2: If X~N(0,5), what is the probability that |X| ≥ 10?
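Answers read from the cumulative normal table can be checked in Stata. The sketch below assumes the second argument in X~N(0,2) is the standard deviation, as stated in Q1, and uses Stata's built-in normal() function (the cumulative standard normal). The scalar names mu, sigma, cut, and z are only illustrative.

```stata
* P(|X| >= cut) for X ~ N(mu, sigma): standardize, then double the upper tail
* (doubling the tail is valid here because mu = 0, so X is symmetric about 0)
scalar mu    = 0
scalar sigma = 2
scalar cut   = 1
scalar z     = (cut - mu)/sigma
display "z = " z "   P(|X| >= cut) = " 2*(1 - normal(z))
```

For Q2, set sigma to 5 and cut to 10 and run the same lines.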
Q3. In a regression of income on education with 2000 cases, the estimated coefficient, B, is 4 with standard error 3.
(Why is it convenient to use a large number of cases here?)
The null hypothesis, H0, is that the actual coefficient is 0.
If H0 were true, what is the chance that |B| ≥ 4? In other words, what is the chance of observing a coefficient at least as big in magnitude as the one we estimated?
Q4. In a regression of income on education with 2000 cases, the estimated coefficient, B, is 6 with standard error 3.
Test H0: β=0.
Summary: To test the null hypothesis that β=0, take B/SE(B) and compare it to the critical T-value. For large N, the critical value at the p=.05 level of significance (two-tailed) is 1.96.
What you are implicitly doing when you calculate this is asking, “If H0 is true, what is the chance of observing a T-value at least this big in magnitude?” If the chance is small, i.e., less than 5% (p<.05), then you reject H0 at the .05 level of significance.
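The test described above can be carried out in a few lines of Stata. This is a sketch using the numbers from Q3. The scalar names are illustrative, and the 1998 degrees of freedom assume a simple regression (one predictor plus a constant) on 2000 cases.

```stata
* two-tailed test of H0: beta = 0
scalar B    = 4
scalar SE   = 3
scalar tval = B/SE

* large-sample (normal) approximation
display "t = " tval "   two-tailed p (normal) = " 2*(1 - normal(abs(tval)))

* the same test with the t distribution, df = 2000 - 2 = 1998
display "two-tailed p (t dist) = " 2*ttail(1998, abs(tval))
display "critical value at .05 = " invttail(1998, .025)
```

For Q4, set B to 6. The T-value is then 2, which exceeds 1.96.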
Q5: Describe in words where, in simple regression, the standard error of B comes from.
2. Calculating the standard errors of the coefficients, by hand using matrices.
As we discussed in the last class, in matrix notation, the estimated coefficients for multiple regression can be found by solving
$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$
From Fox, p.216, the estimated coefficients are distributed as:
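(1) $\mathbf{b} \sim N_{k+1}\left(\boldsymbol{\beta},\ \sigma_\varepsilon^2 (\mathbf{X}'\mathbf{X})^{-1}\right)$
In words, this means they are distributed as multivariate normal variables that have a mean of $\boldsymbol{\beta}$ and a covariance matrix $\sigma_\varepsilon^2 (\mathbf{X}'\mathbf{X})^{-1}$.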
P. 217 of Fox notes the similarity between the formulas for simple regression and multiple regression.
Let’s go through an example of calculating equation 1. First we need to find the standard error of the error term.
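The sum of squared errors is given on p.212 of Fox; dividing it by the degrees of freedom gives the estimated error variance:
$S_E^2 = \frac{\mathbf{e}'\mathbf{e}}{n-k-1}$, where $\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{b}$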
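In symbols, the calculation carried out in the example can be sketched as follows. The notation is assumed here and may differ slightly from Fox's: $S_E^2$ is the estimated error variance, $b_j$ is the j-th coefficient, and $[\cdot]_{jj}$ is the j-th diagonal element of a matrix.

```latex
\widehat{V}(\mathbf{b}) = S_E^2\,(\mathbf{X}'\mathbf{X})^{-1},
\qquad
S_E^2 = \frac{\mathbf{e}'\mathbf{e}}{n-(k+1)},
\qquad
\mathrm{SE}(b_j) = \sqrt{S_E^2\,\left[(\mathbf{X}'\mathbf{X})^{-1}\right]_{jj}}
```

The standard error of each coefficient is the square root of the corresponding diagonal element of the estimated covariance matrix.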
Let’s use the prestige data from Fox.
The do file for this example is lech2.do.
Note: on long lists, I am only showing the first 5-6 cases.
-------------------------------------------------------------------------------
log: C:\papers\soc709\lech2.log
log type: text
opened on: 2 Feb 2007, 13:01:17
. * to view a log file (in this case lech2.log) go to file-->log-->view in Stata
. * or open it up in word and set the font to courier new, 9 or 10 point
.
. use prestige
(From Fox, Applied Regression Analysis. Use 'notes' command for source of data)
.
. * let's first check what the right answer should be
. reg prestige educ inc

      Source |       SS       df       MS              Number of obs =     102
-------------+------------------------------           F(  2,    99) =  195.55
       Model |  23856.5752     2  11928.2876           Prob > F      =  0.0000
    Residual |  6038.85086    99  60.9984935           R-squared     =  0.7980
-------------+------------------------------           Adj R-squared =  0.7939
       Total |  29895.4261   101  295.994318           Root MSE      =  7.8102

------------------------------------------------------------------------------
    prestige |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      educat |   4.137444    .348912    11.86   0.000     3.445127    4.829762
      income |   .0013612   .0002242     6.07   0.000     .0009163    .0018061
       _cons |  -6.847778   3.218977    -2.13   0.036    -13.23493   -.4606292
------------------------------------------------------------------------------
. est store mod1
. predict yhat
(option xb assumed; fitted values)
. list yhat educ inc

     +----------------------------+
     |     yhat   educat   income |
     |----------------------------|
  1. | 64.20587    13.11    12351 |
  2. |  79.1029    12.26    25879 |
  3. | 58.60675    12.77     9271 |
  4. | 52.46857    11.42     8865 |
  5. | 65.07954    14.62     8403 |
.
. * type help mkmat to learn about the mkmat command
.
. gen cons=1
.
. list cons educat income

     +------------------------+
     | cons   educat   income |
     |------------------------|
  1. |    1    13.11    12351 |
  2. |    1    12.26    25879 |
  3. |    1    12.77     9271 |
  4. |    1    11.42     8865 |
  5. |    1    14.62     8403 |
     |------------------------|
.
. mkmat cons educat income, mat(x)
.
. * this makes a matrix x with the variables cons, educat, and income.
.
. mat list x

x[102,3]
       cons  educat  income
 r1       1   13.11   12351
 r2       1   12.26   25879
 r3       1   12.77    9271
 r4       1   11.42    8865
 r5       1   14.62    8403
 r6       1   15.64   11030
.
. mkmat prestige, mat(y)
.
. mat xx=x'*x
.
. mat lis xx

symmetric xx[3,3]
             cons     educat     income
  cons        102
educat    1095.28  12513.045
income     693386  8121410.2  6.534e+09

. mat xxinv=inv(xx)
. mat lis xxinv

symmetric xxinv[3,3]
              cons      educat      income
  cons   .16986999
educat  -.01639526   .00199578
income   2.352e-06  -7.407e-07   8.241e-10
.
. mat a=xxinv*xx
. mat lis a

symmetric a[3,3]
              cons      educat      income
  cons           1
educat  -1.110e-16           1
income           0           0           1

. * note that a is the identity matrix.
.
. mat xy=x'*y
. mat lis xy

xy[3,1]
         prestige
  cons       4777
educat  55326.378
income   37748108

. mat b=xxinv*xy
. * b will be the OLS estimates.
. mat lis b

b[3,1]
          prestige
  cons  -6.8477782
educat   4.1374443
income   .00136117
.
. * ok, we went this far in lech_matrix.do
. * now we want to calculate the error terms
.
. * first calculate the predicted values
. mat yhat=x*b
.
. mat lis yhat

yhat[102,1]
     prestige
r1  64.205875
r2    79.1029
r3  58.606756
r4  52.468571
r5  65.079534
r6  72.875512
.
. mat e=y-yhat
. mat lis e

e[102,1]
      prestige
r1   4.5941285
r2  -10.002902
r3   4.7932454
r4   4.3314278
r5   8.4204661
r6   4.7244869

. mat ee=e'*e
. mat lis ee

symmetric ee[1,1]
           prestige
prestige  6038.8509
.
. * this is the sum of squared errors.
. * to calculate the mean squared error (i.e. the variance of the error term), divide ee by n-(k+1)
.
. global n=_N
. global ee=ee[1,1]
. * $ee , a global variable, now is the sum of squared errors
.
. di $ee/($n-(2+1))
60.998494
.
. global mse=$ee/($n-(2+1))
.
.
. * Q: find this number in the regression table from the top of this log file.
.
. * ok, we have found the mean squared error.
. * now, from formula 1 in this lecture, we can calculate the standard error of the coefficients.
.
. * first replay the original regression results
. est replay mod1

-------------------------------------------------------------------------------
Model mod1
-------------------------------------------------------------------------------

      Source |       SS       df       MS              Number of obs =     102
-------------+------------------------------           F(  2,    99) =  195.55
       Model |  23856.5752     2  11928.2876           Prob > F      =  0.0000
    Residual |  6038.85086    99  60.9984935           R-squared     =  0.7980
-------------+------------------------------           Adj R-squared =  0.7939
       Total |  29895.4261   101  295.994318           Root MSE      =  7.8102

------------------------------------------------------------------------------
    prestige |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      educat |   4.137444    .348912    11.86   0.000     3.445127    4.829762
      income |   .0013612   .0002242     6.07   0.000     .0009163    .0018061
       _cons |  -6.847778   3.218977    -2.13   0.036    -13.23493   -.4606292
------------------------------------------------------------------------------
.
. mat varb=$mse*xxinv
. mat lis varb

symmetric varb[3,3]
              cons      educat      income
  cons   10.361814
educat  -1.0000864    .1217396
income   .00014345  -.00004518   5.027e-08
.
. * each diagonal element of varb is the variance of the estimate of b. I.e., take the
. * square root of each to get the se, and compare to the results from the regress command.
.
. * the key point in all of this: the process of estimating a regression is, even in a multivariate case,
. * a mechanical process.  There is nothing magical about it.
.
. log close
log: C:\papers\soc709\lech2.log
log type: text
closed on: 2 Feb 2007, 13:01:18
-------------------------------------------------------------------------------
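The log stops just short of the last step, taking the square roots. A minimal sketch of that step follows, assuming the matrices b and varb created by lech2.do are still in memory. The scalar names se, tval, tcrit, lo, and hi are illustrative, and 99 is the residual degrees of freedom from the regression table (102 cases minus 3 coefficients).

```stata
* standard errors, t-values, and 95% confidence intervals from b and varb
* rows are in the order of the x matrix: 1 = cons, 2 = educat, 3 = income
scalar tcrit = invttail(99, .025)
forvalues i = 1/3 {
    scalar se   = sqrt(varb[`i',`i'])
    scalar tval = b[`i',1]/se
    scalar lo   = b[`i',1] - tcrit*se
    scalar hi   = b[`i',1] + tcrit*se
    display "row `i':  se = " se "  t = " tval "  95% CI = [" lo ", " hi "]"
}
```

The values printed should match the Std. Err., t, and confidence interval columns of the regress output above, up to rounding.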
3. Maximum likelihood estimation
Fox, page 219.
First, review the discussion of likelihood in lecture d.
With the exception of a few models like OLS that can be solved by hand (as we have done in class), most statistical models are estimated by “maximum likelihood.”
This means, quite literally, finding the solution that maximizes the likelihood of observing the data.
*Q: Example: If we are drawing from a normal distribution with standard deviation 1 and unknown mean, and we observe X=3 and X=6 on our first two draws, which is more likely to be the true mean: 0 or 4? Why?
Hint: draw one normal curve with mean 0 and another with mean 4, and plot the points 3 and 6 on each.
The height of the normal curve at a point x is the probability density at x.
*Q: What is the probability of flipping a coin 3 times and getting 3 tails?
To calculate the likelihood function for N cases, we multiply the probability densities together.
Why?
Calculate the likelihood of X=3,6 given
a. X~N(0,1)
b. X~N(4,1)
c. X~N(1,6)
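These likelihoods can be computed in Stata with normalden(x, m, s), the normal density at x for mean m and standard deviation s. The sketch assumes the second argument in X~N(0,1) is the standard deviation, as in section 1.

```stata
* likelihood of observing X = 3 and X = 6: the product of the two densities
display "a. N(0,1): " normalden(3, 0, 1)*normalden(6, 0, 1)
display "b. N(4,1): " normalden(3, 4, 1)*normalden(6, 4, 1)
display "c. N(1,6): " normalden(3, 1, 6)*normalden(6, 1, 6)
```

The distribution with the largest product is the one under which the observed data are most likely.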
In this case, if we were estimating the mean and standard deviation we would have 2 parameters. In a regression equation with 5 variables, we have 6 parameters (including the constant term). How do you think a computer does it?
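One way to picture what the computer does is a brute-force search: try many candidate values of the parameter and keep the one with the highest likelihood. The sketch below does this for the mean only, holding the standard deviation at 1 and using the two observations 3 and 6 from the example above. The grid from 0 to 8 in steps of 0.5 is an arbitrary choice for illustration, and the scalar names are hypothetical. It works with the log of the likelihood, which turns the product of densities into a sum and has its maximum at the same place.

```stata
* grid search for the mean that maximizes the log likelihood of X = 3, 6
scalar best_ll = .
scalar best_mu = .
forvalues m = 0(0.5)8 {
    scalar ll = ln(normalden(3, `m', 1)) + ln(normalden(6, `m', 1))
    * keep this candidate if it is the first one or beats the best so far
    if missing(best_ll) | ll > best_ll {
        scalar best_ll = ll
        scalar best_mu = `m'
    }
}
display "best mean on the grid = " best_mu "   log likelihood = " best_ll
```

The search settles on 4.5, the average of the two observations. Actual estimation routines search far more efficiently than a grid, and over all parameters at once, but the goal is the same.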