Sociology 709

Lecture H2

Revised 1/27/2007 10:59 PM

 

1.  T-tests and confidence intervals, take II

2.  Calculating the standard errors of the coefficients, by hand using matrices.

3.  Maximum likelihood estimation.

 

 

1.  T-tests and confidence intervals, take II

(Hand out cumulative normal table)

 

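Note:  To convert a normal variable X with mean $\mu$ and standard deviation $\sigma$ to a standard normal, subtract the mean and divide by the standard deviation: $Z = (X - \mu)/\sigma$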

 

*Q1:  If we have a normal variable X with mean 0 and standard deviation 2 (i.e., X~N(0,2) ), what is the probability that |X| (i.e., the absolute value of X, or the “magnitude” of X) will be greater than or equal to 1?

 

* Q2.  If X~N(0,5), what is the probability that |X| ≥ 10?
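
Questions of this kind can be checked numerically instead of with the printed table. The following is a minimal Python sketch, not part of the course's Stata materials; the function name `prob_abs_at_least` is made up for illustration, and it reads N(a, b) as mean a and standard deviation b, as in Q1.

```python
from statistics import NormalDist

def prob_abs_at_least(c, mean=0.0, sd=1.0):
    """P(|X| >= c) for X ~ Normal(mean, sd), with c >= 0."""
    dist = NormalDist(mu=mean, sigma=sd)
    # Upper tail P(X >= c) plus lower tail P(X <= -c).
    return (1.0 - dist.cdf(c)) + dist.cdf(-c)

# Q1's setup: mean 0, standard deviation 2, cutoff 1, so Z = 1/2 = 0.5.
p_q1 = prob_abs_at_least(1, mean=0, sd=2)
print(round(p_q1, 4))  # about 0.617
```

The same function applies to Q2 by changing `c` and `sd`. Standardizing first and looking up Z in the cumulative normal table should give the same answer up to rounding.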

 

Q3.  In a regression of income on education with 2000 cases, the estimated coefficient, B, is 4 with standard error 3. 

 

(Why is it convenient to use a large number of cases here?)

 

The null hypothesis, H0, is that the actual coefficient is 0. 

 

If H0 were true, what is the chance that |B| ≥ 4?  In other words, what is the chance of observing a coefficient at least as big in magnitude as the one we estimated?

 

Q4.  In a regression of income on education with 2000 cases, the estimated coefficient, B, is 6 with standard error 3. 

 

Test H0: β = 0

 

Summary:  To test the null hypothesis that β = 0, take B/SE(B) and compare it to the critical T-value.  For large N, the two-tailed critical value at the .05 level of significance is 1.96. 

What you are implicitly doing when you calculate this is asking “if H0 is true, what is the chance of observing a T-value that is at least this big in magnitude?”  If the chance is small, i.e., less than 5% (p<.05), then you reject H0 at the .05 level of significance. 
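
The procedure in the summary can be sketched in a few lines of Python. This is an illustration only: the function name `two_tailed_p` is hypothetical, and the standard normal is used as a large-sample stand-in for the T distribution, which is reasonable with 2000 cases.

```python
from statistics import NormalDist

def two_tailed_p(b, se):
    """Return (z, p) for H0: beta = 0, where z = b / se and p is the
    two-tailed probability under the standard normal approximation."""
    z = b / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# The coefficients and standard errors from Q3 and Q4.
for b, se in [(4, 3), (6, 3)]:
    z, p = two_tailed_p(b, se)
    verdict = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"B={b}, SE={se}: z={z:.2f}, p={p:.3f} -> {verdict} at the .05 level")
```

Comparing |z| to 1.96 and comparing p to .05 are two ways of making the same decision.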

 

 

Q5:  Describe in words where, in simple regression, the standard error of B comes from.

 

 

 

2.  Calculating the standard errors of the coefficients, by hand using matrices.

As we discussed in the last class, in matrix notation, the estimated coefficients for multiple regression can be found by solving

$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

 

 

From Fox, p.216, the estimated coefficients are distributed as:

 

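(1)   $\mathbf{b} \sim N_{k+1}\left(\boldsymbol{\beta},\ \sigma_\varepsilon^2(\mathbf{X}'\mathbf{X})^{-1}\right)$

 

In words, this means they are distributed as multivariate normal variables that have a mean of $\boldsymbol{\beta}$ and a covariance matrix $\sigma_\varepsilon^2(\mathbf{X}'\mathbf{X})^{-1}$. 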

 

P. 217 of Fox notes the similarity between the formulas for simple regression and multiple regression.

 

Let’s go through an example of calculating equation 1.   First we need to find the standard error of the error term.

 

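The sum of squared errors is given on p.212 of Fox. In matrix form, with $\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{b}$, it is

$\mathbf{e}'\mathbf{e} = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})$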

 

Let’s use the prestige data from Fox.

 

The do file for this example is lech2.do.

 

Note: on long lists, I am only showing the first 5-6 cases.

-------------------------------------------------------------------------------

       log:  C:\papers\soc709\lech2.log

  log type:  text

 opened on:   2 Feb 2007, 13:01:17

 

. * to view a log file (in this case lech2.log) go to file-->log-->view in Stat

> a

. * or open it up in word and set the font to courier new, 9 or 10 point

.

. use prestige

(From Fox, Applied Regression Analysis.  Use 'notes' command for source of data

> )

 

.

. * let's first check what the right answer should be

. reg prestige educ inc

 

      Source |       SS       df       MS              Number of obs =     102

-------------+------------------------------           F(  2,    99) =  195.55

       Model |  23856.5752     2  11928.2876           Prob > F      =  0.0000

    Residual |  6038.85086    99  60.9984935           R-squared     =  0.7980

-------------+------------------------------           Adj R-squared =  0.7939

       Total |  29895.4261   101  295.994318           Root MSE      =  7.8102

 

------------------------------------------------------------------------------

    prestige |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

      educat |   4.137444    .348912    11.86   0.000     3.445127    4.829762

      income |   .0013612   .0002242     6.07   0.000     .0009163    .0018061

       _cons |  -6.847778   3.218977    -2.13   0.036    -13.23493   -.4606292

------------------------------------------------------------------------------

 

. est store mod1

 

. predict yhat

(option xb assumed; fitted values)

 

. list yhat educ inc

 

     +----------------------------+

     |     yhat   educat   income |

     |----------------------------|

  1. | 64.20587    13.11    12351 |

  2. |  79.1029    12.26    25879 |

  3. | 58.60675    12.77     9271 |

  4. | 52.46857    11.42     8865 |

  5. | 65.07954    14.62     8403 |

 

.

. * type help mkmat to learn about the mkmat command

.

. gen cons=1

 

.

. list cons educat income

 

     +------------------------+

     | cons   educat   income |

     |------------------------|

  1. |    1    13.11    12351 |

  2. |    1    12.26    25879 |

  3. |    1    12.77     9271 |

  4. |    1    11.42     8865 |

  5. |    1    14.62     8403 |

     |------------------------|

 

.

. mkmat cons educat income, mat(x)

 

.

. * this makes a matrix x with the variables cons, educat, and income.

.

. mat list x

 

x[102,3]

           cons     educat     income

  r1          1      13.11      12351

  r2          1      12.26      25879

  r3          1      12.77       9271

  r4          1      11.42       8865

  r5          1      14.62       8403

  r6          1      15.64      11030

 

.

. mkmat prestige, mat(y)

 

.

. mat xx=x'*x

 

.

. mat lis xx

 

symmetric xx[3,3]

             cons     educat     income

  cons        102

educat    1095.28  12513.045

income     693386  8121410.2  6.534e+09

 

. mat xxinv=inv(xx)

 

. mat lis xxinv

 

symmetric xxinv[3,3]

              cons      educat      income

  cons   .16986999

educat  -.01639526   .00199578

income   2.352e-06  -7.407e-07   8.241e-10

 

.

. mat a=xxinv*xx

 

. mat lis a

 

symmetric a[3,3]

              cons      educat      income

  cons           1

educat  -1.110e-16           1

income           0           0           1

 

. * note that a is the identity matrix.

.

. mat xy=x'*y

 

. mat lis xy

 

xy[3,1]

         prestige

  cons       4777

educat  55326.378

income   37748108

 

. mat b=xxinv*xy

 

. * b will be the OLS estimates.

. mat lis b

 

b[3,1]

          prestige

  cons  -6.8477782

educat   4.1374443

income   .00136117

 

.

. * ok, we went this far in lech_matrix.do

. * now we want to calculate the error terms

.

. * first calculate the predicted values

. mat yhat=x*b

 

.

. mat lis yhat

 

yhat[102,1]

       prestige

  r1  64.205875

  r2    79.1029

  r3  58.606756

  r4  52.468571

  r5  65.079534

  r6  72.875512

.

. mat e=y-yhat

 

. mat lis e

 

e[102,1]

        prestige

  r1   4.5941285

  r2  -10.002902

  r3   4.7932454

  r4   4.3314278

  r5   8.4204661

  r6   4.7244869

 

. mat ee=e'*e

 

. mat lis ee

 

symmetric ee[1,1]

           prestige

prestige  6038.8509

 

.

. * this is the sum of squared errors.

. * to calculate the mean squared error (i.e. the variance of the error term),

> divide ee by n-(k+1)

.

. global n=_N

 

. global ee=ee[1,1]

 

. * $ee , a global variable, now is the sum of squared errors

.

. di  $ee/($n-(2+1))

60.998494

 

.

. global mse=$ee/($n-(2+1))

 

.

.

. *  Q: find this number in the regression table from the top of this log file.

.

. * ok, we have found the mean squared error.

. * now, from formula 1 in this lecture, we can calculate the standard error of

>  the coefficients.

.

. * first replay the original regression results

. est replay mod1

 

-------------------------------------------------------------------------------

Model mod1

-------------------------------------------------------------------------------

 

      Source |       SS       df       MS              Number of obs =     102

-------------+------------------------------           F(  2,    99) =  195.55

       Model |  23856.5752     2  11928.2876           Prob > F      =  0.0000

    Residual |  6038.85086    99  60.9984935           R-squared     =  0.7980

-------------+------------------------------           Adj R-squared =  0.7939

       Total |  29895.4261   101  295.994318           Root MSE      =  7.8102

 

------------------------------------------------------------------------------

    prestige |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

      educat |   4.137444    .348912    11.86   0.000     3.445127    4.829762

      income |   .0013612   .0002242     6.07   0.000     .0009163    .0018061

       _cons |  -6.847778   3.218977    -2.13   0.036    -13.23493   -.4606292

------------------------------------------------------------------------------

 

.

. mat varb=$mse*xxinv

 

. mat lis varb

 

symmetric varb[3,3]

              cons      educat      income

  cons   10.361814

educat  -1.0000864    .1217396

income   .00014345  -.00004518   5.027e-08

 

.

. * each diagonal element of varb is the variance of the estimate of b.  I.e.,

> take the

. * square root of each to get the se, and compare to the results from the regr

> ess command.

.

. * the key point in all of this: the process of estimating a regression is, ev

> en in a multivariate case,

. * a mechanical process.  There is nothing magical about it.

.

. log close

       log:  C:\papers\soc709\lech2.log

  log type:  text

 closed on:   2 Feb 2007, 13:01:18

-------------------------------------------------------------------------------
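
The last step of the log (take the square root of each diagonal element of varb) can be reproduced outside Stata as a check. The Python sketch below is an illustration, not part of lech2.do; it uses only numbers printed in the log above (the mean squared error and the diagonal of xxinv), so the results agree with the regression table only up to the rounding of those printed values.

```python
import math

# Values copied from the Stata log above.
mse = 60.998494                # mean squared error, e'e / (n - (k+1))
xxinv_diag = {                 # diagonal elements of inv(X'X)
    "cons": 0.16986999,
    "educat": 0.00199578,
    "income": 8.241e-10,
}

# Var(b_j) = mse * [inv(X'X)]_jj, and SE(b_j) is its square root.
se = {name: math.sqrt(mse * d) for name, d in xxinv_diag.items()}

for name, value in se.items():
    print(f"{name:>6}: {value:.7f}")
```

Compare the printed values with the Std. Err. column of the regression table: 3.218977 for _cons, .348912 for educat, and .0002242 for income.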

 

 

 

3.  Maximum likelihood estimation (Fox, p. 219)

 

First, review the discussion of likelihood in lecture D.

 

With the exception of a few models like OLS that can be solved by hand (as we have done in class), most statistical models are estimated by “maximum likelihood.”

 

This means, quite literally, finding the solution that maximizes the likelihood of observing the data.

 

*Q:  Example:  If we are drawing from a normal distribution with standard deviation 1 and unknown mean, and we observe X=3 and X=6 on our first two draws, which is more likely to be the true mean:  0 or 4?  Why?

Hint: draw one normal curve with mean 0 and another with mean 4, and plot the points 3 and 6 on each.

 

 

The height of the normal curve at a given value of X is the probability density at that value.

 

*Q:  What is the probability of flipping a coin 3 times and getting 3 tails?

 

To calculate the likelihood function for N cases, we multiply the probability densities together.

Why?

 

Calculate the likelihood of observing X=3 and X=6 given

a.  X~N(0,1)

b. X~N(4,1)

c. X~N(1,6)
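
A calculation like this one can be checked with a short Python sketch. It is an illustration under two assumptions: the draws are independent, and N(a, b) means mean a and standard deviation b, as earlier in these notes. The function name `likelihood` is made up for the example.

```python
from statistics import NormalDist

def likelihood(data, mean, sd):
    """Product of normal densities, assuming independent draws."""
    dist = NormalDist(mu=mean, sigma=sd)
    result = 1.0
    for x in data:
        result *= dist.pdf(x)  # height of the normal curve at x
    return result

data = [3, 6]
# (mean, sd) pairs for cases a, b, and c.
candidates = {"a": (0, 1), "b": (4, 1), "c": (1, 6)}
likelihoods = {
    label: likelihood(data, m, s) for label, (m, s) in candidates.items()
}
for label, value in likelihoods.items():
    print(label, value)
```

The candidate with the largest likelihood is the one under which the observed data were most probable. Only the ranking of the three numbers matters here, not their absolute size.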

 

In this case, if we were estimating the mean and standard deviation we would have 2 parameters.  In a regression equation with 5 variables, we have 6 parameters (including the constant term).  How do you think a computer does it?
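
One simple way to picture a numerical search is to try many candidate values and keep the one with the highest likelihood. The Python sketch below does this for the mean only, holding the standard deviation at 1; the grid and the function name `log_likelihood` are choices made for this illustration. Statistical packages generally use faster iterative methods, not a grid, so this shows the idea, not what Stata does internally.

```python
import math

def log_likelihood(data, mean, sd):
    """Sum of log normal densities, i.e. the log of the likelihood."""
    return sum(
        -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)
        for x in data
    )

data = [3, 6]
sd = 1.0  # treated as known, as in the first example in this section

# Try candidate means 0.00, 0.01, ..., 10.00 and keep the best one.
grid = [i / 100 for i in range(0, 1001)]
best_mean = max(grid, key=lambda m: log_likelihood(data, m, sd))
print(best_mean)  # 4.5, the sample mean of 3 and 6
```

Working with the log of the likelihood turns the product of densities into a sum, which avoids very small numbers and leaves the location of the maximum unchanged. With 6 parameters a grid becomes impractical, which is one reason iterative methods are used.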