-------------------------------------------------------------------------------
       log:  C:\papers\soc709\final_example.log
  log type:  text
 opened on:  25 Apr 2007, 23:25:43

.
. use final_1

.
. * begin analysis
.
.
. /*
> checklist:
> 1. sampling probability: create weights
> 2. influential cases?
> 3. heteroskedasticity?
> 4. multicollinearity?
> 5. missing data?
> 6. estimate and interpret coefficients
>    (note: real data may be more complex... explain)
> 7. iv?
> */
.
. * your analysis will differ depending on the specific characteristics
. * of your data.
. * remember to explain why you do things and how you interpret the output
. * and the potential consequences of choices you have made and problems you encounter
.
. notes

_dta:
  1.  final exam, # 1

. des

Contains data from final_1.dta
  obs:         1,700
 vars:             6                          25 Apr 2007 22:42
 size:        47,600 (99.5% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
effort          float  %9.0g                  effort
rocky           float  %9.0g                  cheers for rocky
tv              float  %9.0g                  average hours of tv per day
gender          float  %9.0g                  gender a=0, b=1
math            float  %9.0g                  math test score
p               float  %9.0g                  sampling probability
-------------------------------------------------------------------------------
Sorted by:

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      effort |      1386    .0197985    .9794829  -3.588318   3.326167
       rocky |      1700    .0066574    .9950269  -3.417665    3.16118
          tv |      1700    6.567307    1.448899   4.001171   8.999391
      gender |      1700    1.470588    .4992811          1          2
        math |      1700    5928.752    242534.5   32.33077   1.00e+07
-------------+--------------------------------------------------------
           p |      1700    .8529412    .0499281         .8         .9

.
.
. gen wgt=1/p

.
.
. * influential cases
. reg math effort tv gender

      Source |       SS       df       MS              Number of obs =    1386
-------------+------------------------------           F(  3,  1382) =    0.51
       Model |  1.1122e+11     3  3.7074e+10           Prob > F      =  0.6732
    Residual |  9.9816e+13  1382  7.2226e+10           R-squared     =  0.0011
-------------+------------------------------           Adj R-squared = -0.0011
       Total |  9.9927e+13  1385  7.2149e+10           Root MSE      = 2.7e+05

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      effort |   911.7908   7379.991     0.12   0.902     -13565.4    15388.99
          tv |   3252.157   5257.012     0.62   0.536    -7060.429    13564.74
      gender |   15655.17   14474.51     1.08   0.280    -12739.21    44049.55
       _cons |  -35960.41   39960.79    -0.90   0.368    -114350.8    42429.96
------------------------------------------------------------------------------

. predict d, cooksd
(314 missing values generated)

.
. sum d

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           d |      1386    .0004594    .0170482   2.20e-14   .6346906

. list if d>.3 & d~=.

     +----------------------------------------------------------------+
 98. |   effort |    rocky |       tv | gender |     math |  p |  wgt |
     | .1056299 | -.161703 | 7.052979 |      2 | 1.00e+07 | .8 | 1.25 |
     |----------------------------------------------------------------|
     |                               d                                |
     |                            .6346906                            |
     +----------------------------------------------------------------+

. drop if d>.3 & d~=.
(1 observation deleted)

.
. * heteroskedasticity
. reg math effort tv gender [w=wgt]
(analytic weights assumed)
(sum of wgt is   1.6283e+03)

      Source |       SS       df       MS              Number of obs =    1385
-------------+------------------------------           F(  3,  1381) =  391.01
       Model |  10567.2907     3  3522.43022           Prob > F      =  0.0000
    Residual |  12440.7477  1381  9.00850664           R-squared     =  0.4593
-------------+------------------------------           Adj R-squared =  0.4581
       Total |  23008.0383  1384  16.6243052           Root MSE      =  3.0014

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      effort |   2.199172   .0824181    26.68   0.000     2.037494    2.360851
          tv |  -.9778859   .0586853   -16.66   0.000    -1.093008   -.8627639
      gender |   1.898619    .161335    11.77   0.000     1.582131    2.215107
       _cons |     50.087   .4484805   111.68   0.000     49.20722    50.96678
------------------------------------------------------------------------------

. estat hettest, rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: effort tv gender

         chi2(3)      =     4.86
         Prob > chi2  =   0.1822

. estat hettest tv

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: tv

         chi2(1)      =     4.14
         Prob > chi2  =   0.0419

. predict ehat, resid
(314 missing values generated)

. gen ehat2=ehat^2
(314 missing values generated)

. reg ehat2 effort tv gender [w=wgt]
(analytic weights assumed)
(sum of wgt is   1.6283e+03)

      Source |       SS       df       MS              Number of obs =    1385
-------------+------------------------------           F(  3,  1381) =    1.72
       Model |  784.618979     3   261.53966           Prob > F      =  0.1610
    Residual |  210019.902  1381  152.078134           R-squared     =  0.0037
-------------+------------------------------           Adj R-squared =  0.0016
       Total |  210804.521  1384  152.315406           Root MSE      =  12.332

------------------------------------------------------------------------------
       ehat2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      effort |   .2813946   .3386333     0.83   0.406    -.3828966    .9456858
          tv |   .5150412   .2411219     2.14   0.033     .0420364     .988046
      gender |   .1902531   .6628811     0.29   0.774     -1.11011    1.490616
       _cons |   5.484591   1.842683     2.98   0.003     1.869832    9.099351
------------------------------------------------------------------------------

.
. gen wgt2=wgt/tv

. reg math effort tv gender [w=wgt2]
(analytic weights assumed)
(sum of wgt is   2.7460e+02)

      Source |       SS       df       MS              Number of obs =    1385
-------------+------------------------------           F(  3,  1381) =  395.31
       Model |  10512.2799     3   3504.0933           Prob > F      =  0.0000
    Residual |  12241.5538  1381  8.86426778           R-squared     =  0.4620
-------------+------------------------------           Adj R-squared =  0.4608
       Total |  22753.8337  1384  16.4406313           Root MSE      =  2.9773

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      effort |   2.191303   .0815568    26.87   0.000     2.031314    2.351291
          tv |  -.9961214   .0601441   -16.56   0.000    -1.114105   -.8781376
      gender |   1.900733   .1600433    11.88   0.000     1.586779    2.214687
       _cons |   50.19758   .4413106   113.75   0.000     49.33187    51.06329
------------------------------------------------------------------------------

. estat hettest, rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: effort tv gender

         chi2(3)      =     4.71
         Prob > chi2  =   0.1945

. capture drop ehat ehat2

. predict ehat, resid
(314 missing values generated)

. gen ehat2=ehat^2
(314 missing values generated)
. reg ehat2 effort tv gender [w=wgt2]
(analytic weights assumed)
(sum of wgt is   2.7460e+02)

      Source |       SS       df       MS              Number of obs =    1385
-------------+------------------------------           F(  3,  1381) =    1.63
       Model |  735.534139     3  245.178046           Prob > F      =  0.1796
    Residual |  207198.567  1381  150.035168           R-squared     =  0.0035
-------------+------------------------------           Adj R-squared =  0.0014
       Total |  207934.101  1384  150.241402           Root MSE      =  12.249

------------------------------------------------------------------------------
       ehat2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      effort |   .3243328   .3355331     0.97   0.334     -.333877    .9825425
          tv |    .495215   .2474392     2.00   0.046     .0098178    .9806123
      gender |   .2511037   .6584347     0.38   0.703    -1.040537    1.542744
       _cons |      5.517   1.815597     3.04   0.002     1.955373    9.078626
------------------------------------------------------------------------------

.
.
. reg math effort tv gender [w=wgt2], robust
(analytic weights assumed)
(sum of wgt is   2.7460e+02)

Linear regression                                      Number of obs =    1385
                                                       F(  3,  1381) =  372.55
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4620
                                                       Root MSE      =  2.9773

------------------------------------------------------------------------------
             |               Robust
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      effort |   2.191303   .0839095    26.12   0.000     2.026699    2.355907
          tv |  -.9961214    .059882   -16.63   0.000    -1.113591   -.8786518
      gender |   1.900733   .1631232    11.65   0.000     1.580737    2.220729
       _cons |   50.19758   .4448699   112.84   0.000     49.32489    51.07028
------------------------------------------------------------------------------

.
. * multicollinearity
. * substitute current #1 model here
. reg math effort tv gender [w=wgt2]
(analytic weights assumed)
(sum of wgt is   2.7460e+02)

      Source |       SS       df       MS              Number of obs =    1385
-------------+------------------------------           F(  3,  1381) =  395.31
       Model |  10512.2799     3   3504.0933           Prob > F      =  0.0000
    Residual |  12241.5538  1381  8.86426778           R-squared     =  0.4620
-------------+------------------------------           Adj R-squared =  0.4608
       Total |  22753.8337  1384  16.4406313           Root MSE      =  2.9773

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      effort |   2.191303   .0815568    26.87   0.000     2.031314    2.351291
          tv |  -.9961214   .0601441   -16.56   0.000    -1.114105   -.8781376
      gender |   1.900733   .1600433    11.88   0.000     1.586779    2.214687
       _cons |   50.19758   .4413106   113.75   0.000     49.33187    51.06329
------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF
-------------+----------------------
          tv |      1.00    0.997799
      effort |      1.00    0.998266
      gender |      1.00    0.999529
-------------+----------------------
    Mean VIF |      1.00

.
.
.
. ice math effort tv gender using impute [w=wgt2], m(5) replace
(analytic weights assumed)

   #missing |
     values |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,385       81.52       81.52
          1 |        314       18.48      100.00
------------+-----------------------------------
      Total |      1,699      100.00

   Variable | Command | Prediction equation
------------+---------+-------------------------------------------------------
       math |         | [No missing data in estimation sample]
     effort | regress | math tv gender
         tv |         | [No missing data in estimation sample]
     gender |         | [No missing data in estimation sample]
------------------------------------------------------------------------------

Imputing [Only 1 variable to be imputed, therefore no cycling needed.]
1..2..3..4..5..file impute.dta saved

. drop _all

. use impute

. * substitute current #1 model here
. micombine reg math effort tv gender [w=wgt2]
(analytic weights assumed)

Multiple imputation parameter estimates (5 imputations)
------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      effort |   2.201283   .0767224    28.69   0.000     2.050802    2.351764
          tv |  -1.015252   .0544317   -18.65   0.000    -1.122012   -.9084916
      gender |   1.848889   .1489589    12.41   0.000     1.556726    2.141052
       _cons |   50.37854   .4035057   124.85   0.000     49.58712    51.16996
------------------------------------------------------------------------------
1699 observations (imputation 1).

.
. * iv? (remember untestable assumptions must be argued theoretically)
. reg effort rocky

      Source |       SS       df       MS              Number of obs =    9880
-------------+------------------------------           F(  1,  9878) = 2168.00
       Model |  1708.84374     1  1708.84374           Prob > F      =  0.0000
    Residual |  7785.94843  9878  .788211017           R-squared     =  0.1800
-------------+------------------------------           Adj R-squared =  0.1799
       Total |  9494.79216  9879  .961108631           Root MSE      =  .88781

------------------------------------------------------------------------------
      effort |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rocky |   .4185066   .0089882    46.56   0.000     .4008879    .4361253
       _cons |  -.0020256   .0089321    -0.23   0.821    -.0195344    .0154832
------------------------------------------------------------------------------

. predict efforthat
(option xb assumed; fitted values)

. micombine reg math efforthat tv gender [w=wgt2]
(analytic weights assumed)

Multiple imputation parameter estimates (5 imputations)
------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   efforthat |   1.578606   .2097129     7.53   0.000     1.167283     1.98993
          tv |  -1.097197   .0605262   -18.13   0.000    -1.215911   -.9784827
      gender |   1.813768   .1753662    10.34   0.000     1.469811    2.157725
       _cons |   50.96195   .4699258   108.45   0.000     50.04025    51.88364
------------------------------------------------------------------------------
1699 observations (imputation 1).

.
. ivreg math (effort=rocky) tv gender [w=wgt]
(analytic weights assumed)
(sum of wgt is   1.1622e+04)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    9880
-------------+------------------------------           F(  3,  9876) = 1320.91
       Model |   75105.139     3  25035.0463           Prob > F      =  0.0000
    Residual |  91928.6575  9876  9.30828853           R-squared     =  0.4496
-------------+------------------------------           Adj R-squared =  0.4495
       Total |  167033.796  9879   16.907966           Root MSE      =  3.0509

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      effort |   1.582425   .0739246    21.41   0.000     1.437518    1.727332
          tv |  -1.027416   .0214579   -47.88   0.000    -1.069478   -.9853542
      gender |   1.830954   .0614126    29.81   0.000     1.710573    1.951335
       _cons |   50.48728   .1708185   295.56   0.000     50.15244    50.82212
------------------------------------------------------------------------------
Instrumented:  effort
Instruments:   tv gender rocky
------------------------------------------------------------------------------

.
.
. log close
       log:  C:\papers\soc709\final_example.log
  log type:  text
 closed on:  25 Apr 2007, 23:25:48
-------------------------------------------------------------------------------
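
Note on the iv? step. The hand-built two-step fit (reg effort rocky, predict efforthat, then math on efforthat) and the ivreg call report nearly the same effort coefficient (1.578606 and 1.582425) but quite different standard errors (.2097129 and .0739246). In the log the two runs also differ in weights (wgt2 versus wgt) and in how the imputed data are handled, so the comparison is not clean. The sketch below is a minimal Python illustration, on simulated data rather than final_1.dta, of the one part that holds in general: with one instrument and one endogenous regressor, the manual two-stage slope equals the direct IV estimate. The standard errors printed by a hand-run second stage are not generally valid, because its residuals are computed from the fitted regressor rather than the observed one. All names here (simulate, ols_slope, the coefficients 0.4, 1.5 and 2) are hypothetical choices for the illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(y, x):
    """Slope from a one-regressor OLS of y on x (intercept included)."""
    return cov(x, y) / cov(x, x)


def simulate(n=5000, beta=1.5, seed=709):
    """Hypothetical data: z is the instrument, u an unobserved confounder."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        ui = rng.gauss(0, 1)
        xi = 0.4 * zi + ui + rng.gauss(0, 1)            # endogenous regressor
        yi = 50 + beta * xi + 2 * ui + rng.gauss(0, 1)  # outcome
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()

# Manual two-stage: fit x on z, then regress y on the fitted values.
b1 = ols_slope(x, z)
a1 = mean(x) - b1 * mean(z)
xhat = [a1 + b1 * zi for zi in z]
beta_manual = ols_slope(y, xhat)

# Direct IV estimator for one instrument and one endogenous regressor.
beta_iv = cov(z, y) / cov(z, x)

# Naive OLS, biased upward here because u raises both x and y.
beta_ols = ols_slope(y, x)

print(f"manual two-stage slope: {beta_manual:.4f}")
print(f"direct IV slope:        {beta_iv:.4f}")
print(f"naive OLS slope:        {beta_ols:.4f}")
```

The two IV slopes agree to floating-point precision, while the OLS slope is pulled away from the simulated value of 1.5 by the confounder. With additional exogenous covariates, as in the math model, the equivalence requires that the first stage include those covariates as well.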