Stata lecture 1
link to do file for this lecture
We will be using Stata as a tool to do statistics and develop an intuition about the models we are learning.
I will tell you everything you need to know about Stata for the class as we progress through the course.
Ask for help on Stata when you need it. Do not spend hours in silent frustration (or worse). If you get stuck, just stop and ask me the next day.
A few key points before you ask for help:
1. Start Stata.
2. Set the working directory to the directory where the data and program files are located.
In Stata, type: pwd
to see what working directory you are in. To change it, type, for example:
cd c:\analysis
3. Use the Stata help: type help command or lookup command.
Think of Stata as a new language with about 150 new words (i.e. commands) that you need to learn.
Do files and ado files
A do file is a list of Stata commands; an ado file is a new, user-written Stata program.
A key benefit of ado files is that they let you loop, for example:
----------myprogram.ado----------------
program define myprogram
    * summarize x separately for cities 1 through 100
    local i=1
    while `i'<101 {
        sum x if city==`i'
        local i=`i'+1
    }
end
-----------------------------------------------------
Locals and macros: i in this example is a “local” macro, just a counter that goes from 1 to 100. Inside the loop, `i' (left quote, name, right quote) is replaced by the counter's current value.
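As a minimal sketch of the loop above that runs on its own, the following builds a small made-up data set (three cities instead of 100) and prints the mean of x for each city. The data values and the program name myloop are assumptions for illustration, not course material.

```stata
* Hypothetical data: 3 cities with 100 observations each
clear
set obs 300
gen city = ceil(_n/100)
gen x = uniform()

* Drop any earlier copy so the program can be redefined
capture program drop myloop
program define myloop
    local i = 1
    while `i' < 4 {
        quietly sum x if city == `i'
        di "city `i': mean of x = " r(mean)
        local i = `i' + 1
    }
end

* Run the program
myloop
```

For a simple counter like this, forvalues i = 1/3 { ... } is a shorter way to write the same loop.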
Entering data.
Use edit to enter data.
For example:
clear
set obs 10
gen x=1
gen y=1
Now, type edit (watch in Stata). I will enter a bunch of new values.
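If you would rather type the values as commands than use the editor, Stata's input command is one alternative. A minimal sketch with made-up values (these are not the numbers entered in class):

```stata
clear
* Name the variables, then type one row per observation; "end" finishes entry
input x y
1 2
2 4
3 5
4 4
5 5
end
list
```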
Let’s preview next week’s material and estimate the regression line.
Graph it first:
twoway (scatter y x), ti("A Nonsense Graph") note("enter note here") saving(graph1, replace)

regress y x
. regress y x

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =   35.70
       Model |      1745.7     1      1745.7           Prob > F      =  0.0003
    Residual |       391.2     8        48.9           R-squared     =  0.8169
-------------+------------------------------           Adj R-squared =  0.7940
       Total |      2136.9     9  237.433333           Root MSE      =  6.9929

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        4.6   .7698878     5.97   0.000     2.824635    6.375365
       _cons |       -5.4   4.777028    -1.13   0.291    -16.41585    5.615847
------------------------------------------------------------------------------
predict yhat
twoway (scatter y x) (scatter yhat x, c(l) s(.)) , ti("Another Nonsense Graph") note("With Regression Line") saving(graph2, replace)
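To see what predict does, the sketch below fits a line to five made-up points (not the class data) and rebuilds the fitted values by hand from the stored coefficients _b[_cons] and _b[x]. The variable name yhat_byhand is an assumption for illustration.

```stata
clear
input x y
1 2
2 4
3 5
4 4
5 5
end

regress y x
predict yhat

* Rebuild the fitted values from the stored coefficients
gen yhat_byhand = _b[_cons] + _b[x]*x

* The two columns should agree up to rounding
list x y yhat yhat_byhand
```

For these five points the fitted line is yhat = 2.2 + .6*x, so both columns lie on that line.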

Example of local smoothing:
use occpaper_iz
lowess varlog year, bwidth(.4) saving(occpaper_varlog, replace) ti("Figure 1: Variance of Log Wages, 1983-2006") /*
*/ note("note: line generated with lowess smoother, bandwidth .4. See text for details")
graph created by this command

Another example:
Figure 4: Hypotheses 1 and 2

Now let’s do the regression step by step. (We will learn this next week; I am just giving you the Stata commands here.)
sum
help egen
egen xmean=mean(x)
egen ymean=mean(y)
gen x1=x-xmean
gen y1=y-ymean
gen x1sq=(x1)^2
gen y1sq=(y1)^2
gen x1y1=x1*y1
list
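The columns just generated are the ingredients of the least-squares slope and intercept. As a sketch of where the commands are heading, with x1 = x - xmean and y1 = y - ymean (the symbols \hat{\alpha} and \hat{\beta} are notation assumed here, not taken from the lecture):

```latex
\hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
```

The numerator is the sum of x1y1 and the denominator is the sum of x1sq.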
Homework for next week:
 X | Y
---+---
 1 | 4
 2 | 3
 3 | 6
 4 | 7
 5 | 9
 6 | 7
The remaining columns of the table are left blank.
sum x1sq
help return list
return list
di r(sum)
global x1sq=r(sum)
* note could also use egen
sum x1y1
di r(sum)
global x1y1=r(sum)
di ${x1y1}/${x1sq}
* compare to results of reg command above.
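The comment about egen can be sketched as follows: egen's total() function stores each sum in a variable, so no globals are needed. This is a self-contained illustration on five made-up points (not the class or homework data); the names sum_x1sq, sum_x1y1 and slope are assumptions.

```stata
clear
input x y
1 2
2 4
3 5
4 4
5 5
end

* Deviations from the means
egen xmean = mean(x)
egen ymean = mean(y)
gen x1 = x - xmean
gen y1 = y - ymean
gen x1sq = x1^2
gen x1y1 = x1*y1

* Column sums, stored as variables (the same value in every row)
egen sum_x1sq = total(x1sq)
egen sum_x1y1 = total(x1y1)
gen slope = sum_x1y1/sum_x1sq
di slope[1]

* Compare with the coefficient on x reported by regress
regress y x
di _b[x]
```

The two displayed numbers should match up to rounding, which is the same comparison the globals make above.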
Next: let’s make a pretend data set of random variables. I’ll explain this step by step.
clear
set obs 1000
gen n=_n/1000
gen x=uniform()
gen y1=invnorm(n)
gen y2=invnorm(x)
gen y3=invnorm(uniform())
gen pdensity=normalden(y1)
twoway (scatter pdensity y1, c(l) s(i)), saving(lec11, replace)
twoway (scatter n y1, c(l) s(i)), saving(lec12, replace)
graph combine lec11.gph lec12.gph, saving(lec13, replace)

sum
* help functions
di invnorm(.7)
di invnorm(.9)
di normal(1.282)
di normalden(1.282)
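A short sketch of how these functions fit together. It uses normal(), invnormal() and normalden(), the current names of norm(), invnorm() and normden(); this assumes Stata 9 or later. The variable names u and z are made up for the illustration.

```stata
* invnormal() turns a probability into a z-value; normal() turns it back
di invnormal(.9)
di normal(invnormal(.9))

* normalden() is the height of the standard normal density;
* at 0 it equals 1/sqrt(2*pi)
di normalden(0)
di 1/sqrt(2*_pi)

* Uniform draws pushed through invnormal() behave like standard normal
* draws: the mean should be near 0 and the standard deviation near 1
clear
set obs 1000
gen u = uniform()
gen z = invnormal(u)
sum z
```

This is the same idea as y2 and y3 above: feeding uniform random numbers into the inverse of the normal distribution function produces normally distributed values.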