Stata lecture 1

 

link to do file for this lecture

 

We will be using Stata as a tool to do statistics and develop an intuition about the models we are learning. 

 

I will tell you everything you need to know about Stata for the class as we progress through the course.

 

Ask for help on Stata when you need it.  Do not spend hours in silent frustration (or worse).  If you get stuck, just stop and ask me the next day.

 

A few key points before you ask for help:

1.  Start Stata.

2.  Set the working directory to the directory where the data and program files are located.

In Stata, type:  pwd

to see what working directory you are in.  To change it, type

cd  c:\analysis

for example.

3.  Use the Stata help.  Type help followed by a command name, or search followed by a keyword (lookup in older versions of Stata).

 

 

Think of Stata as a new language with about 150 new words (i.e. commands) that you need to learn.

 

do files and ado files

 

A do file is a list of Stata commands; an ado file is a new, user-written Stata program.

A key benefit of ado files is that they let you wrap a loop in a command of your own, such as:

 

----------myprogram.ado----------------

program define myprogram

* summarize x separately for each city code from 1 to 100
local i=1

while `i'<101 {

            sum x if city==`i'

            local i=`i'+1

            }

end

-----------------------------------------------------

 

Locals and macros:  i in this example is a "local" macro, used here as a counter that goes from 1 to 100.  To use its value, enclose the name in a left quote and an apostrophe, as in `i'.
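The same kind of loop can be written more compactly with forvalues, which manages the counter for you.  The sketch below runs on its own: it builds a small made-up data set (the file name, the variables x and city, and the three cities are illustrative assumptions, not the lecture data) and summarizes x city by city.

----------loop_example.do----------------

clear
set obs 30
* ten observations in each of three made-up cities
generate city = ceil(_n/10)
generate x = _n
* the local macro i takes the values 1, 2, 3 in turn
forvalues i = 1/3 {
            display "city `i'"
            summarize x if city == `i'
            }

-----------------------------------------------------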

 

Entering data.

 

Use edit to enter data.

 

For example:

 

clear

set obs 10

gen x=1

gen y=1

 

Now type edit (watch in Stata).  I will enter a bunch of new values.
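Values typed into the editor leave no record of what was entered.  If you want the data entry to be repeatable, one option is the input command in a do file.  A minimal sketch follows; the file name is illustrative and the numbers are arbitrary placeholders, not the values entered in class.

----------enter_data.do----------------

clear
* one line per observation, in the order of the variable names
input x y
1 2
2 5
3 4
4 8
end
list

-----------------------------------------------------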

 

Let’s preview next week’s material and estimate the regression line.

 

Graph it first:

twoway (scatter y x), ti("A Nonsense Graph")  note("enter note here") saving(graph1, replace)

 

 

regress y x

 

. regress y x

 

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =   35.70
       Model |      1745.7     1      1745.7           Prob > F      =  0.0003
    Residual |       391.2     8        48.9           R-squared     =  0.8169
-------------+------------------------------           Adj R-squared =  0.7940
       Total |      2136.9     9  237.433333           Root MSE      =  6.9929

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        4.6   .7698878     5.97   0.000     2.824635    6.375365
       _cons |       -5.4   4.777028    -1.13   0.291    -16.41585    5.615847
------------------------------------------------------------------------------

 

predict yhat

twoway (scatter y x) (scatter yhat x, c(l) s(.)) , ti("Another Nonsense Graph")  note("With Regression Line") saving(graph2, replace)
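Several numbers in the regress output are linked by simple arithmetic, and you can check them with display.  In the sketch below, the first three commands use figures copied from the output table; the last three assume regress y x has just been run, so that the stored results e(r2), e(F), _b[x] and _se[x] are available.  The file name is illustrative.

----------check_output.do----------------

* R-squared = Model SS / Total SS
display 1745.7/2136.9
* F = Model MS / Residual MS
display 1745.7/48.9
* t for x = coefficient / standard error
display 4.6/.7698878

* the same quantities taken from the stored results
display e(r2)
display e(F)
display _b[x]/_se[x]

-----------------------------------------------------

Each result should agree with the corresponding entry in the table, up to rounding.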

 

 

 

Example of local smoothing:

use occpaper_iz

lowess varlog year, bwidth(.4) saving(occpaper_varlog, replace) ti("Figure 1: Variance of Log Wages, 1983-2006") /*

*/ note("note: line generated with lowess smoother, bandwidth .4.  See text for details")

 

graph created by this command
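If you do not have occpaper_iz at hand, you can try lowess on the auto data that ships with Stata.  This is only a sketch: the choice of mpg and weight and the file name are mine, purely for illustration.  A smaller bwidth() follows the data more closely; a larger one gives a smoother line.

----------lowess_example.do----------------

sysuse auto, clear
* narrower bandwidth: the line bends more with the data
lowess mpg weight, bwidth(.4) title("Lowess smoother, bandwidth .4")
* wider bandwidth: a smoother line
lowess mpg weight, bwidth(.8) title("Lowess smoother, bandwidth .8")

-----------------------------------------------------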

 

 

Another example:

 

Figure 4:  Hypotheses 1 and 2

 

 

 


 

 

 

 

Now let’s do the regression step by step.  We will learn this next week; I am just giving you the Stata commands here.

 

 

sum

help egen

egen xmean=mean(x)

egen ymean=mean(y)

 

gen x1=x-xmean

gen y1=y-ymean

 

gen x1sq=(x1)^2

gen y1sq=(y1)^2

 

gen x1y1=x1*y1

 

list

 

Homework for next week:

 

   X     Y     ( ) * ( )

   1     4

   2     3

   3     6

   4     7

   5     9

   6     7

 

sum x1sq

help return list

return list

di r(sum)

global x1sq=r(sum)

 

* note could also use egen

sum x1y1

di r(sum)

global x1y1=r(sum)

 

di  ${x1y1}/${x1sq}

 

* compare to results of reg command above.
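The ratio displayed above is the slope: the sum of the cross-products of the deviations divided by the sum of the squared deviations of x.  The intercept then follows as the mean of y minus the slope times the mean of x.  A sketch, assuming x, y, x1sq and x1y1 from the steps above are still in memory and that regress y x was the last estimation command (the file name and scalar names are illustrative):

----------slope_intercept.do----------------

quietly summarize x1y1
scalar sxy = r(sum)
quietly summarize x1sq
scalar sxx = r(sum)
* slope = sum of cross-products / sum of squared deviations of x
scalar slope = scalar(sxy)/scalar(sxx)

quietly summarize x
scalar xbar = r(mean)
quietly summarize y
scalar ybar = r(mean)
* intercept = mean of y - slope * mean of x
scalar cons = scalar(ybar) - scalar(slope)*scalar(xbar)

display "by hand:  slope = " scalar(slope) "   intercept = " scalar(cons)
display "regress:  slope = " _b[x] "   intercept = " _b[_cons]

-----------------------------------------------------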

 

 

Next:  let’s make a pretend data set of random variables.

I’ll explain this step by step.

 

 

clear

set obs 1000

gen n=_n/1000

 

gen x=uniform()

 

gen y1=invnorm(n)

gen y2=invnorm(x)

gen y3=invnorm(uniform())

 

gen pdensity=normalden(y1)

 

twoway (scatter pdensity y1, c(l) s(i)), saving(lec11, replace)

twoway (scatter n y1, c(l) s(i)), saving(lec12, replace)

graph combine lec11.gph lec12.gph, saving(lec13, replace)

 

 

sum

 

* help functions

 

di invnorm(.7)

di invnorm(.9)

di norm(1.282)

di normalden(1.282)
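Step by step: n runs evenly from .001 to 1, so invnorm(n) traces out the quantiles of the standard normal in order.  x holds uniform random numbers, so invnorm(x) and invnorm(uniform()) are random draws from the standard normal.  pdensity is the normal density at each value of y1, so the first graph is the bell curve and the second is the cumulative distribution.  The sketch below repeats the key step with the newer function names runiform(), invnormal(), normal() and normalden(); the file name, seed and variable names are arbitrary choices.

----------random_example.do----------------

clear
* fix the seed so the draws are the same on every run
set seed 12345
set obs 1000
generate u = runiform()
* inverse-CDF transform: uniform draws become standard normal draws
generate z = invnormal(u)
* the mean should be near 0 and the standard deviation near 1
summarize z
* normal() undoes invnormal(), so this should give back .9
display normal(invnormal(.9))
display normalden(1.282)

-----------------------------------------------------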