Lecture 26—Monday, February 27, 2006

What was covered?

Using Categorical Variables in Multiple Regression—Continued

A categorical variable with three levels—the wrong way to include it in a regression model

Occupation Type    Label
blue collar        1
professional       2
white collar       3
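To see why entering the numeric labels directly is a problem, here is a minimal sketch in R. The prestige values are hypothetical toy numbers made up for illustration, not the lecture's data, and the names `label`, `wrong`, `fitted_means`, and `observed_means` are likewise assumptions.

```r
# Hypothetical toy data, invented for illustration only.
occupation <- rep(c("blue collar", "professional", "white collar"), each = 3)
prestige   <- c(20, 25, 30, 70, 75, 80, 40, 45, 50)

# The wrong way: treat the labels 1, 2, 3 as a quantitative predictor.
label <- as.numeric(factor(occupation))
wrong <- lm(prestige ~ label)

# A single slope forces the fitted means to be equally spaced in label order,
# so the fitted professional mean must lie halfway between the other two.
fitted_means   <- as.numeric(tapply(fitted(wrong), occupation, mean))
observed_means <- as.numeric(tapply(prestige, occupation, mean))

print(rbind(observed = observed_means, fitted = fitted_means))
```

Comparing the two rows shows that the fitted means cannot reproduce the observed group means, because the arbitrary labels impose an ordering and a spacing that the categories do not have.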

A categorical variable with three levels—a correct way to include it in a regression model

Note: by default, R sorts the levels of a character variable alphabetically and builds the dummy regressors in that order. As a result, the first level alphabetically becomes the baseline level.

lm(prestige~education+occupation)

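With this call we obtain, by default, a regression model with dummy regressors for the professional and white collar levels and with blue collar as the baseline level.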
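The default ordering and baseline can be inspected directly. The sketch below assumes a small hypothetical `occupation` vector (not the lecture's data) and R's default contrast settings. `relevel()` is shown as one way to choose a different baseline, and the name `occ2` is an assumption.

```r
# Hypothetical occupation variable, for illustration only.
occupation <- factor(rep(c("blue collar", "professional", "white collar"),
                         each = 3))

# Levels are sorted alphabetically; the first one is the baseline.
print(levels(occupation))

# Dummy (treatment) coding: the baseline row is all zeros.
dummy_codes <- contrasts(occupation)
print(dummy_codes)

# One way to pick a different baseline: move a level to the front.
occ2 <- relevel(occupation, ref = "white collar")
print(levels(occ2))
```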

Coding Schemes of Regressors for Categorical Variables

lm(prestige~occupation)

Dummy (Indicator) Coding
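As a sketch of how dummy coding reads in practice, the example below fits a one-way model to hypothetical toy data (invented for illustration, not the lecture's data). Under R's default treatment contrasts, the intercept should equal the baseline (blue collar) mean and the other two coefficients should equal each level's difference from that baseline.

```r
# Hypothetical, deliberately unbalanced toy data.
occupation <- factor(rep(c("blue collar", "professional", "white collar"),
                         times = c(4, 2, 3)))
prestige   <- c(20, 25, 30, 25, 70, 80, 40, 45, 50)

# Default contrasts for an unordered factor are dummy (treatment) codes.
fit_dummy <- lm(prestige ~ occupation)
b <- unname(coef(fit_dummy))

# Group means, for comparison with the coefficients.
group_means <- as.numeric(tapply(prestige, occupation, mean))

print(b)                              # intercept and two differences
print(group_means)                    # blue collar, professional, white collar
print(group_means[-1] - group_means[1])  # differences from the baseline mean
```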

Deviation (Effects) Coding

So we see that the intercept in deviation coding corresponds to the unweighted mean of the three level means.
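The interpretation of the intercept can be checked numerically. This is a minimal sketch using hypothetical toy data (not the lecture's data) that are deliberately unbalanced, so that the mean of the level means differs from the overall mean of the observations. Deviation coding is requested here through `contr.sum`.

```r
# Hypothetical, deliberately unbalanced toy data.
occupation <- factor(rep(c("blue collar", "professional", "white collar"),
                         times = c(4, 2, 3)))
prestige   <- c(20, 25, 30, 25, 70, 80, 40, 45, 50)

# Switch from the default dummy coding to deviation (effects) coding.
contrasts(occupation) <- "contr.sum"
fit_dev <- lm(prestige ~ occupation)
b <- unname(coef(fit_dev))

group_means   <- as.numeric(tapply(prestige, occupation, mean))
mean_of_means <- mean(group_means)  # unweighted mean of the level means
overall_mean  <- mean(prestige)     # weighted by group size

print(c(intercept = b[1], mean_of_means = mean_of_means,
        overall_mean = overall_mean))
print(b[2:3])                           # deviations of the first two levels
print(group_means[1:2] - mean_of_means) # same quantities computed by hand
```

In this sketch the intercept matches the mean of the level means rather than the overall mean, and the remaining coefficients are the deviations of the first two levels from it.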

Helmert Coding

contrasts(type)<-'contr.helmert'

The best way to interpret the first Helmert coefficient is to observe that, if β1 is not significantly different from zero, we would conclude that the mean prestige for professionals and the mean prestige for blue collar workers are not significantly different from each other.

The best way to interpret the second Helmert coefficient is to observe that, if β2 is not significantly different from zero, we would conclude that the mean prestige of white collar workers is not significantly different from the average of the mean prestige for blue collar workers and the mean prestige for professionals.
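Both interpretations can be verified on a small example. The sketch below uses hypothetical toy data (not the lecture's data) and assumes the levels are in R's default alphabetical order. With `contr.helmert`, β1 should equal half the difference between the professional and blue collar means, and β2 should equal one third of the difference between the white collar mean and the average of the other two means.

```r
# Hypothetical, deliberately unbalanced toy data.
type     <- factor(rep(c("blue collar", "professional", "white collar"),
                       times = c(4, 2, 3)))
prestige <- c(20, 25, 30, 25, 70, 80, 40, 45, 50)

contrasts(type) <- "contr.helmert"
print(contrasts(type))   # columns (-1, 1, 0) and (-1, -1, 2)

fit_helmert <- lm(prestige ~ type)
b <- unname(coef(fit_helmert))

m <- as.numeric(tapply(prestige, type, mean))  # level means, in level order

# Hand-computed versions of the two Helmert coefficients.
beta1_by_hand <- (m[2] - m[1]) / 2
beta2_by_hand <- (m[3] - (m[1] + m[2]) / 2) / 3

print(rbind(fitted = b[2:3], by_hand = c(beta1_by_hand, beta2_by_hand)))
```

Because each coefficient is a fixed multiple of the corresponding difference in means, testing whether the coefficient is zero is equivalent to testing whether that difference is zero.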

Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised--March 2, 2006
URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture26.htm