Lecture 24—Wednesday, February 22, 2006
What was covered?
- The use of an offset in generalized linear models
- Deviance and Pearson residuals
- The uses of multiple regression in observational studies
Terminology Defined
The offset in count data regression models
- When count data are collected, the observed counts are not always directly comparable across observations.
- If the counts are obtained over time, the lengths of time, $t_i$, may vary from observation to observation.
- If the counts are obtained in space, the areas in which the counts occur, $A_i$, may vary between observations.
- Even if the time interval and area are standardized, population densities, $N_i$, may vary across sample units.
- Note in each of these cases it would make sense to work with the rate of occurrence—number of observations per unit time, number of observations per unit area, or a per capita rate.
- A standard approach to dealing with such a lack of equivalence, while still treating the response as count data, is to include what's called an offset in the regression model. An offset here means a term of the form $\log t_i$, $\log A_i$, or $\log N_i$ included in the model but with its coefficient constrained to equal 1. Thus an offset is a term for which no coefficient is estimated.
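- To understand what this is supposed to accomplish, consider the scenario where the counts are obtained from different sized areas. Suppose that our model has $p$ predictors, $x_1, x_2, \ldots, x_p$. We fit a count regression model for the mean (either as a Poisson or negative binomial regression) using a log link and an offset.
$$\log \mu_i = \log A_i + \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$$
- An equivalent way of writing this equation is the following.
$$\log\left(\frac{\mu_i}{A_i}\right) = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$$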
Thus by including an offset we end up fitting a model for the rate of occurrence, as was desired.
- The use of an offset then is just a trick that allows us to use Poisson or negative binomial regression, which are only appropriate for count data, to fit a rate model.
- Note: The inclusion of an offset is primarily for purposes of interpretation. If interpreting the response as a rate is not important but controlling for the lack of equivalence is, it is perfectly legitimate to include a time, area, or population term in the model as a covariate. When such a term is included as a covariate rather than as an offset, its regression coefficient is estimated instead of being fixed at 1. Also, when a variable is included as a covariate it may not be necessary or desirable to log-transform it first, as is done when it is entered as an offset.
- Covariate is the term used for a predictor that is included for control purposes only, i.e., to reduce bias in or to increase the precision of parameters of interest. Covariates generally are variables that are of no interest by themselves but are included solely because it is believed that to omit them would deleteriously alter the results.
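To make the offset concrete, here is a minimal sketch in Python (standard library only) of a Poisson regression with a log link in which $\log A_i$ enters the linear predictor with its coefficient fixed at 1. It fits the model by plain Newton-Raphson. The helper names `solve` and `fit_poisson_offset` and the small data set are hypothetical and chosen only for illustration. The "counts" are noise-free expected values generated from a known rate model, so the fit should recover the generating coefficients. In practice R's `glm()` accepts an offset directly, either through its `offset` argument or through an `offset()` term in the model formula.

```python
import math


def solve(a, b):
    """Solve the small dense linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_poisson_offset(X, y, offset, max_iter=50, tol=1e-10):
    """Fit log(mu_i) = offset_i + sum_j beta_j * X[i][j] by Newton-Raphson.

    The offset's coefficient is fixed at 1, so it is not estimated.
    Assumes the first column of X is all 1s (the intercept).
    """
    p = len(X[0])
    beta = [0.0] * p
    # Start from the intercept-only rate estimate log(sum y / sum exp(offset)).
    beta[0] = math.log(sum(y) / sum(math.exp(o) for o in offset))
    for _ in range(max_iter):
        eta = [o + sum(b * xj for b, xj in zip(beta, row)) for row, o in zip(X, offset)]
        mu = [math.exp(e) for e in eta]
        # Score X^T (y - mu) and Fisher information X^T diag(mu) X for the log link.
        score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu)) for j in range(p)]
        info = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data: one predictor x, sampling areas A_i of different sizes.
areas = [1.0, 2.0, 0.5, 4.0, 3.0]
x = [0.0, 1.0, 2.0, 3.0, 4.0]
# Noise-free expected counts from the rate model mu_i / A_i = exp(0.5 + 0.3 x_i).
y = [a * math.exp(0.5 + 0.3 * xi) for a, xi in zip(areas, x)]

X = [[1.0, xi] for xi in x]
offset = [math.log(a) for a in areas]  # log A_i enters with coefficient 1
b0, b1 = fit_poisson_offset(X, y, offset)
print(round(b0, 6), round(b1, 6))  # 0.5 0.3
```

Only $\beta_0$ and $\beta_1$ are estimated, and $\exp(\hat\beta_0 + \hat\beta_1 x)$ is the fitted rate per unit area. With an intercept alone, the score equation $\sum_i (y_i - A_i e^{\beta_0}) = 0$ gives the estimated rate $e^{\hat\beta_0} = \sum_i y_i / \sum_i A_i$.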
Residuals for generalized linear models
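- In the ordinary normal-based multiple regression model, we write
$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \varepsilon_i$$
where $\varepsilon_i$ is referred to as the error. When we obtain estimates for the parameters we write
$$y_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_p x_{pi} + e_i$$
so that
$$e_i = y_i - \hat{y}_i, \qquad \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_p x_{pi},$$
is an estimate of the error $\varepsilon_i$. The $e_i$ are called residuals, or more properly the raw residuals (also called the response residuals) of the model.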
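- For generalized linear models the raw residuals are seldom useful, because the variance of the response typically changes with its mean, so raw residuals of the same size can signal very different degrees of lack of fit. Instead, two other kinds of residuals, deviance residuals and Pearson residuals, are used.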
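For the normal-based model, the raw residual is simply observed minus fitted. Below is a minimal sketch for the simplest case of a single predictor, using the closed-form least-squares estimates; the function name `ols_raw_residuals` and the four data points are hypothetical.

```python
def ols_raw_residuals(x, y):
    """Least-squares fit of y_i = b0 + b1*x_i + e_i; returns (b0, b1, raw residuals e_i)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    # Raw (response) residual: observed value minus fitted value.
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, resid


b0, b1, e = ols_raw_residuals([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 4.0])
print(round(b0, 6), round(b1, 6), [round(r, 6) for r in e])  # 2.0 0.7 [-0.7, 0.6, 0.9, -0.8]
```

Because the model includes an intercept, the raw residuals sum to zero.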
Deviance residuals
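- Previously, for probability models from the exponential family, we called the expression
$$D = 2\left(\log \hat{L}_S - \log \hat{L}_M\right)$$
the deviance of the model, where $\hat{L}_S$ is the maximized likelihood of the saturated model and $\hat{L}_M$ is the maximized likelihood of the model of interest. If we define
$$d_i = 2\left[\log f(y_i; \tilde{\mu}_i) - \log f(y_i; \hat{\mu}_i)\right], \qquad \tilde{\mu}_i = y_i,$$
so that $D = \sum_{i=1}^{n} d_i$, then
$$r_{D,i} = \operatorname{sign}(y_i - \hat{\mu}_i)\sqrt{d_i}$$
is called a deviance residual. Here sign denotes the sign function, which is defined as follows.
$$\operatorname{sign}(x) = \begin{cases} 1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0 \end{cases}$$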
- The deviance residual measures the ith observation's contribution to the deviance. Since under ideal conditions the deviance can be used as a measure of lack of fit, the deviance residual measures the ith observation's contribution to model lack of fit.
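- For the Poisson model we found that the maximized log-likelihood of the fitted model is
$$\log \hat{L}_M = \sum_{i=1}^{n}\left(y_i \log \hat{\mu}_i - \hat{\mu}_i - \log y_i!\right).$$
In addition, for the saturated model each mean is estimated by the observation itself, $\tilde{\mu}_i = y_i$, so
$$\log \hat{L}_S = \sum_{i=1}^{n}\left(y_i \log y_i - y_i - \log y_i!\right).$$
Thus the deviance takes the following form for a Poisson regression model.
$$D = 2\sum_{i=1}^{n}\left[y_i \log\left(\frac{y_i}{\hat{\mu}_i}\right) - (y_i - \hat{\mu}_i)\right]$$
Thus the deviance residual for a Poisson model is the following, with the convention $0 \log 0 = 0$ when $y_i = 0$.
$$r_{D,i} = \operatorname{sign}(y_i - \hat{\mu}_i)\sqrt{2\left[y_i \log\left(\frac{y_i}{\hat{\mu}_i}\right) - (y_i - \hat{\mu}_i)\right]}$$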
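The Poisson deviance residual formula translates directly into code. The sketch below assumes we already have observed counts and fitted means from some Poisson regression; the function names and the four (y, fitted mean) pairs are hypothetical. It also checks that the squared deviance residuals add up to the deviance.

```python
import math


def sign(v):
    """Sign function: 1 if v > 0, -1 if v < 0, 0 if v == 0."""
    return (v > 0) - (v < 0)


def poisson_unit_deviance(y, mu):
    """d_i = 2[y log(y/mu) - (y - mu)], with the convention 0 log 0 = 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    # d_i is nonnegative in exact arithmetic; clip tiny negative rounding error.
    return max(0.0, 2.0 * (term - (y - mu)))


def poisson_deviance_residuals(y, mu):
    """r_D,i = sign(y_i - mu_i) * sqrt(d_i) for observed counts y and fitted means mu."""
    return [sign(yi - mi) * math.sqrt(poisson_unit_deviance(yi, mi)) for yi, mi in zip(y, mu)]


# Hypothetical observed counts and fitted means from some Poisson regression.
y = [0, 2, 5, 1]
mu = [0.5, 2.0, 3.0, 2.5]
r_dev = poisson_deviance_residuals(y, mu)
deviance = sum(poisson_unit_deviance(yi, mi) for yi, mi in zip(y, mu))
print(r_dev[0])  # -1.0, since d = 2 * (0 - (0 - 0.5)) = 1 when y = 0
```

An observation that the model fits exactly (here $y = \hat\mu = 2$) gets a deviance residual of 0, and the sign records whether the count lies above or below its fitted mean.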
Pearson residual
- The Pearson residual is a direct modification of the raw residual to account for the fact that in GLIMs the variance is a function of the mean.
$$r_{P,i} = \frac{y_i - \hat{\mu}_i}{\sqrt{\widehat{\operatorname{Var}}(y_i)}}$$
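- If we square the Pearson residual we have
$$r_{P,i}^2 = \frac{(y_i - \hat{\mu}_i)^2}{\widehat{\operatorname{Var}}(y_i)}$$
where for a distribution in the exponential family
$$\operatorname{Var}(y_i) = \phi\, V(\mu_i),$$
with $\phi$ a dispersion parameter and $V(\cdot)$ the variance function.
- Now for a Poisson model, $\phi = 1$ and $V(\mu_i) = \mu_i$, so $\operatorname{Var}(y_i) = \mu_i$. Thus in this case the squared Pearson residuals take the following form.
$$r_{P,i}^2 = \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}$$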
- Summing these squared residuals gives the Pearson chi-squared statistic, $X^2 = \sum_{i=1}^{n} r_{P,i}^2$, so each squared Pearson residual can be viewed as the contribution the $i$th observation makes to the Pearson statistic. We'll consider residual analysis more carefully in one of our computer sessions.
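The following sketch shows why scaling by the standard deviation matters. Two hypothetical observations have the same raw residual (3), but under a Poisson model one has fitted mean 1 and the other fitted mean 100. The function names are illustrative, not from any particular library.

```python
import math


def poisson_pearson_residuals(y, mu):
    """r_P,i = (y_i - mu_i) / sqrt(mu_i), because Var(y_i) = mu_i for a Poisson response."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]


def pearson_statistic(y, mu):
    """Pearson chi-squared statistic X^2: the sum of squared Pearson residuals."""
    return sum(r * r for r in poisson_pearson_residuals(y, mu))


# Two hypothetical observations with the same raw residual (3) but very different means.
y = [4, 103]
mu = [1.0, 100.0]
print(poisson_pearson_residuals(y, mu))  # [3.0, 0.3]
print(pearson_statistic(y, mu))
```

The same raw discrepancy is large relative to a mean of 1 but small relative to a mean of 100. The first observation therefore contributes 9 to $X^2$, and the second contributes only 0.09.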
The systematic component of GLIMs—Uses for multiple regression in observational studies
- The simplest way of describing multiple regression is to say that "multiple regression is a tool for analyzing observational studies."
- Essentially we use multiple regression to statistically control for those features that weren't accounted for in the design of the study.
- This description is somewhat misleading, because experimental data can also be analyzed using multiple regression. The analyses of standard experimental designs (one-way designs, randomized block designs, nested designs, etc.) are all variations on multiple regression.
- The most obvious difference between multiple linear regression and simple linear regression is that the effects of multiple predictors are treated in concert rather than individually. When viewed in this way, multiple regression can be a very powerful tool for scientific discovery.
- I focus here on three primary uses of multiple regression.
- Inferring causal relations
- Controlling precision
- Avoiding bias
- To illustrate some of these uses I consider the following ancient study from the field of sociology (Duncan 1961), which figures prominently in Fox's book on regression analysis using R (Fox 2002).
- The study made use of social survey data from 1961 in which the public perception of certain occupations was evaluated and related to various tangible characteristics of those occupations. The tangible characteristics were obtained from the 1950 U.S. census.
- The variables include the following.
- income: the percentage of individuals in each occupation who earned $3500 or more.
- education: the percentage of individuals in each occupation who were high school graduates.
- prestige: the percentage of 1961 survey respondents who rated the occupation as good or better.
- job type: a categorical variable classifying the occupations as blue collar (bc), white collar (wc), or professional and managerial (prof).
- We'll consider this study further next time.
Cited References
- Duncan, O. D. 1961. A socioeconomic index for all occupations. Pages 109–138 in A. J. Reiss, Jr. (Ed.), Occupations and Social Status. New York: Free Press.
- Fox, J. 2002. An R and S-Plus Companion to Applied Regression. Thousand Oaks, CA: Sage Publications.