# Lecture 22—Monday, February 20, 2006

### What was covered?

• Other link functions used in generalized linear models
• The deviance as a goodness of fit statistic
• Analysis of deviance

### Further details on probability distributions used in GLIMs

• The table below lists the probability distributions typically used as the random component in generalized linear models along with their canonical link functions. The table also lists other link functions available in R, the form of the variance function for each probability distribution, and the historic name for models now subsumed under the GLIM rubric.
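
| Probability distribution | Canonical link $g(\mu)$ | Other links supported in R | Variance function $V(\mu)$ | Historic name for these models |
|---|---|---|---|---|
| Poisson | log: $\log \mu$ | identity, sqrt | $\mu$ | Poisson regression, loglinear model |
| Normal (Gaussian) | identity: $\mu$ | log, inverse | 1 | ordinary linear regression |
| Binomial | logit: $\log \frac{\mu}{1-\mu}$ | probit, cloglog, log | $\mu(1-\mu)$ | logistic regression, probit analysis |
| Gamma | inverse: $\frac{1}{\mu}$ | identity, log | $\mu^2$ | gamma regression |
| Inverse Gaussian | $\frac{1}{\mu^2}$ | identity, inverse, log | $\mu^3$ | |
| "Negative Binomial" | $\log \frac{\mu}{\mu+\theta}$ ($\theta$ fixed) | log, sqrt, identity | $\mu + \frac{\mu^2}{\theta}$ | negative binomial regression |
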
• As noted previously the canonical link for the negative binomial is seldom used. The log link is typically used instead. Also, technically speaking, the negative binomial is not a member of the exponential family.
• Two of the links for the binomial, the probit and cloglog, deserve additional comment.
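• The probit is the inverse cumulative distribution function (cdf) of the standard normal distribution and is denoted $\Phi^{-1}$. For this to be a sensible choice of link function, its inverse must map the unrestricted linear predictor $\eta$ to a value of p in the interval (0, 1). We demonstrate this as follows.

$$\operatorname{probit}(p) = \Phi^{-1}(p) = \eta \;\Longrightarrow\; p = \Phi(\eta) = P(Z \le \eta), \quad Z \sim N(0, 1)$$

Since the last expression is a probability, and is strictly between 0 and 1 for any finite $\eta$, the desired bounds on p are attained.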

• Note: from this we see that the inverse cdf of any continuous distribution defined on the whole real line would work as a link.
• The probit is a popular link in drug assay analysis. Probit analysis was fairly important historically but has now been largely supplanted by logistic regression.
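The mapping argument for the probit can be checked numerically. The following is a minimal Python sketch, not part of the original lecture; the function names `probit` and `inverse_probit` are made up for illustration, and the standard normal cdf comes from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Standard normal distribution, Z ~ N(0, 1).
_std_normal = NormalDist(mu=0.0, sigma=1.0)

def probit(p):
    """Probit link: eta = Phi^{-1}(p), defined for 0 < p < 1."""
    return _std_normal.inv_cdf(p)

def inverse_probit(eta):
    """Inverse link: p = Phi(eta) = P(Z <= eta)."""
    return _std_normal.cdf(eta)

# Any finite value of the linear predictor lands strictly inside (0, 1).
for eta in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"eta = {eta:5.1f}  ->  p = {inverse_probit(eta):.10f}")
```

Because the inverse link is a cdf, it is also increasing in the linear predictor, so larger values of the predictor always correspond to larger probabilities.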
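• The complementary log-log (cloglog) link
• The cloglog link is defined as $g(p) = \log(-\log(1 - p))$. Its inverse, $p = 1 - \exp(-e^{\eta})$, also maps the linear predictor $\eta$ into the interval (0, 1).

• Since the linear predictor is unrestricted we have $-\infty < \log(-\log(1 - p)) < \infty$. Starting from this we can string together the following series of inequalities to show how p is mapped.

$$\begin{aligned} -\infty &< \log(-\log(1-p)) < \infty \\ 0 &< -\log(1-p) < \infty \\ -\infty &< \log(1-p) < 0 \\ 0 &< 1-p < 1 \\ 0 &< p < 1 \end{aligned}$$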
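The same bounds can be seen numerically. This is a small Python sketch under the standard definition of the cloglog link, $\log(-\log(1-p))$; the function names are hypothetical, and `math.expm1` and `math.log1p` are used only to reduce rounding error near the ends of the interval.

```python
import math

def cloglog(p):
    """Complementary log-log link: eta = log(-log(1 - p)), for 0 < p < 1."""
    return math.log(-math.log1p(-p))

def inverse_cloglog(eta):
    """Inverse link: p = 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))

# Unlike the logit and probit, the cloglog is not symmetric about p = 0.5:
# a linear predictor of 0 corresponds to p = 1 - exp(-1), not 0.5.
for eta in (-5.0, -1.0, 0.0, 1.0, 2.0):
    print(f"eta = {eta:5.1f}  ->  p = {inverse_cloglog(eta):.6f}")
```

In floating point arithmetic very large values of the linear predictor round to p = 1 exactly, so the strict inequality holds mathematically but not always on a computer.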

### The deviance as a goodness of fit test

• Historically a quantity called the deviance has played a large role in assessing the fit of generalized linear models. The deviance derives from the usual likelihood ratio statistic as I now illustrate.
• Suppose we have two nested models with estimated parameter sets $\hat{\theta}_1$ and $\hat{\theta}_2$ and corresponding likelihoods $L_1 = L(\hat{\theta}_1)$ and $L_2 = L(\hat{\theta}_2)$. If model 1 is a special case of model 2 then we can compare the models using a likelihood ratio test.

$$LR = 2 \log \frac{L_2}{L_1} = 2\left(\log L_2 - \log L_1\right)$$

Recall that LR has an asymptotic chi-squared distribution with degrees of freedom equal to the difference in the number of parameters estimated by the two models. Alternatively, the degrees of freedom is the number of parameters that are fixed to have a specific value in one model but are freely estimated in the other model.
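As an illustration, here is a minimal Python sketch of a likelihood ratio test for two nested Poisson models. The counts are made up, the function names are hypothetical, and the p-value helper handles only the 1 degree of freedom case, using the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

def poisson_loglik(ys, mus):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(ys, mus))

def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical counts from two groups (made up for illustration).
group_a = [2, 3, 1, 4, 2]
group_b = [5, 7, 6, 4, 8]
ys = group_a + group_b

# Model 1: one common mean. Model 2: a separate mean for each group.
# For the Poisson the maximum likelihood estimate of a mean is the sample mean.
mean_all = sum(ys) / len(ys)
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

loglik_1 = poisson_loglik(ys, [mean_all] * len(ys))
loglik_2 = poisson_loglik(ys, [mean_a] * len(group_a) + [mean_b] * len(group_b))

# Model 2 estimates one more parameter than model 1, so LR is referred
# to a chi-squared distribution with 1 degree of freedom.
lr = 2.0 * (loglik_2 - loglik_1)
p_value = chi2_sf_1df(lr)
print(f"LR = {lr:.3f}, p-value = {p_value:.4f}")
```

With only ten observations the chi-squared approximation is rough at best, so the p-value here should be read as a demonstration of the arithmetic rather than a trustworthy result.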

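• If the likelihoods in question are based on probability distributions from the exponential family, written in the form $f(y; \theta, \phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\}$, then the likelihood ratio statistic takes the following form, where $\hat{\theta}_{1i}$ and $\hat{\theta}_{2i}$ are the estimates of the canonical parameter for observation i under the smaller and larger models.

$$LR = 2 \sum_{i=1}^{n} \frac{y_i\left(\hat{\theta}_{2i} - \hat{\theta}_{1i}\right) - b(\hat{\theta}_{2i}) + b(\hat{\theta}_{1i})}{a(\phi)}$$

• In all the cases we've examined the function $a(\phi)$ has turned out to be a constant or a single parameter. In keeping with this let $a(\phi) = \phi$, usually called the scale parameter. Then the last expression becomes the following.

$$LR = \frac{2 \sum_{i=1}^{n} \left[ y_i\left(\hat{\theta}_{2i} - \hat{\theta}_{1i}\right) - b(\hat{\theta}_{2i}) + b(\hat{\theta}_{1i}) \right]}{\phi}$$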

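• Finally suppose the model with likelihood $L_2$ is the saturated model. A saturated model is one in which one parameter is estimated for each observation. We previously considered a saturated model in the context of formulating the $G^2$ test alternative to the Pearson chi-squared test. Let the model with likelihood $L_1$ be denoted model M.
• In the special case when the model being compared against is the saturated model, the quantity that appears in the numerator of the last expression is called the deviance or more specifically the residual deviance $D_M$. Here $\hat{\theta}_{2i}$ and $\hat{\theta}_{1i}$ are the canonical parameter estimates for observation i under the saturated model and model M, respectively.

$$D_M = 2 \sum_{i=1}^{n} \left[ y_i\left(\hat{\theta}_{2i} - \hat{\theta}_{1i}\right) - b(\hat{\theta}_{2i}) + b(\hat{\theta}_{1i}) \right]$$

• The deviance divided by the scale parameter is called the scaled deviance.

$$\text{scaled deviance} = \frac{D_M}{\phi}$$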
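For a concrete case, the Poisson deviance works out to $2\sum_i \left[ y_i \log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i) \right]$, a standard result that follows from the Poisson log-likelihood with φ = 1. The Python sketch below checks numerically that this matches twice the difference between the saturated and model log-likelihoods. The data and function names are made up for illustration.

```python
import math

def poisson_loglik(ys, mus):
    """Poisson log-likelihood, taking y*log(mu) to be 0 when y is 0."""
    total = 0.0
    for y, mu in zip(ys, mus):
        log_term = y * math.log(mu) if y > 0 else 0.0
        total += log_term - mu - math.lgamma(y + 1)
    return total

def poisson_deviance(ys, mus):
    """Residual deviance: 2 * sum of y*log(y/mu) - (y - mu)."""
    total = 0.0
    for y, mu in zip(ys, mus):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total

# Hypothetical counts (made up for illustration).
ys = [0, 3, 1, 4, 2, 5, 7, 6, 4, 8]

# Model M: a single common mean, estimated by the sample mean.
fitted = [sum(ys) / len(ys)] * len(ys)

# Saturated model: one parameter per observation, so each fitted mean is y itself.
saturated = [float(y) for y in ys]

deviance = poisson_deviance(ys, fitted)
from_logliks = 2.0 * (poisson_loglik(ys, saturated) - poisson_loglik(ys, fitted))
print(f"deviance = {deviance:.4f}, 2*(logLS - logLM) = {from_logliks:.4f}")
```

The two printed numbers agree up to rounding error, which is just the statement that the scaled deviance is a likelihood ratio statistic against the saturated model.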

• We can understand the role of the deviance in GLIMs as follows. In the saturated model n parameters are estimated. In model M there are p < n parameters that are estimated, or equivalently n – p parameters that were estimated in the saturated model but have been set to zero in Model M. Since the scaled deviance is a likelihood ratio statistic we know that, asymptotically,

$$\frac{D_M}{\phi} \sim \chi^2_{n-p}$$

Thus the deviance statistic can be used in a goodness of fit test. But just like the Pearson chi-squared test, the $G^2$ test, and the likelihood ratio test on which it is based, there are sample size and cell size issues that may make the asymptotic chi-squared distribution suspect in specific applications.

• The mean of a chi-squared distribution is equal to its degrees of freedom, i.e., $E(\chi^2_k) = k$. Thus if a model provides a good fit to the data we would expect its scaled deviance to be not too far from its mean value n – p. In other words we expect $D_M/\phi \approx n - p$. For Poisson and binomial probability models, φ = 1. Thus for these models we expect $D_M \approx n - p$ or equivalently $D_M/(n - p) \approx 1$.
• So a quick test of the adequacy of a Poisson or binomial model is to divide the residual deviance by its degrees of freedom and see if the result is close to 1. If the ratio is much larger than 1 we call the data overdispersed. If it is much smaller than 1 we call the data underdispersed. (The use of these terms tends to be restricted to Poisson and binomial distributions.) Unfortunately the accuracy of this test is somewhat dubious with small data sets.
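The quick check described above amounts to one division. Here is a minimal Python sketch for a Poisson model with a single common mean. The counts and the function names, including `dispersion_ratio`, are made up for illustration, and no cutoff for "much larger than 1" is implied.

```python
import math

def poisson_deviance(ys, mus):
    """Residual deviance for a Poisson model: 2 * sum of y*log(y/mu) - (y - mu)."""
    total = 0.0
    for y, mu in zip(ys, mus):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total

def dispersion_ratio(deviance, n_obs, n_params):
    """Residual deviance divided by its degrees of freedom, n - p."""
    return deviance / (n_obs - n_params)

# Hypothetical counts with many zeros and a few large values.
ys = [0, 1, 0, 12, 2, 0, 15, 1, 0, 9]

# Model M: one common mean (p = 1), estimated by the sample mean.
fitted = [sum(ys) / len(ys)] * len(ys)

ratio = dispersion_ratio(poisson_deviance(ys, fitted), n_obs=len(ys), n_params=1)
print(f"residual deviance / df = {ratio:.2f}")
```

For these made-up counts the ratio is far above 1, the pattern the notes call overdispersion, although with only ten observations the caveat about small data sets applies.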

### Analysis of deviance

• The likelihood ratio test is often written with a negative sign.

$$LR = -2 \log \frac{L_0}{L_1} = -2\left(\log L_0 - \log L_1\right)$$

Written this way L1 is the larger of the two models (more estimated parameters) and L0 is the simpler model in which some of those parameters have been assigned specific values (such as zero).

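• Written this way and letting $L_S$ denote the likelihood of the saturated model and $L_M$ the likelihood of model M, the scaled deviance could be written as follows.

$$\frac{D_M}{\phi} = -2 \log \frac{L_M}{L_S} = -2\left(\log L_M - \log L_S\right)$$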

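• Now suppose we wish to compare two nested models with likelihoods $L_1$ and $L_2$, with $L_2$ the "larger" (more estimated parameters) of the two likelihoods. The likelihood ratio statistic for comparing these two models, written in the alternate format given above, is the following.

$$LR = -2 \log \frac{L_1}{L_2} = -2 \log \frac{L_1}{L_S} - \left(-2 \log \frac{L_2}{L_S}\right) = \frac{D_1}{\phi} - \frac{D_2}{\phi} = \frac{D_1 - D_2}{\phi}$$

where $D_1$ and $D_2$ are the deviances of models 1 and 2, respectively, and $L_S$ is the likelihood of the saturated model.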

• Thus to carry out a likelihood ratio test to compare two models we can equivalently just compute the difference in their scaled deviances. Because of the parallel to analysis of variance (ANOVA) in ordinary linear models, carrying out an LR test using the scaled deviances of the models is often called analysis of deviance.
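The equivalence between the difference in deviances and the likelihood ratio statistic can be verified numerically. The Python sketch below does this for two nested Poisson models (φ = 1), using made-up counts and hypothetical function names. The saturated model cancels out of the difference, which is why it never has to be fit.

```python
import math

def poisson_loglik(ys, mus):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(ys, mus))

def poisson_deviance(ys, mus):
    """Residual deviance: 2 * sum of y*log(y/mu) - (y - mu)."""
    total = 0.0
    for y, mu in zip(ys, mus):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total

# Hypothetical counts from two groups (made up for illustration).
group_a = [2, 3, 1, 4, 2]
group_b = [5, 7, 6, 4, 8]
ys = group_a + group_b

# Model 1: one common mean. Model 2 (the larger model): one mean per group.
fitted_1 = [sum(ys) / len(ys)] * len(ys)
fitted_2 = ([sum(group_a) / len(group_a)] * len(group_a)
            + [sum(group_b) / len(group_b)] * len(group_b))

d1 = poisson_deviance(ys, fitted_1)
d2 = poisson_deviance(ys, fitted_2)

# Likelihood ratio statistic computed directly from the two log-likelihoods.
lr_direct = -2.0 * (poisson_loglik(ys, fitted_1) - poisson_loglik(ys, fitted_2))

print(f"D1 = {d1:.4f}, D2 = {d2:.4f}, D1 - D2 = {d1 - d2:.4f}")
print(f"LR from log-likelihoods = {lr_direct:.4f}")
```

The difference $D_1 - D_2$ is then compared to a chi-squared distribution with degrees of freedom equal to the number of extra parameters in model 2, here 1.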

 Jack Weiss Phone: (919) 962-5930 E-Mail: jack_weiss@unc.edu Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516 Copyright © 2006 Last Revised--Feb 26, 2006 URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture22.htm