Lecture 35 (lab 9)—March 21, 2006
What was covered?
- R topics
- empirical logit
- logistic regression with grouped binary data: toxicity data
- logistic regression with ungrouped binary data: a habitat suitability model for the corroboree frog
- choosing the structural form of the linear predictor
- goodness of fit test by grouping the x-variable
- Hosmer-Lemeshow goodness of fit test
- Hosmer-Lemeshow-Cressie goodness of fit test in Design package
R functions and commands demonstrated
- lrm is the logistic regression model function in the Design package
- residuals.lrm is the residuals method for the lrm function in the Design package
R function options
- family=binomial (argument to glm) for logistic regression
- cbind(Y,N) used in the formulation of the response variable for grouped binomial regression. Here Y denotes the variable recording the number of successes and N is the variable denoting the number of failures
- test='Chisq' (argument to anova) carries out a likelihood ratio test. If anova is given only one model the LR test tests individual predictors in the order they are specified in the model expression. If it is given two models that are nested, anova provides a LR test of the predictors that are contained in the larger model but omitted from the smaller model.
R libraries used
- DAAG contains the frogs dataset
- Design contains the lrm function
- faraway contains the bliss dataset
Grouped binary data
- Grouped binary data is not the norm in ecology. When it does arise it typically does so in an experimental setting. As an example we look at a toxicity data set first analyzed in Bliss (1935). Bliss reported the number of beetles dying (dead) after five hours of exposure at various levels of gaseous carbon disulphide (conc) measured on a log base 2 scale. The data are available in the faraway package as bliss.
> #obtain data
> library(faraway)
> data(bliss)
> bliss
dead alive conc
1 2 28 0
2 8 22 1
3 15 15 2
4 23 7 3
5 27 3 4
- These are grouped binary data because what is recorded are not the 0-1 outcomes for individual beetles but the summarized result, the number of dead beetles and alive beetles at various dosage levels. In R we analyze these data by using as the response a matrix in which the number of successes appears as the first column and the number of failures occurs as the second column. We can create this matrix on the fly with the cbind function. The argument family=binomial is used specify the binomial random component. We model the number of dead beetles as a function of concentration as follows.
> out1<-glm(cbind(dead,alive)~conc,data=bliss,family=binomial)
> summary(out1,corr=FALSE)
Call:
glm(formula = cbind(dead, alive) ~ conc, family = binomial, data = bliss)
Deviance Residuals:
1 2 3 4 5
-0.4510 0.3597 0.0000 0.0643 -0.2045
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3238 0.4179 -5.561 2.69e-08 ***
conc 1.1619 0.1814 6.405 1.51e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 64.76327 on 4 degrees of freedom
Residual deviance: 0.37875 on 3 degrees of freedom
AIC: 20.854
Number of Fisher Scoring iterations: 4
- In the summary table the column Pr(>|z|) contains the p-values for individual Wald tests. The row labeled conc is a test of whether the regression coefficient of concentration is significantly different from zero. Since the p-value is small we conclude that there is a significant linear relationship on the log odds scale between the proportion dying and the log concentration of carbon disulphide. To quantify this relationship we exponentiate the coefficient of conc.
> exp(coef(out1)[2])
conc
3.195984
- The exponentiated value of 3.196 is interpretable as an odds ratio. It is the multiplicative increase in the odds of dying due to a one unit increase (on a log scale) in carbon disulphide. Since the log scale is base two, each one unit increase corresponds to a doubling of the concentration. Thus when the concentration is doubled, the odds of dying increases by a factor of 3.196.
- To obtain a likelihood ratio test of the result we can use the anova function with the test='Chisq' option.
> anova(out1,test='Chisq')
Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(dead, alive)
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL 4 64.763
conc 1 64.385 3 0.379 1.024e-15
Although the results are consistent with the Wald test, notice that the obtained significance level is very different.
- To examine the assumption of linearity on the log odds scale we can plot the empirical logit against the predictor, in this case concentration. Let pi denote the observed success probability in group i, let ni be the total number of observations in group i, and let yi be the observed number of successes in group i. Then the logit can be written as follows.

- If it turns out that any of the groupings yield 100% successes or 100% failures then the logit is undefined. To correct for this possibility the empirical logit is usually defined as follows.

- I plot the empirical logit (as points) below and superimpose the estimated logit from the regression model. The fit is clearly outstanding.
> plot(bliss$conc, log((bliss$dead+1/2)/(bliss$alive+1/2)), xlab='concentration', ylab='logit')
> abline(coef(out1),col=2)
- As another way to graphically assess fit we can also plot the logistic curve superimposed on a scatter plot of the observed probabilities of dying as a function of concentration. Recall from lecture that

> plot(bliss$conc, bliss$dead/(bliss$dead+bliss$alive), xlab='Concentration',
ylab='probability')
> lines(seq(0,4,.1), 1/(1+exp(-coef(out1)[1]-coef(out1)[2]*seq(0,4,.1))), col=2)

Fig. 1 Estimated logit and probability functions for logistic regression model with grouped binary data
- A formal goodness of fit test compares the observed and expected numbers of successes and failures at the different concentration levels using the Pearson test. The degrees of freedom for the chi-squared distribution are the number of groups minus the number of estimated parameters.
#obtain group sample sizes
> ni<-apply(bliss[,1:2],1,sum)
> ni
1 2 3 4 5
30 30 30 30 30
#calculate predicted probabilities
> 1/(1+exp(-coef(out1)[1]-coef(out1)[2]*0:4))
[1] 0.08917177 0.23832314 0.50000000 0.76167686 0.91082823
> 1/(1+exp(-coef(out1)[1]-coef(out1)[2]*0:4))->phat
#calculate expected values
> cbind(ni*phat,ni-ni*phat)->Ei
> Ei
[,1] [,2]
1 2.675153 27.324847
2 7.149694 22.850306
3 15.000000 15.000000
4 22.850306 7.149694
5 27.324847 2.675153
#obtain Oi and calculate Pearson
> Oi<-bliss[,1:2]
> pearson<-sum((Oi-Ei)^2/Ei)
> pearson
[1] 0.3672674
> 1-pchisq(pearson,dim(bliss)[1]-2)
[1] 0.9469181
- As was noted in lecture, we can also obtain the Pearson statistic by extracting the Pearson residuals, squaring them, and adding them up.
#compare to sum of pearson residuals squared
> residuals(out1,type='pearson')
1 2 3 4 5
-0.43252342 0.36437292 0.00000000 0.06414687 -0.20810684
> sum(residuals(out1,type='pearson')^2)
[1] 0.3672674
- Similarly if instead we calculate the sum of squared deviance residuals we get the goodness-of-fit G2 statistic, i.e., the goodness of fit test based on the likelihood ratio statistic.
#try G2 test using formula
> sum(2*Oi*log(Oi/Ei))
[1] 0.3787483
#look at sum of deviance residuals squared
> sum(residuals(out1,type='deviance')^2)
[1] 0.3787483
- This last statistic, the residual deviance, is also part of the standard output from the logistic regression.
#look at output--the residual deviance is the G2 goodness of fit test.
> summary(out1)
Call:
glm(formula = cbind(dead, alive) ~ conc, family = binomial, data = bliss)
Deviance Residuals:
1 2 3 4 5
-0.4510 0.3597 0.0000 0.0643 -0.2045
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3238 0.4179 -5.561 2.69e-08 ***
conc 1.1619 0.1814 6.405 1.51e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 64.76327 on 4 degrees of freedom
Residual deviance: 0.37875 on 3 degrees of freedom
AIC: 20.854
Number of Fisher Scoring iterations: 4
Ungrouped binary data—Habitat suitability modeling
- As an example of analyzing ungrouped binary data we examine habitat suitability data for the rare Corroboree frog of Australia. The data are from a student's Master's thesis at the University of Canberra, Hunter (2000), and can be found in the DAAG library as the dataset frogs.
> library(DAAG)
> data(frogs)
> names(frogs)
[1] "pres.abs" "northing" "easting" "altitude" "distance"
[6] "NoOfPools" "NoOfSites" "avrain" "meanmin" "meanmax"
- The help screen includes a brief description of what the various variables mean.
> help(frogs)

- Discussion of the biology of the corroboree frog along with videos of the animal can be found at the following web sites.
http://www.kidcyber.com.au/topics/frog_corrob.htm
http://www.abc.net.au/science/scribblygum/june2004/
http://www.arkive.org/species/GES/amphibians/Pseudophryne_corroboree/GES005645.html?size=large
- For this exercise we will use only a single variable to predict the presence or absence of frogs at a site, the variable meanmin, the mean minimum Spring temperature. Habitat suitability data have a spatial component reflected by definition here recorded in the variables northing and easting. We begin by plotting the presence-absence variable as a function of its spatial coordinates where we color code the absences as red and the presence values as blue.
> plot(frogs$easting, frogs$northing, pch=1, col=2*(frogs$pres.abs+1))
- From the plot we see that there is an obvious clustering of the positive frog locations in the south and east. Technically after fitting a model we should examine the residuals for any lingering spatial correlation.
- In fitting a logistic regression model of pres.abs against meanmin, we first need to choose the structural form for the regressor. I illustrate two different ways to do this: first on the original scale of the data, and secondly on the logit scale. Recall that on the presence-absence scale we expect to see the standard S-shaped curve of the logistic function. If we fit a locally weighted regression to the presence-absence data and display the curve that results we see that the graph turns back on itself (Fig. 4).
> plot(frogs$meanmin, frogs$pres.abs)
#perhaps quadratic?
> lines(lowess(frogs$pres.abs~frogs$meanmin), col=1,lty=3)
- We can get a clearer picture of the correct structural form for the model by plotting on the logit scale. Since our data are not grouped we'll need to group them artificially in order to calculate the logit. With 212 observations, deciles should be adequate to obtain a reasonable picture.
> dim(frogs)
[1] 212 10
> meanmin.cuts<-cut(frogs$meanmin, quantile(frogs$meanmin, seq(0,1,.1)), include.lowest=TRUE)
> table(meanmin.cuts)
meanmin.cuts
[2.03,2.4] (2.4,2.5] (2.5,2.73] (2.73,2.87] (2.87,3] (3,3.25]
29 19 18 21 20 20
(3.25,3.47] (3.47,3.63] (3.63,4.13] (4.13,4.33]
25 18 24 18
- I obtain counts of sites where the frog was present by summing the presence-absence variable for each of the grouped values of meanmin.
> tapply(frogs$pres.abs,meanmin.cuts,sum)->yi
> yi
[2.03,2.4] (2.4,2.5] (2.5,2.73] (2.73,2.87] (2.87,3] (3,3.25]
2 4 3 4 8 9
(3.25,3.47] (3.47,3.63] (3.63,4.13] (4.13,4.33]
15 14 13 7
- Finally I calculate the empirical logit and then plot it against the midpoint of the different meanmin intervals. The midpoint is easily calculated by just adding the decile values to a one unit frame shift of the deciles and then dividing by 2. (This takes the left hand endpoint and adds it to the right hand endpoint of each interval.)
> emp.logit<-log((yi+1/2)/(table(meanmin.cuts)-yi+1/2))
> midpt<-(seq(0,1,.1)[1:10]+seq(0,1,.1)[2:11])/2
> plot(midpt,emp.logit)
> plot(midpt,emp.logit,xlab='meanmin',ylab='empirical logit')
> lines(lowess(emp.logit~midpt),col=2)

Fig. 4 Determining the correct structural form of the regressor: on the original scale (left) and the logit scale (right)
- Both approaches yield a consistent picture, a quadratic model seems appropriate. I fit both the linear and quadratic models and compare them graphically and statistically.
#fit linear model
> glm(pres.abs~meanmin,data=frogs,family=binomial)->out0
> summary(out0)
Call:
glm(formula = pres.abs ~ meanmin, family = binomial, data = frogs)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5683 -0.8739 -0.6552 1.1951 1.8758
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.3465 0.8257 -5.264 1.41e-07 ***
meanmin 1.2070 0.2531 4.769 1.85e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 279.99 on 211 degrees of freedom
Residual deviance: 254.40 on 210 degrees of freedom
AIC: 258.4
Number of Fisher Scoring iterations: 4
#fit quadratic model
> glm(pres.abs~meanmin+I(meanmin^2), data=frogs, family=binomial)->out0b
> summary(out0b)
Call:
glm(formula = pres.abs ~ meanmin + I(meanmin^2), family = binomial,
data = frogs)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3660 -1.0032 -0.4631 1.0583 2.3412
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -21.1754 5.2810 -4.010 6.08e-05 ***
meanmin 11.6649 3.1900 3.657 0.000255 ***
I(meanmin^2) -1.5743 0.4717 -3.337 0.000846 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 279.99 on 211 degrees of freedom
Residual deviance: 241.76 on 209 degrees of freedom
AIC: 247.76
Number of Fisher Scoring iterations: 5
- Observe that the quadratic term is statistically significant and the AIC has dropped from 258.4 to 247.76. Both results suggest a quadratic term is needed. I next graph the presence-absence data against meanmin, add both the linear and quadratic models to the plot, and complete the picture with a lowess curve.
> plot(frogs$meanmin,frogs$pres.abs, xlab='meanmin', ylab='presence/absence')
> #add linear model
> phat<-function(x) 1/(1+exp(-coef(out0)[1] - coef(out0)[2]*x))
> lines(seq(2,4.5,.01), phat(seq(2,4.5,.01)), col=2)
> #add quadratic
> phat2<-function(x) 1/(1+exp(-coef(out0b)[1] - coef(out0b)[2]*x - coef(out0b)[3]*x^2))
> lines(seq(2,4.5,.01), phat2(seq(2,4.5,.01)), col=4)
> #add lowess
> lines(lowess(frogs$pres.abs ~ frogs$meanmin), col=1, lty=3)
> legend(2.2, .9, c('linear', 'quadratic', 'lowess'), col=c(2,4,1), lty=c(1,1,3), bty='n', cex=c(.9,.9,.9))
Goodness of fit tests for ungrouped binary data
Grouping by values of the predictor
- When there is only one predictor in a logistic regression model, grouping by values of the predictor is a reasonable option. Typically there will be not be enough repeated values so the range of the predictor will have to be cut into categories. When there is more than one predictor grouping on the predictors is no longer practical.
- The strategy is simple. In each predictor category add up the predicted probabilities to obtain the expected number of successes. Subtract this from the total number of observations in that category to get the expected number of failures. Carry out a Pearson chi-square test comparing both the observed and expected number of successes and failures. The degrees of freedom for the test are the number of original categories of the predictor minus the number of estimated parameters. As usual minimum expected cell size requirements still need to be satisfied. No more than 20% of the expected cell counts should be less than 5.
- With 212 observations ten categories might be a reasonable choice.
> cut(frogs$meanmin, quantile(frogs$meanmin, seq(0,1,.2)), include.lowest=TRUE)->test.cuts2
> table(frogs$pres.abs,test.cuts2)
test.cuts2
[2.03,2.5] (2.5,2.87] (2.87,3.25] (3.25,3.63] (3.63,4.33]
0 42 32 23 14 22
1 6 7 17 29 20
- The observed cell counts meet the minimum cell size requirements but it is the expected counts that matter. I compute them next by summing the fitted values (the predicted probabilities) in each category.
> tapply(fitted(out0b),test.cuts2,sum)->np
> np
[2.03,2.5] (2.5,2.87] (2.87,3.25] (3.25,3.63] (3.63,4.33]
4.117692 10.161348 17.619857 24.697414 22.403690
- Next I subtract the expected number of successes from the total number of observations in each category to obtain the total number of failures.
- > table(test.cuts2)-np->fail
> rbind(fail,np)->Ei
> table(frogs$pres.abs,test.cuts2)->Oi
> Ei
[2.03,2.5] (2.5,2.87] (2.87,3.25] (3.25,3.63] (3.63,4.33]
fail 43.882308 28.83865 22.38014 18.30259 19.59631
np 4.117692 10.16135 17.61986 24.69741 22.40369
test.cuts2
[2.03,2.5] (2.5,2.87] (2.87,3.25] (3.25,3.63] (3.63,4.33]
0 42 32 23 14 22
1 6 7 17 29 20
- In the expected table only one cell count is under 5 and not by much. Since 1 out of 10 is 10%, we satisfy the 20% criterion and we can proceed to do the test. The degrees of freedom for the chi-square distribution is the number of columns of our table, 5, minus the number of parameters estimated in our model, 3.
> sum((Oi-Ei)^2/Ei)
[1] 4.624012
> 1-pchisq(sum((Oi-Ei)^2/Ei),df=5-3)
[1] 0.09906236
- Our test is not significant so we don't have evidence of lack of fit. At this point it might be interesting to see if the quadratic term made a difference in our fit. I redo the goodness of fit test using the model that is only linear in meanmin.
> tapply(fitted(out0),test.cuts2,sum)->np2
> apply(table(frogs$pres.abs,test.cuts2),2,sum)-np2->fail2
> rbind(fail2,np2)->Ei2
> Ei2
[2.03,2.5] (2.5,2.87] (2.87,3.25] (3.25,3.63] (3.63,4.33]
fail2 39.278011 28.83784 26.30426 23.55765 15.02225
np2 8.721989 10.16216 13.69574 19.44235 26.97775
- All the expected cells counts are greater than 5 so I can proceed to the test. We have one more degree of freedom for this test.
> sum((Oi-Ei2)^2/Ei2)
[1] 17.20310
> 1-pchisq(sum((Oi-Ei2)^2/Ei2),df=5-2)
[1] 0.0006419176
- The quadratic term mattered. We now have evidence for a significant lack of fit.
Hosmer-Lemeshow Test
- When there is more than one predictor, grouping by values of the predictor is not a reasonable approach since we'd have to divide the data set into categories using more than one variable simultaneously. The Hosmer-Lemeshow test is essentially the only option in this case. The Hosmer-Lemeshow test rather than dividing the data into groups using levels of the predictor, divides the data set into groups based on categories formed from the predicted logits (or equivalently the predicted probabilities). Simulations suggest that the number of groupings should be 5 (if there isn't much data) or 10. The test statistic is once again a Pearson chi-square statistic. The degrees of freedom for the chi-square distribution is always two less than the number of groupings regardless of how many parameters have been estimated. Minimum cell size restrictions must still be satisfied.
- Since we have plenty of data, I try dividing the predicted logit into deciles.
> logit.groups<-cut(predict(out0b), quantile(predict(out0b), seq(0,1,.1)), include.lowest=TRUE)
> table(logit.groups)
logit.groups
[-3.97,-2.25] (-2.25,-1.85] (-1.85,-1.05] (-1.05,-0.673]
29 19 18 21
(-0.673,-0.349] (-0.349,-0.00651] (-0.00651,0.131] (0.131,0.21]
20 27 17 18
(0.21,0.385] (0.385,0.433]
21 22
- I obtain the observed number of successes and failures in each of these categories.
> table(frogs$pres.abs,logit.groups)->Oi
- Next I obtain the expected number of successes and failures.
> tapply(fitted(out0b),logit.groups,sum)->success
> tapply(fitted(out0b),logit.groups,length)->ni
> ni-success->failure
> rbind(failure,success)->Ei
> Ei
[-3.97,-2.25] (-2.25,-1.85] (-1.85,-1.05] (-1.05,-0.673]
failure 27.205735 16.676574 14.378504 14.460148
success 1.794265 2.323426 3.621496 6.539852
(-0.673,-0.349] (-0.349,-0.00651] (-0.00651,0.131] (0.131,0.21]
failure 12.096493 14.11400 8.170336 8.249696
success 7.903507 12.88600 8.829664 9.750304
(0.21,0.385] (0.385,0.433]
failure 8.90155 8.746968
success 12.09845 13.253032
- Three of the expected cell counts are too small. Since this is less than 20% of the cell counts I don't bother to merge cells. I carry out the test.
> sum((Oi-Ei)^2/Ei)
[1] 7.567172
> 1-pchisq(sum((Oi-Ei)^2/Ei),df=8)
[1] 0.4768487
- So the Hosmer Lemeshow test agrees with our goodness of fit test results based on grouping by the x-variables.
The Hosmer-Lemeshow-Cressie test in the Design package
- Frank Harrell's Design package carries out a variation of the Hosmer-Lemeshow test. In advertising this version, Frank Harrell reports "The H-L test is largely obsolete, having been replaced by specific directed tests of lack of fit." To use this test we have to refit the logistic regression model using the lrm function in the Design package. This function does not support the I() function so we have to first create the quadratic term outside of the model. The x=TRUE and y=TRUE arguments to the lrm function are necessary for what we wish to do.
- We then follow this by applying the residuals.lrm function to the fitted model using the 'gof' option for the type argument. The various steps are shown below.
> library(Design)
> #create quadratic term for use in lrm function
> meanmin2<-frogs$meanmin^2
> out.harrell<-lrm(pres.abs~meanmin+meanmin2, data=frogs, x=TRUE, y=TRUE)
> residuals.lrm(out.harrell,type='gof')
Sum of squared errors Expected value|H0 SD Z P
41.2834597 41.6627336 0.3486647 -1.0877900 0.2766878
- The column labeled P is the p-value for the test. As we can see it is not significant. So all three of the tests we've carried out fail to find a significant lack of fit.
Cited References
- Bliss, C. I. 1935. The calculation of the dosage-mortality curve. Annals of Applied Biology 22: 134–167.
- Hunter, D. (2000) The conservation and demography of the southern corroboree frog (Pseudophryne corroboree). M.Sc. thesis, University of Canberra, Canberra.
Course Home Page