Read in the data.
> shells<-read.table('http://www.unc.edu/courses/2006spring/ecol/145/001/data/final/problem1.csv', header=TRUE, sep=',')
The crucial thing to recognize in this problem is that these data have an obvious structure. Although a random sample of shells was obtained from each stream, the analysis was carried out on shell fragments with multiple shell fragments coming from the same shell. Mixed (multilevel) models provide an ideal platform for analyzing hierarchically structured data. In the parlance of multilevel models, the level-2 units here are shells and the level-1 units are the shell fragments coming from a shell. Stream is a level-2 predictor (it varies among shells, not among shell fragments coming from the same shell). To test the effect of stream I fit two models, one with and one without the level-2 predictor stream.
| Model 1 | Model 2 |
![]() |
![]() |
Here i denotes the shell and j denotes the fragment from that shell. Stream is dummy variable coded 1 for Buffalo Creek and 0 for Boone Creek. I fit each model and carry out a likelihood ratio test to determine the importance of the stream variable as a predictor of carbon isotope composition.
> library(nlme)
> model1<-lme(CARBONISOTOPE~1,random=~1|SHELL,data=shells,method='ML')
> model2<-lme(CARBONISOTOPE~STREAM,random=~1|SHELL,data=shells,method='ML')
> anova(model1,model2)
Model df AIC BIC logLik Test L.Ratio p-value
model1 1 3 1085.363 1097.104 -539.6815
model2 2 4 1083.148 1098.802 -537.5742 1 vs 2 4.214755 0.0401
Based on the results (p = 0.04) it would appear that there is a significant difference in shell carbon isotope composition between streams (at α = .05).
We can obtain a second statistical test of the importance of stream, a Wald Test, by examining the summary output from model2.
> summary(model2)
Linear mixed-effects model fit by maximum likelihood
Data: shells
AIC BIC logLik
1083.148 1098.802 -537.5742
Random effects:
Formula: ~1 | SHELL
(Intercept) Residual
StdDev: 0.974756 0.996767
Fixed effects: CARBONISOTOPE ~ STREAM
Value Std.Error DF t-value p-value
(Intercept) -9.731287 0.5704290 363 -17.059594 0.0000
STREAMbuffalo -1.810200 0.7541505 5 -2.400317 0.0616
Correlation:
(Intr)
STREAMbuffalo -0.756
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.4520196 -0.6136337 0.1001342 0.7029891 2.6023393
Number of Observations: 370
Number of Groups: 7
The p-value of 0.06 would force us to conclude that there is not a significant difference in shell carbon isotope composition between the streams (at α = .05).
The first test is a likelihood ratio test. The second test is a Wald test. Unlike the F-test and t-tests from ordinary regression, the Wald and likelihood ratio tests are only asymptotically equivalent to each other. With large data sets we expect to reach the same conclusion using either test, but for small data sets the two tests will not necessarily agree.
Now of course the only reason we’re calling the results “different” is that they happen to fall on opposite sides of the magical .05 cut-off value. In truth, 0.06 is not that different from 0.04. Thus the "difference" we are seeing is somewhat artificial. If our goal is to find a statistically significant difference in carbon isotope concentration then we probably would be tempted to report the LR test result and ignore the Wald test. If so we should at least admit that our results are at best only marginally significant. On the other hand if the point of the test is to show that the carbon isotope concentrations of shells from the streams are not significantly different then we would be tempted to report the results of the Wald test (and ignore the LR test results). In this case our enthusiasm should be tempered by the fact that the results were close to being significant and with more data (assuming nothing else changes) they would certainly be statistically significant.
Ideally you should always try to plot the raw data, but when the amount of data is large this becomes impractical. We’re somewhere close to that impractical situation here, but we can still obtain a nice summary by showing the raw data (jittered to prevent overlap) superimposed on boxplots. Together these give us a nice snapshot of the distribution. Because of the structured nature of the data, shell fragments coming from shells, our display should reflect this structure. The ideal way of doing this is to plot the distribution of each shell separately in a side by side display with shells grouped by stream.
> boxplot(CARBONISOTOPE~SHELL, data=shells, outline=FALSE, ylab="Carbon isotope")
> points(jitter(as.numeric(shells$SHELL)), shells$CARBONISOTOPE, pch=16, col='seagreen', cex=.6)
> abline(v=4.5,col=4,lty=3)
> mtext('Buffalo Creek', side=3, line=.5, at=2.5, font=2, cex=.9)
> mtext('Boone Creek', side=3, line=.5, at=6, font=2, cex=.9)
> points(1:7, tapply(shells$CARBONISOTOPE, shells$SHELL, mean), col=2, pch=16)

Fig. 1 Carbon isotope values grouped by stream and shell
The plot is especially informative. We see that one of the Boone Creek shells is very different from the rest and more closely resembles the shells from Buffalo Creek. Of course with only three shells to work with, it's difficult to say what constitutes typical.
The obvious mistake one might make in analyzing these data is to ignore the structure and treat it instead as two random samples of shell fragments. The paper I obscurely referred to is the following.
Hurlbert, S. H. 1984. Pseudoreplication and the design of ecological field experiments. Ecological Monographs 54: 187–211.
This paper has been hugely influential in ecology although there has been a backlash of late. If you’re interested in some of the recent backlash, here’s a sampling.
Ignoring that it was shells that were sampled, not shell fragments, is an example of what Hurlbert (1984) would call pseudoreplication. One of the consequences of pseudoreplication is that you artificially inflate your sample size with observations that are not true replicates of the sampling (and inferential) units, in this case shells.
> model3<-lm(CARBONISOTOPE~STREAM,data=shells)
> summary(model3)
Call:
lm(formula = CARBONISOTOPE ~ STREAM, data = shells)
Residuals:
Min 1Q Median 3Q Max
-4.0698 -0.9136 0.2864 1.0202 2.6202
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.6302 0.1126 -85.53 <2e-16 ***
STREAMbuffalo -1.9562 0.1454 -13.46 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.37 on 368 degrees of freedom
Multiple R-Squared: 0.3298, Adjusted R-squared: 0.328
F-statistic: 181.1 on 1 and 368 DF, p-value: < 2.2e-16
Notice how significant the results are when contrasted with the mixed model analysis above. Although the estimates of the parameters are virtually the same (Intercept: –9.63 vs. 9.71, Stream: –1.96 vs. –1.81.), what is very different are the standard errors. The reported standard errors are much smaller in the pseudoreplicated analysis. To understand this a little bit better, I display the anova results.
> anova(model3)
Analysis of Variance Table
Response: CARBONISOTOPE
Df Sum Sq Mean Sq F value Pr(>F)
STREAM 1 339.81 339.81 181.11 < 2.2e-16 ***
Residuals 368 690.45 1.88
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In ordinary linear regression the F-test for STREAM from this anova table is just the square of the t-statistic for STREAM in the summary table. Observe that 368 degrees of freedom are reported for this test. The denominator degrees of freedom of the F-test are monotonic with sample size (sample size, 370, minus the number of estimated parameters, 2). Contrast this with the reported number of degrees of freedom for a test of STREAM in model2 of the mixed model. There it is reported as 5 (number of shells, 7, minus 2, the number of estimated parameters).
The results are also dramatically different graphically when the structure is ignored.
> boxplot(CARBONISOTOPE~STREAM, data=shells, outline=FALSE, ylab="Carbon isotope")
> points(jitter(as.numeric(shells$STREAM)), shells$CARBONISOTOPE, pch=16, col='seagreen', cex=.6)
> points(1:2, tapply(shells$CARBONISOTOPE,shells$STREAM, mean), col=2, pch=16)
> mtext('Pseudoreplication results',side=3,line=.5,font=2)

Fig. 2 Carbon isotope values grouped by stream ignoring shell
The shell fragment distribution for the two streams looks very different when one ignores the data structure. Here are the basic problems with pseudo-replication.
> qt(.975,5)
[1] 2.570582
> qt(.975,368)
[1] 1.966431
Given the difference in size of the critical values of the t-statistics we see that it is harder to reject the null hypothesis of no difference between streams when there are only 5 degrees of freedom with which to work.
Read in the data.
> water<- read.table('http://www.unc.edu/courses/2006spring/ecol/145/001/data/lab12/WaterUsageData.csv', header=TRUE, sep=',')
We saw in Assignment 11 that a model quadratic in time provided a good description of water usage for individual trees. Once again we are faced with a structured data set this time because we have repeated measures on individual trees. It too can be handled as a multilevel model. In Assignment 11 our level-1 model was the following.
![]()
Notice that there are three parameters that could vary between trees (as denoted by the subscript i)—the intercept, the coefficient of the linear term, and the coefficient of the quadratic term. Each parameter gives rise to a separate level-2 equation. In Assignment 11 the only issue you were faced with was how many random effects were required, 1, 2, or 3, and in which level-2 equation they should occur. This time we have two level-2 predictors to consider, Species and Age. Species and Age are level-2 predictors because they are measured on the level-2 units, trees, and do not vary among the level-1 units coming from the same tree. Obviously the individual observations from the same tree cannot have a different value of Species. (Technically Age could change but we'll treat Age as being the age of the tree at the beginning of the study.)
The possible level-2 regressors are Species, Age, and their interaction Species:Age. Since these could appear in any or all of the level-2 equations for
there are potentially a lot of possible models to consider. Because the level-2 predictors are categorical variables we should respect the principle of marginality when building the models and always include Species and Age in any model that also contains their interaction Species:Age. Even with this constraint that still leaves us with five possible models for each level-2 parameter—a model with no predictors, one with only Species, one with only Age, an additive model with Species and Age, and the full interaction model with Species, Age, and Species:Age. Furthermore each of these models could include a random effect term or not. Thus there are 10 possible models for each parameter and therefore a total of 10 × 10 × 10 = 1000 models to consider! The most complicated model of these 1000, a model in which all possible terms are present, is shown below.
Notice that to handle the many different coefficients present in this model I've chosen to use different Greek letters to represent the level-2 coefficients in the various level-2 equations. The composite model formulation of this multilevel model is especially daunting.
Clearly a haphazard approach to model-building would border on chaos here and attempting to fit all possible models should get us classified as clinically insane. Ideally we would only fit models that are suggested by theory. Since I'm not privy to the theory that motivated this study this option is not open to us. The very worst approach would be to throw some automatic variable selection program at the problem. Fortunately such programs aren't readily available for multilevel models. Absent a better approach I'm going to take a middle road, somewhere between model building based solely on theory and model building totally divorced from the thought processes. I begin by graphically examining the data to see which of the parameters
are most variable (and hence most deserving of random effects) and to see how they vary with respect to Species and Age to gain some understanding of how complicated a model we need. This will give me a starting point from which I can then try to simplify or complexify the model.
The lattice graphics package coupled with the nlme package of R provide some outstanding tools for examining variation in multilevel data. I will take advantage of them here. I begin by creating a groupedData object with TREE as the grouping factor. Observe the use of the outer argument as an option below. Adding the outer option to a groupedData object causes the plots that are produced to be sorted by levels of the variables specified. Specifying SPECIES*AGE yields displays sorted by SPECIES and then by AGE within SPECIES.
> library(nlme)
> water.gp<-groupedData(WU~TIME|TREE,data=water,outer=~SPECIES*AGE)
I next use the lmList function to fit separate quadratic models for each tree. I then plot the confidence intervals for the individual parameters using the intervals function.
> library(lattice)
> trellis.par.set(col.whitebg())
> out.models<-lmList(WU~TIME+I(TIME^2),data=water.gp)
> plot(intervals(out.models))

Fig. 3 Trellis graph showing between-tree parameter variation grouped by Species and Age
Although the observations are not labeled as such, an examination of the tree numbers reveals that the Species and Age groupings are as shown by my labels on the right side of the diagram. Here are some conclusions that can be drawn from this picture.
The message from the plot is clear. We should begin by fitting the most complicated model, one in which each parameter has a Species by Age interaction and each parameter also has its own random effect.
A Caveat. The truth is that the displays in the above plots are so similar because the parameter estimates are highly correlated. The mirror image patterns seen in adjacent panels are a hallmark of this. When parameters are correlated we can't tell if the variability the plots show is due to the intrinsic variability of that parameter or the variability inherited via the correlation. Ideally we should try to make the parameters uncorrelated, e.g., by centering the predictor. This is beyond what I expected you to do on this exam so I will ignore it. It does not cause estimation problems here (although it often can in practice) and once we introduce the predictors of interest much of this correlation is reduced. In practice though this is something you might have to address.
Choosing the fixed effects portion of the model
SPECIES and AGE are categorical variables here and therefore technically should be declared to be factors. Because they have only two levels and the levels are coded 1 and 2, we will get exactly the same results (in terms of best model) whether we treat them as continuous or categorical. Thus I don't bother to declare them as factors. (Note: this will not be the case when there are more than two levels.)
I fit the model selected above using some of R's convenient shortcut notation.
> model1<-lme(WU~(TIME+I(TIME^2))*(SPECIES*AGE), data=water, random=~TIME+I(TIME^2)|TREE, method='ML')
This notation is shortcut for the following.
(TIME+I(TIME^2))*(SPECIES*AGE)= SPECIES*AGE + SPECIES*AGE*TIME + SPECIES*AGE*I(TIME^2)
where SPECIES*AGE is already a shortcut notation for SPECIES+AGE+SPECIES:AGE and similarly for the three-factor interactions. Some of the output from this model is shown below.
> summary(model1, correlation=FALSE)
Linear mixed-effects model fit by maximum likelihood
Data: water
AIC BIC logLik
1328.665 1421.760 -645.3324
Random effects:
Formula: ~TIME + I(TIME^2) | TREE
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 3.201661e-01 (Intr) TIME
TIME 2.029479e-03 0.576
I(TIME^2) 6.355108e-06 -0.814 -0.518
Residual 4.240439e-01
Fixed effects: WU ~ (TIME + I(TIME^2)) * (SPECIES * AGE)
Value Std.Error DF t-value p-value
(Intercept) -32.50599 4.850992 944 -6.700896 0.0000
TIME 0.28576 0.043736 944 6.533785 0.0000
I(TIME^2) -0.00057 0.000096 944 -5.939481 0.0000
SPECIES 15.05154 3.067783 36 4.906326 0.0000
AGE 9.68171 3.065825 36 3.157947 0.0032
SPECIES:AGE -6.90668 1.938894 36 -3.562175 0.0011
TIME:SPECIES -0.12669 0.027660 944 -4.580134 0.0000
TIME:AGE -0.07183 0.027640 944 -2.598859 0.0095
I(TIME^2):SPECIES 0.00025 0.000061 944 4.038653 0.0001
I(TIME^2):AGE 0.00015 0.000061 944 2.403213 0.0164
TIME:SPECIES:AGE 0.05850 0.017480 944 3.346421 0.0009
I(TIME^2):SPECIES:AGE -0.00012 0.000039 944 -3.219884 0.0013
Observe that the Wald tests of the two highest order interaction terms (TIME:SPECIES:AGE and I(TIME^2):SPECIES:AGE) are statistically significant. Since all the remaining terms are nested in these two interaction terms, the principle of marginality says we should not test any of the remaining terms in the presence of these significant interactions.
We can obtain a slightly different take on this with the anova command. The anova function displays a sequence of models of increasing complexity in which the principle of marginality is maintained at each step.
> anova(model1)
numDF denDF F-value p-value
(Intercept) 1 944 695.3320 <.0001
TIME 1 944 5.6188 0.0180
I(TIME^2) 1 944 744.3646 <.0001
SPECIES 1 36 5.4026 0.0259
AGE 1 36 32.8192 <.0001
SPECIES:AGE 1 36 1.5762 0.2174
TIME:SPECIES 1 944 131.2714 <.0001
TIME:AGE 1 944 3.5973 0.0582
I(TIME^2):SPECIES 1 944 9.6943 0.0019
I(TIME^2):AGE 1 944 4.2448 0.0396
TIME:SPECIES:AGE 1 944 1.5889 0.2078
I(TIME^2):SPECIES:AGE 1 944 10.3677 0.0013
Notice we might draw a different conclusion from this output than we did from the summary table of the Wald tests. For example from the anova output we would conclude that the 3-factor interaction term TIME:SPECIES:AGE should not be added to a model containing all of the two-factor interactions and main effects. But based on the Wald tests in the summary output we saw that if all of the two-factor interactions and main effects were present, along with I(TIME^2):SPECIES:AGE, then we should add TIME:SPECIES:AGE to the model. Now in truth the principle of marginality also applies to quadratic terms in the following sense. If a term involving I(TIME^2)is in the model, then the corresponding term involving TIME should also be in the model. Thus the principle of marginality is consistent with the Wald test results. If I(TIME^2):SPECIES:AGE is in the model then so should be TIME:SPECIES:AGE.
There is a great deal of disagreement on what the degrees of freedom should be for various individual tests in mixed effects models. R chooses to not adjust degrees of freedom in any way, while most statistical packages (SAS in particular) offer various options for calculating degrees of freedom, generally more conservative than what are shown here. Given this controversy it might be better to avoid the whole issue of significance testing altogether and instead compare relevant models using AIC. By the principle of marginality starting with the full model there are only three models we should compare with regard to the fixed effects: the current model, a model in which the 3-factor interaction term containing the quadratic effect of time is dropped, and a final model in which both 3-factor interaction terms containing TIME are dropped. I fit these remaining models using R's convenient update function. The update function allows you to drop terms from a model that you've already fit. Here's how it works. The notation ~.~ refers to the current model from which we can delete terms by including a minus sign in front of the term(s) we wish to drop.
> model2<-update(model1, .~.-I(TIME^2):SPECIES:AGE)
> model3<-update(model1, .~.-TIME:SPECIES:AGE-I(TIME^2):SPECIES:AGE)
> sapply(list(model1,model2,model3), AIC)
[1] 1328.665 1336.919 1336.730
The most complicated model has the smallest AIC. Based on AIC we should keep a model in which all three of the level-2 equations contain an interaction of Species and Age as well as their main effects.
Choosing the random effects portion of the model
Having settled on the fixed effects, I turn to the random effects portion of the model. Because significant level-2 predictors end up explaining level-2 variance, it's possible that by adding level-2 predictors we have made some or all of the random effects unnecessary. I compare our current model that includes all three random effects to the three subset models in which there are only two random effects.
> model1a<-lme(WU~(TIME+I(TIME^2))*(SPECIES*AGE), data=water, random=~TIME+I(TIME^2)-1|TREE, method='ML')
> model1b<-lme(WU~(TIME+I(TIME^2))*(SPECIES*AGE), data=water, random=~I(TIME^2)|TREE, method='ML')
> model1c<-lme(WU~(TIME+I(TIME^2))*(SPECIES*AGE), data=water, random=~TIME|TREE, method='ML')
> sapply(list(model1,model1a,model1b,model1c), AIC)
[1] 1328.665 1315.362 1328.441 1325.667
The results are fairly unambiguous. We can drop the intercept random effect. I verify this with a likelihood ratio test.
> anova(model1,model1a)
Model df AIC BIC logLik Test L.Ratio p-value
model1 1 19 1328.665 1421.760 -645.3324
model1a 2 16 1315.362 1393.757 -641.6809 1 vs 2 7.303119 0.0628
The likelihood ratio test is less clearcut. Since the test is not significant it says there is little to be gained by adding an intercept random effect. Notice that the difference in parameters is 3. That's because by adding the intercept random effect we end up estimating an additional variance term as well as two additional covariance terms.
Having decided to drop the intercept random effect I see if any additional random effects can be dropped. I try dropping the linear term random effect, quadratic term random effect, and both.
> model1a.1<-lme(WU~(TIME+I(TIME^2))*(SPECIES*AGE), data=water, random=~I(TIME^2)-1|TREE,method='ML')
> model1a.2<-lme(WU~(TIME+I(TIME^2))*(SPECIES*AGE), data=water, random=~TIME-1|TREE,method='ML')
> model1a.3<-lm(WU~(TIME+I(TIME^2))*(SPECIES*AGE), data=water)
> sapply(list(model1a,model1a.1,model1a.2,model1a.3),AIC)
[1] 1315.362 1436.954 1350.057 2060.831
The results indicate we should stay with the current model. Lastly I examine if we need to account for any lingering residual temporal correlation. Following the arguments put forth in Assignment 11, I use a CAR1 correlation structure for the level-1 residuals.
> model1a.corr<-lme(WU~(TIME+I(TIME^2))*(SPECIES*AGE), data=water, random=~TIME+I(TIME^2)-1|TREE,method='ML',correlation=corCAR1(form=~TIME|TREE))
> anova(model1a,model1a.corr)
Model df AIC BIC logLik Test L.Ratio p-value
model1a 1 16 1315.362 1393.757 -641.6809
model1a.corr 2 17 1289.148 1372.444 -627.5742 1 vs 2 28.21331 <.0001
Both AIC and the likelihood ratio test indicate that the correlation structure should be included.
Our final best model includes the following.
Read in the data and recode the DISEASE variable so that 1 indicates the presence of disease.
> bugs <- read.table('http://www.unc.edu/courses/2006spring/ecol/145/001/data/final/problem3.csv', sep=',', header=TRUE)
> names(bugs)
[1] "DISEASE" "temp"
> pres.abs<- 1-bugs$DISEASE
With only one predictor, smoothed lagged temperature, to worry about the only issue is how best to relate temperature linearly to the logit. I use the rcspline.plot function to determine if a transformation or an alternative parameterization is necessary.
> library(Design)
> rcspline.plot(y=pres.abs,x=bugs$temp,nk=5,m=50)

Fig. 4 Restricted cubic spline plot of logit as a function of temperature
The displayed Wald test for linearity rejects a linear relationship between the logit and temperature. Observe that most of the empirical logits fall within the confidence bands indicating a fairly good fit. The pattern shown is somewhat ambiguous but the lack of monotonicity suggests that a quadratic model might be a good first choice.
> out1<-glm(pres.abs~temp+I(temp^2), data=bugs, na.action=na.omit, family=binomial)
> summary(out1)
Call:
glm(formula = pres.abs ~ temp + I(temp^2), family = binomial,
data = bugs, na.action = na.omit)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4532 -1.2014 -0.1301 1.0293 2.2791
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -468.9778 96.9742 -4.836 1.32e-06 ***
temp 30.4773 6.1826 4.929 8.24e-07 ***
I(temp^2) -0.4945 0.0985 -5.020 5.16e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 846.0 on 610 degrees of freedom
Residual deviance: 744.1 on 608 degrees of freedom
AIC: 750.1
Number of Fisher Scoring iterations: 5
The quadratic term is highly significant. We can assess graphically whether the model is reasonable by plotting the fitted model along with a lowess curve on a probability scale.
> phat<-function(x) 1/(1+exp(-(coef(out1)[1]+ coef(out1)[2]*x+ coef(out1)[3]*x^2)))
> plot(pres.abs~bugs$temp, xlab='temperature', ylab='probability of disease')
> lines(seq(29,35,.1), phat(seq(29,35,.1)), col=4)
> lines(lowess(pres.abs[!is.na(bugs$temp)]~bugs$temp[!is.na(bugs$temp)]), col=2, lwd=2)

Fig. 5 Comparison of logistic regression quadratic model and lowess curve
The fit looks better than the rcsplineplot would have led us to believe.
We have three options for assessing goodness of fit:
I try each in turn.
Hosmer-Lemeshow test
> logit.groups<-cut(predict(out1), quantile(predict(out1), seq(0,1,.1)), includue.lowest=TRUE)
> tapply(fitted(out1), logit.groups,sum)->success
> tapply(fitted(out1), logit.groups,length)->ni
> ni-success->failure
I examine the expected counts to see if there is a need for merging categories.
> cbind(success,failure)
success failure
(-5.65,-2.06] 3.006123 57.99388
(-2.06,-0.666] 15.190303 46.80970
(-0.666,-0.124] 25.071012 36.92899
(-0.124,0.101] 35.400768 34.59923
(0.101,0.198] 31.903495 27.09650
(0.198,0.342] 30.907444 23.09256
(0.342,0.449] 35.778988 24.22101
(0.449,0.543] 43.047190 25.95281
(0.543,0.586] 33.156435 18.84356
(0.586,0.628] 39.534736 21.46526
There is only one sparse cell, so we are well within the 20% rule. I proceed to carry out the test.
> table(pres.abs[!is.na(bugs$temp)],logit.groups)->Oi
> rbind(failure,success)->Ei
> sum((Oi-Ei)^2/Ei)
[1] 27.93552
> 1-pchisq(sum((Oi-Ei)^2/Ei),8)
[1] 0.0004866659
The Hosmer-Lemeshow test indicates a significant lack of fit.
Goodness of fit test from the Design package
As noted in a hint, the lrm function from the Design package fails to converge with the model we've been using.
> temp2<-bugs$temp^2
> lrm(pres.abs~bugs$temp+temp2,na.action=na.delete,data=bugs,x=TRUE,y=TRUE)
singular information matrix in lrm.fit (rank= 2 ). Offending variable(s):
temp2
Error in j:(j + params[i] - 1) : NA/NaN argument
I try centering the temperature variable around its mean.
> data.nomiss<-bugs[!is.na(bugs$temp),]
> center.temp<-data.nomiss$temp-mean(data.nomiss$temp)
> center.temp2<-center.temp^2
> lrm(pres.abs[!is.na(bugs$temp)]~center.temp+center.temp2, data=data.nomiss, x=TRUE, y=TRUE)-> out.harrell
> residuals.lrm(out.harrell,type='gof')
Sum of squared errors Expected value|H0 SD Z P
130.1040898 130.6195580 0.5779676 -0.8918634 0.3724661
The p-value is large indicating no significant lack of fit.
Grouping by levels of the predictor
> dim(data.nomiss)
[1] 611 2
With 611 observations I try 20 categories as an initial foray.
> cut(data.nomiss$temp, quantile(data.nomiss$temp, seq(0,1,.05)), include.lowest=TRUE)->test.cuts
> tapply(fitted(out1), test.cuts, sum)->success
> table(test.cuts)-success->fail
> rbind(fail, success)->Ei
> Ei
[29.56,30.05] (30.05,30.1] (30.1,30.4] (30.4,30.61] (30.61,30.94] (30.94,31.15] (31.15,31.26] (31.26,31.4]
fail 14.9514 11.84406 12.02487 10.41536 10.45755 11.03446 10.98836 13.69766
success 18.0486 17.15594 19.97513 18.58464 19.54245 19.96554 19.01164 22.30234
(31.4,31.56] (31.56,31.7] (31.7,31.8] (31.8,31.85] (31.85,31.92] (31.92,32.05] (32.05,32.24] (32.24,32.44]
fail 11.26664 12.74685 17.71781 11.80802 12.1171 17.92402 14.62022 19.90697
success 16.73336 17.25315 21.28219 13.19198 12.8829 17.07598 11.37978 11.09303
(32.44,32.66] (32.66,33.15] (33.15,33.49] (33.49,34.38]
fail 20.777198 25.59839 29.556542 28.5465199
success 9.222802 5.40161 2.443458 0.4534801
Only 2 of the 40 cells have expected counts less than 5. This is well within the 20% limit, so I proceed without merging categories.
> table(pres.abs[!is.na(bugs$temp)],test.cuts)->Oi
> sum((Oi-Ei)^2/Ei)
[1] 36.00678
The degrees of freedom for the chi-square distribution is the number of columns of our table, 20, minus the number of parameters estimated in our model, 3.
> 1-pchisq(sum((Oi-Ei)^2/Ei), 20-3)
[1] 0.004576959
This test concurs with the Hosmer-Lemeshow test suggesting a significant lack of fit. Thus our conclusions about fit are ambiguous. (Note: it is usually remarked that these three tests are testing different sorts of lack of fit so it should not be thought that they are necessarily contradicting each other here.) It should be noted that a quadratic model made sense to the researchers. The quadratic model indicates that there should be a range of temperatures over which the mosquito thrives (not too hot and not too cold) and this is consistent with what is known about its biology.
Read in the data.
> clams <- read.table('http://www.unc.edu/courses/2006spring/ecol/145/001/data/final/problem4.csv', sep=',', header=TRUE)
> names(clams)
[1] "TREATMENT" "YEAR" "SEASON" "LEGAL.CLAMS" "DATE"
I start by recoding the date variable so that the dates are in calendar order. The easiest way to do this is to create a new factor variable from the old DATE variable and use the levels option to specify the dates in the correct order.
> newdate<-factor(clams$DATE, levels=c("SPR2001", "FALL2001", "SPR2002", "FALL2002", "SPR2003", "FALL2003"))
We also need the means for each treatment and at each time period.
> clam.means<-tapply(clams$LEGAL.CLAMS, list(clams$TREATMENT,newdate), mean)
> clam.means
SPR2001 FALL2001 SPR2002 FALL2002 SPR2003 FALL2003
CC 18.85714 35.50000 31.16667 51.15 39.91667 54.69231
CE 20.29412 36.33333 37.16667 81.00 53.68750 146.58065
To make the graph easier to interpret I'm going to use a thick line when an area is closed and a thin line when it's open. The transition from open to closed occurs between Fall 2001 and Spring 2002 for the experimental treatment. I wrote a function, closed.func2, that draws a thick line from the halfway point between these two dates to the last date. The argument val is the vector of means, col is the color for the line, and cexp is an expansion factor for determining the thickness of the line.
closed.func2<-function(val,col,cexp)
{
f2<-function(x,x0) val[2]+(val[3]-val[2])*(x-x0)
xval<-c(2.5,3:6)
yval<-c(f2(2.5,2),val[3:6])
lines(xval,yval,col=col,lwd=cexp)
}
Fig. 6 Preliminary stage of the plot
Next I expand the right margin of the graph to make room for a legend there. I save the old settings so that I can restore them once the graph is made. I also find the maximum number of clams that were caught for use in defining the limits on the y-axis.
oldpar<-par('mar')
par(mar=c(4.1,4.1,2.1 ,9.1))
y.max<-max(clams$LEGAL.CLAMS)
The plot function just sets up the dimensions of the plot. I use points and lines functions to plot the means and the line segments connecting them. The closed.func2 function is used to draw thick line segments when the experimental area is closed (Fig. 6)
#plot means and line segments
plot(jitter(as.numeric(newdate)),
clams$LEGAL.CLAMS,
ylim=c(0,y.max), cex=.9, type='n', axes=FALSE,xlab='', ylab='Legal-sized clams',
cex.lab=.85)
axis(1,at=1:6, labels=rep(c('Spring','Fall'),3), cex.axis=.75)
axis(2,cex.axis=.75)
lines(1:6, clam.means[2,], lty=1, col=2, lwd=2)
closed.func2(clam.means[2,], 'pink', 6)
points(1:6, clam.means[2,], pch=15, col=2, cex=1.7)
points(1:6, clam.means[1,], pch=19, col='royalblue4', cex=1.7)
lines(1:6, clam.means[1,], lty=1, col=4, lwd=1.5)
Next I add the jittered sample points.
#add jittered data values
points(jitter(as.numeric(newdate[clams$TREATMENT=='CE'])),
clams[clams$TREATMENT=='CE','LEGAL.CLAMS'],
pch=22,col=2,cex=.75)
points(jitter(as.numeric(newdate[clams$TREATMENT=='CC'])),
clams[clams$TREATMENT=='CC','LEGAL.CLAMS'],
cex=.75,col=4,pch=8)
Next I add explanatory text as well as a vertical line marking the point when the experimental area is first closed.
box()
mtext(text=c('2001','2002','2003'), side=1, line=2, at=c(1.5,3.5,5.5), cex=.85)
abline(v=2.5, col=1, lwd=4, lty=3)
text(1.5,350, 'Both Areas are Open', col='seagreen4', cex=.75, font=2)
text(4.3,350, 'Treatment Area is Closed', col='seagreen4', cex=.75, font=2)
mtext('Core Sound', cex=1, font=2, side=3, line=.5)
Finally I add the legend on the right side of the plot. I use the xpd option of par to enable plotting outside the figure region.
par(xpd=TRUE)
legend(6.5, y.max, c('treatment', 'control', 'treatment (mean)', 'control (mean)', 'open to kicking', 'closed to kicking'), pch=c(22,8,15,19,NA,NA), lty=c(NA,NA,NA,NA,1,1), lwd=c(NA,NA,NA,NA,2,6), col=c(2, 4, 2, 'royalblue4', 'grey30', 'pink'), cex=rep(.8,6), bty='n')
par(xpd=FALSE)
par(mar=oldpar)
The final figure is shown below.
Fig. 7 Sampling results for Core Sound
The response of interest, number of legal-sized clams obtained in a trawl, is an example of count data. As such probability models such as the Poisson or negative binomial are natural candidates. Furthermore, Fig. 7 reveals that the variability of the response increases with its mean, a relationship that also holds true for these probability distributions. The six sampling dates and two treatment levels provide a natural set of groups with which to evaluate the mean-variance relationship. I plot the mean and variance for these groupings and superimpose various theoretical models.
#obtain means and variances
tapply(clams$LEGAL.CLAMS,
list(newdate,factor(clams$TREATMENT)),mean)->sound.means
tapply(clams$LEGAL.CLAMS,
list(newdate,factor(clams$TREATMENT)),var)->sound.vars
as.vector(sound.means)->core.means
as.vector(sound.vars)->core.vars
#plot means and variances
plot(core.means, core.vars, xlab=expression(paste('Mean ', mu)), type='n', ylab=expression(paste('Variance ', sigma^2)), xlim=c(0,max(core.means)), axes=FALSE)
axis(1,cex.axis=.85)
axis(2,cex.axis=.85)
box()
points(sound.means[,'CC'], sound.vars[,'CC'], pch=16,cex=1.1,col='grey60')
points(sound.means[,'CE'], sound.vars[,'CE'],pch=15,cex=1.1)
#fit NB 1
scale.Poisson<-lm(core.vars~core.means-1)$coefficients
abline(0,scale.Poisson,lty=3,col=4,lwd=1.5)
abline(0,1,col='seagreen4',lty=1,lwd=1.5)
core.mean2<-core.means^2
#fit NB 2
nb<-lm(core.vars~offset(core.means)+core.mean2-1)$coefficients
#fit lognormal
ln.mod<-lm(core.vars~core.mean2-1)$coefficients
test.func2<-function(x) x+nb*x^2
test.func3<-function(x) ln.mod*x^2
lines(0:max(core.means), test.func3(0:max(core.means)), lty=1, col='grey30', lwd=2)
lines(0:max(core.means), test.func2(0:max(core.means)), lty=4, col=2, lwd=1.5)
legend(0,max(core.vars), c('Poisson', 'negative binomial NB1', 'negative binomial NB2', 'lognormal', 'control sites', 'treatment sites'), col=c('seagreen4', 4,2, 'grey30', 'grey60',1), lty=c(1,3,2,4,NA,NA), lwd=c(1,1,2,1,NA,NA), pch=c(NA,NA,NA,NA,16,15), bty='n', cex=.8)
mtext('Models of the Mean-Variance Relationship', font=2, side=3, line=.5)

Fig. 8 The mean-variance relationship exhibited by the data
As the plot reveals a quadratic relationship looks quite reasonable, suggesting either a negative binomial or lognormal distribution would be appropriate. Since the negative binomial is particularly appropriate for discrete data, I fit a negative binomial regression model.
I begin by creating a numeric variable for time that is zero at the initial sampling period. The rationale for starting at zero is that it makes the intercept interpretable all by itself.
numdates<-as.numeric(newdate)-1
I fit the model suggested in the problem—a model with an intercept, a treatment effect, a seasonality effect, a linear time effect, a treatment by time interaction, and a treatment by seasonality interaction.
> library(MASS)
> model1<-glm.nb(LEGAL.CLAMS~TREATMENT+ SEASON+ numdates+ numdates:TREATMENT+ SEASON:TREATMENT, data=clams)
> summary(model1,corr=FALSE)
Call:
glm.nb(formula = LEGAL.CLAMS ~ TREATMENT + SEASON + numdates +
numdates:TREATMENT + SEASON:TREATMENT, data = clams, init.theta = 2.59524367148469,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.7562 -0.8532 -0.1684 0.5235 2.1111
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.41026 0.16773 20.332 < 2e-16 ***
TREATMENTCE 0.05995 0.23485 0.255 0.79852
SEASONSPR -0.35276 0.15084 -2.339 0.01936 *
numdates 0.14794 0.04513 3.278 0.00105 **
TREATMENTCE:numdates 0.14738 0.05923 2.488 0.01284 *
TREATMENTCE:SEASONSPR -0.17514 0.20767 -0.843 0.39904
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(2.5952) family taken to be 1)
Based on the summary table we should try to drop the TREATMENT:SEASON interaction and perhaps the TREATMENT effect. The order here is important. Interaction terms are always highly correlated with their component effects and this gets translated into the reliability of their parameter estimates. Thus the fact that TREATMENT has a high p-value may be due to the fact that much of its effect has been diluted by the presence of two interaction terms in the model in which TREATMENT is a component. The principle of marginality argues that we should test interaction terms before we test main effects. So I first try fitting a model without TREATMENT:SEASON. The likelihood ratio test carried out below should be consistent with the Wald test reported for TREATMENT:SEASON in the above summary table.
> model2<-glm.nb(LEGAL.CLAMS~TREATMENT+ SEASON+ numdates+ numdates:TREATMENT, data=clams)
> anova(model2,model1)
Likelihood ratio tests of Negative Binomial Models
Response: LEGAL.CLAMS
Model theta Resid. df
1 TREATMENT + SEASON + numdates + numdates:TREATMENT 2.585370 180
2 TREATMENT + SEASON + numdates + numdates:TREATMENT + SEASON:TREATMENT 2.595244 179
2 x log-lik. Test df LR stat. Pr(Chi)
1 -1744.370
2 -1743.676 1 vs 2 1 0.6943417 0.4046915
Since the test is not significant we can safely drop the TREATMENT:SEASON interaction. Here’s the status of our coefficient estimates at the moment.
> summary(model2,corr=FALSE)
Call:
glm.nb(formula = LEGAL.CLAMS ~ TREATMENT + SEASON + numdates +
numdates:TREATMENT, data = clams, init.theta = 2.58536960982247,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.7404 -0.8558 -0.2094 0.5267 2.0237
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.47086 0.14896 23.300 < 2e-16 ***
TREATMENTCE -0.07292 0.17692 -0.412 0.68023
SEASONSPR -0.44016 0.10388 -4.237 2.26e-05 ***
numdates 0.13987 0.04383 3.191 0.00142 **
TREATMENTCE:numdates 0.16733 0.05471 3.059 0.00222 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(2.5854) family taken to be 1)
Null deviance: 381.98 on 184 degrees of freedom
Residual deviance: 195.47 on 180 degrees of freedom
AIC: 1756.4
Number of Fisher Scoring iterations: 1
Since the TREATMENT:numdates interaction is highly significant, the principle of marginality would dictate that even though TREATMENT is not significant we should leave it in. But numdates is a continuous variable while TREATMENT is discrete (as is SEASON). Thus we can interpret our model as two separate line segments (on the log scale) for the two treatments.
The full model is the following.
Since Treatment = 0 for Core control and Treatment = 1 for Core experimental we can write separate models for the two treatments as follows.
In Spring 2003, Season = 1. Thus at the first sampling period, Spring 2003, we have
Thus we see that β1, the coefficient of TREATMENT, has the effect of giving the two treatments different starting points. But graphically we’ve seen that initially the two means are very close. Furthermore since the division into control and experimental is just an arbitrary partition of a stretch of continuous ocean and given that both areas were initially under fishing pressure, we have no a priori reason to suspect that there should be any systematic differences between the two areas. Thus it makes perfect sense to drop the TREATMENT term and violate the principle of marginality if the statistical results justify it. I fit a model without TREATMENT.
> model3<-glm.nb(LEGAL.CLAMS~SEASON+ numdates+ numdates:TREATMENT, data=clams)
> summary(model3, corr=FALSE)
Call:
glm.nb(formula = LEGAL.CLAMS ~ SEASON + numdates + numdates:TREATMENT,
data = clams, init.theta = 2.58289429888629, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.7337 -0.8115 -0.2193 0.5378 2.0340
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.43302 0.11766 29.178 < 2e-16 ***
SEASONSPR -0.44303 0.10380 -4.268 1.97e-05 ***
numdates 0.15084 0.03558 4.240 2.24e-05 ***
numdates:TREATMENTCE 0.14807 0.02956 5.009 5.48e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(2.5829) family taken to be 1)
Null deviance: 381.63 on 184 degrees of freedom
Residual deviance: 195.47 on 181 degrees of freedom
AIC: 1754.5
Number of Fisher Scoring iterations: 1
Theta: 2.583
Std. Err.: 0.272
2 x log-likelihood: -1744.536
All remaining terms are significant. We can also justify these steps using AIC. The final model we've selected also has the lowest AIC.
> sapply(list(model1,model2,model3),AIC)
[1] 1757.676 1756.370 1754.536
Note: an additional rationale for dropping TREATMENT will be seen in Question 4. By including TREATMENT in the model we are using up a degree of freedom for our goodness of fit test. Since to carry out a goodness of fit we need to group our data subject to minimum frequency criteria the maximum number of groups possible is largely out of our control. As a result model simplification is the only place where we can affect the degrees of freedom. In terms of goodness of fit, model parsimony has its rewards.
I write a function to carry out a Pearson chi-square lack of fit test for various groupings of the counts. I make use of the get.breaks function posted on our website. The sole argument to the function shown below is the desired minimum group size.
getchi<-function(minsize) {
numobs<-length(model3$fitted.values)
#obtain the predicted probabilities for the first 1001 counts
p.actual<-unlist(lapply(0:1000, function(x) mean(dnbinom(x, mu=model3$fitted.values, size=model3$theta))))
#divide by expected counts into groups
xfact2<-cut(0:1000, c(get.breaks(minsize/numobs,p.actual),1000))
#calculate the expected probability of each group
estprob<-tapply(p.actual,xfact2,sum)
#obtain empirical probabilities
temp<-table(clams$LEGAL.CLAMS)/sum(table(clams$LEGAL.CLAMS))
#we need to flesh out the observed counts with zeros for counts not observed
#emp.probs1 contains the observed counts and the observed count categories
emp.probs1<-data.frame(as.numeric(names(temp)),as.vector(temp))
#emp.probs2 just lists the categories observed or not plus a column of zeros.
emp.probs2<-data.frame(0:max(as.numeric(names(temp))),rep(0,max(as.numeric(names(temp)))+1))
#give the count categories the same name in both data frame
colnames(emp.probs1)[1]<-'count'
colnames(emp.probs2)[1]<-'count'
#merge the two data frames by count using all=TRUE to ensure non-matching observations are included
emp.probs<-merge(emp.probs1,emp.probs2,all=TRUE)
#change missing values in frequency column to zero
emp.probs[,2]<-ifelse(is.na(emp.probs[,2]),0,emp.probs[,2])
colnames(emp.probs)[2]<-"Freq"
emp.probs<-emp.probs[,c("count","Freq")]
#group empirical probabilities in same categories as expected probabilities
xfact<-cut(emp.probs[,1],
c(get.breaks(minsize/numobs,p.actual),1000))
rawprob<-tapply(emp.probs[,2],xfact,sum)
rawprob<-ifelse(is.na(rawprob),0,rawprob)
#create observed and expected frequencies
out2.obs<-dim(clams)[1]*rawprob
out2.exp<-dim(clams)[1]*estprob
#calculate Pearson statistic
pear.chi<-sum((out2.obs-out2.exp)^2/out2.exp)
#count the number of parameters--estimated coefficients plus the dispersion parameter
parms<-length(coef(model3))+1
#calculate degrees of freedom and p-value for test
df<-length(out2.obs)-parms-1
p<-1-pchisq(pear.chi,length(out2.obs)-parms-1)
round(c(minsize,pear.chi,df,p),2)->output
names(output)<-c('Group size','Pearson', 'df', 'p-value')
output
}
I obtain the lack-of-fit test p-value for a number group sizes.
> sapply(5:14,getchi)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
Group size 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 13.00 14.00
Pearson 35.13 23.11 18.32 21.03 16.20 9.00 11.49 8.34 7.90 6.96
df 26.00 20.00 17.00 15.00 12.00 11.00 9.00 8.00 7.00 6.00
p-value 0.11 0.28 0.37 0.14 0.18 0.62 0.24 0.40 0.34 0.32
Observe that none of the tests are significant even though we've looked at ten different group sizes. Thus we clearly have no evidence of lack of fit.
I write separate functions for the two models. The functions take two arguments--the values of the season variable at each time point and the values of the dates.
#x is season and y is date
CC.func<-function(x,y) exp(model3$coefficients[1]+model3$coefficients[2]*x+
model3$coefficients[3]*y )
CE.func<-function(x,y) exp(model3$coefficients[1]+model3$coefficients[2]*x+
(model3$coefficients[3]+model3$coefficients[4])*y)
seasons<-rep(c(1,0),3)
With these two functions I just modify the code from Question 1 to produce the desired graph.
oldpar<-par('mar')
par(mar=c(4.1,4.1,2.1 ,9.1))
y.max<-max(clams$LEGAL.CLAMS)
#plot means and line segments
plot(jitter(as.numeric(newdate)),
clams$LEGAL.CLAMS,
ylim=c(0,y.max),cex=.9,type='n',axes=FALSE,xlab='',ylab='Legal-sized clams',
cex.lab=.85)
axis(1,at=1:6,labels=rep(c('Spring','Fall'),3),cex.axis=.75)
axis(2,cex.axis=.75)
lines(1:6,CC.func(seasons,0:5),lty=1,col='skyblue3',lwd=4)
lines(1:6,CE.func(seasons,0:5),lty=1,col='palevioletred',lwd=4)
points(1:6,clam.means[2,],pch=15, col='indianred4',cex=1.7)
points(1:6,clam.means[1,],pch=19,col='royalblue4',cex=1.7)
#add jittered data values
points(jitter(as.numeric(newdate[clams$TREATMENT=='CE'])),
clams[clams$TREATMENT=='CE','LEGAL.CLAMS'],
pch=22,col=2,cex=.75)
points(jitter(as.numeric(newdate[clams$TREATMENT=='CC'])),
clams[clams$TREATMENT=='CC','LEGAL.CLAMS'],
cex=.75,col=4,pch=8)
box()
mtext(text=c('2001','2002','2003'),side=1,line=2,
at=c(1.5,3.5,5.5),cex=.85)
abline(v=2.5,col=1,lwd=4,lty=3)
text(1.5,350,'Both Areas are Open',col='seagreen4',cex=.75,font=2)
text(4.3,350,'Treatment Area is Closed',col='seagreen4',cex=.75,font=2)
mtext('Core Sound: Model Results',cex=1,font=2,side=3,line=.5)
par(xpd=TRUE)
legend(6.5,y.max,c('treatment','control','treatment (mean)',
'control (mean)', 'treatment (model)','control (model)'),
pch=c(22,8,15,19,NA,NA),lty=c(NA,NA,NA,NA,1,1),
lwd=c(NA,NA,NA,NA,3,3),
col=c(2,4, 'indianred4', 'royalblue4', 'palevioletred', 'skyblue3'), cex=rep(.8,6), bty='n')
par(xpd=FALSE)
par(mar=oldpar)

Fig. 9 The fitted negative binomial regression model
The plot of Question 5 and the lack of fit tests of Question 4 all suggest the fit is quite good. The treatment clam population is showing a positive trend with time such that the mean number of clams per trawl exceeds that obtained from the control population. These results suggest that closing a part of Core Sound has had a positive effect on the clam populations. Whether the observed trend is adequate to ensure population recovery to sustainable levels is another question entirely.
| Jack Weiss Phone: (919) 962-5930 E-Mail: jack_weiss@unc.edu Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516 Copyright © 2006 Last Revised--May 12, 2006 URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/solutions/final.htm |