Lecture 47 (lab 12)—Tuesday, April 11, 2006

What was covered?

R functions and commands demonstrated

R function options

R packages used

Using lattice graphics to display the results of a multilevel model

> #obtain data
> inverts<- read.table('http://www.unc.edu/courses/2006spring/ecol/145/001/data/lab11/74species.csv', sep=',', header=TRUE)
> names(inverts)
 [1] "species"      "meantemp" "temp"  "PLD" "taxon"   "climate.3"
 [7] "feeding.type" "lntemp"   "lnPLD" "n"   "PLD.15c" "lnPLD.15c"
[13] "X1000.k"      "lograte"

> library(nlme)
> model2<-lme(lnPLD~I(lntemp-log(15)), random=~I(lntemp-log(15))|species, data=inverts, method='ML')

or in composite form

plot.lattice<-function(inverts,model2,seed)
{
set.seed(seed)
sub1<-sample(unique(inverts$species),9)
sub2<-inverts[inverts$species %in% sub1,]
#obtain random effects
ranef.list<-ranef(model2)
#obtain fixed effect estimates
fixef.list<-fixef(model2)
#obtain list of species
species.list<-sort(unique(inverts$species))
#first create data frame with same linking field name as in raw data
ran.data<-data.frame(rownames(ranef.list), ranef.list)
colnames(ran.data)<-c('species','int','slope')
#add random effects to data set by species
inverts2<-merge(inverts, ran.data)
#select species that are to be plotted
data.combined<-inverts2[inverts2$species %in% sub1,]

#change background color to white
trellis.par.set(col.whitebg())

#generate graph
xyplot(lnPLD~I(lntemp-log(15))|species,data=data.combined, layout=c(3,3),
xlab=expression(log(temperature/15)),ylab=expression(log(PLD)),
panel=function(x,y,subscripts){
panel.xyplot(x,y,pch=16)
panel.abline(lm(y~x),lty=1,col=4)
panel.abline(c(fixef.list[1],fixef.list[2]),lty=1,lwd=2,col=2)
panel.abline(c(data.combined$int[subscripts][1]+fixef.list[1],
data.combined$slope[subscripts][1]+fixef.list[2]),lty=2)
},
subscripts=T,strip = strip.custom(par.strip.text = list(cex = 0.85)),
par.settings = list(axis.text = list(cex = 0.7)),
key=list(lines=list(lty=c(1,1,2),col=c(4,2,1)),
text=list(c("OLS fitted trajectory",
"population average trajectory",
"model-based empirical Bayes trajectory"),cex=rep(.85,3)),
border=0,space='top'))
}

> library(lattice)
> plot.lattice(inverts,model2,10)
> plot.lattice(inverts,model2,17)

      

Fig. 1 Fitted trajectories for random selections of species

lnPLD~I(lntemp-log(15))|species 

This defines plotting variables and the grouping variable (species)

layout=c(3,3)

Sets up a tableau of 3 rows and 3 columns of plots

xlab=expression(log(temperature/15))

Label for x-axis. Here expression is used to display mathematical notation. Note:

panel=function(x,y,subscripts){ 

Begins definition of panel function for defining what is included in each plot. The subscripts argument references the elements of each group and is needed only for adding the empirical Bayes trajectories

panel.xyplot(x,y,pch=16)

Adds points

panel.abline(lm(y~x),lty=1,col=4)

Adds OLS line

panel.abline(c(fixef.list[1], fixef.list[2]), lty=1, lwd=2, col=2)

Adds population-averaged line

panel.abline(c(data.combined$int[subscripts][1] + fixef.list[1], data.combined$slope[subscripts][1] + fixef.list[2]),lty=2)

Adds subject-specific lines. Subscripts identifies the elements belonging to a group (species) and [1] selects only the first of these elements (needed because the random effects are duplicated due to the merge). The keyword subscripts is needed because we have to calculate a different intercept and slope for each species based on the variables that are referenced here.

subscripts=T

Required if subscripts are to be used

strip = strip.custom(par.strip.text = list(cex = 0.85))

Adjusts size of text in the strips above each plot (the species names)

par.settings = list(axis.text = list(cex = 0.85))

Adjusts size of numbers on axis

key=list(lines=list(lty=c(1,1,2), col=c(4,2,1)), text=list(c("OLS fitted trajectory", "population average trajectory", "model-based empirical Bayes trajectory"), cex=rep(.85,3)), border=0,space='top')

Creates legend to label lines appearing in plots

Checking the structural form of the level-1 model

Fig. 2   Assessing the structural form at level 1

> plot(inverts$lntemp,residuals(model2, type='pearson'), xlab='log(temperature)', ylab='Pearson residuals', ylim=c(-1,1))
> lines(lowess(residuals(model2, type='pearson')~inverts$lntemp), col=2, lwd=2)

> model3<-lme(lnPLD~I(lntemp-log(15))+I((lntemp-log(15))^2), random=~I(lntemp-log(15))|species, data=inverts, method='ML')
> anova(model2,model3)

       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
model2     1  6 210.4372 230.7441 -99.21859
model3     2  7 199.2543 222.9457 -92.62713 1 vs 2 13.18290   3e-04

The result is highly significant. Notice also that the AIC has dropped substantially. Thus both criteria argue for including a quadratic term.

Each of these parameters can have a level-2 equation associated with it. In particular we now have the option of allowing the coefficient of the quadratic term to vary randomly across species. I test the need for that below.

> anova(model3,model3.5)
         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
model3       1  7 199.2543 222.9457 -92.62713
model3.5     2 10 196.1694 230.0143 -88.08469 1 vs 2 9.084886  0.0282

So we have weak evidence that the quadratic term should include a random effect.

Using level-2 random effects to address research questions of interest

> ranef(model3)[1:10,]
                     (Intercept) I(lntemp - log(15))
Amphiprion melanopus   0.1171001         -0.02799795
Armases miersii        0.4770219         -0.09358326
Balanus amphitrite 1  -0.2219984          0.13166601
Balanus eburneus      -0.7310860          0.05395505
Balanus trigonus      -0.6782908         -0.06610844
Callianassa tyrrhena  -0.7457328         -0.38392852
Cancer gracilis        0.3637000         -0.11882529
Cancer irroratus       0.1905045          0.06848089
Cancer magister        0.8615153         -0.16859720
Cancer oregonensis     0.3654544         -0.11613289

> out.resids<-resids.table(inverts,model3)
> out.resids[1:10,]

                     Species.num         u0          u1     sd.u0     sd.u1
Amphiprion melanopus           1  0.1171001 -0.02799795 0.2098857 0.2864359
Armases miersii                2  0.4770219 -0.09358326 0.2077348 0.2802973
Balanus amphitrite 1           3 -0.2219984  0.13166601 0.1426457 0.2228906
Balanus eburneus               4 -0.7310860  0.05395505 0.1768714 0.2569641
Balanus trigonus               5 -0.6782908 -0.06610844 0.1552050 0.2482023
Callianassa tyrrhena           6 -0.7457328 -0.38392852 0.1390658 0.2253486
Cancer gracilis                7  0.3637000 -0.11882529 0.1439511 0.2110557
Cancer irroratus               8  0.1905045  0.06848089 0.1293948 0.1845003
Cancer magister                9  0.8615153 -0.16859720 0.1863755 0.2525198
Cancer oregonensis            10  0.3654544 -0.11613289 0.1691172 0.2404039

> graph.u1(out.resids,inverts)
> graph.u0(out.resids,inverts)

   

Fig. 3  Caterpillar plots for level-2 residuals

Identifying level-2 units that are poorly fit by the model

Fig. 4  Identifying unusual level-2 units

Checking the adequacy of the correlation structure induced by a mixed model

> apples<- read.table('http://www.unc.edu/courses/2006spring/ecol/145/001/data/lab12/AppleData.csv',
sep=',', header=TRUE)
> names(apples)

[1] "TREE" "APPLE" "size" "appleid" "time" "diam"
> apples[1:15,]
  TREE APPLE size appleid time diam
1    1     1    7       1    1 2.90
2    1     1    7       1    2 2.90
3    1     1    7       1    3 2.90
4    1     1    7       1    4 2.93
5    1     1    7       1    5 2.94
6    1     1    7       1    6 2.94
7    1     4    7       4    1 2.86
8    1     4    7       4    2 2.90
9    1     4    7       4    3 2.93
10   1     4    7       4    4 2.96
11   1     4    7       4    5 2.99
12   1     4    7       4    6 3.01
13   1     5    7       5    1 2.75
14   1     5    7       5    2 2.78
15   1     5    7       5    3 2.80

> apples[115:130,]
    TREE APPLE size appleid time diam
115    2    25    7      50    1 2.75
116    2    25    7      50    2 2.80
117    2    25    7      50    3 2.83
118    2    25    7      50    4 2.85
119    2    25    7      50    5 2.86
120    2    25    7      50    6 2.88
121    3     1    7      51    1 2.91
122    3     1    7      51    2 3.00
123    3     1    7      51    3 3.02
124    3     1    7      51    4 3.03
125    3     1    7      51    5   NA
126    3     1    7      51    6   NA
127    3    10    7      60    1 2.81
128    3    10    7      60    2 2.89
129    3    10    7      60    3 2.87
130    3    10    7      60    4 2.93

Fig. 5  Growth data for individual apples

> apples.sub<-apples[apples$TREE<=2,]
> grp<-groupedData(diam~time|appleid, data=apples.sub)
> plot(grp)

> model0<-lme(diam~1, random=~1|appleid, data=apples, method='ML', na.action=na.omit)

Note the use of the option na.action=na.omit. This instructs lme to drop observations that have missing data for any of the variables currently used in the model (only diam here). The default action with missing data is na.fail which causes the program to abort and not produce any output.

> VarCorr(model0)
appleid = pdLogChol(1)
            Variance    StdDev
(Intercept) 0.009545563 0.09770140
Residual    0.003049309 0.05522054

> 0.009545563/(0.009545563+0.003049309)
[1] 0.7578928

> model1<-lme(diam~time, random=~1|appleid, data=apples, method='ML', na.action=na.omit)
> VarCorr(model1)

appleid = pdLogChol(1)
            Variance     StdDev
(Intercept) 0.0105001410 0.10247020
Residual    0.0004234962 0.02057902

> (0.003049309-0.0004234962)/0.003049309
[1] 0.8611173

So 86% of variability in apple diameter is explained by its linear relationship with time. This is quite high. Notice this truly is a "pseudo" R2 here because the level-2 variance, τ2, actually increased slightly in going from model0 to model1.

> modelOLS<-lm(diam~time, data=apples, na.action=na.omit)
> sapply(list(modelOLS,model1),AIC)

[1] -780.0338 -1820.8447

So we see that the AIC is substantially lower for the model that includes structure.

> model2<-lme(diam~time, random=~time|appleid, data=apples, method='ML', na.action=na.omit)
> sapply(list(model1,model2), AIC)

[1] -1820.845 -1920.751

> anova(model1, model2)
       Model df       AIC       BIC   logLik   Test  L.Ratio p-value
model1     1  4 -1820.845 -1804.399 914.4224
model2     2  6 -1920.751 -1896.082 966.3755 1 vs 2 103.9062  <.0001

> VarCorr(model2)
appleid = pdLogChol(time)
            Variance     StdDev      Corr
(Intercept) 7.847939e-03 0.088588592 (Intr)
time        5.182587e-05 0.007199019 0.542
Residual    2.614601e-04 0.016169727

,

instead of the more general covariance matrix we've been using,

.

I fit this model and compare it to the model in which the random effects are allowed to be correlated.

> model3<-lme(diam~time, random=list(appleid=pdDiag(~time)), data=apples, method='ML', na.action=na.omit)
> sapply(list(model2, model3), AIC)

[1] -1920.751 -1906.999
> anova(model2, model3)
       Model df       AIC       BIC   logLik   Test  L.Ratio p-value
model2     1  6 -1920.751 -1896.082 966.3755
model3     2  5 -1906.999 -1886.442 958.4996 1 vs 2 15.75172   1e-04

Here is the jth level-1 residual for level-2 unit i and is the residual from the same unit but k time units later. N(k) is the number of terms that go into the numerator sum and N(0) is the total number of different residuals used in the numerator calculation.

> ACF(model2)
  lag        ACF
1   0  1.0000000
2   1 -0.1697802
3   2 -0.3497668
4   3 -0.3275452
5   4 -0.1584048
6   5  0.5242356

where z(p) denotes the quantile of a standard normal distribution such that P(Zz(p)) = p.

> plot(ACF(model2),alpha=.05)
> plot(ACF(model2),alpha=.01)

    

Fig. 6  ACF results for α = .05 (left) and α = .01 (right)

Observe that even with the Bonferroni correction there are significant autocorrelations at the first three lags. (The significant autocorrelation at lag 5 can probably be ignored since there isn't much data going into its calculation.) Also notice that contrary to expectation the correlations between nearby observations (in time) are negative!

> model4<-lme(diam~time, random=~time|appleid, data=apples, method='ML', na.action=na.omit, correlation=corAR1())
> sapply(list(model2,model4),AIC)

[1] -1920.751 -1936.372


> summary(model4)

Linear mixed-effects model fit by maximum likelihood
Data: apples
      AIC       BIC  logLik
-1936.372 -1907.592 975.186

Random effects:
Formula: ~time | appleid
Structure: General positive-definite, Log-Cholesky parametrization
            StdDev      Corr
(Intercept) 0.085754233 (Intr)
time        0.006026921 0.83
Residual    0.021343046

Correlation Structure: AR(1)
Formula: ~1 | appleid
Parameter estimate(s):
      Phi
0.4937614

Fig. 7   ACF after including an AR(1) correlation structure

Fixed effects: diam ~ time
               Value   Std.Error  DF   t-value p-value
(Intercept) 2.831036 0.009996493 370 283.20296       0
time        0.028944 0.000973891 370  29.71994       0
Correlation:
    (Intr)
time 0.388

Standardized Within-Group Residuals:
       Min         Q1       Med        Q3       Max
-2.3548201 -0.4006783 0.1000436 0.6102602 2.8733065

Number of Observations: 451
Number of Groups: 80

> plot(ACF(model4),alpha=.01)

Cited References

Course Home Page


Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised--April 20, 2006
URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture47.htm