Lecture 48 (lab 12b)—April 12, 2006
What was covered?
- R topics covered
- ranef function for extracting random effects
- plot options for plotting groupedData objects when there are three levels in the hierarchy
- different ways to specify random effects in 3-level models using lme
- Assessing the normality assumptions for level-1 and level-2 residuals in multilevel models
- Notation for multilevel models when there are three levels
- Fitting 3-level models in R
R functions and commands demonstrated
- lmeControl (from nlme) sets control values for the optimization routines in an lme fit
- lmer (from lme4) replaces the lme function in the new lme4 package
- ranef (from nlme) extracts predictions of the individual random effects from an lme model object
R function options
- collapse= (option of plot.lme) collapses trajectories at a level and plots the mean trajectories for that level
- control= (option of lme) used for passing control information to lme via the lmeControl function
- display= (option of plot.lme) allows the grouping of trajectories by level
- maxIter= (option of lmeControl) sets the maximum number of iterations for the lme optimization algorithm
- msMaxIter= (option of lmeControl) sets the maximum number of iterations for the nlm optimization step inside the lme optimization algorithm
- msVerbose= (option of lmeControl) causes the iteration history to be displayed
- niterEM= (option of lmeControl) sets the number of iterations used by the EM algorithm to refine the initial estimates of the random effects variance-covariance matrix
R packages used
- lme4 contains the function lmer
- nlme contains the functions lme, lmeControl, groupedData, VarCorr, and ranef for mixed effect models
- car package for the qq.plot function
Adding Additional Random Effects
- I briefly return to the larval duration time data set that we studied yesterday.
> #obtain data
> inverts<- read.table('http://www.unc.edu/courses/2006spring/ecol/145/001/data/lab11/74species.csv', sep=',', header=TRUE)
> names(inverts)
[1] "species" "meantemp" "temp" "PLD" "taxon" "climate.3"
[7] "feeding.type" "lntemp" "lnPLD" "n" "PLD.15c" "lnPLD.15c"
[13] "X1000.k" "lograte"
- Yesterday we used the level-1 residuals from a model linear in lntemp to uncover a structural defect in our model. A plot of level-1 residuals versus the predictor lntemp revealed a curvilinear trend indicating that the model had been misspecified. Adding a quadratic term in lntemp at level 1 yielded a statistically significant improvement in the model (as determined from a likelihood ratio test) and a sizeable decrease in AIC. Thus we ended up with the following level-1 model.

- There are now three coefficients at level 1 each of which can give rise to an equation at level 2. Thus the question arises do we need an additional level-2 random effect, i.e., should we permit the newly added quadratic coefficient to be random at level 2? A multilevel model with random effects for the constant, linear, and quadratic terms would be the following.

or in composite form
- The composite formulation resembles the way the model is specified in R. To add an additional random effect the basic form is the following.
random = ~1 + x + x^2|species
where as usual explicit listing of the intercept is actually unnecessary.
- To test whether we need to include a random quadratic effect we should fit two models: one with and one without a random quadratic effect and compare them with a likelihood ratio test. To get convergence, I center the predictor lntemp around log(15).
> library(nlme)
> #random slopes and intercepts
> model4a<-lme(lnPLD~I(lntemp-log(15))+ I((lntemp-log(15))^2), random=~I(log(temp)-log(15))|species, data=inverts, method='ML')
> #random constant, linear, and quadratic terms
> model4b<-lme(lnPLD~I(lntemp-log(15))+I((lntemp-log(15))^2), random=~I(log(temp)-log(15))+ I((log(temp)-log(15))^2)|species, control=lmeControl(maxIter=200, msMaxIter=200, niterEM=100, msVerbose=TRUE), data=inverts, method='ML')
- There a couple of new features in the second function call.
- The random statement includes both the linear and quadratic random effect variables (and an implicit intercept).
random=~I(log(temp)-log(15))+ I((log(temp)-log(15))^2)|species
- Running this on Mac OSX I ran into convergence problems, so I've added some control options using the control argument to increase the number of iterations from the default values. The lmeControl function is the way in which optimization settings are passed to the control argument.
- maxIter=200 increases the number of iterations for the lme optimization from 50 to 200.
- msMaxIter=200 increases the maximum number of iterations for the nlm optimization step from 50 to 200.
- niterEM=100 increases the number of iterations used by the EM algorithm in finding initial estimates for the random effects covariance matrix from 25 to 100.
- msVerbose=TRUE turns on the trace feature so that the entire iteration history is printed out and can be followed.
- I use the anova function to carry out the likelihood ratio test to compare the two models.
> anova(model4a,model4b)
Model df AIC BIC logLik Test L.Ratio p-value
model4a 1 7 199.2543 222.9457 -92.62713
model4b 2 10 196.1266 229.9715 -88.06329 1 vs 2 9.12768 0.0276
So we see that the likelihood ratio test is significant at the .05 level and that there has been a moderate decrease in AIC. This suggests that we should include random effects for all three level-1 parameters.
- As was discussed in lecture 43 the likelihood ratio test just carried out is incorrect. To compare models 4a and 4b we should use a mixture of
and
random variables with equal weights 0.5. Thus in our example above the p-value of our test is calculated as follows.
> .5*(1-pchisq(9.12768,3)) + .5*(1-pchisq(9.12768,2))
[1] 0.01903128
so in this instance our conclusion does not change.
Normality assumptions
- One of the assumptions of the multilevel model is that both the level-1 and level-2 residuals are normally distributed. More specifically we assume
, 
where the level-1 residuals are independent of the level-2 residuals. Notice that the level-2 residuals technically should have a multivariate normal distribution with the given covariance matrix.
- To assess the univariate normality of the level-1 residuals we can examine a normal probability plot. The qq.plot function from the car package is my favorite for generating normal probability plots. I use the label option so that I can label outliers by clicking on them.
> library(car)
> qq.plot(residuals(model4b, type='pearson'), label=inverts$species)
- There appear to be 11 residuals that are outside the confidence bands. Since these are 95% confidence bands we expect on average 5% of the residuals to fall outside of the bands. How many should this be?
> dim(inverts)
[1] 218 14
> .05*218
[1] 10.9
So we expect roughly 11 observations to fall outside of the bands. Since this is exactly what we observe and the residual plot otherwise looks fairly straight, we find no evidence to suggest the level-1 residuals are non-normal in their distribution.
- The level-2 residuals are more problematic. A common way to assess multivariate normality is to examine the level-2 residuals separately to see if they individually have a univariate normal distribution. The problem with this is that while multivariate normality does imply that the marginal distributions should be univariate normal, the reverse does not hold. Thus even if individually the random effects appear to be normally distributed that tells us nothing about their joint distribution. Given these caveats I examine their univariate distributions anyway.
- To obtain the random effects I use the ranef function. Below are the level-2 residuals of the first five level-2 units. The random effects corresponding to different level-1 parameters occur in different columns of a matrix.
> ranef(model4b)[1:5,]
(Intercept) I(log(temp) - log(15)) I((log(temp) - log(15))^2)
Amphiprion melanopus 0.1392374 -0.07367379 -0.031719350
Armases miersii 0.5504058 -0.19963899 -0.080403607
Balanus amphitrite 1 -0.2595831 0.16854758 0.074450184
Balanus eburneus -0.7385579 0.04149017 -0.003295344
Balanus trigonus -0.6313596 -0.16718955 -0.102341535
- To assess univariate normality I produce a normal probability plot for each column separately.
> qq.plot(ranef(model4b)[,1], ylab=expression(u["0 i"]))
> qq.plot(ranef(model4b)[,2], ylab=expression(u["1 i"]))
> qq.plot(ranef(model4b)[,3], ylab=expression(u["2 i"]))

Fig. 2 Assessing whether the marginal distributions of the level-2 random effects are univariate normal
- As Fig. 2 shows only the random effects of the quadratic term look even remotely normal. The intercept random effects appear to be highly skewed, while ten (about 13.5%) of the random effects for the linear term fall outside of the confidence bands.
- Fortunately, the multivariate normal distribution of the level-2 random effects is a less important assumption than is the assumption that the the level-1 residuals have a normal distribution. Also the model we are considering does not contain any level-2 predictors so there is still plenty of opportunity to improve the distributions shown in Fig. 2 be adding additional explanatory variables at level 2.
Multilevel models with three hierarchical levels
Visualizing the data
- Thus far all of the multilevel models we've considered have had only two levels of structure. We now consider the problem of analyzing a multilevel model with three levels. The apple data set we began looking at last time does in fact have three levels of information.
- Level 1 are the repeated measures on individual apples.
- Level 2 consists of the apples
- Level 3 consists of the trees from which multiple apples were selected.
- I begin by reloading the data from last time and pulling out only the variables that are needed for the analysis. I then construct a groupedData object to illustrate some of the additional graphical features that are available for displaying hierarchical data with more than two levels.
> apples<- read.table('http://www.unc.edu/courses/2006spring/ecol/145/001/data/lab12/AppleData.csv',
sep=',', header=TRUE)
> new.apples<-apples[,c("TREE", "APPLE", "appleid", "time", "diam")]
> new.grp<-groupedData(diam~time|TREE/APPLE, data=new.apples)
> library(lattice)
> #change background color to white
> trellis.par.set(col.whitebg())
> plot(new.grp)
- The only things different from a plot for a 2-level model are that
- the tree identifier is included as part of the label in the strip panel, and
- the individual apple trajectories are plotted in tree order.
That's not too exciting. Fortunately there are additional options available.
- In Fig. 3 observations are displayed at the lowest grouped level in the hierarchy, APPLE. We can modify this and display at the highest level, TREE, by adding the option display=1 to the plot function.
> plot(new.grp,display=1)

Fig. 4 Trajectories grouped by the highest level
- Now apples coming from the same tree are displayed together. We can collapse the individual apple trajectories to a single mean trajectory for each tree with the collapse=1 option.
> plot(new.grp,display=1,collapse=1)

Fig. 5 Trajectories collapsed to the tree mean
Fitting the models
- Just as with 2-level models, the proper place to start is the unconditional means model, a model with no predictors but with a random intercept at each level. The notation for a 3-level model is slightly more complicated because we now need three subscripts to reference the levels. So in the equations that follow, i represents the tree, j the apple, and k the individual observation on that apple.

or in composite form
where
,
, and
. Using the same notation explained above for the 3-level groupedData object, we fit the model in R as follows.
> model0a<-lme(diam~1, random=~1|TREE/APPLE, data=new.apples, method='ML', na.action=na.omit)
> AIC(model0a)
[1] -1092.074
> VarCorr(model0a)
Variance StdDev
TREE = pdLogChol(1)
(Intercept) 0.0001511123 0.01229278
APPLE = pdLogChol(1)
(Intercept) 0.0093915920 0.09691023
Residual 0.0030494623 0.05522194
- The variance components tell us how much variance is available to be explained at each level. From the output we see that
= 0.003
= 0.00939, and
= 0.00015 . From this we can calculate the fraction of the total variance that is attributable to variation between trees.
> 0.0001511123/(0.0001511123+0.0093915920+0.0030494623)
[1] 0.0120005
The fraction of the total variance that is attributable to variation between apples on the same tree is
> 0.0093915920/(0.0001511123+0.0093915920+0.0030494623)
[1] 0.7458281
Finally the fraction of the total variance that is due to variation between different observations on the same apple is
> 0.0030494623/(0.0001511123+0.0093915920+0.0030494623)
[1] 0.2421714
- These variance fractions can be used to us in modeling. Levels that account for a lot of the total variance are natural places to include predictors to help explain that variance. What we see from the above fractions is that there is only a relatively trivial amount of variation between trees. Practically speaking it doesn't seem to matter whether apples were selected from the same tree or different trees.
- With so little of the variance occurring between trees there really is no need to continue with the 3-level model. A 2-level model would suffice. Of course with only ten trees to work with we probably don't have enough data to estimate this tree variance component very accurately. So the absence of a substantial TREE effect may say more about the experimental design than it does about tree to tree variation.
- If we include time as a level-1 predictor then we have the option of making time random at both the apple and the tree level. The following model fits a random intercept and random time at level-2, but only a random intercept at level-3.
> model1a<-lme(diam~time,random=list(TREE=~1,APPLE=~time), data=new.apples, method='ML', na.action=na.omit)
- The required syntax in the random argument lists the top level first followed by the level nested inside the top level. I examine the AIC and variance components.
> AIC(model1a)
[1] -1918.749
> VarCorr(model1a)
Variance StdDev Corr
TREE = pdLogChol(1)
(Intercept) 4.483999e-07 0.0006696267
APPLE = pdLogChol(time)
(Intercept) 7.847503e-03 0.0885861352 (Intr)
time 5.183084e-05 0.0071993638 0.542
Residual 2.614533e-04 0.0161695169
- Adding time has led to a sizeable drop in AIC. Observe that the amount of variance estimated to exist at level-3 is negligible relative to level 2. It's unclear whether the variance is really that small or if the software is unable to estimate it accurately. Although it's seems to be pointless, I next let time be random at level 3 also.
> model2a<-lme(diam~time, random=list(TREE=~time, APPLE=~time), data=new.apples, method='ML', na.action=na.omit)
> VarCorr(model2a)
Variance StdDev Corr
TREE = pdLogChol(time)
(Intercept) 1.658243e-04 0.012877278 (Intr)
time 1.290148e-05 0.003591863 0.978
APPLE = pdLogChol(time)
(Intercept) 7.693197e-03 0.087710869 (Intr)
time 3.880095e-05 0.006229041 0.537
Residual 2.612330e-04 0.016162703
- For each level-1 parameter we see that the amount of estimated variability at the apple level exceeds the amount estimated at the tree level by about 3:1 for time and about 46:1 for the intercept. (Note: this model did not converge with the default control settings when I tried fitting it on the Macintosh platform using OS 10.3; the results displayed here are from running R on Windows.)
- Notice that because the same terms are being declared random at both levels we could have used the following shortcut notation.
> model2a<-lme(diam~time, random=~time|TREE/APPLE, data=new.apples, method='ML', na.action=na.omit)
- Lastly we can try adding the level-1 correlation structure we used last time.
> model3a<-lme(diam~time, random=list(TREE=~time, APPLE=~time), data=new.apples, method='ML', na.action=na.omit, correlation=corAR1())
- I compare the AIC values for all the models we've fit.
> sapply(list(model0a, model1a, model2a, model3a), AIC)
[1] -1092.074 -1918.749 -1920.624 -1935.087
- So as was the case with the level-2 models we fit last time, a random slopes and intercepts model with an an AR(1) structure for the residuals ranks best. It is instructive to fit that level-2 model and see if anything has been gained by including the third level, TREEs.
> model3<-lme(diam~time, random=~time|appleid, data=new.apples, method='ML', na.action=na.omit, correlation=corAR1())
> AIC(model3)
[1] -1936.372
- Observe that the AIC is actually lower for the 2-level model than the 3-level model. This suggests little has been gained by adding the third level to the hierarchy. More than likely the problem is that the amount of data we have just doesn't justify this degree of complexity.
Multilevel models using the lme4 package (not done in class)
- There is a second package in R that handles random effects and multilevel designs. It's called the lme4 package and the function that does what lme does is called lmer. This package will eventually replace nlme, but currently it lacks a number of the features available in nlme, hence we haven't been using it. I briefly introduce it here because its way of handling 3-level models is quite different from that of nlme.
- First, to fit a 2-level model with random time we would proceed as follows.
> library(lme4)
> lmer(diam~time+(time|appleid), data=new.apples, method='ML', na.action=na.omit)
Notice that there is no random statement. Instead random effects are entered as if they were ordinary predictors with the exception that the grouping variable is included as part of the predictor name and the entire term, random effect vertical bar grouping factor, is enclosed in parentheses. When an additional level is added, for example TREE, that level just appears as another random predictor in the equation.
- There is a complication when fitting 3-level models with lmer that doesn't arise with lme. The lmer function can handle both nested levels (the kind that lme handles) and crossed levels (not permitted with lme). A crossed level doesn't make any sense for the apple data set but it would be as if an apple could simultaneously come from two trees. In other scenarios a crossed level might make sense.
- The problem for us using the apple data set is that lmer doesn't automatically assume that the levels it encounters are necessarily nested. As a result, if the identifier we use to identify the APPLE level takes on the same values for different trees, lmer will assume we want crossed levels and will estimate things accordingly. The variable APPLE we used above to identify the apple level when using lme repeats its values on different trees.
> apply(table(apples$APPLE, apples$TREE), 1, function(x) sum(x>0))
1 2 3 4 5 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
4 4 2 3 4 2 3 4 5 4 3 1 3 3 1 6 5 2 5 2 2 3 5 4
- Here we see, for example, that APPLE=1 appears on four trees. On the other hand the variable appleid is unique to each tree.
> apply(table(apples$appleid, apples$TREE), 1, function(x) sum(x>0))
1 4 5 10 11 13 14 15 17 18 19 25 32 34 36 40 42 48 49 50
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
51 60 66 67 68 70 72 73 74 76 77 78 83 86 92 93 95 99 100 101
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
111 114 118 120 124 128 129 130 132 135 137 139 142 144 145 149 152 154 159 175
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
177 180 184 187 190 195 196 208 210 212 227 230 233 234 235 242 243 246 247 248
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
- The point is that if we enter the following model we will get the wrong answer.
> #wrong answer (fits a non-nested model)
> lmer(diam~1+(1|APPLE)+(1|TREE), data=new.apples, method='ML', na.action=na.omit)
- On the other hand, if we use appleid to identify the APPLE level the results we obtain will match the ibes we obtained using lme (model0a).
> #correct answer for nested levels
> lmer(diam~1+(1|appleid)+(1|TREE), data=new.apples, method='ML', na.action=na.omit)
Course Home Page