Lecture 46—Monday, April 10, 2006
What was covered?
- Standard representations of multilevel models
- Correlation structures in multilevel models
- Methods of estimation
Terminology defined
A standard mixed model representation of multilevel models
- Consider a two-level multilevel model with a single level-1 predictor, a single level-2 predictor for the intercept, and in which random effects are included in the level-2 equations for both level-1 parameters. We can represent the model as follows.

where as usual we assume
with
independent of
.
- Combining the levels we obtain the following composite model.

- Let's focus on level-2 unit i. In our larval development data set this would correspond to focusing on a single species, say the ith species. Suppose there are ni observations for this species. Then the composite model corresponds to ni separate equations, one for each observation. Thus we have the following.

- This system of equations can be written as a single vector equation in which fixed effects and random effects are sequestered into their own vectors.

- The fixed effects vector and random effects vector in the above equation can each be written as matrix products as shown below.

or more succinctly as
where

- Writing it this way treats β and ui as vectors of regression coefficients. Observe that the matrix Zi is a subset of the matrix Xi. This will always be the case because Xi contains all level-1 and level-2 predictors found in the model while Zi can only contain level-1 predictors, and then only those that have been declared to be random.
- We can write a similar equation for each level-2 unit (species) in the model. If we assemble all of the m level-2 unit equations into a single matrix equation we obtain the following.

or finally most succinctly as
. This is often referred to as the Laird-Ware formulation of the mixed effects model. Sometimes to emphasize that the vector u is a vector of random regression coefficients the equation is written
.
- In manuscripts stating the Laird-Ware equation is often the only explanation given for the analysis that was done. While succinct it does hide a lot of details.
Correlation structure in multilevel models
- As we've discussed previously the primary motivation for including random effects in a regression model is to model observational heterogeneity. By including random effects we account for the fact that not all observations are equivalent. But as we saw in Lecture 41, the inclusion of random effects also induces a correlation between observations that share the same random effects.
- The induced correlation structure changes depending on how the random effects are incorporated in the model. For the unconditional means model (in which only the intercept is random and there are not predictors at any level) we saw that the induced correlation is as follows.

- So observations coming from the same level-2 unit are correlated, but otherwise they are not. Observe that the correlation structure is really very simple. Any two observations from the same level-2 unit have the same correlation. If we let
then the correlation matrix for any level-2 unit takes the following form.

- Such a constant correlation matrix is said to exhibit compound symmetry. When we assemble these compound symmetric matrices ρ for all the level-2 units into a single matrix we obtain the following block-diagonal correlation structure for the model.

where each ρi is a compound symmetric correlation matrix like the one shown above and 0 is a matrix of zeros.
- As soon as additional random effects are added to the mix, the correlation structure becomes more complicated. Consider the random slopes and intercepts model with one level-1 predictor and no level-2 predictors.

where
with
independent of
.
- For this model the correlation for two observations, say observation j and k, coming from the same level-2 unit is given by the following expression.

and 0 for observations coming from different level-2 units. This is obviously an exceedingly complicated expression and notice that the correlation varies as a function of the value of the level-1 predictor.
- The point I wish to make is that a mixed effects model always induce a correlation structure in the data set and given the nature of the data structure it may or may not be appropriate. For example, if the level-1 observations are temporal measurements on the same level-2 unit then there are some natural choices for the correlation structure that involve decreasing correlation with increasing temporal distance. We will see in the computer lab how it is possible to add an additional correlation structure on top of what is induced by the random effects.
Methods of estimating multilevel models
- It turns out that there are two distinct approaches to estimating parameters in multilevel models, full maximum likelihood (ML) and restricted (also called residual) maximum likelihood (REML). Furthermore when the multilevel model is a generalized linear mixed model then there is more than one way to carry out the full maximum likelihood estimation. The details are probably more than you want to know, but you should be aware of the choices you have and you should certainly know what the default method is when you fit a model.
Full maximum likelihood (ML)
- Methods employing maximum likelihood all start from the same point, the likelihood function. Recall that if we have a random sample of observations,
, that are independent and identically distributed with density function
where θ is potentially a vector of parameters of interest, then the likelihood of our sample is the product of the individual density functions.

- With correlated data, i.e., data with a structure, the situation is more complicated. The data consist of observations
where i indicates the level-2 unit and j the level-1 observation within that unit. Typically the level-2 units are a random sample but not the level-1 units.
- Furthermore if we model the structure using a mixed model then the
are not the only random quantities to consider. The random effects
also have a distribution. Thus the natural starting place in constructing the likelihood is with the joint density function
. From elementary probability theory we can write

- Here
is the distribution of the observed data assuming the values of the random effects are known. This will be the probability model we assume for our data at level 1. For the examples we've considered thus far the assumed distribution has always been normal. For normal models this density is also a function of the vector of fixed effect parameters β and the variance
.
is the distribution of the random effects. In all of our examples this has been a multivariate normal distribution and is function of variance parameters in our model,
,
, etc. that I'll collectively denote as τ.
- Thus the joint likelihood of our data and the random effects is given by the following.

- The random effects are generally treated as nuisance parameters and by integrating them out we can obtain the marginal likelihood of our data.

- The bottom line is that obtaining maximum likelihood estimates involves grappling with multiple iterated integrals. For normal-based models the integrand can be simplified and maximum likelihood solutions are usually not difficult to obtain. When the probability density g is from the exponential family, standard numerical approximation schemes are required to approximate the integral. Depending upon the problem the approximation obtained for the integral may not be very good and as a result the quality of the estimates we obtain will be poor.
- The lmer function of the lme4 package in R can fit multilevel models (and other mixed effect models) when the probability density function g is from the exponential family. Thus it can be used for carrying out multilevel logistic regression and Poisson regression (although not negative binomial regression).
- It offers three methods for approximating the integral in the likelihood: PQL (penalized quasi-likelihood), Laplace, and AGQUAD (adaptive Gaussian quadrature). I've listed these in order of increasing accuracy although for a given problem using a more accurate method may not yield an explicit solution due to convergence problems.
Restricted (residual) maximum likelihood (REML)
- REML is an alternative to full maximum likelihood estimation. Rather than maximizing the likelihood of the data, it maximizes the likelihood of the observed residuals. REML obtains estimates of the fixed effects using non-likelihoodlike methods, such as ordinary least squares or generalized least squares, and then using these estimates it maximizes the likelihood of the residuals (subtracting off the fixed effects) to obtain estimates of the variance parameters.
- REML is a good alternative to ML when the sole focus is estimation of variance components. The variance components obtained via ML are biased when the samples are small while REML estimates are unbiased.
- The problem with REML for model building is that the "likelihoods" obtained for different fixed effects are not comparable. Hence it is not valid to compare models with different fixed effects using a likelihood ratio test or AIC when REML is used to estimate the model. For this reason we have not used REML in this course.
- You must be aware of the distinction between ML and REML because REML is the default for most software packages. Thus you will generally need to explicitly specify maximum likelihood estimation if that's what you desire.
Course Home Page