Assignment 10 — Solution

Question 1

Read in and subset the data

# read in data
> srrs2 <- read.table("http://www.unc.edu/courses/2007spring/enst/562/001/assignments/assign10/radon.txt", header=T, sep=",")
# select only Minnesota data
> minn<-srrs2[srrs2$state=="MN",]
# obtain 6-character string to identify counties
> minn$countycode<-substr(as.character(minn$county),1,6)
# log-transform radon while adjusting zero values if necessary
> minn$logradon <- log(ifelse(minn$activity==0, .05, minn$activity))

# set background to white for older versions of R
> trellis.par.set(col.whitebg())
> xyplot(logradon~factor(floor)|countycode,data=minn, layout=c(17,5), strip=strip.custom(par.strip.text=list(cex=0.6)), panel=function(x,y) {
panel.xyplot(x,y)
mymeans<-tapply(y,x,mean)
panel.lines(as.numeric(names(mymeans))+1,mymeans,col=2)
}, xlab='Floor (0=basement, 1=first floor)', ylab='log(radon)')


Fig. 1 Lattice graph of household log radon levels by county for lowest floor in house

Question 2

where

The composite equation for this model is obtained by inserting the level-2 equation into the level-1 equation.

> library(nlme)
> lme(fixed=logradon~1,random=~1|county,data=minn,method='ML')->out0
> out0

Linear mixed-effects model fit by maximum likelihood
Data: minn
Log-likelihood: -1139.845
Fixed: logradon ~ 1
(Intercept)
1.308058

Random effects:
Formula: ~1 | county
      (Intercept)  Residual
StdDev: 0.3056031 0.8090579

Number of Observations: 919
Number of Groups: 85

> VarCorr(out0)
county = pdLogChol(1)
              Variance    StdDev
(Intercept) 0.09339322 0.3056031
Residual    0.65457471 0.8090579

From the output we learn that σ2 = 0.655 and τ2 = 0.093. Thus there is far more variability of household log radon levels within counties than between counties. This implies that the structure in this data set may not be particularly important. It also tells us that in terms of modeling we should invest most of our effort in trying to "explain" variability within counties rather than between counties, because that's where most of the variability lies. Thus level-1 predictors should prove to be more valuable than level-2 predictors.

and can be calculated in R as follows.

> as.numeric(VarCorr(out0)[1,1])/ (as.numeric(VarCorr(out0)[1,1])+ as.numeric(VarCorr(out0)[2,1]))
[1] 0.1248626

Question 3

Part 1

with the same probability models for the random effects as before. The corresponding composite equation is the following.

> lme(fixed=logradon~1+floor, random=~1|county, data=minn, method='ML')->out1
> summary(out1)

Linear mixed-effects model fit by maximum likelihood
Data: minn
     AIC      BIC    logLik
2195.602 2214.896 -1093.801

Random effects:
Formula: ~1 | county
       (Intercept) Residual
StdDev:  0.3230887 0.765721

Fixed effects: logradon ~ 1 + floor
                Value Std.Error  DF   t-value p-value
(Intercept) 1.4589422 0.0515057 833 28.325837       0
floor      -0.7036136 0.0713706 833 -9.858592       0
Correlation:
      (Intr)
floor -0.292

Standardized Within-Group Residuals:
        Min          Q1        Med         Q3        Max
-4.88589873 -0.60768113 0.01106789 0.62978548 3.44052086

Number of Observations: 919
Number of Groups: 85

is highly significant (the reported p-value is rounded to 0) leading us to conclude that there is a statistically significant linear relationship between log radon and floor. In truth because floor is a dichotomous variable coded 0 and 1, we can also conclude that mean log radon levels are different in houses with basements compared to houses without basements.

> sapply(list(out0,out1),AIC)
[1] 2285.690 2195.602

Part 2

> out1a<-lm(logradon~floor,data=minn)

> sapply(list(out1,out1a),AIC)
[1] 2195.602 2273.466

The AIC is dramatically lower in the current model suggesting that allowing the intercepts to be random is a major improvement.

Part 3

> VarCorr(out0)
county = pdLogChol(1)
              Variance    StdDev
(Intercept) 0.09339322 0.3056031
Residual    0.65457471 0.8090579

> VarCorr(out1)
county = pdLogChol(1)
             Variance    StdDev
(Intercept) 0.1043863 0.3230887
Residual    0.5863287 0.7657210

> (as.numeric(VarCorr(out0)[2,1])- as.numeric(VarCorr(out1)[2,1]))/ as.numeric(VarCorr(out0)[2,1])
[1] 0.1042601

Question 4

The probability models for the random effects change because we've added an additional random effect.

The corresponding composite equation is the following.

> lme(fixed=logradon~1+floor, random=~1+floor|county, data=minn, method='ML')->out2

> VarCorr(out2)
county = pdLogChol(1 + floor)
             Variance    StdDev  Corr
(Intercept) 0.1162887 0.3410114 (Intr)
floor       0.1120414 0.3347258 -0.319
Residual    0.5726597 0.7567428

> sapply(list(out1,out2),AIC)
[1] 2195.602 2197.084

As we can see the AIC of the random slopes and intercepts model actually went up relative to random intercepts model. Therefore we should prefer the simpler model and stick with fixed slopes but random intercepts.

Question 5

> cty <- read.table ("http://www.unc.edu/courses/2007spring/enst/562/001/assignments/assign10/cty.txt", header=T, sep=",")
> minn2<-cty[cty$st=='MN',]
> merge(minn,minn2,by.x="cntyfips",by.y="ctfips")->merge3

where as before . The composite equation is the following.

> lme(fixed=logradon~1+floor+log(Uppm), random=~1|county, data=merge3, method='ML')->out3
> summary(out3)

Linear mixed-effects model fit by maximum likelihood
Data: merge3
     AIC      BIC    logLik
2157.913 2182.029 -1073.956

Random effects:
Formula: ~1 | county
       (Intercept) Residual
StdDev: 0.1457421  0.7689218

Fixed effects: logradon ~ 1 + floor + log(Uppm)
                Value  Std.Error  DF  t-value p-value
(Intercept) 1.4627406 0.03756813 833 38.93568       0
floor      -0.6788426 0.06969627 833 -9.74001       0
log(Uppm)   0.7180305 0.09064038  83  7.92175       0
Correlation:
         (Intr)  floor
floor    -0.362
log(Uppm) 0.156 -0.011

Standardized Within-Group Residuals:
        Min          Q1        Med         Q3        Max
-5.12860182 -0.60172541 0.03181403 0.64952352 3.32903088

Number of Observations: 919
Number of Groups: 85

As we can see from the reported p-value, this test is highly significant. Thus we should retain this variable. We can also reach this conclusion by comparing the AIC of this model to that of a model without log(Uppm).

> sapply(list(out1,out3),AIC)
[1] 2195.602 2157.913

The addition of log(Uppm) has led to a dramatic decrease in AIC. Thus we should prefer a model with log(Uppm) as a predictor.

Here the baseline model is the random intercepts model without the log(Uppm) predictor. I extract the components I need from the VarCorr object and carry out the calculations.

> VarCorr(out1)
county = pdLogChol(1)
             Variance    StdDev
(Intercept) 0.1043863 0.3230887
Residual    0.5863287 0.7657210

> VarCorr(out3)
county = pdLogChol(1)
              Variance    StdDev
(Intercept) 0.02124076 0.1457421
Residual    0.59124069 0.7689218

> (as.numeric(VarCorr(out1)[1,1])- as.numeric(VarCorr(out3)[1,1]))/ as.numeric(VarCorr(out1)[1,1])
[1] 0.7965177

Course Home Page


Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2007
Last Revised--May 5, 2007
URL: http://www.unc.edu/courses/2007spring/enst/562/001/docs/solutions/assign10.htm