Lecture 31 (lab 8)—March 7, 2006

What was covered?

R functions and commands demonstrated

R function options

The bird-bat data set

> #obtain data
> bats<-read.table( 'http://www.unc.edu/courses/2006spring/ecol/145/001/data/lab8/birdbat.txt', header=TRUE)
> bats
                     species  mass type energy  n
1           Pteropus gouldii 779.0    1  43.70  1
2     Pteropus poliocephalus 628.0    1  34.80  4
3    Hypsignathus monstrosus 258.0    1  23.30  1
4             Eidolon helvum 315.0    1  22.40  1
5        Meliphaga virescens  24.3    2   2.46  1
6    Melipsittacus undulatus  35.0    2   3.93  1
7           Sturnus vulgaris  72.8    2   9.15  6
8            Falco spaverius 120.0    2  13.80  3
9          Falco tinnunculus 213.0    2  14.60  6
10         Corvus ossifragus 275.0    2  22.80  2
11           Larus atricilla 370.0    2  26.20  2
12             Columba livia 384.0    2  25.90  4
13             Columba livia 442.0    2  29.50  6
14             Columba livia 412.0    2  43.70  6
15             Columba livia 330.0    2  34.00  6
16        Corvus crytoleucos 480.0    2  27.80  2
17     Phyllostomas hastatus  93.0    3   8.83  1
18          Plecotus auritus   8.0    3   1.35  3
19 Pipistrellus pipistrellus   6.7    3   1.12 17
20          Plecotus auritus   7.7    3   1.02  3

> attach(bats)

> class(type)
[1] "integer"

> type.f<-factor(type)
> class(type.f)
[1] "factor"
> type
[1] 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3
> type.f
[1] 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3
Levels: 1 2 3

> contrasts(type.f)
  2 3
1 0 0
2 1 0
3 0 1

> type.f<-factor(bats$type,labels=c('nonecho bats','birds','echo bats'))
> levels(type.f)

[1] "nonecho bats" "birds" "echo bats"
> contrasts(type.f)
             birds echo bats
nonecho bats     0         0
birds            1         0
echo bats        0         1

> contrasts(type.f)<-contr.treatment(levels(type.f),base=2)
> contrasts(type.f)

             nonecho bats echo bats
nonecho bats            1         0
birds                   0         0
echo bats               0         1

> contrasts(type.f)<-NULL
> contrasts(type.f)

             birds echo bats
nonecho bats     0         0
birds            1         0
echo bats        0         1

Examining the data

Fig. 1 Energy utilization by animal type

> table(type.f)
type.f
nonecho bats birds echo bats
           4    12         4

> boxplot(energy~type.f, ylab='Rate of energy use')
> points(jitter(type), energy, pch=16, col=2, cex=.7)

> anova(lm(energy~type.f))
Analysis of Variance Table

Response: bats$energy
          Df  Sum Sq Mean Sq F value   Pr(>F)
type.f     2 1644.87  822.44  6.7275 0.007042 **
Residuals 17 2078.24  122.25
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The significant result for the variable type.f means that not all of the groups means are the same. (Note: The box plots suggest that the variability within the three groups is not the same. The F-test reported in the ANOVA table is highly sensitive to such heteroscedasticity so the significant result should be accepted with caution.) To see where the differences lie we can display the coefficient estimates with the summary function. (I use the option corr=FALSE to suppress some of the output.)

> summary(lm(energy~type.f),corr=FALSE)

Call:
lm(formula = bats$energy ~ type.f)

Residuals:
      Min       1Q   Median      3Q      Max
-18.69333 -7.45250 -0.04167 5.97417 22.54667

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      31.050      5.528   5.617 3.08e-05 ***
type.fbirds      -9.897      6.384  -1.550 0.13948
type.fecho bats -27.970      7.818  -3.578 0.00232 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.06 on 17 degrees of freedom
Multiple R-Squared: 0.4418, Adjusted R-squared: 0.3761
F-statistic: 6.728 on 2 and 17 DF, p-value: 0.007042

> boxplot(mass~type.f,ylab='Body mass')
> points(jitter(type),mass, pch=16, col=2, cex=.7)

> anova(lm(mass~type.f))
Analysis of Variance Table

Response: mass
          Df Sum Sq Mean Sq F value Pr(>F)
type.f     2 434599  217300  7.5001 0.004624 **
Residuals 17 492542   28973
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The mass-energy relationship

Fig. 3 The mass-energy relationship

> plot(mass, energy, type='n', xlab='mass', ylab='energy')
> for(i in 1:3)
points(mass[type==i], energy[type==i], col=i, pch=14+i)
> legend(500, 10, as.character(unique(type.f)), pch=15:17, col=1:3, cex=rep(.8,3), bty='n')

As was discussed in Lecture 23 when comparing the Arrhenius and log-Arrhenius models for the species-area relation, a power equation and its log-transformed equivalent are very different models. They yield different estimates of the parameters, make different assumptions about the error distributions of the residuals, and on the original scale estimate a different type of mean (arithmetic versus geometric mean). Without doing a little more work it's not obvious to me which is the better equation to use with these data. But because Speakman and Racey (1991) used the log-transformed version, so will we. The graph of the log-log transformed data is shown in Fig. 4.

> plot(log(bats$mass), log(bats$energy), type='n', xlab='mass', ylab='energy')
> for(i in 1:3)
points(log(bats$mass[bats$type==i]), log(bats$energy[bats$type==i]), col=i, pch=14+i)
> legend(2, 3.5, as.character(unique(type.f)), pch=15:17, col=1:3, cex=rep(.8,3), bty='n')

What the various terms correspond to in the model equation is indicated below.

> model3b<-lm(log(energy)~log(mass)*type.f, data=bats)

> contrasts(type.f)
             birds echo bats
nonecho bats     0         0
birds            1         0
echo bats        0         1

> coef3<-coef(model3b)
> coef3
    (Intercept)               log(mass)                 type.fbirds
     -0.2024476               0.5897821                  -1.3783902
type.fecho bats   log(mass):type.fbirds   log(mass):type.fecho bats
     -1.2680677               0.2455883                   0.2148750

From the output we see that the fitted line can be written as follows.

Based on the coding shown in the contrasts output, we can construct the lines as follows.

Fig. 5  Model 3—non-parallel lines model

> #nonecho bats x1=x2=0
> abline(c(coef3[1], coef3[2]), lty=2, col=1)
> #birds: x1=1, x2=0
> abline(c(coef3[1]+coef3[3], coef3[2]+coef3[5]), lty=2, col=2)
> #echo bats: x1=0, x2=1
> abline(c(coef3[1]+coef3[4], coef3[2]+coef3[6]), lty=2, col=3)

Testing the non-parallel lines model

> model2<-lm(log(energy)~log(mass)+type.f,data=bats)
> anova(model2,model3b)

Analysis of Variance Table

Model 1: log(energy) ~ log(mass) + type.f
Model 2: log(energy) ~ log(mass) * type.f
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     16 0.55332
2     14 0.50487  2   0.04845 0.6718 0.5265

> anova(model3b)
Analysis of Variance Table

Response: log(energy)
                 Df  Sum Sq Mean Sq  F value Pr(>F)
log(mass)         1 29.3919 29.3919 815.0382 8.265e-14 ***
type.f            2  0.0296  0.0148   0.4100 0.6713
log(mass):type.f  2  0.0484  0.0242   0.6718 0.5265
Residuals        14  0.5049  0.0361
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Observe that the line of the ANOVA table labeled log(mass):type.f reports an F-statistic and p-value identical to what we obtained when comparing Model 3 with Model 2 using a partial-F test. When the anova function is applied to a regression model in which a categorical variable is a predictor, the tests that are printed test the categorical variable as a construct. Each line of the ANOVA table reports a partial-F test in which two models are compared—the current model and a model in which the term identified in that line is omitted. Since it is the interaction term that causes the slopes in Model 3 to be different among the three groups, a test of the interaction term as a construct is a test of Model 3 versus Model 2.

> summary(model3b,corr=F)

Call:
lm(formula = log(energy) ~ log(mass) * type.f, data = bats)

Residuals:
      Min        1Q    Median       3Q      Max
-0.251524 -0.126431 -0.009536 0.081244 0.328402

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               -0.2024     1.2613  -0.161   0.8748
log(mass)                  0.5898     0.2061   2.861   0.0126 *
type.fbirds               -1.3784     1.2952  -1.064   0.3053
type.fecho bats           -1.2681     1.2854  -0.987   0.3406
log(mass):type.fbirds      0.2456     0.2134   1.151   0.2691
log(mass):type.fecho bats  0.2149     0.2236   0.961   0.3529
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1899 on 14 degrees of freedom
Multiple R-Squared: 0.9832, Adjusted R-squared: 0.9771
F-statistic: 163.4 on 5 and 14 DF, p-value: 6.696e-12

The column labeled t value tests the individual coefficients of the regression model. Thus the last two lines are individual tests of the interaction terms. Each is a test of one dummy variable assuming that all the rest of the variables (including the other dummy variable!) are nonzero. These two tests, individually or taken together, are clearly not equivalent to a test of Model 3 versus Model 2 which requires a test in which both coefficients are set simultaneously equal to zero.

Fig. 6  Model 2—parallel lines model

> plot(log(bats$mass), log(bats$energy), type='n', xlab='mass', ylab='energy')
> for(i in 1:3)
points(log(bats$mass[bats$type==i]), log(bats$energy[bats$type==i]),col=i, pch=14+i)
> legend(2,3.5, as.character(unique(type.f)), pch=15:17, col=1:3, cex=rep(.8,3), bty='n')
> coef(model2)->coef2
> coef2
(Intercept)  log(mass) type.fbirds type.fecho bats
-1.57636019 0.81495749  0.10226192      0.07866368

> abline(c(coef2[1],coef2[2]), lty=2, col=1)
> abline(c(coef2[1]+coef2[3], coef2[2]), lty=2,col=2)
> abline(c(coef2[1]+coef2[4], coef2[2]), lty=2,col=3)

Testing the parallel lines model

> #compare Model 1 with Model 2
> model1<-lm(log(energy)~log(mass),data=bats)
> anova(model1,model2)

Analysis of Variance Table

Model 1: log(energy) ~ log(mass)
Model 2: log(energy) ~ log(mass) + type.f
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     18 0.58289
2     16 0.55332  2   0.02957 0.4276 0.6593

> #alternatively apply anova function to Model 2 alone
> anova(model2)

Analysis of Variance Table

Response: log(energy)
          Df  Sum Sq Mean Sq  F value    Pr(>F)
log(mass)  1 29.3919 29.3919 849.9108 2.691e-15 ***
type.f     2  0.0296  0.0148   0.4276 0.6593
Residuals 16  0.5533  0.0346
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Testing linear combinations of regression parameters

> summary(model2,corr=FALSE)

Call:
lm(formula = log(energy) ~ log(mass) + type.f, data = bats)

Residuals:
     Min       1Q   Median      3Q     Max
-0.23224 -0.12199 -0.03637 0.12574 0.34457

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    -1.57636    0.28724  -5.488 4.96e-05 ***
log(mass)       0.81496    0.04454  18.297 3.76e-12 ***
type.fbirds     0.10226    0.11418   0.896 0.384
type.fecho bats 0.07866    0.20268   0.388 0.703
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.186 on 16 degrees of freedom
Multiple R-Squared: 0.9815, Adjusted R-squared: 0.9781
F-statistic: 283.6 on 3 and 16 DF, p-value: 4.464e-14

where .

> varcov.mat<-vcov(model2)
> var.b2minusb3<-varcov.mat[3,3]+varcov.mat[4,4]-2*varcov.mat[3,4]
> var.b2minusb3
[1] 0.02483792

Using this we obtain the variance we need as follows. (The operator %*% is matrix multiplication.)

> c(0,0,1,-1) %*% varcov.mat %*% c(0,0,1,-1)
[,1]
[1,] 0.02483792

> coef(model2)
 (Intercept)  log(mass) type.fbirds type.fecho bats
-1.57636019 0.81495749   0.10226192      0.07866368

> t.stat<-(coef(model2)[3]-coef(model2)[4]) / sqrt(c(0,0,1,-1) %*% varcov.mat %*% c(0,0,1,-1) )
> t.stat

[,1]
[1,] 0.1497345

> model2$df.residual
[1] 16
> 2*(1-pt(t.stat,model2$df.residual))
[,1]
[1,] 0.8828453

Fig. 7  Reference distribution for t-test

> #code for Fig. 7
> plot(c(-4, 4), c(0, .41), type='n', axes=FALSE, xlab='', ylab='')
> lines(seq(-4, 4, .1), dt(seq(-4, 4, .1), model2$df.residual))
> axis(1)
> abline(h=0, lty=1, col='grey80')
> points(t.stat, 0, pch=4, col=2, cex=2)
> segments(qt(.975, model2$df.residual), dt(qt(.975, model2$df.residual),
model2$df.residual), qt(.975, model2$df.residual), 0)
> segments(-qt(.975, model2$df.residual), dt(qt(.975, model2$df.residual),
model2$df.residual), -qt(.975, model2$df.residual), 0)
> text(3.1, .06, 'critical region', cex=.8)
> text(-3.1, .06, 'critical region', cex=.8)
> text(t.stat, .025, 'observed value', col=2, cex=.8)
> text(1.3, .3, substitute(t[x], list(x=model2$df.residual)), cex=1.5)

Testing user-specified contrasts

> type.f2<-factor(type)
> contrasts(type.f2)<-cbind(c(1,-2,1),c(1,0,-1))
> contrasts(type.f2)

  [,1] [,2]
1    1    1
2   -2    0
3    1   -1

> model2a<-lm(log(energy)~log(mass)+type.f2,data=bats)

> anova(model2a)
Analysis of Variance Table

Response: log(energy)
           Df  Sum Sq Mean Sq  F value Pr(>F)
log(mass)   1 29.3919 29.3919 849.9108 2.691e-15 ***
type.f2     2  0.0296  0.0148   0.4276 0.6593
Residuals  16  0.5533  0.0346
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> anova(model2)

Analysis of Variance Table

Response: log(energy)
           Df  Sum Sq Mean Sq  F value    Pr(>F)
log(mass)   1 29.3919 29.3919 849.9108 2.691e-15 ***
type.f      2  0.0296  0.0148   0.4276 0.6593
Residuals  16  0.5533  0.0346
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> summary(model2a,corr=FALSE)

Call:
lm(formula = log(energy) ~ log(mass) + type.f2, data = bats)

Residuals:
     Min       1Q   Median      3Q     Max
-0.23224 -0.12199 -0.03637 0.12574 0.34457

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.51605    0.21274  -7.126 2.40e-06 ***
log(mass)    0.81496    0.04454  18.297 3.76e-12 ***
type.f21    -0.02098    0.03103  -0.676 0.509
type.f22    -0.03933    0.10134  -0.388 0.703
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.186 on 16 degrees of freedom
Multiple R-Squared: 0.9815, Adjusted R-squared: 0.9781
F-statistic: 283.6 on 3 and 16 DF, p-value: 4.464e-14

Cited Reference

Course Home Page


Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised--March 21, 2006
URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture31.htm