Problem 2
The basic feature that defines these data is that the observations are paired. That makes this problem just like Problem 1 in that we have repeated measures data. The fact that there are only two measurement periods gives us some additional analysis options that we don't have in Problem 1.
I outline five different approaches to analyzing these data: paired t-test, Wilcoxon signed rank test, randomization test, linear model (randomized block design), and random intercepts model. Two of them (paired t-test and randomized block design) are the same method viewed from different perspectives. There are other possibilities beyond these five. As I've said in a previous hint, I don't care which one of these methods you decide to use even though some are clearly better than others.
Preliminaries
To illustrate the methodology I pull out the first species alphabetically from the data set, ACMI (Achillea millefolium), and analyze the change in its cover values.
> pat<-read.table('http://www.unc.edu/courses/2007spring/enst/562/001/assignments/final/coverclass.txt', header=T)
> acmi<-pat[pat$CHISSpCode=='ACMI',]
Method 1: Paired t-test
- The paired t-test appears in every elementary statistics text. It is the standard parametric approach for analyzing paired data. Interestingly it handles the problem of paired data by essentially doing an end-around. In a paired t-test we calculate the differences between the paired values and then do a one-sample t-test on the differences. Hence in a paired t-test we convert paired data into unpaired differences.
- Because a paired t-test is a t-test on differences, it is the differences must satisfy the assumption of a t-test, not the original values. For small samples the important assumption is that the differences should be a sample from a normal distribution. This is typically hard to assess with small samples unless the response is especially unusual in some way.
- In R we can carry out a paired t-test using the t.test function either by first calculating the differences ourselves or by using the paired=TRUE option of t.test.
# paired t-test on raw data
> out1<-t.test(acmi[,4],acmi[,3],paired=TRUE,conf.level=.95)
> out1
Paired t-test
data: acmi[, 4] and acmi[, 3]
t = 1.3933, df = 11, p-value = 0.1911
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1449341 0.6449341
sample estimates:
mean of the differences
0.25
# one-sample t-test on differences
> out2<-t.test(acmi[,4]-acmi[,3],conf.level=.95)
> out2
One Sample t-test
data: acmi[, 4] - acmi[, 3]
t = 1.3933, df = 11, p-value = 0.1911
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.1449341 0.6449341
sample estimates:
mean of x
0.25
- As advertised the results from the two tests are the same. The p-value, sample estimate, and confidence limits can be accessed as components of the t.test object.
- Is this a valid test for these data? Probably not. Let's look at the paired differences.
> table(acmi[,4]-acmi[,3])
-1 0 1
1 7 4
- So the differences are either –1, 0, or 1. To assume that this is a sample from a normal distribution seems pretty ridiculous.
Method 2: Wilcoxon signed rank test
- The Wilcoxon signed rank test is the nonparametric version of the paired t-test. It works with ranks instead of the raw data and rather than a normal distribution it makes use of theoretical rank calculations based on all possible permutations to calculate significance levels. It is the preferred alternative to the paired t-test when the normality assumption is suspect.
- There are two implementations of the signed rank test in R. The traditional test is carried out by wilcox.test but it's results are only approximate when there are tied observations. A more modern version is wilcox.exact found in the no longer maintained exactRankTests package. The syntax for both functions is identical to t.test except that we need to set a conf.int argument to TRUE to get confidence intervals.
- When run on the ACMI data the wilcox.test function returns a series of warning statements about ties and another that states that it was not possible to obtain a 95% interval exactly.
> out3<-wilcox.test(acmi[,4], acmi[,3], paired=TRUE, conf.int=TRUE, conf.level=.95)
> out3
Wilcoxon signed rank test with continuity correction
data: acmi[, 4] and acmi[, 3]
V = 12, p-value = 0.2330
alternative hypothesis: true mu is not equal to 0
90 percent confidence interval:
-3.297723e-05 1.000000e+00
sample estimates:
(pseudo)median
0.9999193
- The wilcox.exact function works similarly.
> library(exactRankTests)
> out4<-wilcox.exact(acmi[,4], acmi[,3], paired=TRUE, conf.int=TRUE, conf.level=.95)
> out4
Exact Wilcoxon signed rank test
data: acmi[, 4] and acmi[, 3]
V = 12, p-value = 0.375
alternative hypothesis: true mu is not equal to 0
95 percent confidence interval:
-1 1
sample estimates:
(pseudo)median
0.5
- The p-value from wilcox.exact is quite a bit larger than that from wilcox.text. Observe that the confidence intervals (which are supposed to be confidence intervals for the median) returned by the two methods are totally different. There is some question as to whether either of these are right. See for example Geyer (2006) who argues that neither of these functions returns correct confidence intervals when there are ties.
[Additional information added 5/6/07]
- I've discovered that the wilcox.exact function fails to return a point estimate for the species COGI. You may or may not get a confidence interval (I do when I run my code in batch on all the species but not when I run it separately on COGI) but whatever you get is incorrect. We'll need to calculate the point estimate and confidence intervals for COGI ourselves and use the new values to replace what appears in our results matrix.
- The wilcox.exact test calculates something called the Hodges-Lehman estimate of the "treatment effect", which is the median of what are called the Walsh averages, the average of all possible pairs of differences including each difference with itself. Thus if di is the difference on transect i then the Hodges-Lehman estimate of the median is the following.

- The set of Walsh averages can be easily calculated in R using the outer function to produce a matrix of Walsh averages and then the lower.tri and diag function to extract the ones we need. Here's how the Hodges-Lehman estimate of the median is calculated for COGI.
#select COGI
> cover<-read.table('http://www.unc.edu/courses/2007spring/enst/562/001/assignments/final/coverclass.txt', header=T)
> cover[cover$CHISSpCode=="COGI",]->cogi
> cogi.diff<-cogi[,4]-cogi[,3]
#Walsh averages
> Wmat<-outer(cogi.diff,cogi.diff,'+')/2
> Walsh<-c(Wmat[lower.tri(Wmat)],diag(Wmat))
> cogi.median<-median(Walsh)
- To obtain a confidence interval for the Hodges-Lehman estimate of the median we sort the Walsh averages and then extract the values that lie at the appropriate positions of the empirical distribution. The locations are obtained from the quantiles of the signed rank distribution which can be obtained with the qsignrank function in R.
> Walsh.sort<-sort(Walsh)
# number of transects
> n.cogi<-dim(cogi)[1]
# obtain the positions
> q1<-n.cogi*(n.cogi+1)/2+1-qsignrank(.975,n.cogi)
> q2<-qsignrank(.975,n.cogi)
> cogi.lower95<-Walsh.sort[q1]
> cogi.upper95<-Walsh.sort[q2]
> c(cogi.median,cogi.lower95,cogi.upper95)
[1] 0.5 0.0 1.0
Method 3: Randomization test
- A randomization test is probably the most natural choice for this problem. The problem with it is that getting sensible confidence intervals is not a trivial matter. To do a randomization test with paired data, the pairing must not be disturbed by the randomization. The pairings are part of the structure of the data set. What should be random under the null hypothesis of no change is the order of observations in each pair. Thus a randomization test for paired data should only randomly reassign the order of the observations in each pair. Equivalently one could randomly choose for each pair whether the observed paired difference is positive or negative.
- Here's a bit of code that carries out the randomization test the second way. It uses the mean of the paired differences as the test statistic. The variable actdiff that I define first contains the vector of observed differences.
> actdiff<-acmi[,4]-acmi[,3]
> perm.func<-function() {
#randomly assign signs
multiplier<-ifelse(runif(length(actdiff))<.5,-1,1)
mean(actdiff*multiplier) }
> out.perm<-replicate(9999,perm.func())
> alldiffs<-c(out.perm,mean(actdiff))
> 2*min(sum(alldiffs<=mean(actdiff)), sum(alldiffs>=mean(actdiff)))/length(alldiffs)
[1] 0.377
- The perm.test function in the exactRankTests package will also do these calculations. The syntax is the same as t.test.
> perm.test(acmi[,3],acmi[,4],paired=TRUE)
1-sample Permutation Test
data: acmi[, 3] and acmi[, 4]
T = 1, p-value = 0.375
alternative hypothesis: true mu is not equal to 0
- Obtaining confidence intervals is more problematic. The 95% confidence interval for the mean difference should contain all those values that when subtracted from the paired differences yield a statistic that falls in the middle 95% of the randomization distribution. The logic is as follows. Suppose we find d1 and d2 such that
and 
Then it follows that
and 
Putting these two inequalities together yields the desired confidence interval.
- Thus one can just examine the randomization distribution and see how much you would have to change the observed statistic to cause it to enter or exit a tail of the distribution.
- The problem though when the sample is small and there are many ties is that the distribution does not consist of very many different values. Fig. 1 shows a histogram of the ACMI randomization distribution obtained above. A table of the unique values and their relative frequencies is shown below
> hist(alldiffs,col='grey90')
> points(mean(actdiff),0,pch=16,col=2)
> table(round(alldiffs,4))/length(alldiffs)
-0.4167 -0.25 -0.0833 0.0833 0.25 0.4167
0.0316 0.1499 0.3162 0.3138 0.1591 0.0294
-
The observed difference is 0.25 and falls in the second bar from the right in Fig. 1. I can move this difference into the two bars on the ends of the distribution by moving it d1 = .25 – 0.4167 and d2 = .25 – (–0.4167). This gives me a confidence interval for the difference of (–.167, 0.667). From the reported relative frequencies this is a 1 – (0.0316+0.0294) = 94% confidence interval.
- In lieu of doing calculations such as this I'm going to suggest the simpler approach of obtaining a bootstrap confidence interval using the observed differences keeping in mind that it's conceivable that the bootstrap interval you obtain might not agree with the permutation test. Here's how it would work with ACMI. I will use the mean difference as the statistic even though typically bootstrapping isn't used for means.
# define mean function for boot function
> mean.func<-function(x,indices) mean(x[indices])
> library(boot)
> boot(actdiff,mean.func,9999)->myboot
> myboot
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = actdiff, statistic = mean.func, R = 9999)
Bootstrap Statistics :
original bias std. error
t1* 0.25 -0.001450145 0.1703088
> boot.ci(myboot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 9999 bootstrap replicates
CALL :
boot.ci(boot.out = myboot)
Intervals :
Level Normal Basic
95% (-0.0823, 0.5852 ) (-0.0833, 0.5833 )
Level Percentile BCa
95% (-0.0833, 0.5833 ) (-0.1667, 0.5000 )
Calculations and Intervals on Original Scale
- The bias is very small and the bootstrap distribution (not shown) is fairly symmetric so I would be willing to trust any of the four intervals. Notice that the "best" interval, the BCa, is shifted to the left of the other three. The other three agree with the randomization-based interval and also with the results of the randomization test.
Method 4: Linear model—randomized block design
- Paired data are an example of structured data. The structure manifests itself as observational heterogeneity. As we've seen one way to account for observational heterogeneity is to treat the homogeneous units as blocks and fit a linear model in which the blocks are included as a factor variable. The basic strategy is to treat the pairs as blocks and include the year as a categorical variable. This approach is called a randomized block design. The results match those from a paired t-test.
- Here's how the analysis works in R. First we need to create a single cover class variable by concatenating the two cover class measurements from separate years. At the same time we need to create a new variable that records the time of the measurement and a second variable that records relevé the measurements came from. Here's the R code to do this.
> new.acmi<-data.frame(rep(acmi$PlotCode,2), c(acmi$Coverclass.1983, acmi$Coverclass.2002), rep(c(1983,2002), c(dim(acmi)[1], dim(acmi)[1])))
> colnames(new.acmi)<-c('Plot', 'Cover', 'Year')
- Next we fit a linear model with Cover as the response and Plot and Year as predictors in the form of factors.
> lm(Cover~factor(Year)+factor(Plot), data=new.acmi)->out4
> summary(out4)
Call:
lm(formula = Cover ~ factor(Year) + factor(Plot), data = new.acmi)
Residuals:
Min 1Q Median 3Q Max
-6.250e-01 -1.250e-01 -3.469e-17 1.250e-01 6.250e-01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.750e-01 3.235e-01 2.705 0.0205 *
factor(Year)2002 2.500e-01 1.794e-01 1.393 0.1911
factor(Plot)602 4.204e-17 4.395e-01 9.57e-17 1.0000
factor(Plot)610 -4.080e-18 4.395e-01 -9.28e-18 1.0000
factor(Plot)619 4.247e-17 4.395e-01 9.66e-17 1.0000
factor(Plot)625 -2.498e-16 4.395e-01 -5.68e-16 1.0000
factor(Plot)723 -5.000e-01 4.395e-01 -1.138 0.2795
factor(Plot)724 -5.000e-01 4.395e-01 -1.138 0.2795
factor(Plot)727 4.730e-16 4.395e-01 1.08e-15 1.0000
factor(Plot)731 -5.000e-01 4.395e-01 -1.138 0.2795
factor(Plot)735 -5.000e-01 4.395e-01 -1.138 0.2795
factor(Plot)742 9.070e-17 4.395e-01 2.06e-16 1.0000
factor(Plot)750 -5.000e-01 4.395e-01 -1.138 0.2795
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4395 on 11 degrees of freedom
Multiple R-Squared: 0.4632, Adjusted R-squared: -0.1225
F-statistic: 0.7908 on 12 and 11 DF, p-value: 0.6546
- The reported t-test of the dummy regressor factor(Year)2002 is a test of whether the cover value change over the two years is the same. The reported p-value is the same as the p-value of the paired t-test. A confidence interval for this difference can be obtained in the usual fashion:
.
> coef(out4)[2]- qt(.975, out4$df.residual)*sqrt(vcov(out4)[2,2])
factor(Year)2002
-0.1449341
> coef(out4)[2]+ qt(.975, out4$df.residual)*sqrt(vcov(out4)[2,2])
factor(Year)2002
0.6449341
- Observe that the numbers match the confidence interval reported in the paired t-test output.
Method 5: Random intercepts model
- Paired data are structured data and another way to handle structured data is by including random effects that differ across the individual pairs. This is a random intercepts model in which we replace the estimated individual block effects of the randomized block design (separate intercepts model) with realizations from a probability distribution. This is a particularly attractive option here because the individual block effects are of no interest to us. This model can be fit with the lme function of the nlme package.
> library(nlme)
> lme(Cover~factor(Year),random=~1|Plot, data=new.acmi,method='ML')->out5
> summary(out5)
Linear mixed-effects model fit by maximum likelihood
Data: new.acmi
AIC BIC logLik
30.4668 35.17902 -11.2334
Random effects:
Formula: ~1 | Plot
(Intercept) Residual
StdDev: 9.31861e-06 0.3864008
Fixed effects: Cover ~ factor(Year)
Value Std.Error DF t-value p-value
(Intercept) 0.6666667 0.1165042 11 5.722254 0.0001
factor(Year)2002 0.2500000 0.1647618 11 1.517342 0.1574
Correlation:
(Intr)
factor(Year)2002 -0.707
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.3723210 0.2156655 0.2156655 0.8626622 0.8626622
Number of Observations: 24
Number of Groups: 12
- Notice the point estimate for the Year dummy regressor is the same as what we obtained from the fixed effects randomized block design (and the paired t-test), although its standard error is different. I calculate a 95% confidence interval for the mean difference in the usual fashion.
> fixef(out5)[2]- qt(.975, out5$fixDF$X[2])*sqrt(vcov(out5)[2,2])
factor(Year)2002
-0.0971998
> fixef(out5)[2]+ qt(.975, out5$fixDF$X[2])*sqrt(vcov(out5)[2,2])
factor(Year)2002
0.5971998
Summary
- The last two approaches outlined above viewed the analysis of these data as a problem in modeling. In keeping with this we could next consider the problem of choosing an appropriate probability model (perhaps something other than normal) for the response, etc. I won't go there.
- Notice that the paired t-test, randomization test, bootstrap, fixed effects randomized block, and random intercepts model all returned similar p-values and fairly similar estimates for the confidence interval. The Wilcoxon tests are formally tests of whether the median difference was zero and so are not entirely comparable in spirit to the rest of these (and the reported confidence intervals are suspect so I don't list them).
Method |
p-value for test |
95% confidence interval for mean difference |
| paired t-test |
0.191 |
(–0.145, 0.645) |
| wilcoxon signed rank |
0.233 |
|
| wilcoxon signed rank (exact) |
0.375 |
|
| permutation test |
0.375 |
(–0.167, 0.667) |
| bootstrap (basic, percentile) |
— |
(–0.083, 0.583) |
| bootstrap (BCa) |
— |
(–0.167, 0.500) |
| fixed effect randomized block model (RBD) |
0.191 |
(–0.145, 0.645) |
| random intercepts model |
0.157 |
(–0.097, 0.597) |