Lecture 34—Monday, April 9, 2007

What was covered?

The multiple testing problem—An example

# The ordered p-values
> pvals<-c(.01,.013,.014,.19,.35,.5,.63,.67,.75,.81)
> numtests<-length(pvals)

#three sets of thresholds at alpha=.05: bonf and holm control the family-wise error rate, fdr the false discovery rate
> bonf<-rep(.05/numtests,numtests)
> holm<-.05/seq(numtests,1,-1)
> fdr<-(1:numtests)*.05/numtests
> cbind(pvals,bonf,holm,fdr)

      pvals  bonf        holm   fdr
 [1,] 0.010 0.005 0.005000000 0.005
 [2,] 0.013 0.005 0.005555556 0.010
 [3,] 0.014 0.005 0.006250000 0.015
 [4,] 0.190 0.005 0.007142857 0.020
 [5,] 0.350 0.005 0.008333333 0.025
 [6,] 0.500 0.005 0.010000000 0.030
 [7,] 0.630 0.005 0.012500000 0.035
 [8,] 0.670 0.005 0.016666667 0.040
 [9,] 0.750 0.005 0.025000000 0.045
[10,] 0.810 0.005 0.050000000 0.050

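#carry out tests: compare each ordered p-value with its threshold
#(Holm steps down and stops at the first 0; FDR steps up and rejects every test through the last 1)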
> b.test<- pvals<bonf
> holm.t<- pvals<holm
> fdr.t<- pvals<fdr
> cbind(pvals,b.test,holm.t,fdr.t)

      pvals b.test holm.t fdr.t
 [1,] 0.010      0      0     0
 [2,] 0.013      0      0     0
 [3,] 0.014      0      0     1
 [4,] 0.190      0      0     0
 [5,] 0.350      0      0     0
 [6,] 0.500      0      0     0
 [7,] 0.630      0      0     0
 [8,] 0.670      0      0     0
 [9,] 0.750      0      0     0
[10,] 0.810      0      0     0
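
The element-wise comparisons above flag only the third p-value under the FDR thresholds. The FDR (Benjamini-Hochberg) procedure is a step-up rule, however, so tests 1 through 3 are all rejected. As a cross-check, the following sketch uses the base R function p.adjust to compute adjusted p-values for the same ten tests. It is only one way to carry out the comparison, and the names alpha, adj, and rejected are introduced here for illustration.

```r
# Ordered p-values copied from the example above
pvals <- c(.01, .013, .014, .19, .35, .5, .63, .67, .75, .81)
alpha <- .05

# Adjusted p-values from stats::p.adjust; "BH" is Benjamini-Hochberg (FDR)
adj <- cbind(pvals,
             bonf = p.adjust(pvals, method = "bonferroni"),
             holm = p.adjust(pvals, method = "holm"),
             fdr  = p.adjust(pvals, method = "BH"))
round(adj, 3)

# A test is rejected when its adjusted p-value is at most alpha
rejected <- adj[, c("bonf", "holm", "fdr")] <= alpha

# Number of rejections under each method
colSums(rejected)
```

Working with adjusted p-values sidesteps the need to apply the step-down and step-up stopping rules by hand: Bonferroni and Holm reject nothing here, while the FDR column rejects the three smallest p-values.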

Data sets with structure

The relevance of data structure to statistical analysis

How should structured data be analyzed?

  1. We could ignore the structure and proceed as if the observations were homogeneous and independent. Thus, if we have n level-1 units from each of m different clusters (level-2 units), we would treat the data as a single random sample of size mn. Such an approach is almost always invalid. A simple example will illustrate the problem.
  2. We could aggregate everything to the highest level, level 2. In this approach we would average all the level-1 variables so that everything becomes a level-2 variable. This is called complete pooling. In our oak leaf example this would amount to averaging the chlorophyll content of leaves coming from the same tree and then treating the sample of ten trees as being the real sample.
  3. At the opposite extreme from complete pooling is the unpooled approach. This is essentially equivalent to fitting models at level 1 separately for each of the level-2 units. This can be done by actually fitting separate models or by adding indicator (categorical) variables to the model that reference the identity of the level-2 units.
  4. A fourth approach is the approach we will take in this course. It is essentially a hybrid of the complete pooling approach with the unpooled approach. It is variously called a multilevel model, a random effects model, a random coefficients model, a mixed model, a mixed effects model, or a hierarchical model. We'll go into the details next time.
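
The four approaches can be compared on a small simulated data set loosely patterned on the oak leaf example. Everything in this sketch is an assumption made for illustration: the data are simulated rather than real, five leaves per tree is an arbitrary choice, and the names leaves, chl, and tree are hypothetical. The mixed model call uses lme from the nlme package and is skipped if that package is not installed.

```r
# Simulated data only: m = 10 trees (level 2), n = 5 leaves per tree (level 1)
set.seed(1)
m <- 10
n <- 5
tree <- factor(rep(1:m, each = n))
tree.effect <- rnorm(m, mean = 0, sd = 3)   # between-tree variation
chl <- 40 + tree.effect[as.integer(tree)] + rnorm(m * n, sd = 1)  # plus leaf-level noise
leaves <- data.frame(tree, chl)

# 1. Ignore the structure: treat all mn leaves as one random sample
se.ignore <- sd(leaves$chl) / sqrt(m * n)

# 2. Aggregate to level 2: average the leaves within each tree,
#    then treat the m tree means as the sample
tree.means <- tapply(leaves$chl, leaves$tree, mean)
se.agg <- sd(tree.means) / sqrt(m)

# 3. Unpooled: one mean per tree, fit with indicator variables for tree
unpooled <- lm(chl ~ 0 + tree, data = leaves)

# Standard error of the overall mean under approaches 1 and 2
c(ignore = se.ignore, aggregated = se.agg)

# 4. Multilevel (mixed) model with a random intercept for each tree
if (requireNamespace("nlme", quietly = TRUE)) {
  mixed <- nlme::lme(chl ~ 1, random = ~ 1 | tree, data = leaves)
  print(nlme::fixef(mixed))
}
```

Because leaves from the same tree resemble one another in this simulation, treating the 50 leaves as independent yields a noticeably smaller standard error for the mean than treating the ten tree means as the sample, which is the sense in which ignoring the structure overstates the precision.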

Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2007
Last Revised--April 10, 2007
URL: http://www.unc.edu/courses/2007spring/enst/562/001/docs/lectures/lecture34.htm