Assignment 2 — Solution

Problem 1

#read in data
corals<-read.table('http://www.unc.edu/courses/2006spring/ecol/145/001/data/assign1/assignment1.csv', header=TRUE, sep=',')

As suggested I begin by trying deciles.

> quantile(corals$WSSTA,seq(0,1,.1))
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
 0   0   1   1   2   3   5   7  10  18   30

Since some of the quantiles are identical I try entering the unique values by hand as cut points for the cut function.
> cut(corals$WSSTA,c(0,1,2,3,5,7,10,18,30),include.lowest=TRUE)

 [1] (18,30] [0,1] [0,1] (1,2] (3,5] (2,3] (18,30] [0,1] [0,1] [0,1] (2,3]
[12] (2,3] (10,18] [0,1] (1,2] (5,7] (1,2] (1,2] (10,18] [0,1] [0,1] (1,2]
[23] (3,5] (3,5] (18,30] [0,1] [0,1] (2,3] (3,5] [0,1] (18,30] (1,2] [0,1]
[34] (1,2] (5,7] (2,3] (18,30] [0,1] [0,1] (3,5] (2,3] (1,2] (18,30] [0,1]
[45] [0,1] [0,1] (5,7] (10,18] (10,18] [0,1] [0,1] [0,1] (7,10] (3,5] (5,7]
[56] (7,10] [0,1] (3,5] (5,7] (3,5] (10,18] [0,1] (1,2] (7,10] [0,1] (5,7]
[67] (10,18] (1,2] [0,1] [0,1] (2,3] [0,1] (18,30] [0,1] (3,5] (7,10] [0,1]
[78] (3,5] (10,18] (7,10] [0,1] (3,5] (5,7] (10,18] [0,1] (1,2] (7,10] (1,2]
[89] (3,5] (18,30] (1,2] [0,1] (1,2] (10,18] (1,2] (18,30] [0,1] (1,2] (7,10]
[100] (5,7] (10,18] (18,30] [0,1] [0,1] (1,2] (5,7] [0,1] (10,18] [0,1] [0,1]
[111] (3,5] (7,10] (5,7] (18,30] (3,5] [0,1] (3,5] (3,5] (1,2] (10,18] [0,1]
[122] (2,3] (3,5] (2,3] (5,7] (18,30] [0,1] [0,1] (7,10] (3,5] (10,18] (18,30]
[133] [0,1] [0,1] (1,2] (3,5] (1,2] (10,18] [0,1] [0,1] [0,1] (1,2] [0,1]
[144] (10,18] [0,1] (2,3] (7,10] (2,3] (7,10] (10,18] (1,2] [0,1] (1,2] (10,18]
[155] [0,1] (18,30] (1,2] [0,1] (7,10] (2,3] (2,3] (10,18] (7,10] [0,1] (3,5]
[166] (7,10] (3,5] (10,18] (10,18] [0,1] (5,7] [0,1] (2,3] (18,30] [0,1] [0,1]
[177] [0,1] (7,10] [0,1] (10,18] (5,7] [0,1] (5,7] (2,3] [0,1] (18,30] [0,1]
[188] (5,7] (7,10] (2,3] (5,7] (10,18] (3,5] [0,1] (5,7] (5,7] (3,5] (18,30]
[199] [0,1] [0,1] (2,3] (3,5] (5,7] (18,30] [0,1] (2,3] (7,10] [0,1] (7,10]
[210] (7,10] (7,10] [0,1] (2,3] (1,2] (1,2] (10,18] (7,10] (3,5] (3,5] (5,7]
[221] (1,2] (10,18] (1,2] [0,1] (1,2] (10,18] (5,7] (18,30] [0,1] [0,1] (3,5]
[232] (2,3] (3,5] (18,30] [0,1] (2,3] (7,10] [0,1] (3,5] (10,18] [0,1] [0,1]
[243] (2,3] [0,1] [0,1] (18,30] [0,1] (1,2] (10,18] (1,2] [0,1] (18,30] [0,1]
[254] [0,1] (7,10] (3,5] [0,1] (10,18] (3,5] (3,5] (3,5] (2,3] (10,18] [0,1]
[265] [0,1] (3,5] (5,7] [0,1] (18,30] [0,1] [0,1] (2,3] (5,7] [0,1] (10,18]
[276] (5,7] [0,1] (5,7] [0,1] (2,3]
Levels: [0,1] (1,2] (2,3] (3,5] (5,7] (7,10] (10,18] (18,30]

When we tabulate these categories we see that there is a problem.

> table(cut(corals$WSSTA,c(0,1,2,3,5,7,10,18,30),include.lowest=TRUE))

[0,1] (1,2] (2,3] (3,5] (5,7] (7,10] (10,18] (18,30]
   89    31    25    34    25     23      30      23

The first category is three times the size of any other. We also see that my choice of cut points has caused values of 0 and 1 to be lumped into a single category. I split them so that they are in separate categories. There are many ways of doing this. In the code below I start my intervals at –1 and drop the include.lowest=TRUE option.

> table(cut(corals$WSSTA,c(-1,0,1,2,3,5,7,10,18,30)))

(-1,0] (0,1] (1,2] (2,3] (3,5] (5,7] (7,10] (10,18] (18,30]
    48    41    31    25    34    25     23      30      23

Another way to do this is to specify right=FALSE as an argument. This flips the intervals around to be closed on the left and open on the right.

> table(cut(corals$WSSTA, c(0,1,2,3,5,7,10,18,30), include.lowest=TRUE, right=FALSE))

[0,1) [1,2) [2,3) [3,5) [5,7) [7,10) [10,18) [18,30]
   48    41    31    42    32     25      28      33

Observe that the results are not identical. It might be worthwhile to plot both to see if the way the grouping is done changes the results at all. I save the results and generate the means and variances of Prevalence in each of the categories

> WSSTA.cuts<-cut(corals$WSSTA,c(-1,0,1,2,3,5,7,10,18,30))
> means<-tapply(corals$PREV_1,WSSTA.cuts,mean)
> vars<-tapply(corals$PREV_1,WSSTA.cuts,var)
> cbind(means,vars)
            means        vars
 (-1,0]  2.291667   33.955674
  (0,1]  2.878049   38.259756
 ( 1,2]  9.483871  242.458065
  (2,3] 13.280000  322.210000
  (3,5] 20.735294 1366.018717
  (5,7] 19.720000 2168.543333
 (7,10] 18.347826 4292.055336
(10,18] 25.333333 7333.816092
(18,30]  1.043478    6.498024

From the numbers the mean-variance relationship appears to be rather striking. I plot the results.

> plot(means,vars,xlab=expression(mu),ylab=expression(sigma^2))
> #NB2 model
> quad.coef<-coef(lm(vars~offset(means)+I(means^2)-1))
> quad.func<-function(x) x+quad.coef*x^2
> lines(0:25,quad.func(0:25),col=1,lty=1,lwd=2)
> #NB1 model
> abline(0,coef(lm(vars~means-1)),col=2,lty=2,lwd=2)
> #Poisson model
> abline(0,1,col=3,lty=1,lwd=2)
> legend(2,7500, c('Poisson', 'quasi-Poisson (NB1)', 'negative binomial(NB2)'), col=c(3,2,1), lwd=c(2,1,2), lty=c(1,2,1), cex=c(.8,.8,.8), bty='n')
> mtext('Mean-Variance Relationship for Disease Prevalence', side=3, line=.5, font=2, cex=.9)

I then repeat the above calculations and graph for the alternate grouping obtained using right=FALSE. Notice that the means and variances for the two approaches are quite different, but the quadratic trend in both cases appears to be the most reasonable model.

        

Problem 2

The only requirements for this plot are the following.

  1. All six distributions should be easily distinguishable on the plot. Choosing a new color and/or symbol for the Poisson would be a simple way to handle this.
  2. There should be some logic to the choice of plotting symbols. If you follow my template from class, I used filled symbols for μ = 1 and open symbols for μ = 5. Since the Poisson distribution uses this parameter too, it would make sense to continue this convention.
  3. The legend should distinguish the negative binomial distributions from the Poisson distributions.

In what follows I choose to plot the Poisson distributions in blue using squares for the symbol types. Following the guidelines outlined above, I use filled squares for λ = 1 and open squares for λ = 5.

> plot(0:10,dnbinom(0:10,size=.1,mu=1), type='n', xlab='k', ylab='P(X=k)')
> points(0:10,dnbinom(0:10,size=.1,mu=1))
> points(0:10,dnbinom(0:10,size=.1,mu=1),pch=16)
> lines(0:10,dnbinom(0:10,size=.1,mu=1))
> points(0:10,dnbinom(0:10,size=10,mu=1),pch=16,col=2)
> lines(0:10,dnbinom(0:10,size=10,mu=1),col=2)
> points(0:10,dnbinom(0:10,size=.1,mu=5),pch=21)
> lines(0:10,dnbinom(0:10,size=.1,mu=5),lty=2)
> points(0:10,dnbinom(0:10,size=10,mu=5),pch=21,col=2)
> lines(0:10,dnbinom(0:10,size=10,mu=5),col=2,lty=2)
> #add Poisson as blue symbols
> points(0:10,dpois(0:10,lambda=1),pch=15,col=4)
> lines(0:10,dpois(0:10,lambda=1),col=4)
> points(0:10,dpois(0:10,lambda=5),pch=22,col=4)
> lines(0:10,dpois(0:10,lambda=5),col=4,lty=2)

I modify the legend so that the four negative binomial curves are distinguished from the two Poisson. This is easily done by adding additional text, I choose "NB: " and "Poisson: ", in the paste expressions.

> legend(6,.8,c(expression(paste("NB: ",mu==1,', ',theta==.1)),
expression(paste("NB: ",mu==1,', ',theta==10)),
expression(paste("NB: ",mu==5,', ',theta==.1)),
expression(paste("NB: ",mu==5,', ',theta==10)),
expression(paste("Poisson: ",lambda==1)),
expression(paste("Poisson: ",lambda==5))),
pch=c(16,16,21,21,15,22),
col=c(1,2,1,2,4,4),lty=c(1,1,2,2,1,2), cex=rep(.9,6),bty='n')
> # add title
> mtext('Various negative binomial and Poisson distributions', side=3, line=.5, cex=.9)

The finished plot is shown below. Observe that in each case for a fixed mean, the Poisson distribution more closely resembles the corresponding negative binomial distribution with the larger value of θ (θ = 10).

Problem 3

I plot one additional gamma distribution with a different scale parameter than was asked for to assist in understanding the effect of scale.

> plot(seq(0,10,.1), dgamma(seq(0,10,.1),shape=1,rate=1), type='n', xlab='x', ylab=expression(paste("f(x; ",alpha,", ",beta,")")))
> lines(seq(0,10,.1), dgamma(seq(0,10,.1), shape=1, scale=1))
> lines(seq(0,10,.1), dgamma(seq(0,10,.1), shape=2, scale=1), col=2)
> lines(seq(0,10,.1), dgamma(seq(0,10,.1), shape=5, scale=1), col=4)
> lines(seq(0,10,.1), dgamma(seq(0,10,.1), shape=2, scale=2), col=2, lty=2)
> lines(seq(0,10,.1), dgamma(seq(0,10,.1), shape=2, scale=.5), col=2, lty=3)
> #create legend
> legend(6,1,c(expression(paste(alpha==1,", ",beta==1)),
expression(paste(alpha==2,", ",beta==.5)),
expression(paste(alpha==2 ,", ",beta==1)),
expression(paste(alpha==2,", ",beta==2)),
expression(paste(alpha==5,", ",beta==1))),
lty=c(1,,3,1,2,1), col=c(1,2,2,2,4), bty='n')
> #add title
> mtext(expression(paste("Gamma distributions with varying ", alpha, " and ", beta)), side=3, line=.5, cex=.9)

The red curves all have the same shape parameter, but a varying scale parameter. Notice the basic shape does stay the same, but as β decreases the graph flattens out. The "scaling" that's taking place is what you would get if you physically grabbed the curve at its peak and pulled it up or pushed it down. Because the area under the curve can't change (it must always equal 1), the tail shrinks or fattens to compensate.

Also recall that for a gamma distribution . Thus for fixed α as β decreases the mean will also decrease and the graph of the density will appear to shift to the left. As a result the graph appears more concentrated (more peaked) at lower values.

Course Home Page


Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised--Feb 6, 2006
URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/solutions/assign2.htm