Assignment 4 — Solution

Question 1

My negative loglikelihood function for the negative binomial distribution is shown below.

> NBvec2.neg<-function(data,p) -sum(log(dnbinom(data,mu=p[1],size=p[2])))

I enter each set of data and fit a negative binomial model. Each time I use the mean number of counts for that treatment as an initial guess for mu. I use 1 as the initial guess for theta. I save the results in the objects out1, out2, out3, and out4.

> #Treatment 1
> trt1<-c(19,12,18,18,11,12,7,8,4,4,1,1,1,1,1,1,1)
> c.trt1<-c(0:10,12,13,15,17,19,26)
> trt1.dat<-rep(c.trt1,trt1)
> mean(trt1.dat)

[1] 4.033333
> nlm(function(x) NBvec2.neg(trt1.dat,x),c(4,1),hessian=TRUE)->out1


> #Treatment 2
> trt2<-c(24,16,16,18,15,9,6,5,3,4,3,1)
> counts2<-c(0:10,12)
> trt2.dat<-rep(counts2,trt2)
> mean(trt2.dat)
[1] 3.166667
> nlm(function(x) NBvec2.neg(trt2.dat,x),c(3,1),hessian=TRUE)->out2


> #Treatment 3
> trt3<-c(43,35,17,11,5,4,1,2,2)
> trt3.dat<-rep(0:8,trt3)
> mean(trt3.dat)
[1] 1.483333
> trt3.dat<-rep(0:8,trt3)
> nlm(function(x) NBvec2.neg(trt3.dat,x),c(1,1),hessian=TRUE)->out3

>
> #Treatment 4
> counts4<-c(0:7,10,11)
> trt4<-c(47,23,27,9,7,3,1,1,1,1)
> trt4.dat<-rep(counts4,trt4)
> mean(trt4.dat)
[1] 1.508333
> nlm(function(x) NBvec2.neg(trt4.dat,x),c(1.5,1),hessian=TRUE)->out4

Collect results in single table

> cbind(out1$estimate,out2$estimate,out3$estimate,out4$estimate)->out
> rownames(out)<-c('mu','theta')
> colnames(out)<-c('trt1','trt2','trt3','trt4')
> out
          trt1     trt2     trt3     trt4
   mu 4.033330 3.166665 1.483333 1.508333
theta 1.502877 1.760487 1.333132 1.153521

Question 2

I carry out a parametric test and a randomization test for each treatment.

> #Treatment 1

Pooled Pearson Chi-squared Test

I first examine at what point to lump the tail probabilities. From the output below we see the largest k for which P(X > k) > 5 is k = 11.

> 120*(1-pnbinom(10:15,mu=out1$estimate[1],size=out1$estimate[2]))
[1] 8.215601 6.194139 4.659713 3.498585 2.622275 1.962447

Next I examine the individual probabilities up to and including k = 11.

> dnbinom(0:11,mu=out1$estimate[1],size=out1$estimate[2])*120
[1] 16.909034 18.513716 16.879303 14.358517 11.775818 9.441964 7.455350 5.821693
[9] 4.507925 3.467693 2.653386 2.021462

There are a couple of choices here. I group the last three and decide that 4.5 is close enough to 5 to be left in its own category.

> Ei<-c(dnbinom(0:8, mu=out1$estimate[1], size=out1$estimate[2]), sum(dnbinom(9:11, mu=out1$estimate[1], size=out1$estimate[2])),1-pnbinom(11, mu=out1$estimate[1], size=out1$estimate[2]))*120
> Ei

[1] 16.909034 18.513716 16.879303 14.358517 11.775818 9.441964 7.455350 5.821693
[9] 4.507925 8.142541 6.194139

> Oi<-c(trt1[1:9],sum(trt1[10:11]),sum(trt1[12:length(trt1)]))
> Oi
[1] 19 12 18 18 11 12 7 8 4 5 6

Note: For k = 11, TRT1 is zero and that value wasn't entered. Hence the second group only combines two nonzero values.

> sum((Oi-Ei)^2/Ei)
[1] 6.411393
> sum((Oi-Ei)^2/Ei)->pear1
> 1-pchisq(pear1,length(Oi)-1-2)->p.1
> p.1
[1] 0.6012518

Randomization Test

For the randomization test I populate the observed values with zeros so that I end up with a vector of length 27.

> Oi.1<-c(trt1[1:11],0,trt1[12:13],0,1,0,1,0,1,0,0,0,0,0,0,1)
> length(Oi.full)
[1] 27
> expprob1<-c(dnbinom(0:25,mu=out1$estimate[1],size=out1$estimate[2]),1-pnbinom(25,mu=out1$estimate[1],size=out1$estimate[2]))
> sum(expprob1)
[1] 1

> chisq.test(Oi.1,p=expprob1,simulate.p.value=TRUE)->random1
> random1

Chi-squared test for given probabilities with simulated p-value (based on 2000 replicates)

data: Oi.full
X-squared = 22.8414, df = NA, p-value = 0.4983

> #Treatment 2

Pooled Pearson Chi-squared Test

I first examine at what point to lump the tail probabilities. From the output below we see the largest k for which P(X > k) > 5 is k = 11.

> 120*(1-pnbinom(8:15,mu=out2$estimate[1],size=out2$estimate[2]))
[1] 7.1999603 4.9277910 3.3564198 2.2766839 1.5387632 1.0367602 0.6965987 0.4668941

> dnbinom(0:9,mu=out2$estimate[1],size=out2$estimate[2])*120
[1] 19.602369 22.179282 19.674754 15.850329 12.123718 8.976999 6.500759 4.631921
[9] 3.259910 2.272169

Keep first 7 categories, merge the next 2, and then all the rest.

> Ei.2<-c(dnbinom(0:6,mu=out2$estimate[1], size=out2$estimate[2]), sum(dnbinom(7:8, mu=out2$estimate[1], size=out2$estimate[2])),1-pnbinom(8, mu=out2$estimate[1], size=out2$estimate[2]))*120
> Ei.2
[1] 19.602369 22.179282 19.674754 15.850329 12.123718 8.976999 6.500759 7.891831
[9] 7.199960

> trt2
[1] 24 16 16 18 15 9 6 5 3 4 3 1
> Oi<-c(trt2[1:7],sum(trt2[8:9]),sum(trt2[10:length(trt2)]))
> Oi
[1] 24 16 16 18 15 9 6 8 8
> length(Oi)
[1] 9
> length(Ei.2)
[1] 9
> pearson2<-sum((Oi-Ei.2)^2/Ei.2)
> pearson2
[1] 4.49745
> 1-pchisq(pearson2,length(Ei.2)-1-2)->p.2
> p.2
[1] 0.6096794

Randomization Test

> Oi.2<-c(trt2[1:11],0,1)
> expprob2 <-c(dnbinom(0:11,mu=out2$estimate[1],size=out2$estimate[2]),1-pnbinom(11,mu=out2$estimate[1],size=out2$estimate[2]))
> length(expprob2)
[1] 13
> length(Oi.2)
[1] 13

> chisq.test(Oi.2,p=expprob2,simulate.p.value=TRUE)->random2
> random2

Chi-squared test for given probabilities with simulated p-value (based on 2000 replicates)

data: Oi.full2
X-squared = 8.8654, df = NA, p-value = 0.7166

> #Treatment 3

Pooled Pearson Chi-squared Test

> 120*(1-pnbinom(5:10,mu=out3$estimate[1],size=out3$estimate[2]))
[1] 4.4118050 2.4207674 1.3222542 0.7196144 0.3904771 0.2113574
> dnbinom(0:5,mu=out3$estimate[1],size=out3$estimate[2])*120
[1] 44.272941 31.084638 19.098068 11.175192 6.375749 3.581607
> Ei.3<-c(dnbinom(0:4,mu=out3$estimate[1],size=out3$estimate[2]),1-pnbinom(4,mu=out3$estimate[1],size=out3$estimate[2]))*120
> Ei.3
[1] 44.272941 31.084638 19.098068 11.175192 6.375749 7.993412
> Oi.3<-c(trt3[1:5],sum(trt3[6:length(trt3)]))
> pearson3<-sum((Oi.3-Ei.3)^2/Ei.3)
> pearson3
[1] 1.186620
> 1-pchisq(pearson3,length(Ei.3)-1-2)->p.3
> p.3
[1] 0.756215

Randomization Test

> expprob3<-c(dnbinom(0:7,mu=out3$estimate[1],size=out3$estimate[2]),1-pnbinom(7,mu=out3$estimate[1],size=out3$estimate[2]))
> length(expprob3)
[1] 9
> chisq.test(trt3,p=expprob3,simulate.p.value=TRUE)->random3
> random3

Chi-squared test for given probabilities with simulated p-value (based on 2000 replicates)

data: trt3
X-squared = 2.6892, df = NA, p-value = 0.962

> #Treatment 4

Pooled Pearson Chi-squared Test

> 120*(1-pnbinom(5:10,mu=out4$estimate[1],size=out4$estimate[2]))
[1] 5.1347980 2.9649476 1.7084416 0.9827824 0.5645753 0.3239613
> dnbinom(0:5,mu=out4$estimate[1],size=out4$estimate[2])*120
[1] 45.737256 29.895689 18.240663 10.864958 6.392893 3.733743

> Ei.4<-c(dnbinom(0:4, mu=out4$estimate[1], size=out4$estimate[2]), 1-pnbinom(4, mu=out4$estimate[1], size=out4$estimate[2]))*120
> Oi.4<-c(trt4[1:5],sum(trt4[6:length(trt4)]))
> pearson4<-sum((Oi.4-Ei.4)^2/Ei.4)
> pearson4

[1] 6.603187
> 1-pchisq(pearson4,length(Oi.4)-1-2)->p.4
> p.4

[1] 0.0856807

Randomization Test

> trt4
[1] 47 23 27 9 7 3 1 1 1 1
> Oifull.4<-c(trt4[1:8],0,0,1,1)
> length(Oifull.4)

[1] 12
> expprob4<-c(dnbinom(0:10,mu=out4$estimate[1],size=out4$estimate[2]),1-pnbinom(10,mu=out4$estimate[1],size=out4$estimate[2]))
> chisq.test(Oifull.4,p=expprob4,simulate.p.value=TRUE)->random4
> random4

Chi-squared test for given probabilities with simulated p-value (based on 2000 replicates)

data: Oifull.4
X-squared = 11.988, df = NA, p-value = 0.3348

> #Summary of Results

> p.out<-rbind(c(p.1,p.2,p.3,p.4), c(random1$p.value,random2$p.value,random3$p.value,random4$p.value))
> rownames(p.out)<-c('parametric','randomization')
> colnames(p.out)<-c('trt1','trt2','trt3','trt4')
> p.out

                   trt1      trt2     trt3      trt4
parametric    0.6012518 0.6096794 0.756215 0.0856807
randomization 0.5387306 0.7166417 0.962019 0.3348326

Question 3

I write a function to draw the arrows. It draws an arrow from the mean minus the bound to the mean plus the bound. The bound will be input as the standard error of the estimate times the .975 quantile of a normal distribution.

> arrow.func<-function(x,mean,bound) arrows(x,mean+bound,x,mean-bound,code=3,angle=90,length=.1)

> #draw graph
> plot(1:4,c(out1$estimate[1], out2$estimate[1], out3$estimate[1], out4$estimate[1]),
ylim=c(1,5), type='n', xlab='treatment',ylab='mean count',axes=FALSE,xlim=c(.8,4.2))
> axis(1,at=1:4)
> axis(2)
> box()
> points(1:4, c(out1$estimate[1], out2$estimate[1], out3$estimate[1], out4$estimate[1]), pch=16,cex=1.2)
> arrow.func(1,out1$estimate[1], qnorm(.975)*sqrt(solve(out1$hessian)[1,1]))
> arrow.func(2,out2$estimate[1], qnorm(.975)*sqrt(solve(out2$hessian)[1,1]))
> arrow.func(3,out3$estimate[1], qnorm(.975)*sqrt(solve(out3$hessian)[1,1]))
> arrow.func(4,out4$estimate[1], qnorm(.975)*sqrt(solve(out4$hessian)[1,1]))

Question 4

I use the plkhci function in the Bhat package to obtain the profile likelihood confidence intervals.

> library(Bhat)
> x.in<-list(label=c('mu','theta'), est=out1$estimate, low=c(.1,.1), high=c(10,10))
> plkhci(x.in,function(p) NBvec2.neg(trt1.dat,p),'mu')->bound1
> bound1
[1] 3.401719 4.804918

> x.in<-list(label=c('mu','theta'), est=out2$estimate, low=c(.1,.1), high=c(10,10))
> plkhci(x.in,function(p) NBvec2.neg(trt2.dat,p),'mu')->bound2
> bound2

[1] 2.675589 3.759237

> x.in<-list(label=c('mu','theta'), est=out3$estimate, low=c(.1,.1), high=c(10,10))
> plkhci(x.in,function(p) NBvec2.neg(trt3.dat,p),'mu')->bound3
> bound3

[1] 1.196123 1.842141

> x.in<-list(label=c('mu', 'theta'), est=out2$estimate, low=c(.1,.1), high=c(10,10))
> plkhci(x.in,function(p) NBvec2.neg(trt4.dat,p),'mu')->bound4
> bound4
[1] 1.207350 1.889753

> #draw graph
> plot(1:4,c(out1$estimate[1], out2$estimate[1], out3$estimate[1], out4$estimate[1]), ylim=c(1,5), type='n', xlab='treatment', ylab='mean count', axes=FALSE, xlim=c(.8,4.2))
> axis(1,at=1:4)
> axis(2)
> box()
> points(1:4, c(out1$estimate[1], out2$estimate[1], out3$estimate[1], out4$estimate[1]), pch=16, cex=1.2)

I write an arrow function that draws an arrow from the first component of the vector I call bound to the second component of bound.

> profilearrow.func<-function(x,bound) arrows(x, bound[1], x, bound[2], code=3, angle=90, length=.1)

> profilearrow.func(1,bound1)
> profilearrow.func(2,bound2)
> profilearrow.func(3,bound3)
> profilearrow.func(4,bound4)
> mtext('95% profile likelihood CI for mean',side=3,line=.5)

Observe that the plot of the profile confidence intervals is virtually identical to the plot of the Wald confidence intervals.

 

Course Home Page


Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised--March 23, 2006
URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/solutions/assign4.htm