Lecture 11 (lab 3) —Tuesday, January 31, 2006

What was covered?

R functions and commands demonstrated

R function options

Data Entry

Table 1—Data for Lab Exercise

# of aphids on a stem
Number of stems
0
6
1
8
2
9
3
6
4
6
5
2
6
5
7
3
8
1
9
4
Total
50

Constructing the Loglikelihood

> dpois(aphid.data,lambda=1)
 [1] 3.678794e-01 3.678794e-01 3.678794e-01 3.678794e-01 3.678794e-01
 [6] 3.678794e-01 3.678794e-01 3.678794e-01 3.678794e-01 3.678794e-01
[11] 3.678794e-01 3.678794e-01 3.678794e-01 3.678794e-01 1.839397e-01
[16] 1.839397e-01 1.839397e-01 1.839397e-01 1.839397e-01 1.839397e-01
[21] 1.839397e-01 1.839397e-01 1.839397e-01 6.131324e-02 6.131324e-02
[26] 6.131324e-02 6.131324e-02 6.131324e-02 6.131324e-02 1.532831e-02
[31] 1.532831e-02 1.532831e-02 1.532831e-02 1.532831e-02 1.532831e-02
[36] 3.065662e-03 3.065662e-03 5.109437e-04 5.109437e-04 5.109437e-04
[41] 5.109437e-04 5.109437e-04 7.299195e-05 7.299195e-05 7.299195e-05
[46] 9.123994e-06 1.013777e-06 1.013777e-06 1.013777e-06 1.013777e-06

> log(dpois(aphid.data,lambda=1))
 [1] -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
 [7] -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
[13] -1.000000 -1.000000 -1.693147 -1.693147 -1.693147 -1.693147
[19] -1.693147 -1.693147 -1.693147 -1.693147 -1.693147 -2.791759
[25] -2.791759 -2.791759 -2.791759 -2.791759 -2.791759 -4.178054
[31] -4.178054 -4.178054 -4.178054 -4.178054 -4.178054 -5.787492
[37] -5.787492 -7.579251 -7.579251 -7.579251 -7.579251 -7.579251
[43] -9.525161 -9.525161 -9.525161 -11.604603 -13.801827 -13.801827
[49] -13.801827 -13.801827

> sum(log(dpois(aphid.data,lambda=1)))
[1] -215.9158

> poisson.func<-function(lambda) sum(log(dpois(aphid.data,lambda)))
> poisson.func(1)
[1] -215.9158
> poisson.func(2)
[1] -146.0014

The function seems to work.

Graphing the Loglikelihood

> poisson.func(seq(1,6,.1))
[1] -89.3177

It returns a single number. I was expecting it to return a list of numbers one for each value of seq(1,6,.1).

> poisson.func(1:2)
[1] -179.2257
> poisson.func(1)
[1] -215.9158
> poisson.func(2)
[1] -146.0014

As we can see when I give it both 1 and 2 the answer I get is neither the value at 1 nor the value at 2.

> dpois(aphid.data,1)
 [1] 3.678794e-01 3.678794e-01 3.678794e-01 3.678794e-01 3.678794e-01
 [6] 3.678794e-01 3.678794e-01 3.678794e-01 3.678794e-01 3.678794e-01
[11] 3.678794e-01 3.678794e-01 3.678794e-01 3.678794e-01 1.839397e-01
[16] 1.839397e-01 1.839397e-01 1.839397e-01 1.839397e-01 1.839397e-01
[21] 1.839397e-01 1.839397e-01 1.839397e-01 6.131324e-02 6.131324e-02
[26] 6.131324e-02 6.131324e-02 6.131324e-02 6.131324e-02 1.532831e-02
[31] 1.532831e-02 1.532831e-02 1.532831e-02 1.532831e-02 1.532831e-02
[36] 3.065662e-03 3.065662e-03 5.109437e-04 5.109437e-04 5.109437e-04
[41] 5.109437e-04 5.109437e-04 7.299195e-05 7.299195e-05 7.299195e-05
[46] 9.123994e-06 1.013777e-06 1.013777e-06 1.013777e-06 1.013777e-06

> dpois(aphid.data,2)
 [1] 0.1353352832 0.1353352832 0.1353352832 0.1353352832 0.1353352832
 [6] 0.1353352832 0.2706705665 0.2706705665 0.2706705665 0.2706705665
[11] 0.2706705665 0.2706705665 0.2706705665 0.2706705665 0.2706705665
[16] 0.2706705665 0.2706705665 0.2706705665 0.2706705665 0.2706705665
[21] 0.2706705665 0.2706705665 0.2706705665 0.1804470443 0.1804470443
[26] 0.1804470443 0.1804470443 0.1804470443 0.1804470443 0.0902235222
[31] 0.0902235222 0.0902235222 0.0902235222 0.0902235222 0.0902235222
[36] 0.0360894089 0.0360894089 0.0120298030 0.0120298030 0.0120298030
[41] 0.0120298030 0.0120298030 0.0034370866 0.0034370866 0.0034370866
[46] 0.0008592716 0.0001909493 0.0001909493 0.0001909493 0.0001909493

> dpois(aphid.data,1:2)
 [1] 3.678794e-01 1.353353e-01 3.678794e-01 1.353353e-01 3.678794e-01
 [6] 1.353353e-01 3.678794e-01 2.706706e-01 3.678794e-01 2.706706e-01
[11] 3.678794e-01 2.706706e-01 3.678794e-01 2.706706e-01 1.839397e-01
[16] 2.706706e-01 1.839397e-01 2.706706e-01 1.839397e-01 2.706706e-01
[21] 1.839397e-01 2.706706e-01 1.839397e-01 1.804470e-01 6.131324e-02
[26] 1.804470e-01 6.131324e-02 1.804470e-01 6.131324e-02 9.022352e-02
[31] 1.532831e-02 9.022352e-02 1.532831e-02 9.022352e-02 1.532831e-02
[36] 3.608941e-02 3.065662e-03 1.202980e-02 5.109437e-04 1.202980e-02
[41] 5.109437e-04 1.202980e-02 7.299195e-05 3.437087e-03 7.299195e-05
[46] 8.592716e-04 1.013777e-06 1.909493e-04 1.013777e-06 1.909493e-04

If you compare the output from the three function calls. you'll see that in the last call R alternates its use of λ = 1 and λ = 2. For the first observation it uses 1, for the second it uses 2, for the third it uses 1, etc. Thus R is trying to match up the entries in each vector argument and when it runs out of entries in the second vector it just recycles the vector over and over again. Of course when we take the log of the results and then sum everything up we get total gibberish.

> poisson.func(1)
[1] -215.9158
> poisson.func(2)
[1] -146.0014
> sapply(1:2,poisson.func)
[1] -215.9158 -146.0014

Now we got the right answer. There is a sister function to sapply that instead of returning a vector returns a list. It's called lapply.

> lapply(1:2,poisson.func)
[[1]]
[1] -215.9158

[[2]]
[1] -146.0014

Fig. 1 Plot of loglikelihood

A list is a data object in which the different components can be of different types (character, numeric, etc.) and of different lengths. We can even convert a list into a vector using the unlist function.

> unlist(lapply(1:2,poisson.func))
[1] -215.9158 -146.0014

> plot(seq(1,6,.1), sapply(seq(1,6,.1), poisson.func), type='l', xlab=expression(lambda), ylab='loglikelihood')

> plot(seq(3,4,.01), sapply(seq(3,4,.01), poisson.func), type='l', xlab=expression(lambda), ylab='loglikelihood')

> plot(seq(3.4,3.5,.001), sapply(seq(3.4,3.65,.001), poisson.func), type='l', xlab=expression(lambda), ylab='loglikelihood')

     

Fig. 2   Zooming in on the maximum likelihood estimate

> mean(aphid.data)
[1] 3.46

Maximizing the loglikelihood numerically

nlm

> ?nlm

Fig. 3 Help screen for nlm

poisson.negloglik<-function(lambda) -sum(log(dpois(aphid.data, lambda=lambda)))

> nlm(poisson.negloglik,3)
$minimum
[1] 124.1764

$estimate
[1] 3.459998

$gradient
[1] 6.571497e-08

$code
[1] 1

$iterations
[1] 4

> nlm(poisson.negloglik,3,gradtol=1E-12)
$minimum
[1] 124.1764

$estimate
[1] 3.46

$gradient
[1] 1.991984e-09

$code
[1] 2

$iterations
[1] 5

optim

> ?optim

Fig. 4  Help screen for optim

> optim(3,poisson.negloglik)
$par
[1] 3.460547

$value
[1] 124.1764

$counts
function gradient
20 NA

$convergence
[1] 0

$message
NULL

Warning message:
one-diml optimization by Nelder-Mead is unreliable: use optimize in: optim(3, poisson.negloglik)

optimize

> ?optimize

Fig. 5   Help screen for optimize

> optimize(poisson.negloglik,c(3,4))
$minimum
[1] 3.459989

$objective
[1] 124.1764

Constructing confidence intervals for the MLE

Wald confidence intervals

Here is the 0.975 quantile of a standard normal distribution.

> out<-nlm(poisson.negloglik,3,hessian=TRUE)
> out
$minimum
[1] 124.1764

$estimate
[1] 3.459998

$gradient
[1] 6.571497e-08

$hessian
[,1]
[1,] 14.44799

$code
[1] 1

$iterations
[1] 4

> out$estimate
[1] 3.459998
> out$hessian
[,1]
[1,] 14.44799

> qnorm(.975)
[1] 1.959964

> #lower bound
> out$estimate-qnorm(.975)*sqrt(1/out$hessian)
[,1]
[1,] 2.944361

> #upper bound
> out$estimate+qnorm(.975)*sqrt(1/out$hessian)
[,1]
[1,] 3.975636

Profile likelihood confidence intervals

where is the MLE of λ. We reject at α = .05 if where is the .95 quantile of a chi-squared distribution with one degree of freedom.

Fig. 6 Profile likelihood confidence interval

> lower.limit<- -out$minimum-.5*qchisq(.95,1)
> lower.limit
[1] -126.0971

> plot(seq(2,5,.1), sapply(seq(2,5,.1), poisson.func), type='l', xlab=expression(lambda), ylab='loglikelihood')
> abline(h=lower.limit,col=4,lty=2)

where . Thus we seek the roots (zeros) of the function f .

> ?uniroot

Fig 7   Help screen for uniroot

> root.function<-function(lambda) poisson.func(lambda)-lower.limit
> uniroot(root.function,c(2.5,3.5) )
$root
[1] 2.96967

$f.root
[1] -0.0002399496

$iter
[1] 6

$estim.prec
[1] 6.103516e-05

> uniroot(root.function,c(3.5,4.5))
$root
[1] 4.00152

$f.root
[1] -8.254986e-05

$iter
[1] 6

$estim.prec
[1] 6.103516e-05

Profile likelihood confidence intervals using the Bhat package

> library(Bhat)
> ?plkhci

Fig 8   Help documentation for plkhci

> x.in<-list(label='lambda',est=3.4, low=2, up=5)
> plkhci(x.in,poisson.negloglik,'lambda')

At the end of quite a bit of output we find the calculated confidence interval:

[1] 2.966524 4.005384

This matches our hand calculations above.

Checking for lack of fit

Graphical Analysis

> lambdahat<-out$estimate
> exp<-dpois(0:9,lambdahat)*50
> max(exp)
[1] 10.84896
> max(table(aphid.data))
[1] 9

Fig. 9 Examining the fit of the Poisson model

> coords<-barplot(table(aphid.data), ylim=c(0,11))
> points(coords,exp,pch=16,cex=1.5)
> lines(coords,exp)
> legend(coords[7],11, c('observed','expected'), pch=c(15,16), col=c('gray60',1), lty=c(NA,1), cex=c(.9,.9), bty='n')
> box()

Notice in the legend statement I specified NA, the R code for missing, as the first element of the vector entry for lty. This suppresses the drawing of a line for the first entry in the legend. I use the filled square symbol, pch=15, to represent the bars in the legend.

Goodness of Fit Test

where m is the number of categories.

The degrees of freedom are the number of categories, m, minus one minus the number of estimated parameters, p, used in obtaining the expected frequencies. The null hypothesis of this test is that the fit is adequate. We reject the null hypothesis at level α if the observed value of our test statistic exceeds the 1 – α quantile of a chi-squared distribution with m – 1 – p degrees of freedom.

Reject if

> dpois(0:12,lambdahat)*50
 [1] 1.571490812 5.437355498 9.406620322 10.848963362 9.384348629
 [6] 6.493966013 3.744851867 1.851025857 0.800568284 0.307773876
[11] 0.106489708 0.033495837 0.009657961

Goodness of Fit Test by Pooling

> #P(X = 6)
> dpois(6,lambdahat)*50
[1] 3.744852
> #P(X > 6)
> (1-ppois(6,lambdahat))*50
[1] 3.112403

> expected<-c(sum(dpois(0:1,lambdahat)), dpois(2:5,lambdahat), 1-ppois(5,lambdahat))*50
> expected
[1] 7.008846 9.406620 10.848963 9.384349 6.493966 6.857255

Fig. 10 Pooled categories

> rawcounts<-table(aphid.data)
> observed<-c(sum(rawcounts[1:2]), rawcounts[3:6], sum(rawcounts[7:length(rawcounts)]))
> observed
   2 3 4 5
14 9 6 6 2 13

> names(observed)<-c('0-1',2:5,'>5')
> observed
0-1 2 3 4 5 >5
 14 9 6 6 2 13

> coords<-barplot(observed,ylim=c(0,15))
> points(coords,expected,pch=16,cex=1.5)
> lines(coords,expected)
> legend(coords[2],15, c('observed','expected'), pch=c(15,16), col=c('gray60',1), lty=c(NA,1), cex=c(.9,.9), bty='n')
> box()

> pearson<-sum((observed-expected)^2/expected)
> pearson
[1] 18.99147
> df<-length(observed)-1-1
> df
[1] 4
> p.val<-1-pchisq(pearson,df)
> p.val
[1] 0.0007889844
> qchisq(.95,df)
[1] 9.487729

Goodness of Fit Test—Randomization Test Version

> ?qchisq.test

Fig. 11  Help screen for chisq.test

> expprob<-c(sum(dpois(0:1,lambdahat)),dpois(2:5,lambdahat),1-ppois(5,lambdahat))
> chisq.test(observed,p=expprob)

Chi-squared test for given probabilities

data: observed
X-squared = 18.9915, df = 5, p-value = 0.001929

> expprob2<-c(dpois(0:8,lambdahat),1-ppois(8,lambdahat))
> #check that all cells are accounted for
> sum(expprob2)
[1] 1
> chisq.test(rawcounts,p=expprob2)

Chi-squared test for given probabilities

data: rawcounts
X-squared = 48.5686, df = 9, p-value = 1.999e-07

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(rawcounts, p = expprob2)

> out.bad<-chisq.test(rawcounts,p=expprob2)
Warning message:
Chi-squared approximation may be incorrect in: chisq.test(rawcounts, p = expprob2)

> names(out.bad)
[1] "statistic" "parameter" "p.value" "method" "data.name" "observed" "expected" "residuals"
> out.bad["statistic"]
$statistic
X-squared
48.56862

> out.bad["parameter"]
$parameter
df
9

> 1-pchisq(as.numeric(out.bad["statistic"]), as.numeric(out.bad["parameter"])-1)
[1] 7.69037e-08

> chisq.test(rawcounts,p=expprob2,simulate.p.value=TRUE)

Chi-squared test for given probabilities with simulated p-value (based on 2000 replicates)

data: rawcounts
X-squared = 48.5686, df = NA, p-value = 0.0004998

> chisq.test(rawcounts,p=expprob2,simulate.p.value=TRUE,B=20000)

Chi-squared test for given probabilities with simulated p-value (based on 20000 replicates)

data: rawcounts
X-squared = 48.5686, df = NA, p-value = 0.00015

Cited References

Course Home Page


Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised--Feb 3, 2006
URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture11.htm