Lecture 18—Monday, February 13, 2006

What was covered?

Terminology Defined

AIC for comparing models when some, but not all, use a transformed response variable

Fig. 1 Log is a 1-to-1 mapping

So here we see that the random variable Y only serves to identify the set of ω elements whose probability we wish to calculate. Any function of Y that yields the same set of ω will also yield the same probability. The logarithm is such a function.

i.e., the sample space elements that make up the set on the left side of the equality also make up the set on the right side of the equality, and vice versa. This follows because the logarithm is a strictly increasing, one-to-one mapping (Fig. 1). Hence each log Y that satisfies log Y ≤ log y corresponds to a unique and distinct Y satisfying Y ≤ y, and vice versa. Hence the elements of our sample space that satisfy one inequality automatically satisfy the other.
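Written out, the identity described here can be sketched as follows. This assumes Y > 0 so that the logarithm is defined; the symbol Ω for the sample space is notation introduced for this sketch, since the original display is not reproduced in this text.

```latex
\{\omega \in \Omega : Y(\omega) \le y\}
  = \{\omega \in \Omega : \log Y(\omega) \le \log y\}
\quad\Longrightarrow\quad
P(Y \le y) = P(\log Y \le \log y)
```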

where . When only the upper limit of integration is a function of y, we have

To calculate AIC from the above likelihood we would use the maximum likelihood estimates of μᵢ and σ² obtained from the log-transformed model.
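As a rough illustration of this calculation, here is a minimal Python sketch. It assumes the simplest possible case, a normal model for log Y with a single common mean rather than a regression mean μᵢ, and it relies on the standard change-of-variables result that the density of Y equals the density of log Y evaluated at log y, multiplied by 1/y. The function names and the choice of two estimated parameters are assumptions made for this example.

```python
import math

def normal_loglik(values, mu, sigma2):
    """Log-likelihood of independent normal observations."""
    n = len(values)
    ss = sum((v - mu) ** 2 for v in values)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def aic_log_model_on_original_scale(y, n_params=2):
    """AIC of a normal model for log(y), expressed on the scale of y.

    Assumption: intercept-only model, so the parameters are one mean
    and one variance (n_params = 2).
    """
    logs = [math.log(v) for v in y]
    n = len(logs)
    # Maximum likelihood estimates from the log-transformed model
    mu_hat = sum(logs) / n
    sigma2_hat = sum((z - mu_hat) ** 2 for z in logs) / n
    loglik_log_scale = normal_loglik(logs, mu_hat, sigma2_hat)
    # Jacobian term: each observation contributes log(1/y) = -log(y)
    loglik_original_scale = loglik_log_scale - sum(logs)
    return -2 * loglik_original_scale + 2 * n_params
```

The resulting AIC can be compared with the AIC of models fit directly to the untransformed response, because both likelihoods now refer to the same variable Y.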

Models for excess zeros

A derivation of the probability distributions for excess zero models

Then we have the following.

where in the last step I use the definition of conditional probability for these events.

Thus we explicitly assume that the distribution of X is entirely governed by the two distributions g₁ and g₂.

Zero-inflated Poisson (ZIP) model (mixture model)
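A minimal Python sketch of the standard zero-inflated Poisson probability function may help here. The parameter names pi_zero (the mixing probability of the "always zero" class) and lam (the Poisson mean) are assumptions made for this example, since the original formulas are not reproduced in this text.

```python
import math

def poisson_pmf(x, lam):
    """Ordinary Poisson probability P(X = x) with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def zip_pmf(x, pi_zero, lam):
    """Zero-inflated Poisson probability P(X = x).

    Zeros arise from two sources: the extra-zero class (probability
    pi_zero) and the Poisson class, which can also produce a zero.
    """
    if x == 0:
        return pi_zero + (1 - pi_zero) * poisson_pmf(0, lam)
    return (1 - pi_zero) * poisson_pmf(x, lam)
```

With pi_zero = 0 the function reduces to the ordinary Poisson distribution, which is one way to see that the Poisson model is nested in the ZIP model.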

Hurdle Poisson model (conditional model)

The expression that appears in the formula for the nonzero counts is called a truncated Poisson distribution. It's truncated because in this case we've removed the zero category from the Poisson distribution. The denominator, one minus the Poisson probability of a zero, is there to renormalize the probabilities so that they still sum to 1.
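The truncation and renormalization can be sketched in Python as follows. The parameter names pi_zero (the probability of a zero) and lam (the mean of the untruncated Poisson) are assumptions made for this example, not notation taken from the lecture.

```python
import math

def poisson_pmf(x, lam):
    """Ordinary Poisson probability P(X = x) with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def truncated_poisson_pmf(x, lam):
    """Zero-truncated Poisson: Poisson probabilities for x >= 1,
    divided by 1 - P(0) so that they again sum to 1."""
    if x < 1:
        return 0.0
    return poisson_pmf(x, lam) / (1 - math.exp(-lam))

def hurdle_pmf(x, pi_zero, lam):
    """Hurdle Poisson probability P(X = x).

    All zeros come from a single source with probability pi_zero;
    positive counts follow the zero-truncated Poisson.
    """
    if x == 0:
        return pi_zero
    return (1 - pi_zero) * truncated_poisson_pmf(x, lam)
```

Unlike the mixture model, the zero probability here does not depend on lam at all, so the zero process and the count process can be modeled separately.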

Comparing the two excess zero models

Fig. 2  Heterogeneous zeros in habitat models

Mixture Poisson (ZIP) model

Conditional Poisson (Hurdle) Model

Some final remarks

  • The R packages pscl and zicounts provide implementations of both kinds of excess zero models. Poisson and negative binomial distributions for the nonzero counts are supported. The statistical package Stata also has functions dedicated to fitting ZIP and ZINB models.
  • The literature suggests that zero-inflated negative binomial (ZINB) models often have convergence problems (Famoye and Singh 2006).
  • Zero-inflated models have a long history in the econometrics literature, but the interest in these kinds of models in disciplines such as ecology appears to stem from Lambert (1992).
  • It's worth noting that Warton (2005) argues that many of the published uses of excess zero models are probably unnecessary. He argues that the negative binomial probability model by itself is sufficient to handle most occurrences of zero-inflation (relative to a Poisson distribution) in environmental and ecological data.
  • Some references on excess zero models
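To illustrate the remark about Warton (2005) in the list above, the following Python sketch compares the probability of a zero under a Poisson distribution and under a negative binomial distribution with the same mean. It assumes the common parameterization in which the negative binomial variance is mean + mean²/size; the function and parameter names are chosen for this example.

```python
import math

def poisson_zero_prob(mean):
    """P(X = 0) for a Poisson distribution with the given mean."""
    return math.exp(-mean)

def negbin_zero_prob(mean, size):
    """P(X = 0) for a negative binomial with the given mean and
    dispersion parameter size (variance = mean + mean**2 / size)."""
    return (size / (size + mean)) ** size
```

For a fixed mean, a small value of size yields far more zeros than the Poisson distribution predicts, while a very large value of size brings the two zero probabilities together. This is the sense in which a negative binomial model by itself can absorb apparent zero inflation relative to a Poisson model.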

    Jack Weiss
    Phone: (919) 962-5930
    E-Mail: jack_weiss@unc.edu
    Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
    Copyright © 2006
    Last Revised--Feb 16, 2006
    URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture18.htm