In this formula we treat σ² and μ as the variables and θ as a fixed constant. As a result we get a parabola in μ.
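- If we treat the negative binomial as a compound Poisson distribution in which a gamma distribution with parameters α and β is the mixing distribution, then the above formula can be re-expressed in terms of the parameters α and β as follows.

$$\sigma^2 = \mu + \frac{\mu^2}{\theta} = \alpha\beta + \alpha\beta^2, \quad \text{where } \mu = \alpha\beta \text{ and } \theta = \alpha$$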
- Now just as in the original formula where we chose θ to be constant and μ to be the variable, we have a choice of which one of α and β should be held fixed and which should vary.
- Case 1: α fixed, β varies. Here we let the scale parameter vary. Then, with μ = αβ, we have

$$\sigma^2 = \alpha\beta + \alpha\beta^2 = \mu + \frac{\mu^2}{\alpha}$$
This is the same quadratic relationship we had before with α replacing θ. This is called the NB2 model for the variance.
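- Case 2: α varies, β fixed. Here we let only the shape parameter vary. Regrouping things in terms of the variable α, with μ = αβ, yields the following.

$$\sigma^2 = \alpha\beta + \alpha\beta^2 = \alpha\beta(1 + \beta) = (1 + \beta)\mu = k\mu, \quad \text{where } k = 1 + \beta$$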
Now we have a linear relationship between the variance and the mean. This is called the NB1 model.
- The NB1 model resembles a Poisson model except that in the Poisson model k = 1. Historically the NB1 model has been fit as a quasi-Poisson model using what's called a quasi-likelihood approach. This is somewhat inferior to a full likelihood approach so this made the NB1 model a less attractive choice than the NB2 model. Recently methods for fitting NB1 using a maximum likelihood approach have been described (Jansakul and Hinde 2004). Up until now use of the NB1 model has been restricted largely to the econometrics literature, but I think the development of full likelihood methods for fitting it should change that.
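As a numerical check on the two variance functions, the following Python sketch builds negative binomial probabilities from the gamma-Poisson mixture and compares the summed mean and variance with the NB2 and NB1 formulas. It assumes β is the gamma scale parameter, so that μ = αβ. The function names and parameter values are illustrative choices, not something taken from the lecture.

```python
import math

def nb_pmf(x, alpha, beta):
    """P(X = x) for a Poisson count whose mean is gamma distributed with
    shape alpha and scale beta (the parameterization assumed here)."""
    log_p = (math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
             + x * math.log(beta / (1 + beta)) - alpha * math.log(1 + beta))
    return math.exp(log_p)

def nb_mean_var(alpha, beta, upper=2000):
    """Mean and variance by summing the pmf over 0..upper-1 (truncated sum)."""
    probs = [nb_pmf(x, alpha, beta) for x in range(upper)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

def nb2_variance(mu, alpha):
    """NB2: alpha held fixed, variance quadratic in the mean."""
    return mu + mu ** 2 / alpha

def nb1_variance(mu, beta):
    """NB1: beta held fixed, variance linear in the mean."""
    return (1 + beta) * mu

# Case 1 (NB2): hold alpha fixed and let the scale beta vary.
for beta in (0.5, 1.0, 3.0):
    mu, var = nb_mean_var(2.0, beta)
    print(f"NB2 alpha=2.0 beta={beta}: mean={mu:.3f} var={var:.3f} "
          f"formula={nb2_variance(mu, 2.0):.3f}")

# Case 2 (NB1): hold beta fixed and let the shape alpha vary.
for alpha in (1.0, 2.0, 4.0):
    mu, var = nb_mean_var(alpha, 0.5)
    print(f"NB1 alpha={alpha} beta=0.5: mean={mu:.3f} var={var:.3f} "
          f"formula={nb1_variance(mu, 0.5):.3f}")
```

In the first loop the variance-to-mean ratio grows with the mean, while in the second loop it stays at 1 + β, which is the sense in which NB1 behaves like a Poisson model with a constant multiplier on the variance.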
Zero Inflation
- Zero inflation refers to observing more zeros than are predicted by a particular probability model, typically more zeros than are predicted by a Poisson model. Zero inflation often occurs in applications and is particularly common in ecology, especially when the event being tracked is rare or sporadic.
- A number of so-called zero-inflated models have been proposed in which the zeros are treated as heterogeneous, some of them coming from the same probability generating mechanism that generated the observed counts (the true zeros) and some coming from another process (the false zeros). Such an interpretation can make sense in habitat suitability studies in which, say, the abundance of a colonizing species is being tracked. In this situation one might suspect that there exists habitat that although suitable for habitation is still uncolonized because it's inaccessible (false zeros), as well as other habitat that although accessible has not been colonized because it is not suitable (true zeros).
- In my experience (see also Warton 2005), negative binomial models are very good at accounting for excess zeros, making it unnecessary to model the zero category separately.
- I have typically found that for heteroscedastic count data with few or no zeros, log-transformed count models often do as well or better than negative binomial models in describing the data. But as soon as the zero category becomes prominent, log-transformed models become nonsensical. A log-transformation works by converting an asymmetric distribution of counts into a symmetric distribution of log counts. But a pile of zeros at one end of the distribution can never be transformed away. After the transformation is applied that pile will still be at the end of the distribution resulting in a transformed distribution that is neither bell-shaped nor symmetric.
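To see how a negative binomial model can absorb excess zeros, here is a small Python sketch comparing the probability of a zero count under a Poisson model and under a negative binomial model with the same mean. It assumes the mean-dispersion form of the negative binomial, with variance μ + μ²/θ, for which P(0) = (θ/(θ + μ))^θ. The function names and the values of μ and θ are made up for illustration.

```python
import math

def poisson_p0(mu):
    """Probability of a zero count under a Poisson model with mean mu."""
    return math.exp(-mu)

def negbin_p0(mu, theta):
    """Probability of a zero count under a negative binomial model with
    mean mu and dispersion theta (variance mu + mu**2 / theta)."""
    return (theta / (theta + mu)) ** theta

mu = 2.0
print(f"Poisson, mean {mu}: P(0) = {poisson_p0(mu):.3f}")
for theta in (0.25, 0.5, 1.0, 5.0, 100.0):
    print(f"NB, mean {mu}, theta {theta}: P(0) = {negbin_p0(mu, theta):.3f}")
```

Small values of θ (strong overdispersion) put far more mass on zero than the Poisson does at the same mean, and as θ grows the negative binomial zero probability approaches the Poisson value.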
Polya-Eggenberger Urn Model
- A completely different way in which a negative binomial distribution can arise in theory is via what is called the Polya-Eggenberger urn model. This is one of the many so-called Polya urn models that have been proposed. The description of this one is as follows.
- Suppose we have an urn with N balls in it of which a fraction p are red and the remaining fraction 1 – p are black.
- At each trial we draw one ball from the urn, observe its color, and return it to the urn along with θ N balls of the same color (where usually we take 0 < θ < 1 although this is not required). By adding more balls of the same color we make the occurrence of the same event at the next trial more likely.
- Let $X_r$ = number of red balls drawn in r trials of this sort. It is not hard to write down the formula for $P(X_r = x)$, but it is rather involved, particularly the simplification that is required to proceed, so we'll skip it.
- It can be shown that if we take the limit of this expression in a certain way, $r \to \infty$, $p \to 0$, and $\theta \to 0$ while $rp$ and $r\theta$ each converge to a finite positive constant, then the limiting distribution is negative binomial.
- It's hard to know how useful this result is in practice. What is important is that the Polya-Eggenberger scheme clearly violates the independent increments hypothesis that was one of the three basic assumptions of the Poisson model.
- Thus we see that if either one of the major assumptions of the Poisson model, homogeneity or independence, is violated we can be led, under certain circumstances, to a negative binomial model.
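The limiting result for the urn scheme can be checked numerically. The Python sketch below computes the exact probability of k red draws in n trials and compares it with a negative binomial probability. It assumes the standard textbook form of the limit, in which np tends to a constant (called lam here) and nθ tends to a constant (called 1/rho here), giving a negative binomial with size lam·rho and probability parameter rho/(1 + rho). Those names and the numerical values are my own illustrative choices.

```python
import math

def polya_pmf(k, n, p, theta):
    """P(k red draws in n trials): the urn starts with a fraction p of red
    balls and after each draw theta * N balls of the drawn color are added."""
    log_prob = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_prob += sum(math.log(p + i * theta) for i in range(k))
    log_prob += sum(math.log(1 - p + j * theta) for j in range(n - k))
    log_prob -= sum(math.log(1 + m * theta) for m in range(n))
    return math.exp(log_prob)

def negbin_pmf(k, size, prob):
    """Negative binomial pmf with real-valued size and probability prob."""
    log_prob = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
                + size * math.log(prob) + k * math.log(1 - prob))
    return math.exp(log_prob)

lam, rho = 2.0, 2.0              # assumed limits: n*p -> lam, n*theta -> 1/rho
n = 2000                         # many trials, with small p and small theta
p, theta = lam / n, 1 / (rho * n)
size, prob = lam * rho, rho / (1 + rho)

# Largest pointwise gap between the exact urn pmf and the limiting pmf.
max_gap = max(abs(polya_pmf(k, n, p, theta) - negbin_pmf(k, size, prob))
              for k in range(30))
print(f"largest difference over k = 0..29: {max_gap:.2e}")
```

Increasing n while holding lam and rho fixed should shrink the gap further, which is the content of the limit statement.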
A Generalized Poisson Model
- There is an even more direct way in which violation of the independence hypothesis can lead from a Poisson model to a negative binomial model. This is through what's sometimes called a generalized Poisson model.
- Suppose events come in clusters. The clusters themselves are distributed spatially according to a Poisson distribution with parameter λ.
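- Within each cluster suppose the number of events observed follows a logarithmic distribution (also called a log series or Fisher log series distribution) with parameter p, 0 < p < 1. Then in any cluster i we have

$$P(Y_i = y) = \frac{-1}{\log(1 - p)} \cdot \frac{p^y}{y}, \quad y = 1, 2, \ldots$$
where $Y_i$ is the number of events in cluster i.
- Summing the events over all of the Poisson-distributed clusters then yields a negative binomial distribution for the total count.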
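The claim that Poisson-distributed clusters with log series cluster sizes produce a negative binomial total can be verified numerically. The Python sketch below computes the distribution of the total with the standard recursion for compound Poisson sums (the Panjer recursion) and compares it with a negative binomial pmf whose size parameter is −λ/log(1 − p). The function names and the values chosen for λ and p are assumptions made for this illustration.

```python
import math

def logseries_pmf(y, p):
    """Log series probability of y events in a cluster, y = 1, 2, ..."""
    return -p ** y / (y * math.log(1 - p))

def compound_poisson_pmf(lam, p, upper):
    """pmf of the total count on 0..upper when the number of clusters is
    Poisson(lam) and each cluster size is log series(p)."""
    f = [0.0] + [logseries_pmf(y, p) for y in range(1, upper + 1)]
    g = [math.exp(-lam)]                     # no clusters means no events
    for s in range(1, upper + 1):
        g.append(lam / s * sum(j * f[j] * g[s - j] for j in range(1, s + 1)))
    return g

def negbin_pmf(s, size, p):
    """Negative binomial pmf: C(size + s - 1, s) * (1 - p)**size * p**s."""
    log_prob = (math.lgamma(s + size) - math.lgamma(size) - math.lgamma(s + 1)
                + size * math.log(1 - p) + s * math.log(p))
    return math.exp(log_prob)

lam, p = 3.0, 0.6
size = -lam / math.log(1 - p)                # implied negative binomial size
total_pmf = compound_poisson_pmf(lam, p, 40)
max_gap = max(abs(total_pmf[s] - negbin_pmf(s, size, p)) for s in range(41))
print(f"implied size = {size:.4f}, largest difference = {max_gap:.2e}")
```

The two sets of probabilities agree to floating-point accuracy, so in this construction the match is exact rather than a limiting approximation.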
Some Reflections on the Negative Binomial
- We have now seen three distinct more or less biologically relevant mechanisms that can lead to a negative binomial distribution.
- A gamma mixing distribution for a heterogeneous Poisson—apparent contagion
- A Polya-Eggenberger urn model—true contagion
- A generalized Poisson model with Poisson distributed clusters and log series counts in a cluster—true contagion
- The desire to distinguish true dependence (contagion) from sham dependence (apparent contagion) has a long history.
- E. C. Pielou (1977), also Feller (1943), noted that apparent contagion (an independence model) and true contagion (a dependence model) are indistinguishable in practice because they can both give rise to identical distributions, the negative binomial.
- Cliff and Ord (1973, 1981) agree with this but argue that with a little bit more information, for example counts at multiple time periods, it should be possible to distinguish apparent and true contagion in practice.
- In a classic paper, Boswell and Patil (1970) outlined 12 distinct ways in which a negative binomial distribution can arise in practice, of which we've described four (if you include the original definition which is essentially that of a waiting time distribution). Because the negative binomial can arise in so many distinct ways it is probably not surprising that it often fits ecological data really well.
- Its appeal for use as a probability generating mechanism includes the following.
- It offers a model for heteroscedasticity.
- It can deal with zero inflation without auxiliary assumptions.
- It respects the discreteness of the data. (It doesn't insult the data by transforming them and then pretending they're continuous.)
- It can be motivated biologically--see the three mechanisms described above.
- We will study the negative binomial model again in a regression setting later in this course.
Other Probability Models
- Multinomial Distribution. Another discrete probability model is the multinomial model. It generalizes the binomial distribution in that more than two categories are permitted, but the total number of trials is still fixed. While I have used this distribution in non-ecological settings, I have not yet come across a good ecological application, so we will not consider it further in this course.
- Normal Distribution. I assume that you are very familiar with the normal distribution. The only comment I wish to make is that the importance of the normal distribution stems almost entirely from the central limit theorem. In layperson's terms this theorem states that if the value of a random variable X derives from many additive effects, then X will have an approximate normal distribution.
- Gamma Distribution.
- The gamma distribution is a continuous distribution that should be viewed as a viable alternative to the normal distribution whenever the data in question are heteroscedastic.
- Just as in the negative binomial, the variance has a quadratic relationship with the mean in the gamma distribution: $\sigma^2 = \mu^2/\alpha$, since $\mu = \alpha\beta$ and $\sigma^2 = \alpha\beta^2$.
- The gamma distribution is defined for only positive values of the random variable. Zeros are not allowed.
- It's a two-parameter distribution. The parameters are typically denoted α (shape) and β (scale).
- The formulas for the density, the mean, and the variance were given in Lecture 6.
- It is clearly an underutilized distribution in regression modeling. In principle it should be possible to use a gamma distribution whenever the use of a lognormal distribution is contemplated. Its primary advantage over the lognormal is that it doesn't require that the data be transformed and thus doesn't suffer from the problems inherent therein.
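The quadratic mean-variance relationship of the gamma distribution is easy to check by simulation. The Python sketch below uses `random.gammavariate`, whose two arguments are the shape and scale, and compares sample moments with αβ and αβ². The seed, the sample size, and the parameter values are arbitrary illustrative choices.

```python
import random
import statistics

def gamma_mean_var(alpha, beta):
    """Mean and variance of a gamma with shape alpha and scale beta."""
    return alpha * beta, alpha * beta ** 2

random.seed(1)                   # fixed seed so the run is reproducible
alpha = 2.0                      # shape held fixed while the scale varies
for beta in (0.5, 3.0, 10.0):
    mu, var = gamma_mean_var(alpha, beta)
    draws = [random.gammavariate(alpha, beta) for _ in range(50_000)]
    print(f"beta={beta}: theory mean={mu:.2f} var={var:.2f} | "
          f"sample mean={statistics.fmean(draws):.2f} "
          f"var={statistics.variance(draws):.2f} | "
          f"var/mean^2={var / mu ** 2:.2f}")
```

With the shape held fixed the ratio of the variance to the squared mean stays at 1/α however the scale changes, and every simulated value is strictly positive.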
- Lognormal Distribution
- The lognormal distribution is a continuous distribution that should be viewed as a viable alternative to the normal distribution whenever the data in question are heteroscedastic.
- Like the negative binomial and the gamma distributions, the lognormal distribution assumes that the variance has a quadratic relationship with the mean, $\mathrm{Var}(X) \propto [E(X)]^2$.
- Like the gamma distribution the lognormal distribution is defined for only positive values. Zeros are not allowed.
- It's a two-parameter distribution. The parameters are typically denoted μ and σ², although these are not its mean and variance.
- The easiest way to understand the lognormal distribution is as follows. If $\log X \sim \mathrm{N}(\mu, \sigma^2)$, then $X \sim \mathrm{lognormal}(\mu, \sigma^2)$, where now μ and σ² are the mean and variance of log X.
- The density, mean, and variance of the lognormal are fairly complicated expressions so it is usually easier to think in terms of the distribution of log X rather than the distribution of X.
- Anytime you log transform a response, carry out ordinary regression, and perform standard statistical tests on the results, you're assuming a lognormal distribution for the original response.
- Analogous to the central limit theorem of the normal distribution, there is a central limit theorem for the lognormal distribution. It essentially states that if the value of a random variable X derives from many multiplicative effects then X will have an approximate lognormal distribution.
- The lognormal distribution is heavily used in ecology to describe species abundance relations.
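For reference, the mean and variance of the lognormal on the original scale can be written in terms of the log-scale parameters: E(X) = exp(μ + σ²/2) and Var(X) = (exp(σ²) − 1)·[E(X)]². The Python sketch below evaluates these standard formulas. The function name and the parameter values are illustrative assumptions.

```python
import math

def lognormal_mean_var(mu, sigma2):
    """Mean and variance of X when log X is normal with mean mu and
    variance sigma2."""
    mean = math.exp(mu + sigma2 / 2)
    var = (math.exp(sigma2) - 1) * mean ** 2
    return mean, var

sigma2 = 0.5                     # log-scale variance held fixed
for mu in (0.0, 1.0, 2.0):
    mean, var = lognormal_mean_var(mu, sigma2)
    print(f"mu={mu}: E(X)={mean:.3f} Var(X)={var:.3f} "
          f"Var/E^2={var / mean ** 2:.3f} exp(mu)={math.exp(mu):.3f}")
```

The ratio Var(X)/[E(X)]² depends only on σ², which is the quadratic mean-variance relationship. The last column shows that exp(μ) is smaller than E(X), so simply exponentiating a mean computed on the log scale does not recover the mean of the original response.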