Lecture 6: Monday, January 23, 2006
What was covered?
- Joint, marginal, and conditional probability distributions
- gamma distribution
- negative binomial model as a model of heterogeneity
Review of Joint, Marginal, and Conditional Distributions
- Suppose we have a population of 20 individuals whom we cross-classify on the basis of two variables X and Y. Variable X has two categories, labeled 1 and 2, while Y has three categories, labeled 1, 2, and 3. The contingency table shown in Fig. 1 summarizes the frequency distribution of X and Y in our population. The raw counts are listed in the cells shaded in gray, the cells in yellow are the row and column totals, and the cell in blue is the grand total.
- From this table there are three sets of probabilities one can calculate: the joint probabilities, $P(X = x, Y = y)$, and two sets of conditional probabilities, $P(X = x \mid Y = y)$ and $P(Y = y \mid X = x)$.
- To calculate the joint probabilities we divide each entry in the table by the grand total 20. The entries that were the row and column totals now are called the marginal probabilities. So we have the marginal probabilities for X (last column) and the marginal probabilities for Y (last row).
- To calculate $P(X = x \mid Y = y)$, since Y defines the columns, we divide each entry in a column by the column total for that column.
- To calculate $P(Y = y \mid X = x)$, since X defines the rows, we divide each entry in a row by the row total for that row.
- The table below lists the three sets of probabilities for our frequency table.
- From the way the row and column margins were defined, it is clear in Table 2 that the joint probabilities in each row when summed are equal to the row marginal probabilities, and the joint probabilities in each column when summed are equal to the column marginal probabilities. In mathematical notation we have the following.

  $$P(X = x) = \sum_{y} P(X = x, Y = y), \qquad P(Y = y) = \sum_{x} P(X = x, Y = y)$$
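- As an example, I calculate P(X = 1) by summing the joint probabilities over the three categories of Y.

  $$P(X = 1) = P(X = 1, Y = 1) + P(X = 1, Y = 2) + P(X = 1, Y = 3)$$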
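- We can also carry out these calculations using conditional probabilities. Using the following identities

  $$P(X = x, Y = y) = P(X = x \mid Y = y)\, P(Y = y)$$

  $$P(X = x, Y = y) = P(Y = y \mid X = x)\, P(X = x)$$

  we can calculate the marginal probabilities as follows.

  $$P(X = x) = \sum_{y} P(X = x \mid Y = y)\, P(Y = y), \qquad P(Y = y) = \sum_{x} P(Y = y \mid X = x)\, P(X = x)$$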
- Applying these formulas to our table, I once again calculate P(X = 1).

  $$P(X = 1) = \sum_{y = 1}^{3} P(X = 1 \mid Y = y)\, P(Y = y)$$
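- If the variable being summed over in the above formulas is in fact continuous, then the sums are replaced by integrals as shown below.

  $$f(x) = \int f(x, y)\, dy = \int f(x \mid y)\, f(y)\, dy, \qquad f(y) = \int f(x, y)\, dx = \int f(y \mid x)\, f(x)\, dx$$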
- Note: To simplify notation I use the same symbol f for all the density functions involved. In practice, of course, all would be different from each other.
- If one of the variables is discrete and the other is continuous, then the resulting formula will be a hybrid of the continuous and discrete formulas shown above.
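The discrete calculations in this review can be sketched in a few lines of Python. The counts below are hypothetical (they are not the values from Fig. 1, which are not reproduced here); they were chosen only so that the grand total is 20. Exact fractions are used so the two routes to P(X = 1) can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical counts (NOT the values from Fig. 1): rows are X = 1, 2 and
# columns are Y = 1, 2, 3. Chosen only so that the grand total is 20.
counts = [
    [3, 5, 2],
    [4, 1, 5],
]

grand_total = sum(sum(row) for row in counts)

# Joint probabilities P(X = x, Y = y): each cell divided by the grand total.
joint = [[Fraction(c, grand_total) for c in row] for row in counts]

# Marginals: row sums give P(X = x), column sums give P(Y = y).
p_x = [sum(row) for row in joint]
p_y = [sum(col) for col in zip(*joint)]

# Conditionals: divide by the column total for P(X | Y)
# and by the row total for P(Y | X).
p_x_given_y = [[joint[i][j] / p_y[j] for j in range(3)] for i in range(2)]
p_y_given_x = [[joint[i][j] / p_x[i] for j in range(3)] for i in range(2)]

# Recover P(X = 1) from the conditionals:
# sum over y of P(X = 1 | Y = y) * P(Y = y).
p_x1_from_conditionals = sum(p_x_given_y[0][j] * p_y[j] for j in range(3))

print("P(X = 1) from joint probabilities:      ", p_x[0])
print("P(X = 1) from conditional probabilities:", p_x1_from_conditionals)
```

Both routes give the same marginal probability, which is the point of the identities above.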
Nonhomogeneous Poisson Process (continued)
- As noted last time, in a nonhomogeneous Poisson process, the rate constant λ is allowed to vary according to some distribution. Given a particular realization from this distribution, say λ, the resulting random variable X will have a Poisson distribution with $X \mid \lambda \sim \text{Poisson}(\lambda)$.
- We still want to calculate the unconditional (or marginal) probability $P(X = x)$, but now the calculation is more complicated because we have multiple values of λ to consider. Using the above formulas we have (assuming that λ varies continuously)

  $$P(X = x) = \int_0^\infty P(X = x \mid \lambda)\, f(\lambda)\, d\lambda = \int_0^\infty \frac{\lambda^x e^{-\lambda}}{x!}\, f(\lambda)\, d\lambda$$

  where in the last step I use the fact that given the value of λ, X has a Poisson distribution with parameter λ. So, what remains is to come up with a marginal density function for λ.
- The function $f(\lambda)$ that appears in the above integral is called a mixing distribution for the Poisson.
Choosing a Probability Distribution for λ
- Let's list the obvious requirements for such a density.
- Since the Poisson distribution is a model of counts and λ is the mean of that distribution, we must have λ > 0. Thus a distribution such as the normal distribution that allows both positive and negative values is clearly out.
- Without any specific knowledge about how λ might vary across subjects we should probably choose a function that is flexible, that can describe a wide range of possible distributions for λ.
- We should probably choose a function that will allow us to actually compute the integral above. Hence it needs to be "complementary" to the Poisson mass function that it multiplies in the integral. (Note: this last point is less important today with availability of MCMC for estimating such integrals. We'll return to this point at the end of this course.)
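- One distribution that satisfies all three requirements is the gamma distribution. The gamma distribution is a two-parameter continuous distribution (half an elephant!) that takes the following form.

  $$f(x; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}, \qquad x > 0$$

  Here α and β are positive parameters (called the shape and scale parameters, respectively; because β multiplies x in the exponent, it is what many references call the rate, the reciprocal of the scale) and $\Gamma(\alpha)$ is the gamma function that was defined in the last lecture. In the functional notation above I use a semicolon to separate the random variable from its parameters.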
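- If $X \sim \text{gamma}(\alpha, \beta)$ then the mean and variance of X are as follows.

  $$E(X) = \frac{\alpha}{\beta}, \qquad \text{Var}(X) = \frac{\alpha}{\beta^{2}}$$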
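- Because it will be important in what follows I demonstrate that the formula given above is truly a probability density by proving that it integrates to 1. Substituting $u = \beta x$ and using the definition $\Gamma(\alpha) = \int_0^\infty u^{\alpha - 1} e^{-u}\, du$ gives the following.

  $$\int_0^\infty \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}\, dx = \frac{1}{\Gamma(\alpha)} \int_0^\infty u^{\alpha - 1} e^{-u}\, du = \frac{\Gamma(\alpha)}{\Gamma(\alpha)} = 1$$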
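As a numerical sanity check on the claims about the gamma density, the sketch below integrates it with a simple midpoint rule and compares the results with 1, α/β, and α/β². It assumes the parameterization in which the mean is α/β; the parameter values, the upper integration limit, and the helper names `gamma_pdf` and `integrate` are illustrative choices, not part of the lecture.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with mean alpha / beta, evaluated on the log scale."""
    log_f = (alpha * math.log(beta) + (alpha - 1) * math.log(x)
             - beta * x - math.lgamma(alpha))
    return math.exp(log_f)

def integrate(func, lower, upper, n=200_000):
    """Composite midpoint rule; midpoints avoid evaluating at x = 0."""
    h = (upper - lower) / n
    return h * sum(func(lower + (k + 0.5) * h) for k in range(n))

alpha, beta = 2.5, 0.8  # illustrative values
upper = 60.0            # the density is negligible beyond this for these values

total = integrate(lambda x: gamma_pdf(x, alpha, beta), 0.0, upper)
mean = integrate(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, upper)
second_moment = integrate(lambda x: x * x * gamma_pdf(x, alpha, beta), 0.0, upper)
variance = second_moment - mean ** 2

print("integral of density:", total)
print("mean:", mean, "vs alpha / beta =", alpha / beta)
print("variance:", variance, "vs alpha / beta^2 =", alpha / beta ** 2)
```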

The Gamma Distribution as a Mixing Distribution for the Poisson
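- Returning to our calculation of the marginal distribution of a nonhomogeneous Poisson process, I use the gamma distribution as the mixing distribution.

  $$P(X = x) = \int_0^\infty \frac{\lambda^x e^{-\lambda}}{x!} \cdot \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\beta \lambda}\, d\lambda = \frac{\beta^{\alpha}}{x!\, \Gamma(\alpha)} \int_0^\infty \lambda^{x + \alpha - 1} e^{-(\beta + 1)\lambda}\, d\lambda$$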
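- Compare the integrand with the formula for the gamma distribution. The two terms of the integrand look like two of the terms of the gamma distribution if we make the identifications $\alpha \to x + \alpha$ and $\beta \to \beta + 1$. All that's missing are the corresponding $\Gamma(\alpha)$ and $\beta^{\alpha}$ terms, which with the new identifications would become $\Gamma(x + \alpha)$ and $(\beta + 1)^{x + \alpha}$. I multiply and divide by each of these terms so that the integral remains unchanged.

  $$P(X = x) = \frac{\beta^{\alpha}\, \Gamma(x + \alpha)}{x!\, \Gamma(\alpha)\, (\beta + 1)^{x + \alpha}} \int_0^\infty \frac{(\beta + 1)^{x + \alpha}}{\Gamma(x + \alpha)}\, \lambda^{x + \alpha - 1} e^{-(\beta + 1)\lambda}\, d\lambda$$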
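- The integral that remains is just the integral over its domain of a gamma distribution with parameters $x + \alpha$ and $\beta + 1$. But as we demonstrated above, this integrates to 1. Thus we are left with the following.

  $$P(X = x) = \frac{\Gamma(x + \alpha)}{\Gamma(\alpha)\, x!} \cdot \frac{\beta^{\alpha}}{(\beta + 1)^{x + \alpha}} = \frac{\Gamma(x + \alpha)}{\Gamma(\alpha)\, x!} \left(\frac{\beta}{\beta + 1}\right)^{\alpha} \left(\frac{1}{\beta + 1}\right)^{x}$$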
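- I next reparameterize this using the mean of a gamma distribution.

  $$\mu = \frac{\alpha}{\beta}, \qquad \text{so that} \qquad \beta \mu = \alpha$$

  Inside each of the parentheses I multiply numerator and denominator by μ and then make the above substitution.

  $$P(X = x) = \frac{\Gamma(x + \alpha)}{\Gamma(\alpha)\, x!} \left(\frac{\beta \mu}{\beta \mu + \mu}\right)^{\alpha} \left(\frac{\mu}{\beta \mu + \mu}\right)^{x} = \frac{\Gamma(x + \alpha)}{\Gamma(\alpha)\, x!} \left(\frac{\alpha}{\alpha + \mu}\right)^{\alpha} \left(\frac{\mu}{\mu + \alpha}\right)^{x}$$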
- You should recognize the last expression as just the ecologist's parameterization of the negative binomial distribution with α playing the role of θ.
- We conclude that the marginal density of a nonhomogeneous Poisson process, when the gamma distribution is used as the mixing distribution, is negative binomial with parameters μ and α.
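The conclusion of this derivation can be checked numerically. The sketch below integrates the product of a Poisson mass function and a gamma density over λ with a midpoint rule and compares the result with the negative binomial formula in terms of μ and α. It assumes the gamma parameterization with mean α/β; the parameter values, the integration limit, and the function names are illustrative assumptions.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability of count x when the mean is lam."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def gamma_pdf(lam, alpha, beta):
    """Gamma density with mean alpha / beta."""
    return math.exp(alpha * math.log(beta) + (alpha - 1) * math.log(lam)
                    - beta * lam - math.lgamma(alpha))

def mixture_pmf(x, alpha, beta, upper=80.0, n=100_000):
    """P(X = x): midpoint-rule integral of Poisson(x | lam) * gamma(lam) over lam."""
    h = upper / n
    return h * sum(
        poisson_pmf(x, (k + 0.5) * h) * gamma_pdf((k + 0.5) * h, alpha, beta)
        for k in range(n)
    )

def negbin_pmf(x, mu, alpha):
    """Negative binomial with mean mu, with alpha in the role of theta."""
    log_p = (math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
             + alpha * math.log(alpha / (alpha + mu))
             + x * math.log(mu / (mu + alpha)))
    return math.exp(log_p)

alpha, beta = 2.5, 0.5   # illustrative values
mu = alpha / beta        # mean of the gamma mixing distribution

for x in range(6):
    print(x, mixture_pmf(x, alpha, beta), negbin_pmf(x, mu, alpha))
```

For each count the two columns should agree to within the accuracy of the numerical integration.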
Some Comments
- The argument given above provides an ecological rationalization for fitting a negative binomial model to count data. Essentially, if you suspect heterogeneity may be at play, the negative binomial should be a good choice.
- For example, if you have count data for a given species in different transects where the habitat quality of the transects varies, then you might suspect the fitness of the species to vary also, yielding heterogeneity in the rate parameter λ across transects.
- If you have count data for many species collectively and the species distribution varies across quadrats, then the average rate parameter in a quadrat would be expected to vary—once again yielding heterogeneity.
- If you are monitoring a disease for which hosts have a varying susceptibility, you might suspect heterogeneity to be present.
- While heterogeneity may initially be present, by including useful predictors in a regression model you may end up "explaining" much of it. The negative binomial error distribution could then still be used to account for the lingering heterogeneity that the model does not explain.
- The connection between the gamma mixing distribution and the final negative binomial model opens up some interesting possibilities. Suppose you suspect heterogeneity in your population but have no idea what form it might take. When you fit a negative binomial model you obtain μ and θ, from which you can then obtain α and β of the gamma mixing distribution: α = θ and, because μ = α/β, β = θ/μ. The gamma probability distribution with these values of α and β can then be used to explicitly characterize, as a distribution, the way in which the heterogeneity manifests itself in your sample.
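A minimal sketch of that conversion follows. The fitted values `mu_hat` and `theta_hat` are hypothetical numbers standing in for the output of a negative binomial fit, and the helper names are illustrative. The mapping assumes the gamma parameterization with mean α/β, so that α = θ and β = θ/μ.

```python
import math

def gamma_mixing_parameters(mu, theta):
    """Map negative binomial (mu, theta) to gamma (alpha, beta),
    using alpha = theta and mu = alpha / beta."""
    alpha = theta
    beta = theta / mu
    return alpha, beta

def gamma_pdf(lam, alpha, beta):
    """Gamma density with mean alpha / beta."""
    return math.exp(alpha * math.log(beta) + (alpha - 1) * math.log(lam)
                    - beta * lam - math.lgamma(alpha))

# Hypothetical fitted values, for illustration only.
mu_hat, theta_hat = 4.2, 0.9

alpha, beta = gamma_mixing_parameters(mu_hat, theta_hat)
mixing_mean = alpha / beta
mixing_variance = alpha / beta ** 2
# Coefficient of variation of lambda; algebraically equal to 1 / sqrt(theta).
mixing_cv = math.sqrt(mixing_variance) / mixing_mean

print("alpha =", alpha, "beta =", beta)
print("mean of lambda =", mixing_mean, "variance of lambda =", mixing_variance)
for lam in (0.5, 2.0, 4.0, 8.0):
    print("density at lambda =", lam, "is", gamma_pdf(lam, alpha, beta))
```

The coefficient of variation depends only on θ, so a small θ corresponds to strong heterogeneity in λ regardless of the mean.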
The Gamma Distribution as a Mixing Distribution for the Poisson—Alternative Approach
- It turns out there is a second way to use the gamma distribution as a mixing distribution and obtain a negative binomial distribution in the end. This second way appears to be only a trivial modification of what we've described above but is different enough so that it is not immediately obvious that it will yield the same result. This approach appears in a standard textbook on negative binomial regression (Hilbe 2007) and appears in the literature as the preferred way to implement negative binomial regression from a Bayesian perspective using WinBUGS (Durham et al. 2004).
- The first modification to our approach above is the manner in which the Poisson distribution is formulated. We now assume X is Poisson with "parameter" rμ, i.e., we write the usual single Poisson parameter λ as a product of two parameters, r and μ. The choice of symbol for the second parameter is propitious because it will turn out to be the usual mean of the negative binomial distribution.
- The second modification is in how the mixing distribution is specified. We assume that the first parameter r in the product is distributed as a gamma random variable in which the shape and scale parameters are identical.
- Thus our new assumptions are the following:

  $$X \mid r \sim \text{Poisson}(r\mu) \quad \text{where} \quad r \sim \text{gamma}(\alpha, \alpha).$$
- So, in this approach we write the mean of the Poisson distribution as a product of two parameters in which the first parameter, r, is a random variable drawn from a gamma distribution with a mean of 1. (Recall that the mean of a gamma distribution is given by the ratio of its two parameters, which in this case are the same.) As noted previously it will turn out that μ appearing in the Poisson "parameter" is the mean of a negative binomial distribution. Hence the attraction of this formulation is that it makes modeling the negative binomial mean in a regression setting easy because it appears as an explicit parameter in our model.
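- I show that these starting assumptions lead to exactly the same negative binomial distribution we obtained above. Parroting what we did above we again use conditional probabilities to find the marginal distribution of X. This time $f(r)$ is the density function for a gamma distribution in which the shape and scale parameters are equal, i.e., $\beta = \alpha$.

  $$P(X = x) = \int_0^\infty P(X = x \mid r)\, f(r)\, dr = \int_0^\infty \frac{(r\mu)^x e^{-r\mu}}{x!} \cdot \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, r^{\alpha - 1} e^{-\alpha r}\, dr = \frac{\mu^x \alpha^{\alpha}}{x!\, \Gamma(\alpha)} \int_0^\infty r^{x + \alpha - 1} e^{-(\mu + \alpha) r}\, dr$$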
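- Compare the integrand with the formula for the gamma distribution. The two terms of the integrand look like two of the terms of the gamma distribution if we make the identifications $\alpha \to x + \alpha$ and $\beta \to \mu + \alpha$. All that's missing are the corresponding $\Gamma(\alpha)$ and $\beta^{\alpha}$ terms, which with the new identifications would become $\Gamma(x + \alpha)$ and $(\mu + \alpha)^{x + \alpha}$. I multiply and divide the integrand by each of these terms so that the integral remains unchanged.

  $$P(X = x) = \frac{\mu^x \alpha^{\alpha}\, \Gamma(x + \alpha)}{x!\, \Gamma(\alpha)\, (\mu + \alpha)^{x + \alpha}} \int_0^\infty \frac{(\mu + \alpha)^{x + \alpha}}{\Gamma(x + \alpha)}\, r^{x + \alpha - 1} e^{-(\mu + \alpha) r}\, dr$$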
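- The integral that remains is just the integral over the domain of a gamma distribution with parameters $x + \alpha$ and $\mu + \alpha$. But as we demonstrated above, such an integral equals 1 because the integrand is a probability density.

  $$P(X = x) = \frac{\mu^x \alpha^{\alpha}\, \Gamma(x + \alpha)}{x!\, \Gamma(\alpha)\, (\mu + \alpha)^{x + \alpha}}$$

- Thus we are left with the following after a little algebra.

  $$P(X = x) = \frac{\Gamma(x + \alpha)}{\Gamma(\alpha)\, x!} \left(\frac{\alpha}{\alpha + \mu}\right)^{\alpha} \left(\frac{\mu}{\mu + \alpha}\right)^{x}$$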
- This is exactly the same result obtained above. It is the ecologist's parameterization of the negative binomial distribution with α playing the role of θ.
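The alternative formulation lends itself to a quick simulation check. The sketch below draws r from a gamma distribution with mean 1, then draws X from a Poisson distribution with mean rμ, and compares the sample mean, sample variance, and proportion of zeros with the negative binomial values μ, μ + μ²/α (a standard property of this parameterization, not derived in these notes), and (α/(α + μ))^α. The seed, sample size, parameter values, and helper names are illustrative assumptions, and the Poisson sampler is Knuth's simple multiplication method rather than anything prescribed by the lecture.

```python
import math
import random

def draw_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def draw_mixture(mu, alpha, rng):
    """One draw of X: r ~ gamma with mean 1, then X | r ~ Poisson(r * mu)."""
    # random.gammavariate takes (shape, scale); scale 1 / alpha gives E[r] = 1.
    r = rng.gammavariate(alpha, 1.0 / alpha)
    return draw_poisson(r * mu, rng)

rng = random.Random(2006)   # fixed seed so the run is reproducible
mu, alpha = 3.0, 2.0        # illustrative values
n = 20_000

sample = [draw_mixture(mu, alpha, rng) for _ in range(n)]

sample_mean = sum(sample) / n
sample_var = sum((x - sample_mean) ** 2 for x in sample) / (n - 1)
prop_zero = sample.count(0) / n

print("sample mean:", sample_mean, "vs mu =", mu)
print("sample variance:", sample_var, "vs mu + mu^2 / alpha =", mu + mu ** 2 / alpha)
print("proportion of zeros:", prop_zero, "vs", (alpha / (alpha + mu)) ** alpha)
```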
The Negative Binomial as a Model of True Contagion
- There is yet another ecological motivation for the negative binomial distribution. It turns out the negative binomial distribution is the limiting form for something called the Polya-Eggenberger urn model. We'll explore what this means next time.
Cited References
- Durham, Catherine A., Iain Pardoe, and Esteban Vega. 2004. A methodology for evaluating how product characteristics impact choice in retail settings with many zero observations: An application to restaurant wine purchase. Journal of Agricultural and Resource Economics 29(1): 112–131.
- Hilbe, Joseph M. 2007. Negative Binomial Regression. Cambridge University Press.