Lecture 4, Wednesday, January 18, 2006
What was covered?
- Derivation of Poisson probability mass function
- Negative binomial distribution as defined in math books
Terminology Defined
Poisson Distribution
- Last time we stated the three assumptions of the Poisson distribution (more properly, the assumptions of a homogeneous Poisson process). Today we derive its probability mass function. For reference, the three assumptions are listed briefly below.
- Events occur at a rate λ that is constant in both time and space (homogeneity).
- Events occurring in one interval have no effect on events occurring in a second interval, as long as the two intervals are disjoint (independence).
- The probability of observing two or more events in an interval, when the interval in question is very small, is approximately zero.
- Suppose Nt is a Poisson random variable with parameter λ. This means that in a time period of length t (equivalently, replace time with length, area, or volume) we expect on average to observe λt events. More formally, we state this as
$$E(N_t) = \lambda t.$$
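As a quick sanity check on $E(N_t) = \lambda t$, here is a small Python simulation. It is a sketch that relies on a fact not derived in these notes: in a homogeneous Poisson process, the waiting times between successive events are independent exponential random variables with rate λ. The helper names (`count_events`, `average_count`) are hypothetical. The values λ = 2 and t = 3 are arbitrary illustrative choices.

```python
import random

def count_events(lam, t, rng):
    """Count events of a homogeneous Poisson process on [0, t].

    Assumption (not derived in these notes): gaps between successive
    events are independent Exponential(lam) waiting times.
    """
    count = 0
    clock = rng.expovariate(lam)        # time of the first event
    while clock <= t:
        count += 1
        clock += rng.expovariate(lam)   # time of the next event
    return count

def average_count(lam, t, reps=20000, seed=1):
    """Average number of events in [0, t] over many simulated intervals."""
    rng = random.Random(seed)
    return sum(count_events(lam, t, rng) for _ in range(reps)) / reps

lam, t = 2.0, 3.0
print(average_count(lam, t), "vs lambda * t =", lam * t)
```

The simulated average should land close to λt = 6. Any remaining gap is sampling noise.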
Derivation of the Poisson Probability Mass Function
- Consider an interval of length t in which a homogeneous Poisson process Nt is operating. By homogeneity let it be the interval [0, t]. Divide this interval into n equal subintervals.

Since each subinterval has length $t/n$, we expect on average to encounter $\lambda t/n$ events in each subinterval.
- Choose n so that the following two requirements are satisfied.
- n is large enough that $\lambda t/n < 1$, and
- n is large enough that each subinterval contains either 0 or 1 events.
- We know from assumption 3 that the second requirement can be met. (When n is made large, each subinterval is made small, and hence the probability of observing two or more events in that subinterval becomes negligible.)
- By assumption 2, events occurring in one subinterval are independent of events occurring in any other subinterval. Furthermore, by construction, the number of events we see in each subinterval is either 0 or 1. Hence the subintervals correspond to independent Bernoulli trials. What is p, the probability of success, in each subinterval?
- We showed last time that if X ~ Bernoulli(p), then E(X) = p. But in each subinterval the expected number of events is $\lambda t/n$. Thus it follows that $p = \lambda t/n$. Note: we've guaranteed that this is a probability because we chose n large enough that $\lambda t/n < 1$. Hence $p = \lambda t/n$ is a number between 0 and 1, as required.
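- Let $X_1, X_2, \ldots, X_n$ denote the outcomes of the n Bernoulli trials. Then clearly $N_t = X_1 + X_2 + \cdots + X_n$. Since Nt is the sum of n independent Bernoulli random variables with parameter p, it has a binomial distribution with parameters n and p. Thus we have
$$P(N_t = k) = \binom{n}{k}\left(\frac{\lambda t}{n}\right)^{k}\left(1-\frac{\lambda t}{n}\right)^{n-k}, \quad k = 0, 1, \ldots, n.$$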
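The construction above can be mimicked directly in Python. The sketch below chops [0, t] into n subintervals and flips an independent Bernoulli($\lambda t/n$) coin in each. It then compares the simulated frequency of each count with the binomial formula. The function names and the choice n = 50, λ = 2, t = 3 are hypothetical illustrations, not part of the notes.

```python
import math
import random

def binomial_pmf(k, n, lam, t):
    """P(N_t = k) under the n-subinterval Bernoulli approximation."""
    p = lam * t / n
    assert 0 < p < 1, "choose n large enough that lambda * t / n < 1"
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def simulate_counts(n, lam, t, reps=20000, seed=2):
    """Sum n independent Bernoulli(lambda * t / n) subinterval outcomes, reps times."""
    rng = random.Random(seed)
    p = lam * t / n
    return [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]

n, lam, t = 50, 2.0, 3.0
counts = simulate_counts(n, lam, t)
for k in range(4, 9):
    # simulated relative frequency vs. exact binomial probability
    print(k, counts.count(k) / len(counts), round(binomial_pmf(k, n, lam, t), 4))
```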
- So is this the answer? Well, not quite. The problem is that these really aren't Bernoulli trials. As long as each subinterval has positive length, the probability of seeing two or more events in it is not zero, even though we've been assuming that it is.
- For example, take any subinterval and divide it in half, so that now we have two intervals, each of length $t/(2n)$. In each half we now expect $\lambda t/(2n)$ events, so the probability of one event in each half is approximately $\lambda t/(2n)$. So the probability of seeing one event in each of these halves is, by independence, approximately
$$\left(\frac{\lambda t}{2n}\right)^2 = \frac{\lambda^2 t^2}{4n^2}.$$
This probability is small when n is large, but not zero. But if we observe one event in each of these halves, then we've seen two events in the original subinterval, and that's not allowed for Bernoulli trials.
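To see how quickly this probability shrinks, the short Python sketch below evaluates $(\lambda t/(2n))^2$ for a few values of n. It also prints n times that value. By the union bound, this product is an upper bound on the chance that the "one event in each half" situation happens in at least one of the n subintervals. The function name is hypothetical, and λ = 2, t = 3 are arbitrary.

```python
def prob_one_in_each_half(n, lam, t):
    """Approximate probability of one event in each half of a length-t/n subinterval."""
    q = lam * t / (2 * n)   # approximate chance of one event in a half of length t/(2n)
    return q * q            # the two halves are disjoint, hence independent

lam, t = 2.0, 3.0
for n in (10, 100, 1000):
    per_subinterval = prob_one_in_each_half(n, lam, t)
    # n * per_subinterval bounds the chance this happens in at least one subinterval
    print(n, per_subinterval, n * per_subinterval)
```

Both columns go to zero as n grows. This is the numerical face of "shrink the intervals down to nothing."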
- The solution is to shrink the intervals down to nothing. We can do this by letting the number of intervals become infinite.
Limiting Distribution
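- Before calculating $\lim_{n\to\infty} P(N_t = k)$, I rewrite our formula in such a way that the limiting behavior becomes more obvious. I begin by writing out the binomial coefficient and splitting up the terms containing exponents.
$$P(N_t = k) = \frac{n(n-1)\cdots(n-k+1)}{k!}\cdot\frac{(\lambda t)^{k}}{n^{k}}\left(1-\frac{\lambda t}{n}\right)^{n}\left(1-\frac{\lambda t}{n}\right)^{-k}$$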
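- In what remains of the numerator of what was the binomial coefficient, there are k terms being multiplied together (count 'em!). The next term has an $n^k$ in its denominator, a total of k copies of n. I place one copy of n under each of the k former binomial coefficient terms.
$$P(N_t = k) = \frac{n}{n}\cdot\frac{n-1}{n}\cdots\frac{n-k+1}{n}\cdot\frac{(\lambda t)^{k}}{k!}\left(1-\frac{\lambda t}{n}\right)^{n}\left(1-\frac{\lambda t}{n}\right)^{-k}$$
$$= 1\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{k-1}{n}\right)\cdot\frac{(\lambda t)^{k}}{k!}\left(1-\frac{\lambda t}{n}\right)^{n}\left(1-\frac{\lambda t}{n}\right)^{-k}$$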
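- Now we're ready to take the limit as $n \to \infty$. I look at each group of terms separately.
$$\lim_{n\to\infty} 1\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{k-1}{n}\right) = 1$$
because in each of the k factors we're subtracting from 1 a term that goes to zero as n gets big, and a fixed number of factors each approaching 1 has a product approaching 1.
$$\lim_{n\to\infty} \frac{(\lambda t)^{k}}{k!} = \frac{(\lambda t)^{k}}{k!}$$
because this term does not depend upon n.
$$\lim_{n\to\infty} \left(1-\frac{\lambda t}{n}\right)^{-k} = 1$$
because inside the parentheses we have an expression approaching 1, and 1 raised to a fixed power is 1.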
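- The term $\left(1-\frac{\lambda t}{n}\right)^{n}$ is more problematic. Like the last term, the expression inside the parentheses is approaching 1. Unlike the last term, the exponent is changing too: it's approaching infinity. So we have a number close to 1 being raised to a very large power. But if you take a number slightly less than 1 and multiply it by itself enough times, you can get a number close to 0. So it's not clear what's going on here. This is an example of what's called an indeterminate form (of type $1^\infty$). It's actually a very famous indeterminate form, and the calculation of this limit is usually taught in a second-semester calculus class. There you learn that
$$\lim_{n\to\infty}\left(1+\frac{x}{n}\right)^{n} = e^{x},$$
and with a little algebra (take $x = -\lambda t$) you can therefore show that
$$\lim_{n\to\infty}\left(1-\frac{\lambda t}{n}\right)^{n} = e^{-\lambda t}.$$
Putting the four limits together,
$$\lim_{n\to\infty} P(N_t = k) = 1\cdot\frac{(\lambda t)^{k}}{k!}\cdot e^{-\lambda t}\cdot 1 = \frac{e^{-\lambda t}(\lambda t)^{k}}{k!}.$$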
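A numerical look at the indeterminate form helps make it concrete. The Python sketch below evaluates $(1 - \lambda t/n)^n$ for increasing n and prints $e^{-\lambda t}$ beside it. The helper name `compound_factor` and the values λ = 2, t = 3 are hypothetical choices for illustration.

```python
import math

def compound_factor(n, lam, t):
    """The troublesome factor (1 - lambda * t / n) ** n from the derivation."""
    return (1 - lam * t / n) ** n

lam, t = 2.0, 3.0
target = math.exp(-lam * t)   # the claimed limit e^{-lambda t}
for n in (10, 100, 1000, 100000):
    print(n, compound_factor(n, lam, t), target)
```

The base creeps up toward 1 while the exponent grows, and the two effects balance out at $e^{-\lambda t}$ rather than at 0 or 1.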
- Thus if Nt is a Poisson random variable with rate parameter λ, its probability mass function is given by
$$P(N_t = k) = \frac{e^{-\lambda t}(\lambda t)^{k}}{k!}, \quad k = 0, 1, 2, \ldots$$
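The whole limiting argument can be checked numerically. The Python sketch below computes the largest absolute gap between the Binomial($n$, $\lambda t/n$) probabilities and the Poisson formula over k = 0, ..., 20, for increasing n. The function names, the cutoff `kmax = 20`, and λ = 2, t = 3 are illustrative assumptions.

```python
import math

def poisson_pmf(k, lam, t):
    """Poisson probability e^{-lambda t} (lambda t)^k / k!."""
    mu = lam * t
    return math.exp(-mu) * mu**k / math.factorial(k)

def binomial_pmf(k, n, lam, t):
    """Binomial(n, lambda * t / n) probability; math.comb returns 0 when k > n."""
    p = lam * t / n
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def max_gap(n, lam, t, kmax=20):
    """Largest |binomial - Poisson| difference over k = 0..kmax."""
    return max(abs(binomial_pmf(k, n, lam, t) - poisson_pmf(k, lam, t))
               for k in range(kmax + 1))

for n in (10, 100, 1000, 10000):
    print(n, max_gap(n, 2.0, 3.0))
```

The gap shrinks steadily as n grows, which is what the limit above predicts.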
Mean and Variance of a Poisson Random Variable
- To calculate the mean we would need to compute
$$E(N_t) = \sum_{k=0}^{\infty} k\,\frac{e^{-\lambda t}(\lambda t)^{k}}{k!},$$
which turns out not to be hard, but there is an easier way.
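- Before taking the limit as $n \to \infty$ of Nt in the above derivation, Nt had a binomial distribution with parameters n and $p = \lambda t/n$. But we know the mean and variance of a binomial random variable: they are np and np(1 – p). Taking the limits of these expressions, we have
$$E(N_t) = \lim_{n\to\infty} n\cdot\frac{\lambda t}{n} = \lambda t,$$
$$\operatorname{Var}(N_t) = \lim_{n\to\infty} n\cdot\frac{\lambda t}{n}\left(1-\frac{\lambda t}{n}\right) = \lambda t,$$
because in the last expression $\lambda t/n \to 0$ as $n \to \infty$.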
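Both routes to the mean and variance can be checked numerically in Python. The sketch below sums directly over the Poisson probabilities, truncated at `kmax = 100`, which is an assumption that is harmless for moderate λt. It also evaluates the binomial moments np and np(1 − p) with $p = \lambda t/n$ for growing n. All helper names are hypothetical. The probabilities are built with the recurrence $P(k) = P(k-1)\,\mu/k$ so that no huge factorials are needed.

```python
import math

def poisson_probs(mu, kmax=100):
    """Poisson probabilities P(0..kmax) via P(k) = P(k-1) * mu / k."""
    probs = [math.exp(-mu)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * mu / k)
    return probs

def moments(probs):
    """Mean and variance of a distribution on 0, 1, 2, ... given its probabilities."""
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

def binomial_moments(n, lam, t):
    """np and np(1 - p) with p = lambda * t / n."""
    p = lam * t / n
    return n * p, n * p * (1 - p)

print(moments(poisson_probs(6.0)))        # direct sums: both close to 6
for n in (10, 100, 10**6):
    print(n, binomial_moments(n, 2.0, 3.0))  # variance climbs toward lambda * t = 6
```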
- Observe that a Poisson random variable is a random variable in which the mean and variance are equal. In particular, as the mean increases the variance will increase. Thus in a regression setting if we are fitting a regression model to the mean of a distribution where the distribution is Poisson, the spread around the regression line will increase with the mean. Thus the Poisson model is a model for heteroscedasticity!
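To see the mean-variance link as a spread that grows with the mean, the Python sketch below draws Poisson counts at a few means and reports the sample mean and variance. The notes do not give a way to sample Poisson counts. This sketch swaps in Knuth's multiplication method, which is adequate for moderate means. The helper names, the means (1, 5, 20), and the sample size are hypothetical choices.

```python
import math
import random
import statistics

def poisson_sample(mu, rng):
    """One Poisson(mu) draw by Knuth's multiplication method (swapped in; fine for moderate mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:        # keep multiplying uniforms until the product drops below e^{-mu}
        k += 1
        prod *= rng.random()
    return k

def mean_and_variance(mu, reps=20000, seed=3):
    """Sample mean and sample variance of reps Poisson(mu) draws."""
    rng = random.Random(seed)
    draws = [poisson_sample(mu, rng) for _ in range(reps)]
    return statistics.mean(draws), statistics.variance(draws)

for mu in (1, 5, 20):
    m, v = mean_and_variance(mu)
    print(f"mean {m:.2f}  variance {v:.2f}")
```

The variance column tracks the mean column. If these were the counts around a regression line, the scatter would widen as the fitted mean increased.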
Alternate Formula for a Poisson Probability
- In applications it is often the case that t in the Poisson formula is held constant at some value. This would be the case, for example, if we were obtaining counts from quadrats all of which were the same size. In such a case we typically suppress all reference to t, say that X is a Poisson random variable with parameter λ, and write
$$P(X = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \quad k = 0, 1, 2, \ldots$$
or, because λ is also the mean (when reference to t is suppressed), write it in terms of the mean μ = λ:
$$P(X = k) = \frac{e^{-\mu}\mu^{k}}{k!}, \quad k = 0, 1, 2, \ldots$$
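- Comment: Since the Poisson distribution is a probability distribution, it must be the case that when we sum the probabilities over all possible values we get 1. For the Poisson that would mean
$$\sum_{k=0}^{\infty}\frac{e^{-\lambda}\lambda^{k}}{k!} = 1.$$
Observe that
$$\sum_{k=0}^{\infty}\frac{e^{-\lambda}\lambda^{k}}{k!} = e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^{k}}{k!}.$$
Can you finish the argument?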
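This doesn't finish the argument (that is left as the exercise), but a numerical check shows where it should end up. The Python sketch below accumulates the partial sums $\sum_{k=0}^{K} e^{-\lambda}\lambda^k/k!$ for increasing K. Each term is built from the previous one to avoid large factorials. The helper name and λ = 3 are arbitrary illustrative choices.

```python
import math

def poisson_partial_sum(lam, K):
    """Sum of e^{-lam} lam^k / k! for k = 0..K."""
    term = math.exp(-lam)      # the k = 0 term
    total = term
    for k in range(1, K + 1):
        term *= lam / k        # turn the (k-1) term into the k term
        total += term
    return total

for K in (5, 10, 20, 40):
    print(K, poisson_partial_sum(3.0, K))
```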
Negative Binomial Distribution
- The probability mass function of the negative binomial distribution comes in two distinct versions. The first one is the one that appears in every introductory probability textbook; the second is the one that appears in books and articles in ecology. Although the ecological definition is just a reparameterization of the mathematical definition, the reparameterization has a profound impact on the way the negative binomial distribution gets used. We'll begin with the mathematical definition.
- Suppose we have a sequence of independent Bernoulli trials in which the probability of a success on any given trial is a constant p. Let Xr denote the number of failures that are endured before r successes are achieved. Then Xr is said to have a negative binomial distribution with parameter p (and r).
- The negative binomial is a two-parameter distribution, but like the ordinary binomial one of the parameters, in this case r, is usually treated as known.
- From an ecological standpoint this definition is rather bizarre and except for modeling the number of rejections one has to suffer before getting a manuscript submission accepted for publication, it's hard to see how this distribution could possibly be useful. Stay tuned!
Probability Mass Function
- Let Xr be a negative binomial random variable with parameter p. Using the definition given above, let's calculate $P(X_r = x)$, the probability of experiencing x failures before r successes are observed.
- Note: The change in notation from k to x is deliberate. Unfortunately in a number of ecological textbooks the symbol k means something very specific for the negative binomial distribution so I don't want to use it in a generic sense here.
- If we experience x failures and r successes, then it must be the case that we had a total of x + r Bernoulli trials. Furthermore, we know that the last Bernoulli trial resulted in a success, the rth success, because that's when the experiment stops.

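- What we don't know is where in the first x + r – 1 Bernoulli trials the x failures and r – 1 successes occurred. Since the probability of a success is a constant p on each of these trials, we're back in the binomial probability setting where the number of trials is x + r – 1. Multiplying that binomial probability by the probability p of the final success, we have the following.
$$P(X_r = x) = \binom{x+r-1}{x}p^{r-1}(1-p)^{x}\cdot p = \binom{x+r-1}{x}p^{r}(1-p)^{x}, \quad x = 0, 1, 2, \ldots$$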
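The definition translates directly into a simulation. The Python sketch below runs Bernoulli(p) trials until the r-th success, records the number of failures, and compares the observed frequencies with the formula above. The helper names and the values r = 3, p = 0.5 are hypothetical illustrations.

```python
import math
import random

def neg_binomial_pmf(x, r, p):
    """P(X_r = x): probability of x failures before the r-th success."""
    return math.comb(x + r - 1, x) * p**r * (1 - p) ** x

def failures_before_r_successes(r, p, rng):
    """Run Bernoulli(p) trials until the r-th success; return the number of failures."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

def simulate_failures(r, p, reps=20000, seed=4):
    """Repeat the experiment reps times and collect the failure counts."""
    rng = random.Random(seed)
    return [failures_before_r_successes(r, p, rng) for _ in range(reps)]

draws = simulate_failures(3, 0.5)
for x in range(6):
    # observed frequency vs. formula
    print(x, draws.count(x) / len(draws), neg_binomial_pmf(x, 3, 0.5))
```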
- So we're done. Note: it's a nontrivial exercise to show that this is a true probability distribution, i.e.,
$$\sum_{x=0}^{\infty}\binom{x+r-1}{x}p^{r}(1-p)^{x} = 1.$$
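The proof is the nontrivial part, but the claim is easy to check numerically for particular parameter values. The Python sketch below sums the negative binomial probabilities up to a cutoff `xmax`, which is an assumption. For the arbitrary choice r = 3, p = 0.3, a cutoff of 200 leaves a negligible tail. The helper names are hypothetical.

```python
import math

def neg_binomial_pmf(x, r, p):
    """P(X_r = x) = C(x + r - 1, x) p^r (1 - p)^x."""
    return math.comb(x + r - 1, x) * p**r * (1 - p) ** x

def truncated_total(r, p, xmax):
    """Sum of P(X_r = x) for x = 0..xmax."""
    return sum(neg_binomial_pmf(x, r, p) for x in range(xmax + 1))

for xmax in (10, 50, 200):
    print(xmax, truncated_total(3, 0.3, xmax))
```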
Mean and Variance
- I'll just state the results.
$$E(X_r) = \frac{r(1-p)}{p}, \qquad \operatorname{Var}(X_r) = \frac{r(1-p)}{p^{2}}$$
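The stated results can be checked numerically without a derivation. The Python sketch below computes the mean and variance by summing over the probability mass function, truncated at `xmax = 400` (an assumption that is safe for the illustrative r = 3, p = 0.3). It prints them beside $r(1-p)/p$ and $r(1-p)/p^2$. The helper names are hypothetical.

```python
import math

def neg_binomial_pmf(x, r, p):
    """P(X_r = x) = C(x + r - 1, x) p^r (1 - p)^x."""
    return math.comb(x + r - 1, x) * p**r * (1 - p) ** x

def nb_moments(r, p, xmax=400):
    """Mean and variance from the pmf, truncating the sums at xmax."""
    probs = [neg_binomial_pmf(x, r, p) for x in range(xmax + 1)]
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    return mean, var

r, p = 3, 0.3
print(nb_moments(r, p))
print(r * (1 - p) / p, r * (1 - p) / p**2)   # the stated formulas: 7 and about 23.33
```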
- Next time we'll derive the negative binomial probability mass function that ecologists actually use.