Assignment 3

Due Date

Friday, February 10, 2006

Organization of Song in Cardinals (Background)

The basic building block of all bird songs is the syllable. A bird such as the cardinal can make at least ten distinct sounds, or syllables. Individual syllables sung in rapid succession are said to belong to the same utterance, and a series of consecutive utterances is called a bout.

Lemon and Chatfield (1971) set out to characterize the variability in the number of utterances per bout for the North American cardinal. The subjects were male cardinals (Richmondena cardinalis) nesting near the campus of the University of Western Ontario. Over a period of several months, the songs of these birds were taped and analyzed with a sound spectrograph, a device that graphs the frequency of a bird's call as a function of time. The spectrograph of one of the syllables in a cardinal's repertoire, Song Type D, is shown below.

During the course of the study, a total of 250 bouts of Song Type D were recorded. The table below shows the frequency distribution of the number of utterances per bout.

# Utterances/Bout    Frequency
1                    132
2                    52
3                    34
4                    9
5                    7
6                    5
7                    5
8+                   6
Total                250

So what we have here is the number of times a bird calls in succession before it "decides" to shut up. The question of interest is whether or not this "decision" can be modeled entirely as a random process.

Recall that our first definition of the negative binomial distribution (before turning to the ecologist's definition) was in terms of the number of failures before r successes were obtained. (Note: a completely equivalent way of viewing the negative binomial distribution is in terms of the total number of trials needed to achieve r successes.) If we fix r to be 1, so that we stop when we obtain 1 success, then we have a special case of the negative binomial distribution that occurs commonly enough to be given its own name. It's called the geometric probability distribution and it has a single parameter p, the probability of success. Although not as ecumenical as the binomial and the Poisson, the geometric distribution is a discrete probability distribution that might be an appropriate null model here. Let p be the probability that the current utterance will be the last utterance in a bout. If we assume that p is constant across utterances and that the utterances in a bout are independent of each other, then we are led to the geometric probability model:

$$P(X = x) = (1 - p)^{x - 1}\, p, \quad x = 1, 2, 3, \ldots$$

where X is the random variable representing the number of utterances (including the last) in a bout. Essentially, we view the last utterance in a bout as a "success" and we ask how many independent Bernoulli trials (trials with a 0-1 outcome) are required (i.e., all failures followed by a single success) before a bout terminates. If we were to use our old negative binomial notation (in terms of number of failures) where r = 1, and let X1 = X − 1 denote the number of failures, we would write this probability as follows.

$$P(X_1 = x_1) = \binom{x_1 + r - 1}{x_1}\, p^{r} (1 - p)^{x_1} = (1 - p)^{x_1}\, p, \quad x_1 = x - 1 = 0, 1, 2, \ldots \quad (r = 1)$$

R has both negative binomial probability distribution functions and geometric probability distribution functions. Both sets of functions are formulated in terms of the "number of failures" rather than the number of trials, so in our notation above they return probabilities for X1, not X.

Suppose at each trial there is a probability p = 0.4 that this call will be the last. If we want the probability that a bird calls a total of 10 times, then using R we would calculate the following.

P(X = 10) = P(X1 = 9) = dgeom(9,.4) = dnbinom(9,1,.4)

#using geometric
> dgeom(9,.4)
[1] 0.004031078
#using negative binomial
> dnbinom(9,1,.4)
[1] 0.004031078
#using the formula
> (1-.4)^9*.4
[1] 0.004031078

NB. Please note that the data in the table above record the values for X, but that the R probability functions calculate the probabilities of X1. You will need to make an appropriate conversion from X to X1 when you use the R functions in answering the questions below.
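The X-to-X1 conversion can be sketched in a few lines of R. This is only an illustration using the value p = 0.4 from the worked example above. The variable names (x, prob_x, prob_formula) are made up for this sketch and are not part of the assignment.

```r
# Illustrative value of p, taken from the worked example above
p <- 0.4

# x counts utterances INCLUDING the last one (the X of the text)
x <- 1:10

# dgeom() expects the number of failures, X1 = X - 1
prob_x <- dgeom(x - 1, prob = p)

# The same probabilities computed directly from (1 - p)^(x - 1) * p
prob_formula <- (1 - p)^(x - 1) * p

# TRUE if the two calculations agree
all.equal(prob_x, prob_formula)

# P(X = 10), the same quantity as dgeom(9, .4) in the transcript above
prob_x[10]
```

Passing x - 1 rather than x is the entire conversion. Forgetting it shifts every probability by one category.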

The Questions

(1) Using the formula for the geometric probability model given above, write down the likelihood for the bird utterance data that appear in the table above. Thus your answer should involve the actual data values shown in the table above. You do not need to simplify your expression in any way.

Hint 1 : This problem is slightly more complicated than the one we considered in class in the following sense. The last category in the table is not 8, but 8+, meaning the researcher stopped recording the bird after it made 8 utterances. In other words, the recorded count here, 6, could correspond to bouts of 8 utterances, 9 utterances, 10 utterances, and so on, or any combination of these. So the best you can say for this category is that it represents 8 or more. Your term in the likelihood for this category must reflect this fact.

(2) Write a function in R to calculate the loglikelihood for the data. You may use either the geometric probability functions or the negative binomial probability functions. Your function should require a single argument that corresponds to the value p.

Hint 2 : After making the adjustment from X to X1, you will also need to deal with the last category. I claim that the dgeom function of R will handle the first 7 categories, but for the last category you will need to use the pgeom function in some way. I suggest that the way you should use it is the same way we used pchisq and ppois to calculate tail probabilities in Tuesday's class.
Hint 3 : When you sum the terms in the loglikelihood you will need to treat the last category separately since it will have a different formula from the rest. This will require a modification of the simple formula we used in class for the Poisson loglikelihood. There are lots of ways to do this. One way would be to make the sum a sum of two sums--the first sum corresponding to the first 7 categories and the last sum corresponding to the last category.
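As a rough illustration of the tail-probability idea in Hint 2, the sketch below computes an "k or more" probability three ways for a geometric variable. The cutoff k = 5 is a hypothetical value chosen for the sketch, not the one in the table, and p = 0.4 is just the illustrative value used earlier. All variable names are invented here.

```r
p <- 0.4   # illustrative value from the worked example above
k <- 5     # hypothetical cutoff: the open category is "k or more" trials

# P(X >= k) = P(X1 >= k - 1) = 1 - P(X1 <= k - 2)
tail_from_cdf   <- 1 - pgeom(k - 2, prob = p)

# The same upper tail requested directly from pgeom()
tail_upper_flag <- pgeom(k - 2, prob = p, lower.tail = FALSE)

# Closed form: the first k - 1 trials must all be failures
tail_closed     <- (1 - p)^(k - 1)

# Exact cells 1..(k - 1) plus the open tail should account for all probability
cell_probs <- c(dgeom((1:(k - 1)) - 1, prob = p), tail_from_cdf)
total_prob <- sum(cell_probs)
```

Checking that the cell probabilities sum to 1 is a cheap way to catch an off-by-one error in the X-to-X1 conversion.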

(3) Graph the loglikelihood function you obtained in (2) and use the graph to approximate the maximum likelihood estimate of p.

(4) Use one of R's numerical optimization functions to approximate the maximum likelihood estimate of p.
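The mechanics of (3) and (4) can be sketched on a small made-up sample. The data x_toy below are hypothetical, fully observed trial counts (no open "or more" category), so this is deliberately simpler than the cardinal data. optimize() is one of several optimizers in R that would work here, and the function name loglik_toy is invented for the sketch.

```r
# Hypothetical, fully observed trial counts (NOT the cardinal data)
x_toy <- c(1, 1, 2, 1, 3, 1, 2, 5, 1, 4)

# Log-likelihood as a function of the single parameter p
loglik_toy <- function(p) sum(dgeom(x_toy - 1, prob = p, log = TRUE))

# Evaluate on a grid; the grid maximum gives a rough graphical estimate
p_grid  <- seq(0.05, 0.95, by = 0.01)
ll_grid <- sapply(p_grid, loglik_toy)
p_rough <- p_grid[which.max(ll_grid)]
# plot(p_grid, ll_grid, type = "l", xlab = "p", ylab = "log-likelihood")

# Numerical maximization over (0, 1)
fit <- optimize(loglik_toy, interval = c(0.001, 0.999), maximum = TRUE)
fit$maximum

# For uncensored geometric data the MLE has the closed form n / sum(x),
# which gives a check on the numerical answer
length(x_toy) / sum(x_toy)
```

The closed-form check applies only to this uncensored toy sample. With an open last category, the numerical answer has to stand on its own.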

(5) Construct both the 95% Wald confidence interval and the 95% profile likelihood confidence interval for p.
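A minimal sketch of both interval types, again on a hypothetical, fully observed toy sample rather than the cardinal data. It assumes the Wald standard error is taken from a numerically approximated Hessian (via optimHess) and that the likelihood-based interval is found by locating where the log-likelihood drops qchisq(0.95, 1)/2 below its maximum. With a single parameter, the profile likelihood is simply the likelihood itself. All object names are invented for this sketch.

```r
# Hypothetical, fully observed trial counts (NOT the cardinal data)
x_toy <- c(1, 1, 2, 1, 3, 1, 2, 5, 1, 4)
loglik_toy    <- function(p) sum(dgeom(x_toy - 1, prob = p, log = TRUE))
negloglik_toy <- function(p) -loglik_toy(p)

p_hat <- optimize(loglik_toy, interval = c(0.001, 0.999),
                  maximum = TRUE)$maximum

# Wald interval: estimate +/- z * SE, with SE from the observed information
info    <- optimHess(p_hat, negloglik_toy)[1, 1]
se_hat  <- sqrt(1 / info)
wald_ci <- p_hat + c(-1, 1) * qnorm(0.975) * se_hat

# Likelihood-based interval: all p whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum
cutoff  <- loglik_toy(p_hat) - qchisq(0.95, df = 1) / 2
gap     <- function(p) loglik_toy(p) - cutoff
prof_ci <- c(uniroot(gap, c(0.001, p_hat), tol = 1e-8)$root,
             uniroot(gap, c(p_hat, 0.999), tol = 1e-8)$root)

wald_ci
prof_ci
```

The Wald interval is symmetric about the estimate by construction, while the likelihood-based interval need not be. Comparing the two shows how well the quadratic approximation holds.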

(6) Calculate the expected frequencies of the #utterances/bout under the estimated geometric probability model using the maximum likelihood estimate of p you obtained in (4). Plot the observed and expected frequencies together in the same plot.

(7) Carry out a Pearson chi-squared lack-of-fit test for the model results. Do this two ways.

  1. By pooling cells that don't meet the minimal size criteria (if necessary) and then using the asymptotic distribution of the test statistic to assess fit.
  2. Without pooling (except for the expected probabilities in the tail of the distribution) and then using the simulate.p.value option of the chisq.test function of R to obtain a Monte Carlo-based p-value.

Does the model fit?
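The two calls to chisq.test can be sketched on made-up numbers. The counts obs and the value p_null below are hypothetical and unrelated to the cardinal data. They only show the shape of the calls. One caveat: chisq.test does not know whether a parameter was estimated from the data, so when it was, the degrees of freedom for the asymptotic test have to be adjusted by hand, as in the last line.

```r
set.seed(1)  # only so the Monte Carlo p-value is reproducible

# Hypothetical counts for the cells 1, 2, 3 and "4 or more" (NOT the cardinal data)
obs    <- c(48, 27, 14, 11)
p_null <- 0.5  # hypothetical parameter value

# Cell probabilities: three exact cells plus the open tail
probs <- c(dgeom(0:2, prob = p_null),
           pgeom(2, prob = p_null, lower.tail = FALSE))

# Asymptotic chi-squared test (df = number of cells - 1)
fit_asym <- chisq.test(obs, p = probs)

# Monte Carlo p-value based on B simulated tables
fit_mc <- chisq.test(obs, p = probs, simulate.p.value = TRUE, B = 2000)

fit_asym$expected   # expected frequencies under the model
fit_asym$statistic  # Pearson statistic
fit_mc$p.value

# If p had been estimated from these same data, one further degree of
# freedom would be lost; the adjusted asymptotic p-value would then be
p_adj <- pchisq(unname(fit_asym$statistic),
                df = length(obs) - 1 - 1, lower.tail = FALSE)
```

The expected component of the result is also a convenient place to check whether any cell falls below the minimum size usually required for the asymptotic test.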

Cited Reference

Lemon, Robert E. and Chatfield, Christopher. 1971. Organization of song in cardinals. Animal Behaviour 19: 1–17.


Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised--Jan 31, 2006
URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/assignments/assign3.htm