Assignment 5

Due Date: Friday, February 24, 2006

Comparing Models with AIC

Data 

slugsurvey.txt (from the web site of the textbook Statistical Computing by Michael Crawley)

The Problem  

Hints

  1. The square root transformed model should be handled in the same way the log-transformed model was handled in class. As was the case there, you will need to write one function to fit the model and another function to calculate the loglikelihood in terms of the original response. The formula for the likelihood you will want to use in your loglikelihood function appears at the end of the notes to Lecture 18.
  2. Added 2/21/06. In your square root model you might think it unnecessary to first add a constant to the response variable because, unlike the logarithm, the square root function is defined at zero. However, in the likelihood expression in which we back-transform to the original response, a zero response would lead to division by zero. Thus, in order to calculate the likelihood for comparison with other models, we do need to add a constant. For consistency, this should be done both in the negative loglikelihood function you provide to nlm and in the likelihood function you write to back-transform things in terms of the original response.
  3. For the four NB models I recommend setting up your functions just like the ZIP functions we wrote in class. Begin by creating the field.dummy variable and then include two separate lines that define mu and theta in terms of the components of the vector p and, where needed, the field.dummy variable. Unlike the ZIP model, you won't need to split the data into zero terms and other terms, nor will you need the ifelse construction we used. The one line you used in the function you wrote to fit the NB model in Assignment 4 can be adapted here. Thus your four NB functions should differ only in the mu and theta lines; the rest of the code will be the same for each function.
  4. To produce the AIC table I ask you to turn in, just use the AIC.func code we created in class on Tuesday (it will also appear in the notes to that class). To use this function you will only need to change the list of model names and the list of models that you supply as input arguments.
  5. For the goodness-of-fit test of the model you select, just collapse the data into a single set of count categories. It is not necessary (for this exercise anyway) to assess the fit separately in the two field types or by concatenating the counts from the two field types into one long vector. Of course, if one of the models that allows parameters to differ across field types ends up being the best model, you will need to fit the model separately for the two field types and obtain separate expected frequencies. Having done so, you may then combine the results into a single set of expected frequencies for your goodness-of-fit test. Because n = r in your data, this amounts to just averaging the two sets of probabilities.
  6. On Tuesday, Feb 21, we will revisit the slug data set and attempt to refit some of these models as generalized linear models. You may wish to use the results we obtain then as a further check on your work.
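
To illustrate Hints 1 and 2, here is a minimal sketch of the two loglikelihoods for the square root model. It is written in Python purely for illustration (your own functions will be in R and fit with nlm). It assumes the Lecture 18 formula is the usual change-of-variables result: if sqrt(y + c) is normal with mean mu and standard deviation sigma, the density of y is that normal density evaluated at sqrt(y + c) multiplied by 1/(2 sqrt(y + c)). The function names, argument names, and example counts are all hypothetical.

```python
import math


def normal_logpdf(z, mu, sigma):
    """Log density of a normal(mu, sigma) distribution at z."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (z - mu) ** 2 / (2.0 * sigma ** 2)


def sqrt_scale_negloglik(p, y, c):
    """Negative loglikelihood on the transformed scale (what the optimizer minimizes).

    p = (mu, sigma); c is the constant added to every response (assumed layout).
    """
    mu, sigma = p
    return -sum(normal_logpdf(math.sqrt(yi + c), mu, sigma) for yi in y)


def sqrt_model_loglik(y, mu, sigma, c):
    """Loglikelihood of the square root model in terms of the original response.

    Adds the log of the Jacobian 1 / (2 * sqrt(y + c)) to each term.
    """
    total = 0.0
    for yi in y:
        shifted = yi + c
        if shifted <= 0:
            # With c = 0 a zero count would mean dividing by sqrt(0).
            raise ValueError("y + c must be positive; add a constant to the response")
        total += normal_logpdf(math.sqrt(shifted), mu, sigma)
        total += -math.log(2.0) - 0.5 * math.log(shifted)
    return total


# Made-up counts, not the slug data. The same constant c is used in both functions.
example_counts = [0, 1, 4, 2, 0]
c = 0.5
print(sqrt_scale_negloglik((1.0, 0.8), example_counts, c))
print(sqrt_model_loglik(example_counts, mu=1.0, sigma=0.8, c=c))
```

With c = 0 the zero counts make the 1/(2 sqrt(y + c)) factor undefined, which is the division by zero described in Hint 2.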
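
The structure recommended in Hint 3 can be sketched as follows, again in Python for illustration only. The sketch assumes the NB distribution is parameterized by its mean mu and dispersion parameter theta (the parameterization dnbinom uses when given mu and size), and that a parameter is allowed to differ between field types by adding a multiple of the dummy variable. The ordering of the components of p, the additive form, the function names, and the counts are assumptions, not the setup from class. Only two of the four models are shown.

```python
import math


def nb_logpmf(y, mu, theta):
    """Log NB probability of count y with mean mu and dispersion theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))


def negloglik_common(p, counts, field_dummy):
    """Model with one mu and one theta shared by both field types."""
    total = 0.0
    for y, d in zip(counts, field_dummy):
        mu = p[0]        # the mu line
        theta = p[1]     # the theta line
        total -= nb_logpmf(y, mu, theta)
    return total


def negloglik_mu_differs(p, counts, field_dummy):
    """Model in which mu differs by field type and theta is shared."""
    total = 0.0
    for y, d in zip(counts, field_dummy):
        mu = p[0] + p[1] * d   # only the mu and theta lines change between models
        theta = p[2]
        total -= nb_logpmf(y, mu, theta)
    return total


# Made-up counts, not the slug data; the dummy is 0 for one field type and 1 for the other.
counts = [0, 0, 1, 3, 0, 2, 5, 1]
field_dummy = [0, 0, 0, 0, 1, 1, 1, 1]
print(negloglik_common([1.5, 1.0], counts, field_dummy))
print(negloglik_mu_differs([1.0, 1.0, 1.0], counts, field_dummy))
```

Setting the coefficient on the dummy to zero in the second function reproduces the first, which is a quick way to check that a pair of nested functions has been coded consistently.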
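
The AIC.func code from class (Hint 4) is not reproduced here. As a rough picture of what an AIC table contains, the following Python sketch computes AIC = -2 logL + 2K for each model, the difference from the smallest AIC, and the Akaike weights described by Burnham and Anderson (2002). The function name, the model names, and the loglikelihood values are made up for illustration and are not results for the slug data.

```python
import math


def aic_table(names, logliks, n_params):
    """Return rows (name, logL, K, AIC, delta AIC, Akaike weight), best model first."""
    aics = [-2.0 * ll + 2.0 * k for ll, k in zip(logliks, n_params)]
    best = min(aics)
    deltas = [a - best for a in aics]
    # Relative likelihood of each model given the data, then normalize to sum to 1.
    rel = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel)
    rows = [(name, ll, k, a, d, r / total)
            for name, ll, k, a, d, r in zip(names, logliks, n_params, aics, deltas, rel)]
    rows.sort(key=lambda row: row[3])
    return rows


# Hypothetical inputs: every loglikelihood must be for the original response.
model_names = ["model A", "model B", "model C"]
model_logliks = [-180.2, -176.9, -176.5]
model_n_params = [2, 3, 4]

for name, ll, k, aic, delta, w in aic_table(model_names, model_logliks, model_n_params):
    print(f"{name:<12}{ll:>10.2f}{k:>4d}{aic:>10.2f}{delta:>8.2f}{w:>8.3f}")
```

The comparison is only meaningful when every loglikelihood is expressed in terms of the same response, which is why the transformed models need the back-transformed likelihood.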

Cited Reference

Burnham, K. P. and D. R. Anderson. 2002. Model Selection and Multimodel Inference. Springer-Verlag, New York.

Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised--Feb 16, 2006
URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/assignments/assign5.htm