Due Date: Friday, April 21, 2006
The file WaterUsageData.csv contains the amount of water used over time during the production of landscape trees. This is a comma-delimited text file in which the variable names appear in the first row.
This data set comes from Schabenberger and Pierce (2002), p. 512. There are two age classes of trees (AGE = 1 or 2) and two species (SPECIES = 1 or 2). Water usage is recorded in the variable WU and TIME records the day of the measurement (measured as the number of days since the first of the year). The variables TREE, TREECNT, and ID all identify the individual tree from which the measurement was taken and can be considered equivalent.
The remaining variables can be ignored. The goal of this assignment is to develop a model of water usage as a function of time for individual trees.
Question 1 Restrict the data set so that you're only looking at trees of species 2 and in age class 2. For this restricted data set generate a customized lattice plot that resembles the one shown in Fig. 2 of Lecture 43. Each panel should display a different individual tree.
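As a minimal sketch of the restriction step, the code below builds a tiny made-up data frame with the same column names as the file and keeps only species 2, age class 2. The values and the object names wu and wu22 are invented for illustration. With the real file you would start from read.csv("WaterUsageData.csv") instead.

```r
# Made-up stand-in for the real file; in practice:
#   wu <- read.csv("WaterUsageData.csv")
wu <- data.frame(
  SPECIES = c(1, 1, 2, 2, 2, 2),
  AGE     = c(1, 2, 1, 2, 2, 2),
  TREE    = c(1, 2, 3, 4, 4, 5),
  TIME    = c(160, 160, 160, 160, 175, 160),
  WU      = c(0.9, 1.4, 1.1, 1.8, 2.3, 1.6)
)

# Keep only the rows for species 2 in age class 2
wu22 <- subset(wu, SPECIES == 2 & AGE == 2)

# Trees that remain after the restriction
unique(wu22$TREE)
```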
Hint 1: Type ?panel.functions to see the various panel functions that are available for drawing lines and curves. You will want to use panel.abline for the linear regression, panel.loess for the lowess curve, and panel.curve for the quadratic regression.
Hint 2: Within the panel function of xyplot fit the quadratic regression model in a separate line using x and y as generic variables and saving the result to a variable. Then within the panel.curve function you can reference the coefficients of the regression by applying the coef function to the model object you created. The expression argument of panel.curve can just be a polynomial equation in which you use coef to insert the correct coefficients where needed and you use x as the generic variable. You don't need function(x) to define the expression. The remaining default settings for panel.curve will work except you'll want to set the color yourself and perhaps the line type and/or line width.
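The approach in Hints 1 and 2 might look roughly like the following sketch. It runs on simulated data (four hypothetical trees, invented coefficients), and the colors, line types, and the names sim and quad are arbitrary choices, not the settings used in Fig. 2.

```r
library(lattice)

# Simulated stand-in data: 4 hypothetical trees, 15 measurement days each
set.seed(1)
sim <- expand.grid(TIME = seq(160, 300, by = 10), TREE = 1:4)
sim$WU <- 2 + 0.04 * (sim$TIME - 160) - 0.0003 * (sim$TIME - 160)^2 +
  rnorm(nrow(sim), sd = 0.2)
sim$TREE <- factor(sim$TREE)

p <- xyplot(WU ~ TIME | TREE, data = sim,
  panel = function(x, y, ...) {
    panel.xyplot(x, y, col = "grey40", ...)
    # straight-line regression
    panel.abline(reg = lm(y ~ x), col = "blue")
    # locally weighted smooth
    panel.loess(x, y, col = "darkgreen", lty = 2)
    # quadratic regression, saved so panel.curve can read its coefficients
    quad <- lm(y ~ x + I(x^2))
    panel.curve(coef(quad)[1] + coef(quad)[2] * x + coef(quad)[3] * x^2,
                col = "red", lwd = 2)
  })
print(p)
```

Because panel.curve evaluates its expression over a grid of x values spanning the panel limits, the quadratic is drawn as a smooth curve rather than as segments joining the observed points.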
Hint 3: For the key you can steal most of the settings from the example I did in lab 12 (lecture 47) for the larval development multilevel model. Other than details involving labels, colors, line types, etc. the only major change you'll need to make is to replace the space argument I used with the corner argument in order to get the key to appear where I ask you to place it. Type ?xyplot to see the options for key. To get the key to appear in the spot where the panels are missing you need to specify values for x=, y=, and corner=. I suggest using the bottom left for the corner argument, indicated by c(0,0), and then choosing values for x and y that are less than 1 (1 corresponding to the top and right edges) until you get the key positioned correctly.
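A sketch of a key positioned with x, y, and corner is shown here on simulated data. The layout of four columns by three rows, the x and y values, and the labels are assumptions chosen only to illustrate the arguments. You would adjust x and y by trial and error for your own plot.

```r
library(lattice)

# Simulated stand-in: 10 hypothetical trees, so a 4 x 3 layout leaves 2 empty slots
set.seed(2)
sim <- expand.grid(TIME = seq(160, 300, by = 20), TREE = factor(1:10))
sim$WU <- 1 + 0.01 * sim$TIME + rnorm(nrow(sim), sd = 0.3)

my.key <- list(
  x = 0.6, y = 0.72,        # starting guesses, as fractions of the plot region
  corner = c(0, 0),         # (x, y) locates the bottom-left corner of the key
  lines = list(col = c("blue", "darkgreen", "red"),
               lty = c(1, 2, 1), lwd = c(1, 1, 2)),
  text = list(c("linear", "loess", "quadratic")),
  border = TRUE
)

p <- xyplot(WU ~ TIME | TREE, data = sim, layout = c(4, 3), key = my.key)
print(p)
```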
Hint 4: To get the tree numbers to appear as labels in the panel strips at the top of each graph, you'll first need to make a factor out of the numeric variable TREE. Be sure to then use this factor as the conditioning variable (the one that follows | in the formula) in the xyplot function.
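The following sketch illustrates Hint 4 on a small made-up data frame. The tree numbers 3, 7, and 12 and the name TREE.f are invented for the example. lattice converts a numeric conditioning variable to a shingle, which is why the conversion to a factor is needed to get the tree numbers in the strips.

```r
library(lattice)

# Made-up data: three hypothetical trees measured on five days each
sim <- data.frame(TREE = rep(c(3, 7, 12), each = 5),
                  TIME = rep(seq(160, 240, by = 20), times = 3))
sim$WU <- 1 + 0.01 * sim$TIME + rep(c(0, 0.5, 1), each = 5)

# Factor version of TREE; its levels become the strip labels
sim$TREE.f <- factor(sim$TREE)

p <- xyplot(WU ~ TIME | TREE.f, data = sim)
print(p)
```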
Question 2 Based on the graphical display of Question 1 a quadratic model looks good. Fit a quadratic level-1 model to the full data set (and not just the ten trees you used to produce the graph). This level-1 model has three parameters which means that there are potentially three level-2 equations that can have random effects. As a start fit three models, one for each parameter, such that in each model you allow a different level-1 parameter to be random, but the remaining parameters are fixed. Decide which of these three models is best.
Hint 5: Remember that an intercept will automatically be added by default as a random effect whenever another variable is specified in the random argument. Thus when you specify the linear and quadratic parameters as random be sure to explicitly remove the intercept term from the random argument.
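As a hedged illustration of Hint 5, the sketch below fits the three single-random-effect models with lme from the nlme package. The data are simulated (20 hypothetical trees on a short, made-up time scale), so the fitted values mean nothing. Only the form of the random argument is the point, and the model names m.int, m.lin, and m.quad are arbitrary.

```r
library(nlme)

# Simulated stand-in data: 20 hypothetical trees, 8 unequally spaced times
set.seed(3)
n.tree <- 20
sim <- expand.grid(TIME = c(1, 2, 4, 5, 7, 8, 10, 12), TREE = 1:n.tree)
u0 <- rnorm(n.tree, sd = 1)     # tree-specific intercept deviations
u1 <- rnorm(n.tree, sd = 0.2)   # tree-specific slope deviations
sim$WU <- (2 + u0[sim$TREE]) + (0.8 + u1[sim$TREE]) * sim$TIME -
  0.05 * sim$TIME^2 + rnorm(nrow(sim), sd = 0.5)
sim$TREE <- factor(sim$TREE)

# Random intercept only
m.int  <- lme(WU ~ TIME + I(TIME^2), random = ~ 1 | TREE, data = sim)
# Random linear coefficient only: "- 1" removes the default random intercept
m.lin  <- lme(WU ~ TIME + I(TIME^2), random = ~ TIME - 1 | TREE, data = sim)
# Random quadratic coefficient only
m.quad <- lme(WU ~ TIME + I(TIME^2), random = ~ I(TIME^2) - 1 | TREE, data = sim)

# One way to compare the three fits
AIC(m.int, m.lin, m.quad)
```

Checking colnames(ranef(m.lin)) is a quick way to confirm that the intercept really was dropped from the random part.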
Question 3 To the model you chose as best in Question 2 try adding the remaining random effects. There are three possible choices here and hence three models that can result. Decide which of these three models is best.
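Question 4 Since the predictor is time we should check to see whether the default level-1 residual correlation structure is adequate. Try adding a corCAR1 structure and see if the fit is improved. The correlation model specified by corCAR1 is
$$\mathrm{cor}(\varepsilon_{t}, \varepsilon_{t+s}) = \phi^{s}, \qquad 0 \le \phi < 1,$$
where s is the amount of time that has elapsed between observations and φ is the correlation between two observations one time unit apart.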
Hint 6: Because the time measurements are not equally spaced, the corAR1 model we used in lab is not appropriate here. corCAR1 specifies a continuous autoregressive correlation structure that uses the actual values of time rather than the integer values that are used by corAR1. Enter ?corCAR1 at the console to see the options. You will need to fill in the form argument by specifying TIME as the covariate along with the appropriate grouping factor, but the value argument can be left unspecified.
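The form argument described in Hint 6 can be sketched as follows. The data are simulated with unequally spaced times and residuals whose correlation decays as 0.7 raised to the elapsed time. The random-intercept model is used only as a placeholder for whatever model you selected in Question 3, and the names m0 and m1 are arbitrary.

```r
library(nlme)

# Simulated stand-in: 20 hypothetical trees, 10 unequally spaced times each
set.seed(4)
n.tree <- 20
sim <- do.call(rbind, lapply(1:n.tree, function(i) {
  tt <- sort(sample(1:30, 10))                 # unique times within a tree
  S  <- 0.7^abs(outer(tt, tt, "-"))            # correlation phi^s with phi = 0.7
  e  <- drop(t(chol(S)) %*% rnorm(length(tt))) # correlated residuals
  data.frame(TREE = i, TIME = tt,
             WU = 2 + rnorm(1) + 0.3 * tt - 0.008 * tt^2 + 0.5 * e)
}))
sim$TREE <- factor(sim$TREE)

# Placeholder model with the default (independent) level-1 residuals
m0 <- lme(WU ~ TIME + I(TIME^2), random = ~ 1 | TREE, data = sim)

# Same model plus a continuous-time AR(1) structure within each tree
m1 <- update(m0, correlation = corCAR1(form = ~ TIME | TREE))

# Compare the two fits
anova(m0, m1)
```

The estimated φ can be extracted with coef(m1$modelStruct$corStruct, unconstrained = FALSE).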
Schabenberger, Oliver and Francis J. Pierce. 2002. Contemporary Statistical Models for the Plant and Soil Sciences. CRC Press: Boca Raton, FL.
Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised--April 13, 2006
URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/assignments/assign11.htm