Assignment 1

Due Date

Friday, January 27, 2006

Data Source

http://www.unc.edu/courses/2006spring/ecol/145/001/data/assign1/assignment1.csv

Included are six years of temperature data and disease prevalence for coral reefs in the Great Barrier Reef of Australia. The variables of interest are PREV_1 (prevalence of White syndrome as counts, i.e., number of infected reefs observed) and WSSTA (the number of Weekly Sea Surface Temperature Anomalies that occurred in the previous year), as well as the variables giving a reef's geographic position (LAT_DD and LON_DD), the date on which it was observed (DATE), and its name (REEF_NAME). The rest of the variables can be ignored. The data are in a comma-delimited text file in which the variable names appear as the first row.
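As a minimal sketch of how a file laid out this way can be read into R, the example below uses read.csv, which by default takes the variable names from the first row. The three data rows are invented purely for illustration and are not taken from assignment1.csv; the object names toy_csv, reefs, and keep are placeholders as well.

```r
# Invented rows in the same layout as the assignment file (NOT real data).
toy_csv <- "REEF_NAME,DATE,LAT_DD,LON_DD,WSSTA,PREV_1
Reef A,15-Mar-99,-14.70,145.45,3,0
Reef B,02-Apr-99,-16.85,146.20,7,12
Reef A,20-Mar-00,-14.70,145.45,5,1"

# header = TRUE (the default for read.csv) takes variable names from row 1.
reefs <- read.csv(text = toy_csv, header = TRUE)

# Keep only the variables of interest named in the assignment.
keep <- c("REEF_NAME", "DATE", "LAT_DD", "LON_DD", "WSSTA", "PREV_1")
reefs <- reefs[, keep]

str(reefs)   # one row per reef visit
```

For the real data, the same call should work with the URL above, or the path to a downloaded copy of the file, as the first argument in place of text = toy_csv.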

Questions

For all questions submit the R code you used to produce your answers as well as your answer. Electronic submissions are welcome. Note: you can obtain help on a command by using the help function in R. For example, to see the options for R's plot function, type help(plot).

  1. In the full data set there are 48 different reefs (identified by the variable REEF_NAME) that were visited repeatedly (roughly once a year) over a 6-year period. Some reefs were missed in some years. Thus we have what are called unbalanced data. Give me one line of R code whose output makes it easy to identify which reefs were missed one or more times. Using the output identify those reefs. (Hint: The data set is an example of what an epidemiologist would call a person-period data set, or better yet here, a coral-period data set. Thus in the data set each coral reef has multiple records, one for each measurement time. One of the functions we used during Tuesday's class will do the job to answer this question.)

    BONUS: Can you figure out how to get R to list only the reefs that were not visited every year? This can be done with one additional line of code.
    Hint 1: The output of every R function can be assigned to a variable.
    Hint 2: The method we used to locate the hotspots (i.e., the way we subsetted the data in coloring points on a scatter plot) is relevant here.

  2. Plot disease prevalence (PREV_1) versus the temperature metric (WSSTA) using all six years of data, ignoring the data structure (i.e., treat all 280 observations as if they were independent). Add a linear regression line to the plot. Add a lowess smooth to the plot. Comment on what you see. You may need to change the vertical scale to see anything at all.
  3. We discovered in the data set that covered only one year that the samples were taken in such a way that observations near each other spatially were also sampled at about the same time of year. When all six years of data are examined, does this pattern continue? (Provide evidence with a graph.)
  4. In Tuesday's class we used the summary function of R to view detailed regression results from a linear model (lm) object. When summary is applied to other kinds of objects, the information you get is different. Try using summary on the WSSTA variable. Explain what each element in the output represents.
  5. A useful summary plot for comparing distributions across groups when there are too many data points to plot individually is the boxplot. In R the command is boxplot(variable), where the argument (that which goes inside the parentheses) can be the variable you wish to plot. Produce a boxplot of WSSTA. Using the web, a textbook, or any other source, explain everything you see in the boxplot. Your answer to question 4 may be helpful here in interpreting the boxplot. If you're totally at a loss as to what you're seeing, here is a journal article to look at (available online at UNC):
    Reese, R. Allan. 2005. Boxplots. Significance 2(3): 134–135.

  6. What percentage of the prevalence values (PREV_1) are zero? Ideally, write one line of R code to do the entire calculation for you.
    Hint 1: The sum function can be used to add up a list of numbers. The operator for division is / and for multiplication is *.
    Hint 2: If x is a vector then we can extract, e.g., the third element of x using the notation x[3].
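
The building blocks named in the hints above can be tried out on a toy vector. The sketch below is not a solution to any question: the vector x and the names total, share, and big are made up for illustration, and it assumes that the subsetting mentioned in question 1's second hint refers to placing a logical condition inside square brackets.

```r
# Toy vector, unrelated to the reef data.
x <- c(4, 0, 7, 0, 9)

# The output of a function can be assigned to a variable ...
total <- sum(x)                # sum adds up a vector of numbers

# ... and then reused in further arithmetic with / and *.
share <- 100 * x[3] / total    # third element as a percentage of the total

# A logical condition inside [ ] keeps only the elements where it is TRUE.
big <- x[x > 5]

total   # 20
share   # 35
big     # 7 9
```

Typing the name of a variable on its own line, as in the last three lines, prints its value.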

Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised: Jan 17, 2006
URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/assignments/assign1.htm