Lecture 3 (lab) —Tuesday, January 17, 2006

What was covered?

R functions and commands demonstrated

R function options

R packages used

Overview

Data Entry and Manipulation

#read in data from Web
> corals<-read.table('http://www.unc.edu/courses/2006spring/ecol/145/001/data/lab1/corals.csv',
header=TRUE, sep=',')
> corals[1:5,]
REEF_NAME SECTOR SHELF LAT_DD LON_DD DATE YEAR YEARII WEEK WSSTA TSASUMS_SM PREV_1 CORAL_COVE INC
1 MACGILLIV CL M -14.64833 145.4910 7/24/2002 2002 3-Feb 30 3 0 2 20.08802 2
2 MARTIN (1 CL I -14.75530 145.3722 7/26/2002 2002 3-Feb 30 6 1 32 19.47236 31
3 LIZARD IS CL M -14.69147 145.4692 7/28/2002 2002 3-Feb 31 0 0 3 15.71672 2
4 NORTH DIR CL M -14.74355 145.5158 7/28/2002 2002 3-Feb 31 7 2 9 24.63439 7
5 NO NAME CL O -14.62787 145.6453 7/30/2002 2002 3-Feb 31 2 1 75 49.20807 26
> corals[1:5,2:3]
SECTOR SHELF
1 CL M
2 CL I
3 CL M
4 CL M
5 CL O
> corals[1:5,c(2,4)]
SECTOR LAT_DD
1 CL -14.64833
2 CL -14.75530
3 CL -14.69147
4 CL -14.74355
5 CL -14.62787
> corals[1:5,c("SECTOR","LAT_DD")]
SECTOR LAT_DD
1 CL -14.64833
2 CL -14.75530
3 CL -14.69147
4 CL -14.74355
5 CL -14.62787
> corals$SECTOR
[1] CL CL CL CL CL CL CL CL CL SW SW SW SW SW SW SW CB CB CB CB CA CA CA CA CA CA CA CA CA CA WH
[32] WH WH WH WH WH WH WH WH TO TO TO TO TO TO TO TO
Levels: CA CB CL SW TO WH
> SECTOR
Error: Object "SECTOR" not found
> attach(corals)
> SECTOR

[1] CL CL CL CL CL CL CL CL CL SW SW SW SW SW SW SW CB CB CB CB CA CA CA CA CA CA CA CA CA CA WH
[32] WH WH WH WH WH WH WH WH TO TO TO TO TO TO TO TO
Levels: CA CB CL SW TO WH
> detach(corals)
> SECTOR

Error: Object "SECTOR" not found

Plotting Data

Fig. 2 Scatter plot

We investigated whether white syndrome, an emerging disease of corals on the GBR was related to the number of warm temperature anomalies. WSSTA defines an anomaly as a week in which the ocean temperature exceeds the long-term (1985-2003) local mean temperature value for that calendar week by 1°C. Thus, a WSSTA is both a site- and week-specific measure of thermal stress. We quantified the thermal history of each reef by calculating the number of WSSTAs during the 52 weeks preceding each disease survey date. Our metric then allowed us to determine the number of weeks that the temperature was unusually warm for a particular reef in the year prior to the survey date.

The geographical locations of the reefs that make up their sample can be seen at the following link: australia.pdf. The graph you see there was created in R using the read.shape function from the maptools package and the plot function of base R. Source materials for the map are varioius shape and dbase data files I downloaded from an Australian government web site.

> names(corals)
[1] "REEF_NAME" "SECTOR" "SHELF" "LAT_DD" "LON_DD" "DATE" "YEAR"
[8] "YEARII" "WEEK" "WSSTA" "TSASUMS_SM" "PREV_1" "CORAL_COVE" "INC"
> plot(corals$WSSTA,corals$PREV_1)

Fig. 3 Scatter plot with axis labels

> plot(corals$WSSTA,corals$PREV_1,
xlab='WSSTA',ylab='Disease Prevalence')

> lm(corals$PREV_1~corals$WSSTA)

Call:
lm(formula = corals$PREV_1 ~ corals$WSSTA)Coefficients:
(Intercept) corals$WSSTA
-24.20 15.30

> summary(lm(corals$PREV_1~corals$WSSTA))

Call:
lm(formula = corals$PREV_1 ~ corals$WSSTA)Residuals:
Min 1Q Median 3Q Max
-141.121 -34.460 -5.403 23.597 198.879 Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -24.201 19.124 -1.265 0.212
corals$WSSTA 15.302 3.403 4.496 4.82e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 72.53 on 45 degrees of freedom
Multiple R-Squared: 0.31, Adjusted R-squared: 0.2946
F-statistic: 20.21 on 1 and 45 DF, p-value: 4.821e-05

Fig. 4 Regression line

> model1<-lm(corals$PREV_1~corals$WSSTA)
> model1<-lm(PREV_1~WSSTA,data=corals)
> abline(model1)

Evaluating the Regression Model—Lowess

Fig. 5 Lowess curve added

This function performs the computations for the LOWESS smoother (see the reference below). lowess returns a list containing components x and y which give the coordinates of the smooth. The smooth should be added to a plot of the original points with the function lines.

> lines(lowess(corals$PREV_1~corals$WSSTA),lwd=2,col=3)

R color codes

Fig. 6 R's numeric color codes

Fig. 7 Range limited using ylim

> plot(corals$WSSTA,corals$PREV_1, xlab='WSSTA',
ylab='Disease Prevalence', ylim=c(0,50))
> abline(model1)
> lines(lowess(corals$PREV_1~corals$WSSTA), lwd=2, col=3)

Evaluating the Regression Model—Residual Analysis

Fig. 8 Residual plot

Here i is the index of the observation.

Constant Variance and Mispecification

> plot(corals$WSSTA,residuals(model1))
> lines(lowess(residuals(model1)~corals$WSSTA), lwd=2,col=3)
> abline(h=0,lty=2,col=4)

Fig. 9 Line type values, lty=

Normality

> library(car)
> qq.plot(residuals(model1))

Accounting for Heteroscedasticity and Lack of Normality

> log(corals$PREV_1)
[1]  0.6931472 3.4657359 1.0986123 2.1972246 4.3174881 4.4998097 3.4011974
[8]  3.8918203 0.0000000 2.9957323 3.4339872 3.5263605 1.0986123 2.0794415
[15] 1.3862944 5.0039463 5.3981627 5.8171112 5.8377304 5.7525726 0.6931472
[22] 2.1972246 3.0910425 2.9444390 4.0943446 4.5217886 4.6634391 2.7080502
[29] -Inf -Inf 3.0910425 2.4849066 1.0986123 0.0000000 2.7725887
[36] 0.0000000 3.3322045 2.9957323 3.1354942 2.5649494 0.0000000 -Inf
[43] -Inf 0.6931472 1.0986123 1.3862944 -Inf

> lm(log(corals$PREV_1)~corals$WSSTA)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in foreign function call (arg 4)

> table(corals$PREV_1)

0 1 2 3 4 8 9 12 13 15 16 19 20 22 23 28 30 31 32 34
5 4 3 4 2 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1
49 60 75 90 92 106 149 221 315 336 343
1 1 1 1 1 1 1 1 1 1 1

Fig. 11 Log-transformed response

> model2<- lm(log(corals$PREV_1+1)~ corals$WSSTA)
> summary(model2)

Call:
lm(formula = log(corals$PREV_1 + 1) ~ corals$WSSTA)

Residuals:
Min 1Q Median 3Q Max
-3.2626 -1.1688 0.3321 1.2968 2.3470

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.66543 0.41904 3.974 0.000253 ***
corals$WSSTA 0.19965 0.07457 2.677 0.010325 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.589 on 45 degrees of freedom
Multiple R-Squared: 0.1374, Adjusted R-squared: 0.1182
F-statistic: 7.167 on 1 and 45 DF, p-value: 0.01033

> plot(corals$WSSTA,log(corals$PREV_1+1), xlab='WSSTA', ylab='Log(Prevalence)')
> abline(model2)
> lines(lowess(log(corals$PREV_1+1)~corals$WSSTA), lwd=2,col=3)
> lines(lowess(log(corals$PREV_1+1)~corals$WSSTA,f=.9), lwd=2,col=4)

> qq.plot(residuals(model2))
> plot(corals$WSSTA,residuals(model2))
> lines(lowess(residuals(model2)~corals$WSSTA), lwd=2,col=3)
> abline(h=0,lty=2,col=4)

    

Fig. 12 Residual plots for the log-transformed response

Data Structure

library(date)

> corals$DATE
[1]  7/24/2002 7/26/2002 7/28/2002 7/28/2002 7/30/2002 7/31/2002 7/31/2002
[8]  8/1/2002 8/2/2002 9/5/2002 9/6/2002 9/7/2002 9/8/2002 9/9/2002
[15] 9/10/2002 9/11/2002 9/16/2002 9/17/2002 9/18/2002 9/19/2002 10/24/2002
[22] 10/25/2002 10/26/2002 10/28/2002 11/1/2002 11/2/2002 11/3/2002 11/5/2002
[29] 11/7/2002 11/8/2002 11/25/2002 11/26/2002 11/27/2002 11/28/2002 11/30/2002
[36] 12/1/2002 12/3/2002 12/4/2002 12/5/2002 3/7/2003 3/8/2003 3/10/2003
[43] 3/12/2003 3/15/2003 3/18/2003 3/19/2003 3/21/2003
45 Levels: 10/24/2002 10/25/2002 10/26/2002 10/28/2002 11/1/2002 ... 9/9/2002

> as.character(corals$DATE)
[1]  "7/24/2002" "7/26/2002" "7/28/2002" "7/28/2002" "7/30/2002" "7/31/2002"
[7]  "7/31/2002" "8/1/2002" "8/2/2002" "9/5/2002" "9/6/2002" "9/7/2002"
[13] "9/8/2002" "9/9/2002" "9/10/2002" "9/11/2002" "9/16/2002" "9/17/2002"
[19] "9/18/2002" "9/19/2002" "10/24/2002" "10/25/2002" "10/26/2002" "10/28/2002"
[25] "11/1/2002" "11/2/2002" "11/3/2002" "11/5/2002" "11/7/2002" "11/8/2002"
[31] "11/25/2002" "11/26/2002" "11/27/2002" "11/28/2002" "11/30/2002" "12/1/2002"
[37] "12/3/2002" "12/4/2002" "12/5/2002" "3/7/2003" "3/8/2003" "3/10/2003"
[43] "3/12/2003" "3/15/2003" "3/18/2003" "3/19/2003" "3/21/2003"

> as.date(as.character(corals$DATE))
[1]  24Jul2002 26Jul2002 28Jul2002 28Jul2002 30Jul2002 31Jul2002 31Jul2002 1Aug2002
[9]  2Aug2002 5Sep2002 6Sep2002 7Sep2002 8Sep2002 9Sep2002 10Sep2002 11Sep2002
[17] 16Sep2002 17Sep2002 18Sep2002 19Sep2002 24Oct2002 25Oct2002 26Oct2002 28Oct2002
[25] 1Nov2002 2Nov2002 3Nov2002 5Nov2002 7Nov2002 8Nov2002 25Nov2002 26Nov2002
[33] 27Nov2002 28Nov2002 30Nov2002 1Dec2002 3Dec2002 4Dec2002 5Dec2002 7Mar2003
[41] 8Mar2003 10Mar2003 12Mar2003 15Mar2003 18Mar2003 19Mar2003 21Mar2003

> as.numeric(as.date(as.character(corals$DATE)))
[1]  15545 15547 15549 15549 15551 15552 15552 15553 15554 15588 15589 15590 15591
[14] 15592 15593 15594 15599 15600 15601 15602 15637 15638 15639 15641 15645 15646
[27] 15647 15649 15651 15652 15669 15670 15671 15672 15674 15675 15677 15678 15679
[40] 15771 15772 15774 15776 15779 15782 15783 15785

> plot(as.date(as.character(corals$DATE)), corals$LAT_DD, xlab='Date', ylab='Latitude')

> table(corals$WSSTA)

0 1 2 3 4 5 6 7 8 9 10 11 13
3 4 5 9 3 6 5 5 1 1  2  2  1

> sum(table(corals$WSSTA))
[1] 47

So we see there are 47 reefs total whose values range from 0 to 13 for the variable WSSTA.

> corals$WSSTA>=8
[1]  FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
[14] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
[27] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[40] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Fig. 14 Hot spots are colored red

> corals$LAT_DD[corals$WSSTA>=8]
[1] -14.78517 -21.46982 -23.48365 -23.88350 -23.24957 -16.77532 -16.38352

> #simplify our life and create a true date variable
> corals$numdate<-as.date(as.character(corals$DATE))
> names(corals)
[1]  "REEF_NAME" "SECTOR" "SHELF" "LAT_DD" "LON_DD" "DATE"
[7]  "YEAR" "YEARII" "WEEK" "WSSTA" "TSASUMS_SM" "PREV_1"
[13] "CORAL_COVE" "INC" "numdate"

> points(corals$numdate[corals$WSSTA>=8],corals$LAT_DD[corals$WSSTA>=8],col=2,pch=16)

Fig. 15 Options for pch= in plots

Fig. 16 High prevalence (blue) and high WSSTA (red)

> table(corals$PREV_1)

0 1 2 3 4 8 9 12 13 15 16 19 20 22 23 28 30 31 32 34 49
5 4 3 4 2 1 2  1  1  1  1  1  2  2  1  1  1  1  1  1  1
60 75 90 92 106 149 221 315 336 343
 1  1  1  1   1   1   1   1   1   1

> points(corals$numdate[corals$PREV_1>=60], corals$LAT_DD[corals$PREV_1>=60], col=4, pch=16)

> #now let's put them side-by-side
> par(mfrow=c(1,2))
>
#first temperature
> plot(as.date(as.character(corals$DATE)), corals$LAT_DD, xlab='Date', ylab='Latitude')
> points(corals$numdate[corals$WSSTA>=8],corals$LAT_DD[corals$WSSTA>=8], col=2, pch=16)
> mtext('Temperature Anomalies',line=.5,side=3)
>
#next disease
> plot(as.date(as.character(corals$DATE)), corals$LAT_DD, xlab='Date', ylab='Latitude')
> points(corals$numdate[corals$PREV_1>=60], corals$LAT_DD[corals$PREV_1>=60], col=4, pch=16)
> mtext('Disease Anomalies',line=.5,side=3)
>
#restore the default setting
> par(mfrow=c(1,1))

Fig. 17 Comparing temperature and disease anomalies

> objects()
[1] "corals" "model1" "model2"

Course Home Page


Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised--Jan 20, 2006
URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture3.htm