#read in data
corals<-read.table('http://www.unc.edu/courses/2006spring/ecol/145/001/data/assign1/assignment1.csv', header=TRUE, sep=',')
We are told that each reef appears as many times in the data set as it was visited. To get a count for the number of times each reef was visited, use the table function on REEF_NAME.
table(corals$REEF_NAME)
   19131S    19138S    19159S    20104S    21529S    22088S AGINCOURT BORDER IS
        6         6         6         6         6         6         6         6
BROOMFIEL    CARTER   CHICKEN  CHINAMAN    DAVIES DECAPOLIS       DIP  EAST CAY
        6         6         6         6         6         5         6         6
FITZROY I GANNET CA GREEN ISL  HASTINGS  HAVANNAH HAYMAN IS HORSESHOE      HYDE
        6         6         6         6         6         6         6         6
JOHN BREW LADY MUSG  LANGFORD    LINNET LIZARD IS LOW ISLES MACGILLIV    MACKAY
        6         6         6         6         6         6         6         6
MARTIN (1 MICHAELMA  MYRMIDON   NO NAME NORTH DIR  ONE TREE  OPAL (2) PANDORA (
        6         6         6         6         6         6         6         6
     REBE       RIB ST. CRISP  THETFORD TURNER CA WRECK ISL     YONGE
        6         6         6         5         6         6         6
From the list we can see that two reefs, Decapolis and Thetford, were visited only five times; all the others were visited six times.
An expression such as table(corals$REEF_NAME)<6 is a Boolean expression that evaluates to TRUE or FALSE for each reef.
table(corals$REEF_NAME)<6
   19131S    19138S    19159S    20104S    21529S    22088S AGINCOURT BORDER IS
    FALSE     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE
BROOMFIEL    CARTER   CHICKEN  CHINAMAN    DAVIES DECAPOLIS       DIP  EAST CAY
    FALSE     FALSE     FALSE     FALSE     FALSE      TRUE     FALSE     FALSE
FITZROY I GANNET CA GREEN ISL  HASTINGS  HAVANNAH HAYMAN IS HORSESHOE      HYDE
    FALSE     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE
JOHN BREW LADY MUSG  LANGFORD    LINNET LIZARD IS LOW ISLES MACGILLIV    MACKAY
    FALSE     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE
MARTIN (1 MICHAELMA  MYRMIDON   NO NAME NORTH DIR  ONE TREE  OPAL (2) PANDORA (
    FALSE     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE
     REBE       RIB ST. CRISP  THETFORD TURNER CA WRECK ISL     YONGE
    FALSE     FALSE     FALSE      TRUE     FALSE     FALSE     FALSE
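The expression table(corals$REEF_NAME) returns a table of counts that can be indexed like a named vector. Elements can be selected by position, as in the first command below, or by using the Boolean expression above as the index, as in the second command, which returns only the reefs visited fewer than six times.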
table(corals$REEF_NAME)[c(2,4)]
19138S 20104S
6 6
table(corals$REEF_NAME)[table(corals$REEF_NAME)<6]
DECAPOLIS THETFORD
5 5
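The same selection can be sketched on a small made-up vector of reef names. The names and counts below are invented for illustration and are not the assignment data. Storing the table in a variable avoids computing it twice, and names() pulls out just the labels.

```r
# Toy data: hypothetical reef names, not the assignment data
reef <- c("A", "A", "A", "B", "B", "C", "C", "C")

visits <- table(reef)          # number of rows (visits) per reef
short <- visits[visits < 3]    # a logical index keeps only the TRUE entries
short
names(short)                   # labels of the reefs with fewer than 3 visits
```

With the real data the analogous calls would use corals$REEF_NAME and a cutoff of 6.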
I plot disease prevalence against WSSTA, adding a linear regression and a lowess curve just as we did in class.
plot(corals$WSSTA,corals$PREV_1, xlab='WSSTA',ylab='Disease Prevalence')
abline(lm(corals$PREV_1~corals$WSSTA))
lines(lowess(corals$PREV_1~corals$WSSTA),lwd=2,col=3)
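Clearly the scale is all wrong: a few very large prevalence values stretch the y-axis so far that most of the data are squeezed against the bottom of the plot. After some experimentation, I settle on a y-axis that ranges from 0 to 20.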
plot(corals$WSSTA,corals$PREV_1,xlab='WSSTA',ylab='Disease Prevalence',ylim=c(0,20))
lines(lowess(corals$PREV_1~corals$WSSTA),lwd=2,col=3)
abline(lm(corals$PREV_1~corals$WSSTA))
[Figures: plot before rescaling (left); plot with y-axis rescaled from 0 to 20 (right)]
The lowess curve reveals that disease prevalence is not monotonic in WSSTA. Prevalence increases as WSSTA increases from 0 to 5, but then decreases afterward. This suggests that a quadratic relationship may be more appropriate here than a straight line.
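One way a quadratic term could be added to the regression is sketched below. This is only an illustration on simulated data: the variable names wssta and prev, the coefficients, and the noise level are all made up, and nothing here is a fit to the coral data. The real analysis would use corals$WSSTA and corals$PREV_1 in their place.

```r
# Simulated stand-in data (assumed values, not the coral data)
set.seed(1)
wssta <- runif(200, 0, 30)
prev <- 5 + 1.2 * wssta - 0.05 * wssta^2 + rnorm(200)

# I() protects ^ so that it means ordinary squaring inside a formula
fit2 <- lm(prev ~ wssta + I(wssta^2))
coef(fit2)

# Overlay the fitted curve on the scatterplot
plot(wssta, prev, xlab = 'WSSTA', ylab = 'Simulated prevalence')
grid <- data.frame(wssta = seq(0, 30, length.out = 100))
lines(grid$wssta, predict(fit2, newdata = grid), lwd = 2, col = 4)
```

A negative coefficient on the squared term corresponds to a curve that rises and then falls, which is the pattern the lowess curve suggests.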
library(date)
plot(as.date(as.character(corals$DATE)), corals$LAT_DD, xlab='Date', ylab='Latitude')
As the plot shows, the complete confounding of space and time has continued for all six years of the study. This will have a profound effect on the way the data will need to be analyzed.
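If the date package is not available, base R's as.Date is one alternative for producing the same kind of plot. The sketch below uses made-up date strings and latitudes, and the format string "%m/%d/%Y" is an assumption, since the layout of the DATE column is not shown here. It would need to be changed to match the actual data.

```r
# Hypothetical values; the real columns are corals$DATE and corals$LAT_DD
date_chr <- c("3/15/1998", "7/2/1999", "11/20/2000")
lat <- c(-14.7, -18.3, -23.5)

# The format string must match how the dates are actually written
d <- as.Date(date_chr, format = "%m/%d/%Y")

plot(d, lat, xlab = 'Date', ylab = 'Latitude')
```

Dates that do not match the format come back as NA, so it is worth checking any(is.na(d)) before plotting.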
summary(corals$WSSTA)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   1.000   3.000   6.064   8.000  30.000
The summary function returns basic descriptive statistics for the variable WSSTA. We see that the minimum value is 0 and the maximum value is 30. The first, second, and third quartiles are 1, 3, and 8 respectively. These are the values at or below which 25%, 50%, and 75% of the observations fall. The mean is 6.064.
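The interpretation of the quartiles can be checked directly. The sketch below uses a small made-up vector rather than WSSTA: quantile computes the three quartiles, and the proportion of observations at or below each one is then calculated.

```r
# Toy values, not the WSSTA data
x <- c(0, 0, 1, 1, 2, 3, 4, 8, 8, 12, 30)

q <- quantile(x, probs = c(0.25, 0.5, 0.75))
q

# Proportion of observations at or below each quartile
sapply(q, function(v) mean(x <= v))
```

Because of tied values, the proportions can come out somewhat larger than 25%, 50%, and 75%; they should not be smaller.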
boxplot(corals$WSSTA)
The boxplot with various features identified is shown to the right.
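The numbers behind a boxplot can be obtained with boxplot.stats. The sketch below again uses a made-up vector; I am assuming that the features of interest are the median, the hinges, the whisker ends, and the outlying points.

```r
# Toy values, not the WSSTA data
x <- c(0, 0, 1, 1, 2, 3, 4, 8, 8, 12, 30)

b <- boxplot.stats(x)
b$stats   # lower whisker end, lower hinge, median, upper hinge, upper whisker end
b$out     # points lying beyond the whiskers, which are drawn individually

boxplot(x)
```

By default the whiskers extend to the most extreme observations that lie within 1.5 times the box length of the hinges.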
Method 1
The table function can be used to obtain a list of counts.
table(corals$PREV_1)
  0   1   2   3   4   5   6   7   8   9  11  12  13  14  15  16  17  18  19  20  21
109  41  27  17  10   4   5   2   5   7   2   3   4   1   1   2   2   1   3   3   1
 22  23  24  26  27  28  30  31  32  34  35  37  49  51  60  75  77  90  92 101 106
  2   1   1   1   2   1   1   1   2   1   1   1   2   1   1   1   1   1   1   1   1
149 221 315 336 343
  1   1   1   1   1
This is a vector of numbers and I can extract the first number, the number of zero counts, using bracket notation as follows.
table(corals$PREV_1)[1]
0
109
Now if I add up all the numbers produced by table, I obtain the total number of observations made on all the reefs.
sum(table(corals$PREV_1))
[1] 280
All that's left to do is to divide these two numbers and multiply by 100.
table(corals$PREV_1)[1]/sum(table(corals$PREV_1))*100
0
38.92857
So 38.9% of the prevalence observations were zero.
Method 2
Following the logic used in answering the bonus problem in Problem 1, another way to obtain the number of zero counts is by the following.
sum(corals$PREV_1==0)
[1] 109
The sum function coerces Boolean TRUE values to 1 and FALSE values to 0. Therefore the sum is just the number of zeros. The rest of the argument then proceeds as above: divide by the total number of observations and multiply by 100.
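The remaining steps of Method 2 can be sketched as follows, using a made-up vector of counts rather than PREV_1. Since the mean of a Boolean vector is the proportion of TRUE values, the percentage can also be obtained in a single step.

```r
# Toy counts, not the assignment data
prev <- c(0, 0, 3, 1, 0, 12, 0, 2)

n_zero <- sum(prev == 0)                # each TRUE counts as 1
pct_a <- n_zero / length(prev) * 100    # divide by the number of observations
pct_b <- mean(prev == 0) * 100          # same percentage in one step

c(n_zero, pct_a, pct_b)
```

If the variable contained missing values, both sum and mean would need na.rm=TRUE, and the denominator would have to be adjusted to match.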
| Jack Weiss Phone: (919) 962-5930 E-Mail: jack_weiss@unc.edu Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516 Copyright © 2006 Last Revised--Jan 30, 2006 URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/solutions/assign1.htm |