R Snippets

Here are some code snippets I’ve written in the course of my research. Some are particular to ecology but most are of general utility. No doubt they are amateurish by the standards of real programmers, but I hope they can be usefule to you nonetheless; at least save you from writing your own.

Reforming Data

Replace multiple partial string matches

This will replace many unique elements in a character vector with just a few. For instance, the USDA database PLANTS includes growth habit with overlapping values, such as [Tree]​[Tree, Shrub]​[Shrub, Vine]. Graphing or analyzing all 30+ combinations is cumbersome, so replacematch() could simplify the above to [Tree]​[Tree]​[Shrub].

Each element of x is matched (via grep) to the closest value in patterns and replaced by it; earlier patterns take priority. By default, elements with no match are left as is, but alternately can be replaced with another string or NA. This distinction is useful since NA values are typically omitted in graphs and analyses.

replacematch <- function(x, patterns, no.match=NULL, ...) {
  y <- as.character(x)
  for(i in patterns) y <- replace(y, grep(pattern=i, x=y, ...), i)
  if(!is.null(no.match)) y[is.na(match(y, patterns))] <- no.match
  return(y)
}

Merge dataframes in a list

I often run lapply() where the function output is a dataframe; but a list of dataframes is usually not useful. This compresses them into one. It relies on column names, so their order does not matter. If the dataframes have different columns, merge.df() uses all of them, filling in with NAs where necessary.

merge.df <- function(x) {
  if(!is.list(x) | !unique(sapply(x, mode))=="list") {
    stop("x must be a list of dataframes") }

  cnames <- unique(unlist(lapply(x, names)))
  cids <- sapply(x, function(i) match(cnames, names(i)))
  merged <- lapply(1:length(cnames), function(i) {
    unlist(lapply(1:length(x), function(j) {
      cid <- cids[i,j]
      if(is.na(cid)) rep(NA, dim(x[[j]])[1]) else x[[j]][,cid]
    }))
  })
  names(merged) <- cnames
  return(as.data.frame(merged))
}

Convert indices from numerical to logical

The base function which() creates a numerical index vector from a logical one; this does the reverse. If the last value does not coincide with the length of the object you’re indexing (which is rare) you must specify that length.

unwhich <- function(x, l=max(x)) { 1:l %in% x }

Random dimensions of a volume

This function will generate random numbers that have the specified product. Use if you have a known volume, like species niche size, but want to randomly assign the dimension lengths in n-dimensional space, as on different resource axes.

vol.dims <- function(vol, dims=2) {
  l <- runif(dims)
  ls <- dims * (l / sum(l)) 
  (vol^(1/dims))^ls
}

Data Queries

Find first instances

You could easily have repeated species or plot IDs in a vector, and want the index for the first instance of every value.

find.firsts <- function(x) { match(unique(x), x) }

Just tell me how big

If you don't know if something is a vector or not, and want to see how big it is regardless, use size(). It will also recurse through lists, giving the size of each element.

size <- function(x, recurse=TRUE) {	
  if(is.recursive(x) & recurse) lapply(x, size)
  else { if(is.vector(x)) length(x) else dim(x) }
}

Most common element

This function returns the value of the most common element in an object; multiple elements if there is a tie.

most.common <- function(x) {
  count <- sapply(unique(x), function(i) sum(x==i, na.rm=TRUE))
  unique(x)[which(count==max(count))]
}

Distance matrix for a grid

Imagine you have a rectangular grid and want to produce a distance matrix for every cell. Normally you would generate a list of coordinates and then call dist(). However, converting the resulting distance object to a matrix is extremely slow for large grids, so I wrote rect.dist() which is faster; it also requires just the grid dimensions, rather than all coordinates.

For example, producing the distance matrix for a 100 × 60 grid using as.matrix(dist(coordinates)) required 447 seconds on my MacBook, while rect.dist(100, 60) needed only 61.

rect.dist <- function(x, y=x) {
  l <- x * y
  d <- matrix(NA, l, l)

  for(i in 1:x) {
    for(j in 1:y) {
      d[, y * (i - 1) + j] <- sqrt(rep((i - 1:x)^2, each=y) + rep((j - 1:y)^2, x))
    }
  }
  rownames(d) <- paste(rep(1:x, each=y), 1:y, sep="-")
  colnames(d) <- rownames(d)
  return(d)
}