Writing functions in R

R for palaeobiologists: Workshop and Hackathon

Function basics

What is a function

A function takes input, does something, and returns output:

mean(c(1, 2, 3))
[1] 2


A function needs a name, arguments in (), and a body in {}


subtract <- function(arg1, arg2) { 
  arg1 - arg2 
} 


subtract(2, 1)
[1] 1

Why do we need functions

  • Readability
  • Organisation
  • Modularity
  • Reusability

Imagine calculating the mean without standard functions like mean or sum:

  data <- c(1,2,3)
  total <- 0
  count <- 0
  for (value in data) {
    total <- total + value
    count <- count + 1
  }
  total/count
[1] 2

Arguments

Arguments need to be provided in the correct order, or specified by name:

subtract(2, 1)
[1] 1


subtract(1, 2)
[1] -1


subtract(arg2 = 1, arg1 = 2)
[1] 1
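
Positional and named arguments can also be mixed. As a small sketch using the subtract function from above (repeated here so the block runs on its own): named arguments are matched first, and the remaining unnamed values fill the remaining parameters in order.

```r
# subtract as defined earlier, repeated so this block is self-contained
subtract <- function(arg1, arg2) {
  arg1 - arg2
}

# arg2 is matched by name, so the unnamed value 2 is assigned to arg1
mixed <- subtract(arg2 = 1, 2)
mixed
# [1] 1
```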

Default values

Default values make a function more convenient to use and can hide complexity. Without a default, every argument has to be supplied:

subtract(2)
Error in subtract(2): argument "arg2" is missing, with no default


This will work if we set a default for arg2:

subtract <- function(arg1, arg2 = 1) {
  arg1 - arg2
}


subtract(2)
[1] 1


Ellipsis (‘…’)

Additional, optional arguments can be allowed by using ‘…’ as the last argument:

my_plot <- function(arg1, arg2, ...) {
  plot(arg1, arg2, ...)
}


my_plot(2, 1, col = "red", pch = 17, cex = 2)
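
Inside a function, the contents of ... can also be inspected by collecting them into a list. The sketch below illustrates the idea; describe_extras is a hypothetical helper name, not part of any package.

```r
# describe_extras is an illustrative name, not part of any package
describe_extras <- function(arg1, ...) {
  extras <- list(...)  # collect all additional arguments into a list
  names(extras)        # return the names they were passed under
}

extra_names <- describe_extras(2, col = "red", pch = 17)
extra_names
# [1] "col" "pch"
```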

Return

A function should generally return something visible. This one prints nothing when called, because the value of an assignment is returned invisibly:

subtract <- function(arg1, arg2) {
  result <- arg1 - arg2
}
subtract(2,1)


Return explicitly with return(), or place the return value on the last line of the function:

subtract <- function(arg1, arg2) {
  result <- arg1 - arg2
  return(result)
}
subtract(2,1)
[1] 1
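
A function whose last expression is an assignment still produces a value: the result of the assignment is returned invisibly, so nothing is printed, but the value can be captured. A minimal sketch, using the hypothetical name subtract_silent to keep it apart from the working version:

```r
# subtract_silent is an illustrative name for the version without return()
subtract_silent <- function(arg1, arg2) {
  result <- arg1 - arg2  # an assignment as the last expression returns invisibly
}

subtract_silent(2, 1)              # prints nothing
captured <- subtract_silent(2, 1)  # the value can still be stored
captured
# [1] 1
```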


Return multiple objects

just_return <- function(arg1, arg2) {
  return(arg1)
  return(arg2)
}
just_return(2, 1)
[1] 2

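This did not work as intended: the function exits at the first return(), so the second one is never reached. An R function returns only one object. To return several values, combine them in a vector, a list, or another data structure: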

just_return <- function(arg1, arg2) {
  return(c(arg1, arg2))
}
just_return(2, 1)
[1] 2 1
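
A vector works when all values have the same type. For values of different types, or to label each element, a named list is an alternative. A minimal sketch, where describe_difference is a hypothetical function name:

```r
# describe_difference is an illustrative name
describe_difference <- function(arg1, arg2) {
  list(
    difference = arg1 - arg2,   # numeric value
    is_positive = arg1 > arg2   # logical value
  )
}

out <- describe_difference(2, 1)
out$difference
# [1] 1
out$is_positive
# [1] TRUE
```

The elements are then accessed by name with $, which tends to be easier to read than positional indexing.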

Binary operators

Standard function syntax:

sum(c(1,2))
[1] 3

Operator syntax:

1 + 2
[1] 3

Special binary operators, including user-defined ones, are wrapped in % signs:

3 %in% c(1,2,3)
[1] TRUE


Custom binary operators – let’s define an operator for “not in”:

`%!in%` <- function(x, y) !(x %in% y)
3 %!in% c(1,2,3)
[1] FALSE
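
Operators are themselves functions, so operator syntax and standard function syntax are interchangeable. Quoting the operator name in backticks allows it to be called like any other function, as this short sketch illustrates (the variable names are arbitrary):

```r
# Calling operators with standard function syntax
plus_result <- `+`(1, 2)
plus_result
# [1] 3

in_result <- `%in%`(3, c(1, 2, 3))
in_result
# [1] TRUE
```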

Control structures – if

If a condition is true, do something.

if (1 + 1 == 2) print("True")
[1] "True"


add_or_subtract <- function(arg1, arg2, operation) {
 if (operation == "add") {
   result <- arg1 + arg2
 }
 if (operation == "subtract") {
   result <- arg1 - arg2
 }
 result
}
add_or_subtract(2,1,"add")
[1] 3

Control structures – else

else specifies what to do when the if condition is not met.

if (1 + 1 == 3) print("True") else print("False")
[1] "False"


add_or_subtract <- function(arg1, arg2, operation) {
 if (operation == "add") {
   result <- arg1 + arg2
 } else {
   result <- arg1 - arg2
 }
 return(result)
}
add_or_subtract(2,1,"subtract")
[1] 1
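
With a plain else, any operation other than "add" leads to subtraction, including misspelled ones. One possible way to guard against this is to chain conditions with else if and to stop with an error otherwise. This is only a sketch; the name add_or_subtract_checked and the error message are illustrative.

```r
# add_or_subtract_checked is an illustrative variant with input checking
add_or_subtract_checked <- function(arg1, arg2, operation) {
  if (operation == "add") {
    arg1 + arg2
  } else if (operation == "subtract") {
    arg1 - arg2
  } else {
    # any other value is rejected instead of silently subtracting
    stop("unknown operation: ", operation)
  }
}

add_or_subtract_checked(2, 1, "subtract")
# [1] 1
# add_or_subtract_checked(2, 1, "multiply") would stop with an error
```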

Control structures – switch

Instead of many if and else statements, try switch. The final, unnamed value is the default that is returned when nothing matches:

fossil_description <- function(fossil) {
 switch(fossil,
  ammonite = "coiled shell",
  Tyrannosaurus = "serrated teeth",
  Lepidodendron = "scaly bark", 
  "not a fossil"
 )
}
fossil_description("Tyrannosaurus")
[1] "serrated teeth"
fossil_description("Lewis")
[1] "not a fossil"
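
switch can also map several inputs to the same result: an entry with an empty value falls through to the next non-empty one. A short sketch, with fossil_group as a hypothetical function name and an arbitrary choice of groups:

```r
# fossil_group is an illustrative name; the groupings are examples only
fossil_group <- function(fossil) {
  switch(fossil,
    ammonite = ,               # empty: falls through to the next value
    belemnite = "cephalopod",
    Tyrannosaurus = "dinosaur",
    "unknown"                  # default when nothing matches
  )
}

fossil_group("ammonite")
# [1] "cephalopod"
fossil_group("Lewis")
# [1] "unknown"
```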

Control structures – for loops

Loops are used for repeating similar actions multiple times. for loops iterate over a set of values. The iterator (i) changes with every iteration of the loop:

for (i in c(1,2,3)) print(i)
[1] 1
[1] 2
[1] 3

To generate sequences of integers, we can use seq_len. Let’s make a function:

print_repetitions <- function(n) {
 for (i in seq_len(n)) { 
   print(i)
 }
}
print_repetitions(2)
[1] 1
[1] 2
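
seq_len is a safer choice than 1:n inside functions because it handles n = 0 correctly. The comparison below is a small illustration; the variable names are arbitrary.

```r
n <- 0
colon_seq <- 1:n        # counts down: 1, 0
safe_seq <- seq_len(n)  # empty integer vector

length(colon_seq)
# [1] 2
length(safe_seq)
# [1] 0
```

A loop over seq_len(0) therefore runs zero times, whereas a loop over 1:0 runs twice.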

Control structures – while loops

while loops repeat a task until a condition is no longer met.

add_until_4 <- function(x) {
  while(x < 4) {
    x <- x + 1
    print(x)
  }
}
add_until_4(1)
[1] 2
[1] 3
[1] 4
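
while loops are most useful when the number of iterations is not known in advance. As an illustrative sketch (count_halvings and its threshold argument are hypothetical), the function below counts how often a number must be halved to reach a threshold, and returns the count instead of printing it:

```r
# count_halvings is an illustrative name
count_halvings <- function(x, threshold = 1) {
  steps <- 0
  while (x > threshold) {  # stops as soon as x is at or below the threshold
    x <- x / 2
    steps <- steps + 1
  }
  steps
}

count_halvings(8)  # 8 -> 4 -> 2 -> 1
# [1] 3
```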

Exercise 1 - Function for latitudinal binning

Create a function that sorts the entries of a data.frame into hemispheres. That is, we want a new column that identifies the hemisphere of each entry of the data set. As an example data set, we will use the reefs data from palaeoverse here and in the following exercises.