Creating your own functions

To write your own function, you need to use the function function(). In the brackets () you specify argument(s) that will be used in the function (with our without defaults), and in curly brackets { } you specify what the function should do, referring to the argument(s).

The general form is:

myFun <- function(arg1, arg2) {
  ## Here you type expressions that use the arguments
}

Example of custom function

Each line inside the function is an object assignment, a function call, a subsetting, a conditional statement, an if/else statement, a for loop, etc. - basically anything you have now learned how to do in R that you want the function to do!

Below is an easy example that calculates the mean of two values (x and y):

mean_xy <- function(x, y){
  (x + y)/2
}

This function can be used the same way as how you have been using functions before:

mean_xy(2,6)
[1] 4

Or:

mean_xy(x = 2, y = 6)
[1] 4

Use return() in a function

To have a function output something, you must return something. Either the value of the last command is returned (as in mean_xy) or you can use return().

mean_xy <- function(x, y){
  z <- (x + y)/2
  return(z)
}

mean_xy(x = 2, y = 6)
[1] 4

Here are a few other examples.

mean_xy_2 <- function(x, y){
  z <- (x + y)/2
  x
  z
}

mean_xy_2(x = 1, y = 3)
[1] 2

Note that x is not returned. Only the last expression is returned.

mean_xy_3 <- function(x, y){
  z <- x + y 
  return(x)
  z
}

mean_xy_3(x = 1, y = 3)
[1] 1

Note that z is not returned, if a return statement is encountered in the function anything after that statement is not executed.

You can create functions with a variable number of arguments using .... For example, here’s a function that returns the mean of all the values in a vector of arbitrary length:

mean_vector <- function(...){ 
  z <- mean(c(...))
  return(z)
} 

mean_vector(1,2,3)
[1] 2
mean_vector(1,2,3,4,5,6,7,8,9,10) 
[1] 5.5

The arguments in a vector do not have to be single values. Functions can be vectorized:

mean_x <- function(x){ 
  z <- mean(x)
  return(z)
} 
x <- c(1,2,3,4,5)
mean_x(x)
[1] 3

Custom functions in R are useful if you have a bunch of commands that you have to use multiple times. By combining them in a function you 1) save time, 2) keep your code concise, and 3) make less coding mistakes.

In the next example a function called my_descriptives is made to calculate a mean of a vector only for the positive values.

Build a custom function

my_descriptives <- function(x){
  x.trim <- x[x>=0]
  out <- mean(x.trim)
  return(out)
}

In the first line inside the function a sub sample of the vector is taken x.trim, with only values >=0. In the second line, the mean of this x.trim is taken.

This function can be used to describe a vector in my data set, but there are negative values where only positive values are allowed.

In this data set, there is a variable Ages:

data$Ages
 [1]  41  45  41  44  60  33  34  43  42  59  47  52  39  30  50 -50  41  53  48
[20]  33  55  48  43  47  50

There is one value -50, that is clearly an error.

my_descriptives(data$Ages)
[1] 44.91667

Compare the output with using the standard function mean():

mean(data$Ages)
[1] 41.12

In the standard mean function, the negative outlier is included and influences the mean!

The output of a function does not need to be a scalar. This version of the function my_decriptives() provides the whole summary of the variable, instead of only the mean.

my_descriptives <- function(x){
  x.trim <- x[x>=0]
  out <- summary(x.trim)
  return(out)
}
my_descriptives(data$Ages)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  30.00   41.00   44.50   44.92   50.00   60.00 

Again, let’s compare the output to the standard summary() function.

summary(data$Ages)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -50.00   41.00   44.00   41.12   50.00   60.00 

If you have multiple objects to return, you have to put them in an object container, like a list, vector, array or data.frame. It is not possible to return multiple individual objects like this:

return(x,y)

but it is possible to return them in a vector or list like this:

return(c(x,y))

return(list(x,y)

Here is an example of the function with multiple outputs:

my_descriptives2 <- function(x){
  x.trim <- x[x>0]
  below0 <- sum(x<0)
  meanX <- mean(x.trim)
  return(list(below0, meanX))
}

The function additionally returns how many values were negative.

my_descriptives2(data$Ages)
[[1]]
[1] 1

[[2]]
[1] 44.91667

There was 1 value below zero, as provided by the first element in the list.

Notice how the function gives an error if you do not put the items in a list:

my_descriptives2_wrong <- function(x){
  x.trim <- x[x>0]
  below0 <- sum(x<0)
  meanX <- mean(x.trim)
  return(below0, meanX)
}

my_descriptives2_wrong(data$Ages)
Error in return(below0, meanX): multi-argument returns are not permitted

Specifying default arguments of a function, can be done by filling in the default value in the function() call. Here is an example of a function with a default argument (y = 2).

calc4 <- function(x, y = 2){ 
  z1 <- x + y
  z2 <- x * y 
  return(c(z1, z2))
} 

calc4(x = 1) ## uses y = 2
[1] 3 2
calc4(x = 1, y = 3) ## overwrites default value of y
[1] 4 3

Function environments

Each function, whether built-in or user-defined, has an associated environment, which can be thought of as a container that holds all of the objects present at the time the function is created.

When a function is created on the command line, it’s environment is the so-called “Global Environment”:

w <- 2
f <- function(y) {
  d <- 3
  return(d * (w + y))
}
environment(f)
<environment: R_GlobalEnv>

The function objects() (or ls()), when called from the command line, lists the objects in the Global Environment:

objects()
 [1] "Ages"                   "calc4"                  "data"                  
 [4] "f"                      "mean_vector"            "mean_x"                
 [7] "mean_xy"                "mean_xy_2"              "mean_xy_3"             
[10] "my_descriptives"        "my_descriptives2"       "my_descriptives2_wrong"
[13] "myFun"                  "PatientID"              "w"                     
[16] "x"                     

Global and Local Variables

In the function f() defined above, the variable w is said to be global to f() and the variable d, because it’s created within f(), is said to be local to f(). Global variables (like w) are visible from within a function, but local variables (like d) aren’t visible from outside the function. In fact, local variables are temporary, and disappear when the function call is completed:

f(y = 1)
d

You get an error: Error in eval(expr, envir, enclos) : object ‘d’ not found, indicating that the variable d does not exist in the ‘Global Environment’.

When a global and local variable share the same name, the local variable is used:

w <- 2
d <- 4

f <- function(y) {
  d <- 3
  return(d * (w + y))
}

f(y = 1)
[1] 9

Note also that when an assignment takes place within a function, and the local variable shares its name with an existing global variable, only the local variable is affected:

w <- 2
d <- 4 # This value of d will remain unchanged.

f <- function(y) {
  d <- 3 # This doesnt affect the value of d in the global environment
  return(d * (w + y))
}

f(y = 1)
[1] 9
d
[1] 4