Functions 1

Functions are things you can do. R comes with predefined functions which do many things from basic file management to complex statistics. To get started, below are some often used functions. This default set of functions is easily extended by defining your own functions and adding those defined by others conveniently in CRAN and BioConductor packages:

http://stat.ethz.ch/R-manual/R-patched/library/base/html/00Index.html

http://cran.r-project.org/web/packages/available_packages_by_name.html

http://www.bioconductor.org/packages/release/bioc/

Functions are expressed as:

function.name(), e.g., t.test() or, an operator, e.g., +

Easily obtain functions from other R users using install.packages():

install.packages("packageName")

Packages are basically a collection of new functions made by people around the world managed by the “CRAN” package repository team.

The package name must be quoted when installing. Besides installing the package on your PC, you need to load it into your R session before you can use it:

library(packageName)

Example functions

Built-in Named Functions

R has an extensive set of built-in functions, a few of which are listed below:

First six elements/rows:

head()

Last six elements/rows:

tail()

Generate a sequence of values:

seq(from=1, to=10, by=2)
[1] 1 3 5 7 9

Create numbers from string numbers:

as.numeric(c("1","2","3","4"))
[1] 1 2 3 4

Sort a vector alphabetically or numerically:

sort(c(3,4,2,5,1))
[1] 1 2 3 4 5

Retrieve the maximum value from a vector:

max(c(111,333,444,55,6,777,999))
[1] 999

Generate 10 random numbers from a normal distribution with mean 0 and standard deviation 1:

rnorm(10)
 [1]  0.638149020 -1.477336337 -0.491813383  0.002060021 -0.769983184
 [6] -1.013374635 -0.303588331 -1.010995799  0.154611784 -1.444606356

For a (very long) list of all named functions available in base-R have a look at this website:

https://stat.ethz.ch/R-manual/R-patched/library/base/html/00Index.html

# Function arguments {.unnumbered}

### Input of Functions

Each function accepts one or more values passed to it as arguments, performs computations or operations on those values, and returns a single result.

To use a function, type the name of the function with the values of its argument(s) in parentheses ( ), then hit ‘Enter’. For example:

sqrt(2)
[1] 1.414214

Values passed as arguments can be in the form of variables, such as x below:

x <- 2

sqrt(x)
[1] 1.414214

or they can be entire expressions, such as x^2 + 5 below:

sqrt(x^2 + 5)
[1] 3

Most functions take multiple arguments, each of which has a name, and some of which are optional. One way to see what arguments a function takes and which ones are optional is to use the function args().Another way to view a function’s arguments is to look at its help file by typing a ? in front of the function ’s name like so: ?sqrt

For example, to see what arguments round() takes (using args()), we’d type:

args(round)
function (x, digits = 0, ...) 
NULL

We see that round() has two arguments, x, a numeric value to be rounded, and digits, an integer specifying the number of decimal places to round to. Thus to round 4.679 to 2 decimal places, we type:

round(4.679, 2)
[1] 4.68

Optional Arguments and Default Values

The specification digits = 0 in the output from args(round) tells us that digits has a default value of 0. This means that it’s an optional argument and if no value is passed for that argument, rounding is done to 0 decimal places (i.e. to the nearest integer).

Positional Matching and Named Argument Matching

When we type round(4.679, 2) R knows, by positional matching, that the first value, 4.679, is the value to be rounded and the second one, 2, is the number of decimal places to round to.

We can also specify values for the arguments by name. For example:

round(x = 4.679, digits = 2)
[1] 4.68

When named argument matching is used, as above, the order of the arguments is irrelevant. For example, we get the same result by typing:

round(digits = 2, x = 4.679)
[1] 4.68

The two types of argument specification (positional and named argument matching) can be mixed in the same function call:

round(4.679, digits=2)
[1] 4.68

Although it may be confusing, it is also possible to specify the first argument in the second position, by using named argument matching for the second argument like so:

round(digits=2, 4.679)
[1] 4.68
Tip

For readability of your code it is advisable to name most of the arguments when calling a function

# Creating your own functions {.unnumbered}

To write your own function, you need to use the function function(). In the brackets () you specify argument(s) that will be used in the function (with our without defaults), and in curly brackets { } you specify what the function should do, referring to the argument(s).

The general form is:

myFun <- function(arg1, arg2) {
## Here you type expressions that use the arguments
}

Example of custom function

Each line inside the function is an object assignment, a function call, a subsetting, a conditional statement, an if/else statement, a for loop, etc. - basically anything you have now learned how to do in R that you want the function to do!

Below is an easy example that calculates the mean of two values (x and y):

mean_xy <- function(x, y){
(x + y)/2
}

This function can be used the same way as how you have been using functions before:

mean_xy(2,6)
[1] 4

Or:

mean_xy(x = 2, y = 6)
[1] 4

Use return() in a function

To have a function output something, you must return something. Either the value of the last command is returned (as in mean_xy) or you can use return().

mean_xy <- function(x, y){
  z <- (x + y)/2
  return(z)
}

mean_xy(x = 2, y = 6)
[1] 4

Here are a few other examples.

mean_xy_2 <- function(x, y){
  z <- (x + y)/2
  x
  z
}

mean_xy_2(x = 1, y = 3)
[1] 2

Note that x is not returned. Only the last expression is returned.

mean_xy_3 <- function(x, y){
  z <- x + y 
  return(x)
  z
}

mean_xy_3(x = 1, y = 3)
[1] 1

Note that z is not returned, if a return statement is encountered in the function anything after that statement is not executed.

You can create functions with a variable number of arguments using .... For example, here’s a function that returns the mean of all the values in a vector of arbitrary length:

mean_vector <- function(...){ 
z <- mean(c(...))
return(z)
} 

mean_vector(1,2,3)
[1] 2
mean_vector(1,2,3,4,5,6,7,8,9,10) 
[1] 5.5

The arguments in a vector do not have to be single values. Functions can be vectorized:

mean_x <- function(x){ 
z <- mean(x)
return(z)
} 
x <- c(1,2,3,4,5)
mean_x(x)
[1] 3

Custom functions in R are useful if you have a bunch of commands that you have to use multiple times. By combining them in a function you 1) save time, 2) keep your code concise, and 3) make less coding mistakes.

In the next example a function called my_descriptives is made to calculate a mean of a vector only for the positive values.

Build a custom function

my_descriptives <- function(x){
  x.trim <- x[x>=0]
  out <- mean(x.trim)
  return(out)
}

In the first line inside the function a sub sample of the vector is taken x.trim, with only values >=0. In the second line, the mean of this x.trim is taken.

This function can be used to describe a vector in my data set, but there are negative values where only positive values are allowed.

In this data set, there is a variable Ages:

data$Ages
 [1]  57  43  39  45  30  48  56  49  42  49  35  58  48  51  36 -50  51  43  41
[20]  39  47  31  43  44  57

There is one value -50, that is clearly an error.

my_descriptives(data$Ages)
[1] 45.08333

Compare the output with using the standard function mean():

mean(data$Ages)
[1] 41.28

In the standard mean function, the negative outlier is included and influences the mean!

The output of a function does not need to be a scalar. This version of the function my_decriptives() provides the whole summary of the variable, instead of only the mean.

my_descriptives <- function(x){
  x.trim <- x[x>=0]
  out <- summary(x.trim)
  return(out)
}
my_descriptives(data$Ages)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  30.00   40.50   44.50   45.08   49.50   58.00 

Again, let’s compare the output to the standard summary() function.

summary(data$Ages)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -50.00   39.00   44.00   41.28   49.00   58.00 

If you have multiple objects to return, you have to put them in an object container, like a list, vector, array or data.frame. It is not possible to return multiple individual objects like this:

return(x,y)

but it is possible to return them in a vector or list like this:

return(c(x,y))

return(list(x,y)

Here is an example of the function with multiple outputs:

my_descriptives2 <- function(x){
  x.trim <- x[x>0]
  below0 <- sum(x<0)
  meanX <- mean(x.trim)
  return(list(below0, meanX))
}

The function additionally returns how many values were negative.

my_descriptives2(data$Ages)
[[1]]
[1] 1

[[2]]
[1] 45.08333

There was 1 value below zero, as provided by the first element in the list.

Notice how the function gives an error if you do not put the items in a list:

my_descriptives2_wrong <- function(x){
  x.trim <- x[x>0]
  below0 <- sum(x<0)
  meanX <- mean(x.trim)
  return(below0, meanX)
}

my_descriptives2_wrong(data$Ages)
Error in return(below0, meanX): multi-argument returns are not permitted

Specifying default arguments of a function, can be done by filling in the default value in the function() call. Here is an example of a function with a default argument (y = 2).

calc4 <- function(x, y = 2){ 
  z1 <- x + y
  z2 <- x * y 
  return(c(z1, z2))
} 

calc4(x = 1) ## uses y = 2
[1] 3 2
calc4(x = 1, y = 3) ## overwrites default value of y
[1] 4 3

Function environments

Each function, whether built-in or user-defined, has an associated environment, which can be thought of as a container that holds all of the objects present at the time the function is created.

When a function is created on the command line, it’s environment is the so-called “Global Environment”:

w <- 2
f <- function(y) {
d <- 3
return(d * (w + y))
}
environment(f)
<environment: R_GlobalEnv>

The function objects() (or ls()), when called from the command line, lists the objects in the Global Environment:

objects()
 [1] "Ages"                   "calc4"                  "data"                  
 [4] "f"                      "mean_vector"            "mean_x"                
 [7] "mean_xy"                "mean_xy_2"              "mean_xy_3"             
[10] "my_descriptives"        "my_descriptives2"       "my_descriptives2_wrong"
[13] "myFun"                  "PatientID"              "w"                     
[16] "x"                     

Global and Local Variables

In the function f() defined above, the variable w is said to be global to f() and the variable d, because it’s created within f(), is said to be local to f(). Global variables (like w) are visible from within a function, but local variables (like d) aren’t visible from outside the function. In fact, local variables are temporary, and disappear when the function call is completed:

f(y = 1)
d

You get an error: Error in eval(expr, envir, enclos) : object ‘d’ not found, indicating that the variable d does not exist in the ‘Global Environment’.

When a global and local variable share the same name, the local variable is used:

w <- 2
d <- 4

f <- function(y) {
d <- 3
return(d * (w + y))
}

f(y = 1)
[1] 9

Note also that when an assignment takes place within a function, and the local variable shares its name with an existing global variable, only the local variable is affected:

w <- 2
d <- 4 # This value of d will remain unchanged.

f <- function(y) {
d <- 3 # This doesnt affect the value of d in the global environment
return(d * (w + y))
}

f(y = 1)
[1] 9
d
[1] 4