R Course - Day 3

Functions I

Built-in Named Functions

head() # Print the first 6 rows of a data frame

tail() # Print the last 6 rows of a data frame

seq(from=1, to=10, by=2) # Create a sequence from 1 towards 10 in steps of 2: 1 3 5 7 9

as.numeric(c("1","2","3","4")) # Convert character strings to numbers

sort(c(3,4,2,5,1)) # Sort numbers or letters numerically or alphabetically

max(c(111,333,444,55,6,777,999)) # Determine the maximum value of a vector

rnorm(10) # Draw 10 r(andom) values from a norm(al) distribution with mean 0 and standard deviation 1
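To try these functions yourself, you can run them on `mtcars`, an example data frame that ships with R (it is used here only as a stand-in, since this section does not define its own data). This is a small sketch, not part of the original course material:

```r
# mtcars is a built-in example data frame (datasets package, loaded by default)
head(mtcars)          # first 6 rows
tail(mtcars, n = 3)   # last 3 rows; the n argument changes how many are shown

seq(from = 1, to = 10, by = 2)          # 1 3 5 7 9 (10 is never reached in steps of 2)
as.numeric(c("1", "2", "3", "4"))       # 1 2 3 4
sort(c(3, 4, 2, 5, 1))                  # 1 2 3 4 5
sort(c("b", "c", "a"))                  # "a" "b" "c"
max(c(111, 333, 444, 55, 6, 777, 999))  # 999

set.seed(1)   # fix the random number generator so rnorm() gives the same draws each run
rnorm(10)     # 10 draws from a normal distribution with mean 0 and sd 1
```

Note that `rnorm()` values are not restricted to the range 0 to 1; negative values and values above 1 are common.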

Index of built-in functions in base R:

https://stat.ethz.ch/R-manual/R-patched/library/base/html/00Index.html

Function arguments

Arguments are the objects inside the parentheses ( ):

sqrt(2)
[1] 1.414214

or

x <- 2
sqrt(x)
[1] 1.414214

or

sqrt(x^2 + 5)
[1] 3

Functions with multiple arguments

args(round)
function (x, digits = 0, ...) 
NULL
round(4.679, 2)
[1] 4.68

digits is optional and defaults to 0:

round(4.679)
[1] 5

Named Argument Matching

round(x = 4.679, digits = 2)
[1] 4.68
round(digits = 2, x = 4.679)
[1] 4.68

Mixing is possible:

round(4.679, digits=2)
[1] 4.68
round(digits=2, 4.679)
[1] 4.68
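Unnamed arguments are matched by position, so their order matters. A minimal sketch: swapping the two values without names makes 2 the number to round and 4.679 the number of digits, which gives a different result.

```r
round(4.679, 2)   # x = 4.679, digits = 2 -> 4.68
round(2, 4.679)   # x = 2, digits = 4.679 -> 2 (now 2 is rounded, not 4.679)
```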

Creating your own functions

myFun <- function(arg1, arg2) {
  ## Here you type expressions that use the arguments
}

myFun(arg1, arg2)
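Your own functions can also have default values, just like `digits = 0` in `round()`. The function name `scale_by` and its `factor` argument below are hypothetical, chosen only for illustration:

```r
# Hypothetical example: 'factor' has a default value, like 'digits' in round()
scale_by <- function(x, factor = 2) {
  x * factor
}

scale_by(5)               # uses the default factor = 2 -> 10
scale_by(5, factor = 10)  # overrides the default -> 50
```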

Example of a custom function

mean_xy <- function(x, y){
  (x + y)/2
}

We can use it like this:

mean_xy(2,6)
[1] 4

or

mean_xy(x = 2,y = 6)
[1] 4

Use return() in a function

mean_xy <- function(x, y){
  z <- (x + y)/2
  return(z)
}

mean_xy(x = 2, y = 6)
[1] 4

Default return value

mean_xy_2 <- function(x, y){
  z <- (x + y)/2
  x
  z
}

mean_xy_2(x = 1, y = 3)
[1] 2

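Without return(), a function returns the value of the last evaluated expression. Here that is z, so x is not returned!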

mean_xy_3 <- function(x, y){
  z <- (x + y)/2
  return(x)
  z
}

mean_xy_3(x = 1, y = 3)
[1] 1

return() ends the function immediately, so the line after it never runs and z is not returned!
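Because `return()` exits immediately, it is handy for stopping early inside an `if`. A minimal sketch; the function `safe_sqrt` is hypothetical and assumes a single number as input:

```r
# Hypothetical example: return() exits as soon as it is reached
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA)   # stop here for negative input; the line below never runs
  }
  sqrt(x)        # last expression: returned without calling return()
}

safe_sqrt(9)    # 3
safe_sqrt(-4)   # NA
```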

Functions with a variable number of inputs (...)

mean_vector <- function(...){ 
  z <- mean(c(...))
  return(z)
} 

mean_vector(1,2,3)
[1] 2

There is no fixed limit on the number of inputs:

mean_vector(1,2,3,4,5,6,7,8,9,10) 
[1] 5.5
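The `...` argument can also collect inputs to count them, or pass extra arguments on to another function. The names `count_inputs` and `my_round` are hypothetical; this is a sketch of the idea, not part of the course code:

```r
# list(...) collects all inputs, so you can count them
count_inputs <- function(...) {
  length(list(...))
}
count_inputs(1, 2, 3)   # 3

# Any extra arguments are forwarded to round() via ...
my_round <- function(x, ...) {
  round(x, ...)
}
my_round(4.679)              # nothing extra -> round(4.679) -> 5
my_round(4.679, digits = 2)  # digits = 2 is passed on to round() -> 4.68
```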

Vector as an input to a function

mean_x <- function(x){ 
  z <- mean(x)
  return(z)
} 
x <- c(1,2,3,4,5)
mean_x(x)
[1] 3
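Arithmetic in R works element by element, so a function like `mean_xy()` from earlier also accepts vectors. It then returns one mean per pair of elements, whereas `mean_x()` summarises a whole vector into a single number. A small sketch:

```r
mean_xy <- function(x, y){
  (x + y)/2
}

# + and / work element by element, so each pair is averaged separately
mean_xy(c(2, 4, 6), c(6, 8, 10))   # 4 6 8
```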

Build a custom function to clean data

Calculate the mean of only the non-negative values (data cleaning)

my_descriptives <- function(x){
  x.trim <- x[x >= 0]
  out <- mean(x.trim)
  return(out)
}

Patient data:

data$Ages
 [1]  56  53  53  33  44  52  56  38  39  72  48  40  57  39  32 -50  37  50  37
[20]  29  46  46  47  50  54

An age of -50 is clearly a data-entry error.


my_descriptives(data$Ages)
[1] 46.16667

Instead of the uncleaned mean:

mean(data$Ages)
[1] 42.32
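The data frame `data` is not created in this section. To run the example anyway, you can retype the printed ages into a vector (`ages` is a stand-in for `data$Ages`). The variant `my_descriptives_na` is a hypothetical extension: `x[x >= 0]` keeps missing values (NA), which makes the mean NA, so it also drops them.

```r
# Stand-in for data$Ages, typed in from the printed output
ages <- c(56, 53, 53, 33, 44, 52, 56, 38, 39, 72, 48, 40, 57, 39, 32,
          -50, 37, 50, 37, 29, 46, 46, 47, 50, 54)

my_descriptives <- function(x){
  x.trim <- x[x >= 0]   # keep only non-negative ages
  out <- mean(x.trim)
  return(out)
}

my_descriptives(ages)   # 46.16667
mean(ages)              # 42.32

# Hypothetical variant: also drop missing values (NA)
my_descriptives_na <- function(x){
  x.trim <- x[!is.na(x) & x >= 0]
  mean(x.trim)
}

my_descriptives(c(ages, NA))     # NA, because NA >= 0 is NA and the NA is kept
my_descriptives_na(c(ages, NA))  # 46.16667
```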

Returning vectors

my_descriptives <- function(x){
  x.trim <- x[x>=0]
  out <- summary(x.trim)
  return(out)
}

my_descriptives(data$Ages)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  29.00   38.75   46.50   46.17   53.00   72.00 

Returning multiple values

return() accepts only a single object, so return(x, y) does not work:

my_descriptives2_wrong <- function(x){
  x.trim <- x[x >= 0]
  below0 <- sum(x<0)
  meanX <- mean(x.trim)
  return(below0, meanX)
}

my_descriptives2_wrong(data$Ages)
Error in return(below0, meanX): multi-argument returns are not permitted


To return multiple values, combine them into one object: return(c(x, y)) or return(list(x, y))

my_descriptives2 <- function(x){
  x.trim <- x[x >= 0]
  below0 <- sum(x<0)
  meanX <- mean(x.trim)
  return(list(below0, meanX))
}

my_descriptives2(data$Ages)
[[1]]
[1] 1

[[2]]
[1] 46.16667
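Unnamed list elements can only be reached by position ([[1]], [[2]]). Naming them makes the result easier to use. A sketch, assuming the same stand-in `ages` vector typed from the printed output; `my_descriptives3` is a hypothetical name:

```r
# Stand-in for data$Ages, typed in from the printed output
ages <- c(56, 53, 53, 33, 44, 52, 56, 38, 39, 72, 48, 40, 57, 39, 32,
          -50, 37, 50, 37, 29, 46, 46, 47, 50, 54)

# Hypothetical variant: name the list elements so results can be used by name
my_descriptives3 <- function(x){
  x.trim <- x[x >= 0]
  list(below0 = sum(x < 0), meanX = mean(x.trim))
}

res <- my_descriptives3(ages)
res$below0   # 1
res$meanX    # 46.16667
```

A list is the safer choice when the values have different types, because c() converts everything to a single common type.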

Function environments and scoping

The Top-Level (or Global) Environment

w <- 2
f <- function(y) {
  d <- 3
  return(d * (w + y))
}
environment(f)
<environment: R_GlobalEnv>

Objects in the Global Environment:

objects()
 [1] "Ages"                   "data"                   "f"                     
 [4] "mean_vector"            "mean_x"                 "mean_xy"               
 [7] "mean_xy_2"              "mean_xy_3"              "my_descriptives"       
[10] "my_descriptives2"       "my_descriptives2_wrong" "PatientID"             
[13] "w"                      "x"                     

Global and Local Variables

w is a global variable

d is a local variable of f(): it exists only while f() runs

f <- function(y) {
  d <- 3
  return(d * (w + y))
}

f(y = 1)
[1] 9
d
Error: object 'd' not found
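Because w is not defined inside f(), R looks it up in the global environment each time f() is called. A small sketch showing that changing the global w changes the result:

```r
w <- 2
f <- function(y) {
  d <- 3
  return(d * (w + y))
}

f(y = 1)   # 3 * (2 + 1) = 9
w <- 10    # change the global variable
f(y = 1)   # 3 * (10 + 1) = 33: f() uses the value of w at call time
```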

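Local variables take precedence over global ones

Inside f(), the local d = 3 is used instead of the global d = 4, so f(1) = 3 * (2 + 1) = 9.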

w <- 2
d <- 4

f <- function(y) {
  d <- 3
  return(d * (w + y))
}

f(y = 1)
[1] 9

Local assignment of variables

w <- 2
d <- 4 # This value of d will remain unchanged.

f <- function(y) {
  d <- 3 # This doesn't affect the value of d in the global environment
  return(d * (w + y))
}

f(y = 1)
[1] 9
d
[1] 4
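The same holds for arguments: changing an argument inside a function only changes the function's local copy, not the object you passed in. A minimal sketch; `set_first` is a hypothetical name:

```r
x <- c(1, 2, 3)

# Hypothetical example: modifying an argument inside a function
set_first <- function(v) {
  v[1] <- 100   # changes the local copy v only
  v
}

set_first(x)   # 100 2 3
x              # still 1 2 3
```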