Creating your own functions
To write your own function, you use the function function(). In the parentheses () you specify the argument(s) that the function will use (with or without defaults), and in the curly brackets { } you specify what the function should do, referring to the argument(s).
The general form is:
myFun <- function(arg1, arg2) {
  ## Here you type expressions that use the arguments
}
Example of custom function
Each line inside the function can be an object assignment, a function call, a subsetting operation, a conditional (if/else) statement, a for loop, and so on: basically anything you have learned to do in R that you want the function to do!
Below is an easy example that calculates the mean of two values (x and y):
mean_xy <- function(x, y){
  (x + y)/2
}
This function can be used in the same way as the functions you have used before:
mean_xy(2,6)
[1] 4
Or:
mean_xy(x = 2, y = 6)
[1] 4
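As noted above, the body of a function can also contain loops and conditional statements. The sketch below illustrates this with a hypothetical function, sum_positive(), which is not part of the original examples and only serves as an illustration:

```r
# Hypothetical helper: add up only the positive values of a vector.
sum_positive <- function(values) {
  total <- 0
  for (v in values) {      # loop over every element
    if (v > 0) {           # conditional: keep positive values only
      total <- total + v
    }
  }
  total                    # the value of the last expression is the output
}

sum_positive(c(3, -1, 4))  # 7
```

In practice sum(values[values > 0]) would give the same result; the loop is written out here only to show that such statements are allowed inside a function.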
Use return() in a function
To have a function output something, you must return something. Either the value of the last command is returned (as in mean_xy) or you can use return().
mean_xy <- function(x, y){
  z <- (x + y)/2
  return(z)
}
mean_xy(x = 2, y = 6)
[1] 4
Here are a few other examples.
mean_xy_2 <- function(x, y){
  z <- (x + y)/2
  x
  z
}
mean_xy_2(x = 1, y = 3)
[1] 2
Note that x is not returned. Only the last expression is returned.
mean_xy_3 <- function(x, y){
  z <- x + y
  return(x)
  z
}
mean_xy_3(x = 1, y = 3)
[1] 1
Note that z is not returned: once a return statement is encountered, nothing after that statement in the function is executed.
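Because return() stops the function immediately, it can be used to leave a function early when the input cannot be handled. The following is a minimal sketch; safe_ratio() is a hypothetical function made up for this illustration:

```r
# Hypothetical example: return early when dividing by zero would occur.
safe_ratio <- function(x, y) {
  if (y == 0) {
    return(NA)   # the function stops here; the line below is skipped
  }
  x / y          # reached only when y is not zero
}

safe_ratio(6, 3)  # 2
safe_ratio(6, 0)  # NA
```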
You can create functions with a variable number of arguments using the ... argument. For example, here’s a function that returns the mean of an arbitrary number of values, which are first combined into a vector:
mean_vector <- function(...){
  z <- mean(c(...))
  return(z)
}
mean_vector(1,2,3)
[1] 2
mean_vector(1,2,3,4,5,6,7,8,9,10)
[1] 5.5
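The ... argument can be combined with ordinary named arguments. The sketch below is an assumed variation on mean_vector(), here called mean_vector_narm(), that adds an na.rm option and passes it on to mean(). Arguments placed after ... have to be given by name when the function is called.

```r
# Hypothetical variation: accept any number of values plus a named option.
mean_vector_narm <- function(..., na.rm = FALSE) {
  values <- c(...)               # collect all unnamed inputs into one vector
  mean(values, na.rm = na.rm)    # pass the named option on to mean()
}

mean_vector_narm(1, 2, NA, 3)                # NA, because of the missing value
mean_vector_narm(1, 2, NA, 3, na.rm = TRUE)  # 2
```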
The arguments of a function do not have to be single values. A function can also take a whole vector as an argument:
mean_x <- function(x){
  z <- mean(x)
  return(z)
}
x <- c(1,2,3,4,5)
mean_x(x)
[1] 3
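A function that takes a vector does not have to reduce it to a single number. Because arithmetic in R works element by element, a function built from arithmetic operators returns one result per input value. A minimal sketch, using a hypothetical function to_fahrenheit() that is not part of the original examples:

```r
# Hypothetical example: convert temperatures, one result per input value.
to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32   # applied to every element of the vector
}

to_fahrenheit(c(0, 37, 100))  # 32.0 98.6 212.0
```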
Custom functions in R are useful if you have a set of commands that you need to use multiple times. By combining them in a function you 1) save time, 2) keep your code concise, and 3) make fewer coding mistakes.
In the next example, a function called my_descriptives is made to calculate the mean of a vector using only the non-negative values (zero or greater).
Build a custom function
my_descriptives <- function(x){
  x.trim <- x[x>=0]
  out <- mean(x.trim)
  return(out)
}
In the first line inside the function, a subset of the vector is stored in x.trim, containing only the values >=0. In the second line, the mean of x.trim is taken.
This function is useful for describing a vector in my data set that contains negative values where only positive values are allowed.
In this data set, there is a variable Ages:
data$Ages
[1] 41 45 41 44 60 33 34 43 42 59 47 52 39 30 50 -50 41 53 48
[20] 33 55 48 43 47 50
There is one value, -50, that is clearly an error.
my_descriptives(data$Ages)
[1] 44.91667
Compare the output with that of the standard function mean():
mean(data$Ages)
[1] 41.12
In the standard mean() function, the negative outlier is included and influences the mean!
The output of a function does not need to be a scalar. This version of the function my_descriptives() provides the whole summary of the variable, instead of only the mean.
my_descriptives <- function(x){
  x.trim <- x[x>=0]
  out <- summary(x.trim)
  return(out)
}
my_descriptives(data$Ages)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.00   41.00   44.50   44.92   50.00   60.00
Again, let’s compare the output to that of the standard summary() function.
summary(data$Ages)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -50.00   41.00   44.00   41.12   50.00   60.00
If you have multiple objects to return, you have to put them in an object container, like a list, vector, array or data.frame. It is not possible to return multiple individual objects like this:
return(x,y)
but it is possible to return them in a vector or list like this:
return(c(x,y))
return(list(x,y))
Here is an example of the function with multiple outputs:
my_descriptives2 <- function(x){
  x.trim <- x[x>0]
  below0 <- sum(x<0)
  meanX <- mean(x.trim)
  return(list(below0, meanX))
}
The function additionally returns how many values were negative.
my_descriptives2(data$Ages)
[[1]]
[1] 1
[[2]]
[1] 44.91667
There was 1 value below zero, as provided by the first element in the list.
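When a list is returned, giving its elements names can make the output easier to read and lets you pick out one element with $. The sketch below assumes a hypothetical variant of the function, describe_positive(), and a small made-up vector rather than the Ages variable:

```r
# Hypothetical variant: return a list with named elements.
describe_positive <- function(x) {
  x.trim <- x[x > 0]
  list(below0 = sum(x < 0),     # number of negative values
       meanX  = mean(x.trim))   # mean of the positive values
}

result <- describe_positive(c(10, 20, -5, 30))  # made-up example values
result$below0  # 1
result$meanX   # 20
```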
Notice how the function gives an error if you do not put the items in a list:
my_descriptives2_wrong <- function(x){
  x.trim <- x[x>0]
  below0 <- sum(x<0)
  meanX <- mean(x.trim)
  return(below0, meanX)
}
my_descriptives2_wrong(data$Ages)
Error in return(below0, meanX): multi-argument returns are not permitted
Default arguments of a function can be specified by filling in the default value in the function() call. Here is an example of a function with a default argument (y = 2).
calc4 <- function(x, y = 2){
  z1 <- x + y
  z2 <- x * y
  return(c(z1, z2))
}
calc4(x = 1) ## uses y = 2
[1] 3 2
calc4(x = 1, y = 3) ## overwrites default value of y
[1] 4 3
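Default values are often used for options that rarely need to change. As a further sketch, the hypothetical function rounded_mean() below (not part of the original examples) rounds a mean to one decimal unless another number of digits is requested:

```r
# Hypothetical example: an option with a sensible default.
rounded_mean <- function(x, digits = 1) {
  round(mean(x), digits = digits)
}

rounded_mean(c(1, 2, 2))              # uses digits = 1, gives 1.7
rounded_mean(c(1, 2, 2), digits = 3)  # overrides the default, gives 1.667
```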
Function environments
Each function, whether built-in or user-defined, has an associated environment, which can be thought of as a container that holds all of the objects present at the time the function is created.
When a function is created on the command line, its environment is the so-called “Global Environment”:
w <- 2
f <- function(y) {
  d <- 3
  return(d * (w + y))
}
environment(f)
<environment: R_GlobalEnv>
The function objects() (or ls()), when called from the command line, lists the objects in the Global Environment:
objects()
[1] "Ages" "calc4" "data"
[4] "f" "mean_vector" "mean_x"
[7] "mean_xy" "mean_xy_2" "mean_xy_3"
[10] "my_descriptives" "my_descriptives2" "my_descriptives2_wrong"
[13] "myFun" "PatientID" "w"
[16] "x"
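A function that is created inside another function does not get the Global Environment as its environment. Its environment is the one that existed when it was created, including the arguments of the surrounding call. The sketch below illustrates this with two hypothetical names, make_multiplier() and times_three(), which are assumptions made for this example only:

```r
# Hypothetical example: a function created inside another function.
make_multiplier <- function(factor) {
  function(x) {
    x * factor   # factor is looked up in the environment where this function was created
  }
}

times_three <- make_multiplier(3)
times_three(5)                # 15
ls(environment(times_three))  # "factor"
```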
Global and Local Variables
In the function f() defined above, the variable w is said to be global to f(), and the variable d, because it’s created within f(), is said to be local to f(). Global variables (like w) are visible from within a function, but local variables (like d) aren’t visible from outside the function. In fact, local variables are temporary and disappear when the function call is completed:
f(y = 1)
d
Asking for d after the function call gives an error: Error in eval(expr, envir, enclos) : object ‘d’ not found. This indicates that the variable d does not exist in the Global Environment.
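You can also check for a variable without triggering an error by using exists(). The sketch below uses a hypothetical function, area_of_square(), and assumes that no object called side_squared has been created elsewhere in your session:

```r
# Hypothetical example: a local variable is gone after the call finishes.
area_of_square <- function(side) {
  side_squared <- side^2   # local variable, exists only during the call
  side_squared
}

area_of_square(4)        # 16
exists("side_squared")   # FALSE, assuming the name is not used elsewhere
```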
When a global and local variable share the same name, the local variable is used:
w <- 2
d <- 4

f <- function(y) {
  d <- 3
  return(d * (w + y))
}
f(y = 1)
[1] 9
Note also that when an assignment takes place within a function, and the local variable shares its name with an existing global variable, only the local variable is affected:
w <- 2
d <- 4 # This value of d will remain unchanged.

f <- function(y) {
  d <- 3 # This doesn't affect the value of d in the global environment
  return(d * (w + y))
}
f(y = 1)
[1] 9
d
[1] 4
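If you do want a global variable to change, one straightforward approach is to return the new value and assign it yourself. This is a minimal sketch with a hypothetical function, add_one(), that is not part of the original examples:

```r
# Hypothetical example: update a global variable through the return value.
d <- 4

add_one <- function(value) {
  value <- value + 1   # changes only the local copy
  value
}

add_one(d)        # 5
d                 # still 4, the global d is untouched
d <- add_one(d)   # assign the result to update the global d
d                 # 5
```

Keeping the assignment outside the function makes it easy to see where d changes.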