Authors: Sten Willemsen, Karl Brand and Elizabeth Ribble
Objects in R
Introduction
All the things we work with in R (such as the numbers in the calculations in the previous section) are called objects. All objects have a mode and a class. The mode describes how the data is stored in R and how it can be used. Text, for example, has to be handled differently from numbers: we cannot multiply two words.
Elementary objects can be combined to form more complex objects. This leads to several types of containers, such as lists and data.frames.
The class of an object is an attribute that can be used to further specify how an object is used in R. For example it can be used to indicate how it should be printed and plotted. For many objects the class will simply be equal to the mode.
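As a small illustration of a class that differs from the mode, consider a date. This is only a sketch; the variable name a_date and the chosen date are made up for the example.

```r
# A Date is stored as a number (days since 1970-01-01),
# but its class tells R to print and handle it as a date.
a_date <- as.Date("2024-01-15")  # hypothetical example value
mode(a_date)     # "numeric"
class(a_date)    # "Date"
print(a_date)    # printed as a date because of its class
unclass(a_date)  # the underlying number, with the class removed
```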
Functions are special objects that can do things with other objects; for example, the print function displays the contents of an object in the console. Functions are the topic of another chapter, but because we obviously want to do something with the various objects we will see, we cannot avoid them altogether.
The elementary data types
The simplest variables have just a single value of a certain data type (or mode; see footnote 1). The most important data types in R are (see footnote 2):
mode         description
numeric      Possibly fractional numerical values like 1.0, 1.2 or 1e12 (that is, 10 raised to the power of 12)
character    Text, for example ‘man’, ‘woman’, ‘censored’, etc.
logical      TRUE and FALSE
Let’s examine variables of these data types in a bit more detail:
Numbers
mode(1)
[1] "numeric"
# for most basic data types the class is the same as the mode
class(1)
[1] "numeric"
class(2.14)
[1] "numeric"
# functions that start with as.* do conversion
an_integer <- as.integer(2.14)
# integers are special numeric values that only store whole numbers
an_integer
[1] 2
class(an_integer)
[1] "integer"
mode(an_integer)
[1] "numeric"
# convert to numeric again
back_to_numeric <- as.numeric(an_integer)
class(back_to_numeric)
[1] "numeric"
class(1L) # explicit integer
[1] "integer"
Text
# character values should be surrounded by quotes, either " or '
class("a")
[1] "character"
class('a')
[1] "character"
mode("a")
[1] "character"
# even numbers in quotes are interpreted as text
class("1")
[1] "character"
mode("1")
[1] "character"
Logical
A logical value can only be a yes or a no; more specifically, TRUE or FALSE.
class(TRUE)
[1] "logical"
class(FALSE)
[1] "logical"
mode(TRUE)
[1] "logical"
Often logical values are the result of comparisons:
Operator    Meaning
==          Equal to
!=          Not equal to
>           Greater than
<           Smaller than
>=          Greater than or equal to
<=          Smaller than or equal to
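A few comparisons in action may help here. This is a minimal sketch; the vector x is a made-up example.

```r
# Comparisons return logical values
3 > 2              # TRUE
"a" == "b"         # FALSE
x <- c(1, 5, 10)   # hypothetical example vector
x >= 5             # FALSE TRUE TRUE (compared element by element)
x != 5             # TRUE FALSE TRUE
```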
Logical values can be combined using & (and) and | (or), and inverted using ! (not).
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
a & b
[1] TRUE FALSE FALSE FALSE
a | b
[1] TRUE TRUE TRUE FALSE
!a
[1] FALSE FALSE TRUE TRUE
Vectors
Vectors of these data types are the most elementary data structure in R. All other structures (like the data.frame) are constructed from these vectors. In R there is also no structure that is smaller than a vector. A single number is not treated differently from a numeric vector of length ten; in fact, R sees the single number simply as a numeric vector of length 1. The length of a vector can be obtained with the function length().
A vector can be created using the function c(). (The c stands for ‘concatenate’, ‘coerce’ or ‘combine’)
c(1, 2, 3)
[1] 1 2 3
c('spam', 'ham', 'eggs')
[1] "spam" "ham" "eggs"
c("double", "quotes", "work",'like', 'single')
[1] "double" "quotes" "work" "like" "single"
c(TRUE, FALSE)
[1] TRUE FALSE
length(c(25, 4))
[1] 2
In the output we see that R shows the index of the first element of each output line between square brackets. This makes it easier to refer to a particular element, especially when the vectors are long. We can work with vectors in the same way as with single numbers. In principle, all operations are carried out element-wise.
c(1, 2, 3) * c(4, 5, 6)
[1] 4 10 18
Note that when the lengths do not match, the elements of the shorter vector are recycled.
c(1, 2, 3, 4) * c(4, 5)
[1] 4 10 12 20
# if the larger length is not an exact multiple
# of the smaller this often indicates a mistake
# and a warning is given
c(1, 2, 3) * c(4, 5)
Warning in c(1, 2, 3) * c(4, 5): longer object length is not a multiple of
shorter object length
[1] 4 10 12
We can also give names to the elements of a vector:
named_v <- c(foo = 1, bar = 2)
print(named_v)
foo bar
1 2
mode(named_v)
[1] "numeric"
class(named_v)
[1] "numeric"
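The names can be inspected, changed and removed afterwards. The following sketch repeats the definition of named_v so that it can be run on its own; the replacement name "baz" is made up for the example.

```r
named_v <- c(foo = 1, bar = 2)
names(named_v)              # "foo" "bar"
names(named_v)[2] <- "baz"  # rename the second element
names(named_v)              # "foo" "baz"
plain_v <- unname(named_v)  # the same values without names
names(plain_v)              # NULL
```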
When we try to create a vector that consists of different data types they will be converted to a data type that is capable of containing all of them. For example:
c("eleven", 12)
[1] "eleven" "12"
The second element of the resulting vector is now also of type character. In general it is better not to rely on this implicit conversion. Instead, do it explicitly, in this case by using the function as.character().
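A short sketch of explicit versus implicit conversion; the variable names are chosen only for this illustration.

```r
# Explicit conversion makes the intent visible
explicit <- c("eleven", as.character(12))
explicit                  # "eleven" "12"

# Implicit conversion also happens between other types:
# a logical combined with a number becomes a number
mixed_num <- c(TRUE, 2)
mixed_num                 # 1 2
class(mixed_num)          # "numeric"

# Converting back from text to a number is also done explicitly
as.numeric("12") + 1      # 13
```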
Another way to create a vector is by using the function vector(). For example, vector('numeric', 8) creates a numeric vector of length 8. The vector function is often used to pre-allocate room where the results of future computations can be stored.
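Pre-allocation could look as follows. This is a minimal sketch; the name squares and the computation are made up for the example, and for loops are used here before they are formally introduced.

```r
squares <- vector("numeric", 8)  # eight zeros, room for eight results
squares
for (i in seq_along(squares)) {
  squares[i] <- i^2              # fill in the results one by one
}
squares                          # 1 4 9 16 25 36 49 64
```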
Matrices
Vectors have just a single dimension: every element is characterized by a single number (its index) that indicates its position within the vector. They can be generalized to matrices, which are two-dimensional, that is, they have both rows and columns. Like simple vectors, all elements of a matrix have the same data type.
length(char_mat) #just counts the number of elements
[1] 8
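The code that created char_mat is not shown here. As an assumption, a character matrix with 8 elements could be built as in the sketch below; the 2 by 4 shape and the contents are hypothetical, only the number of elements is known from the output above.

```r
# Hypothetical definition: 8 letters arranged in 2 rows and 4 columns
# (matrix() fills the matrix column by column by default)
char_mat <- matrix(letters[1:8], nrow = 2, ncol = 4)
char_mat
dim(char_mat)     # 2 4 (rows, columns)
mode(char_mat)    # "character"
length(char_mat)  # 8
```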
Arrays
We do not have to stop with two dimensions. Arrays are even more general than matrices as they can have any number of dimensions. All elements have to be of the same type.
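A three-dimensional array can be sketched like this; the name arr and the dimensions are chosen only for illustration.

```r
# 24 numbers arranged in 2 rows, 3 columns and 4 "layers"
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)       # 2 3 4
length(arr)    # 24
arr[1, 2, 3]   # 15: the element in row 1, column 2, layer 3
```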
Lists
Elements of a vector, matrix or array are always of the same type. A list differs from a vector by allowing its elements to be of different types. We can make a list using the function list().
list1 <- list("eleven", 12)
mode(list1)
[1] "list"
class(list1)
[1] "list"
list2 <-list(c(1, 2, 3), c('foo', 'bar'))
We can also assign a name to the elements of a list:
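The code for this step is not shown here. The sketch below is an assumption, reconstructed from the str(list4) output that follows, so the original definition may have differed in detail.

```r
# Names are given in the same way as for vectors: name = value
list4 <- list(
  numbers = c(1, 2, 3),
  chars   = c("aap", "noot"),
  sublist = list(1, "a")
)
names(list4)   # "numbers" "chars" "sublist"
list4$chars    # a named element can be retrieved with $
```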
A useful function for lists is str(), which displays the structure of an object.
str(list4)
List of 3
$ numbers: num [1:3] 1 2 3
$ chars : chr [1:2] "aap" "noot"
$ sublist:List of 2
..$ : num 1
..$ : chr "a"
data.frames
A data.frame is a rectangular table in which every column contains a variable and every row an observation. Data.frames are similar to spreadsheets: a data.frame is a list of columns (vectors) of equal length. Notably, the columns can be of different classes, unlike those of a matrix or array.
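The code that created the data.frame printed below is not shown. One construction that is consistent with that output is sketched here; it is an assumption, with the column names and values taken from the printed table. The argument stringsAsFactors = FALSE is spelled out so that the text column stays of type character in older versions of R as well.

```r
# Assumed construction of my_df, reconstructed from the printed output
my_df <- data.frame(
  vec  = rep(c(12, 48), times = 6),  # 12 48 12 48 ...
  lets = letters[1:12],              # "a" to "l"
  stringsAsFactors = FALSE
)
my_df
```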
   vec lets
1   12    a
2   48    b
3   12    c
4   48    d
5   12    e
6   48    f
7   12    g
8   48    h
9   12    i
10  48    j
11  12    k
12  48    l
dim(my_df)
[1] 12 2
class(my_df)
[1] "data.frame"
mode(my_df) # a data.frame is a special kind of list
[1] "list"
str(my_df)
'data.frame': 12 obs. of 2 variables:
$ vec : num 12 48 12 48 12 48 12 48 12 48 ...
$ lets: chr "a" "b" "c" "d" ...
Most data sets come in the form of data.frames or can be converted to them so we are going to see them frequently later in the course.
factors
A factor is a special kind of vector for categorical data. Internally a factor stores integer codes, one for every category. Each unique code has an associated ‘label’ that tells us what the code means. Factors are frequently used when we model categorical data. An advantage of a factor over a character vector is that we can limit the number of possible outcomes, which also makes mistakes due to typing errors less likely. Factors can be created by means of the function factor().
f <- c('male', 'female', 'male')
# store the result of factor(); f itself is still a character vector
f <- factor(f)
f
[1] male female male
Levels: female male
levels(f)
[1] "female" "male"
When a factor is displayed R also shows us the unique values the variable can take. These are called the ‘levels’ of the factor.
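The integer codes and the protection against typing errors can be seen in a small sketch. The variable names and the misspelled value "femal" are made up for this illustration.

```r
sex <- factor(c("male", "female", "male"), levels = c("female", "male"))
as.integer(sex)   # 2 1 2: the underlying integer codes
levels(sex)       # "female" "male": the labels that belong to the codes
table(sex)        # counts per level

# A value that is not one of the allowed levels becomes NA,
# so a typing error does not silently create a new category
typo <- factor(c("male", "femal"), levels = c("female", "male"))
is.na(typo)       # FALSE TRUE
```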
functions
Functions are used to do something. We have already seen several of them, like mode, class and as.integer, and you will see lots more. Note that in R functions are objects too.
mode(mode) # like all objects functions have a mode
[1] "function"
class(mode)
[1] "function"
print(mode)
function (x)
{
if (is.expression(x))
return("expression")
if (is.call(x))
return(switch(deparse(x[[1L]])[1L], `(` = "(", "call"))
if (is.name(x))
"name"
else switch(tx <- typeof(x), double = , integer = "numeric",
closure = , builtin = , special = "function", tx)
}
<bytecode: 0x000001bb6c151cc8>
<environment: namespace:base>
# do not worry if you do not understand the meaning of what is printed yet
Note that operators are functions too. When you want to use them in the same way as normal functions, just put them between back-ticks (“`”):
mode(`+`)
[1] "function"
`+`(1, 1)
[1] 2
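Because functions (including operators) are objects, they can be stored under another name or handed to other functions. The sketch below illustrates this; the name add is made up, and Reduce() is a base R function that has not been introduced in this text.

```r
add <- `+`                  # store the operator under another name
add(2, 3)                   # 5
Reduce(`+`, c(1, 2, 3, 4))  # 10: the operator is passed as an argument
                            # and applied to the elements one by one
```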
Base R has many functions, but many more come from various extension packages. These can be installed using install.packages():
install.packages("packageName",
                 lib = "/directory/to/my custom R library",
                 repos = "http://cran.xl-mirror.nl")
# usually lib and repos can be omitted (left at the default)
The package name must be quoted when installing.
library("packageName") ## quotes are optional when loading a package
Functions will be discussed in more detail later.
Missing values
Whenever the value of a variable is missing this is denoted by NA in R. Usually this means that the value exists but we do not know it. Sometimes the result of a calculation is not finite (for example when we divide a positive or negative number by zero). In this case the result is defined to be Inf or -Inf in R. When a value cannot be computed at all (for example when we divide zero by zero) R will define the result as NaN, which stands for ‘Not a Number’. Finally, R sometimes uses the special value NULL to indicate that a variable is not yet defined. Here we will mostly deal with data that is just missing, that is NA.
# NULL is dropped when it is combined into a vector, so a has length 4
a <- c(1, -1, 0, NA, NULL)
a / 0
[1] Inf -Inf NaN NA
is.na(a)
[1] FALSE FALSE FALSE TRUE
is.finite(a)
[1] TRUE TRUE TRUE FALSE
is.null(a) # note this looks at the whole object
[1] FALSE
l <- list(foo = a, b = c('b'))
l[1] <- NULL # deletes elements from a list
l
$b
[1] "b"
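In practice NA values have to be handled explicitly in most calculations. A minimal sketch with made-up measurements:

```r
x <- c(4, NA, 10)       # hypothetical measurements, one of them missing
mean(x)                 # NA: the mean is unknown if a value is unknown
mean(x, na.rm = TRUE)   # 7: missing values are removed first
sum(is.na(x))           # 1: the number of missing values
x[!is.na(x)]            # 4 10: only the observed values
is.na(NaN)              # TRUE: is.na() also flags NaN
```

Many summary functions in base R, such as sum(), min() and max(), accept the same na.rm argument.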
1. mode, storage.mode and type are closely related concepts; we will not discuss the differences here. See also ?mode.
2. There are a few more, like complex and raw, which we will not discuss.