Common Objects

Authors: Sten Willemsen, Karl Brand and Elizabeth Ribble

Objects in R

Introduction

All the things we work with in R (such as the numbers in the calculations in the previous section) are called objects. All objects have a mode and a class. The mode describes how the data is stored in R and how it can be used. Text variables, for example, have to be handled differently from numbers: we cannot multiply two words.

Elementary objects can be combined with each other to form more complex objects. This leads to several types of containers, like lists and data.frames.

The class of an object is an attribute that can be used to further specify how an object is used in R. For example it can be used to indicate how it should be printed and plotted. For many objects the class will simply be equal to the mode.
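
As a small illustration of an object whose class differs from its mode, consider a date. This is only a sketch using the base R function as.Date(); the variable name a_date is made up for this example.

```r
# a date is stored as a number, but its class tells R to treat it as a date
a_date <- as.Date("2020-01-31")
mode(a_date)            # how the data is stored
class(a_date)           # how R should handle the object
print(a_date)           # printed as a date because of its class
print(unclass(a_date))  # the underlying number (days since 1970-01-01)
```

Removing the class with unclass() leaves the stored number untouched; only the way R presents the object changes.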

Functions are special objects that can do things with other objects; for example, the print function displays the contents of an object in the console. Functions are the topic of another chapter, but because we obviously want to do something with the various objects we will see, we cannot avoid them altogether.

The elementary data types

The simplest variables just have a single value of a certain data type (or mode; see footnote 1). The most important data types in R are (see footnote 2):

mode description
numeric Possibly fractional numerical values like 1.0, 1.2 or 1e12 (that is 10 raised to the power of 12)
character Text, for example ‘man’, ‘woman’, ‘censored’, etc.
logical TRUE and FALSE

Let’s examine variables of these data types in a bit more detail:

Numbers

mode(1)    
[1] "numeric"
# for most basic data types the class is the same as the mode
class(1)    
[1] "numeric"
class(2.14)
[1] "numeric"
# functions that start with as.* do conversion
an_integer <- as.integer(2.14)
# integers are special numeric values that only store whole 
# numbers
an_integer 
[1] 2
class(an_integer)
[1] "integer"
mode(an_integer)
[1] "numeric"
# convert to numeric again
back_to_numeric <- as.numeric(an_integer)
class(back_to_numeric)
[1] "numeric"
class(1L) # explicit integer
[1] "integer"

Text

# character values should be surrounded by quotes, either " or '
class("a")
[1] "character"
class('a')
[1] "character"
mode("a")
[1] "character"
# even numbers in quotes are interpreted as text
class("1")
[1] "character"
mode("1")
[1] "character"

Logical

A logical value can only be a yes or a no; more specifically, TRUE or FALSE.

class(TRUE)
[1] "logical"
class(FALSE)
[1] "logical"
mode(TRUE)
[1] "logical"

Often logical values are the result of comparisons:

Operator Meaning
== Equal to
!= Not equal to
> Greater than
< Smaller than
>= Greater than or equal to
<= Smaller than or equal to
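
The operators in the table can be tried out as follows. This is a minimal sketch; the value stored in x is an arbitrary example.

```r
# each comparison returns a logical value
x <- 5   # arbitrary example value
x == 5
x != 5
x > 3
x <= 4
# comparisons also work on several values at once
c(1, 5, 10) >= x
```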

Logical values can be combined using & (and) and | (or), and inverted using ! (not).

a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

a & b
[1]  TRUE FALSE FALSE FALSE
a | b
[1]  TRUE  TRUE  TRUE FALSE
!a
[1] FALSE FALSE  TRUE  TRUE

Vectors

Vectors of these data types are the most elementary data structure in R. All other structures (like the data.frame) are constructed using these vectors. In R there is also no structure that is smaller than a vector. A single number is not treated differently from a numeric vector of length ten; in fact, R sees the single number simply as a numeric vector of length 1. The length of a vector can be obtained by using the function length().

A vector can be created using the function c(). (The c stands for ‘concatenate’, ‘coerce’ or ‘combine’)

c(1, 2, 3)
[1] 1 2 3
c('spam', 'ham', 'eggs')
[1] "spam" "ham"  "eggs"
c("double", "quotes", "work",
  'like', 'single')
[1] "double" "quotes" "work"   "like"   "single"
c(TRUE, FALSE)
[1]  TRUE FALSE
length(c(25, 4))
[1] 2

In the output we see that R shows the index of the first element of each printed line between square brackets. This makes it easier to refer to a particular element, especially when the vectors are long. We can work with vectors in the same way as with single numbers. In principle, all operations are carried out in an element-wise fashion.

c(1, 2, 3) * c(4, 5, 6)
[1]  4 10 18

Note that when the lengths do not match, the elements of the shorter vector are recycled.

c(1, 2, 3, 4) * c(4, 5)
[1]  4 10 12 20
# if the larger length is not an exact multiple
# of the smaller this often indicates a mistake
# and a warning is given
c(1, 2, 3) * c(4, 5) 
Warning in c(1, 2, 3) * c(4, 5): longer object length is not a multiple of
shorter object length
[1]  4 10 12

We can also give names to the elements of a vector:

named_v <- c(foo=1, bar=2)
print(named_v)
foo bar 
  1   2 
mode(named_v)
[1] "numeric"
class(named_v)
[1] "numeric"

When we try to create a vector that consists of different data types they will be converted to a data type that is capable of containing all of them. For example:

c("eleven", 12)
[1] "eleven" "12"    

The second element of the resulting vector is now also of type character. In general it is better not to rely on this implicit conversion. Instead, do it explicitly, in this case by using the function as.character().
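
The following sketch shows a few more implicit conversions next to the explicit alternative. The variable name mixed is only chosen for this example.

```r
# logical and numeric values combine to numeric (TRUE becomes 1)
c(TRUE, 2)
# anything combined with text becomes character
c(TRUE, "a")
# explicit conversion makes the intention clear
mixed <- c("eleven", as.character(12))
mixed
class(mixed)
```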

Another way to create a vector is by using the function vector(). For example, vector('numeric', 8) creates a numeric vector of length 8. The vector function is often used to pre-allocate room where the results of future computations can be stored.
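
Pre-allocation could look roughly like this. The loop and the variable name squares are assumptions made for this illustration; loops themselves are not discussed in this section.

```r
# pre-allocate a numeric vector of length 8 (filled with zeros)
squares <- vector('numeric', 8)
squares
# store the result of each computation in its own position
for (i in 1:8) {
  squares[i] <- i^2
}
squares
```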

Matrices

Vectors have just a single dimension: every element is characterized by a single number (index) that indicates its position within the vector. They can be generalized to matrices, which are two-dimensional, that is, they have both rows and columns. As with simple vectors, all elements of a matrix have the same type.

my_matrix <- matrix(data = c(1,2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))
my_matrix
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
[11,]   11
[12,]   12
dim(my_matrix) # second dimension (# columns) is 1
[1] 12  1
matrix2 <- matrix(data = c(1,2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                  nrow = 3)
matrix2
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
dim(matrix2)
[1] 3 4
char_mat <- matrix(data = c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'), nrow = 2)
class(char_mat)
[1] "matrix" "array" 
mode(char_mat)
[1] "character"
length(char_mat) #just counts the number of elements 
[1] 8
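
In the examples above the matrix is filled column by column. As a small sketch, the byrow argument of matrix() changes this, and a single element can be picked out with a row and a column index. The name by_row is made up for this example.

```r
# byrow = TRUE fills the matrix row by row instead of column by column
by_row <- matrix(data = 1:12, nrow = 3, byrow = TRUE)
by_row
dim(by_row)
# select the element in the second row and third column
by_row[2, 3]
```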

Arrays

We do not have to stop with two dimensions. Arrays are even more general than matrices as they can have any number of dimensions. All elements have to be of the same type.

letters[1:12]
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l"
my_array <- array(letters[1:12], c(2, 2, 3))
my_array
, , 1

     [,1] [,2]
[1,] "a"  "c" 
[2,] "b"  "d" 

, , 2

     [,1] [,2]
[1,] "e"  "g" 
[2,] "f"  "h" 

, , 3

     [,1] [,2]
[1,] "i"  "k" 
[2,] "j"  "l" 
class(my_array)
[1] "array"
mode(my_array)
[1] "character"

Lists

Elements of a vector, matrix or array are always of the same type. A list differs from a vector in that its elements may be of different types. We can make a list using the function list().

list1 <- list("eleven", 12)
mode(list1)
[1] "list"
class(list1)
[1] "list"
list2 <- list(c(1, 2, 3), c('foo', 'bar'))

We can also assign a name to the elements of a list:

list3 <- list(numbers=c(1, 2, 3), chars=c('foo', 'bar'))

It is also possible for a list to contain other lists.

list4 <- list(numbers=c(1, 2, 3), chars=c('aap', 'noot'), 
               sublist=list(1,'a'))

A useful function for lists is str(). It shows the structure of an object.

str(list4)
List of 3
 $ numbers: num [1:3] 1 2 3
 $ chars  : chr [1:2] "aap" "noot"
 $ sublist:List of 2
  ..$ : num 1
  ..$ : chr "a"

data.frames

A data.frame is a rectangular table in which every column contains a variable and every row an observation. It is similar to a spreadsheet: a series (list) of columns (vectors) of equal length. Notably, the columns can be of different classes, unlike those of a matrix or array.

# the shorter column vec is recycled to match the 12 letters
my_df <- data.frame("vec" = c(12, 48), "lets" = letters[1:12])
my_df
   vec lets
1   12    a
2   48    b
3   12    c
4   48    d
5   12    e
6   48    f
7   12    g
8   48    h
9   12    i
10  48    j
11  12    k
12  48    l
dim(my_df)
[1] 12  2
class(my_df)
[1] "data.frame"
mode(my_df) # a data.frame is a special kind of list
[1] "list"
str(my_df)
'data.frame':   12 obs. of  2 variables:
 $ vec : num  12 48 12 48 12 48 12 48 12 48 ...
 $ lets: chr  "a" "b" "c" "d" ...
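
To see that a data.frame behaves like a list of equal-length columns, we can inspect a small example. The data and the name patients are invented for this sketch.

```r
# invented example data with three columns of different types
patients <- data.frame(age = c(34, 51, 27),
                       sex = c('male', 'female', 'female'),
                       smoker = c(TRUE, FALSE, FALSE))
is.list(patients)        # a data.frame is built on a list
length(patients)         # the number of columns (list elements)
nrow(patients)           # the number of observations
sapply(patients, class)  # every column has its own class
```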

Most data sets come in the form of data.frames or can be converted to them so we are going to see them frequently later in the course.

Factors

A factor is a special kind of vector for categorical data. The factor stores integers, one for every category. Each unique value has an associated ‘label’ that tells us what the code means. Factors are frequently used when we model categorical data. An advantage of a factor over a character vector is that we can limit the number of possible outcomes. This also makes mistakes due to typing errors less likely. Factors can be created by means of the function factor().

f <- factor(c('male', 'female', 'male'))
f
[1] male   female male  
Levels: female male
levels(f)
[1] "female" "male"  

When a factor is displayed R also shows us the unique values the variable can take. These are called the ‘levels’ of the factor.
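
The following sketch illustrates how a factor limits the possible outcomes. The levels argument of factor() fixes the allowed categories in advance; the misspelled value 'mael' is deliberately included, and the name sex is only chosen for this example.

```r
# fix the set of allowed categories in advance
sex <- factor(c('male', 'female', 'mael'),
              levels = c('female', 'male'))
sex              # the misspelled value is not an allowed level and becomes NA
levels(sex)
as.integer(sex)  # the integer codes stored behind the labels
```

A value that is not among the levels is turned into NA, R's marker for a missing value, so a typing error shows up instead of silently creating a new category.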

Functions

Functions are used to do something. We have already seen several of them, like mode, class and as.integer, and you will see lots more. Note that in R functions are objects too.

mode(mode)   # like all objects functions have a mode
[1] "function"
class(mode)
[1] "function"
print(mode)  
function (x) 
{
    if (is.expression(x)) 
        return("expression")
    if (is.call(x)) 
        return(switch(deparse(x[[1L]])[1L], `(` = "(", "call"))
    if (is.name(x)) 
        "name"
    else switch(tx <- typeof(x), double = , integer = "numeric", 
        closure = , builtin = , special = "function", tx)
}
<bytecode: 0x000001bb6c151cc8>
<environment: namespace:base>
# do not worry if you do not understand the meaning of what is printed yet

Note that operators are functions too. When you want to use them in the same way as normal functions, just put them between back-ticks (“`”):

mode(`+`)   
[1] "function"
`+`(1, 1)
[1] 2

Base R has many functions, but many more come from various extension packages. These can be installed using install.packages():

install.packages("packageName",
                 lib = "/directory/to/my custom R library",
                 repos = "http://cran.xl-mirror.nl")

# usually lib and repos can be omitted (left at the default)

The package name must be quoted when installing. Once installed, a package is loaded using library():

library("packageName")      ## quotes are optional when loading a package

Functions will be discussed in more detail later.

Missing values

Whenever the value of a variable is missing this is denoted by NA in R. Usually this means that the value exists, but we do not know it. Sometimes the result of a calculation is not finite (for example when we divide a positive or negative number by zero). In this case the result is defined to be Inf or -Inf in R. When a value cannot be computed at all (for example when we divide zero by zero) R will define the result as NaN, which stands for ‘Not a Number’. Finally, R sometimes uses the special value NULL to indicate that a variable is not yet defined. Here we will mostly deal with data that is just missing, that is NA.

a <- c(1, -1, 0, NA, NULL) # NULL is dropped by c(), so a has length 4
a/0
[1]  Inf -Inf  NaN   NA
is.na(a)
[1] FALSE FALSE FALSE  TRUE
is.finite(a)
[1]  TRUE  TRUE  TRUE FALSE
is.null(a) # note this looks at the whole object
[1] FALSE
l <- list(foo= a, b=c('b'))
l[1] <- NULL # deletes elements from a list
l
$b
[1] "b"
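
The differences between the special values can be checked with a few base R functions. This is only a sketch; the vector x is made up for the example.

```r
x <- c(1, NA, 0/0, 1/0)   # a number, NA, NaN and Inf
x
is.na(x)      # TRUE for both NA and NaN
is.nan(x)     # TRUE only for NaN
is.finite(x)  # FALSE for NA, NaN and Inf
# NULL is an empty object, whereas NA is a value of length one
length(NULL)
length(NA)
```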

  1. mode, storage.mode and type are closely related concepts; we will not discuss the differences here. See also ?mode.

  2. There are a few more, like complex and raw, which we will not discuss.