17 Base R (Part I)

17.1 Topics Covered

Base R syntax and variables
Base R data types: numeric, integer, character, logical, and date
Base R data models: vector, matrix, array, data frame, and list
Arithmetic and comparison operators
Useful base R functions
Reading and writing tabulated data
Using comments

17.2 Introduction

This first chapter of this appendix introduces base R and assumes that you have no prior experience using the language. If you have a solid understanding of R, feel free to skip this chapter or just skim it. With that said, we feel strongly that a foundational understanding of the R language, and coding etiquette in general, goes a long way.

17.2.1 Getting Help

Base R and R packages have built-in help, or documentation, which can be accessed using the help() function. Inside the help() function, place the name of the function, class, or data object. Quotes around the name are optional for most names but are required for operators and reserved words, such as help("+") or help("if"). The ? operator, as in ?c, is a shortcut for help(). In RStudio, the documentation loads in the Help tab. For functions, example code is commonly provided. In the example below, we are obtaining help for the base R c() function, which is used to combine objects into a single object. For example, two vectors can be merged into a single vector.

The help documentation is also useful for finding the default values of a function's arguments.

help(c)
?c
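A few variations on these calls may be useful. This is a sketch rather than a complete tour of the help system: args() and formals() are base R functions not otherwise covered in this chapter, used here to list a function's parameters and their default values without opening the Help tab.

```r
# Quoted and unquoted names open the same help page
help("c")

# Quotes are required for operators and reserved words
?"+"
help("if")

# List a function's parameters and their defaults without opening the Help tab
args(read.csv)

# Extract a single default value: read.csv() separates fields with a comma
formals(read.csv)$sep
```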

17.2.2 Defining Variables

Variables are used to store values or data objects for use in your code. R traditionally uses the <- operator for variable assignment; however, you can also use =. In the code block below, we are assigning the number 1 to the variable x, the character string "GIS" to the variable y, and the logical TRUE to the variable z.

There are a few rules for variable names:

Cannot use reserved words, or those that have a special use in R
Cannot start with a number or an underscore (or with a dot followed by a number)
Only dots/periods and underscores can be used as special characters in variable names

We like to keep variable names short so that they are easy to work with and call. Variables can be overwritten if they are used more than once. So, all of your variables must have unique names if you want to maintain them throughout a script. Variable names are also case sensitive; for example, x and X are treated as two separate variables.

The print() function can be used to return, or print, the content of a variable in the console.

x <- 1
y <- "GIS"
z <- TRUE

print(x)

[1] 1

print(y)

[1] "GIS"

print(z)

[1] TRUE
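The naming rules above can be checked directly. The sketch below shows case sensitivity and overwriting, and uses base R's make.names(), which converts strings into syntactically valid names, to show how invalid names would have to change. The names site_id.2, "2site", and "my var" are illustrative only.

```r
x <- 1
X <- 2          # a separate variable: names are case sensitive
print(x == X)   # FALSE

x <- 10         # reusing a name overwrites its earlier value
print(x)        # 10

site_id.2 <- "A"  # dots, underscores, and non-leading digits are allowed
# 2site <- "A"    # invalid: a name cannot start with a number (syntax error)

# make.names() shows how R would repair invalid names
print(make.names(c("2site", "my var", "site_id.2")))
# "X2site"    "my.var"    "site_id.2"
```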

17.3 Data Types

There are several base data types in R, which are described in Table 17.1. Note that numbers are not quoted while characters are quoted. If a number is quoted, it will be treated as a character, so mathematical operations cannot be performed on it. By default, numbers are treated as real numbers (stored as double-precision, or "double", values). For them to be treated as whole numbers, the integer type must be defined, for example with as.integer() or by adding an L suffix to the number (5L). Complex numbers have both real and imaginary components; we do not make use of this data type in this text. The character type requires that the string be placed in quotes. Logicals, either TRUE or FALSE, must be uppercase and not quoted.

In order to determine the data type of a variable, you can use the typeof() or class() function. Table 17.1 also provides the functions used to define, convert, and check the data type of a variable. Functions beginning with as. are used for conversion between types while functions beginning with is. are used to check the type and return a logical TRUE or FALSE. We demonstrate the use of these functions below.

We will save our discussion of factors for later in the chapter.

The Date data type can be complex since date data formatting is not standardized. Note the need to define the date formatting in our example. We discuss the specifics of working with dates as necessary within the text.

**Table 17.1.** R base data types.
Data Type	Description	Assign	Convert	Check
Numeric	Number treated as a number (can perform mathematical operations on it); treated as having decimal values by default	numeric()	as.numeric()	is.numeric()
Integer	Number treated as a whole number without decimal values	integer()	as.integer()	is.integer()
Complex	Number with real and imaginary components	complex()	as.complex()	is.complex()
Character	Character string or numbers not treated as numbers; must be quoted	character()	as.character()	is.character()
Factor	Characters with pre-defined levels; has associated numeric code; can be unordered or ordered	factor()	as.factor()	is.factor()
Logical	Boolean TRUE or FALSE; must be stated without quotes and in uppercase	logical()	as.logical()	is.logical()
Date	Calendar date; different date formats supported	as.Date()	as.Date()	inherits(x, "Date")
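The labels in Table 17.1 do not always match what typeof() reports: R's numeric type is stored as a double-precision value, so class() and typeof() return different labels for the same number. The short sketch below also shows the L suffix for integer literals and what happens when arithmetic is attempted on a quoted number; tryCatch() is used only so the error message can be printed without stopping the script.

```r
print(class(1))       # "numeric"
print(typeof(1))      # "double": numbers are stored as real values by default
print(typeof(1L))     # "integer": the L suffix creates an integer
print(is.numeric(1L)) # TRUE: integers also count as numeric

# A quoted number is a character string, so arithmetic fails
msg <- tryCatch("1" + 1, error = function(e) conditionMessage(e))
print(msg)            # "non-numeric argument to binary operator"
```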

Before moving on, we will experiment with data type conversion. Here, you can see that the variable x is initially defined as a numeric type, which typeof() reports as "double". Using as.integer(), we convert it to an integer data type. It is also possible to convert numeric data to a character string using as.character(). Characters that represent numbers can be converted back to numeric using as.numeric(). Characters can be converted to dates using as.Date(); however, you will need to provide a format string that defines how the dates are written (here, "%m/%d/%Y" for month/day/four-digit year).

x <- 1
typeof(x)

[1] "double"

y <- as.integer(x)
typeof(y)

[1] "integer"

z <- as.character(y)
typeof(z)

[1] "character"

w <- as.numeric(z)
typeof(w)

[1] "double"

a <- TRUE
typeof(a)

[1] "logical"

d <- c("01/20/2020")
typeof(d)

[1] "character"

d2 <- as.Date(d, "%m/%d/%Y")
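The conversion above produces no printed output, so the sketch below inspects the result. It assumes the same input string and format; inherits() is the base R way to test for the Date class, and adding a number to a Date adds that many days.

```r
d2 <- as.Date("01/20/2020", format = "%m/%d/%Y")
print(d2)                   # "2020-01-20": printed in year-month-day form
print(class(d2))            # "Date"
print(inherits(d2, "Date")) # TRUE
print(as.numeric(d2))       # 18281: stored as days since 1970-01-01
print(d2 + 30)              # "2020-02-19": adding a number adds days
```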

When converting data to a logical type, the result is not always obvious. As demonstrated below, when converting a number to a logical, 0 yields FALSE and any other value yields TRUE. The character string “TRUE” yields TRUE while the character string “FALSE” yields FALSE. NA yields NA, and a character string other than “TRUE” or “FALSE” (or a few variants, such as “true” and “T”) also yields NA.

x <- 0
print(as.logical(x))

[1] FALSE

x <- 1
print(as.logical(x))

[1] TRUE

x <- 2
print(as.logical(x))

[1] TRUE

x <- NA
print(as.logical(x))

[1] NA

x <- "TRUE"
print(as.logical(x))

[1] TRUE

x <- "FALSE"
print(as.logical(x))

[1] FALSE

x <- "Some text"
print(as.logical(x))

[1] NA
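The same rules can be tested on whole vectors at once. This sketch shows which character spellings as.logical() recognizes, how negative and decimal numbers are handled, and the reverse conversion from logical to numeric.

```r
# Character strings that as.logical() recognizes, plus some it does not
print(as.logical(c("TRUE", "true", "T", "True", "yes", "1")))
# TRUE TRUE TRUE TRUE NA NA

# Any nonzero number, including negatives and decimals, becomes TRUE
print(as.logical(c(-3.5, 0.1, 0)))
# TRUE TRUE FALSE

# Going the other way, TRUE and FALSE convert to 1 and 0
print(as.numeric(c(TRUE, FALSE)))
# 1 0
```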

17.4 Data Models

It would be very limiting if we could only work with single numbers, character strings, logicals, or dates. In other words, there is a need to be able to combine single data points and collectively represent them as a variable. Table 17.2 lists and describes the base R data models, which are discussed in this section.

**Table 17.2.** R data models.
Data Model	Description	Assign	Convert	Check
Vector	One-dimensional array	vector()	as.vector()	is.vector()
Matrix	Two-dimensional array	matrix()	as.matrix()	is.matrix()
Array	n-dimensional array	array()	as.array()	is.array()
Data Frame	Table where each column can have a different data type	data.frame()	as.data.frame()	is.data.frame()
List	Container object to hold other data	list()	as.list()	is.list()

17.4.1 Vectors

A vector is a one-dimensional array. Instead of storing a single piece of information, you can provide a set of values or character strings. Note that all data components must be of the same type (for example, numeric, character, or logical). If you combine different data types, R does not raise an error; instead, it coerces all elements to a single common type (for example, mixing numbers and character strings yields a character vector). This is different from some other programming languages. In the example, we have created objects to store numeric (x), character (y), and logical (z) data. Vectors that store only a single piece of data or a constant are called scalars; however, they are treated the same as vectors in R, so it is not necessary to make this distinction.

The c(), or combine, function is used to combine pieces of data into a single vector. It is one of the most commonly used functions in R, so you’ll get used to seeing it and using it.

x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c("GIS", "Spatial", "Analytics", "R", "Data Science", "Remote Sensing")
z <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
print(x)

[1] 1 2 3 4 5 6 7

print(y)

[1] "GIS"            "Spatial"        "Analytics"      "R"             
[5] "Data Science"   "Remote Sensing"

print(z)

[1]  TRUE FALSE  TRUE  TRUE FALSE
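Although all elements of a vector share one type, c() does not refuse mixed input: it converts (coerces) everything to a common type. The sketch below shows two cases and confirms that a single value is simply a vector of length one.

```r
v <- c(1, "GIS", TRUE)
print(v)             # "1" "GIS" "TRUE": everything became character
print(typeof(v))     # "character"

w <- c(1, TRUE, FALSE)
print(w)             # 1 1 0: the logicals became numbers
print(typeof(w))     # "double"

s <- 5
print(length(s))     # 1: a "scalar" is just a vector of length one
print(is.vector(s))  # TRUE
```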

To extract specific pieces of data, you can use square bracket notation. R starts indexing at 1 as opposed to 0, which is more common in other programming languages, such as Python and JavaScript. To extract a single data element, provide its index in the square brackets. You can also extract a range of contiguous values by providing the start and end indices separated by a colon. When doing so, the data points at the start and end indices will be included in the subset along with all data points between them. In other languages, such as Python, it is common to not include the value at the last provided index. If you want to extract discontinuous data points, you can use the c() function to provide a vector of indices.

print(y[2])

[1] "Spatial"

print(y[3:5])

[1] "Analytics"    "R"            "Data Science"

print(y[c(1, 3, 5)])

[1] "GIS"          "Analytics"    "Data Science"

print(y[c(1, 3:6)])

[1] "GIS"            "Analytics"      "R"              "Data Science"  
[5] "Remote Sensing"
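The colon notation works because 3:5 itself creates an index vector that includes both endpoints, so y[3:5] is the same as listing the indices explicitly. The sketch below also uses length() to extract the last element without knowing the vector's size in advance.

```r
y <- c("GIS", "Spatial", "Analytics", "R", "Data Science", "Remote Sensing")
idx <- 3:5
print(idx)                              # 3 4 5: both endpoints are included
print(identical(y[3:5], y[c(3, 4, 5)])) # TRUE
print(y[length(y)])                     # "Remote Sensing": the last element
```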

17.4.2 Matrices

A matrix is a two-dimensional array (or, values stored in rows and columns). In GIS and remote sensing, this is similar to a single-band raster grid where each cell is defined by a row and column combination. All the cells in a matrix must have the same data type (for example, a matrix of numeric values). A matrix is generated using the matrix() function. You can provide a set of values, the number of rows, and the number of columns. The byrow argument determines how the matrix is populated with the provided data. If set to TRUE, the values fill across the rows sequentially; in other words, all columns in a row are filled before moving on to the next row. If set to FALSE (the default), the values fill down the columns sequentially, as in the first example below. You can also provide row and column names using the dimnames parameter, which accepts a list containing a vector of row names followed by a vector of column names.

m <- matrix(1:50, nrow=10, ncol=5)
print(m)

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1   11   21   31   41
 [2,]    2   12   22   32   42
 [3,]    3   13   23   33   43
 [4,]    4   14   24   34   44
 [5,]    5   15   25   35   45
 [6,]    6   16   26   36   46
 [7,]    7   17   27   37   47
 [8,]    8   18   28   38   48
 [9,]    9   19   29   39   49
[10,]   10   20   30   40   50

data1 <- c("A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3")
rNames <- c("1", "2", "3")
cNames <- c("A", "B", "C")
m1 <- matrix(data1, nrow=3, ncol=3, byrow=TRUE, dimnames=list(rNames, cNames))
print(m1)

  A    B    C   
1 "A1" "B1" "C1"
2 "A2" "B2" "C2"
3 "A3" "B3" "C3"
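To see the effect of byrow side by side, the sketch below fills the same six values into a two-by-three matrix both ways. The expected layouts are shown as comments.

```r
# Default (byrow = FALSE): values fill down each column in turn
m_col <- matrix(1:6, nrow = 2, ncol = 3)
print(m_col)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE: values fill across each row in turn
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(m_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```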

Since we are now working in two-dimensions, we need to define two indices to extract a specific row/column or cell location from the matrix. The first value represents the row while the second value represents the column. So, [2,1] would indicate the value at row 2 and column 1. A blank in either position indicates to select all rows or all columns. So, selecting data is similar for matrices and vectors except that we need to specify a different number of indices. It is also possible to subset based on the row or column names as opposed to indices.
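As a quick check of the row-then-column rule, the sketch below indexes the numeric matrix m from the first example (recreated here so the block runs on its own). dim() returns the number of rows followed by the number of columns.

```r
m <- matrix(1:50, nrow = 10, ncol = 5)
print(m[2, 1])  # 2: row 2, column 1
print(m[2, 3])  # 22: row 2, column 3
print(dim(m))   # 10 5: number of rows, then columns
print(m[10, ])  # 10 20 30 40 50: all columns of row 10
```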

m1[1,]

   A    B    C 
"A1" "B1" "C1"

m1[,1]

   1    2    3 
"A1" "A2" "A3"

m1[1,2:3]

   B    C 
"B1" "C1"

m1[1:2,1:2]

  A    B   
1 "A1" "B1"
2 "A2" "B2"

m1[1, c(1,3)]

   A    C 
"A1" "C1"

m1["1", c("A", "B")]

   A    B 
"A1" "B1"

17.4.3 Arrays

What if you need to expand to more than two dimensions? Arrays allow you to store data in n dimensions. For example, a three-dimensional array is similar to a multiband image where the first dimension represents rows, the second represents columns, and the third represents the image bands or channels. You could also think of a three-dimensional array as a cube where each cell in the cube is a smaller cube defined by its position in three-dimensional space (each such cell is often referred to as a voxel). A four-dimensional array could be used to add a time component to the data, which would be difficult to visualize since we only have three spatial dimensions to work with. Similar to matrices, all values stored in an array must be of the same type (for example, a numeric array). If you work in Python, matrices and arrays in R are comparable to NumPy arrays.

data2 <- seq(from=1, to=150, by=2)
rNames <- c("R1", "R2", "R3", "R4", "R5")
cNames <- c("C1", "C2", "C3", "C4", "C5")
bNames <- c("B1", "B2", "B3")
a1 <- array(data2, c(5, 5, 3), dimnames=list(rNames, cNames, bNames))
print(a1)

, , B1

   C1 C2 C3 C4 C5
R1  1 11 21 31 41
R2  3 13 23 33 43
R3  5 15 25 35 45
R4  7 17 27 37 47
R5  9 19 29 39 49

, , B2

   C1 C2 C3 C4 C5
R1 51 61 71 81 91
R2 53 63 73 83 93
R3 55 65 75 85 95
R4 57 67 77 87 97
R5 59 69 79 89 99

, , B3

    C1  C2  C3  C4  C5
R1 101 111 121 131 141
R2 103 113 123 133 143
R3 105 115 125 135 145
R4 107 117 127 137 147
R5 109 119 129 139 149

Since you now have more dimensions, you will need to provide more indices to extract specific values or ranges of values. The first index specifies the position in the first dimension (rows), the second specifies the second dimension (columns), and the third specifies the third dimension (for example, image bands). Also similar to a matrix, you can define dimension names and use them to subset the data.

a1[1, 1, 1]

[1] 1

a1[1:3, 1:3, 1]

   C1 C2 C3
R1  1 11 21
R2  3 13 23
R3  5 15 25
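The sketch below recreates the same array (using paste0() as a shorthand to build the same dimension names) and shows subsetting by name, plus leaving the band index blank to extract the values of one "pixel" across all bands, analogous to reading a pixel's spectral values from a multiband image.

```r
data2 <- seq(from = 1, to = 150, by = 2)
a1 <- array(data2, c(5, 5, 3),
            dimnames = list(paste0("R", 1:5), paste0("C", 1:5), paste0("B", 1:3)))

print(dim(a1))               # 5 5 3: rows, columns, bands
print(a1["R2", "C3", "B2"])  # 73: subset by dimension names
print(a1[1, 1, ])            # B1 B2 B3 = 1 51 101: one cell across all bands
```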

17.4.4 Data Frames

Both matrices and arrays can only store data of a single type. In other words, you cannot create columns with different data types. So, there is a need for yet another data model. A data frame is similar to a matrix; however, each column can hold a different type of data. A data frame is very similar to a Microsoft Excel spreadsheet or a pandas data frame in Python. We have found data frames to be the data model that we use most often in R. They are generally considered the workhorse of R data models.

In the provided example, we are creating a data frame to store information about courses. First, we generate vectors to store each column of data. Note that each column must have the same length or the same number of data points to combine them into a data frame. Here, we are generating a mix of numeric and character vectors. Using the data.frame() function, we then combine the vectors into a data frame. Once it is printed, you can see that each column took the name of its associated vector variable.

course_prefix <- c("Geog", "Geog", "Geol", "Geol", "Geog")
course_num <- c(107, 350, 101, 104, 455)
course_name <- c("Physical Geography", "GIScience", "Planet Earth", "Earth Through Time", "Remote Sensing")
enrollment <- c(210, 45, 235, 80, 35)
course_data <- data.frame(course_prefix, course_num, course_name, enrollment)
print(course_data)

  course_prefix course_num        course_name enrollment
1          Geog        107 Physical Geography        210
2          Geog        350          GIScience         45
3          Geol        101       Planet Earth        235
4          Geol        104 Earth Through Time         80
5          Geog        455     Remote Sensing         35

Extracting elements from a data frame works the same way as extracting elements from a matrix since data frames are also two-dimensional. We must specify indices for both the rows and the columns. We can also use the column names or row names. Column names are automatically generated when a data frame is created; here, they are taken from the names of the vectors. We can also use $ to reference a column by its name (for example, df$Col1).

print(course_data[,1])

[1] "Geog" "Geog" "Geol" "Geol" "Geog"

print(course_data[1,])

  course_prefix course_num        course_name enrollment
1          Geog        107 Physical Geography        210

print(course_data[1,3])

[1] "Physical Geography"

print(course_data[,"course_name"])

[1] "Physical Geography" "GIScience"          "Planet Earth"      
[4] "Earth Through Time" "Remote Sensing"

print(course_data$course_name)

[1] "Physical Geography" "GIScience"          "Planet Earth"      
[4] "Earth Through Time" "Remote Sensing"

print(course_data$enrollment)

[1] 210  45 235  80  35
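The sketch below recreates course_data and confirms that each column keeps its own type and that index-based and name-based selection return the same column. It assumes R 4.0 or later, where data.frame() keeps character vectors as characters rather than converting them to factors.

```r
course_prefix <- c("Geog", "Geog", "Geol", "Geol", "Geog")
course_num <- c(107, 350, 101, 104, 455)
course_name <- c("Physical Geography", "GIScience", "Planet Earth",
                 "Earth Through Time", "Remote Sensing")
enrollment <- c(210, 45, 235, 80, 35)
course_data <- data.frame(course_prefix, course_num, course_name, enrollment)

# Each column keeps its own data type
print(sapply(course_data, class))

# Index-based and name-based selection return the same column
print(identical(course_data[, 1], course_data$course_prefix))  # TRUE

# A single cell selected by row index and column name
print(course_data[2, "enrollment"])  # 45
```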

17.4.5 Lists

We think of lists as containers that store other data objects. Lists can be used to store multiple vectors, matrices, arrays, data frames, and even other lists. To call a named element in a list, use $ (or double square brackets, [[ ]]). You can then use the same selection methods for data models already discussed. We find that we don’t tend to create many lists. However, it is common for analyses to generate list objects that you will then need to work with or extract data or results from. This will be our primary use of lists in this text.

Lists in R should not be confused with lists in Python. A Python list is more similar to an R vector, except that the elements in a Python list are not required to be of the same data type.

# sample() and rnorm() generate random values, so your output will differ from that shown below
vec1 <- sample(1:1000, 50, replace=TRUE)
vec2 <- rnorm(200, mean=250, sd = 100)
data1 <- c("A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3")
rnames <- c("1", "2", "3")
cnames <- c("A", "B", "C")
matrix1 <- matrix(data1, nrow=3, ncol=3, byrow=TRUE, dimnames=list(rnames, cnames))
data2 <- seq(from=1, to=150, by=2)
rnames <- c("R1", "R2", "R3", "R4", "R5")
cnames <- c("C1", "C2", "C3", "C4", "C5")
bnames <- c("B1", "B2", "B3")
array1 <- array(data2, c(5, 5, 3), dimnames=list(rnames, cnames, bnames))
list1 <- list(Vector_1 = vec1, Vector_2 = vec2, Matrix1 = matrix1, Array_1 = array1)
print(list1$Vector_1)

 [1] 306 638 928 906 224 397 436 749  23 800 385 553 211 126 427 106 988 105 860
[20]  44 529 756 840 401  44 693 238 691 781 668 979 933 644 814 502 885 144 859
[39] 999 825 110 976 397 271  63 460 584 754 186 562

print(list1$Array_1[1, 1, 1])

[1] 1
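The sketch below uses a smaller, hypothetical list (lst) to show a few more ways of working with lists: names() and length() describe the container, [[ ]] extracts an element just as $ does, and single square brackets return a smaller list rather than the element itself.

```r
m <- matrix(c("A1", "B1", "A2", "B2"), nrow = 2, byrow = TRUE,
            dimnames = list(c("1", "2"), c("A", "B")))
lst <- list(Numbers = c(10, 20, 30), Matrix1 = m)

print(names(lst))                  # "Numbers" "Matrix1"
print(length(lst))                 # 2: number of elements in the list
print(lst$Numbers[2])              # 20
print(lst[["Matrix1"]]["2", "B"])  # "B2": [[ ]] works like $
print(is.list(lst["Matrix1"]))     # TRUE: single brackets return a smaller list
print(is.matrix(lst[["Matrix1"]])) # TRUE: double brackets return the element itself
```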

17.5 Factors

What if we need to create character data in which only certain values or levels are allowed? This is the purpose of the factor data type; it is similar to the character data type but with defined levels or values.

In the example, we are generating a vector containing 1,500 records of different academic years, with each year repeated a set number of times using rep(). We then define the vector as a factor using the factor() function. To make sure the data are represented as a factor, we then use is.factor() (again, there are a lot of is. and as. functions in R). This returns TRUE, so we know that the data are now stored as factors. Using the levels() function, we can obtain a vector of the available levels, in this case the academic years.

One component of factors that is a bit confusing is that each unique category is assigned a placeholder integer value. So, the data are actually being stored as integer codes, and each integer is associated with a specific category.

ac_year <- rep(c("Freshman",
                 "Sophomore",
                 "Junior",
                 "Senior", 
                 "Graduate"), 
               1500*c(0.35,0.20,0.15,0.20, 0.10))
ac_year2 <- factor(ac_year)
is.factor(ac_year2)

[1] TRUE

levels(ac_year2)

[1] "Freshman"  "Graduate"  "Junior"    "Senior"    "Sophomore"

It is also possible to specify an order for the factor levels to produce an ordered factor. When we printed the levels above, they printed in alphabetical order. However, it would make more sense to specify the order based on the academic progression. When we create a factor, we can specify an order by using ordered = TRUE and providing the levels in the desired order as the argument for the levels parameter. Checking the levels, we can see that they are now in the desired order.

ac_year3 <- factor(ac_year, 
                   ordered=TRUE, 
                   levels=c("Freshman",
                            "Sophomore",
                            "Junior",
                            "Senior", 
                            "Graduate"))
levels(ac_year3)

[1] "Freshman"  "Sophomore" "Junior"    "Senior"    "Graduate"

If you subset your data, you may need to remove levels that are no longer being used or do not occur in the subset. This can be accomplished using the droplevels() function, which, by default, removes any levels not used in the data subset. Here, we have extracted only the “Senior” and “Graduate” records. However, after printing the levels, we see that all levels are still defined. Using droplevels() fixes this issue, and the result can be checked using the levels() function. In the next chapter, we explore other methods for manipulating factors using the forcats package, which is part of the tidyverse metapackage. Here, we focused specifically on base R factor manipulation.

ac_year4 <- ac_year3[ac_year == "Senior" | ac_year == "Graduate"]
levels(ac_year4)

[1] "Freshman"  "Sophomore" "Junior"    "Senior"    "Graduate"

ac_year5 <- droplevels(ac_year4)
levels(ac_year5)

[1] "Senior"   "Graduate"
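The integer codes behind a factor, and the benefit of an ordered factor, can be seen on a small hypothetical vector (yrs) that uses the same academic year levels. as.integer() reveals the code stored for each value, comparisons follow the level order, table() counts every level (including those with zero records), and a value outside the defined levels becomes NA.

```r
yr_levels <- c("Freshman", "Sophomore", "Junior", "Senior", "Graduate")
yrs <- factor(c("Junior", "Freshman", "Senior", "Junior"),
              levels = yr_levels, ordered = TRUE)

print(as.integer(yrs))    # 3 1 4 3: the integer code behind each value
print(yrs > "Sophomore")  # TRUE FALSE TRUE TRUE: uses the level order
print(table(yrs))         # Freshman 1, Sophomore 0, Junior 2, Senior 1, Graduate 0

# Values that are not defined levels are not allowed and become NA
bad <- factor(c("Senior", "Alumni"), levels = yr_levels)
print(bad)                # Senior <NA>
```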

17.6 Operators

17.6.1 Arithmetic Operators

The mathematical or arithmetic operators used in R are defined in Table 17.3. We use these throughout the text. Note that modulus (%%) returns the remainder after division while integer division (%/%) divides and then rounds down to the nearest whole number. Examples are provided in the following code blocks.

**Table 17.3.** R math operators.
Operator	Meaning
+	Addition
-	Subtraction
*	Multiplication
/	Division
^	Exponentiation
%%	Modulus (remainder after division)
%/%	Integer division

x <- 7
y <- 3
print(x+y)

[1] 10

print(x-y)

[1] 4

print(x*y)

[1] 21

print(x/y)

[1] 2.333333

print(x^y)

[1] 343

print(x%%y)

[1] 1

print(x%/%y)

[1] 2
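"Rounds down" matters for negative numbers: integer division rounds toward negative infinity, not toward zero. The sketch below shows this, shows that modulus also works with decimals, and checks the identity that links the two operators.

```r
print((-7) %% 3)    # 2: with a positive divisor, the remainder is never negative
print((-7) %/% 3)   # -3: integer division rounds down, toward negative infinity
print(5.5 %% 2)     # 1.5: modulus also works with decimals

# The two operators are linked: (x %/% y) * y + x %% y recovers x
x <- 7
y <- 3
print((x %/% y) * y + x %% y == x)  # TRUE
```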

17.6.2 Comparison Operators

Comparison operators (Table 17.4) return logical TRUE or FALSE. In other words, they function as a test. Multiple tests or statements can be combined using logical operators. We will experiment with these in the next chapter as we build data queries using dplyr and the tidyverse.

**Table 17.4.** Comparison and logical operators.
Operator	Meaning
==	Equal to
!=	Not equal to
>	Greater than
<	Less than
>=	Greater than or equal to
<=	Less than or equal to
&	A AND B
|	A OR B
!	NOT A
%in%	Is element in a vector?

x <- 7
y <- 3
print(x==y)

[1] FALSE

print(x!=y)

[1] TRUE

print(x>y)

[1] TRUE

print(x<y)

[1] FALSE
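The remaining operators in Table 17.4 follow the same pattern. The sketch below reuses x and y to show the two inclusive comparisons, how & and | combine two tests, how ! reverses a test, and how %in% checks membership element by element.

```r
x <- 7
y <- 3
print(x >= 7)                   # TRUE
print(y <= 2)                   # FALSE
print(x > 5 & y > 5)            # FALSE: & requires both tests to be TRUE
print(x > 5 | y > 5)            # TRUE: | requires at least one test to be TRUE
print(!(x > 5))                 # FALSE: ! reverses the result of a test
print(y %in% c(1, 2, 3))        # TRUE
print(c(2, 4) %in% c(1, 2, 3))  # TRUE FALSE: one result per element on the left
```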

17.7 Creating Sequences of Values

There are also multiple base functions available for generating vectors or sequences of values. The : operator returns a sequence of values in steps of one that includes both the provided start and stop values when they are whole numbers. seq() generates a regularly spaced sequence of values from a start value and a stop value, along with either a by argument, which defines the interval between values, or a length.out argument, which defines the number of desired values. rep() is used to repeat a value or vector of values. If a times argument is provided, the value or sequence is repeated the desired number of times; in the example, the values 2, 4, 6, and 8 are repeated 3 times. In contrast, the each argument replicates each value in turn so that all instances of the same value are grouped together.

x <- 1:10
print(x)

 [1]  1  2  3  4  5  6  7  8  9 10

y <- seq(1,2, by=.25)
print(y)

[1] 1.00 1.25 1.50 1.75 2.00

z <- seq(1,2, length.out=5)
print(z)

[1] 1.00 1.25 1.50 1.75 2.00

w <- rep(c(2,4,6,8), times=3)
print(w)

 [1] 2 4 6 8 2 4 6 8 2 4 6 8

u <- rep(c(2,4,6,8), each=3)
print(u)

 [1] 2 2 2 4 4 4 6 6 6 8 8 8
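A few edge cases are worth knowing. The sketch below shows that : counts down when the start is larger than the stop, that seq() with by stops before passing the stop value, and that times can also be a vector giving a separate count for each value (the form used to build the academic year vector in the factors example later in this chapter).

```r
print(5:1)                                # 5 4 3 2 1: counts down when start > stop
print(seq(0, 1, by = 0.3))                # 0.0 0.3 0.6 0.9: stops before passing 1
print(seq(10, 100, length.out = 4))       # 10 40 70 100
print(rep(c("a", "b"), times = c(3, 1)))  # "a" "a" "a" "b": one count per value
```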

Arithmetic and comparison operators are applied to each element of a vector individually, as the following examples show.

x <- 1:10
print(x)

 [1]  1  2  3  4  5  6  7  8  9 10

y <- x+10
print(y)

 [1] 11 12 13 14 15 16 17 18 19 20

z <- x>5
print(z)

 [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
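Element-wise behavior also applies when both operands are vectors of the same length: the first elements are combined, then the second elements, and so on. The sketch below also shows a common trick: because TRUE counts as 1 when summed, sum() applied to a comparison counts how many elements pass the test.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)
print(a + b)          # 11 22 33
print(a * b)          # 10 40 90
print(b / a)          # 10 10 10
print(sum(1:10 > 5))  # 5: TRUE counts as 1 when summed
```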

17.8 Other Useful Base R Functions

There are a variety of other useful base R functions. We have provided a list below of some commonly used functions.

ncol(): returns the number of columns in a data frame or matrix
nrow(): returns the number of rows in a data frame or matrix (for a vector, it returns NULL)
length(): returns the number of elements in a vector or list (for a data frame, the number of columns)
rbind(): merges rows from multiple data objects with the same number of columns
cbind(): merges columns from multiple data objects with the same number of rows
merge(): merges two data frames based on common columns (by default, the columns whose names they share) or row names
setwd(): sets the working directory
getwd(): returns the working directory path as a string
table(): creates a contingency table of counts of each combination of factor levels
rnorm(): generates a specified number of random values drawn from a normal distribution
sample(): selects a specified number of random samples from a vector with or without replacement
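The sketch below exercises several of these functions on two small, hypothetical data frames (df1, df2, and scores are illustrative names). It assumes R 4.0 or later, so character columns stay as characters. set.seed() is included so the random draws are reproducible from run to run; the exact values still depend on the R version, so they are not listed.

```r
df1 <- data.frame(id = c(1, 2, 3), name = c("A", "B", "C"))
df2 <- data.frame(id = c(4, 5), name = c("D", "E"))

both <- rbind(df1, df2)          # stack rows: 5 rows, 2 columns
print(nrow(both))                # 5
print(ncol(both))                # 2
print(length(both$name))         # 5: elements in one column
print(nrow(c(1, 2, 3)))          # NULL: nrow() is meant for 2-D objects

both <- cbind(both, flag = c(TRUE, FALSE, TRUE, FALSE, TRUE))  # add a column

scores <- data.frame(id = c(1, 3, 5), score = c(90, 75, 60))
merged <- merge(both, scores, by = "id")  # keeps only ids found in both tables
print(merged)                    # 3 rows: ids 1, 3, 5

print(table(c("a", "b", "a")))   # counts: a = 2, b = 1

set.seed(42)                     # makes the random draws reproducible
print(sample(1:10, 3))           # three values drawn without replacement
print(rnorm(2, mean = 0, sd = 1))
```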

17.9 Working with Table Data

17.9.1 Reading Tables

In all the examples provided in this chapter so far, we have generated data to experiment with. However, this is impractical or impossible for a large data set. More commonly, you will read data into R rather than creating them from scratch.

Tables can be read in using the read.table() or read.csv() function. read.csv() is a version of read.table() with defaults set for comma-separated values (.csv) files. To read in data, you need to either set a working directory where the data are housed using setwd() or provide the full file path. We recommend setting a working directory.

R uses the forward slash in folder paths as opposed to the backslash used by the Windows operating system. So, you have to switch these around in your code if you copy and paste a path from Windows File Explorer. Alternatively, you can double up the backslashes (\\), since a single backslash inside an R string starts an escape sequence.
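The sketch below shows two path-related details. Base R's file.path() joins folder names with forward slashes, which avoids typing separators by hand; and inside an R string, "\\" is a single backslash character. The Windows path is hypothetical.

```r
# file.path() joins folder names with forward slashes
p <- file.path("gslrData", "chpt17", "data", "matts_movies.csv")
print(p)            # "gslrData/chpt17/data/matts_movies.csv"

# Inside a string, "\\" is one backslash character
print(nchar("\\"))  # 1

# A hypothetical Windows path written with doubled backslashes
win_path <- "C:\\Users\\me\\data"
cat(win_path, "\n") # C:\Users\me\data
```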

For reading in tables, we will primarily use the read_csv() function from the readr package, which is part of the tidyverse as opposed to base R. This is demonstrated in the next chapter.

In the example, we are reading in a file called matts_movies.csv from our working directory. We specify that the separator is a comma, which is the default for CSV files as the name implies, and that there is a header, so the first row should be treated as column names as opposed to data.

setwd("gslrData/chpt17/data/")
movies <- read.csv("matts_movies.csv", sep=",", header=TRUE, stringsAsFactors=TRUE)

Once data are read in, it is generally a good idea to explore or inspect them to make sure there are no issues and that they read in as anticipated. The head() function prints the first six records in the table while the tail() function prints the last six records. You can specify an additional n argument if you want a different number than the default six records. The str() function provides information about the structure of the data, including the data type for each column. If a column's data type is incorrectly defined, you can use the appropriate as.*() function (for example, as.numeric()) to convert it. Note that we did not need to state that the data should be read in as a data frame, since read.csv() always returns a data frame.

When reading in data tables that contain character or string data, it is important to consider whether you want the data to be represented as characters or factors. In versions of R prior to 4.0, the default was to convert all string data to factors when using read.csv(). However, the default in 4.0 or later is to keep them as characters. The read.csv() function has an optional stringsAsFactors argument that can be used to change this behavior. Alternatively, you can use factor() or as.factor() to convert specific columns. In the example above, we used the stringsAsFactors argument to read in all character columns as factors.

The names() function can be used to print the column names of a table or store them in a vector. We can also change the names by providing a vector of new names. If you would like to only change a subset of names, you can provide an index or indices in square brackets.

head(movies)

                Movie.Name       Director Release.Year My.Rating  Genre Own
1            Almost Famous  Cameron Crowe         2000      9.99  Drama Yes
2 The Shawshank Redemption Frank Darabont         1994      9.98  Drama Yes
3            Groundhog Day   Harold Ramis         1993      9.96 Comedy Yes
4             Donnie Darko  Richard Kelly         2001      9.95 Sci-Fi Yes
5          Children of Men Alfonso Cuaron         2006      9.94 Sci-Fi Yes
6               Annie Hall    Woody Allen         1977      9.93 Comedy Yes

tail(movies)

                              Movie.Name           Director Release.Year
1847 The Nutty Proffessor II: The Klumps        Peter Segal         2000
1848                        Dreamcatcher    Lawrence Kasdan         2003
1849                              Jumper         Doug Liman         2008
1850                       Baby Geniuses          Bob Clark         1999
1851                         The Postman      Kevin Costner         1997
1852                  The Last Airbender M. Night Shyamalan         2010
     My.Rating  Genre Own
1847      1.76 Comedy  No
1848      1.65 Horror  No
1849      1.22 Action  No
1850      1.01 Family  No
1851      0.88  Drama  No
1852      0.67 Action  No

str(movies)

'data.frame':   1852 obs. of  6 variables:
 $ Movie.Name  : Factor w/ 1852 levels "D\xe9j\xe0 Vu",..: 103 1622 596 431 325 127 1161 908 993 1187 ...
 $ Director    : Factor w/ 801 levels "G\xe9la Babluani",..: 113 243 289 634 41 793 786 139 382 175 ...
 $ Release.Year: int  2000 1994 1993 2001 2006 1977 1998 2000 2007 1995 ...
 $ My.Rating   : num  9.99 9.98 9.96 9.95 9.94 9.93 9.92 9.91 9.9 9.88 ...
 $ Genre       : Factor w/ 18 levels "Action","Classic",..: 6 6 4 13 13 4 11 16 16 16 ...
 $ Own         : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...

names(movies)

[1] "Movie.Name"   "Director"     "Release.Year" "My.Rating"    "Genre"       
[6] "Own"

nrow(movies)

[1] 1852

ncol(movies)

[1] 6
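Since matts_movies.csv may not be on your machine, the sketch below uses a hypothetical three-line CSV supplied through read.csv()'s text argument instead of a file. It shows stringsAsFactors converting character columns to factors, renaming a single column with names() and an index, and head() with a custom n.

```r
# A hypothetical three-line CSV supplied as text instead of a file
csv_text <- "Movie.Name,My.Rating,Genre
Almost Famous,9.99,Drama
Groundhog Day,9.96,Comedy"

mini <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(mini)                    # 2 obs. of 3 variables
print(is.factor(mini$Genre)) # TRUE: character columns became factors
print(levels(mini$Genre))    # "Comedy" "Drama"

names(mini)[2] <- "Rating"   # rename one column by index
print(names(mini))           # "Movie.Name" "Rating" "Genre"
print(head(mini, n = 1))     # only the first record
```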

Microsoft Excel spreadsheets can be read in using the read.xlsx() function from the xlsx package. You need to load the xlsx package before you can use this function, which can be accomplished using the library() or require() functions. Generally, library() is preferred for loading packages. require() is mainly useful inside functions: if the package is not installed, library() stops with an error, while require() returns FALSE with a warning, so your code can respond to the missing package. Remember that you must install packages, for example with install.packages(), before they can be used.
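The difference between library() and require() shows up when a package is missing. The sketch below uses the stats package, which ships with R, and a deliberately nonexistent, hypothetical package name; tryCatch() is used only so the library() error can be shown without stopping the script.

```r
# Both load an installed package (stats ships with R)
library(stats)
loaded <- require(stats)
print(loaded)   # TRUE: require() returns a logical

# A hypothetical package name that is not installed
pkg <- "notARealPackage123"

# require() returns FALSE instead of stopping the script
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
print(ok)       # FALSE

# library() raises an error, caught here so the outcome can be printed
res <- tryCatch(library(pkg, character.only = TRUE),
                error = function(e) "library() stopped with an error")
print(res)
```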

Note that there are additional functions and packages available to read in other file formats, including XML, SPSS, SAS, Stata, NetCDF, HDF5, and database files. We discuss reading in vector and raster geospatial data elsewhere in the text. The data.table and readr packages are useful when working with large data sets and tables.

17.9.2 Writing Tables

There are also functions available to write results out to permanent files on disk. For example, write.csv() or write.table() can be used to save results as CSV or text files.

The foreign package provides the write.dbf() function for saving to .dbf format. The xlsx package provides write.xlsx() for saving results to Excel spreadsheet format.

If a folder path is not specified, the result is written to the current working directory. If you do not want to save the result to the current working directory, you must specify the entire desired file path.
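The sketch below writes a small data frame to CSV and reads it back to confirm the round trip. To avoid writing into your working directory, it saves to the session's temporary folder via tempdir(); the file name is hypothetical. row.names = FALSE stops write.csv() from adding an extra column of row numbers.

```r
course_data <- data.frame(course_prefix = c("Geog", "Geol"),
                          course_num = c(107, 101),
                          enrollment = c(210, 235))

# tempdir() is a session-specific scratch folder
out_file <- file.path(tempdir(), "course_data.csv")

# row.names = FALSE omits the extra column of row numbers
write.csv(course_data, out_file, row.names = FALSE)

check <- read.csv(out_file)
print(check)
print(all(check$enrollment == course_data$enrollment))  # TRUE
```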

17.10 Comments

Comments are meant to make your code more interpretable. They are meant for humans as opposed to computers. Commented lines will not be executed. We highly recommend commenting your code, as you may forget how or why you did something or someone else may want to use or manipulate your code. You can also comment out lines that you don’t want to execute temporarily, perhaps during the debugging process.

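Different programming languages define comments differently. R uses #: anything following a # on a line (outside of a quoted string) is ignored. So, a line beginning with # will not be executed, and a comment can also follow code on the same line. The code block below provides examples of commenting.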

#Build columns as vectors
course_prefix <- c("Geog", "Geog", "Geol", "Geol", "Geog")
course_num <- c(107, 350, 101, 104, 455)
course_name <- c("Physical Geography", "GIScience", "Planet Earth", "Earth Through Time", "Remote Sensing")
enrollment <- c(210, 45, 235, 80, 35)
#Combine vectors to a data frame
course_data <- data.frame(course_prefix, course_num, course_name, enrollment)
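Comments can also sit at the end of a line of code, and commenting out a line is a quick way to skip it temporarily. In the sketch below, the commented-out reassignment never runs, so total keeps its computed value.

```r
enrollment <- c(210, 45, 235, 80, 35)  # a comment can follow code on the same line
total <- sum(enrollment)               # sum() adds all values
# total <- 0                           # commented out, so this line never runs
print(total)                           # 605
```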

17.11 Quitting R

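The q() function can be used to end your R session. In an interactive session, it asks whether to save the workspace image (the objects currently in memory) before quitting; this does not save your script. To save your scripts, use the save options available in the File menu in RStudio.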

17.12 Concluding Remarks

That’s it! It might seem that you haven’t learned much R yet. However, data types and structures are a large component of working in this environment. So, this is an accomplishment. You will get practice working with many of the techniques discussed here throughout the text.

17.13 Questions

Explain the difference between the character and factor R data types.
Explain the difference between R vectors, matrices, and arrays.
Explain the difference between R matrices and data frames.
Explain the modulus operation.
Explain two methods to get help for implementing an R function within RStudio.
Explain how to access official R package documentation on the web.
What is the purpose of the base R head() function?
Explain the difference between the library() and require() functions.