17 Base R (Part I)
17.1 Topics Covered
- Base R syntax and variables
- Base R data types: numeric, integer, character, logical, and date
- Base R data models: vector, matrix, array, data frame, and list
- Arithmetic and comparison operators
- Useful base R functions
- Reading and writing tabulated data
- Using comments
17.2 Introduction
This first chapter of this appendix introduces base R and assumes that you have no prior experience using the language. If you have a solid understanding of R, feel free to skip this chapter or just skim it. With that said, we feel strongly that a foundational understanding of the R language, and coding etiquette in general, goes a long way.
17.2.1 Getting Help
Base R and R packages have built-in help or documentation, which can be accessed using the help() function. Inside of help(), you provide the name of the function, class, or data object. Quoting the name is optional for most objects but required for operators and reserved words (for example, help("+") or help("if")). The ? shortcut does the same thing, so help(c) and ?c both open the documentation for the base R c() function. The documentation will then load in the Help tab in RStudio. For functions, it is common for example code to be provided. The c() function is used to combine objects into a single object; for example, two vectors can be merged into a single vector.
The help is also useful for finding the default arguments for function parameters.
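A minimal sketch of these help tools follows. The interactive() guard is our own addition so the snippet can also run as a script; help() and ? only display pages in an interactive session. args() and formals() are base R functions that report a function's parameters and their defaults without opening the Help tab; rnorm() is used here only as an example.

```r
# Open documentation pages (displayed in the RStudio Help tab)
if (interactive()) {
  help("c")   # quoted name: always works
  ?c          # shortcut form of help(c)
  help("+")   # operators and reserved words must be quoted
}

# Check a function's parameters and default arguments from the console
args(rnorm)                 # shows the parameters: n, mean = 0, sd = 1
defaults <- formals(rnorm)  # the same information as a list
defaults$mean               # default mean used by rnorm()
defaults$sd                 # default standard deviation used by rnorm()
```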
17.2.2 Defining Variables
Variables are used to store values or data objects for use in your code. R traditionally uses the <- operator for variable assignment; however, you can also use =. For example, we can assign the number 1 to the variable x, the character string "GIS" to the variable y, and the logical TRUE to the variable z.
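The assignments just described can be written as follows; both assignment operators are shown, with = used only on the second line for illustration.

```r
x <- 1       # numeric value assigned with <-
y = "GIS"    # character string assigned with =
z <- TRUE    # logical value (uppercase, no quotes)

x
y
z
```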
There are a few rules for variable names:
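- Cannot use reserved words, or those that have a special use in R (for example, if, TRUE, or function)
- Must start with a letter or a dot; cannot start with a number or an underscore (and a leading dot cannot be followed by a number)
- Only dots/periods and underscores can be used as special characters in variable names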
We like to keep variable names short so that they are easy to work with and call. Assigning a value to a name that is already in use overwrites the existing variable, so all of your variables must have unique names if you want to keep them throughout a script. Variable names are also case sensitive; for example, x and X are treated as two separate variables.
The print() function can be used to return, or print, the content of a variable in the console.
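The short sketch below shows case sensitivity, overwriting, and print() together. It also uses make.names(), a base R function that converts a string into a syntactically valid name, as a quick way to see how the naming rules above are applied; the example strings are our own.

```r
x <- 1
X <- 2          # a different variable: names are case sensitive
print(x)        # [1] 1
print(X)        # [1] 2

x <- 10         # reusing a name overwrites the earlier value
print(x)        # [1] 10

# make.names() repairs names that break the rules
make.names("2020_data")  # "X2020_data": names cannot start with a number
make.names("my var")     # "my.var": spaces are not allowed
make.names("if")         # "if.": reserved words cannot be used as-is
```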
17.3 Data Types
There are several base data types in R, which are described in Table 17.1. Note that numbers are not quoted while characters are quoted. If a number is quoted, it will be treated like a character; in other words, mathematical operations cannot be performed on it. By default, numbers are treated as real numbers (stored as double-precision values, which typeof() reports as "double"). In order for them to be treated like whole numbers, the integer type can be defined, for example with as.integer() or by adding an L suffix to the number (5L). Complex numbers allow for both real and imaginary components; we do not make use of this data type in this text. The character type requires that the string be placed in quotes. Logicals, either TRUE or FALSE, must be uppercase and not quoted.
In order to determine the data type of a variable, you can use the typeof() or class() function. Table 17.1 also provides the functions used to define, convert, and check the data type of a variable. Functions beginning with as. are used for conversion between types, while functions beginning with is. are used to check the type and return a logical TRUE or FALSE. We demonstrate the use of these functions below.
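As a quick sketch of these checks (the variable names are illustrative only), note that typeof() reports the storage type while class() reports the higher-level class, and that integers also count as numeric.

```r
n <- 3.5         # numeric (double)
i <- 3L          # the L suffix creates an integer
s <- "3.5"       # a quoted number is a character string
b <- FALSE       # logical

typeof(n)        # "double"
class(n)         # "numeric"
typeof(i)        # "integer"
is.numeric(i)    # TRUE: integers are also numeric
is.character(s)  # TRUE
as.numeric(s) + 1  # 4.5 after converting the string to a number
is.logical(b)    # TRUE
```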
We will save our discussion of factors for later in the chapter.
The Date data type can be complex since date data formatting is not standardized. Note the need to define the date formatting in our example. We discuss the specifics of working with dates as necessary within the text.
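A small sketch of why the format matters, using made-up dates: strings already in ISO year-month-day form convert without a format, while other layouts need format codes such as %m (month), %d (day), and %Y (four-digit year). Subtracting two dates returns the number of days between them.

```r
d1 <- as.Date("2023-02-01")                        # ISO format: no format string needed
d2 <- as.Date("03/01/2023", format = "%m/%d/%Y")   # month/day/year needs a format

class(d2)               # "Date"
d2 - d1                 # Time difference of 28 days
format(d2, "%d %B %Y")  # reformat for display (month name depends on locale)
```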
Data Type | Description | Assign | Convert | Check |
---|---|---|---|---|
Numeric | Number treated as a number (can perform mathematical operations on it); treated as having decimal values by default | numeric() | as.numeric() | is.numeric() |
Integer | Number treated as a whole number without decimal values | integer() | as.integer() | is.integer() |
Complex | Number with real and imaginary components | complex() | as.complex() | is.complex() |
Character | Character string or numbers not treated as numbers; must be quoted | character() | as.character() | is.character() |
Factor | Characters with pre-defined levels; has associated numeric code; can be unordered or ordered | factor() | as.factor() | is.factor() |
Logical | Boolean TRUE or FALSE; must be stated without quotes and in uppercase | logical() | as.logical() | is.logical() |
Date | Date treated as date; different date formats supported | as.Date() | as.Date() | inherits(x, "Date") |
Before moving on, we will experiment with data type conversion. Here, you can see that the variable x is initially defined as a numeric type (reported as "double" by typeof()). Using as.integer(), we convert it to an integer data type. It is also possible to convert numeric data to a character string using as.character(). Characters that represent numbers can be converted back to numeric using as.numeric(). Characters can be converted to dates using as.Date(); however, you will need to provide a format string that defines how the dates are written.
x <- 1
typeof(x)
[1] "double"
y <- as.integer(x)
typeof(y)
[1] "integer"
z <- as.character(y)
typeof(z)
[1] "character"
w <- as.numeric(z)
typeof(w)
[1] "double"
a <- TRUE
typeof(a)
[1] "logical"
d <- "06/15/2022"  # example date string in month/day/year form
typeof(d)
[1] "character"
d2 <- as.Date(d, "%m/%d/%Y")
class(d2)
[1] "Date"
When converting data to a logical type, the result is not always obvious. As demonstrated below, when converting a number to a logical, 0 yields FALSE and any other value yields TRUE. The character string "TRUE" yields TRUE, while the character string "FALSE" yields FALSE (the spellings "true", "True", and "T", and the matching FALSE spellings, are also recognized). A missing value (NA) yields NA, as does any other character string.
x <- 0
print(as.logical(x))
[1] FALSE
x <- 1
print(as.logical(x))
[1] TRUE
x <- 2
print(as.logical(x))
[1] TRUE
x <- NA
print(as.logical(x))
[1] NA
x <- "TRUE"
print(as.logical(x))
[1] TRUE
x <- "FALSE"
print(as.logical(x))
[1] FALSE
x <- "Some text"
print(as.logical(x))
[1] NA
17.4 Data Models
It would be very limiting if we could only work with single numbers, character strings, logicals, or dates. In other words, there is a need to be able to combine single data points and collectively represent them as a variable. Table 17.2 lists and describes the base R data models, which are discussed in this section.
Data Model | Description | Assign | Convert | Check |
---|---|---|---|---|
Vector | One-dimensional array | vector() | as.vector() | is.vector() |
Matrix | Two-dimensional array | matrix() | as.matrix() | is.matrix() |
Array | n-dimensional array | array() | as.array() | is.array() |
Data Frame | Table where each column can have a different data type | data.frame() | as.data.frame() | is.data.frame() |
List | Container object to hold other data | list() | as.list() | is.list() |
17.4.1 Vectors
Vectors are one-dimensional arrays. Instead of storing a single piece of information, you can provide a set of values or character strings. Note that all data components must be of the same type (for example, numeric, character, or logical). If you combine different types, R does not raise an error; it silently converts (coerces) all elements to a single common type. This is different from some other programming languages. In the example, we have created objects to store numeric (x), character (y), and logical (z) data. Vectors that store only a single piece of data or a constant are called scalars; however, they are treated as vectors of length one in R, so it is not necessary to make this distinction.
The c(), or combine, function is used to combine pieces of data into a single vector. It is one of the most commonly used functions in R, so you'll get used to seeing it and using it.
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c("GIS", "Spatial", "Analytics", "R", "Data Science", "Remote Sensing")
z <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
print(x)
[1] 1 2 3 4 5 6 7
print(y)
[1] "GIS" "Spatial" "Analytics" "R"
[5] "Data Science" "Remote Sensing"
print(z)
[1] TRUE FALSE TRUE TRUE FALSE
To extract specific pieces of data, you can use square bracket notation. R starts indexing at 1 as opposed to 0, which is more common in other programming languages, such as Python and JavaScript. To call a single data element, place its index in the square brackets. You can also call a range of contiguous values by providing the start and end indices separated by a colon. When doing so, the data points at the start and end index will be included in the subset along with all data points between them; in other languages, it is common to exclude the value at the last provided index. If you want to call discontinuous data points, you can use the c() function to provide a vector of indices.
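The three indexing patterns described above can be sketched with the x and y vectors from the earlier example (redefined here so the snippet runs on its own).

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c("GIS", "Spatial", "Analytics", "R", "Data Science", "Remote Sensing")

x[1]           # first element: indexing starts at 1
y[2:4]         # contiguous range; both end indices are included
y[c(1, 3, 6)]  # non-contiguous elements selected with c()
```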
17.4.2 Matrices
A matrix is a two-dimensional array (or, values stored in rows and columns). In GIS and remote sensing, this is similar to a single-band raster grid where each cell is defined by a row and column combination. All the cells in a matrix must have the same data type (for example, a matrix of numeric values). A matrix is generated using the matrix() function. You can provide a set of values, the number of rows, and the number of columns. The byrow argument is used to determine how to populate the matrix with the provided data. If set to TRUE, the values fill the matrix row by row; in other words, all columns in a row will be filled before moving on to the next row. If set to FALSE (the default), the values fill the matrix column by column. You can also provide row and column names using the dimnames parameter.
print(matrix(1:50, nrow=10, ncol=5, byrow=FALSE))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 11 21 31 41
[2,] 2 12 22 32 42
[3,] 3 13 23 33 43
[4,] 4 14 24 34 44
[5,] 5 15 25 35 45
[6,] 6 16 26 36 46
[7,] 7 17 27 37 47
[8,] 8 18 28 38 48
[9,] 9 19 29 39 49
[10,] 10 20 30 40 50
data1 <- c("A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3")
rNames <- c("1", "2", "3")
cNames <- c("A", "B", "C")
m1 <- matrix(data1, nrow=3, ncol=3, byrow=TRUE, dimnames=list(rNames, cNames))
print(m1)
A B C
1 "A1" "B1" "C1"
2 "A2" "B2" "C2"
3 "A3" "B3" "C3"
Since we are now working in two dimensions, we need to define two indices to extract a specific row, column, or cell from the matrix. The first value represents the row while the second value represents the column. So, [2,1] would indicate the value at row 2 and column 1. A blank in either position indicates to select all rows or all columns. So, selecting data is similar for matrices and vectors except that we need to specify two indices instead of one. It is also possible to subset based on the row or column names as opposed to indices.
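A short sketch of these selection rules, rebuilding the m1 matrix from the example above so the snippet is self-contained.

```r
data1 <- c("A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3")
m1 <- matrix(data1, nrow=3, ncol=3, byrow=TRUE,
             dimnames=list(c("1", "2", "3"), c("A", "B", "C")))

m1[2, 1]      # row 2, column 1: "A2"
m1[2, ]       # blank column index: all columns of row 2
m1[, 3]       # blank row index: all rows of column 3
m1["3", "B"]  # subset by row and column names: "B3"
```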
17.4.3 Arrays
What if you need to expand to more than two dimensions? Arrays allow you to store data in n dimensions. For example, a three-dimensional array is similar to a multiband image where the first dimension represents rows, the second represents columns, and the third represents the image bands or channels. You could also think of a three-dimensional array as a cube where each cell is a smaller cube defined by its position in three-dimensional space (such a cell is often referred to as a voxel). A four-dimensional array could be used to add a time component to the data, which would be difficult to visualize since we only have three spatial dimensions to work with. Similar to matrices, all values stored in an array must be of the same type (for example, a numeric array). If you work in Python, matrices and arrays in R are comparable to NumPy arrays.
data2 <- seq(from=1, to=150, by=2)
rNames <- c("R1", "R2", "R3", "R4", "R5")
cNames <- c("C1", "C2", "C3", "C4", "C5")
bNames <- c("B1", "B2", "B3")
a1 <- array(data2, c(5, 5, 3), dimnames=list(rNames, cNames, bNames))
print(a1)
, , B1
C1 C2 C3 C4 C5
R1 1 11 21 31 41
R2 3 13 23 33 43
R3 5 15 25 35 45
R4 7 17 27 37 47
R5 9 19 29 39 49
, , B2
C1 C2 C3 C4 C5
R1 51 61 71 81 91
R2 53 63 73 83 93
R3 55 65 75 85 95
R4 57 67 77 87 97
R5 59 69 79 89 99
, , B3
C1 C2 C3 C4 C5
R1 101 111 121 131 141
R2 103 113 123 133 143
R3 105 115 125 135 145
R4 107 117 127 137 147
R5 109 119 129 139 149
Since you now have more dimensions, you will need to provide more indices to extract specific values or ranges of values. The first index specifies the first dimension (rows), the second specifies the second dimension (columns), and the third specifies the third dimension (for example, image bands). Also similar to a matrix, you can define dimension names and use them to subset the data.
a1[1, 1, 1]
[1] 1
a1[1:3, 1:3, 1]
C1 C2 C3
R1 1 11 21
R2 3 13 23
R3 5 15 25
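Subsetting by dimension names works the same way. This sketch rebuilds a1 from the example above (generating the names with paste0() for brevity) and uses dim() to confirm the size of each dimension.

```r
data2 <- seq(from=1, to=150, by=2)
a1 <- array(data2, c(5, 5, 3),
            dimnames=list(paste0("R", 1:5), paste0("C", 1:5), paste0("B", 1:3)))

dim(a1)               # 5 5 3: rows, columns, bands
a1["R2", "C3", "B2"]  # one cell selected by names: 73
a1[, "C1", "B3"]      # column C1 of band B3 for every row
```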
17.4.4 Data Frames
Both matrices and arrays can only store data of a single type; in other words, you cannot create columns with different data types. So, there is a need for yet another data model. A data frame is similar to a matrix; however, each column can hold a different type of data. A data frame is very similar to a Microsoft Excel spreadsheet or a pandas DataFrame in Python. We have found data frames to be the data model that we use most often in R; they are generally considered the workhorse of R data models.
In the provided example, we are creating a data frame to store information about courses. First, we generate vectors to store each column of data. Note that each column must have the same length, or the same number of data points, to combine them into a data frame. Here, we are generating a mix of numeric and character vectors. Using the data.frame() function, we then combine the vectors into a data frame. Once it is printed, you can see that each column took the name of its associated vector variable.
course_prefix <- c("Geog", "Geog", "Geol", "Geol", "Geog")
course_num <- c(107, 350, 101, 104, 455)
course_name <- c("Physical Geography", "GIScience", "Planet Earth", "Earth Through Time", "Remote Sensing")
enrollment <- c(210, 45, 235, 80, 35)
course_data <- data.frame(course_prefix, course_num, course_name, enrollment)
print(course_data)
course_prefix course_num course_name enrollment
1 Geog 107 Physical Geography 210
2 Geog 350 GIScience 45
3 Geol 101 Planet Earth 235
4 Geol 104 Earth Through Time 80
5 Geog 455 Remote Sensing 35
Extracting elements from a data frame works the same way as extracting elements from a matrix, since data frames are also two-dimensional. We must specify indices for both the rows and the columns. We can also use the column names or row names. Column names are automatically generated when a data frame is created. We use $ when referencing a column by its name (for example, df$Col1).
print(course_data[,1])
[1] "Geog" "Geog" "Geol" "Geol" "Geog"
print(course_data[1,])
course_prefix course_num course_name enrollment
1 Geog 107 Physical Geography 210
print(course_data[1,3])
[1] "Physical Geography"
print(course_data[,"course_name"])
[1] "Physical Geography" "GIScience" "Planet Earth"
[4] "Earth Through Time" "Remote Sensing"
print(course_data$course_name)
[1] "Physical Geography" "GIScience" "Planet Earth"
[4] "Earth Through Time" "Remote Sensing"
print(course_data$enrollment)
[1] 210 45 235 80 35
17.4.5 Lists
We think of lists as containers that store other data objects. Lists can be used to store multiple vectors, matrices, arrays, data frames, and even other lists. To call an element in a named list, use $ (double square brackets, [[ ]], also work). You can then use the same selection methods for data models already discussed. We find that we don't tend to create many lists. However, it is common for analyses to generate list objects that you will then need to work with or extract data or results from. This will be our primary use of lists in this text.
Lists in R differ from lists in Python. A Python list is more similar to an R vector, except that the elements in a Python list are not required to be of the same data type.
vec1 <- sample(1:1000, 50, replace=TRUE)
vec2 <- rnorm(200, mean=250, sd = 100)
data1 <- c("A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3")
rnames <- c("1", "2", "3")
cnames <- c("A", "B", "C")
matrix1 <- matrix(data1, nrow=3, ncol=3, byrow=TRUE, dimnames=list(rnames, cnames))
data2 <- seq(from=1, to=150, by=2)
rnames <- c("R1", "R2", "R3", "R4", "R5")
cnames <- c("C1", "C2", "C3", "C4", "C5")
bnames <- c("B1", "B2", "B3")
array1 <- array(data2, c(5, 5, 3), dimnames=list(rnames, cnames, bnames))
list1 <- list(Vector_1 = vec1, Vector_2 = vec2, Matrix1 = matrix1, Array_1 = array1)
print(list1$Vector_1)
[1] 306 638 928 906 224 397 436 749 23 800 385 553 211 126 427 106 988 105 860
[20] 44 529 756 840 401 44 693 238 691 781 668 979 933 644 814 502 885 144 859
[39] 999 825 110 976 397 271 63 460 584 754 186 562
print(list1$Array_1[1, 1, 1])
[1] 1
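A small sketch of the main ways to pull items out of a list. The list and its element names here are hypothetical examples. $ and [[ ]] return the element itself, [[ ]] also accepts a position, and single brackets return a smaller list rather than the element.

```r
course_list <- list(
  labs   = c("Lab 1", "Lab 2", "Lab 3"),
  scores = matrix(1:6, nrow=2),
  info   = data.frame(id = 1:2, name = c("GIS", "Remote Sensing"))
)

names(course_list)          # "labs" "scores" "info"
course_list$labs[2]         # extract with $, then index the vector: "Lab 2"
course_list[["info"]]$name  # [[ ]] with a name works like $
course_list[[2]][2, 3]      # [[ ]] with a position, then matrix indexing: 6
class(course_list["labs"])  # single brackets return a list: "list"
```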
17.5 Factors
What if we need to create character data in which only certain values or levels are allowed? This is the purpose of the factor data type; it is similar to the character data type but with defined levels or values.
In the example, we use rep() to generate a vector containing 1,500 records of different academic years. We then define the vector as a factor using the factor() function. To make sure the data are represented as a factor, we then use is.factor() (again, there are a lot of is. and as. functions in R). This returns TRUE, so we know that the data are now stored as factors. Using the levels() function, we can obtain a vector of the available levels, in this case the academic years.
One component of factors that is a bit confusing is that each unique category is assigned a placeholder integer value. So, the data are actually being stored as integer codes, and each integer is associated with a specific category.
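The integer codes can be inspected with as.integer(). In this sketch with made-up values, the levels are sorted alphabetically, so each value's code is its position in levels(); table() then counts the records in each level.

```r
f <- factor(c("Junior", "Senior", "Junior", "Freshman"))

levels(f)      # "Freshman" "Junior" "Senior" (alphabetical by default)
as.integer(f)  # 2 3 2 1: the code stored for each value
table(f)       # number of records per level
```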
ac_year <- rep(c("Freshman",
"Sophomore",
"Junior",
"Senior",
"Graduate"),
1500*c(0.35, 0.20, 0.15, 0.20, 0.10))
ac_year2 <- factor(ac_year)
is.factor(ac_year2)
[1] TRUE
levels(ac_year2)
[1] "Freshman" "Graduate" "Junior" "Senior" "Sophomore"
It is also possible to specify an order for the factor levels to produce an ordered factor. When we printed the levels above, they printed in alphabetical order. However, it would make more sense to order them based on academic progression. When creating a factor, we can specify an order by setting ordered = TRUE and providing the levels in the desired order as the argument for the levels parameter. Checking the levels, we can see that they are now in the desired order.
ac_year3 <- factor(ac_year,
ordered=TRUE,
levels=c("Freshman",
"Sophomore",
"Junior",
"Senior",
"Graduate"))
levels(ac_year3)
[1] "Freshman" "Sophomore" "Junior" "Senior" "Graduate"
If you subset your data, you may need to remove levels that are no longer used or do not occur in the subset. This can be accomplished using the droplevels() function, which by default removes any levels not used in the data. Here, we have extracted only the "Senior" and "Graduate" records. However, after printing the levels, we see that all levels are still defined. Using droplevels() fixes this issue, and the result can be checked using the levels() function. In the next chapter, we explore other methods for manipulating factors using the forcats package, which is part of the tidyverse metapackage. Here, we focus specifically on base R factor manipulation.
ac_year4 <- ac_year3[ac_year == "Senior" | ac_year == "Graduate"]
levels(ac_year4)
[1] "Freshman" "Sophomore" "Junior" "Senior" "Graduate"
ac_year5 <- droplevels(ac_year4)
levels(ac_year5)
[1] "Senior" "Graduate"
17.6 Operators
17.6.1 Arithmetic Operators
The mathematical or arithmetic operators used in R are defined in Table 17.3. We use these throughout the text. Note that modulus (%%) returns the remainder after division, while integer division (%/%) divides and then rounds down to the nearest whole number.
| Operator | Meaning |
|---|---|
| + | Addition |
| - | Subtraction |
| * | Multiplication |
| / | Division |
| ^ | Exponentiation |
| %% | Modulus (remainder after division) |
| %/% | Integer division |
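A quick sketch of each operator in Table 17.3 applied to the same two numbers. The last two lines show that %/% rounds down (toward negative infinity) rather than toward zero, and that %% is defined to match it; the parentheses make the negative number explicit.

```r
7 + 2       # 9
7 - 2       # 5
7 * 2       # 14
7 / 2       # 3.5
7 ^ 2       # 49
7 %% 2      # 1: remainder after dividing 7 by 2
7 %/% 2     # 3: whole-number part of the division
(-7) %/% 2  # -4: rounds down, not toward zero
(-7) %% 2   # 1: so that (-4) * 2 + 1 equals -7
```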
17.6.2 Comparison Operators
Comparison operators (Table 17.4) return a logical TRUE or FALSE; in other words, they function as a test. Multiple tests or statements can be combined using the logical operators (&, |, and !) also listed in Table 17.4. We will experiment with these in the next chapter as we build data queries using dplyr and the tidyverse.
| Operator | Meaning |
|---|---|
| == | Equal to |
| != | Not equal to |
| > | Greater than |
| < | Less than |
| >= | Greater than or equal to |
| <= | Less than or equal to |
| & | A AND B |
| \| | A OR B |
| ! | NOT A |
| %in% | Is element in a vector? |
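The sketch below applies the operators in Table 17.4 to the enrollment values from the course data frame example. Each test returns one TRUE or FALSE per element.

```r
enrollment <- c(210, 45, 235, 80, 35)

enrollment > 100                      # TRUE FALSE TRUE FALSE FALSE
enrollment == 45                      # FALSE TRUE FALSE FALSE FALSE
enrollment >= 50 & enrollment <= 220  # both tests must be TRUE
enrollment < 40 | enrollment > 220    # either test can be TRUE
!(enrollment > 100)                   # reverses each result
enrollment %in% c(35, 45, 80)         # is each value in the set?
```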
17.7 Creating Sequences of Values
There are also multiple base functions available for generating vectors or sequences of values. The : operator returns a sequence in steps of one from the start value to the stop value, inclusive (for example, 1:10). seq() allows for defining a series of values by specifying a start value, a stop value, and either a by or a length.out argument. This function generates a regularly spaced sequence of values, where by defines the interval and length.out defines the number of desired values. rep() is used to repeat a value or vector of values a given number of times. If a times argument is provided, the whole value or sequence is repeated the desired number of times. In the example, the integers 2, 4, 6, and 8 are repeated 3 times. In contrast, each will replicate every value in the sequence such that all instances of the same value are grouped together.
x <- 1:10
print(x)
[1] 1 2 3 4 5 6 7 8 9 10
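seq(from=1, to=2, by=0.25)
[1] 1.00 1.25 1.50 1.75 2.00
seq(from=1, to=2, length.out=5)
[1] 1.00 1.25 1.50 1.75 2.00
rep(c(2, 4, 6, 8), times=3)
[1] 2 4 6 8 2 4 6 8 2 4 6 8
rep(c(2, 4, 6, 8), each=3)
[1] 2 2 2 4 4 4 6 6 6 8 8 8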
Arithmetic and logical operators are applied to each element in a vector, so a single expression can transform or test every value at once.
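A brief sketch of this element-wise behavior: an operation with a single number applies to every element, and an operation between two vectors of the same length pairs elements by position.

```r
x <- 1:5

x * 2                      # 2 4 6 8 10
x + c(10, 20, 30, 40, 50)  # position by position: 11 22 33 44 55
x^2                        # 1 4 9 16 25
x > 3                      # FALSE FALSE FALSE TRUE TRUE
```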
17.8 Other Useful Base R Functions
There are a variety of other useful base R functions. We have provided a list below of some commonly used functions.
- ncol(): returns the number of columns in a data frame or matrix
- nrow(): returns the number of rows in a data frame or matrix
- length(): returns the number of data points in a vector or data frame column
- rbind(): combines rows from multiple data objects with the same number of columns
- cbind(): combines columns from multiple data objects with the same number of rows
- merge(): joins two data frames based on common columns (or row names)
- setwd(): sets the working directory
- getwd(): returns the working directory path as a string
- table(): creates a contingency table of counts for each combination of factor levels
- rnorm(): creates a specified number of random values drawn from a normal distribution
- sample(): selects a specified number of random samples from a vector, with or without replacement
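Several of these functions are sketched below on two small, hypothetical data frames (df1 and df2 are made up for illustration). set.seed() is added only so the random draws repeat from run to run.

```r
df1 <- data.frame(id = 1:3, course = c("GIS", "Remote Sensing", "Cartography"))
df2 <- data.frame(id = c(1, 3), enrollment = c(45, 30))

nrow(df1)           # 3 rows
ncol(df1)           # 2 columns
length(df1$course)  # 3 values in one column

df3 <- rbind(df1, data.frame(id = 4, course = "Spatial Statistics"))  # add a row
df4 <- cbind(df1, credits = c(3, 3, 4))                             # add a column
joined <- merge(df1, df2, by = "id")  # keeps only ids found in both tables

set.seed(42)                             # make the random draws repeatable
picks <- sample(1:10, 3)                 # 3 values drawn without replacement
draws <- rnorm(3, mean = 250, sd = 100)  # 3 values from a normal distribution
counts <- table(c("A", "B", "A", "A"))   # A appears 3 times, B once
```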
17.9 Working with Table Data
17.9.1 Reading Tables
In all the examples provided in this chapter so far, we have generated data to experiment with. However, this is impractical or impossible for a large data set. More commonly, you will read data into R as opposed to creating it from scratch.
Tables can be read in using the read.table() or read.csv() function. read.csv() is specifically used to read comma-separated values (.csv) files. To read in data, you need to either set a working directory where the data are housed using setwd() or provide the entire file path. We recommend setting a working directory.
R uses the forward slash in folder paths as opposed to the backslash used by the Windows operating system. So, if you copy and paste a path from Windows File Explorer, you have to switch the backslashes to forward slashes in your code. Alternatively, you can double up the backslashes (\\), since a single backslash starts an escape sequence in an R string.
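The equivalent ways of writing a path can be compared as strings. The folder below is hypothetical; file.path() is a base R helper that joins path parts with forward slashes on every platform.

```r
# A hypothetical Windows path written three equivalent ways
p1 <- "C:/gis_data/matts_movies.csv"                   # forward slashes
p2 <- "C:\\gis_data\\matts_movies.csv"                 # doubled backslashes
p3 <- file.path("C:", "gis_data", "matts_movies.csv")  # built from parts

identical(p1, p3)                     # TRUE
identical(gsub("\\\\", "/", p2), p1)  # TRUE once backslashes are swapped

getwd()                 # current working directory
# setwd("C:/gis_data")  # uncomment to change it (the folder must exist)
```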
For reading in tables, we will primarily use the read_csv() function from the readr package, which is part of the tidyverse as opposed to base R. This is demonstrated in the next chapter.
In the example, we are reading in a file called matts_movies.csv from our working directory. We specify that the separator is a comma, which is the default for CSV files as the name implies, and that there is a header, so the first row should be treated as column names as opposed to data.
Once data are read in, it is generally a good idea to explore or inspect them to make sure there are no issues and that they were read in as anticipated. The head() function returns the first six records in the table, while the tail() function returns the last six records. You can specify an additional n argument if you want a different number than the default six records. The str() function provides information about the structure of the data, including the data type for each column. If the data type is incorrectly defined, you can use the appropriate as. function (for example, as.numeric()) to make conversions. Note that read.csv() returns a data frame without our needing to state this explicitly, which suits these data since there are multiple columns of different data types.
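The workflow can be tried without the movies file. This sketch writes a tiny CSV to a temporary file (the three rows are copied from the output shown later in this section) and then reads and inspects it with the same arguments described above.

```r
# Write a small CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("Movie.Name,Release.Year,My.Rating",
             "Almost Famous,2000,9.99",
             "Groundhog Day,1993,9.96",
             "Annie Hall,1977,9.93"), tmp)

films <- read.csv(tmp, sep = ",", header = TRUE)

head(films, n = 2)  # first two records instead of the default six
tail(films, n = 1)  # last record
str(films)          # structure, including each column's type
class(films)        # "data.frame"

# Convert a column if it was read with an unwanted type
films$Release.Year <- as.numeric(films$Release.Year)
```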
When reading in data tables that contain character or string data, it is important to consider whether you want the data to be represented as a character or a factor. In versions of R prior to 4.0, the default was to convert all string data to factors when using read.csv(). However, the default in 4.0 or later is to keep them as characters. The read.csv() function has an optional stringsAsFactors argument that can be used to change this behavior. Alternatively, you can use factor() or as.factor() to convert specific columns. In our example, we used the stringsAsFactors argument to read in all character columns as factors.
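A minimal comparison of the two approaches, using a tiny made-up CSV. The first result assumes R 4.0 or later, where strings stay as characters by default.

```r
tmp <- tempfile(fileext = ".csv")
writeLines(c("Genre,Own", "Drama,Yes", "Comedy,Yes", "Drama,No"), tmp)

as_chr <- read.csv(tmp)                           # strings stay character (R 4.0+)
as_fct <- read.csv(tmp, stringsAsFactors = TRUE)  # all string columns become factors

class(as_chr$Genre)   # "character" in R 4.0 or later
class(as_fct$Genre)   # "factor"
levels(as_fct$Genre)  # "Comedy" "Drama"

# Alternatively, convert only the column you need
as_chr$Own <- factor(as_chr$Own)
```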
The names() function can be used to print the column names of a table or store them in a vector. We can also change the names by providing a vector of new names. If you would like to change only a subset of names, you can provide an index or indices in square brackets.
movies <- read.csv("matts_movies.csv", sep=",", header=TRUE, stringsAsFactors=TRUE)
head(movies)
Movie.Name Director Release.Year My.Rating Genre Own
1 Almost Famous Cameron Crowe 2000 9.99 Drama Yes
2 The Shawshank Redemption Frank Darabont 1994 9.98 Drama Yes
3 Groundhog Day Harold Ramis 1993 9.96 Comedy Yes
4 Donnie Darko Richard Kelly 2001 9.95 Sci-Fi Yes
5 Children of Men Alfonso Cuaron 2006 9.94 Sci-Fi Yes
6 Annie Hall Woody Allen 1977 9.93 Comedy Yes
tail(movies)
Movie.Name Director Release.Year
1847 The Nutty Proffessor II: The Klumps Peter Segal 2000
1848 Dreamcatcher Lawrence Kasdan 2003
1849 Jumper Doug Liman 2008
1850 Baby Geniuses Bob Clark 1999
1851 The Postman Kevin Costner 1997
1852 The Last Airbender M. Night Shyamalan 2010
My.Rating Genre Own
1847 1.76 Comedy No
1848 1.65 Horror No
1849 1.22 Action No
1850 1.01 Family No
1851 0.88 Drama No
1852 0.67 Action No
str(movies)
'data.frame': 1852 obs. of 6 variables:
$ Movie.Name : Factor w/ 1852 levels "D\xe9j\xe0 Vu",..: 103 1622 596 431 325 127 1161 908 993 1187 ...
$ Director : Factor w/ 801 levels "G\xe9la Babluani",..: 113 243 289 634 41 793 786 139 382 175 ...
$ Release.Year: int 2000 1994 1993 2001 2006 1977 1998 2000 2007 1995 ...
$ My.Rating : num 9.99 9.98 9.96 9.95 9.94 9.93 9.92 9.91 9.9 9.88 ...
$ Genre : Factor w/ 18 levels "Action","Classic",..: 6 6 4 13 13 4 11 16 16 16 ...
$ Own : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
names(movies)
[1] "Movie.Name" "Director" "Release.Year" "My.Rating" "Genre"
[6] "Own"
nrow(movies)
[1] 1852
ncol(movies)
[1] 6
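Renaming columns with names() can be sketched on a small data frame built from two rows of the movies output above (the new names are our own choice). Assigning a full vector replaces every name, while an index in square brackets replaces only the selected name.

```r
movie_df <- data.frame(Movie.Name = c("Jumper", "Dreamcatcher"),
                       Release.Year = c(2008, 2003),
                       My.Rating = c(1.22, 1.65))

col_names <- names(movie_df)   # store the current names in a vector
col_names                      # "Movie.Name" "Release.Year" "My.Rating"

names(movie_df) <- c("title", "year", "rating")  # replace all names
names(movie_df)[3] <- "my_rating"                # replace one name by index
names(movie_df)                                  # "title" "year" "my_rating"
```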
Microsoft Excel spreadsheets can be read in using the read.xlsx() function from the xlsx package. You need to load the xlsx package before you can use this function, which can be accomplished using the library() or require() functions. Generally, library() is preferred for loading packages, since it stops with an error if the package is not available. Inside of a function, require() can be more useful because it returns FALSE (with a warning) instead of an error, so your code can check whether the package loaded. Remember that you must install packages, for example with install.packages(), before they can be used.
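The difference between the two loading functions can be seen with a package that is always installed (tools, which ships with R) and a deliberately fake package name (notARealPackage123 is hypothetical). The install line is left commented out because it downloads from the internet.

```r
# install.packages("xlsx")  # install once before first use

library(tools)  # loads an installed package; errors if it is missing

# require() returns TRUE or FALSE instead of stopping with an error
ok <- suppressWarnings(require("notARealPackage123", quietly = TRUE))
ok   # FALSE

# library() on a missing package stops with an error
res <- tryCatch(library("notARealPackage123"), error = function(e) "error")
res  # "error"
```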
Note that there are additional functions and packages available to read in other data formats, including XML, SPSS, SAS, Stata, NetCDF, HDF5, and database files. We discuss reading in vector and raster geospatial data elsewhere in the text. The data.table and readr packages are useful when working with large data sets and tables.
17.9.2 Writing Tables
There are also functions available to write results out to permanent files on disk. For example, write.csv() or write.table() can be used to save results as CSV or text files.
The foreign package provides the write.dbf() function for saving to .dbf format. The xlsx package provides write.xlsx() for saving results to Excel spreadsheet format.
If a folder path is not specified, the result is written to the current working directory. If you do not want to save the result to the current working directory, you must specify the entire desired file path.
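A minimal sketch of writing a table with a full file path, using a temporary folder so nothing is left in the working directory (the data frame is a trimmed version of the course example). Setting row.names = FALSE keeps write.csv() from adding the row numbers as an extra first column.

```r
course_data <- data.frame(course_prefix = c("Geog", "Geol"),
                          course_num = c(107, 101),
                          enrollment = c(210, 235))

out_file <- file.path(tempdir(), "course_data.csv")  # full path to the output file
write.csv(course_data, out_file, row.names = FALSE)  # omit the row-number column

check <- read.csv(out_file)                   # read the file back to confirm
identical(names(check), names(course_data))   # TRUE
```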
17.11 Quitting R
The q() function can be used to end your R session; in an interactive session, you will be asked whether to save your workspace. You can also use the save methods available in the File menu in RStudio.
17.12 Concluding Remarks
That’s it! It might seem that you haven’t learned much R yet. However, data types and structures are a large component of working in this environment. So, this is an accomplishment. You will get practice working with many of the techniques discussed here throughout the text.
17.13 Questions
- Explain the difference between the character and factor R data types.
- Explain the difference between R vectors, matrices, and arrays.
- Explain the difference between R matrices and data frames.
- Explain the modulus operation.
- Explain two methods to get help for implementing an R function within RStudio.
- Explain how to access official R package documentation on the web.
- What is the purpose of the base R head() function?
- Explain the difference between the library() and require() functions.
17.10 Comments
Comments are meant to make your code more interpretable. They are meant for humans as opposed to computers, so commented text is not executed. We highly recommend commenting your code, as you may forget how or why you did something, or someone else may want to use or modify your code. You can also comment out lines that you don't want to execute temporarily, perhaps during the debugging process.
Different programming languages define comments differently. R uses #: any text following # on a line, including an entire line that begins with #, will not be executed.
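A short example of the commenting styles described above: a full-line comment, an inline comment after code, and a line of code that has been commented out so it does not run.

```r
# A full-line comment: R ignores this line
x <- c(1, 2, 3)  # an inline comment after code

# The next line is "commented out" and will not execute
# x <- c(4, 5, 6)

print(x)  # still 1 2 3
```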