# The R Language Part II

## Objectives

- Further explore the R language
- Create your own **functions**
- Use **while** and **for loops**
- Use **if…else**, **next** and **break**, and **which**

## Overview

In this second section on the R language, we will explore more advanced scripting techniques including **functions**, **loops**, and **if…else** statements. Most programming and scripting languages provide these types of capabilities, and R is no exception. Such techniques can be very useful in helping you process large volumes of data efficiently or automate repetitive tasks.

## Create Your Own Function

As you’ve already seen, R provides a multitude of functions, either from base R or from one of the many available packages. However, sometimes you may want to define your own function to perform an operation or analysis specific to your work or to make a task easier to implement. With this in mind, R has a built-in *function()* function, which allows you to define your own function. I will demonstrate this using a few examples.

In the first example, I am creating a function to rescale data. The function will accept an input **vector** or **data frame** column and rescale the data from 0 to 1. You can then define a value to multiply by to rescale the 0-1 data to a new scale. Note that the function is being stored to a **variable** (*scale2*), which can later be used to call the function. It accepts two arguments: *data* and *scale*. What the function actually does is defined within the curly brackets. Here is the process:

- The maximum value is calculated and stored as a variable (*max1*)
- The minimum value is calculated and stored as a variable (*min1*)
- The minimum is subtracted from each data point and stored as a vector (*n*)
- The data range is calculated and stored as a variable (*d*)
- The data are rescaled from 0-1
- The rescaled data are multiplied by a factor to change the scale
- The final rescaled data are returned

Note that variables created within a function cannot be used outside of it. This is known as **local scope**: variables defined in a function are local variables and exist only within that function. In contrast, variables defined outside of a function have **global scope**; these global variables can be used anywhere in the script.

```
scale2 <- function(data, scale){
max1 <- max(data)
min1 <- min(data)
n <- data-min1
d <- max1-min1
s <- n/d
s2 <- s*scale
return(s2)
}
```
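The local scope described above can be checked directly. This is a minimal sketch, not part of the original example: the function *add_shift* and the variable *local_shift* are hypothetical names chosen only for illustration.

```r
# Hypothetical helper used only to illustrate local scope
add_shift <- function(values){
  local_shift <- 10             # local variable: exists only while the function runs
  return(values + local_shift)
}

result <- add_shift(c(1, 2, 3))
print(result)
# [1] 11 12 13

# The local variable is not visible outside of the function
print(exists("local_shift"))
# [1] FALSE
```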

Once a function is defined, it can be used, which is really the whole point of creating it in the first place. In the example below I am creating a numeric vector (*x*). I am then using the new function to rescale it from 0 to 100 and then from 0 to 1.

```
x <- c(1, 14, 21, 16, 18, 16, 19, 20, 6, 8, 9, 11, 17)
x100 <- scale2(x, 100)
x1 <- scale2(x, 1)
print(x)
[1] 1 14 21 16 18 16 19 20 6 8 9 11 17
print(x100)
[1] 0 65 100 75 85 75 90 95 25 35 40 50 80
print(x1)
[1] 0.00 0.65 1.00 0.75 0.85 0.75 0.90 0.95 0.25 0.35 0.40 0.50 0.80
```

Functions that you create are not permanent. So, if you would like to use the function in a new script, it will need to be defined again in that script.

I have provided a second example for calculating **root mean square error**, or **RMSE**, for an assessment of georeferencing results. Here, four arguments are required (the correct and predicted coordinates in the x and y directions). The function then calculates **RMSE** components, including **residuals**, **squared residuals**, **RMSE<sub>x</sub>**, **RMSE<sub>y</sub>**, and **RMSE<sub>Total</sub>**, and returns a **list** object holding this information. I then test the function on some example data and return the **RMSE** measures.

```
rmse_georef <- function(x_c, y_c, x_p, y_p){
x_residual <- x_c - x_p
y_residual <- y_c - y_p
x_residual_sq <- x_residual^2
y_residual_sq <- y_residual^2
rmse_x <- sqrt(sum(x_residual_sq)/length(x_residual))
rmse_y <- sqrt(sum(y_residual_sq)/length(y_residual))
rmse_total <- sqrt(rmse_x^2 + rmse_y^2)
rmse_list <- list(x_residual, y_residual, x_residual_sq, y_residual_sq, rmse_x, rmse_y, rmse_total)
names(rmse_list) <- c("X.Residuals", "Y.Residuals", "X.Sq.Residuals", "Y.Sq.Residuals", "RMSE.X", "RMSE.Y", "RMSE.Total")
return(rmse_list)
}
x_actual <- c(584026.624, 583179.7805, 589507.5837, 579463.0782, 585908.4986, 588190.2715)
y_actual <- c(4474131.442, 4479283.074, 4476648.449, 4478436.23, 4470697.021, 4480318.105)
x_predicted <-c(584041.7902, 583211.7964, 589496.2211, 579447.4653, 585909.7985, 588206.0155)
y_predicted <- c(4474159.608, 4479295.524, 4476664.073, 4478462.252, 4470719.12, 4480344.345)
example_rmse <-rmse_georef(x_actual, y_actual, x_predicted, y_predicted)
print(example_rmse$RMSE.Total)
[1] 28.64713
print(example_rmse$RMSE.X)
[1] 17.68929
print(example_rmse$RMSE.Y)
[1] 22.53325
```

## Using While Loops

**While loops** are used to repeat some process for as long as a condition is TRUE. In the example, the variable *x1* is initially 100. On each pass, the loop prints *x1* and then subtracts 1 from the current value. This continues until the condition is no longer TRUE: once *x1* reaches 90, the condition evaluates to FALSE and the loop is exited. It is important to define a condition that will eventually evaluate to FALSE. If your condition always remains TRUE, the loop will never stop. This is known as an **infinite loop**.
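A minimal sketch of the loop just described follows. The exact condition is an assumption: *x1 > 90* is one way to make the loop stop when *x1* reaches 90.

```r
x1 <- 100
# Assumed condition: keep looping while x1 is still above 90
while(x1 > 90){
  print(x1)       # prints 100, 99, ..., 91
  x1 <- x1 - 1    # without this line the condition would stay TRUE forever
}
```

When the loop exits, *x1* holds 90, the first value for which the condition is FALSE.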

## Using For Loops

I don’t use **while loops** that often in R. However, I tend to use **for loops** frequently. In contrast to **while loops**, **for loops** do not rely on a condition. Instead, a process is executed once for each element in a sequence. For example, you could process every data point in a **vector** or every column or row of a **data frame**. You could also perform the same process for all files in a list of files.

In the first example, I have created a **vector** of country names. I then use a **for loop** to process each element. Specifically, the loop will print “I would like to go to” followed by the country name.

```
x <- c("Austria", "New Zealand", "Norway", "Canada", "Cuba")
for(i in x){
print(paste("I would like to go to ", i, ".", sep=""))
}
[1] "I would like to go to Austria."
[1] "I would like to go to New Zealand."
[1] "I would like to go to Norway."
[1] "I would like to go to Canada."
[1] "I would like to go to Cuba."
```

Note that *i* is simply a variable and does not need to be called *i* as shown in this example.

```
x <- c("Austria", "New Zealand", "Norway", "Canada", "Cuba")
for(country in x){
print(paste("I would like to go to ", country, ".", sep=""))
}
[1] "I would like to go to Austria."
[1] "I would like to go to New Zealand."
[1] "I would like to go to Norway."
[1] "I would like to go to Canada."
[1] "I would like to go to Cuba."
```

I have found **for loops** to be especially useful for processing multiple files. Here is an example for **raster** grids. Note that some of the functions used here have not been discussed yet. We will discuss them when we talk about spatial data in R. I am just providing this example to make a point that **for loops** can help you process geospatial data efficiently. For example, you could process thousands of files using only a few lines of code. I have not provided these data, so you will not be able to execute this example. Here I am reading all elevation grids in a folder then finding all cells that have an elevation greater than 500 meters. I then write the results out to binary **raster** grids.
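The same list-the-files-then-loop pattern can be sketched with base R alone. Note that this is a stand-in and not the raster example itself: it assumes small CSV files of made-up elevation values written to a temporary folder, and all file, folder, and column names are hypothetical.

```r
# Stand-in data: write three small CSV files of made-up elevations
in_dir <- file.path(tempdir(), "elev_in")
out_dir <- file.path(tempdir(), "elev_out")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
for(k in 1:3){
  elev <- data.frame(elevation = c(350, 480, 520, 610) + k * 10)
  write.csv(elev, file.path(in_dir, paste0("grid_", k, ".csv")), row.names = FALSE)
}

# List every input file, then process each one in turn
files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
for(f in files){
  elev <- read.csv(f)
  # 1 where elevation is greater than 500 meters, 0 otherwise
  elev$above_500 <- as.integer(elev$elevation > 500)
  write.csv(elev, file.path(out_dir, basename(f)), row.names = FALSE)
}
print(list.files(out_dir))
```

The loop body does not change whether the folder holds three files or thousands, which is why this pattern scales so well.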

## If and If…Else

**If** is used to only perform some operation if the condition is TRUE. In this example, the statement is printed because the condition evaluated to TRUE, or because 4 is less than or equal to 6. If you change the value stored in the variable *a* to a number larger than 6, nothing will be printed.
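A minimal sketch of such an **if** statement is shown here. The printed message is an assumption; only the value 4 and the comparison against 6 come from the description above.

```r
a <- 4
# The message is printed only when the condition is TRUE
if(a <= 6){
  print("Value less than or equal to 6.")
}
```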

What if you want different operations to be performed depending on whether a single condition is TRUE or FALSE? This can be accomplished using an **if…else** statement. If the condition evaluates to TRUE, then the operation in the **if** statement will be performed. If it evaluates to FALSE, then the operation in the **else** statement will be executed. In the example, “Value greater than 6.” is printed because the condition evaluates to FALSE, so the operation defined within **else** is executed.

```
a <- 8
if(a <= 6){
  print("Value less than or equal to 6.")
}else{
  print("Value greater than 6.")
}
[1] "Value greater than 6."
```

What if you want to include more than one criterion? Then you can include **else if** as shown below. Note that you can include multiple **else if** conditions.

```
a <- 8
if(a <= 6){
print("Value less than or equal to 6.")
}else if(a > 6 && a < 10){
print("Value is between 6 and 10.")
}else{
print("Value is greater than or equal to 10.")
}
[1] "Value is between 6 and 10."
```

In this example, I am now providing a **vector** with multiple elements. By combining a **for loop** and **if…else**, I obtain a result for each data point.

```
b <- c(1, 3, 5, 7, 9, 11)
for(num in b){
if(num <= 6){
print("Value less than or equal to 6.")
}else if(num > 6 & num <10){
print("Value is between 6 and 10.")
}else{
print("Value is greater than or equal to 10.")
}
}
[1] "Value less than or equal to 6."
[1] "Value less than or equal to 6."
[1] "Value less than or equal to 6."
[1] "Value is between 6 and 10."
[1] "Value is between 6 and 10."
[1] "Value is greater than or equal to 10."
```

Lastly, this example shows how to combine the results into a single **vector**. In this case, each result, regardless of the evaluation of the conditions, is appended to an initially empty vector (*results*). I avoid naming the vector *c* so that it is not confused with the *c()* function.

```
b <- c(1, 3, 5, 7, 9, 11)
results <- c()
for(num in b){
  if(num <= 6){
    results <- c(results, paste(num, "is less than or equal to 6.", sep=" "))
  }else if(num > 6 && num < 10){
    results <- c(results, paste(num, "is between 6 and 10.", sep=" "))
  }else{
    results <- c(results, paste(num, "is greater than or equal to 10.", sep=" "))
  }
}
print(results)
[1] "1 is less than or equal to 6."
[2] "3 is less than or equal to 6."
[3] "5 is less than or equal to 6."
[4] "7 is between 6 and 10."
[5] "9 is between 6 and 10."
[6] "11 is greater than or equal to 10."
```

## Next and Break

In a **for loop** it is possible to skip iterations or to stop the loop prematurely. The **next** statement skips the remainder of the current iteration and moves on to the next one, while **break** exits the loop completely. In the example, I first generate a sequence of numbers from 1 to 21. Then I set up a **for loop** that will append each number to an empty vector (*b*) unless it is an odd number (in this case **modulus** will yield a remainder of 1). I do this using **next** inside of an **if** statement, which will cause the loop to skip the odd numbers. I also don’t want to append values larger than 15 to the **vector**, so I use **break** to stop the loop once it reaches 15.
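A minimal sketch of this loop is given below. The variable name *a* for the sequence, the order of the two checks, and the exact **break** condition (*i > 15*) are assumptions. With this ordering, 15 is skipped by **next** because it is odd, so **break** actually fires at 16, the first even value past 15.

```r
a <- seq(1, 21, by = 1)
b <- c()
for(i in a){
  if(i %% 2 == 1){
    next             # odd number: skip the rest of this iteration
  }
  if(i > 15){
    break            # past 15: exit the loop entirely
  }
  b <- c(b, i)       # reached only for even numbers up to 14
}
print(b)
# [1]  2  4  6  8 10 12 14
```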

## Which

**Which** in R is used to return the indices of the elements in a **vector** or the rows in a **data frame** that meet a certain criterion. You could then use these indices for selection.
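Here is a minimal sketch of *which()* applied to a vector and to a data frame. The elevation values, site labels, and variable names are made up for illustration.

```r
# Made-up elevation values
elev <- c(420, 515, 610, 380, 505)

# Indices of the elements greater than 500
idx <- which(elev > 500)
print(idx)
# [1] 2 3 5

# Use the indices to select the matching elements
print(elev[idx])
# [1] 515 610 505

# The same idea selects rows of a data frame
sites <- data.frame(site = c("A", "B", "C", "D", "E"), elev = elev)
high <- sites[which(sites$elev > 500), ]
print(high)
```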

Now that you have an understanding of programming in R, we can move on to a discussion of data analysis in R. Throughout this course, we will apply the coding, data manipulation, and analysis techniques learned in these early sections. In the next section we will explore data summarization and simple statistical tests.