The R Language Part II

Objectives

  1. Further explore the R language
  2. Create your own functions
  3. Use while and for loops
  4. Use if…else, next and break, and which

Overview

In this second section on the R language, we will explore more advanced scripting techniques including functions, loops, and if…else statements. Most programming and scripting languages provide these types of capabilities, and R is no exception. Such techniques can be very useful in helping you process large volumes of data efficiently or automate repetitive tasks.

Create Your Own Function

As you’ve already seen, R provides a multitude of functions, either from base R or from one of the many available packages. However, sometimes you may want to define your own function to perform an operation or analysis specific to your work or to make a task easier to implement. With this in mind, R provides function(), which allows you to define your own function. I will demonstrate this using a few examples.

In the first example, I am creating a function to rescale data. The function will accept an input vector or data frame column and rescale the data from 0 to 1. You can then define a value to multiply by to rescale the 0-1 data to a new scale. Note that the function is being stored to a variable (scale2), which can later be used to call the function. It accepts two arguments: data and scale. What the function actually does is defined within the curly brackets. Here is the process:

  1. The maximum value is calculated and stored as a variable (max1)
  2. The minimum value is calculated and stored as a variable (min1)
  3. The minimum is subtracted from each data point and stored as a vector (n)
  4. The data range is calculated and stored as a variable (d)
  5. The data are rescaled from 0-1
  6. The rescaled data are multiplied by a factor to change the scale
  7. The final rescaled data are returned
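The steps above can be sketched as follows. This is a minimal reconstruction based only on the names mentioned in the text (scale2, data, scale, max1, min1, n, d); the names rescaled and result are my own, and the sketch assumes the input has no missing values and is not constant (a range of zero would cause division by zero).

```r
# A sketch of the rescaling function described in the steps above
scale2 <- function(data, scale) {
  max1 <- max(data)            # 1. maximum value
  min1 <- min(data)            # 2. minimum value
  n <- data - min1             # 3. subtract the minimum from each data point
  d <- max1 - min1             # 4. data range
  rescaled <- n / d            # 5. rescale to 0-1 (name is an assumption)
  result <- rescaled * scale   # 6. multiply by the scale factor
  return(result)               # 7. return the final rescaled data
}
```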

Note that variables created within a function cannot be used outside of the function. This is known as local scope: variables defined in a function are local variables and can only be used within that function. In contrast, variables defined outside of any function have global scope (they are global variables) and can be used anywhere, including inside functions.
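Here is a small illustration of local versus global scope. All of the names here (add_one, local_total, g, use_global) are made up for the illustration.

```r
# local_total only exists while add_one() is running
add_one <- function(v) {
  local_total <- v + 1
  local_total
}

y <- add_one(5)
print(y)

# FALSE, assuming nothing named local_total was defined elsewhere in the session
print(exists("local_total"))

# A global variable can be read from inside a function
g <- 10
use_global <- function() {
  g * 2
}
print(use_global())
```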

Once a function is defined, it can be used, which is really the whole point of creating it in the first place. In the example below I am creating a numeric vector (x). I am then using the new function to rescale it from 0 to 100 and then from 0 to 1.
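A sketch of that usage might look like this. The values in x are invented for the illustration, and scale2 is written in a compact form here only so that the block runs on its own.

```r
# Compact form of the rescaling function so this block is self-contained
scale2 <- function(data, scale) {
  (data - min(data)) / (max(data) - min(data)) * scale
}

# Example numeric vector (values are made up)
x <- c(12, 45, 7, 30, 22)

# Rescale from 0 to 100, then from 0 to 1
x100 <- scale2(x, 100)
x01 <- scale2(x, 1)

print(x100)
print(x01)
```

In both results the smallest input value maps to 0 and the largest maps to the value given for scale.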

Functions that you create are not permanent. So, if you would like to use the function in a new script, it will need to be defined again in that script.

I have provided a second example for calculating root mean square error, or RMSE, for an assessment of georeferencing results. Here, four arguments are required (the correct and predicted coordinates in the x and y directions). The function then calculates the RMSE components, including residuals, squared residuals, RMSEx, RMSEy, and RMSETotal, and returns a list object holding this information. I then test the function on some example data and return the RMSE measures.
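One way such a function could be written is sketched below. The function name, argument names, and list element names are my own choices, and I am assuming the common convention that the total RMSE is the square root of the sum of the squared x and y RMSE values. The coordinates are invented test data.

```r
# A sketch of an RMSE function for georeferencing results (names are assumptions)
rmse_geo <- function(x_correct, x_predicted, y_correct, y_predicted) {
  res_x <- x_predicted - x_correct   # residuals in x
  res_y <- y_predicted - y_correct   # residuals in y
  sq_x <- res_x^2                    # squared residuals in x
  sq_y <- res_y^2                    # squared residuals in y
  rmse_x <- sqrt(mean(sq_x))
  rmse_y <- sqrt(mean(sq_y))
  # Assumed convention: combine the two components into a total RMSE
  rmse_total <- sqrt(rmse_x^2 + rmse_y^2)
  list(residuals_x = res_x, residuals_y = res_y,
       sq_residuals_x = sq_x, sq_residuals_y = sq_y,
       RMSEx = rmse_x, RMSEy = rmse_y, RMSETotal = rmse_total)
}

# Made-up control point coordinates
x_c <- c(100, 200, 300, 400)
x_p <- c(103, 197, 303, 397)
y_c <- c(50, 60, 70, 80)
y_p <- c(54, 56, 74, 76)

out <- rmse_geo(x_c, x_p, y_c, y_p)
print(out$RMSEx)
print(out$RMSEy)
print(out$RMSETotal)
```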

Using While Loops

While loops are used to perform some process while a condition is TRUE. In the example, the variable x1 is initially 100. The loop prints x1 and then subtracts 1 from the current value. This process continues until the condition is no longer TRUE. In this case it continues until x1 reaches 90, at which point the condition evaluates to FALSE and the loop is exited. It is important to define a condition that will eventually evaluate to FALSE. If your condition always remains TRUE, then the loop will never stop. This is known as an infinite loop.
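A minimal sketch of this loop is shown here. The exact condition is an assumption on my part; x1 > 90 matches the description of the loop stopping when x1 reaches 90.

```r
x1 <- 100

# Assumed condition: keep looping while x1 is greater than 90
while (x1 > 90) {
  print(x1)        # prints 100, 99, ..., 91
  x1 <- x1 - 1     # without this line the condition would never become FALSE
}
```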

Using For Loops

I don’t use while loops that often in R. However, I tend to use for loops frequently. In contrast to while loops, for loops do not rely on a condition. Instead, a process is executed once for each element in a collection. For example, you could process every data point in a vector, every column or row in a data frame, or every file in a list of files.

In the first example, I have created a vector of country names. I then use a for loop to process each element. Specifically, the loop will print “I would like to go to” followed by the country name.
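A sketch of this loop, using country names that I have picked for the illustration:

```r
# Example country names (my own choice)
countries <- c("Belgium", "Peru", "Japan")

# i takes the value of each element in turn
for (i in countries) {
  print(paste("I would like to go to", i))
}
```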

Note that i is simply a variable and does not need to be called i as shown in this example.

I have found for loops to be especially useful for processing multiple files. Here is an example for raster grids. Note that some of the functions used here have not been discussed yet. We will discuss them when we talk about spatial data in R. I am just providing this example to make a point that for loops can help you process geospatial data efficiently. For example, you could process thousands of files using only a few lines of code. I have not provided these data, so you will not be able to execute this example. Here I am reading all elevation grids in a folder then finding all cells that have an elevation greater than 500 meters. I then write the results out to binary raster grids.
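Since the raster functions have not been covered yet, here is a stand-in sketch of the same pattern using only base R and small CSV tables in place of raster grids. The folder names, file names, and elevation values are all hypothetical and are created in a temporary directory so that the block can be run as is. It is not the raster workflow itself, just the loop structure: list the files, process each one, and write a result.

```r
# Hypothetical setup: write three small elevation tables to a temporary folder
in_dir <- file.path(tempdir(), "elev_in")
out_dir <- file.path(tempdir(), "elev_out")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
for (k in 1:3) {
  elev <- data.frame(elevation = c(350, 480, 510, 620) + 10 * k)
  write.csv(elev, file.path(in_dir, paste0("grid", k, ".csv")), row.names = FALSE)
}

# The loop itself: read every file, flag values above 500 m, write the result
files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
for (f in files) {
  grid <- read.csv(f)
  grid$above_500 <- as.integer(grid$elevation > 500)   # 1 = above 500 m, 0 = not
  write.csv(grid, file.path(out_dir, basename(f)), row.names = FALSE)
}

print(list.files(out_dir))
```

The loop body stays the same whether the folder holds three files or thousands.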

If and If…Else

An if statement performs an operation only when its condition is TRUE. In this example, the statement is printed because the condition evaluated to TRUE, or because 4 is less than or equal to 6. If you change the value stored in the variable a to a number larger than 6, nothing will be printed.
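A sketch of this example follows. The variable a and the value 4 come from the text; the wording of the printed message is my own.

```r
a <- 4

# Prints because 4 <= 6 is TRUE; nothing is printed if a is larger than 6
if (a <= 6) {
  print("Value less than or equal to 6")
}
```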

What if you want different operations to be performed based on whether a single condition is true? This can be accomplished using an if…else statement. If the condition evaluates to TRUE, then the operation in the if statement will be performed. If it evaluates to FALSE, then the operation in the else statement will be performed instead. In the example, “Value greater than 6” is returned because the condition evaluates to FALSE, so the operation defined within else is executed.
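A sketch of an if…else statement that behaves this way is shown here. The value 8 is an assumption (any number above 6 gives the same result), as are the variable msg and the wording of the first message.

```r
a <- 8   # assumed value; the condition below evaluates to FALSE

if (a <= 6) {
  msg <- "Value less than or equal to 6"
} else {
  msg <- "Value greater than 6"
}
print(msg)
```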

What if you want to include more than one criterion? Then you can include else if as shown below. Note that you can include multiple else if conditions.
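A sketch with one else if branch, using a value and messages chosen for the illustration:

```r
a <- 6   # example value

# Conditions are checked in order; the first one that is TRUE wins
if (a < 6) {
  msg <- "Value less than 6"
} else if (a == 6) {
  msg <- "Value equal to 6"
} else {
  msg <- "Value greater than 6"
}
print(msg)
```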

In this example, I am now providing a vector with multiple elements. By combining a for loop and if-else, I obtain a result for each data point.
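A sketch of a for loop combined with if…else, using a made-up vector (values) and loop variable (v):

```r
values <- c(2, 6, 9, 4)   # example data

# The if...else block is evaluated once for each element
for (v in values) {
  if (v < 6) {
    print(paste(v, "is less than 6"))
  } else if (v == 6) {
    print(paste(v, "is equal to 6"))
  } else {
    print(paste(v, "is greater than 6"))
  }
}
```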

Lastly, this example shows how to combine the results into a single vector. In this case, each result, regardless of the evaluation of the conditions, is written to an initially empty vector (c).
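A sketch of collecting the results is shown here. I have named the vector res rather than c to avoid confusion with the c() function used to build it; the data and messages are invented.

```r
values <- c(2, 6, 9, 4)   # example data
res <- c()                # initially empty vector

for (v in values) {
  if (v < 6) {
    res <- c(res, "less than 6")
  } else if (v == 6) {
    res <- c(res, "equal to 6")
  } else {
    res <- c(res, "greater than 6")
  }
}
print(res)
```

Growing a vector one element at a time is fine for small inputs like this one. For very long inputs, creating the vector at its final length first and filling it by index is usually faster.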

Next and Break

In a for loop it is possible to stop the loop prematurely. The next statement skips the rest of the current iteration and moves on to the next element, while break exits the loop completely. This is demonstrated in the example below. First, I generate a sequence of numbers from 1 to 21. Then I set up a for loop that will append the number to an empty vector (b) unless it is an odd number (in this case modulus will yield a remainder of 1). I do this using next inside of an if statement, which causes the loop to skip the odd numbers. I also don’t want to append values larger than 15 to the vector, so I use break to exit the loop once that limit is passed.
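A sketch of this loop follows. The vector name b comes from the text; the name a for the sequence and the order of the two checks are my assumptions.

```r
a <- seq(1, 21)   # sequence from 1 to 21
b <- c()          # initially empty vector

for (i in a) {
  if (i %% 2 == 1) {
    next          # odd number: skip to the next value of i
  }
  if (i > 15) {
    break         # past the limit: leave the loop entirely
  }
  b <- c(b, i)
}
print(b)
```

With the checks in this order, b ends up holding the even numbers from 2 to 14, and the loop exits when i is 16.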

Which

The which() function in R returns the indices of the elements in a vector, or the rows in a data frame, that meet certain criteria. You can then use these indices for selection.
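A short sketch with a vector and a data frame is shown here. The data and the names x, idx, df, and rows are invented for the illustration.

```r
# Vector example
x <- c(3, 12, 7, 25, 9)
idx <- which(x > 8)   # positions of the elements greater than 8
print(idx)
print(x[idx])         # use the indices to select those elements

# Data frame example
df <- data.frame(name = c("a", "b", "c"), elev = c(450, 520, 610))
rows <- which(df$elev > 500)   # row numbers meeting the criterion
print(df[rows, ])
```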

Now that you have an understanding of programming in R, we can move on to a discussion of data analysis in R. Throughout this course, we will apply the coding, data manipulation, and analysis techniques learned in these early sections. In the next section we will explore data summarization and simple statistical tests.
