Raster-Based Spatial Analysis in R

Objectives

  1. Process raster data including changing the cell size, spatial extent, and cell values
  2. Perform raster math and calculations
  3. Generate distance and density grid surfaces
  4. Produce conservative or binary models
  5. Create liberal models using weighted overlay
  6. Work with the raster package

Overview

In this section we will focus on working with raster data in R. We will primarily rely on the raster package since it is currently the dominant method for handling and analyzing raster data. Similar to the last module, we will visualize the results using tmap. In the first code block, I am calling in the required packages.

In this section, we will work with data from West Virginia. Here is a brief description of the data.

  • elev: 30 m resolution digital elevation model (DEM) in meters from the National Elevation Dataset (NED)
  • slp: slope in degrees derived from the elevation raster grid
  • lc: 30 m resolution categorical land cover grid from the 2011 National Land Cover Database (NLCD)
  • airports: airports in West Virginia
  • pnts: random points
  • ws: watershed boundaries for subset of the state
  • str: man-made structures mapped as points for portion of the state

All raster grids are being read using the raster() function from the raster package, as they are all single-band rasters. The vector data are being read using st_read() from the sf package. All raster data are in TIFF format while all vector data are stored as shapefiles. I also map the DEM to check that it has loaded correctly. I recommend reviewing your data prior to any analyses.

Crop, Merge, and Mask

A common preprocessing task is to extract out a spatial subset of a raster grid. In R, this can be accomplished using a variety of methods from the raster package. In the first code block below I am defining a rectangular extent by providing the xmin, xmax, ymin, and ymax values relative to the projection of the data (NAD 83 UTM Zone 17N) using the extent() function from the raster package.

In the next set of code I am using the crop() function to extract out all the cells within the defined extent. By calling the result, I can inspect the number of rows and columns, spatial resolution, number of bands, and projection information. Note that this raster grid is being stored in memory. Lastly, I plot the cropped extent.
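As a minimal sketch of these two steps, the code below builds a small synthetic grid in place of the West Virginia DEM (the coordinates, values, and object names are assumptions for illustration only), defines an extent, and crops to it.

```r
library(raster)

# Synthetic stand-in for the DEM (an assumption): 100 x 100 cells at 30 m,
# with made-up UTM-like coordinates and values.
elev <- raster(nrows = 100, ncols = 100,
               xmn = 500000, xmx = 503000,
               ymn = 4300000, ymx = 4303000)
values(elev) <- seq_len(ncell(elev))

# Rectangular extent given as xmin, xmax, ymin, ymax in map units.
crop_ext <- extent(500600, 501800, 4300900, 4302100)

# Keep only the cells inside the extent.
elev_crop <- crop(elev, crop_ext)

# Calling the object prints its dimensions, resolution, and extent.
elev_crop
```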

Other than cropping a raster grid using a defined rectangular extent, you can also extract based on row and column numbers. As an example, I am extracting out columns 500 through 600 and rows 400 through 800. Note that the syntax here is a bit different from the example above, as you will need to call the raster grid inside of extent(). This method is useful if you want to remove the margin rows and columns of a raster grid.
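A sketch of the row and column approach, again on a synthetic grid (an assumption, so the row and column numbers are scaled down from those in the text). When extent() is given a raster, the remaining arguments are the first row, last row, first column, and last column.

```r
library(raster)

# Synthetic stand-in for the DEM (an assumption): 100 x 100 cells at 30 m.
elev <- raster(nrows = 100, ncols = 100,
               xmn = 500000, xmx = 503000,
               ymn = 4300000, ymx = 4303000)
values(elev) <- seq_len(ncell(elev))

# Rows 40 through 80 and columns 50 through 60 (both ends included).
sub_ext <- extent(elev, 40, 80, 50, 60)
elev_sub <- crop(elev, sub_ext)

# Number of rows, columns, and layers in the subset.
dim(elev_sub)
```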

Other than extracting out a subset of the data, you can also merge multiple raster grids into a single object using the merge() function. To demonstrate this, I first extract a different spatial extent so that I have two grids to merge that partially overlap. I then use the merge() function to combine or mosaic the two grids. Lastly, I map the result.
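The following sketch illustrates the idea with two partially overlapping pieces cut from a synthetic grid (the grid and the piece names are assumptions). Where the pieces overlap, merge() keeps the values from the first object.

```r
library(raster)

# Synthetic stand-in for the DEM (an assumption): 100 x 100 cells at 30 m.
elev <- raster(nrows = 100, ncols = 100,
               xmn = 500000, xmx = 503000,
               ymn = 4300000, ymx = 4303000)
values(elev) <- seq_len(ncell(elev))

# Two pieces that overlap in columns 41 through 60.
west <- crop(elev, extent(elev, 1, 100, 1, 60))
east <- crop(elev, extent(elev, 1, 100, 41, 100))

# Mosaic the two pieces into a single grid.
merged <- merge(west, east)
dim(merged)
```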

In all the results above the data have filled a rectangular extent. What if you would like to extract values that fall within an irregular extent as defined by a raster mask or vector polygon layer? This can be accomplished using the mask() function. In the example, I have extracted out cells that occur within the watershed boundaries only. Note that the raster grid is still rectangular; that is, it still has a complete set of rows and columns. However, cells outside the mask have been set to NA (also referred to as Null or NoData). Regardless of the shape of the area of interest, all raster grids must be rectangular.
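Here is a minimal sketch of masking. To keep it self-contained, it uses a raster mask (a hypothetical circular area of interest) rather than the watershed polygons, and a synthetic grid in place of the DEM.

```r
library(raster)

# Synthetic stand-in for the DEM (an assumption): 100 x 100 cells at 30 m.
elev <- raster(nrows = 100, ncols = 100,
               xmn = 500000, xmx = 503000,
               ymn = 4300000, ymx = 4303000)
values(elev) <- seq_len(ncell(elev))

# Hypothetical raster mask: 1 within 900 m of the grid center, NA elsewhere.
xy <- xyFromCell(elev, seq_len(ncell(elev)))
d <- sqrt((xy[, 1] - 501500)^2 + (xy[, 2] - 4301500)^2)
aoi <- setValues(raster(elev), ifelse(d <= 900, 1, NA))

# Cells that are NA in the mask become NA in the result.
elev_mask <- mask(elev, aoi)

# The grid keeps its full set of rows and columns.
dim(elev_mask)
```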

Resample and Aggregate

It is also possible to change the cell size of a grid. The raster package provides three different functions to accomplish this.

  1. resample(): change cell size using bilinear interpolation (inverse distance weighted average of nearest 4 cell values) or nearest neighbor (only considers nearest cell value)
  2. aggregate(): increase the cell size by a defined factor and calculate a statistic from the smaller, original cells to populate the larger, new cells
  3. disaggregate(): make cell size smaller by a defined factor

When resampling cells, make sure to use a method appropriate for the input raster data type. For example, taking the average or using bilinear interpolation would not be appropriate for a categorical grid since you can’t average categories. Instead, you would want to use maximum, minimum, or majority. Nearest neighbor is also appropriate for categorical data since no averaging is applied.
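As a rough sketch of resampling a categorical grid, the code below converts a synthetic 30 m land cover grid (made-up codes and layout, an assumption) to 60 m in two ways: nearest neighbor with resample() and majority with aggregate() and the modal() function.

```r
library(raster)

# Synthetic categorical grid (an assumption): 20 x 20 cells at 30 m,
# arranged in vertical bands that are each four cells wide.
lc <- raster(nrows = 20, ncols = 20,
             xmn = 500000, xmx = 500600,
             ymn = 4300000, ymx = 4300600)
values(lc) <- rep(rep(c(41, 81, 21, 11, 41), each = 4), times = 20)

# Empty template with the same extent but a 60 m cell size.
target <- raster(lc)
res(target) <- 60

# Nearest neighbor: each new cell takes the value of one original cell.
lc_60_ngb <- resample(lc, target, method = "ngb")

# Majority: each new cell takes the most common value of the 2 x 2 block.
lc_60_maj <- aggregate(lc, fact = 2, fun = modal)

res(lc_60_ngb)
```

Neither method creates new category codes, which is why both are reasonable for categorical data.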

In the example, I have used the aggregate() function to increase the cell size by a factor of 5. So, the 30 m grid is converted to a 150 m grid. The new cells will be populated with the mean from the original cells that fall within them. The cell size can be checked by calling the resulting object.
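A sketch of this aggregation on a synthetic 30 m grid (an assumption standing in for the DEM):

```r
library(raster)

# Synthetic stand-in for the DEM (an assumption): 100 x 100 cells at 30 m.
elev <- raster(nrows = 100, ncols = 100,
               xmn = 500000, xmx = 503000,
               ymn = 4300000, ymx = 4303000)
values(elev) <- seq_len(ncell(elev))

# Increase the cell size by a factor of 5 and fill each new cell with the
# mean of the 25 original cells that fall within it.
elev_150 <- aggregate(elev, fact = 5, fun = mean)

# Check the new cell size.
res(elev_150)
```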

Reclassify

Reclassify is used to change or recode cell values in a grid. When reclassifying a continuous grid, a range of values will be recoded or binned to a new value. So, the result of reclassify is always a categorical raster grid.

Before reclassifying a grid, you must generate a matrix of values to define the reclassification. In the example, I am generating a matrix with four rows and three columns. I first create a vector of values then convert them to a matrix using the matrix() function. Here is a description of the reclassification being performed.

  • 0 < value <= 300: 1
  • 300 < value <= 500: 2
  • 500 < value <= 800: 3
  • 800 < value <= 10,000: 4

Note that these ranges have no gaps. Also, I generally set the smallest and largest values to be much smaller or larger than the values in the grid to make sure all values are included.

In the reclassify() function I provide the raster data to be reclassified (elev), the matrix used to describe the reclassification (m), and the right argument. When right is set to TRUE, the right side of each class will be closed. So, the largest value is included in that bin as opposed to the next bin. For example, 300 would be coded as 1 as opposed to 2 in the example.
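The reclassification described above can be sketched as follows, using a small synthetic elevation grid (an assumption) so that the effect of right = TRUE can be checked on a cell with a value of exactly 300.

```r
library(raster)

# Synthetic elevation grid (an assumption): values 100, 110, ..., 1090.
elev <- raster(nrows = 10, ncols = 10,
               xmn = 500000, xmx = 500300,
               ymn = 4300000, ymx = 4300300)
values(elev) <- seq(100, 1090, by = 10)

# Each row of the matrix is: from, to, new value.
vals <- c(0,   300,   1,
          300, 500,   2,
          500, 800,   3,
          800, 10000, 4)
m <- matrix(vals, ncol = 3, byrow = TRUE)

# right = TRUE closes each class on the right, so 300 falls in class 1.
elev_class <- reclassify(elev, m, right = TRUE)

# Number of cells in each class.
table(values(elev_class))
```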

Now, I am reclassifying the land cover, a categorical raster grid, to “Not Forest” and “Forest”. All forest types have codes between 40 and 50 (41, 42, and 43). So, that range is coded to 1 and all other ranges are coded to 0.
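A sketch of the forest reclassification on a synthetic land cover grid (the grid and the selection of NLCD-style codes in it are assumptions):

```r
library(raster)

# Synthetic land cover grid (an assumption) holding a few NLCD-style codes.
lc <- raster(nrows = 10, ncols = 10,
             xmn = 500000, xmx = 500300,
             ymn = 4300000, ymx = 4300300)
values(lc) <- rep(c(11, 21, 41, 42, 43, 52, 81, 90), length.out = ncell(lc))

# Codes above 40 and up to 50 (41, 42, 43) become 1; all others become 0.
m_forest <- matrix(c(0,  40,  0,
                     40, 50,  1,
                     50, 100, 0),
                   ncol = 3, byrow = TRUE)
forest <- reclassify(lc, m_forest, right = TRUE)

# Number of "Not Forest" (0) and "Forest" (1) cells.
table(values(forest))
```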

Raster Math

Performing mathematical operations on grids is easy in R. The syntax is the same as operations applied to vectors or matrices. In the first example, the elevation grid is being multiplied by a conversion factor to convert the units from meters to feet. This operation will be applied to each cell and the result will be returned to a new cell in the output.

In this example, I am dividing the elevation grid in feet by the elevation grid in meters. This returns the conversion factor at each cell location.

When Boolean operations are applied to raster grids, the result will be 0 or 1. 0 always indicates that the logic evaluated to FALSE while 1 indicates TRUE. In the example below, I am trying to find all cells that have an elevation greater than 600 meters. Those that do are coded to 1 while those that don’t are coded to 0. TRUE areas are shown in red.

In this example, I am finding all cells coded to 0 or FALSE from the prior example. This effectively inverts the result: now cells that are less than or equal to 600 meters in elevation are coded to TRUE while high elevation cells are coded to FALSE.
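The four operations in this section can be sketched on a synthetic elevation grid (an assumption). The conversion factor of 3.28084 feet per meter is the standard value and is not taken from the original code.

```r
library(raster)

# Synthetic elevation grid in meters (an assumption): 100, 110, ..., 1090.
elev <- raster(nrows = 10, ncols = 10,
               xmn = 500000, xmx = 500300,
               ymn = 4300000, ymx = 4300300)
values(elev) <- seq(100, 1090, by = 10)

# Meters to feet, applied to every cell.
elev_ft <- elev * 3.28084

# Dividing the two grids returns the conversion factor at every cell.
ratio <- elev_ft / elev

# Logical test: 1 (TRUE) where elevation is above 600 m, 0 (FALSE) otherwise.
high <- elev > 600

# Invert the result: 1 where elevation is 600 m or less.
low <- high == 0

# Count of high-elevation cells.
cellStats(high, "sum")
```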

We will explore these methods in more detail below in the context of conservative models.

Raster Summarization

It is possible to extract cell values at point locations. This is often used to generate sample data for analyses and machine learning where training data are required.

The extract() function can accomplish this task. In the first example, I am extracting the elevation values at random points. The sp argument is used to specify whether spatial information should be included in the output. When defined as TRUE, spatial information will be maintained and the result will be a SpatialPointsDataFrame. I am also using the st_as_sf() function to convert the sp object to an sf object.
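As a simplified sketch of point extraction, the code below passes a plain matrix of x and y coordinates to extract() instead of an sf or sp point layer, so it does not use the sp argument. The grid and the point locations are assumptions.

```r
library(raster)

# Synthetic elevation grid (an assumption): 10 x 10 cells at 30 m.
elev <- raster(nrows = 10, ncols = 10,
               xmn = 500000, xmx = 500300,
               ymn = 4300000, ymx = 4300300)
values(elev) <- seq(100, 1090, by = 10)

# Hypothetical sample points as a two-column matrix of x and y coordinates.
pnts_xy <- cbind(x = c(500015, 500155, 500285),
                 y = c(4300285, 4300145, 4300015))

# Value of the cell that each point falls in.
pnt_elev <- extract(elev, pnts_xy)

# Combine coordinates and extracted values into a table.
pnts_df <- data.frame(pnts_xy, elev = pnt_elev)
pnts_df
```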

Now, I am visualizing the results by plotting the extracted elevation to the point feature size aesthetic.

It is also possible to summarize raster data relative to areas or polygons. Since now there can be multiple cells within the areas, you must indicate a statistical measure to return. Here, I am extracting the mean elevation by watershed. I then map the results. Depending on the spatial resolution of the grid and the number and complexity of the polygons, these calculations can take some time.
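A sketch of a zonal summary, assuming two hypothetical rectangular "watersheds" built from extents rather than the watershed shapefile, and a synthetic grid in place of the DEM. Cells are assigned to a polygon when their centers fall inside it.

```r
library(raster)

# Synthetic elevation grid (an assumption): 10 x 10 cells at 30 m.
elev <- raster(nrows = 10, ncols = 10,
               xmn = 500000, xmx = 500300,
               ymn = 4300000, ymx = 4300300)
values(elev) <- seq(100, 1090, by = 10)

# Two hypothetical rectangular polygons covering the west and east halves.
ws_west <- as(extent(500000, 500150, 4300000, 4300300), "SpatialPolygons")
ws_east <- as(extent(500150, 500300, 4300000, 4300300), "SpatialPolygons")
ws <- bind(ws_west, ws_east)

# Mean elevation of the cells within each polygon (one row per polygon).
ws_mean <- extract(elev, ws, fun = mean, na.rm = TRUE)
ws_mean
```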

Distance and Density Grids

Euclidean distance is simply a measure of the straight line distance from a cell to a source feature. In the first example, I am calculating the distance from each cell to the nearest point feature. To start, I generate a blank grid by making a copy of the elev data then converting all cell values to NA. I then use the distanceFromPoints() function to repopulate the grid with the distance measurement. This function requires an sp object, so I make the conversion within the function. The resulting distance will be in the map units, in this case meters. You can convert to different units using raster math.
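The steps above can be sketched as follows. The grid and the two point locations are assumptions, and the points are passed as a coordinate matrix rather than an sp object to keep the example self-contained.

```r
library(raster)

# Synthetic elevation grid (an assumption): 10 x 10 cells at 30 m.
elev <- raster(nrows = 10, ncols = 10,
               xmn = 500000, xmx = 500300,
               ymn = 4300000, ymx = 4300300)
values(elev) <- seq(100, 1090, by = 10)

# Blank grid: a copy of elev with every cell set to NA.
blank <- setValues(elev, rep(NA_real_, ncell(elev)))

# Hypothetical source points (x, y), placed at two cell centers.
airports_xy <- cbind(x = c(500015, 500285),
                     y = c(4300285, 4300015))

# Distance from each cell center to the nearest point, in map units (meters).
dist_air <- distanceFromPoints(blank, airports_xy)

# Convert meters to kilometers with raster math.
dist_air_km <- dist_air / 1000
```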

When calculating distance from line features, you must first convert the vector data to a raster grid. I begin by creating a blank grid then populating it with 1 at locations where there is an interstate using the rasterize() function. I then convert the cells to points using rasterToPoints(). I can then use the distanceFromPoints() function to generate the distance surface. Note that this process can be slow.
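A partial sketch of this workflow. It skips the rasterize() step and instead assumes a grid in which one row of cells has already been coded to 1 to represent a hypothetical interstate. The cells are then converted to points and used as the distance sources.

```r
library(raster)

# Empty template grid (an assumption): 10 x 10 cells at 30 m.
blank <- raster(nrows = 10, ncols = 10,
                xmn = 500000, xmx = 500300,
                ymn = 4300000, ymx = 4300300)

# Stand-in for the rasterized line: row 5 coded to 1, all other cells NA.
v <- rep(NA_real_, ncell(blank))
v[cellFromRow(blank, 5)] <- 1
road <- setValues(blank, v)

# Convert the non-NA cells to points (columns: x, y, cell value).
road_pts <- rasterToPoints(road)

# Distance from each cell center to the nearest "road" cell center.
dist_road <- distanceFromPoints(blank, road_pts[, 1:2])
```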

Kernel density surfaces can be created using the sp.kde() function from the spatialEco package. This function will not accept sf objects, so it is necessary to convert the point data to sp. The search radius must be provided in the map units using the bw, or bandwidth, argument. I am using a value of 500 meters in this example. Larger values will result in a more generalized pattern. There is not necessarily a correct radius, as this depends on whether you are interested in local patterns or more generalized patterns. The blank grid created above is being used to determine the extent and cell size of the result. This process can be slow if a large number of point features is included, a large spatial extent is calculated, and/or a small cell size is used.

Conservative Models

Conservative models yield a TRUE/FALSE or binary output. A site or cell either meets the criteria or it does not. You can use the methods discussed above to produce conservative models by generating binary surfaces for each criterion then multiplying them together to obtain a final model.

In this example, we are trying to determine locations that are suitable for a new hotel. The location should be:

  1. At an elevation above 500 meters
  2. On a slope less than 15 degrees
  3. Not further than 3 km from an airport
  4. Not further than 5 km from an interstate

To begin, I first visualize the input vector and raster data.

Next, I create Euclidean distance grids for distance from airports and distance from interstates using the methods demonstrated above.

I now have the four required input raster grids. The next step is to obtain binary surfaces for each criterion. This can be accomplished using logical operators and raster math.

Once the binary surfaces have been generated, I simply multiply them together to obtain a final suitability model. A location or cell would have to be TRUE (or 1) for all criteria in order for 1 to be returned.
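The whole conservative workflow can be sketched on synthetic inputs. Everything about the data here is an assumption: a 10 km by 10 km grid with 100 m cells, elevation rising to the east, slope rising to the north, one hypothetical airport, and a hypothetical east-west interstate represented by a string of points. Only the four thresholds come from the criteria listed above.

```r
library(raster)

# Empty template (an assumption): 100 x 100 cells at 100 m.
tmpl <- raster(nrows = 100, ncols = 100,
               xmn = 500000, xmx = 510000,
               ymn = 4300000, ymx = 4310000)
xy <- xyFromCell(tmpl, seq_len(ncell(tmpl)))

# Synthetic elevation (m) and slope (degrees) surfaces.
elev <- setValues(tmpl, 300 + (xy[, 1] - 500000) / 25)
slp <- setValues(tmpl, (xy[, 2] - 4300000) / 400)

# Distance grids (m) from a hypothetical airport and interstate.
dist_air <- distanceFromPoints(tmpl, cbind(506000, 4302000))
dist_int <- distanceFromPoints(tmpl,
                               cbind(seq(500000, 510000, by = 100), 4303000))

# One binary surface per criterion.
elev_ok <- elev > 500
slp_ok <- slp < 15
air_ok <- dist_air <= 3000
int_ok <- dist_int <= 5000

# A cell is 1 only if it is 1 in all four binary surfaces.
suitable <- elev_ok * slp_ok * air_ok * int_ok

# Number of suitable cells.
cellStats(suitable, "sum")
```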

Based on the required conditions, few cells in the extent meet all the criteria.

Liberal Models

What if you wanted more “gray levels” or a range of suitability scores in the output as opposed to only yes or no? This could be accomplished using weighted overlay. The goal here is to combine scored grids and weights to obtain a model with a range of suitability values. Let’s repeat the process above relating to determining a location for a hotel using liberal modeling methods. Here are the criteria:

  1. High elevation preferred
  2. Shallow slope preferred
  3. Close to airports preferred
  4. Close to interstates preferred

I will not recalculate the distance grids, as I already have them from the conservative modeling process completed above.

All scores should be on the same scale, so I will rescale the data from 0 to 1 using the range of values in each grid. The minimum and maximum values from the grids can be extracted using the cellStats() function. I then use raster math to scale each grid from 0 to 1. For all surfaces except elevation, we are subtracting from 1 since low values are preferred.
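A sketch of the rescaling for two of the grids, using synthetic elevation and slope surfaces (assumptions). The same pattern would apply to the distance grids.

```r
library(raster)

# Synthetic inputs (assumptions): 10 x 10 cells at 30 m.
elev <- raster(nrows = 10, ncols = 10,
               xmn = 500000, xmx = 500300,
               ymn = 4300000, ymx = 4300300)
values(elev) <- seq(100, 1090, by = 10)
slp <- setValues(raster(elev), rep(seq(2, 29, by = 3), times = 10))

# Minimum and maximum of each grid.
elev_min <- cellStats(elev, "min")
elev_max <- cellStats(elev, "max")
slp_min <- cellStats(slp, "min")
slp_max <- cellStats(slp, "max")

# High elevation is preferred, so the highest cell scores 1.
elev_score <- (elev - elev_min) / (elev_max - elev_min)

# Low slope is preferred, so subtract from 1: the flattest cell scores 1.
slp_score <- 1 - (slp - slp_min) / (slp_max - slp_min)
```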

Once each criterion has been converted to the same scale and scored, then I multiply them together to obtain a model. It is also possible to incorporate weights if the criteria are of different importance.

This will create a final suitability model that takes into account each criterion and its relative importance. In contrast to a conservative model, a range of suitability scores will be returned.
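One common way to incorporate weights is a weighted sum: each score grid is multiplied by its weight and the products are added. The sketch below assumes that form, which may differ from the exact combination used in this section. The input grids, the rescale01() helper, and the weights are all hypothetical.

```r
library(raster)

# Synthetic inputs (assumptions): 10 x 10 cells at 30 m.
tmpl <- raster(nrows = 10, ncols = 10,
               xmn = 500000, xmx = 500300,
               ymn = 4300000, ymx = 4300300)
elev <- setValues(tmpl, seq(100, 1090, by = 10))
slp <- setValues(tmpl, rep(seq(2, 29, by = 3), times = 10))
dist_air <- distanceFromPoints(tmpl, cbind(500015, 4300285))
dist_int <- distanceFromPoints(tmpl, cbind(500285, 4300015))

# Hypothetical helper: rescale a grid to the range 0 to 1.
rescale01 <- function(r) {
  r_min <- cellStats(r, "min")
  r_max <- cellStats(r, "max")
  (r - r_min) / (r_max - r_min)
}

# High elevation is preferred; low values are preferred for the rest.
elev_score <- rescale01(elev)
slp_score <- 1 - rescale01(slp)
air_score <- 1 - rescale01(dist_air)
int_score <- 1 - rescale01(dist_int)

# Hypothetical weights that sum to 1.
w <- c(elev = 0.1, slp = 0.3, air = 0.2, int = 0.4)

# Weighted sum of the four score grids.
model <- elev_score * w[["elev"]] + slp_score * w[["slp"]] +
  air_score * w[["air"]] + int_score * w[["int"]]
```

Because the weights sum to 1 and every score lies between 0 and 1, the resulting model also lies between 0 and 1.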

Lastly, reclassification can be used to define suitability ranges based on cut-off values as demonstrated below.
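A sketch of this last step, assuming a model grid scored from 0 to 1 and four hypothetical suitability classes with cut-offs at 0.25, 0.5, and 0.75. Setting include.lowest to TRUE keeps cells that score exactly 0 in the first class.

```r
library(raster)

# Synthetic suitability model (an assumption): scores from 0 to 1.
model <- raster(nrows = 10, ncols = 10,
                xmn = 500000, xmx = 500300,
                ymn = 4300000, ymx = 4300300)
values(model) <- seq(0, 1, length.out = ncell(model))

# Hypothetical cut-offs; each row is: from, to, new class.
cuts <- matrix(c(0,    0.25, 1,
                 0.25, 0.5,  2,
                 0.5,  0.75, 3,
                 0.75, 1,    4),
               ncol = 3, byrow = TRUE)

# Bin the continuous scores into four suitability classes.
suit_class <- reclassify(model, cuts, right = TRUE, include.lowest = TRUE)

# Number of cells in each class.
table(values(suit_class))
```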

In this section, I have tried to focus on techniques that I commonly use when working with and analyzing raster data. Again, you will likely need to solve different types of problems to analyze your own data. The documentation for the raster package is very thorough, so it is a great resource, and there are a variety of additional resources on the web.

We will work with raster data in later sections when we make models using machine learning, so you will get more practice working with and manipulating raster data in R.

Before we move on to machine-learning-based modeling, we will explore working with LiDAR point cloud data and remotely sensed imagery in R in the next section.
