LiDAR and Image Processing in R

LiDAR and Image Processing in R


  1. Work with large volumes of LiDAR data using LAS catalogs
  2. Generate raster grids from LiDAR data
  3. Create indices from satellite image bands
  4. Perform principle component analysis (PCA) and tasseled cap transformations on multispectral data
  5. Assess change using the difference between indices from different dates
  6. Apply moving windows to images


This module expands upon the prior module by demonstrating additional analyses that can be performed on raster data, specifically remotely sensed imagery. To process the image data, we will use the raster and RStoolbox packages. Additionally, I will demonstrate the processing of light detection and ranging (LiDAR) data using the lidR package. In the first code block I am loading in the required packages.

A link to the package documentation for lidR can be found here. This package is updated on a regular basis, so I recommend checking for updates periodically if you plan to use this package. Documentation for the RStoolbox package can be found here.


Read in LiDAR Data as a LAS Catalog

Before we begin, I want to note that the LiDAR data used in the following examples are large files, as is common for point cloud data sets. So, it will take some time to execute the example code. Also, some results will be written to your hard drive. So, feel free to simply read through the examples as opposed to executing the code. The LiDAR data used in this demonstration were obtained from the WV Elevation and LiDAR Download Tool.

I have provided six adjacent tiles of LiDAR. As opposed to reading in the data separately, I will read them in as a LAS catalog, which allows you to work with and process multiple LAS files collectively. This is similar to a LAS Dataset in ArcGIS. I will focus on using LAS catalogs here with a goal of showing you how to process large volumes of LiDAR data as opposed to individual tiles. As an alternative to using LAS catalogs, you could process multiple data tiles using for loops; however, that will not be demonstrated here.

In this first example, I am reading in all six LAS files from a folder using the readLAScatalog() function. For these LiDAR data, the projection information is not read in, so I manually define the projection using the projection() function. This is not always required and depends on the LiDAR data used. The summary() function provides a summary of the data in the catalog including the spatial extent, projection, area covered, number of points (69.34 million!!!), point density, and number of files. lascheck() is used to assess if there are any issues with the data. Some issues are noted here, but they will not impact our processing and analyses. The author of the package recommends to always run this function after data are called into R to check for any issues.

Create a DTM

Before starting an analysis of all data in a LAS catalog, you will need to define some processing options. In the code block below I am defining processing “chunks” as 500-by-500 meter tiles with a 20 meter overlap. It is generally good to define an overlap to avoid any gaps in generated raster grid surfaces. You will need to experiment with appropriate chunk sizes. This will depend on the point density of the data and the available memory in your computer. The goal is to provide small enough tiles so that you will not run out of memory but not too small to produce a lot of tiles and process the data inefficiently. I am also having the process write a processing summary. The processing options are stored within the LAS catalog object, as demonstrated by summarizing the object.

Once processing options are defined, you can perform analyses and process the data. In this example, I am creating a digital terrain model (DTM) to represent ground elevations without any above ground features. I am also setting a new processing option to indicate where to store the resulting raster grids on the local disk. Each tile will be labeled with the bottom-left coordinate of the extent. Within the grid_terrain() function, I am setting the output cell size to 2-by-2 meters and I am using the k-nearest neighbor inverse distance weighting interpolation method with 10 neighbors and a power of 2.

Once the results are generated, the individual tiles will be stored on disk but also referenced to a virtual raster mosaic object for additional processing. Note that it is possible to store the raster data in memory, but this is not recommend if you are processing a large volume of data.

As already suggested, it is generally a good idea to visualize the result. Below I am using tmap to visualize the terrain model.

You can write the produced virtual raster mosaic to disk as a single continuous grid using the writeRaster() function from the raster package. It is not advisable to do so if a large volume of data is processed and/or the cell size is small, as the file may be extremely large.

Create a Hillshade

A hillshade offers a means to visualize a terrain surface by casting shadows over the terrain using a defined illuminating position. The process of producing a hillshade in R is different from the process in ArcGIS. First, you will need to produce slope and aspect surfaces. You will then use these surfaces to generate the hillshade with a defined illuminating position. All the required functions are made available by the raster package. In the example, the illuminating source has been placed at a northwestern position and at an angle of 45-degrees above the horizon. In the next block of code the hillshade is displayed using tmap.

Create a nDSM

The lasnormalize() function is used to subtract ground elevations from each point and requires input LAS data and a DTM. The result is LAS files written to disk. Here, I have provided a new folder path in which to store the output. Once these points are generated, they can be used to produce a canopy height model (CHM) or normalized digital surface model (nDSM) where the grid values represent height of features above the ground.

The grid_canopy() function is used to generate a digital surface model (DSM). If normalized data are provided, the result will be a CHM or nDSM. In this example, I am using the original point cloud data that have not been normalized. Once the surface is produced, it can be written out to a permanent file. I am also using the pitfree algorithm to generate a smoothed model or remove pits.

Since I used the original data that have not been normalized, in order to generate an nDSM, I subtract the DTM from the DSM. It is common to obtain some negative values due to noise in the data. So, I then convert all cells that have a negative height to zero and print the data information to inspect the result.

Again, I visualize the result with tmap.

Calculate Point Cloud Statistics in Cells

It is also possible to calculate a statistic from the point data within each grid cell using the grid_metrics() function. In the example, I am calculating the mean elevation from the first return data within 10-by-10 meter cells. Before doing so, I define a filter option so that only the first returns are used. I then display the results using tmap.

Visualize Return Intensity

Other then elevation measurements, LiDAR data can also store return intensity information. This is a measure of the proportion of the emitted energy that is associated with each return from each laser pulse. In the two code blocks below, I am visualizing the intensity values as the mean first return intensity within 5-by-5 meter cells.

This can provide an informative visualization of the LiDAR data.

Voxelize Point Cloud Data

In this last example relating to LiDAR, I am reading in a single LAS file. I then generate a DTM from the point cloud then normalize the data using the DTM. Next, I generate a voxel from the data. A voxel is a 3D raster, where the smallest data unit is a 3D cube as opposed to a 2D grid cell. You can think of a voxel as a stack of cubes where each cube holds a measurement. In the example, I am generating a voxel in which each cube has dimensions of 5 meters in all three dimensions and stores the standard deviation of the normalized height measurements from the returns within it.

Voxels are used extensively when analyzing 3D point clouds. For example, they can be used to explore the 3D structure of forests.

This is all we will cover here relating to the lidR package. However, there are many other valuable tools available. For example, there are methods for segmenting forest tree crowns and calculating metrics for each tree. If you work with LiDAR data, I recommend exploring this package further.

Image Processing

Normalized Difference Vegetation Index (NDVI)

Now we will explore methods for processing remotely sensed imagery. First, we will calculate the normalized difference vegetation index or NDVI from Landsat 8 Operational Land Imager (OLI) data.

First, I read in and visualize the data as a false-color composite. I am using the brick() function to read in the image since it is multi-band.

NDVI is calculated by differencing the NIR and red channels then dividing by the sum. Here, Band 5 is NIR and Band 4 is red. In the next code block I perform this calculation then plot the result with tmap.

Dark green areas indicate vegetated areas, such as forests and fields, while lighter areas indicate areas that are not vegetated or where the vegetation is unhealthy or stressed.

Principle Component Analysis (PCA)

You can generate principle component bands from a raw image using the rasterPCA() function from the RSToolbox() package. Principle component analysis (PCA) is used to generate new, uncorrelated data from correlated data. If we assume variability in data represents information then PCA is a useful means to capture the information content in fewer variables. When applied to imagery, we can generate uncorrelated bands from the raw imagery and potentially reduce the number of bands needed to represent the information.

In this example, I am applying PCA to all of the bands in the Landsat 8 image.

The function generates a list object that contains the new image and information about the PCA analysis. I extract the resulting raster bands then displayed the first three principle components as a false-color image.

Now, I print the information about the PCA results. The majority of the variance is explained in the first three principle components.

Here are the Eigenvalues as an Eigen matrix. To obtain each PCA band, you would multiply the band values by the associated weights and add the results.

Tasseled Cap Transformation

The tasseled cap transformation is similar to PCA; however, defined values are provided to generate brightness, greenness, and wetness bands from the original spectral bands. Different coefficients are used for different sensors, including Landsat Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+), and Operational Land Imager (OLI) data. In this example, I am calling in Landsat 7 ETM+ data using the brick() function.

The two images represent a portion of the Black Hills of South Dakota before and after a fire event.

Once the data are read in, I rename the bands then apply the defined coefficients to each band for each transformation. I add the results to obtain the transformation.

In the next set of code, I stack the brightness, greenness, and wetness bands then display them as a false color composite. In the result, the extent of the fire is obvious.

Differenced Normalized Burn Index (DNBI)

To explore the extent of the fire, I will use the differenced normalized burn index (DNBI). This is obtained from the SWIR and NIR bands. A burned area will have a high SWIR reflectance and a low NIR reflectance in comparison to a forest extent. So, by calculating this index before and after a fire event, I can map the extent and severity of the fire. The code below shows this result. In the DNBI, high values indicate burned areas.

Moving Windows

It is common to analyze local patterns in image data by applying moving windows or kernels over the image to perform a transformation or summarization of the pixel values within the image. In R, this can be accomplished using the focal() function from the raster package.

In the first example, I am calculating the average NDVI value in 5-by-5 cell windows. This involves generating a kernel using the matrix() function. Specifically, I create a kernel with dimensions of 5-by-5 values, or a total of 25 cells. I then fill each cell with 1/25. Passing this over the grid will result in calculating the mean in each window.

The Sobel filter is used to detect edges in images and raster data in general. It can be applied using different size kernels, such as 3-by-3 or 5-by-5. In the next code block, I am building then printing two 5-by-5 Sobel filters to detect edges in the x- and y-directions.

Once the filters are generated, I apply them to the NDVI data to detect edges, or changes in NDVI values. Mapping the results, we can see that these filters do indeed highlight edges.

There are many different filters that can be applied to images. Fortunately, the process is effectively the same once you build the required filter as a matrix.

The last two modules have provided a variety of examples for working with raster data and other remotely sensed data. If you would like to explore additional raster and image analysis techniques, have a look at the documentation for the raster and RStoolbox packages. Another common remote sensing task is to classify images into categories. We will cover this topic in later modules as an application of machine learning.

Back to Course Page

Back to WV View

Download Data