Working with Spatial Data

Objectives

1. Read in vector data using the sp and sf packages
2. Convert between the sp and sf data models
3. Define and transform datums and projections for vector data
4. Access and work with attribute data associated with vector features
5. Read in raster data using the raster package
6. Define and transform datums and projections for raster data

Overview

We are now ready to begin discussing loading, visualizing, and analyzing spatial data in R. In this section specifically, we will discuss reading in vector and raster data, working with datums and projections, and saving spatial output to disk. I will assume that you have prior knowledge of spatial data but have not worked with these data in R. If you have trouble with the theoretical components of GIScience and spatial data, we provide an additional course here focused on these topics. This course covers a variety of GIScience topics including data models, datums and projections, digital cartography, spatial analysis, and spatial modeling. The lab exercises make use of ArcGIS Pro.

Since vector and raster spatial data have fundamentally different data structures, different methods and packages are available for working with them in R. We will start with vector data.

The link at the bottom of the page provides the example data and R Markdown file used to generate this module.

Vector Data

Vector data, including points, lines, and polygons, have a more complex data structure than raster data, which is effectively an array. Within R, we need to be able to store the geographic, attribute, and coordinate system components of vector data, and there are different means to accomplish this. Initially, vector data were read into R using the sp package. More recently, the sf package has provided an alternative. I will discuss both methods here, but we will primarily focus on sf in this course.

sp

The sp package provides a variety of options for storing vector data including the following:

• SpatialPoints: stores points but no attributes
• SpatialPointsDataFrame: stores points and attributes
• SpatialLines: stores lines but no attributes
• SpatialLinesDataFrame: stores lines and attributes
• SpatialPolygons: stores polygons but no attributes
• SpatialPolygonsDataFrame: stores polygons and attributes

These data models are all subclasses of the Spatial class. This class makes use of the s4 class, in which data components are stored in slots. Let’s start by exploring this structure.

Anytime I work with spatial data in R, I load the rgdal library, which provides bindings to the Geospatial Data Abstraction Library (GDAL) and PROJ libraries. When you install these R packages the GDAL and PROJ libraries will install on your computer outside of R. Here is a link to GDAL and PROJ. I am also using the tmap package here to visualize the data in map space. We will discuss this in detail in the next section, so don’t worry if you don’t understand the syntax yet.

library(sp)
library(rgdal)
library(tmap)

Let’s build a SpatialPoints object from simulated data. First, I create a series of x, y coordinate pairs representing longitude and latitude. I then convert these data to a SpatialPoints object using the SpatialPoints() function.

When I call summary() on the new SpatialPoints object, you can see that coordinates are provided, but there is no coordinate system information defined.

pnts <- rbind(c(-79, 36), c(-101, 41), c(-80, 27), c(-91, 52), c(-68, 42))
pnts.sp <- SpatialPoints(pnts)
summary(pnts.sp)
Object of class SpatialPoints
Coordinates:
min max
coords.x1 -101 -68
coords.x2   27  52
Is projected: NA
proj4string : [NA]
Number of points: 5

The is.projected() function can be used to test whether a projected coordinate system is defined for an object of the Spatial class. This test confirms that the projection information is missing, as NA is returned. You can use the proj4string() function on the SpatialPoints object to define a projection using a PROJ string. You can also define the projection using the European Petroleum Survey Group (EPSG) code as demonstrated in the line that is commented out. Calling summary() on the layer confirms that spatial reference information is now present. You will notice that “Is projected” is FALSE. This is because this layer is not technically “projected” since we did not use a projected coordinate system. Instead, a geographic coordinate system is defined. So, these data have been referenced to an ellipsoidal model of the Earth as opposed to a projected coordinate system.

There are many geographic and projected coordinate systems. So, you probably won’t know the EPSG code or PROJ string for a specific projection. However, you can look them up on this website. This PDF provides a great explanation of coordinate systems in R.

#Add a coordinate system
is.projected(pnts.sp)
[1] NA
proj4string(pnts.sp) <-CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
#proj4string(pnts.sp) <-CRS("+init=epsg:4326")
summary(pnts.sp)
Object of class SpatialPoints
Coordinates:
min max
coords.x1 -101 -68
coords.x2   27  52
Is projected: FALSE
proj4string : [+proj=longlat +datum=WGS84 +no_defs]
Number of points: 5

We now have a SpatialPoints object. However, there is no attribute information. This is because this is not a SpatialPointsDataFrame. The next block of code demonstrates how to generate a SpatialPointsDataframe by combining SpatialPoints and a data frame. Calling summary() confirms that attributes are now present.

atts <- data.frame(attr1 = c("a", "b", "m", "l", "z"), attr2 = c(101:105),
attr3 = c("Yes", "Yes", "No", "Yes", "Yes"))
pnts.spdf <- SpatialPointsDataFrame(pnts.sp, atts)
summary(pnts.spdf)
Object of class SpatialPointsDataFrame
Coordinates:
min max
coords.x1 -101 -68
coords.x2   27  52
Is projected: FALSE
proj4string : [+proj=longlat +datum=WGS84 +no_defs]
Number of points: 5
Data attributes:
attr1               attr2        attr3
Length:5           Min.   :101   Length:5
Class :character   1st Qu.:102   Class :character
Mode  :character   Median :103   Mode  :character
Mean   :103
3rd Qu.:104
Max.   :105                     

Generally, you will not build your spatial data from scratch. Instead, you will read spatial data into R. I will provide examples for reading in shapefile data of counties, rivers, towns, and interstates in West Virginia. This can be accomplished using the readOGR() function from the rgdal package. The dsn argument here is set to the subfolder within the working directory that houses the data. This could also be a geodatabase if you are reading in a feature class.

counties <- readOGR(dsn = "spatial_data", layer = "wv_counties")
towns <- readOGR(dsn = "spatial_data", layer = "towns")
inter <- readOGR(dsn = "spatial_data", layer = "interstates")
riv <- readOGR(dsn = "spatial_data", layer = "rivers")

I then use the tmap package to plot the data in map space. Again, we will discuss the specifics of this in the next section.

tm_shape(counties) +
tm_polygons()+
tm_shape(inter)+
tm_lines(col= "red", lwd= 2)+
tm_shape(riv)+
tm_lines(col="blue", lwd=1.5)+
tm_shape(towns)+
tm_bubbles()

Let’s explore the data structure of the town points layer in greater detail. The s4 class and the associated Spatial subclass contain many slots. In the first line of code, I am printing the mean population by county. I have to specify the SpatialPointsDataFrame object (towns), the slot (data), and the column of interest (“POPULATION”"). Note the the data slot stores the attribute information.

mean(towns@data$POPULATION) [1] 23100.25 The coords slot holds the coordinate pairs that define the data point locations relative to the coordinate reference system. print(towns@coords) coords.x1 coords.x2 [1,] 483651.0 4182266 [2,] 480624.9 4123803 [3,] 444926.6 4244910 [4,] 557646.2 4348117 [5,] 432818.0 4254382 [6,] 573456.1 4370492 [7,] 374835.7 4252617 [8,] 760773.2 4372254 [9,] 590140.4 4387856 [10,] 522082.3 4419227 [11,] 453171.4 4346219 [12,] 428654.7 4248147 [13,] 438223.5 4245607 [14,] 453623.5 4352732 [15,] 537004.9 4472359 [16,] 525852.3 4436142 As expected, the spatial information is more complex for a polygon. Here is an example of printing the first five coordinates that define a single polygon in the counties data set. Additional slots are available for the the ring direction, to note if holes are present, and for the area of the feature. print(head((counties@polygons[[1]]@Polygons[[1]]@coords))) [,1] [,2] [1,] 522753.7 4431485 [2,] 522933.1 4431929 [3,] 522978.8 4432050 [4,] 523011.0 4432217 [5,] 523000.2 4432362 [6,] 522981.4 4432550 Here is an example of extracting the spatial reference information for the towns data. You can see that the spatial reference is WGS84 UTM Zone 17N. print(proj4string(towns)) [1] "+proj=utm +zone=17 +datum=WGS84 +units=m +no_defs" This example shows how to extract the bounding box coordinates for the data. bbox(towns) min max coords.x1 374835.7 760773.2 coords.x2 4123803.4 4472359.3 sf More recently, a second method has been developed for reading vector data into R. The sf package (Simple Features for R) uses a very different data structure than that used by sp as I will demonstrate here. First, I will read in the same data layers using the st_read() function from sf. library(sf) counties_sf <- st_read("spatial_data/wv_counties.shp") towns_sf <- st_read("spatial_data/towns.shp") inter_sf <- st_read("spatial_data/interstates.shp") riv_sf <- st_read("spatial_data/rivers.shp") In the next code block, I run is.data.frame() on the towns_sf object, and it returns TRUE, indicating that this object is indeed a data frame. That’s weird. Actually, this object is a data frame, but how is this possible? is.data.frame(towns_sf) [1] TRUE If we summarize the towns_sf object you will see that each attribute is represented as a column, and that the last column is called geometry. So, the geometric or spatial information exists as a column in the table. summary(towns_sf) AREA PERIMETER TOWN_ TOWN_ID TOWNS_ Min. :0 Min. :0 Min. : 17.0 Min. : 17.0 Min. : 17.0 1st Qu.:0 1st Qu.:0 1st Qu.: 62.5 1st Qu.: 62.5 1st Qu.: 62.5 Median :0 Median :0 Median :162.0 Median :162.0 Median :162.0 Mean :0 Mean :0 Mean :147.1 Mean :147.1 Mean :147.1 3rd Qu.:0 3rd Qu.:0 3rd Qu.:228.0 3rd Qu.:228.0 3rd Qu.:228.0 Max. :0 Max. :0 Max. :270.0 Max. :270.0 Max. :270.0 TOWNS_ID PLACEFP TYPE INTPTLAT Min. : 17.0 Length:16 Length:16 Length:16 1st Qu.: 62.5 Class :character Class :character Class :character Median :162.0 Mode :character Mode :character Mode :character Mean :147.1 3rd Qu.:228.0 Max. :270.0 INTPTLNG LAT LONG NAME Length:16 Min. :37.26 Min. :-82.43 Length:16 Class :character 1st Qu.:38.37 1st Qu.:-81.65 Class :character Mode :character Median :39.27 Median :-81.20 Mode :character Mean :38.99 Mean :-80.95 3rd Qu.:39.52 3rd Qu.:-80.51 Max. :40.40 Max. :-77.97 POPULATION FAMILIES HOUSEHOLDS X_COORD Min. :10753 Min. : 2873 Min. : 4211 Min. :374822 1st Qu.:12366 1st Qu.: 3453 1st Qu.: 5141 1st Qu.:443236 Median :18178 Median : 4628 Median : 7896 Median :482122 Mean :23100 Mean : 6083 Mean : 9791 Mean :503578 3rd Qu.:27875 3rd Qu.: 7202 3rd Qu.:10807 3rd Qu.:542150 Max. :57287 Max. :15231 Max. :25306 Max. :760757 Y_COORD geometry Min. :4123586 POINT :16 1st Qu.:4247294 epsg:32617 : 0 Median :4346950 +proj=utm ...: 0 Mean :4315852 3rd Qu.:4375934 Max. :4472140  If I print the information in this column for a single town, you can see that all the geometric information, including the geometry type, dimensions, bounding box coordinates, spatial reference information, and actual coordinates, are present. So, when using sf, vector features are read in as a data frames with a special geometry column that stores the geometric information in a list structure. This is an object-based vector data model, where the attribute and spatial information are stored in the same file. Similarly, a feature class in a geodatabase stores the geometric information in a table column as a binary large object (BLOB). town1 <- towns_sf[1,] print(town1$geometry)
Geometry set for 1 feature
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 483651 ymin: 4182266 xmax: 483651 ymax: 4182266
Projected CRS: WGS 84 / UTM zone 17N

One benefit of using sf is that functions that accept data frames can also be applied to these spatial objects, since they are data frames. You will see many examples of this in the following modules.

It is possible to remove the geographic information from the data frame using st_drop_geometry(), as demonstrated in the next code block.

towns_just_attr <- st_drop_geometry(towns_sf)

Conversion

Since there are two separate data models available in R for vector data, which should you use? In this course, we will primarily work with sf; however, there are many packages available for analyzing spatial data in R, and not all are able to use sf data at the time of this writing. So, it is important to be familiar with sp and also know how to convert between types. We will now discuss conversion methods.

I am reading in the towns data again using sp and sf.

towns_sp <- readOGR(dsn = "spatial_data", layer = "towns")
towns_sf <- st_read("spatial_data/towns.shp")

In this example, I am converting the sf object to sp using the as() function from sp. Setting Class to “Spatial” indicates that I want the result to be an object of class Spatial. The class() function is used to confirm this.

towns_sp_from_sf <- as(towns_sf, Class="Spatial")
class(towns_sp_from_sf)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"

Using the st_as_sf() function from sf, you can convert an object of class Spatial to a data frame, as defined by sf.

towns_sf_from_sp <- st_as_sf(towns_sp)
class(towns_sf_from_sp)
[1] "sf"         "data.frame"

Working with Vector Coordinate Systems

When working with spatial data in R, it is important that coordinate reference information be provided and that it is correctly defined. I would recommend checking the spatial reference information every time you read data into R. Using sf, projection information can be printed, written to an object, removed, or defined using the st_crs() function. In this first example, I am simply printing the projection information for the counties_sf object, which will return the EPSG code and PROJ string.

st_crs(counties_sf)
Coordinate Reference System:
User input: WGS 84 / UTM zone 17N
wkt:
PROJCRS["WGS 84 / UTM zone 17N",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]],
CONVERSION["UTM zone 17N",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-81,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
ID["EPSG",32617]]

The coordinate reference information can also be saved as a list, and individual components can be saved to vectors.

proj_info <- st_crs(counties_sf)
proj_info_proj4 <- as.vector(proj_info$proj4string) In this example, I am removing the projection information by setting it to NA. I am then redefining the projection using the PROJ string and EPSG code. st_crs(counties_sf) <- NA st_crs(counties_sf) Coordinate Reference System: NA st_crs(counties_sf) <- "+proj=utm +zone=17 +datum=NAD83 +units=m +no_defs" st_crs(counties_sf) Coordinate Reference System: User input: +proj=utm +zone=17 +datum=NAD83 +units=m +no_defs wkt: PROJCRS["unknown", BASEGEOGCRS["unknown", DATUM["North American Datum 1983", ELLIPSOID["GRS 1980",6378137,298.257222101, LENGTHUNIT["metre",1]], ID["EPSG",6269]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8901]]], CONVERSION["UTM zone 17N", METHOD["Transverse Mercator", ID["EPSG",9807]], PARAMETER["Latitude of natural origin",0, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8801]], PARAMETER["Longitude of natural origin",-81, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8802]], PARAMETER["Scale factor at natural origin",0.9996, SCALEUNIT["unity",1], ID["EPSG",8805]], PARAMETER["False easting",500000, LENGTHUNIT["metre",1], ID["EPSG",8806]], PARAMETER["False northing",0, LENGTHUNIT["metre",1], ID["EPSG",8807]], ID["EPSG",16017]], CS[Cartesian,2], AXIS["(E)",east, ORDER[1], LENGTHUNIT["metre",1, ID["EPSG",9001]]], AXIS["(N)",north, ORDER[2], LENGTHUNIT["metre",1, ID["EPSG",9001]]]] st_crs(counties_sf) <- NA st_crs(counties_sf) Coordinate Reference System: NA st_crs(counties_sf) <- 26917 st_crs(counties_sf) Coordinate Reference System: User input: EPSG:26917 wkt: PROJCRS["NAD83 / UTM zone 17N", BASEGEOGCRS["NAD83", DATUM["North American Datum 1983", ELLIPSOID["GRS 1980",6378137,298.257222101, LENGTHUNIT["metre",1]]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433]], ID["EPSG",4269]], CONVERSION["UTM zone 17N", METHOD["Transverse Mercator", ID["EPSG",9807]], PARAMETER["Latitude of natural origin",0, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8801]], PARAMETER["Longitude of natural origin",-81, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8802]], PARAMETER["Scale factor at natural origin",0.9996, SCALEUNIT["unity",1], ID["EPSG",8805]], PARAMETER["False easting",500000, LENGTHUNIT["metre",1], ID["EPSG",8806]], PARAMETER["False northing",0, LENGTHUNIT["metre",1], ID["EPSG",8807]]], CS[Cartesian,2], AXIS["(E)",east, ORDER[1], LENGTHUNIT["metre",1]], AXIS["(N)",north, ORDER[2], LENGTHUNIT["metre",1]], USAGE[ SCOPE["Engineering survey, topographic mapping."], AREA["North America - between 84Â°W and 78Â°W - onshore and offshore. Canada - Nunavut; Ontario; Quebec. United States (USA) - Florida; Georgia; Kentucky; Maryland; Michigan; New York; North Carolina; Ohio; Pennsylvania; South Carolina; Tennessee; Virginia; West Virginia."], BBOX[23.81,-84,84,-78]], ID["EPSG",26917]] What if you would like to transform the datum or projection? This can be accomplished with the st_transform() function. You will need to provide the EPSG code or PROJ string for the desired output datum or projection. As an example, I am converting the county data from WGS84 UTM Zone 17N to the WGS84 geographic coordinate system. counties_wgs84 <- st_transform(counties_sf, crs= 4236) st_crs(counties_wgs84) Coordinate Reference System: User input: EPSG:4236 wkt: GEOGCRS["Hu Tzu Shan 1950", DATUM["Hu Tzu Shan 1950", ELLIPSOID["International 1924",6378388,297, LENGTHUNIT["metre",1]]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433]], CS[ellipsoidal,2], AXIS["geodetic latitude (Lat)",north, ORDER[1], ANGLEUNIT["degree",0.0174532925199433]], AXIS["geodetic longitude (Lon)",east, ORDER[2], ANGLEUNIT["degree",0.0174532925199433]], USAGE[ SCOPE["Geodesy."], AREA["Taiwan, Republic of China - onshore - Taiwan Island, Penghu (Pescadores) Islands."], BBOX[21.87,119.25,25.34,122.06]], ID["EPSG",4236]] Save to Disk The results of your analyses in R can be saved to disk using both rgdal and sf. rgdal provides the writeORG() function which is demonstrated below. Note that you can save to a variety of vector formats; however, you must provide an object of class Spatial. So, here I am converting the data frame to the Spatial class within the writeORG() function. If you do not want to save these files to your machine, do not execute these lines of code. writeOGR(obj=as(counties_wgs84, Class="Spatial"), dsn="spatial_data", layer="counties_wgs84", driver="ESRI Shapefile", overwrite_layer=TRUE) Data can also be exported using the st_write() function from sf. Note that I had to remove the “AREA” column since writing this field will cause an error associated with the field length. counties_wgs84$AREA <- NULL
st_write(counties_wgs84, "spatial_data/counties_wgs84b.shp", overwrite=TRUE)

For point data, you can also save the result as a CSV file with the coordinates written to the table as demonstrated below.

st_write(towns_sf, "towns_out.csv", layer_options = "GEOMETRY=AS_XY", overwrite=TRUE)

Again, I prefer to use sf to read and write vector data files in R. However, you may find sp and/or rgdal to be necessary for some tasks, so it is good to be familiar with all three packages and the associated data models.

Raster Data

As mentioned above, raster data have a simpler data structure than vector data, as these data are represented simply as an array of values. A single-band raster is similar to a matrix in R while a multi-band raster is similar to a 3-dimensional array. The primary difference between spatial raster data and other array data sets is that spatial reference information is associated with the data so that they will draw at the correct location within the map space and each cell or pixel will correspond to a certain land area. For example, a cell may cover 30-by-30 meters of land area.

Unlike vector data, there is one primary package used to work with raster grids in R: the raster package. Note that the raster package also makes use of rgdal. At the time of this writing, the new terra package is beginning to replace raster. I will cover this new package in a later module.

Raster Data Structure

To gain a better understanding of raster data, let’s build a raster grid from scratch. Using the raster() function from the raster package, I first generate an empty raster by defining the number of rows, number of columns, and the minimum and maximum coordinates in the x and y directions. I am using large numbers here because I will eventually define a UTM projection for this raster, so, easting will be reported in meters relative to the central meridian in the zone, and northing will be the distance from the equator in meters. Since the grid has 10 rows and 10 columns, there will be 100 total cells. I then create 100 random values between 0 and 255 to simulate 8-bit data using the sample() function. I set replacement equal to TRUE so that a value can be selected more than once. I then fill the empty raster with the random values using the values() function also from the raster package. The next step is to define the projection using the projection() function, again from the raster package. Here, I am using the EPSG code for WGS84 UTM Zone 17N. Lastly, I plot the raster using tmap.

library(raster)
library(rgdal)
basic_raster <- raster(ncol = 10, nrow = 10, xmn = 500000, xmx = 500100, ymn = 4200000, ymx = 4200100)
vals <- sample(c(0:255), 100, replace=TRUE)
values(basic_raster) <- vals
projection(basic_raster) <- "+init=EPSG:32617"
tm_shape(basic_raster)+
tm_raster()+
tm_layout(legend.outside = TRUE)

By simply typing the name of the raster into the console, information about the data is returned. It is always good to inspect the data to make sure there are no issues. The class of this object is a RasterLayer, which is defined by the raster package. It has 10 rows, 10 columns, and 100 total cells. The resolution is 10-by-10 meters. This is because we defined the extent as 100 meters of easting and northing, so each cell would need to measure 10 meters to fill the spatial extent. The EPSG code and PROJ string are provided to define the projection. The data are currently stored in memory, and the values in the cells range from 3 to 244. Note that you will probably have different values since the data were generated randomly.

basic_raster
class      : RasterLayer
dimensions : 10, 10, 100  (nrow, ncol, ncell)
resolution : 10, 10  (x, y)
extent     : 5e+05, 500100, 4200000, 4200100  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=17 +datum=WGS84 +units=m +no_defs
source     : memory
names      : layer
values     : 0, 254  (min, max)

Similar to vector data, you probably won’t create raster grids from scratch. Instead, you will read in files. We will begin with a discussion of how to read in single-band raster data.

This can be accomplished using the raster() function and calling the file name. Note that you will have to provide the entire path if the raster grid is not in your working directory. I am reading in two raster grids in the code block below. I then call the objects to inspect them. Both are projected into WGS84 UTM Zone 17N, have a cell size of 30-by-30 meters, 926 rows, 995 columns, and 921,370 total cells.

The dem layer is a digital elevation model, or DEM, while the lc layer is a land cover grid representing a portion of the National Land Cover Database (NLCD) 2011. The DEM is an example of a continuous raster grid where each cell holds a numeric measurement, in this case elevation in meters. The land cover data are an example of a categorical raster grid where each cell holds a number that is a stand-in for a category, in this case a land cover class.

#You will need to set your own working directory.
dem <- raster("spatial_data/dem_example.tif")
lc <- raster("spatial_data/lc_example.tif")
dem
class      : RasterLayer
dimensions : 928, 997, 925216  (nrow, ncol, ncell)
resolution : 30, 30  (x, y)
extent     : 619291.8, 649201.8, 4312766, 4340606  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=17 +datum=WGS84 +units=m +no_defs
source     : dem_example.tif
names      : dem_example
values     : 343, 1348  (min, max)
lc
class      : RasterLayer
dimensions : 928, 997, 925216  (nrow, ncol, ncell)
resolution : 29.99999, 29.99999  (x, y)
extent     : 619291.8, 649201.7, 4312766, 4340606  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=17 +datum=WGS84 +units=m +no_defs
source     : lc_example.tif
names      : lc_example
values     : 11, 95  (min, max)

I am now using tmap to visualize the data in map space. Note that we will talk about refining these maps in the next section. Here, I am just inspecting the data.

tm_shape(dem)+
tm_raster()+
tm_layout(legend.outside = TRUE)

tm_shape(lc)+
tm_raster(style="cat")+
tm_layout(legend.outside = TRUE)

To call in multi-band raster data, such as multispectral imagery, you cannot use the raster() function, as this is intended for single-band data only. Instead, you must use either brick() or stack(). A RasterBrick can only reference a single file while a RasterStack can reference multiple files or a subset of bands from one file. You can use either method to call in a multi-band file, as demonstrated in the following two examples where a Sentinel-2 multispectral image of Vancouver, British Columbia is read into R.

Note that I am using the plotRGB() function from the raster package to visualize the multi-band data as opposed to using tmap. This function allows you to specify which bands to display with each color channel and allows for contrast stretching. In the examples, the red channel is showing NIR, green is showing red, and blue is showing green. So, this is a standard false color composite where vegetated areas appear red (since they reflect a larger percentage of incoming NIR radiation).

#You will need to set your own working directory.
s1 <- brick("spatial_data/sentinel_vancouver.tif")
plotRGB(s1, r=4, g=3, b=2, stretch="lin")

#You will need to set your own working directory.
s2 <- stack("spatial_data/sentinel_vancouver.tif")
plotRGB(s1, r=4, g=3, b=2, stretch="lin")

RasterBrick and RasterStack yield the same results in this case. However, if you want to reference multiple files, you cannot use a RasterBrick.

It is generally recommended to use RasterBricks when referencing a single file, as this is more computationally efficient and can speed up processing. I have found that RasterStacks may fail if all the referenced files do not have the exact same extent, number of rows, number of columns, cell size, and spatial reference system. So, I generally just stack or composite raster grids outside of R before reading them in. This is just a personal preference.

This is an example of displaying the image using histogram equalization as opposed to a linear stretch.

plotRGB(s1, r=4, g=3, b=2, stretch="hist")

Here is an example of a different composite in which the red channel is showing shortwave infrared (SWIR), the green channel is showing NIR, and the blue channel is showing green.

plotRGB(s1, r=5, g=4, b=2, stretch="lin")

As a side note, the raster(), brick(), and stack() functions are capable of reading in any raster layer even if it does not represent geospatial data. Here is an example in which I have read in a picture of my cats: Freddy and Peri.

cats <- brick("spatial_data/example_image.tif")
plotRGB(cats, r=1, g=2, b=3)

Working with Raster Coordinate Systems

Working with raster coordinate systems is pretty similar to working with those for vector data. You can use the crs() function to obtain the projection information for a RasterLayer, RasterBrick, or RasterStack. In the example, you can see that the Sentinel-2 image is reference to WGS84 UTM Zone 10N.

crs(s1)
CRS arguments:
+proj=utm +zone=10 +datum=WGS84 +units=m +no_defs 

Similar to the example for vector data above, in this code block I am removing the spatial reference information for the Sentinel-2 data by setting it to NA. I then reapply the projection using the appropriate PROJ string. By calling crs() on this object, the projection information in returned.

crs(s1) <- NA
crs(s1)
CRS arguments: NA
crs(s1) <- "+proj=utm +zone=10 +ellps=WGS84 +towgs84=0,0,0,-0,-0,-0,0 +units=m
+no_defs"
crs(s1)
CRS arguments:
+proj=utm +zone=10 +ellps=WGS84 +units=m +no_defs 

Lastly, the projectRaster() function can be used to reproject the data into a different geographic or projected coordinate system. In the code block below I am transforming the Sentinel-2 image to the WGS84 geographic coordinate system using the corresponding EPSG code. Note that this can be slow for large data sets.

s1_wgs84 <- projectRaster(s1, crs = "+init=epsg:4236")
crs(s1_wgs84)
CRS arguments: +proj=longlat +ellps=intl +no_defs 

Save to Disk

Raster data can be written to disk as a permanent file using the writeRaster() function. I generally prefer to use TIFF or IMG files when reading data in or writing data out from R. However, a variety of formats are available and can be specified using the format argument.

writeRaster(s1_wgs84, filename="s1_wgs84", format = "GTiff", overwrite=TRUE)

Concluding Remarks

Now that you know how to read in geospatial data in R and the characteristics of the data models used to represent vector and raster data, you can start using these data in your analyses. In the next section, we will explore visualizing spatial data using the tmap package. In later sections, we will explore vector- and raster-based spatial analysis.