Spatial Analysis with Python

The goal of this module is to introduce a variety of libraries and modules for working with, visualizing, and analyzing geospatial data using Python. Currently, there are a variety of options, each of which have their own pros and cons. Below is a list of some common tools for geospatial analysis in Python.

  1. ArcPy: Python site package integrated into the ArcGIS Desktop and ArcGIS Pro software environments. ArcPy can be used within ArcMap and ArcGIS Pro via the Python window. ArcGIS Desktop and ArcGIS Pro now include Jupyter Notebook as part of the base installation. You can also work within notebooks inside of ArcGIS Pro. Since these technologies are not free and open-source, we will not discuss them in this course. However, they are very powerful and worth investigating. I use ArcPy a lot in my own work. Further, it is very well documented.
  2. GeoPandas: expands Pandas to allow for working with vector data using the Series and DataFrame structures. I will provide an introduction to GeoPandas in this module, which will build upon our prior discussion of Pandas. GeoPandas relies on several dependencies including the following which we have already discussed: NumPy, Pandas, and matplotlib. It also relies on the following libraries/modules:
    • Shapely: used to perform geometric and overlay operations.
    • Fiona: used to read and write geospatial data files via links to the Geospatial Data Abstraction Library (GDAL)
    • PyProj: interface to PROJ and allows for reading and transformation of geographic and projected coordinate systems.
    • MapClassify: allows for choropleth mapping.
    • Descartes: use vector geospatial features with matplotlib.
  3. Rasterio: allows for reading, writing, and manipulation of raster geospatial data. This package relies heavily on NumPy so that rasters can be manipulated as arrays.
  4. Contextily: allows for retrieval of raster tile layers for use as base maps within matplotlib plots.
  5. EarthPy: expands the functionality of GeoPandas and Rasterio. Very useful for raster and multispectral image analysis.
  6. WhiteboxTools: provides more than 440 tools for analyzing geospatial data, including tools for data preparation, vector analysis and overlay, raster and image analysis, geomorphometric analysis, hydrological analysis, and LiDAR processing. These tools can be accessed in Python, R, QGIS, and via a command-line interface. They can also be accessed via a graphical user interface (GUI) known as Whitebox GAT. This tool set was developed by Dr. John Lindsay of the Geomorphometry & Hydrogeomatics Research Group at the University of Guelph.

I honestly struggled with which tool sets to discuss. After some research, I landed on the following objectives. After working through this module you will be able to:

  1. read and write vector geospatial data using GeoPandas.
  2. read and write raster geospatial data using Rasterio.
  3. visualize geospatial data and make maps using GeoPandas, Rasterio, and matplotlib.
  4. perform a wide variety of spatial analysis tasks using WhiteboxTools.

Since there are many interrelated libraries and modules that depend on one another, setting up an environment for working with geospatial data in Python can be tricky. So, make sure to watch the provided set-up video.

In this first set of code, I am reading in the required libraries. At this point, you should be familiar with loading in NumPy, Pandas, and matplotlib. Once installed, GeoPandas, Contextily, Rasterio, and EarthPy are easy to import.

The WhiteboxTools import is a bit different. Have a look at the installation and use within Python directions made available as part of the software documentation.

I have used the standard alias names for all of the packages or modules in the imports below.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import geopandas as gpd
import contextily as ctx
import rasterio as rio
import whitebox
wbt = whitebox.WhiteboxTools()
import earthpy as et
import earthpy.spatial as es
import earthpy.plot as ep
plt.rcParams['figure.figsize'] = [10, 8]

GeoPandas

First, I am reading in a polygon vector layer of West Virginia county boundaries using GeoPandas. The plot() method is then used to display the data using matplotlib and some default settings. The axes values represent measurements within the projected coordinate system.

wv_c = gpd.read_file("C:/Users/amaxwel6/python_ds/data/spatial/wv_counties.shp")
wv_c.plot()
<AxesSubplot:>

png

Using the head() method, you can see that the structure of these data are very similar to a Pandas DataFrame. In fact, it is a Pandas DataFrame with one key addition: the geometric information is stored in a column in the table. If you are familiar with vector data models, this is similar to other object-based models in which the geometric and tabulated data are stored in the same object or table. This is similar to a feature class in a file geodatabase or an sf object in R.

On of the advantages of referencing our vector data using this method is that we can apply the Pandas methods discussed in the prior module to our geospatial data. For example, you can query and subset records, add new columns, group and summarize, and perform table joins.

print(wv_c.head())
           AREA      PERIMETER        NAME  STATE  FIPS STATE_ABBR  \
0  2.819364e+08   69539.619328        Ohio     54 -1146         WV   
1  8.071574e+08  125167.039523    Marshall     54 -1148         WV   
2  1.686168e+09  179373.536795     Preston     54 -1145         WV   
3  5.954419e+08  145624.281267      Morgan     54 -1147         WV   
4  9.470626e+08  164158.984954  Monongalia     54 -1147         WV

   SQUARE_MIL  POP2000  POP00SQMIL  MALE2000  ...   1999   1990   2000   2001  \
0     106.176    47427       446.7     22177  ...  47719  50871  47427  46653   
1     306.994    35519       115.7     17288  ...  34968  37356  35519  35294   
2     648.324    29334        45.2     14535  ...  29814  29037  29334  29299   
3     228.983    14943        65.3      7343  ...  13895  12128  14943  15227   
4     361.159    81866       226.7     41291  ...  77006  75509  81866  82315

    2002   2003   2004   2005   2006  \
0  46241  45612  45309  44958  44662   
1  34999  34837  34602  34250  33896   
2  29628  29684  29817  30052  30384   
3  15317  15545  15745  15987  16337   
4  82705  83670  84034  84592  84752

                                            geometry  
0  POLYGON ((522753.988 4431484.270, 522933.396 4...  
1  POLYGON ((522753.988 4431484.270, 522920.947 4...  
2  POLYGON ((605953.930 4397497.446, 606406.156 4...  
3  POLYGON ((749219.462 4385550.735, 749158.156 4...  
4  POLYGON ((549590.510 4396972.754, 550334.558 4...

[5 rows x 51 columns]

The crs attribute lists the spatial coordinate system information for the file. It is generally a good idea to explore this information to make sure that the projection information is read in correctly. Projections are referenced using European Petroleum Survey Group (EPSG) codes. Here, you can see that the data are projected into UTM Zone 17 North. The geographic coordinate system or datum is the North American Datum of 1983. The easting and northing measurements are reported in meters. The area of use of the coordinate system is also defined.

wv_c.crs
<Derived Projected CRS: EPSG:26917>
Name: NAD83 / UTM zone 17N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: North America - between 84°W and 78°W - onshore and offshore. Canada - Nunavut; Ontario; Quebec. United States (USA) - Florida; Georgia; Kentucky; Maryland; Michigan; New York; North Carolina; Ohio; Pennsylvania; South Carolina; Tennessee; Virginia; West Virginia.
- bounds: (-84.0, 23.81, -78.0, 84.0)
Coordinate Operation:
- name: UTM zone 17N
- method: Transverse Mercator
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

The bounds() method provides the rectangular bounds of each feature in the data set relative to the coordinate system as a Pandas DataFrame.

In contrast, the total_bounds() method provides the bounds for the entire dataset as an array.

print(wv_c.bounds.head())
            minx          miny           maxx          maxy
0  522261.974350  4.429689e+06  541040.468414  4.448459e+06
1  511107.417585  4.396784e+06  541195.100727  4.431487e+06
2  594643.292184  4.339496e+06  630729.073926  4.397910e+06
3  717612.679003  4.363827e+06  755619.845699  4.397837e+06
4  549531.827256  4.365691e+06  605990.034387  4.397497e+06
wv_c.total_bounds
array([ 355936.99218436, 4117434.51500363,  782843.89838413,
       4498773.09408554])

The GeoPandas centroid method returns the centroid of each feature in the data set. There are a variety of spatial operations that can be performed using GeoPandas and its dependencies. However, we will focus on WhiteboxTools in this course. If you are interested in the operations available, please review the GeoPandas documentation referenced above.

wv_cen = wv_c.centroid
wv_cen.plot()
<AxesSubplot:>

png

Contextily allows for raster tile base maps to be added to maps produced with matplotlib.

In the example below, the geospatial data are not drawing at the correct location. This is because the raster tile base maps are referenced to the Web Mercator projection whereas the counties data are referenced to UTM Zone 17 North. Python is not able to perform projection on-the-fly operations. So, it is important that data layers have the same spatial reference system.

So, in the next set of code I use the to_crs() GeoPandas method and appropriate EPSG code to create a copy of the counties data that have been transformed to the Web Mercator projection. When I recreate the map and add the base map, everything lines up. So, it is important that you consider the impact of projections and make changes where necessary. When performing overlay and spatial analysis, it is preferable that all of your data layers are referenced to the same spatial coordinate reference system.

wv_map = wv_c.plot()
ctx.add_basemap(wv_map)

png

wv_cWebM = wv_c.to_crs("EPSG:3395")
wv_map = wv_cWebM.plot()
ctx.add_basemap(wv_map)

png

The to_file() GeoPandas method allows you to save results back to disk. Since GeoPandas uses Fiona and GDAL, a variety of formats can be read and written.

wv_cWebM.to_file("D:/data/out/wv_counties_wm.shp")

The GeoPandas plot() method allows you to plot or map your vector data using matplotlib. The column parameter allows you to symbolize relative to attribute data (population data in this example). You can also use color ramps available via matplotlib or make your own.

Here is a link to a list of named matplotlib color ramps.

wv_c.plot(column="2006", cmap="BuGn")
<AxesSubplot:>

png

Using methods that we have already discussed in relation to Pandas, I am generating a new population density field from the available population and area attributes. I then symbolize relative to this new field. The scheme argument allows you to specify a classification method.

You can also add a legend, change its position, and set a title as demonstrated in the following examples.

wv_c['pop_den'] = wv_c['2006']/wv_c['SQUARE_MIL'] 
wv_c.plot(column="pop_den", cmap="YlOrRd", scheme="quantiles")
<AxesSubplot:>

png

fig, ax = plt.subplots(1, 1)
wv_c.plot(column='pop_den', cmap="YlOrRd", ax=ax, legend=True, legend_kwds={'label': "Population Density", 'orientation': "horizontal"})
<AxesSubplot:>

png

plt.rcParams['figure.figsize'] = [10, 8]
fig, ax = plt.subplots(1, 1)
wv_c.plot(column='pop_den', cmap="YlOrRd", ax=ax, legend=True, legend_kwds={'label': "Population Density", 'orientation': "vertical"})
<AxesSubplot:>

png

Referencing back to the data visualization module, here I am creating a figure with multiple axes in which to place several maps. I change the scheme argument to compare different legend classification methods.

plt.rcParams['figure.figsize'] = [10, 8]
fig, ax = plt.subplots(2, 2)
wv_c.plot(column="pop_den", ax=ax[0,0], cmap="YlOrRd", scheme="Quantiles")
ax[0,0].set_title("Quantiles", fontsize=24, color="#000000")
ax[0,0].axis("off")
wv_c.plot(column="pop_den", ax=ax[0,1], cmap="YlOrRd", scheme="Percentiles")
ax[0,1].set_title("Percentiles", fontsize=24, color="#000000")
ax[0,1].axis("off")
wv_c.plot(column="pop_den", ax=ax[1,0], cmap="YlOrRd", scheme="EqualInterval")
ax[1,0].set_title("Equal Interval", fontsize=24, color="#000000")
ax[1,0].axis("off")
wv_c.plot(column="pop_den", ax=ax[1,1], cmap="YlOrRd", scheme="NaturalBreaks")
ax[1,1].set_title("Natural Breaks", fontsize=24, color="#000000")
ax[1,1].axis("off")
fig.patch.set_facecolor('#bcd1bc')

png

Lastly, I generate a map layout that incorporates multiple layers. The population density is displayed then overlaid with the county boundaries using boundary.plot(). I then include some point data representing towns. To make sure that all of these layers plot in the same map space, I reference the same axis with the ax parameter.

Here are some helpful links for setting marker and line styles:

plt.rcParams['figure.figsize'] = [10, 8]
fig, ax = plt.subplots(1, 1)
wv_c.plot(column='pop_den', cmap="YlGn", ax=ax, legend=True, legend_kwds={'label': "Population Density", 'orientation': "vertical"})
wv_c.boundary.plot(ax=ax, edgecolor="black")
wv_t = gpd.read_file("D:/data/spatial/towns.shp")
wv_t.plot(ax=ax, marker="D", color="red", markersize=50)
<AxesSubplot:>

png

Rasterio

We will now move on to discuss working with raster geospatial data using Rasterio and EarthPy.

The Rasterio open() method allows for data to be read from disk in either reading or writing mode. Here, we will primarily use reading mode, which is the default, since we just want to connect to the data and not manipulate it on the disk. Rasterio can read a variety of raster data types and formats due its reliance on GDAL.

Once a connection is made to a raster data layer, a variety of attributes are available to describe and explore the data. In the example below, I have printed the file name, number of bands, number of columns, number of rows, bounding geometry, and spatial coordinate system as an EPSG code.

When you are done using a layer, it is generally recommend to disconnect using the close() method.

elev = rio.open("D:/data/spatial/elevation1.tif")
print(elev.name)
print(elev.count)
print(elev.width)
print(elev.height)
print(elev.bounds)
print(elev.crs)
#elev.close()
D:/data/spatial/elevation1.tif
1
2114
1923
BoundingBox(left=551467.8904750665, bottom=4335184.114613933, right=614887.8904750665, top=4392874.114613933)
EPSG:26917

Once a connection is made to a data layer, it can be read into memory as a NumPy array, since raster data are essentially arrays. This is convenient because we can now use NumPy operations, functions, and methods on our gridded data. This is accomplished using the read() method.

elev_arr = elev.read(1)
print(elev_arr)
[[404 411 419 ... 511 513 515]
 [402 409 416 ... 513 514 517]
 [399 406 412 ... 513 516 519]
 ...
 [310 311 312 ... 607 593 590]
 [312 316 315 ... 592 576 577]
 [315 317 315 ... 577 560 565]]

Once the raster data are written to memory as a NumPy array, they can be mapped or visualized with matplotlib using the imshow() method that was demonstrated in the prior data visualization module.

In the following example, I am using multiple axes to display the data using different color ramps.

plt.rcParams['figure.figsize'] = [10, 8]
plt.imshow(elev_arr, cmap="gist_gray")
<matplotlib.image.AxesImage at 0x27242cca940>

png

plt.rcParams['figure.figsize'] = [10, 8]
fig, ax = plt.subplots(2, 2)
ax[0,0].imshow(elev_arr, cmap="gist_gray")
ax[0,0].axis("off")
ax[0,1].imshow(elev_arr, cmap="copper")
ax[0,1].axis("off")
ax[1,0].imshow(elev_arr, cmap="YlOrBr")
ax[1,0].axis("off")
ax[1,1].imshow(elev_arr, cmap="gist_earth")
ax[1,1].axis("off")
fig.patch.set_facecolor('#bcd1bc')

png

It is also possible to read in multiband raster data, such as multispectral imagery, using the same method as that used for a single-band data set. In the example below, I am reading in a Landsat 8 image. Using the available attributes, you can see that there are 7 bands present in the data set.

ls8 = rio.open("D:/data/spatial/ls8example.tif")
print(ls8.name)
print(ls8.count)
print(ls8.width)
print(ls8.height)
print(ls8.bounds)
print(ls8.crs)
D:/data/spatial/ls8example.tif
7
2018
1817
BoundingBox(left=475547.7429123717, bottom=5422062.195491019, right=536094.5524265328, top=5476568.37203741)
EPSG:32610

If no arguments are provided to the read() method, then all bands will be read into the NumPy array. You can also specify a specific band to read in. Unlike most indexing in Python, the band indexing for read() starts at 1.

In the first example below, I have read in all of the bands and obtaining a NumPy array of shape (7, 1817, 2018). When I specify a specific band, I obtain a NumPy array of shape (1817, 2018).

ls8_arr = ls8.read()
print(ls8_arr.shape)

ls8_arr2 = ls8.read(5)
print(ls8_arr2.shape)
(7, 1817, 2018)
(1817, 2018)

In the example below, I am using bracket notation to display one of the bands from the data set. Specifically, I am specifying the index associated with the 7th band, or SWIR2. This display is a bit dim because there is no stretching applied and the values do not fill the full radiometric range. In the next line of code, I investigate the distribution of the values then use this information to set min and max values to improve the display. This is accomplished using the vmin and vmax arguments.

plt.rcParams['figure.figsize'] = [10, 8]
plt.imshow(ls8_arr[6,:,:], cmap="gist_gray")
<matplotlib.image.AxesImage at 0x27242bf8370>

png

print(ls8_arr[6,:,:].max())
print(ls8_arr[6,:,:].min())
print(ls8_arr[6,:,:].mean())
255
0
9.92667778654738
plt.rcParams['figure.figsize'] = [10, 8]
plt.imshow(ls8_arr[6,:,:], cmap="gist_gray", vmin=0, vmax=50)
<matplotlib.image.AxesImage at 0x272489c80a0>

png

Now, I am using multiple axes to display the blue, green, red, and NIR bands. This is the band order in this data set:

  • 1: Blue Edge
  • 2: Blue
  • 3: Green
  • 4: Red
  • 5: NIR
  • 6: SWIR1
  • 7: SWIR2

Make sure you understand how the array indexing was used to reference the correct band for each map.

plt.rcParams['figure.figsize'] = [10, 8]
fig, ax = plt.subplots(2, 2)
ax[0,0].imshow(ls8_arr[1,:,:], cmap="Blues", vmin=0, vmax=25)
ax[0,0].axis("off")
ax[0,0].set_title("Blue", fontsize=24, color="#000000")
ax[0,1].imshow(ls8_arr[2,:,:], cmap="Greens", vmin=0, vmax=25)
ax[0,1].axis("off")
ax[0,1].set_title("Green", fontsize=24, color="#000000")
ax[1,0].imshow(ls8_arr[3,:,:], cmap="Reds", vmin=0, vmax=25)
ax[1,0].axis("off")
ax[1,0].set_title("Red", fontsize=24, color="#000000")
ax[1,1].imshow(ls8_arr[4,:,:], cmap="Oranges", vmin=0, vmax=50)
ax[1,1].axis("off")
ax[1,1].set_title("NIR", fontsize=24, color="#000000")
fig.patch.set_facecolor('#bcd1bc')

png

I prefer to use the EarthPy library for displaying multispectral data as true color and false color composites. This can be accomplished with the plot_rgb() function and mapping the desired bands to the red, green, and blue color channels. Here, I am displaying the data using simulated true color where red is mapped to red, green is mapped to green, and blue is mapped to blue. Again, indexing starts at 0. The stretch argument allows for a contrast stretch to be applied, and the str_clip argument allows the specification of the percentage of high and low values to clip.

In the following examples, I generate different false color composites. Again, make sure you understand the band indexing.

ep.plot_rgb(ls8_arr,
            rgb=[3, 2, 1],
            stretch=True,
            str_clip=1)
plt.show()

png

ep.plot_rgb(ls8_arr,
            rgb=[4, 3, 2],
            stretch=True,
            str_clip=1)
plt.show()

png

ep.plot_rgb(ls8_arr,
            rgb=[6, 4, 2],
            stretch=True,
            str_clip=1)
plt.show()

png

The plot_bands() function allows for each band to be plotted separately using multiple axes within a single figure.

The hist() function will produce a histogram for each image band.

ep.plot_bands(ls8_arr)
plt.show()

png

band_titles = ["Blue Edge", "Blue", "Green", "Red",
               "NIR", "SWIR1", "SWIR2"]
ep.hist(ls8_arr,
        title=band_titles)
plt.show()

png

Again, there are some functions and methods available for analyzing raster data with Rasterio and EarthPy. However, that will not be covered here. If you are interested in that topic, please consult the documentation referenced above.

WhiteboxTools for Spatial Analysis

Here, we will focus on using WhiteboxTools for performing spatial analysis operations. I have decided to focus on this tool set because the functions are easy to implement and a large number of tools are available. Another benefit is that WhiteboxTools does not require a lot of dependencies and is, thus, easy to set up. Again, there are other options, including GeoPandas, Rasterio, and ArcPy. Throughout this section I will use GeoPandas, Rasterio, and matplotlib to visualize the input data and results.

First, I create variables to reference the path to a set of data layers stored on my local disk. This will simplify the later code. Also, I turn off the verbose mode for WhiteboxTools so that progress information is not printed to my console. I am mainly doing this to make a cleaner web page; you do not need to do this for your own analyses. In fact, I would recommend maintaining the default verbose printing, as this can be informative.

wshed = "D:/data/spatial/watersheds.shp"
wv_t = "D:/data/spatial/towns.shp"
wv_c = "D:/data/spatial/wv_counties.shp"
riv = "D:/data/spatial/rivers.shp"
elev = "D:/data/spatial/elevation1.tif"
elev2 = "D:/data/spatial/elev_lidar.tif"
lc = "D:/data/spatial/lc_example.tif"
pnts = "D:/data/spatial/example_points.shp"
t1 = "D:/data/spatial/harvest.csv"
out_fold = "D:/data/out/"
wbt.verbose=False

Example 1: Dissolve

First, I perform a simple dissolve on a set of watershed boundaries. Note that the output is saved to disk as opposed to memory, which is generally how WhiteboxTools operates. I am also specifying a field on which to dissolve. If no field is defined, then all features will be dissolved to a single feature.

Currently, WhiteboxTools only supports shapefiles. So, you will need to convert other vector formats for use.

Once the results are obtained, I read them back in using GeoPandas then plot using matplotlib.

wbt.dissolve(
    wshed, 
    out_fold + "wshed_diss.shp", 
    field="ICU", 
    snap=0.0, 
)
0
wshed2 = gpd.read_file(wshed)
diss = gpd.read_file(out_fold + "wshed_diss.shp")
plt.rcParams['figure.figsize'] = [10, 8]
fig, ax = plt.subplots(1, 2)
wshed2.plot(ax=ax[0], legend=False)
ax[0].axis("off")
diss.plot(column='ICU', cmap="Dark2", ax=ax[1])
ax[1].axis("off")
fig.patch.set_facecolor('#bcd1bc')

png

Example 2: Table Join

This example is actually not a WhiteboxTools example. However, I will use the results in a later example.

Here, I am using Pandas to perform a table join to associate some deer harvest data from the West Virginia Division of Natural Resources with each county. Since the common field has a different name in each dataset, I use the rename() method to change one of the names to match. I then use merge() to perform the join. Specifically, I use a left join so that all counties are maintained. I then print the resulting table.

The geometry information is maintained, so I am able to map the results.

wshed2 = gpd.read_file(wv_c)
harv = pd.read_csv(t1, sep=",", header=0, encoding="ISO-8859-1")
wshed2.rename(columns={'NAME':'COUNTY'}, inplace=True)
dnr_merge = pd.merge(wshed2, harv, how="left", on="COUNTY")
print(dnr_merge.head())
           AREA      PERIMETER      COUNTY  STATE  FIPS STATE_ABBR  \
0  2.819364e+08   69539.619328        Ohio     54 -1146         WV   
1  8.071574e+08  125167.039523    Marshall     54 -1148         WV   
2  1.686168e+09  179373.536795     Preston     54 -1145         WV   
3  5.954419e+08  145624.281267      Morgan     54 -1147         WV   
4  9.470626e+08  164158.984954  Monongalia     54 -1147         WV

   SQUARE_MIL  POP2000  POP00SQMIL  MALE2000  ...   2005   2006  \
0     106.176    47427       446.7     22177  ...  44958  44662   
1     306.994    35519       115.7     17288  ...  34250  33896   
2     648.324    29334        45.2     14535  ...  30052  30384   
3     228.983    14943        65.3      7343  ...  15987  16337   
4     361.159    81866       226.7     41291  ...  84592  84752

                                            geometry  OID_  BUCK  ANTLERLESS  \
0  POLYGON ((522753.988 4431484.270, 522933.396 4...   NaN   527         625   
1  POLYGON ((522753.988 4431484.270, 522920.947 4...   NaN  1719        1658   
2  POLYGON ((605953.930 4397497.446, 606406.156 4...   NaN  2041        2825   
3  POLYGON ((749219.462 4385550.735, 749158.156 4...   NaN   680         772   
4  POLYGON ((549590.510 4396972.754, 550334.558 4...   NaN  1808        2138

   MUZZLELOAD   BOW  TOTALDEER           REGIONNAME  
0         117   273       1542   Northern Panhandle  
1         277   506       4160   Northern Panhandle  
2         512  1075       6453  Mountaineer Country  
3         109   163       1724    Eastern Panhandle  
4         415   709       5070  Mountaineer Country

[5 rows x 58 columns]
plt.rcParams['figure.figsize'] = [10, 8]
fig, ax = plt.subplots(1, 1)
dnr_merge.plot(column='TOTALDEER', cmap="YlOrRd", ax=ax, legend=True, legend_kwds={'label': "Total Harvest", 'orientation': "vertical"})
fig.patch.set_facecolor('#bcd1bc')

png

Example 3: Vector Clip

Using the joined attributes, I now use Pandas to extract out counties that are in the Eastern Panhandle region, information that was originally in the deer harvest table.

I then save this result to file using the GeoPandas to_file() method. Lastly, I use this county subset to perform a clip to extract towns in the Eastern Panhandle. The results are then mapped.

ep = dnr_merge[(dnr_merge["REGIONNAME"]=="Eastern Panhandle")]
ep.to_file(out_fold + "ep.shp")
ep2 = out_fold + "ep.shp"
wbt.clip(
    wv_t, 
    ep2, 
    out_fold + "ep_towns.shp"
)
0
ep3 =  gpd.read_file(out_fold + "ep.shp")
ep_t = gpd.read_file(out_fold + "ep_towns.shp")
plt.rcParams['figure.figsize'] = [10, 8]
fig, ax = plt.subplots(1, 1)
ep.plot(ax=ax, color="gray")
ep.boundary.plot(ax=ax, edgecolor="black")
ep_t.plot(ax=ax, marker="D", color="red", markersize=50)
<AxesSubplot:>

png

Example 4: Raster Clip

It is also possible to clip raster data relative to vector polygon boundaries. Here, I am clipping the elevation data relative to the watershed boundaries. The maintain_dimensions argument is set to False. If set to True, the extent of the input data will be maintained, with any cells outside of the mask or clip extent set to NoData.

I then map the results. The same min and max values are used for both grids so that the color ramps are directly comparable. Also, I am using Rasterio to read in the data and convert them to NumPy arrays.

wbt.clip_raster_to_polygon(
    elev, 
    wshed, 
    out_fold + "dem_clip.tif", 
    maintain_dimensions=False, 
)
0
elev_x = rio.open(elev)
elev_x_arr = elev_x.read()
elev_y = rio.open(out_fold + "dem_clip.tif")
elev_y_arr = elev_y.read()
plt.rcParams['figure.figsize'] = [10, 8]
fig, ax = plt.subplots(1, 2)
ax[0].imshow(elev_x_arr[0,:,:], cmap="gist_gray", vmin=elev_x_arr[0,:,:].min(), vmax=elev_x_arr[0,:,:].max())
ax[0].axis("off")
ax[1].imshow(elev_y_arr[0,:,:], cmap="gist_gray", vmin=elev_x_arr[0,:,:].min(), vmax=elev_x_arr[0,:,:].max())
ax[1].axis("off")
fig.patch.set_facecolor('#bcd1bc')

png

Example 5: Intersection

Below, I am performing an intersection of the county and watershed boundaries. Only boundaries within the overlapping or shared area are maintained.

wbt.intersect(
    wshed, 
    wv_c, 
    out_fold + "isect.shp", 
    snap=0.0, 
)
0
wshed2 = gpd.read_file(wshed)
isect2 = gpd.read_file(out_fold + "isect.shp")
plt.rcParams['figure.figsize'] = [10, 8]
fig, ax = plt.subplots(1, 2)
wshed2.plot(ax=ax[0], edgecolor="black", linewidth=3, legend=False)
ax[0].axis("off")
isect2.plot(ax=ax[1], edgecolor="black", linewidth=3, legend=False)
ax[1].axis("off")
fig.patch.set_facecolor('#bcd1bc')

png

Example 6: Raster Resample

A function is available to resample raster data, or change the cell size. The following methods are available: nearest neighbor ("nn"), bilinear interpolation ("bilinear"), and cubic convolution ("cc"). Here, I am using cubic convolution.

By comparing the number of rows and columns in the original and new data sets, you can see that the spatial resolution has been reduced.

wbt.resample(
    elev, 
    out_fold + "elev_re.tif", 
    cell_size=100, 
    base=None, 
    method="cc", 
)
0
elev_x = rio.open(elev)
elev_y = rio.open(out_fold + "elev_re.tif")
print(elev_x.height)
(elev_y.height)
1923





577

Example 7: Multidirectional Hillshade

WhiteboxTools has a large set of terrain and geomorphic analysis tools available. Here, I will demonstrate a few using a high spatial resolution digital elevation model (DEM) derived from LiDAR. In the first example, I am creating a multidirectional hillshade from the DEM. In contrast to a traditional hillshade, multidirectional hillshades combine multiple illuminating positions. The altitude parameter allows you to set the height of the sun above the horizon while the full_mode parameter relates to the number of sun positions used. A zfactor can be defined for exaggeration or if the vertical units are different from the horizontal units.

wbt.multidirectional_hillshade(
    elev2, 
    out_fold + "md_hs.tif", 
    altitude=45.0, 
    zfactor=None, 
    full_mode=False, 
)
0
md_hs = rio.open(out_fold + "md_hs.tif")
md_hs2 = md_hs.read()
plt.rcParams['figure.figsize'] = [10, 8]
plt.imshow(md_hs2[0,:,:], cmap="gist_gray")
<matplotlib.image.AxesImage at 0x27245d54460>

png

Example 8: Relative Topographic Position

This example shows how to create a relative topographic position index, which is essentially a measure of hillslope position (ridge vs. valley). This is an example of a tool that relies on a moving window, so the size of the window is defined using the filterx and filtery parameters. Larger window sizes will produce more generalized patterns.

wbt.relative_topographic_position(
    elev2, 
    out_fold + "tp.tif", 
    filterx=11, 
    filtery=11, 
)
0
tp = rio.open(out_fold + "tp.tif")
tp2 = tp.read()
plt.rcParams['figure.figsize'] = [10, 8]
plt.imshow(tp2[0,:,:], cmap="gist_gray")
<matplotlib.image.AxesImage at 0x27245da1be0>

png

Example 9: Pennock Landform Classification

The Pennock landform classification generates a categorical raster that differentiates the following classes based on slope gradient, profile curvature, and plan curvature:

  • CFS: convergent footslope
  • DFS: divergent footslope
  • CSH: convergent shoulders
  • DSH: divergent shoulders
  • CBS: convergent backslope
  • DBS: divergent backslope
  • L: level terrain

Symbolizing a raster grid categorically with matplotlib is a bit complex. My example was derived from this example from Earth Lab. Here is a summary of the process:

  1. Import the required modules from matplotlib.
  2. Define a color map using the matplotlib ListedColormap method, which generates an object of the Colormap class.
  3. Define the data intervals to map to each color using the BoundaryNorm method, which generates a colormap index based on discrete intervals.
  4. Use the defined color map and boundaries to render the raster using the cmap and norm arguments of imshow.
  5. Add a legend and define legend labels as a dictionary to associate each color with a category name.

Here, I used named matplotlib colors. A list of named colors is provided here. You can also use RGB values or hex codes.

wbt.pennock_landform_class(
    elev2, 
    out_fold + "plf.tif", 
    slope=3.0, 
    prof=0.1, 
    plan=0.0, 
    zfactor=None, 
)
0
from matplotlib.patches import Patch
from matplotlib.colors import ListedColormap
import matplotlib.colors as colors

plf = rio.open(out_fold + "plf.tif")
plf2 = plf.read()

cmap = ListedColormap(["black", "firebrick", "peru", "darkkhaki", "slategray", "seagreen", "royalblue", "Gray"])
norm = colors.BoundaryNorm([0, 1, 2, 3, 4, 5, 6, 7, 8], 8, clip=False)

fig, ax = plt.subplots(figsize=(10, 8))
chm_plot = ax.imshow(plf2[0,:,:],
                     cmap=cmap,
                     norm=norm)
ax.set_title("Pennock Landform Class", fontsize=24, color="#000000")
legend_labels = {"Black": "Background", "firebrick": "CFS", "peru": "DFS", "darkkhaki": "CSH", "slategray": "DSH", "seagreen": "CBS", "royalblue": "DBS","gray": "L"}
patches = [Patch(color=color, label=label)
           for color, label in legend_labels.items()]
ax.legend(handles=patches,
          bbox_to_anchor=(1.35, 1),
          facecolor="white")
ax.set_axis_off()
fig.patch.set_facecolor('#bcd1bc')
plt.show()

png

Example 10: Find Ridges

The code below demonstrates the identification of ridges from a DEM. The output is a categorical raster, where cells coded as 1 represent mapped ridges. You can also post-process the result to thin ridge lines.

wbt.find_ridges(
    elev2, 
    out_fold + "ridge.tif", 
    line_thin=True, 
)
0
rid = rio.open(out_fold + "ridge.tif")
rid2 = rid.read()
plt.rcParams['figure.figsize'] = [10, 8]
plt.imshow(rid2[0,:,:], cmap="gist_gray")
<matplotlib.image.AxesImage at 0x27245d12f40>

png

Example 11: Raster Equal To

WhiteboxTools includes a variety of logical, mathematical, and statistical operations that can be applied to raster data, similar to the math and Raster Calculator functions available in ArcGIS Pro and QGIS. It is also possible to perform mathematical and logical operations using NumPy once raster grids are converted to NumPy arrays.

In the example below, I am finding all grid cells in a land cover data set that are coded to the value 41, which represents deciduous forest. I then plot the result using a categorical symbology, which was explained above.

wbt.equal_to(
    lc,
    41,
    out_fold + "for.tif"
)
0
for1 = rio.open(out_fold + "for.tif")
for2 = for1.read()

cmap = ListedColormap(["whitesmoke", "forestgreen"])
norm = colors.BoundaryNorm([0, 1, 2], 2, clip=False)

fig, ax = plt.subplots(figsize=(10, 8))
chm_plot = ax.imshow(for2[0,:,:],
                     cmap=cmap,
                     norm=norm)
ax.set_title("Deciduous Forest", fontsize=24, color="#000000")
legend_labels = {"whitesmoke":"Not Forest", "forestgreen":" Deciduous Forest"}
patches = [Patch(color=color, label=label)
           for color, label in legend_labels.items()]
ax.legend(handles=patches,
          bbox_to_anchor=(1.35, 1),
          facecolor="gray")
ax.set_axis_off()
fig.patch.set_facecolor('#bcd1bc')
plt.show()

png

Example 12: Raster Greater Than

This is an example of applying the greater than operation to a raster grid. The output will be binary, where 0 represents False and 1 represents True. The incl_equals argument when set to False means greater than while True indicates greater than or equal to.

I have specifically used this function to find all cells in a DEM with an elevation value greater than 600 meters. These cells are coded to 1, and all other cells are coded to 0.

wbt.greater_than(
    elev2,
    600,
    out_fold + "elev_bi.tif",
    incl_equals=False
)
0
ebi1 = rio.open(out_fold + "elev_bi.tif")
ebi2 = ebi1.read()

cmap = ListedColormap(["gainsboro", "orangered"])
norm = colors.BoundaryNorm([0, 1, 2], 2, clip=False)

fig, ax = plt.subplots(figsize=(10, 8))
chm_plot = ax.imshow(ebi2[0,:,:],
                     cmap=cmap,
                     norm=norm)
ax.set_title("Elevation Binary", fontsize=24, color="#000000")
legend_labels = {"gainsboro":"<= 800 m", "forestgreen":"> 800 m"}
patches = [Patch(color=color, label=label)
           for color, label in legend_labels.items()]
ax.legend(handles=patches,
          bbox_to_anchor=(1.35, 1),
          facecolor="gray")
ax.set_axis_off()
fig.patch.set_facecolor('#bcd1bc')
plt.show()

png

Example 13: Extract Raster Values at Points

Raster values can be extracted at point locations. If the out_text parameter is set to True, the result will be provided as text. If it is set to False, the result will be added to the point attribute table.

ele_ext = wbt.extract_raster_values_at_points(
    elev, 
    pnts, 
    out_text=True,
)
print(ele_ext)
0

Example 14: Raster Zonal Stats

Similar to extracting values at point locations, you can also obtain summary statistics relative to zones. This tool will not accept vector polygon zones; instead, the zones must be provided as a categorical raster grid. So, in the example I first convert the watershed boundaries to a raster using the FID as the unique identifier. The base argument is used to align the result to the origin and extent of the DEM layer.

I then use the raster zones to summarize the DEM data. Specifically, I obtain the mean elevation by watershed. The result is written to a raster, where each cell in a watershed is assigned the mean elevation of the watershed. It is also possible to write the output to a table in HTML format.

zones1 = wbt.vector_polygons_to_raster(
    wshed, 
    out_fold + "ws_r.tif", 
    field="FID", 
    nodata=True, 
    cell_size=None, 
    base=elev, 
)
wbt.zonal_statistics(
    elev, 
    features=out_fold + "ws_r.tif", 
    output=out_fold+"ele_mn.tif", 
    stat="mean", 
    out_table=None
)
0
elemn = rio.open(out_fold + "ele_mn.tif")
elemn2 = elemn.read()
plt.rcParams['figure.figsize'] = [10, 8]
plt.imshow(elemn2[0,:,:], cmap="gist_gray", vmin=300, vmax=600)
<matplotlib.image.AxesImage at 0x2724587cdc0>

png

Example 15: LiDAR Class Filter

In these last two examples, I will introduce some tools for working with LiDAR data. There are a wide variety of tools available, so if you are interested in working with and analyzing LiDAR data, I recommend investigating the documentation.

First, I create a copy of the LiDAR data that only includes points classified as ground. Ground points are assigned a code of "2", so all other codes are excluded to filter out just ground returns. Here is what the codes indicate:

  • "0": Never Classified
  • "1": Unclassified
  • "7"/"18": Noise or Outliers
las1 = "C:/Users/amaxwel6/python_ds/data/spatial/CO195.las"
wbt.filter_lidar_classes(
    las1, 
    out_fold + "las_grd.las", 
    exclude_cls='0,1,7,18', 
)
0

Example 16: LiDAR Rasterization

Here, I am generating a digital surface model (DSM) from the point cloud. Next, I created a first return intensity image.

The two surfaces are then visualized using matplotlib after reading the results back in and converting them to NumPy arrays.

wbt.lidar_digital_surface_model(
    las1, 
    output=out_fold+"dsm.tif", 
    resolution=1.0, 
    radius=0.5, 
    minz=None, 
    maxz=None, 
    max_triangle_edge_length=None
)
0
wbt.lidar_idw_interpolation(
    las1, 
    output=out_fold + "las_int.tif", 
    parameter="intensity", 
    returns="first", 
    resolution=1.0, 
    weight=1.0, 
    radius=2.5, 
    exclude_cls='7,18', 
    minz=None, 
    maxz=None, 
)
0
dsm = rio.open(out_fold + "dsm.tif")
dsm2 = dsm.read()
fint = rio.open(out_fold + "las_int.tif")
fint2 = fint.read()
plt.rcParams['figure.figsize'] = [10, 8]
fig, ax = plt.subplots(1, 2)
ax[0].imshow(dsm2[0,:,:], cmap="gist_gray", vmin=620, vmax=760)
ax[0].set_title("DSM", fontsize=24, color="#000000")
ax[0].axis("off")
ax[1].imshow(fint2[0,:,:], cmap="gist_gray", vmin=128, vmax=16200)
ax[1].set_title("Intensity", fontsize=24, color="#000000")
ax[1].axis("off")
fig.patch.set_facecolor('#bcd1bc')

png

Using only the ground returns, which were filtered out above, I generate another DSM. Since only ground returns are used, this is actually a digital terrain model (DTM). Next, I create a hillshade from the DTM.

In order to created a normalized digital surface model (nDSM), I convert the DSM and DTM to NumPy arrays and then subtract them to obtain a measure of height of features above the ground.

Finally, I plot all three results.

wbt.lidar_digital_surface_model(
    out_fold + "las_grd.las", 
    output=out_fold+"dem.tif", 
    resolution=1.0, 
    radius=0.5, 
    minz=None, 
    maxz=None, 
    max_triangle_edge_length=None
)
0
wbt.multidirectional_hillshade(
    out_fold + "dem.tif", 
    out_fold + "lid_hs.tif", 
    altitude=45.0, 
    zfactor=None, 
    full_mode=False, 
)
0
dem = rio.open(out_fold + "dem.tif")
dem2 = dem.read()
lhs = rio.open(out_fold + "lid_hs.tif")
lhs2 = lhs.read()

ndsm = dsm2[0,:,:] - dem2[0,:,:] 

plt.rcParams['figure.figsize'] = [18, 12]
fig, ax = plt.subplots(1, 3)
ax[0].imshow(dem2[0,:,:], cmap="YlOrBr", vmin=620, vmax=760)
ax[0].set_title("DEM", fontsize=24, color="#000000")
ax[0].axis("off")
ax[1].imshow(lhs2[0,:,:], cmap="gist_gray", vmin=0, vmax=30000)
ax[1].set_title("Hillshade", fontsize=24, color="#000000")
ax[1].axis("off")
fig.patch.set_facecolor('#bcd1bc')
ax[2].imshow(ndsm, cmap="YlGn", vmin=0, vmax=35)
ax[2].set_title("nDSM", fontsize=24, color="#000000")
ax[2].axis("off")
fig.patch.set_facecolor('#bcd1bc')

png

Again, WhiteboxTools contains more than 440 tools for working with and analyzing geospatial data. As you can see from the provided examples, the syntax is similar for each tool, and the package is generally easy to use. If you are interested in investigating WhiteboxTools further, please review the documentation.

Concluding Remarks

In this module, I demonstrated the use of GeoPandas, Rasterio, and matplotlib for reading, writing, and visualizing geospatial data using Python. I then provides some examples of spatial analysis operations that can be performed using WhiteboxTools. If you have specific interests or needs, I think you will find that my examples, the provided library/module documentation, and online help and forums will be adequate to get you started.

Lastly, there is still active development of geospatial libraries/modules for use in Python. So, you may find new tools to apply to your work. With an understanding of basic Python, GIS, and data science, you should be able to use examples and documentation to apply techniques to your own analyses.