Open-Source Spatial Analytics (R)

Introduction

Welcome to Open-Source Spatial Analytics. The materials presented here were created by West Virginia View with funding from AmericaView and the United States Geological Survey. The goal of this course is to help geospatial and earth scientists get started using the R data science environment and coding language. R is a powerful tool for preparing, visualizing, and analyzing a wide variety of data types, including geospatial data. My goal in this first section is to help you get up and running to work through the modules that constitute this course.

Who is this course for?

This course was designed for individuals who have prior experience working with geospatial data. It is assumed that you have prior knowledge of geospatial data types (e.g., raster grids and vector layers), map projections, map design and geospatial data visualization, and spatial analysis. If you do not have any prior knowledge of geospatial data analysis and GIS, we recommend taking our Introduction to GIScience course first. It is not assumed that you have any prior knowledge of R or of coding in general. The first few modules will introduce the R coding language from scratch. If you are hoping to learn how to code, learning R can be a good stepping stone for learning other languages and environments, such as Python.

What resources are required?

R is open-source and free. It can be installed on all major operating systems (Windows, macOS, and Linux distributions). We will also use the RStudio integrated development environment (IDE), which is free to install on personal computers. In short, you do not need to purchase any software, data, or books, but you do need access to a computer. We recommend at least 4 GB of RAM and 10 GB of storage space. All data used in the examples or required to complete the assignments have been provided, except when you are asked to use your own data or find data. We have also provided the R Markdown files used to create all the modules. All modules should work in all major web browsers.

What topics will be covered?

The material for this course is organized into modules, and all modules have been organized into three overarching sections. Here is a brief overview of the course content.

  • Part I: Using R for Data Analysis
    • Setting Up R: Install R and RStudio on your machine and learn the basic layout of RStudio
    • R Language Part I: basics of the R language including variables, data types, data models, and reading in and writing out files
    • Data Manipulation: prepare, wrangle, subset, query, and clean data for analysis using base R and the dplyr package
    • Strings and Factors: manipulate and work with strings and factors using stringr and forcats
    • R Language Part II: further exploration of the R language including functions, loops, and flow control
    • Data Summarization and Statistical Tests: summarize data using base R and the tidyverse and perform common statistical tests, such as assessment of correlation, t-tests, and ANOVA
    • R Markdown: create documents, reports, webpages, and PDFs using R Markdown
    • ggplot2 Part I: learn the basics of data visualization and graphing with ggplot2 with a focus on aesthetic mappings and different types of univariate and multivariate graphs
    • ggplot2 Part II: clean up and edit ggplot2 graphs for eventual export and publication
    • Tables with gt: create and design tables using gt
  • Part II: Spatial Analytics in R
    • Working with Spatial Data: read in and explore spatial data using sp, sf, rgdal, and raster
    • Maps with tmap: create map layouts and thematic maps using tmap
    • Additional Map Examples: make thematic maps using mapsf and cartogram
    • Interactive Maps with Leaflet: create interactive web maps using Leaflet in R
    • Vector-Based Spatial Analysis: analyze vector geospatial data using sf
    • Raster-Based Spatial Analysis: analyze raster geospatial data using raster
    • Raster-Based Spatial Analysis (terra): introduction to the raster package's successor: terra
    • LiDAR and Imagery: use the lidR package to work with and analyze LiDAR point clouds and perform common image analysis operations on remotely sensed, multispectral data sets
  • Part III: Machine Learning and Spatial Predictive Modeling
    • Machine Learning Background: overview of machine learning and spatial predictive modeling as a video series
    • Random Forests: use random forests to make spatial probabilistic models and evaluate results
    • Machine Learning with caret: use caret to implement a wide variety of machine learning algorithms, tune hyperparameters, and assess models
    • Machine Learning with tidymodels: introduction to caret’s successor: tidymodels

Do I need to start at the beginning?

If you have some prior experience with R, you do not need to work through all the material. If you already have a good grasp of R and the tidyverse packages, feel free to skip ahead to the spatial topics (Part II), although we would encourage you to at least skim the prior sections. If you are starting from scratch, we recommend working through the material in the order provided.

Now what?

To get started, please progress to the next module where you will be guided through installing R and the RStudio IDE. We hope you enjoy the course.