Tables with gt

Objectives

  1. Generate well-formatted and presentable tables using the gt package
  2. Understand the philosophy used by gt
  3. Use the gt documentation

Overview

As we explored in the previous modules, ggplot2 uses the grammar of graphics to allow for defining and generating a wide variety of plots. Many functions are available to customize and improve the output for inclusion in presentations, papers, or reports. Also, ggplot2 follows the tidyverse philosophy. Recently, the gt package has been released to allow for the generation of clean, presentation-ready tables. Similar to ggplot2, tables are broken into their constituent components, which can then be defined and modified. Resulting tables can be saved to HTML output for use on webpages.

The following table components are defined:

  • Table header
    • Title: main plot title
    • Subtitle: plot subtitle
  • Stub head: label for stub/row names
  • Column labels
    • Spanner column labels: allow for grouping columns under a spanner or heading
    • Column labels: column names
  • Stub
    • Row group label: allow for grouping rows
    • Row label: first column as label for each row
    • Row summary label: label for a row representing summary data aggregated from other rows
  • Table body
    • Cell: data presented in table
  • Table footer
    • Footnotes: notes tied to specific table elements or the entire table
    • Source notes: provide or cite data sources

Documentation for gt can be found here and here. There are other packages available for generating tables in R that we will not discuss here including knitr, kableExtra, formattable, DT, pander, huxtabler, reactable, flextable,ftextra pixiedust, tangram, ztable, condformat, stargazer, and xtable.

In this module, I will provide an overview of the gt package by stepping through an example. We will work with the high_plains_data.csv file. The elevation (“elev”), temperature (“temp”), and precipitation (“precip2”) data were extracted from raster grids provided by the PRISM Climate Group at Oregon State University. The elevation data are provided in feet, the temperature data represent 30-year annual normal mean temperature in degrees Celsius, and the precipitation data represent 30-year annual normal precipitation in inches. The original raster grids have a resolution of 4-by-4 kilometers and can be obtained here. I also summarized percent forest by county (“per_for”) from the 2011 National Land Cover Database (NLCD). NLCD data can be obtained from the Multi-Resolution Land Characteristics Consortium (MRLC). The link at the bottom of the page provides the example data and R Markdown file used to generate this module.

Data Preparation

First, you will need to install gt using either install.packages(“gt”) or devtools::install_github(“rstudio/gt”). You can also use the Install Packages tool in RStudio.

Our goal will be to generate a table of state-level summary info. So, we will begin with some data manipulation using dplyr. First, I read in the data table using read.csv(). Next, perform some data manipulation as follows.

  1. Calculate the county-level median values for total population (“TOTPOP10”), percent forest cover (“per_for”), elevation (“elev”), and temperature (“temp”).
  2. Calculate the sum of all county populations in each state to obtain a total state population.
  3. Join the two summaries into a new table using a left join and the common “STATE_NAME” column.
library(dplyr)
library(gt)
hp <- read.csv("D:/ossa/gt_tables/high_plains_data.csv", header=TRUE, sep=",", stringsAsFactors = TRUE)
hp1 <- hp %>% group_by(STATE_NAME) %>%
  summarize_at(c("TOTPOP10", "per_for", "elev", "temp", "precip2"), median)
hp2 <- hp %>% group_by(STATE_NAME) %>%
   summarize(tot_pop = sum(TOTPOP10))
hp3 <- left_join(hp2, hp1, by="STATE_NAME")

Create Table

To create a basic table, I first pipe the summarized data frame into the gt() function. This results in a table that includes all columns in the table and the column names.

hp3 %>% gt() 
STATE_NAME tot_pop TOTPOP10 per_for elev temp precip2
Colorado 5029196 15083.5 22.7831594 2229.0180 7.013425 4.703404
Kansas 2853118 7053.0 1.4923449 458.9333 12.683535 7.486343
Montana 989415 7198.0 5.9975489 1223.4162 5.966852 4.042198
Nebraska 1826341 6274.0 0.8298879 618.9475 9.787200 6.689845
North Dakota 672591 4153.0 0.3632199 551.1226 5.167969 4.659844
South Dakota 814180 5369.5 0.3260004 522.9511 7.795871 5.703697
Utah 2763885 20802.0 20.5519685 1901.3015 8.874297 3.912545
Wyoming 563626 15885.0 4.6158960 1826.9601 6.300659 3.807544

Next, I would like to edit the column names so that they are more interpretable and also include the units of measurement. This is accomplished using cols_label(). I also use the html() function to specify some line breaks and the degrees symbol.

hp3 %>% gt() %>%
  tab_header(
    title="High Plains Summary") %>%
  cols_label(
    STATE_NAME = "State",
    tot_pop = "State Polulation (2010)",
    TOTPOP10 = "Population (2010)",
    per_for = "Percent Forest",
    elev = html("Elevation<br>(Feet)"),
    temp = html("Temperature<br>&deg;C"),
    precip2 = html("Precipitation<br>(Inches)")
  )
High Plains Summary
State State Polulation (2010) Population (2010) Percent Forest Elevation
(Feet)
Temperature
°C
Precipitation
(Inches)
Colorado 5029196 15083.5 22.7831594 2229.0180 7.013425 4.703404
Kansas 2853118 7053.0 1.4923449 458.9333 12.683535 7.486343
Montana 989415 7198.0 5.9975489 1223.4162 5.966852 4.042198
Nebraska 1826341 6274.0 0.8298879 618.9475 9.787200 6.689845
North Dakota 672591 4153.0 0.3632199 551.1226 5.167969 4.659844
South Dakota 814180 5369.5 0.3260004 522.9511 7.795871 5.703697
Utah 2763885 20802.0 20.5519685 1901.3015 8.874297 3.912545
Wyoming 563626 15885.0 4.6158960 1826.9601 6.300659 3.807544

The first column of data represents the total state population, while the other columns are median county values. So, I would like to add a spanner over the last set of columns to denote this. This is accomplished using tab_spanner(), which requires providing the desired label text and the list of columns that the spanner will be placed above.

hp3 %>% gt() %>%
  tab_header(
    title="High Plains Summary") %>%
  cols_label(
    STATE_NAME = "State",
    tot_pop = "State Polulation (2010)",
    TOTPOP10 = "Population (2010)",
    per_for = "Percent Forest",
    elev = html("Elevation<br>(Feet)"),
    temp = html("Temperature<br>&deg;C"),
    precip2 = html("Precipitation<br>(Inches)")
  ) %>%
  tab_spanner(
    label = "County Median Values",
    columns = c("TOTPOP10", "per_for", "elev", "temp", "precip2")
  )
High Plains Summary
State State Polulation (2010) County Median Values
Population (2010) Percent Forest Elevation
(Feet)
Temperature
°C
Precipitation
(Inches)
Colorado 5029196 15083.5 22.7831594 2229.0180 7.013425 4.703404
Kansas 2853118 7053.0 1.4923449 458.9333 12.683535 7.486343
Montana 989415 7198.0 5.9975489 1223.4162 5.966852 4.042198
Nebraska 1826341 6274.0 0.8298879 618.9475 9.787200 6.689845
North Dakota 672591 4153.0 0.3632199 551.1226 5.167969 4.659844
South Dakota 814180 5369.5 0.3260004 522.9511 7.795871 5.703697
Utah 2763885 20802.0 20.5519685 1901.3015 8.874297 3.912545
Wyoming 563626 15885.0 4.6158960 1826.9601 6.300659 3.807544

Now, I would like to format all the numbers. fmt_percent() is used to add the percent sign and change the number of decimal places for the percent forest data. fmt_number() is used to change the number of decimal places and/or add comma separators. Check out the package documentation for more formatting functions, including options for dates, time, integers, currency, and missing values.

hp3 %>% gt() %>%
  tab_header(
    title="High Plains Summary") %>%
  cols_label(
    STATE_NAME = "State",
    tot_pop = "State Polulation (2010)",
    TOTPOP10 = "Population (2010)",
    per_for = "Percent Forest",
    elev = html("Elevation<br>(Feet)"),
    temp = html("Temperature<br>&deg;C"),
    precip2 = html("Precipitation<br>(Inches)")
  ) %>%
  tab_spanner(
    label = "County Median Values",
    columns = c("TOTPOP10", "per_for", "elev", "temp", "precip2")
  ) %>%
  fmt_percent(
    columns = per_for, decimals = 1, scale_values = FALSE)%>%
  fmt_number(
    columns = c("tot_pop", "TOTPOP10", "elev"),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  fmt_number(
    columns = c("temp", "precip2"),
    decimals = 1,
  ) 
High Plains Summary
State State Polulation (2010) County Median Values
Population (2010) Percent Forest Elevation
(Feet)
Temperature
°C
Precipitation
(Inches)
Colorado 5,029,196 15,084 22.8% 2,229 7.0 4.7
Kansas 2,853,118 7,053 1.5% 459 12.7 7.5
Montana 989,415 7,198 6.0% 1,223 6.0 4.0
Nebraska 1,826,341 6,274 0.8% 619 9.8 6.7
North Dakota 672,591 4,153 0.4% 551 5.2 4.7
South Dakota 814,180 5,370 0.3% 523 7.8 5.7
Utah 2,763,885 20,802 20.6% 1,901 8.9 3.9
Wyoming 563,626 15,885 4.6% 1,827 6.3 3.8

I then use tab_source_note() and html() to add notes associated with the data sources used to generate the table.

hp3 %>% gt() %>%
  tab_header(
    title="High Plains Summary") %>%
  cols_label(
    STATE_NAME = "State",
    tot_pop = "State Polulation (2010)",
    TOTPOP10 = "Population (2010)",
    per_for = "Percent Forest",
    elev = html("Elevation<br>(Feet)"),
    temp = html("Temperature<br>&deg;C"),
    precip2 = html("Precipitation<br>(Inches)")
  ) %>%
  tab_spanner(
    label = "County Median Values",
    columns = c("TOTPOP10", "per_for", "elev", "temp", "precip2")
  ) %>%
  fmt_percent(
    columns = per_for, decimals = 1, scale_values = FALSE)%>%
  fmt_number(
    columns = c("tot_pop", "TOTPOP10", "elev"),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  fmt_number(
    columns = c("temp", "precip2"),
    decimals = 1,
  ) %>%
  tab_source_note(
    source_note = html("Population Data obtaine from 2010 US Census.<br>Forest cover was derived from National Land Cover Database (NLCD)<br>Temperature and elevation data were derived from PRISM.")
  )
High Plains Summary
State State Polulation (2010) County Median Values
Population (2010) Percent Forest Elevation
(Feet)
Temperature
°C
Precipitation
(Inches)
Colorado 5,029,196 15,084 22.8% 2,229 7.0 4.7
Kansas 2,853,118 7,053 1.5% 459 12.7 7.5
Montana 989,415 7,198 6.0% 1,223 6.0 4.0
Nebraska 1,826,341 6,274 0.8% 619 9.8 6.7
North Dakota 672,591 4,153 0.4% 551 5.2 4.7
South Dakota 814,180 5,370 0.3% 523 7.8 5.7
Utah 2,763,885 20,802 20.6% 1,901 8.9 3.9
Wyoming 563,626 15,885 4.6% 1,827 6.3 3.8
Population Data obtaine from 2010 US Census.
Forest cover was derived from National Land Cover Database (NLCD)
Temperature and elevation data were derived from PRISM.

Finally, I perform some formatting. tab_style() can be used to edit styles for specific element by specifying locations. gt includes a variety of helper functions that are explained in the documentation for specifying certain elements to style or format (for example, cells_title(), cells_column_labels(), cells_body(), cells_source_notes(), and cells_column_spanners()). Different style options can be accessed using cell_text(), cell_fill(), and cell_borders().

  • cell_text(): color, font, size, align, v_align, style, weight, stretch, decorate, transform, whitespace, and indent
  • cell_fill(): color and alpha
  • cell_borders(): color, style, and weight

tab_options() provides a variety of options for manipulating the table or styles applied to specific components, subcomponents, or elements. Here, I am using available options to change all the line colors in the table. A full list of options with descriptions is provided here.

hp3 %>% gt() %>%
  tab_header(
    title="High Plains Summary") %>%
  cols_label(
    STATE_NAME = "State",
    tot_pop = "State Polulation (2010)",
    TOTPOP10 = "Population (2010)",
    per_for = "Percent Forest",
    elev = html("Elevation<br>(Feet)"),
    temp = html("Temperature<br>&deg;C"),
    precip2 = html("Precipitation<br>(Inches)")
  ) %>%
  tab_spanner(
    label = "County Median Values",
    columns = c("TOTPOP10", "per_for", "elev", "temp", "precip2")
  ) %>%
  fmt_percent(
    columns = per_for, decimals = 1, scale_values = FALSE)%>%
  fmt_number(
    columns = c("tot_pop", "TOTPOP10", "elev"),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  fmt_number(
    columns = c("temp", "precip2"),
    decimals = 1,
  ) %>%
  tab_source_note(
    source_note = html("Population Data obtaine from 2010 US Census.<br>
    Forest cover was derived from National Land Cover Database (NLCD)<br>
    Temperature and elevation data were derived from PRISM.")
  ) %>%
  tab_style(
    style = list(
      cell_text(align="center", weight="bold")
    ),
    locations=cells_column_labels()
  )%>%
  tab_style(
    style = list(
      cell_text(align="center", weight="bold", color="#812F33")
    ),
    locations = cells_title()
  )%>%
  tab_style(
    style = list(
      cell_text(align="center", style="italic", color="#812F33")
    ),
    locations = cells_column_spanners()
  )%>%
  tab_style(
    style = list(
      cell_text(align="center")
    ),
    locations = cells_body()
  )%>%
  tab_options(
    table_body.hlines.color="#812F33",
    table.border.top.color="#812F33",
    table.border.bottom.color = "#812F33",
    table_body.border.bottom.color =  "#812F33",
    heading.border.bottom.color = "#812F33",
    column_labels.border.top.color = "#812F33",
    column_labels.border.bottom.color = "#812F33",
  )
High Plains Summary
State State Polulation (2010) County Median Values
Population (2010) Percent Forest Elevation
(Feet)
Temperature
°C
Precipitation
(Inches)
Colorado 5,029,196 15,084 22.8% 2,229 7.0 4.7
Kansas 2,853,118 7,053 1.5% 459 12.7 7.5
Montana 989,415 7,198 6.0% 1,223 6.0 4.0
Nebraska 1,826,341 6,274 0.8% 619 9.8 6.7
North Dakota 672,591 4,153 0.4% 551 5.2 4.7
South Dakota 814,180 5,370 0.3% 523 7.8 5.7
Utah 2,763,885 20,802 20.6% 1,901 8.9 3.9
Wyoming 563,626 15,885 4.6% 1,827 6.3 3.8
Population Data obtaine from 2010 US Census.
Forest cover was derived from National Land Cover Database (NLCD)
Temperature and elevation data were derived from PRISM.

Export

I now have a decent looking table. gt provides several functions for exporting the table for inclusion in a webpage or a shiny app. The table can also be saved to a static image. Some export options may require installing additional packages. For the HTML output, a single HTML file will be generated that includes all of the CSS-based styling and HTML table content using appropriate tags.

mytable <- hp3 %>% gt() %>%
  tab_header(
    title="High Plains Summary") %>%
  cols_label(
    STATE_NAME = "State",
    tot_pop = "State Polulation (2010)",
    TOTPOP10 = "Population (2010)",
    per_for = "Percent Forest",
    elev = html("Elevation<br>(Feet)"),
    temp = html("Temperature<br>&deg;C"),
    precip2 = html("Precipitation<br>(Inches)")
  ) %>%
  tab_spanner(
    label = "County Median Values",
    columns = c("TOTPOP10", "per_for", "elev", "temp", "precip2")
  ) %>%
  fmt_percent(
    columns = per_for, decimals = 1, scale_values = FALSE)%>%
  fmt_number(
    columns = c("tot_pop", "TOTPOP10", "elev"),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  fmt_number(
    columns = c("temp", "precip2"),
    decimals = 1,
  ) %>%
  tab_source_note(
    source_note = html("Population Data obtaine from 2010 US Census.<br>
    Forest cover was derived from National Land Cover Database (NLCD)<br>
    Temperature and elevation data were derived from PRISM.")
  ) %>%
  tab_style(
    style = list(
      cell_text(align="center", weight="bold")
    ),
    locations=cells_column_labels()
  )%>%
  tab_style(
    style = list(
      cell_text(align="center", weight="bold", color="#812F33")
    ),
    locations = cells_title()
  )%>%
  tab_style(
    style = list(
      cell_text(align="center", style="italic", color="#812F33")
    ),
    locations = cells_column_spanners()
  )%>%
  tab_style(
    style = list(
      cell_text(align="center")
    ),
    locations = cells_body()
  )%>%
  tab_options(
    table_body.hlines.color="#812F33",
    table.border.top.color="#812F33",
    table.border.bottom.color = "#812F33",
    table_body.border.bottom.color =  "#812F33",
    heading.border.bottom.color = "#812F33",
    column_labels.border.top.color = "#812F33",
    column_labels.border.bottom.color = "#812F33",
  )
#Save a PNG raster image
gtsave(mytable, "D:/ossa/table.png")
NULL
#Save as PDF file
gtsave(mytable, "D:/ossa/table.pdf")
NULL
#Save as HTML
gtsave(mytable, "D:/ossa/table.html")

Concluding Remarks

If you need to produce table output to include in presentations, reports, or manuscripts, gt is a great resource. Since this package is relatively new at the time of this writing, it may change substantially in the future. ggplot2 and gt collectively allow for the flexible generation of figures and tables and are a great resource worth experimenting with andn learning.