22 Producing Tables (gt)

22.1 Topics Covered

Preparing data and generating a simple table
Adding titles and column labels
Aligning text
Formatting cell values
Adding spanners
Using tab_options()
Data colorization
Adding footnotes
Rendering tables

22.2 Introduction

22.2.1 About gt

As with figures, you may be interested in designing publication- or presentation-quality tables in R. There are a few packages available for generating tables in R including kableExtra and DT. Here, we will focus on gt. We have chosen to focus on gt since we feel that it is intuitive and well designed. It also uses syntax similar to the tidyverse.

Below, we load the tidyverse, gt, and scales. The scales package supplies the col_numeric() color-mapping function used later in the chapter to colorize cells.

library(tidyverse)
library(gt)
library(scales)

22.3 Data Preparation

Our goal is to generate a table that summarizes some key information for each sub-region of the United States. We will work with the us_county_data.csv data used in the tidyverse chapter (Chapter 18). Since these data represent county-level measures, we need to aggregate the data to sub-regions and summarize the data of interest. This is accomplished using dplyr. For each sub-region we calculate the total count of counties; area in square kilometers; total population; median of the county-level median income (US Dollars); median of the county-level mean elevation (m); median of the county-level mean annual total precipitation (mm); median of the county-level mean annual temperature (°C); and percent of forest, developed, and crop cover, estimated as area-weighted averages of the county-level percentages. We also calculate population density using the total population and area estimates and move the new column so that it occurs directly after the population data. Lastly, we use forcats to re-code the sub-region factor levels to non-abbreviated names. The data are then ready for generating a table.

cntyPth <- "gslrData/chpt22/data/"

srD <- read_csv(str_glue("{cntyPth}us_county_data.csv")) |> 
  # Convert all character columns to factors
  mutate(across(where(is.character), as.factor)) |>
  group_by(SUB_REGION) |>
  summarize(nCnty = n(),
            # Convert square miles to square kilometers
            area = sum(SQMI)*2.58999,
            pop = sum(POPULATION),
            mInc = median(med_income, na.rm=TRUE),
            elev = median(dem),
            precip = median(precip),
            temp = median(tempmn),
            # Area-weighted percent cover for each land cover type
            pFor = (sum((per_for/100)*SQMI)/(sum(SQMI)))*100,
            pDev = (sum((per_dev/100)*SQMI)/(sum(SQMI)))*100,
            pCrop = (sum((per_crop/100)*SQMI)/(sum(SQMI)))*100
  ) |> 
  # Population density in people per square kilometer
  mutate(popDen = pop/area) |>
  relocate(popDen, .after=pop) |>
  ungroup()
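To see why the cover percentages are weighted by county area rather than simply averaged, consider a minimal sketch with two hypothetical counties. The toy data frame and its values are invented for illustration; only the formula mirrors the one used for pFor, pDev, and pCrop above.

```r
# Hypothetical two-county sub-region (values invented for illustration)
toy <- data.frame(
  SQMI    = c(100, 300),  # county areas in square miles
  per_for = c(80, 20)     # percent forest cover in each county
)

# Unweighted mean treats both counties equally
simpleMean <- mean(toy$per_for)

# Area-weighted percent: forested area divided by total area
weightedPct <- sum((toy$per_for / 100) * toy$SQMI) / sum(toy$SQMI) * 100

simpleMean   # 50
weightedPct  # 35
```

The larger, sparsely forested county pulls the sub-region estimate down to 35%, whereas the simple mean of 50% would overstate forest cover.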


srD$SUB_REGION <- fct_recode(srD$SUB_REGION, 
                             "East North Central"="E N Cen", 
                             "East South Central"="E S Cen",
                             "Mid-Atlantic"="Mid Atl", 
                             "Mountain"="Mtn",     
                             "New England"="N Eng",
                             "Pacific"="Pacific",
                             "South Atlantic"="S Atl",
                             "West North Central"="W N Cen", 
                             "West South Central"="W S Cen")

22.4 Make Table

22.4.1 Base Table

A gt table object is created using the gt() function. Simply passing the data to this function generates a table. We are also setting the font size using tab_options() in order to decrease the size of the entire table. We make further use of tab_options() later in the example. Although the table is functional, it can be greatly improved. In the remainder of this chapter, we work to improve this table as a means to learn more about gt.

srT <- srD |> gt() |>
  tab_options(table.font.size=11)
srT

SUB_REGION	nCnty	area	pop	popDen	mInc	elev	precip	temp	pFor	pDev	pCrop
East North Central	437	642130.1	47368533	73.76782	55310.0	243.0933	1000.8536	10.331616	27.23315	11.099251	39.515841
East South Central	364	470929.0	19402234	41.19992	44335.5	187.2671	1412.2993	14.791583	49.12894	8.323377	9.821077
Mid-Atlantic	150	262408.4	42492943	161.93438	59561.5	287.9525	1182.6845	9.242569	55.20378	13.734407	8.504453
Mountain	281	2236621.5	24919150	11.14142	53790.0	1671.5016	413.3704	7.624000	18.24613	1.676929	5.871804
New England	67	169530.3	15116205	89.16522	67365.0	147.0000	1253.2409	7.706078	65.94515	10.307294	1.097819
Pacific	133	835063.4	51480760	61.64892	59427.0	641.9725	724.8609	10.462643	30.12613	5.567056	9.139728
South Atlantic	584	702759.3	66026391	93.95306	49896.5	115.2225	1246.2440	15.977813	41.95959	12.385514	9.246431
West North Central	618	1341099.0	21616921	16.11881	55388.0	383.5105	777.3056	9.586250	10.47021	4.712193	41.610768
West South Central	470	1121618.2	40774139	36.35296	49180.0	202.6054	1043.6455	17.902725	17.55864	6.150014	14.567517

22.4.2 Title and Column Labels

First, we add a title using tab_header() and change the column labels using cols_label(). Each step overwrites srT with the updated gt table object. Note the use of the html() function to format the column labels. This allows us to use HTML tags, such as <br> for line breaks and <sup> for superscripts, and to specify special symbols, such as the degree sign (&deg;). With this addition, the table now has much more meaningful column labels.

srT <- srT |>
  tab_header(
    title="Sub-Region Summary") |>
  cols_label(
    SUB_REGION = "Sub-Region",
    nCnty = "# Counties",
    pop = "Population",
    popDen = html("Population<br>Density<br>(per km<sup>2</sup>)"),
    mInc = html("Median<br>Income"),
    area = html("Area (km<sup>2</sup>)"),
    elev = html("Elevation<br>(meters)"),
    precip = html("Total Annual<br>Precip (mm)"),
    temp = html("Mean Annual<br>Temp. &deg;C"),
    pFor = "% Forest",
    pDev = "% Developed",
    pCrop = "% Cropland")
srT

Sub-Region Summary
Sub-Region	# Counties	Area (km²)	Population	Population Density (per km²)	Median Income	Elevation (meters)	Total Annual Precip (mm)	Mean Annual Temp. °C	% Forest	% Developed	% Cropland
East North Central	437	642130.1	47368533	73.76782	55310.0	243.0933	1000.8536	10.331616	27.23315	11.099251	39.515841
East South Central	364	470929.0	19402234	41.19992	44335.5	187.2671	1412.2993	14.791583	49.12894	8.323377	9.821077
Mid-Atlantic	150	262408.4	42492943	161.93438	59561.5	287.9525	1182.6845	9.242569	55.20378	13.734407	8.504453
Mountain	281	2236621.5	24919150	11.14142	53790.0	1671.5016	413.3704	7.624000	18.24613	1.676929	5.871804
New England	67	169530.3	15116205	89.16522	67365.0	147.0000	1253.2409	7.706078	65.94515	10.307294	1.097819
Pacific	133	835063.4	51480760	61.64892	59427.0	641.9725	724.8609	10.462643	30.12613	5.567056	9.139728
South Atlantic	584	702759.3	66026391	93.95306	49896.5	115.2225	1246.2440	15.977813	41.95959	12.385514	9.246431
West North Central	618	1341099.0	21616921	16.11881	55388.0	383.5105	777.3056	9.586250	10.47021	4.712193	41.610768
West South Central	470	1121618.2	40774139	36.35296	49180.0	202.6054	1043.6455	17.902725	17.55864	6.150014	14.567517

22.4.3 Text Alignment

The tab_style() function allows for styling different table elements. Arguments to the style parameter define style changes while arguments to the locations parameter define what part(s) of the table to manipulate. Below, we are centering the table title, column labels, and cell data. We are also converting the title and column name text to bold face.

srT <- srT |>
  tab_style(
    style = list(
      cell_text(align="center")
      ),
    locations = list(cells_title(), 
                     cells_column_labels(),
                     cells_body())
    ) |>
  tab_style(
    style = list(
      cell_text(weight="bold")
      ),
    locations = list(cells_title(), 
                     cells_column_labels())
    )

srT

Sub-Region Summary
Sub-Region	# Counties	Area (km²)	Population	Population Density (per km²)	Median Income	Elevation (meters)	Total Annual Precip (mm)	Mean Annual Temp. °C	% Forest	% Developed	% Cropland
East North Central	437	642130.1	47368533	73.76782	55310.0	243.0933	1000.8536	10.331616	27.23315	11.099251	39.515841
East South Central	364	470929.0	19402234	41.19992	44335.5	187.2671	1412.2993	14.791583	49.12894	8.323377	9.821077
Mid-Atlantic	150	262408.4	42492943	161.93438	59561.5	287.9525	1182.6845	9.242569	55.20378	13.734407	8.504453
Mountain	281	2236621.5	24919150	11.14142	53790.0	1671.5016	413.3704	7.624000	18.24613	1.676929	5.871804
New England	67	169530.3	15116205	89.16522	67365.0	147.0000	1253.2409	7.706078	65.94515	10.307294	1.097819
Pacific	133	835063.4	51480760	61.64892	59427.0	641.9725	724.8609	10.462643	30.12613	5.567056	9.139728
South Atlantic	584	702759.3	66026391	93.95306	49896.5	115.2225	1246.2440	15.977813	41.95959	12.385514	9.246431
West North Central	618	1341099.0	21616921	16.11881	55388.0	383.5105	777.3056	9.586250	10.47021	4.712193	41.610768
West South Central	470	1121618.2	40774139	36.35296	49180.0	202.6054	1043.6455	17.902725	17.55864	6.150014	14.567517
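The locations helpers can also target a subset of cells. As a minimal sketch, assuming a small hypothetical data frame rather than srD, the code below shades and bolds only the population density cells that exceed 100 people per square kilometer. The object names, threshold, and fill color are illustrative choices.

```r
library(gt)

# Hypothetical miniature of the summary table (values rounded from the output above)
toy <- data.frame(
  region = c("Mid-Atlantic", "Mountain", "New England"),
  popDen = c(161.9, 11.1, 89.2)
)

toyT <- toy |>
  gt() |>
  # Shade and bold only the popDen cells in rows where density exceeds 100
  tab_style(
    style = list(
      cell_fill(color = "#FDE0DD"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(columns = popDen, rows = popDen > 100)
  )

toyT
```

Here, the rows argument of cells_body() accepts a logical expression evaluated against the table data, so only the Mid-Atlantic row is styled.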

22.4.4 Number Formatting

We can now format the data. gt includes a variety of functions for formatting data. Please see its associated documentation for a full list of available functions. In our example, we are using fmt_percent() to format the percent forest, developed, and crop columns; fmt_number() to format the population, population density, area, elevation, temperature, and precipitation data; and fmt_currency() for the median income data. Because the cover percentages are already on a 0 to 100 scale, we set scale_values = FALSE so that fmt_percent() does not multiply them by 100. Note that all of these functions allow for specifying the number of decimal places. Other formatting functions available include fmt_integer(), fmt_scientific(), fmt_engineering(), fmt_fraction(), fmt_roman(), fmt_spelled_num(), fmt_date(), fmt_time(), fmt_datetime(), fmt_bins(), fmt_markdown(), fmt_url(), and fmt_icon(). It is possible to apply multiple formatting functions to the same column. You can also define custom formatting using fmt().

srT <- srT |>
  fmt_percent(
    columns = c("pFor", 
                "pDev", 
                "pCrop"), 
    decimals = 1, 
    scale_values = FALSE) |>
  fmt_number(
    columns = c("pop", 
                "area", 
                "elev", 
                "precip"),
    decimals = 0,
    use_seps = TRUE
  ) |>
  fmt_number(
    columns = c("popDen", 
                "temp"),
    decimals = 1,
    use_seps = TRUE
  ) |>
  fmt_currency(
    columns = c("mInc"),
    decimals=0
    )

srT

Sub-Region Summary
Sub-Region	# Counties	Area (km²)	Population	Population Density (per km²)	Median Income	Elevation (meters)	Total Annual Precip (mm)	Mean Annual Temp. °C	% Forest	% Developed	% Cropland
East North Central	437	642,130	47,368,533	73.8	$55,310	243	1,001	10.3	27.2%	11.1%	39.5%
East South Central	364	470,929	19,402,234	41.2	$44,336	187	1,412	14.8	49.1%	8.3%	9.8%
Mid-Atlantic	150	262,408	42,492,943	161.9	$59,562	288	1,183	9.2	55.2%	13.7%	8.5%
Mountain	281	2,236,621	24,919,150	11.1	$53,790	1,672	413	7.6	18.2%	1.7%	5.9%
New England	67	169,530	15,116,205	89.2	$67,365	147	1,253	7.7	65.9%	10.3%	1.1%
Pacific	133	835,063	51,480,760	61.6	$59,427	642	725	10.5	30.1%	5.6%	9.1%
South Atlantic	584	702,759	66,026,391	94.0	$49,897	115	1,246	16.0	42.0%	12.4%	9.2%
West North Central	618	1,341,099	21,616,921	16.1	$55,388	384	777	9.6	10.5%	4.7%	41.6%
West South Central	470	1,121,618	40,774,139	36.4	$49,180	203	1,044	17.9	17.6%	6.2%	14.6%
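The effect of the scale_values argument is easiest to see when the same quantity is stored two ways. The following is a minimal sketch using a hypothetical one-row data frame; the column names are invented for illustration.

```r
library(gt)

# Hypothetical values: the same share stored as a proportion and as a percentage
toy <- data.frame(asProportion = 0.272, asPercent = 27.2)

toyT <- toy |>
  gt() |>
  # Default behavior multiplies the value by 100 before adding the percent sign
  fmt_percent(columns = asProportion, decimals = 1) |>
  # Values already on a 0-100 scale should not be rescaled
  fmt_percent(columns = asPercent, decimals = 1, scale_values = FALSE)

toyT
```

Both cells should display the same percentage. Omitting scale_values = FALSE for the second column would instead multiply 27.2 by 100.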

22.4.5 Spanners

Spanners are used to group columns. Below we are grouping the variables into three categories: “Environmental”, “General”, and “Population”. We also apply custom styles to the spanners using tab_style().

srT <- srT |>
  tab_spanner(
    label = "Environmental",
    columns = c("elev", 
                "precip", 
                "temp", 
                "pFor", 
                "pDev", 
                "pCrop")
  ) |>
  tab_spanner(
    label = "General",
    columns = c("nCnty", 
                "area")
  ) |>
  tab_spanner(
    label = "Population",
    columns = c("pop", 
                "popDen", 
                "mInc")
  ) |>
  tab_style(
    style = list(
      cell_text(align="center", 
                style="italic", 
                color="#812F33")
    ),
    locations = cells_column_spanners()
  )

srT

Sub-Region Summary
Sub-Region	General		Population			Environmental
Sub-Region	# Counties	Area (km²)	Population	Population Density (per km²)	Median Income	Elevation (meters)	Total Annual Precip (mm)	Mean Annual Temp. °C	% Forest	% Developed	% Cropland
East North Central	437	642,130	47,368,533	73.8	$55,310	243	1,001	10.3	27.2%	11.1%	39.5%
East South Central	364	470,929	19,402,234	41.2	$44,336	187	1,412	14.8	49.1%	8.3%	9.8%
Mid-Atlantic	150	262,408	42,492,943	161.9	$59,562	288	1,183	9.2	55.2%	13.7%	8.5%
Mountain	281	2,236,621	24,919,150	11.1	$53,790	1,672	413	7.6	18.2%	1.7%	5.9%
New England	67	169,530	15,116,205	89.2	$67,365	147	1,253	7.7	65.9%	10.3%	1.1%
Pacific	133	835,063	51,480,760	61.6	$59,427	642	725	10.5	30.1%	5.6%	9.1%
South Atlantic	584	702,759	66,026,391	94.0	$49,897	115	1,246	16.0	42.0%	12.4%	9.2%
West North Central	618	1,341,099	21,616,921	16.1	$55,388	384	777	9.6	10.5%	4.7%	41.6%
West South Central	470	1,121,618	40,774,139	36.4	$49,180	203	1,044	17.9	17.6%	6.2%	14.6%

22.4.6 Using tab_options()

The tab_options() function provides a wide variety of formatting options for different components of the table. Below, we are mainly using this function to change the color of different table elements. Please see the function’s documentation for a full list of modifications that can be applied.

srT <- srT |>
  tab_options(
    table_body.hlines.color="#812F33",
    table.border.top.color="#812F33",
    table.border.bottom.color = "#812F33",
    table_body.border.bottom.color =  "#812F33",
    heading.border.bottom.color = "#812F33",
    column_labels.border.top.color = "#812F33",
    column_labels.border.bottom.color = "#812F33",
    table.font.size=11
  )

srT

Sub-Region Summary
Sub-Region	General		Population			Environmental
Sub-Region	# Counties	Area (km²)	Population	Population Density (per km²)	Median Income	Elevation (meters)	Total Annual Precip (mm)	Mean Annual Temp. °C	% Forest	% Developed	% Cropland
East North Central	437	642,130	47,368,533	73.8	$55,310	243	1,001	10.3	27.2%	11.1%	39.5%
East South Central	364	470,929	19,402,234	41.2	$44,336	187	1,412	14.8	49.1%	8.3%	9.8%
Mid-Atlantic	150	262,408	42,492,943	161.9	$59,562	288	1,183	9.2	55.2%	13.7%	8.5%
Mountain	281	2,236,621	24,919,150	11.1	$53,790	1,672	413	7.6	18.2%	1.7%	5.9%
New England	67	169,530	15,116,205	89.2	$67,365	147	1,253	7.7	65.9%	10.3%	1.1%
Pacific	133	835,063	51,480,760	61.6	$59,427	642	725	10.5	30.1%	5.6%	9.1%
South Atlantic	584	702,759	66,026,391	94.0	$49,897	115	1,246	16.0	42.0%	12.4%	9.2%
West North Central	618	1,341,099	21,616,921	16.1	$55,388	384	777	9.6	10.5%	4.7%	41.6%
West South Central	470	1,121,618	40,774,139	36.4	$49,180	203	1,044	17.9	17.6%	6.2%	14.6%

22.4.7 Colorizing Data

It is possible to colorize data based on the data values. This can be accomplished using data_color(), which requires defining (1) the column to be manipulated, (2) the color palette to use, and (3) the domain, which helps define the color ramp or scheme applied based on the range of possible values. In our example, we have provided colorization for the cover type percentage columns. Note that the domain is defined based on the range of values stored in the associated column.

srT <- srT |>
  data_color(
    columns = c("pFor"),
    colors = col_numeric(
      palette = c("#edf8e9", 
                  "#005a32"),
      # Take the domain from the summarized data (srD), not the gt object (srT)
      domain  = srD$pFor
    )
  ) |>
  data_color(
    columns = c("pDev"),
    colors = col_numeric(
      palette = c("#fee5d9", 
                  "#fc9272"),
      domain  = srD$pDev
    )
  ) |>
  data_color(
    columns = c("pCrop"),
    colors = col_numeric(
      palette = c("#FFE8D3", 
                  "#DC8638"),
      domain  = srD$pCrop
    )
  )

srT

Sub-Region Summary
Sub-Region	General		Population			Environmental
Sub-Region	# Counties	Area (km²)	Population	Population Density (per km²)	Median Income	Elevation (meters)	Total Annual Precip (mm)	Mean Annual Temp. °C	% Forest	% Developed	% Cropland
East North Central	437	642,130	47,368,533	73.8	$55,310	243	1,001	10.3	27.2%	11.1%	39.5%
East South Central	364	470,929	19,402,234	41.2	$44,336	187	1,412	14.8	49.1%	8.3%	9.8%
Mid-Atlantic	150	262,408	42,492,943	161.9	$59,562	288	1,183	9.2	55.2%	13.7%	8.5%
Mountain	281	2,236,621	24,919,150	11.1	$53,790	1,672	413	7.6	18.2%	1.7%	5.9%
New England	67	169,530	15,116,205	89.2	$67,365	147	1,253	7.7	65.9%	10.3%	1.1%
Pacific	133	835,063	51,480,760	61.6	$59,427	642	725	10.5	30.1%	5.6%	9.1%
South Atlantic	584	702,759	66,026,391	94.0	$49,897	115	1,246	16.0	42.0%	12.4%	9.2%
West North Central	618	1,341,099	21,616,921	16.1	$55,388	384	777	9.6	10.5%	4.7%	41.6%
West South Central	470	1,121,618	40,774,139	36.4	$49,180	203	1,044	17.9	17.6%	6.2%	14.6%
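It can help to inspect the color-mapping function on its own before handing it to data_color(). The sketch below builds the forest palette with scales::col_numeric() over an assumed fixed domain of 0 to 100, which is an illustrative choice rather than the column range used in the chapter.

```r
library(scales)

# Build a color-mapping function over an assumed 0-100 domain
forestPal <- col_numeric(
  palette = c("#edf8e9", "#005a32"),
  domain  = c(0, 100)
)

# The function returns one hex color string per input value
forestCols <- forestPal(c(0, 50, 100))
forestCols
```

A fixed domain such as this keeps colors comparable across tables, while a domain taken from the column itself stretches the ramp across the observed values.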

22.4.8 Footnotes

It is also possible to add footnotes tied to specific column(s) using tab_footnote(), a table caption using tab_caption(), and data source information using tab_source_note(). Below, we demonstrate the creation of footnotes.

srT <- srT |>
    tab_footnote(
      footnote = "Data from the National Land Cover Database (NLCD)",
      locations = cells_column_labels(columns=c("pFor", 
                                                "pDev", 
                                                "pCrop")),
      placement=c("left")
  ) |>
  tab_footnote(
      footnote = "Data from US Census",
      locations = cells_column_labels(columns=c("pop", 
                                                "mInc")),
      placement=c("left")
  ) |>
  tab_footnote(
      footnote = "Data from PRISM",
      locations = cells_column_labels(columns=c("elev", 
                                                "precip", 
                                                "temp")),
      placement=c("left")
  )

srT

Sub-Region Summary
Sub-Region	General		Population			Environmental
Sub-Region	# Counties	Area (km²)	Population¹	Population Density (per km²)	Median Income¹	Elevation (meters)²	Total Annual Precip (mm)²	Mean Annual Temp. °C²	% Forest³	% Developed³	% Cropland³
East North Central	437	642,130	47,368,533	73.8	$55,310	243	1,001	10.3	27.2%	11.1%	39.5%
East South Central	364	470,929	19,402,234	41.2	$44,336	187	1,412	14.8	49.1%	8.3%	9.8%
Mid-Atlantic	150	262,408	42,492,943	161.9	$59,562	288	1,183	9.2	55.2%	13.7%	8.5%
Mountain	281	2,236,621	24,919,150	11.1	$53,790	1,672	413	7.6	18.2%	1.7%	5.9%
New England	67	169,530	15,116,205	89.2	$67,365	147	1,253	7.7	65.9%	10.3%	1.1%
Pacific	133	835,063	51,480,760	61.6	$59,427	642	725	10.5	30.1%	5.6%	9.1%
South Atlantic	584	702,759	66,026,391	94.0	$49,897	115	1,246	16.0	42.0%	12.4%	9.2%
West North Central	618	1,341,099	21,616,921	16.1	$55,388	384	777	9.6	10.5%	4.7%	41.6%
West South Central	470	1,121,618	40,774,139	36.4	$49,180	203	1,044	17.9	17.6%	6.2%	14.6%
¹ Data from US Census
² Data from PRISM
³ Data from the National Land Cover Database (NLCD)
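The tab_caption() and tab_source_note() functions mentioned above follow the same pattern as tab_footnote(), but are not tied to specific cells. The following is a minimal sketch on a hypothetical two-row table; the caption and source note text are placeholders.

```r
library(gt)

# Hypothetical two-row table (county counts taken from the output above)
toy <- data.frame(region = c("Mountain", "Pacific"), nCnty = c(281, 133))

toyT <- toy |>
  gt() |>
  # Caption used for cross-referencing when the table is rendered in a document
  tab_caption(caption = "County counts by sub-region") |>
  # Source note displayed below the table body
  tab_source_note(source_note = "Source: placeholder text for illustration only")

toyT
```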

22.5 Render to table

Lastly, the gt table object can be rendered to an output table. We demonstrate saving the table as an image file (PNG), as a PDF, and as HTML code to render the table in a web browser. As also demonstrated throughout this chapter, tables can be rendered as part of the output when a Quarto document is rendered to a product, such as a webpage or PDF.

#Output folder for the rendered tables
saveFld <- "gslrData/chpt22/output/"
#Save a PNG raster image (PNG and PDF export rely on the webshot2 package in recent gt versions)
gtsave(srT, str_glue("{saveFld}table.png"))
#Save as PDF file
gtsave(srT, str_glue("{saveFld}table.pdf"))
#Save as HTML
gtsave(srT, str_glue("{saveFld}table.html"))
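To try the export step without the chapter's folder structure, a table can be written to a temporary file. This is a minimal sketch that assumes a hypothetical one-row table and uses HTML output, which does not require a headless browser.

```r
library(gt)

# Hypothetical one-row table used only to demonstrate saving
toyT <- gt(data.frame(region = "Pacific", nCnty = 133))

# Write the table to a temporary HTML file
outFile <- tempfile(fileext = ".html")
gtsave(toyT, filename = outFile)

# Confirm that the file was written
file.exists(outFile)
```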

22.6 Concluding Remarks

This was a fairly brief chapter that introduced the functionality of the gt package for generating publication- or presentation-quality tables. Combined with ggplot2, this allows for generating quality graphs and tables in R. Such output can greatly enhance your data presentation. We recommend checking out the gt documentation for more examples and to further explore the package functionality.

22.7 Questions

In a gt table, explain the purpose of a spanner.
In a gt table, explain the purpose of a header.
In a gt table, explain the purpose of a row group.
Explain the difference between a footnote, source note, and caption as implemented within gt.
Explain the use of the following formatting functions: fmt_engineering(), fmt_scientific(), fmt_units(), and fmt_spelled_num().
Explain the use of the following formatting functions: fmt_url(), fmt_icon(), and fmt_email().
Explain the use of the following gt functions: cols_align_decimal(), cols_merge(), and cols_width().
What is the purpose of the md(), html(), and latex() helper functions?

22.8 Exercises

You have been provided with an urban tree dataset for Portland, Oregon in the exercise folder for the chapter. These data are available here.

Task 1

Complete the following preprocessing tasks using the tidyverse.

Read in portlandTrees.csv
Select out the following columns: “Common_nam”, “Genus_spec”, “Family”, “Genus”, “DBH”, “TreeHeight”, “CrownWidth”, “CrownBaseH”, and “Condition”
Convert all character columns to factors
Rename the columns as follows:
- “Common_nam” = “Common”
- “Genus_spec” = “Scientific”
- “Family” = “Family”
- “Genus” = “Genus”
- “DBH” = “DBH”
- “TreeHeight” = “Height”
- “CrownWidth” = “CrownWidth”
- “CrownBaseH” = “CrownBaseHeight”
- “Condition” = “Condition”
Remove all trees from the dataset that have a condition of “Dead”
Create a new common name column and lump all species that have fewer than 200 samples in the dataset into an “Other” class
Drop any rows with missing data
Filter out only trees from the following genera: Quercus, Acer, Ulmus, Pinus, Pseudotsuga, and Fagus

Task 2

Aggregate the data as follows:

For each genus and common name combination calculate the following:
- Count of trees
- Mean DBH
- Mean Height
- Mean crown width
- Mean crown base height
Count the number of trees in each species with each condition (hint: this can be accomplished using count() and pivot_wider())
Merge the results from the last two steps into a single table

Task 3

Use gt to create a table that meets the following criteria using the results from the prior two tasks (note: the process above should yield a grouped tibble, so the genus will automatically be treated as row groups).

Contains the following columns with the names specified and in the order specified: Common Name, Count, Height, DBH (in), Crown Width (ft.), Crown Base Height (ft), Poor, Fair, Good; break up the titles to multiple lines to save space; the Poor, Fair, and Good columns should represent condition counts
Add the title “Portland Park Trees”
Add a spanner over the Height, DBH, Crown Width, and Crown Base Height columns with a label of “Size”
Add a spanner over the Poor, Fair, and Good columns with a label of “Condition”
Format the height, DBH, CrownWidth, and CrownBaseHeight columns as numbers with a comma thousand separator and one decimal value
Format the Count, Poor, Fair, and Good columns as numbers with a comma thousand separator and zero decimal values
Make the column labels bold and centered
Make the cell titles bold and centered
Center all cell content other than the common name
Make the common name italic
Make the row groups a different color and italicize the text