22  Producing Tables (gt)

22.1 Topics Covered

  1. Preparing data and generating a simple table
  2. Adding titles and column labels
  3. Aligning text
  4. Formatting cell values
  5. Adding spanners
  6. Using tab_options()
  7. Data colorization
  8. Adding footnotes
  9. Rendering tables

22.2 Introduction

22.2.1 About gt

As with figures, you may want to design publication- or presentation-quality tables in R. Several packages are available for generating tables in R, including kableExtra and DT. Here, we focus on gt because we find it intuitive and well designed, and because its syntax is similar to that of the tidyverse.

Below, we are loading in the tidyverse, gt, and scales, which is used for data formatting.
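A minimal sketch of that loading step is given here; it assumes all three packages are already installed (for example, via install.packages()).

```r
# Load the packages used throughout the chapter
library(tidyverse)  # dplyr, readr, stringr, and forcats are used below
library(gt)         # table construction and styling
library(scales)     # col_numeric() for building color ramps
```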

22.3 Data Preparation

Our goal is to generate a table that summarizes some key information for each sub-region of the United States. We will work with the us_county_data.csv data used in the tidyverse chapter (Chapter 18). Since these data represent county-level measures, we need to aggregate the data to sub-regions and summarize the variables of interest. This is accomplished using dplyr. For each sub-region we calculate the total count of counties; area in square kilometers; total population; median of the county-level median income (US Dollars); median of the county-level mean elevation (m); median of the county-level mean annual total precipitation (mm); median of the county-level mean annual temperature (°C); and area-weighted percent forest, developed, and crop cover, estimated from the county-level percentages and areas. We also calculate population density from the total population and area estimates and move the new column so that it sits directly after the population column. Lastly, we use forcats to re-code the sub-region factor levels to non-abbreviated names. The data are now ready to start generating a table.

# Folder that holds the chapter data
cntyPth <- "gslrData/chpt22/data/"

# Read the county data and aggregate it to sub-regions
srD <- read_csv(str_glue("{cntyPth}us_county_data.csv")) |> 
  mutate(across(where(is.character), as.factor)) |>
  group_by(SUB_REGION) |>
  summarize(nCnty = n(),
            # Convert square miles to square kilometers
            area = sum(SQMI)*2.58999,
            pop = sum(POPULATION),
            mInc = median(med_income, na.rm=TRUE),
            elev = median(dem),
            precip = median(precip),
            temp = median(tempmn),
            # Area-weighted percent cover: covered area / total area * 100
            pFor = (sum((per_for/100)*SQMI)/(sum(SQMI)))*100,
            pDev = (sum((per_dev/100)*SQMI)/(sum(SQMI)))*100,
            pCrop = (sum((per_crop/100)*SQMI)/(sum(SQMI)))*100
  ) |> 
  # Population density in people per square kilometer
  mutate(popDen = pop/area) |>
  relocate(popDen, .after=pop) |>
  ungroup()
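The land cover percentages computed above are area-weighted averages rather than simple means of the county percentages. The sketch below illustrates the arithmetic on two hypothetical counties (the values are invented for illustration and do not come from us_county_data.csv) and checks the result against base R's weighted.mean().

```r
# Two hypothetical counties: area (square miles) and percent forest cover
sqmi    <- c(100, 300)
per_for <- c(50, 10)

# Same formula as in the summarize() call: forested area / total area * 100
pFor <- (sum((per_for / 100) * sqmi) / sum(sqmi)) * 100

# Equivalent area-weighted mean, plus the unweighted mean for comparison
pForWeighted   <- weighted.mean(per_for, w = sqmi)
pForUnweighted <- mean(per_for)

c(formula = pFor, weighted = pForWeighted, unweighted = pForUnweighted)
```

In this toy case the larger, sparsely forested county pulls the weighted value down to 20%, compared with an unweighted mean of 30%.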


srD$SUB_REGION <- fct_recode(srD$SUB_REGION, 
                             "East North Central"="E N Cen", 
                             "East South Central"="E S Cen",
                             "Mid-Atlantic"="Mid Atl", 
                             "Mountain"="Mtn",     
                             "New England"="N Eng",
                             "Pacific"="Pacific",
                             "South Atlantic"="S Atl",
                             "West North Central"="W N Cen", 
                             "West South Central"="W S Cen")

22.4 Make Table

22.4.1 Base Table

A gt table object is created using the gt() function. Simply passing the data to this function generates a table. We are also setting the font size using tab_options() in order to decrease the size of the entire table. We make further use of tab_options() later in the example. Although the table is functional, it can be greatly improved. In the remainder of this chapter, we work to improve this table as a means to learn more about gt.

srT <- srD |> gt() |>
  tab_options(table.font.size=11)
srT
| SUB_REGION | nCnty | area | pop | popDen | mInc | elev | precip | temp | pFor | pDev | pCrop |
|---|---|---|---|---|---|---|---|---|---|---|---|
| East North Central | 437 | 642130.1 | 47368533 | 73.76782 | 55310.0 | 243.0933 | 1000.8536 | 10.331616 | 27.23315 | 11.099251 | 39.515841 |
| East South Central | 364 | 470929.0 | 19402234 | 41.19992 | 44335.5 | 187.2671 | 1412.2993 | 14.791583 | 49.12894 | 8.323377 | 9.821077 |
| Mid-Atlantic | 150 | 262408.4 | 42492943 | 161.93438 | 59561.5 | 287.9525 | 1182.6845 | 9.242569 | 55.20378 | 13.734407 | 8.504453 |
| Mountain | 281 | 2236621.5 | 24919150 | 11.14142 | 53790.0 | 1671.5016 | 413.3704 | 7.624000 | 18.24613 | 1.676929 | 5.871804 |
| New England | 67 | 169530.3 | 15116205 | 89.16522 | 67365.0 | 147.0000 | 1253.2409 | 7.706078 | 65.94515 | 10.307294 | 1.097819 |
| Pacific | 133 | 835063.4 | 51480760 | 61.64892 | 59427.0 | 641.9725 | 724.8609 | 10.462643 | 30.12613 | 5.567056 | 9.139728 |
| South Atlantic | 584 | 702759.3 | 66026391 | 93.95306 | 49896.5 | 115.2225 | 1246.2440 | 15.977813 | 41.95959 | 12.385514 | 9.246431 |
| West North Central | 618 | 1341099.0 | 21616921 | 16.11881 | 55388.0 | 383.5105 | 777.3056 | 9.586250 | 10.47021 | 4.712193 | 41.610768 |
| West South Central | 470 | 1121618.2 | 40774139 | 36.35296 | 49180.0 | 202.6054 | 1043.6455 | 17.902725 | 17.55864 | 6.150014 | 14.567517 |

22.4.2 Title and Column Labels

First, we add a title using tab_header() and change the column names using cols_label(). We assign the result back to srT, so the existing gt table object is overwritten with the modified version. Note the use of HTML via the html() function to format the column names. This allows us to use HTML tags, such as line breaks and superscripts, to format the text. We can also specify special symbols, such as the degree sign. With this addition, the table now has much more meaningful column names.

srT <- srT |>
  tab_header(
    title="Sub-Region Summary") |>
  cols_label(
    SUB_REGION = "Sub-Region",
    nCnty = "# Counties",
    pop = "Population",
    popDen = html("Population<br>Density<br>(per km<sup>2</sup>)"),
    mInc = html("Median<br>Income"),
    area = html("Area (km<sup>2</sup>)"),
    elev = html("Elevation<br>(meters)"),
    precip = html("Total Annual<br>Precip (mm)"),
    temp = html("Mean Annual<br>Temp. &deg;C"),
    pFor = "% Forest",
    pDev = "% Developed",
    pCrop = "% Cropland")
srT
Sub-Region Summary
| Sub-Region | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
|---|---|---|---|---|---|---|---|---|---|---|---|
| East North Central | 437 | 642130.1 | 47368533 | 73.76782 | 55310.0 | 243.0933 | 1000.8536 | 10.331616 | 27.23315 | 11.099251 | 39.515841 |
| East South Central | 364 | 470929.0 | 19402234 | 41.19992 | 44335.5 | 187.2671 | 1412.2993 | 14.791583 | 49.12894 | 8.323377 | 9.821077 |
| Mid-Atlantic | 150 | 262408.4 | 42492943 | 161.93438 | 59561.5 | 287.9525 | 1182.6845 | 9.242569 | 55.20378 | 13.734407 | 8.504453 |
| Mountain | 281 | 2236621.5 | 24919150 | 11.14142 | 53790.0 | 1671.5016 | 413.3704 | 7.624000 | 18.24613 | 1.676929 | 5.871804 |
| New England | 67 | 169530.3 | 15116205 | 89.16522 | 67365.0 | 147.0000 | 1253.2409 | 7.706078 | 65.94515 | 10.307294 | 1.097819 |
| Pacific | 133 | 835063.4 | 51480760 | 61.64892 | 59427.0 | 641.9725 | 724.8609 | 10.462643 | 30.12613 | 5.567056 | 9.139728 |
| South Atlantic | 584 | 702759.3 | 66026391 | 93.95306 | 49896.5 | 115.2225 | 1246.2440 | 15.977813 | 41.95959 | 12.385514 | 9.246431 |
| West North Central | 618 | 1341099.0 | 21616921 | 16.11881 | 55388.0 | 383.5105 | 777.3056 | 9.586250 | 10.47021 | 4.712193 | 41.610768 |
| West South Central | 470 | 1121618.2 | 40774139 | 36.35296 | 49180.0 | 202.6054 | 1043.6455 | 17.902725 | 17.55864 | 6.150014 | 14.567517 |

22.4.3 Text Alignment

The tab_style() function allows for styling different table elements. Arguments to the style parameter define style changes while arguments to the locations parameter define what part(s) of the table to manipulate. Below, we are centering the table title, column labels, and cell data. We are also converting the title and column name text to bold face.

srT <- srT |>
  tab_style(
    style = list(
      cell_text(align="center")
      ),
    locations = list(cells_title(), 
                     cells_column_labels(),
                     cells_body())
    ) |>
  tab_style(
    style = list(
      cell_text(weight="bold")
      ),
    locations = list(cells_title(), 
                     cells_column_labels())
    )

srT
Sub-Region Summary
| Sub-Region | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
|---|---|---|---|---|---|---|---|---|---|---|---|
| East North Central | 437 | 642130.1 | 47368533 | 73.76782 | 55310.0 | 243.0933 | 1000.8536 | 10.331616 | 27.23315 | 11.099251 | 39.515841 |
| East South Central | 364 | 470929.0 | 19402234 | 41.19992 | 44335.5 | 187.2671 | 1412.2993 | 14.791583 | 49.12894 | 8.323377 | 9.821077 |
| Mid-Atlantic | 150 | 262408.4 | 42492943 | 161.93438 | 59561.5 | 287.9525 | 1182.6845 | 9.242569 | 55.20378 | 13.734407 | 8.504453 |
| Mountain | 281 | 2236621.5 | 24919150 | 11.14142 | 53790.0 | 1671.5016 | 413.3704 | 7.624000 | 18.24613 | 1.676929 | 5.871804 |
| New England | 67 | 169530.3 | 15116205 | 89.16522 | 67365.0 | 147.0000 | 1253.2409 | 7.706078 | 65.94515 | 10.307294 | 1.097819 |
| Pacific | 133 | 835063.4 | 51480760 | 61.64892 | 59427.0 | 641.9725 | 724.8609 | 10.462643 | 30.12613 | 5.567056 | 9.139728 |
| South Atlantic | 584 | 702759.3 | 66026391 | 93.95306 | 49896.5 | 115.2225 | 1246.2440 | 15.977813 | 41.95959 | 12.385514 | 9.246431 |
| West North Central | 618 | 1341099.0 | 21616921 | 16.11881 | 55388.0 | 383.5105 | 777.3056 | 9.586250 | 10.47021 | 4.712193 | 41.610768 |
| West South Central | 470 | 1121618.2 | 40774139 | 36.35296 | 49180.0 | 202.6054 | 1043.6455 | 17.902725 | 17.55864 | 6.150014 | 14.567517 |
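
Location helpers can also target a subset of cells rather than a whole table component. As a minimal sketch (using a small hypothetical table built from three population density values rounded from the output above, not the chapter's srT object), cells_body() accepts columns and rows arguments, and rows can be a logical condition on the data:

```r
library(gt)

# Small stand-in table; values are rounded from the popDen column above
toy <- data.frame(
  region = c("Mid-Atlantic", "Mountain", "South Atlantic"),
  popDen = c(161.9, 11.1, 94.0)
)

toyT <- toy |>
  gt() |>
  tab_style(
    # Bold, centered text only where density exceeds 90 people per square km
    style = cell_text(weight = "bold", align = "center"),
    locations = cells_body(columns = popDen, rows = popDen > 90)
  )

toyT
```

The threshold of 90 is arbitrary and chosen only so that two of the three rows are styled.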

22.4.4 Number Formatting

We can now format the data. gt includes a variety of functions for formatting data; please see its documentation for a full list. In our example, we use fmt_percent() to format the percent forest, developed, and crop columns; fmt_number() to format the population, population density, area, elevation, temperature, and precipitation data; and fmt_currency() for the median income data. All of these functions allow you to specify the number of decimal places. Because the land cover values are already stored as percentages (0 to 100) rather than proportions, we set scale_values = FALSE so that fmt_percent() does not multiply them by 100. Other formatting functions include fmt_integer(), fmt_scientific(), fmt_engineering(), fmt_fraction(), fmt_roman(), fmt_spelled_num(), fmt_date(), fmt_time(), fmt_datetime(), fmt_bins(), fmt_markdown(), fmt_url(), and fmt_icon(). It is possible to apply more than one formatting function to the same column, for example by targeting different rows with the rows argument. You can also define custom formatting using fmt().

srT <- srT |>
  fmt_percent(
    columns = c("pFor", 
                "pDev", 
                "pCrop"), 
    decimals = 1, 
    scale_values = FALSE) |>
  fmt_number(
    columns = c("pop", 
                "area", 
                "elev", 
                "precip"),
    decimals = 0,
    use_seps = TRUE
  ) |>
  fmt_number(
    columns = c("popDen", 
                "temp"),
    decimals = 1,
    use_seps = TRUE
  ) |>
  fmt_currency(
    columns = c("mInc"),
    decimals=0
    )

srT
Sub-Region Summary
| Sub-Region | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
|---|---|---|---|---|---|---|---|---|---|---|---|
| East North Central | 437 | 642,130 | 47,368,533 | 73.8 | $55,310 | 243 | 1,001 | 10.3 | 27.2% | 11.1% | 39.5% |
| East South Central | 364 | 470,929 | 19,402,234 | 41.2 | $44,336 | 187 | 1,412 | 14.8 | 49.1% | 8.3% | 9.8% |
| Mid-Atlantic | 150 | 262,408 | 42,492,943 | 161.9 | $59,562 | 288 | 1,183 | 9.2 | 55.2% | 13.7% | 8.5% |
| Mountain | 281 | 2,236,621 | 24,919,150 | 11.1 | $53,790 | 1,672 | 413 | 7.6 | 18.2% | 1.7% | 5.9% |
| New England | 67 | 169,530 | 15,116,205 | 89.2 | $67,365 | 147 | 1,253 | 7.7 | 65.9% | 10.3% | 1.1% |
| Pacific | 133 | 835,063 | 51,480,760 | 61.6 | $59,427 | 642 | 725 | 10.5 | 30.1% | 5.6% | 9.1% |
| South Atlantic | 584 | 702,759 | 66,026,391 | 94.0 | $49,897 | 115 | 1,246 | 16.0 | 42.0% | 12.4% | 9.2% |
| West North Central | 618 | 1,341,099 | 21,616,921 | 16.1 | $55,388 | 384 | 777 | 9.6 | 10.5% | 4.7% | 41.6% |
| West South Central | 470 | 1,121,618 | 40,774,139 | 36.4 | $49,180 | 203 | 1,044 | 17.9 | 17.6% | 6.2% | 14.6% |
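
Custom formatting with fmt() was mentioned above but not demonstrated. The following is a minimal sketch, assuming a small hypothetical table (two elevation values copied from the unformatted output earlier in the chapter) and an illustrative formatter that rounds to whole meters and appends a unit. The function passed to fns receives the column values and should return character strings.

```r
library(gt)

# Hypothetical two-row table; elevations are taken from the elev column
toy <- data.frame(
  region = c("East North Central", "Mountain"),
  elev   = c(243.0933, 1671.5016)
)

toyT <- toy |>
  gt() |>
  fmt(
    columns = elev,
    # Round to whole meters, add a thousands separator, and append the unit
    fns = function(x) {
      paste0(formatC(x, format = "f", digits = 0, big.mark = ","), " m")
    }
  )

toyT
```

For common cases such as this one, the built-in fmt_number() with its pattern argument may be a simpler choice; fmt() is most useful when no existing formatter fits.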

22.4.5 Spanners

Spanners are used to group columns. Below we are grouping the variables into three categories: “Environmental”, “General”, and “Population”. We also apply custom styles to the spanners using tab_style().

srT <- srT |>
  tab_spanner(
    label = "Environmental",
    columns = c("elev", 
                "precip", 
                "temp", 
                "pFor", 
                "pDev", 
                "pCrop")
  ) |>
  tab_spanner(
    label = "General",
    columns = c("nCnty", 
                "area")
  ) |>
  tab_spanner(
    label = "Population",
    columns = c("pop", 
                "popDen", 
                "mInc")
  ) |>
  tab_style(
    style = list(
      cell_text(align="center", 
                style="italic", 
                color="#812F33")
    ),
    locations = cells_column_spanners()
  )

srT
Sub-Region Summary
|  | General | General | Population | Population | Population | Environmental | Environmental | Environmental | Environmental | Environmental | Environmental |
| Sub-Region | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
|---|---|---|---|---|---|---|---|---|---|---|---|
| East North Central | 437 | 642,130 | 47,368,533 | 73.8 | $55,310 | 243 | 1,001 | 10.3 | 27.2% | 11.1% | 39.5% |
| East South Central | 364 | 470,929 | 19,402,234 | 41.2 | $44,336 | 187 | 1,412 | 14.8 | 49.1% | 8.3% | 9.8% |
| Mid-Atlantic | 150 | 262,408 | 42,492,943 | 161.9 | $59,562 | 288 | 1,183 | 9.2 | 55.2% | 13.7% | 8.5% |
| Mountain | 281 | 2,236,621 | 24,919,150 | 11.1 | $53,790 | 1,672 | 413 | 7.6 | 18.2% | 1.7% | 5.9% |
| New England | 67 | 169,530 | 15,116,205 | 89.2 | $67,365 | 147 | 1,253 | 7.7 | 65.9% | 10.3% | 1.1% |
| Pacific | 133 | 835,063 | 51,480,760 | 61.6 | $59,427 | 642 | 725 | 10.5 | 30.1% | 5.6% | 9.1% |
| South Atlantic | 584 | 702,759 | 66,026,391 | 94.0 | $49,897 | 115 | 1,246 | 16.0 | 42.0% | 12.4% | 9.2% |
| West North Central | 618 | 1,341,099 | 21,616,921 | 16.1 | $55,388 | 384 | 777 | 9.6 | 10.5% | 4.7% | 41.6% |
| West South Central | 470 | 1,121,618 | 40,774,139 | 36.4 | $49,180 | 203 | 1,044 | 17.9 | 17.6% | 6.2% | 14.6% |

22.4.6 Using tab_options()

The tab_options() function provides a wide variety of formatting options for different components of the table. Below, we are mainly using this function to change the color of different table elements. Please see the function’s documentation for a full list of modifications that can be applied.

srT <- srT |>
  tab_options(
    table_body.hlines.color="#812F33",
    table.border.top.color="#812F33",
    table.border.bottom.color = "#812F33",
    table_body.border.bottom.color =  "#812F33",
    heading.border.bottom.color = "#812F33",
    column_labels.border.top.color = "#812F33",
    column_labels.border.bottom.color = "#812F33",
    table.font.size=11
  )

srT
Sub-Region Summary
|  | General | General | Population | Population | Population | Environmental | Environmental | Environmental | Environmental | Environmental | Environmental |
| Sub-Region | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
|---|---|---|---|---|---|---|---|---|---|---|---|
| East North Central | 437 | 642,130 | 47,368,533 | 73.8 | $55,310 | 243 | 1,001 | 10.3 | 27.2% | 11.1% | 39.5% |
| East South Central | 364 | 470,929 | 19,402,234 | 41.2 | $44,336 | 187 | 1,412 | 14.8 | 49.1% | 8.3% | 9.8% |
| Mid-Atlantic | 150 | 262,408 | 42,492,943 | 161.9 | $59,562 | 288 | 1,183 | 9.2 | 55.2% | 13.7% | 8.5% |
| Mountain | 281 | 2,236,621 | 24,919,150 | 11.1 | $53,790 | 1,672 | 413 | 7.6 | 18.2% | 1.7% | 5.9% |
| New England | 67 | 169,530 | 15,116,205 | 89.2 | $67,365 | 147 | 1,253 | 7.7 | 65.9% | 10.3% | 1.1% |
| Pacific | 133 | 835,063 | 51,480,760 | 61.6 | $59,427 | 642 | 725 | 10.5 | 30.1% | 5.6% | 9.1% |
| South Atlantic | 584 | 702,759 | 66,026,391 | 94.0 | $49,897 | 115 | 1,246 | 16.0 | 42.0% | 12.4% | 9.2% |
| West North Central | 618 | 1,341,099 | 21,616,921 | 16.1 | $55,388 | 384 | 777 | 9.6 | 10.5% | 4.7% | 41.6% |
| West South Central | 470 | 1,121,618 | 40,774,139 | 36.4 | $49,180 | 203 | 1,044 | 17.9 | 17.6% | 6.2% | 14.6% |

22.4.7 Colorizing Data

It is possible to colorize cells based on their values. This can be accomplished using data_color(), which requires defining (1) the column to be manipulated, (2) the color palette to use, and (3) the domain, which helps define the color ramp or scheme applied based on the range of possible values. In our example, we colorize the cover type percentage columns. Note that the domain is defined based on the range of values stored in the associated column.

srT <- srT |>
  data_color(
    columns = c("pFor"),
    # fn takes a scales color-mapping function; the older colors argument is deprecated
    fn = col_numeric(
      palette = c("#edf8e9", 
                  "#005a32"),
      # Take the domain from the summary data frame (srD); srT is a gt object,
      # so srT$pFor would be NULL
      domain  = srD$pFor
    )
  ) |>
  data_color(
    columns = c("pDev"),
    fn = col_numeric(
      palette = c("#fee5d9", 
                  "#fc9272"),
      domain  = srD$pDev
    )
  ) |>
  data_color(
    columns = c("pCrop"),
    fn = col_numeric(
      palette = c("#FFE8D3", 
                  "#DC8638"),
      domain  = srD$pCrop
    )
  )

srT
Sub-Region Summary
|  | General | General | Population | Population | Population | Environmental | Environmental | Environmental | Environmental | Environmental | Environmental |
| Sub-Region | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
|---|---|---|---|---|---|---|---|---|---|---|---|
| East North Central | 437 | 642,130 | 47,368,533 | 73.8 | $55,310 | 243 | 1,001 | 10.3 | 27.2% | 11.1% | 39.5% |
| East South Central | 364 | 470,929 | 19,402,234 | 41.2 | $44,336 | 187 | 1,412 | 14.8 | 49.1% | 8.3% | 9.8% |
| Mid-Atlantic | 150 | 262,408 | 42,492,943 | 161.9 | $59,562 | 288 | 1,183 | 9.2 | 55.2% | 13.7% | 8.5% |
| Mountain | 281 | 2,236,621 | 24,919,150 | 11.1 | $53,790 | 1,672 | 413 | 7.6 | 18.2% | 1.7% | 5.9% |
| New England | 67 | 169,530 | 15,116,205 | 89.2 | $67,365 | 147 | 1,253 | 7.7 | 65.9% | 10.3% | 1.1% |
| Pacific | 133 | 835,063 | 51,480,760 | 61.6 | $59,427 | 642 | 725 | 10.5 | 30.1% | 5.6% | 9.1% |
| South Atlantic | 584 | 702,759 | 66,026,391 | 94.0 | $49,897 | 115 | 1,246 | 16.0 | 42.0% | 12.4% | 9.2% |
| West North Central | 618 | 1,341,099 | 21,616,921 | 16.1 | $55,388 | 384 | 777 | 9.6 | 10.5% | 4.7% | 41.6% |
| West South Central | 470 | 1,121,618 | 40,774,139 | 36.4 | $49,180 | 203 | 1,044 | 17.9 | 17.6% | 6.2% | 14.6% |
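
To see what the palette and domain do, it can help to call the color-mapping function directly, outside of gt. The sketch below is only an illustration: it builds the same green ramp with scales::col_numeric() and applies it to three percent forest values copied from the unformatted table earlier in the chapter (the minimum, an intermediate value, and the maximum). The object names vals and ramp are hypothetical.

```r
library(scales)

# Percent forest for West North Central, South Atlantic, and New England
vals <- c(10.47021, 41.95959, 65.94515)

# col_numeric() returns a function that maps numbers in the domain to colors
ramp <- col_numeric(
  palette = c("#edf8e9", "#005a32"),
  domain  = range(vals)
)

# One hex color per input value: light green at the low end of the domain,
# dark green at the high end, and an interpolated shade in between
cols <- ramp(vals)
cols
```

Widening the domain (for example, to c(0, 100)) would compress the observed values into a narrower part of the ramp, which is why the domain is usually tied to the range of the column.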

22.4.8 Footnotes

It is also possible to add footnotes tied to specific column(s) using tab_footnote(), a table caption using tab_caption(), and data source information using tab_source_note(). Below, we demonstrate the creation of footnotes.

srT <- srT |>
    tab_footnote(
      footnote = "Data from the National Land Cover Database (NLCD)",
      locations = cells_column_labels(columns=c("pFor", 
                                                "pDev", 
                                                "pCrop")),
      placement=c("left")
  ) |>
  tab_footnote(
      footnote = "Data from US Census",
      locations = cells_column_labels(columns=c("pop", 
                                                "mInc")),
      placement=c("left")
  ) |>
  tab_footnote(
      footnote = "Data from PRISM",
      locations = cells_column_labels(columns=c("elev", 
                                                "precip", 
                                                "temp")),
      placement=c("left")
  )

srT
Sub-Region Summary
|  | General | General | Population | Population | Population | Environmental | Environmental | Environmental | Environmental | Environmental | Environmental |
| Sub-Region | # Counties | Area (km²) | Population¹ | Population Density (per km²) | Median Income¹ | Elevation (meters)² | Total Annual Precip (mm)² | Mean Annual Temp. °C² | % Forest³ | % Developed³ | % Cropland³ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| East North Central | 437 | 642,130 | 47,368,533 | 73.8 | $55,310 | 243 | 1,001 | 10.3 | 27.2% | 11.1% | 39.5% |
| East South Central | 364 | 470,929 | 19,402,234 | 41.2 | $44,336 | 187 | 1,412 | 14.8 | 49.1% | 8.3% | 9.8% |
| Mid-Atlantic | 150 | 262,408 | 42,492,943 | 161.9 | $59,562 | 288 | 1,183 | 9.2 | 55.2% | 13.7% | 8.5% |
| Mountain | 281 | 2,236,621 | 24,919,150 | 11.1 | $53,790 | 1,672 | 413 | 7.6 | 18.2% | 1.7% | 5.9% |
| New England | 67 | 169,530 | 15,116,205 | 89.2 | $67,365 | 147 | 1,253 | 7.7 | 65.9% | 10.3% | 1.1% |
| Pacific | 133 | 835,063 | 51,480,760 | 61.6 | $59,427 | 642 | 725 | 10.5 | 30.1% | 5.6% | 9.1% |
| South Atlantic | 584 | 702,759 | 66,026,391 | 94.0 | $49,897 | 115 | 1,246 | 16.0 | 42.0% | 12.4% | 9.2% |
| West North Central | 618 | 1,341,099 | 21,616,921 | 16.1 | $55,388 | 384 | 777 | 9.6 | 10.5% | 4.7% | 41.6% |
| West South Central | 470 | 1,121,618 | 40,774,139 | 36.4 | $49,180 | 203 | 1,044 | 17.9 | 17.6% | 6.2% | 14.6% |

¹ Data from US Census
² Data from PRISM
³ Data from the National Land Cover Database (NLCD)
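
Captions and source notes were mentioned above but not shown. As a minimal sketch, assuming a small hypothetical table (two county counts copied from the output above) and placeholder caption and note text, tab_caption() and tab_source_note() can be chained in the same way as tab_footnote():

```r
library(gt)

# Hypothetical two-row table; county counts are taken from the table above
toy <- data.frame(
  region = c("Mountain", "Pacific"),
  nCnty  = c(281, 133)
)

toyT <- toy |>
  gt() |>
  # Caption text; mainly relevant when the table is cross-referenced
  # in a Quarto or R Markdown document
  tab_caption(caption = "Number of counties by sub-region") |>
  # Source notes sit in the table footer and are not tied to a cell or column
  tab_source_note(source_note = "Placeholder source note for illustration")

toyT
```

Unlike a footnote, a source note carries no reference mark, so it suits information that applies to the table as a whole.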

22.5 Render to table

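Lastly, the gt table object can be rendered to an output table. We demonstrate saving the table as an image file (PNG), as a PDF, and as HTML code to render the table in a web browser. The output format is determined by the file extension passed to gtsave(). Note that, in recent gt releases, saving to PNG or PDF relies on the webshot2 package, which in turn requires a Chromium-based browser to be installed. As also demonstrated throughout this chapter, tables can be rendered as part of the output when a Quarto document is rendered to a product, such as a webpage or PDF.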

#Save a PNG raster image
saveFld <- "gslrData/chpt22/output/"
gtsave(srT, str_glue("{saveFld}table.png"))
NULL
#Save as PDF file
gtsave(srT, str_glue("{saveFld}table.pdf"))
NULL
#Save as HTML
gtsave(srT, str_glue("{saveFld}table.html"))
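
If you need the table's HTML as a string rather than as a file (for example, to embed it in an email or another document), gt also provides as_raw_html(). This function is not used elsewhere in the chapter; the sketch below applies it to a small hypothetical table instead of srT so that it runs on its own.

```r
library(gt)

# Hypothetical two-row table; county counts are taken from the table above
toy <- data.frame(
  region = c("Mountain", "Pacific"),
  nCnty  = c(281, 133)
)

# as_raw_html() returns the table as a single HTML string with inlined CSS
htmlTxt <- toy |>
  gt() |>
  as_raw_html()

# Confirm that a string containing a table element was produced
is.character(htmlTxt)
grepl("<table", htmlTxt, fixed = TRUE)
```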

22.6 Concluding Remarks

This was a fairly brief chapter that introduced the functionality of the gt package for generating publication- or presentation-quality tables. Combined with ggplot2, this allows for generating quality graphs and tables in R. Such output can greatly enhance your data presentation. We recommend checking out the gt documentation for more examples and to further explore the package functionality.

22.7 Questions

  1. In a gt table, explain the purpose of a spanner.
  2. In a gt table, explain the purpose of a header.
  3. In a gt table, explain the purpose of a row group.
  4. Explain the difference between a footnote, source note, and caption as implemented within gt.
  5. Explain the use of the following formatting functions: fmt_engineering(), fmt_scientific(), fmt_units(), and fmt_spelled_num().
  6. Explain the use of the following formatting functions: fmt_url(), fmt_icon(), and fmt_email().
  7. Explain the use of the following gt functions: cols_align_decimal(), cols_merge(), and cols_width().
  8. What is the purpose of the md(), html(), and latex() helper functions?

22.8 Exercises

You have been provided with an urban tree dataset for Portland, Oregon in the exercise folder for the chapter. These data are available here.

Task 1

Complete the following preprocessing tasks using the tidyverse.

  1. Read in portlandTrees.csv
  2. Select out the following columns: “Common_nam”, “Genus_spec”, “Family”, “Genus”, “DBH”, “TreeHeight”, “CrownWidth”, “CrownBaseH”, and “Condition”
  3. Convert all character columns to factors
  4. Rename the columns as follows:
    • “Common_nam” = “Common”
    • “Genus_spec” = “Scientific”
    • “Family” = “Family”
    • “Genus” = “Genus”
    • “DBH” = “DBH”
    • “TreeHeight” = “Height”
    • “CrownWidth” = “CrownWidth”
    • “CrownBaseH” = “CrownBaseHeight”
    • “Condition” = “Condition”
  5. Remove all trees from the dataset that have a condition of “Dead”
  6. Create a new common name column and lump all species that have fewer than 200 samples in the dataset into an “Other” class
  7. Drop any rows with missing data
  8. Keep only trees from the following genera: Quercus, Acer, Ulmus, Pinus, Pseudotsuga, and Fagus

Task 2

Aggregate the data as follows:

  1. For each genus and common name combination calculate the following:
    • Count of trees
    • Mean DBH
    • Mean Height
    • Mean crown width
    • Mean crown base height
  2. Count the number of trees in each species with each condition (hint: this can be accomplished using count() and pivot_wider())
  3. Merge the results from the last two steps into a single table

Task 3

Use gt to create a table that meets the following criteria using the results from the prior two tasks (note: the process above should yield a grouped tibble, so the genus will automatically be treated as row groups).

  1. Contains the following columns with the names specified and in the order specified: Common Name, Count, Height, DBH (in), Crown Width (ft.), Crown Base Height (ft), Poor, Fair, Good; break up the titles to multiple lines to save space; the Poor, Fair, and Good columns should represent condition counts
  2. Add the title “Portland Park Trees”
  3. Add a spanner over the Height, DBH, Crown Width, and Crown Base Height columns with a label of “Size”
  4. Add a spanner over the Poor, Fair, and Good columns with a label of “Condition”
  5. Format the height, DBH, CrownWidth, and CrownBaseHeight columns as numbers with a comma thousand separator and one decimal value
  6. Format the Count, Poor, Fair, and Good columns as numbers with a comma thousand separator and zero decimal values
  7. Make the column labels bold and centered
  8. Make the cell titles bold and centered
  9. Center all cell content other than the common name
  10. Make the common name italic
  11. Make the row groups a different color and italicize the text