22 Producing Tables (gt)
22.1 Topics Covered
- Preparing data and generating a simple table
- Adding titles and column labels
- Aligning text
- Formatting cell values
- Adding spanners
- Using tab_options()
- Data colorization
- Adding footnotes
- Rendering tables
22.2 Introduction
22.2.1 About gt
As with figures, you may be interested in designing publication- or presentation-quality tables in R. Several packages are available for generating tables in R, including kableExtra and DT. Here, we focus on gt, which we find intuitive and well designed. It also uses a pipe-based syntax similar to that of the tidyverse.
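This chapter uses the tidyverse, gt, and scales packages. The scales package provides formatting and color-mapping helpers, including the col_numeric() function used later to colorize table cells.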
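A minimal setup, assuming the three packages are already installed, might look like this:

```r
# Packages used in this chapter (assumed to be installed already,
# for example with install.packages(c("tidyverse", "gt", "scales")))
library(tidyverse)  # readr, dplyr, forcats, stringr, and friends
library(gt)         # table construction and styling
library(scales)     # col_numeric() for the color ramps used later
```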
22.3 Data Preparation
Our goal is to generate a table that summarizes key information for each sub-region of the United States. We will work with the us_county_data.csv data used in the tidyverse chapter (Chapter 18). Since these data represent county-level measures, we need to aggregate them to sub-regions and summarize the variables of interest. This is accomplished using dplyr. For each sub-region, we calculate the total count of counties; area in square kilometers; total population; median of the county-level median income (US Dollars); median of the county-level mean elevation (m); median of the county-level mean annual total precipitation (mm); median of the county-level mean annual temperature (°C); and percent forest, developed, and crop cover, estimated as area-weighted averages of the county-level percentages. We also calculate population density from the total population and area estimates and move the new column so that it directly follows the population column. Lastly, we use forcats to recode the sub-region factor levels to non-abbreviated names. The data are then ready for generating a table.
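To see what the percent-cover formula in summarize() computes, the short base R sketch below applies it to three hypothetical counties (the numbers are made up, not taken from us_county_data.csv). The formula is an area-weighted mean, so larger counties contribute more than smaller ones.

```r
# Three hypothetical counties (made-up values for illustration)
per_for <- c(10, 50, 80)     # percent forest cover in each county
SQMI    <- c(100, 300, 600)  # county area in square miles

# The formula used in summarize(): convert each percentage to a
# forested area, sum those areas, and divide by the total area
pFor_formula <- (sum((per_for / 100) * SQMI) / sum(SQMI)) * 100

# The same quantity written as an area-weighted mean
pFor_wmean <- weighted.mean(per_for, w = SQMI)

# An unweighted mean ignores county size and gives a different value
pFor_naive <- mean(per_for)

# Total area converted from square miles to square kilometers
area_km2 <- sum(SQMI) * 2.58999

# formula and weighted are both 64; naive is about 46.7
c(formula = pFor_formula, weighted = pFor_wmean,
  naive = pFor_naive, km2 = area_km2)
```

Because the two expressions are equivalent, weighted.mean(per_for, w = SQMI) could be used in place of the longer expression inside summarize().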
cntyPth <- "gslrData/chpt22/data/"

srD <- read_csv(str_glue("{cntyPth}us_county_data.csv")) |>
  # Convert all character columns to factors
  mutate(across(where(is.character), as.factor)) |>
  group_by(SUB_REGION) |>
  summarize(
    nCnty  = n(),                           # number of counties
    area   = sum(SQMI) * 2.58999,           # square miles to km^2
    pop    = sum(POPULATION),
    mInc   = median(med_income, na.rm = TRUE),
    elev   = median(dem),
    precip = median(precip),
    temp   = median(tempmn),
    # Area-weighted percent cover for each sub-region
    pFor   = (sum((per_for / 100) * SQMI) / sum(SQMI)) * 100,
    pDev   = (sum((per_dev / 100) * SQMI) / sum(SQMI)) * 100,
    pCrop  = (sum((per_crop / 100) * SQMI) / sum(SQMI)) * 100
  ) |>
  mutate(popDen = pop / area) |>            # people per km^2
  relocate(popDen, .after = pop) |>
  ungroup()

# Replace the abbreviated sub-region names with full names
srD$SUB_REGION <- fct_recode(srD$SUB_REGION,
  "East North Central" = "E N Cen",
  "East South Central" = "E S Cen",
  "Mid-Atlantic"       = "Mid Atl",
  "Mountain"           = "Mtn",
  "New England"        = "N Eng",
  "Pacific"            = "Pacific",
  "South Atlantic"     = "S Atl",
  "West North Central" = "W N Cen",
  "West South Central" = "W S Cen")
22.4 Make Table
22.4.1 Base Table
A gt table object is created using the gt() function. Simply passing the data to this function generates a table. We also set the font size using tab_options() to decrease the size of the entire table; we make further use of tab_options() later in the example. Although the table is functional, it can be greatly improved. In the remainder of this chapter, we work to improve this table as a means to learn more about gt.
srT <- srD |>
  gt() |>
  tab_options(table.font.size = 11)
srT
SUB_REGION | nCnty | area | pop | popDen | mInc | elev | precip | temp | pFor | pDev | pCrop |
---|---|---|---|---|---|---|---|---|---|---|---|
East North Central | 437 | 642130.1 | 47368533 | 73.76782 | 55310.0 | 243.0933 | 1000.8536 | 10.331616 | 27.23315 | 11.099251 | 39.515841 |
East South Central | 364 | 470929.0 | 19402234 | 41.19992 | 44335.5 | 187.2671 | 1412.2993 | 14.791583 | 49.12894 | 8.323377 | 9.821077 |
Mid-Atlantic | 150 | 262408.4 | 42492943 | 161.93438 | 59561.5 | 287.9525 | 1182.6845 | 9.242569 | 55.20378 | 13.734407 | 8.504453 |
Mountain | 281 | 2236621.5 | 24919150 | 11.14142 | 53790.0 | 1671.5016 | 413.3704 | 7.624000 | 18.24613 | 1.676929 | 5.871804 |
New England | 67 | 169530.3 | 15116205 | 89.16522 | 67365.0 | 147.0000 | 1253.2409 | 7.706078 | 65.94515 | 10.307294 | 1.097819 |
Pacific | 133 | 835063.4 | 51480760 | 61.64892 | 59427.0 | 641.9725 | 724.8609 | 10.462643 | 30.12613 | 5.567056 | 9.139728 |
South Atlantic | 584 | 702759.3 | 66026391 | 93.95306 | 49896.5 | 115.2225 | 1246.2440 | 15.977813 | 41.95959 | 12.385514 | 9.246431 |
West North Central | 618 | 1341099.0 | 21616921 | 16.11881 | 55388.0 | 383.5105 | 777.3056 | 9.586250 | 10.47021 | 4.712193 | 41.610768 |
West South Central | 470 | 1121618.2 | 40774139 | 36.35296 | 49180.0 | 202.6054 | 1043.6455 | 17.902725 | 17.55864 | 6.150014 | 14.567517 |
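gt() also accepts arguments that control table structure. As a sketch using a small made-up data frame (the division and region names are hypothetical), rowname_col moves a column into the table stub as row labels and groupname_col splits the rows into row groups:

```r
library(gt)

# Hypothetical data frame; the values are made up for illustration
toy <- data.frame(
  division = c("North", "North", "South"),
  region   = c("Region A", "Region B", "Region C"),
  pop      = c(1200, 3400, 2100)
)

toyT <- toy |>
  gt(rowname_col = "region",       # region values become row labels
     groupname_col = "division") |> # rows are grouped by division
  tab_options(table.font.size = 11)

# Inspect the generated HTML as a character string
html_txt <- as_raw_html(toyT, inline_css = FALSE)
```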
22.4.2 Title and Column Labels
First, we add a title using tab_header() and change the column names using cols_label(). Each step pipes the existing gt table object into new functions and assigns the result back to srT, so the table is built up incrementally. Note the use of the html() function to format the column names: it allows HTML tags, such as <br> for line breaks and <sup> for superscripts, to be used in the labels. Special symbols, such as the degree sign, can also be included. With these additions, the table now has much more meaningful column names.
srT <- srT |>
  tab_header(
    title = "Sub-Region Summary") |>
  cols_label(
    SUB_REGION = "Sub-Region",
    nCnty  = "# Counties",
    pop    = "Population",
    popDen = html("Population<br>Density<br>(per km<sup>2</sup>)"),
    mInc   = html("Median<br>Income"),
    area   = html("Area (km<sup>2</sup>)"),
    elev   = html("Elevation<br>(meters)"),
    precip = html("Total Annual<br>Precip (mm)"),
    temp   = html("Mean Annual<br>Temp. °C"),
    pFor   = "% Forest",
    pDev   = "% Developed",
    pCrop  = "% Cropland")
srT
Sub-Region Summary
Sub-Region | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
---|---|---|---|---|---|---|---|---|---|---|---|
East North Central | 437 | 642130.1 | 47368533 | 73.76782 | 55310.0 | 243.0933 | 1000.8536 | 10.331616 | 27.23315 | 11.099251 | 39.515841 |
East South Central | 364 | 470929.0 | 19402234 | 41.19992 | 44335.5 | 187.2671 | 1412.2993 | 14.791583 | 49.12894 | 8.323377 | 9.821077 |
Mid-Atlantic | 150 | 262408.4 | 42492943 | 161.93438 | 59561.5 | 287.9525 | 1182.6845 | 9.242569 | 55.20378 | 13.734407 | 8.504453 |
Mountain | 281 | 2236621.5 | 24919150 | 11.14142 | 53790.0 | 1671.5016 | 413.3704 | 7.624000 | 18.24613 | 1.676929 | 5.871804 |
New England | 67 | 169530.3 | 15116205 | 89.16522 | 67365.0 | 147.0000 | 1253.2409 | 7.706078 | 65.94515 | 10.307294 | 1.097819 |
Pacific | 133 | 835063.4 | 51480760 | 61.64892 | 59427.0 | 641.9725 | 724.8609 | 10.462643 | 30.12613 | 5.567056 | 9.139728 |
South Atlantic | 584 | 702759.3 | 66026391 | 93.95306 | 49896.5 | 115.2225 | 1246.2440 | 15.977813 | 41.95959 | 12.385514 | 9.246431 |
West North Central | 618 | 1341099.0 | 21616921 | 16.11881 | 55388.0 | 383.5105 | 777.3056 | 9.586250 | 10.47021 | 4.712193 | 41.610768 |
West South Central | 470 | 1121618.2 | 40774139 | 36.35296 | 49180.0 | 202.6054 | 1043.6455 | 17.902725 | 17.55864 | 6.150014 | 14.567517 |
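Column labels and titles can use Markdown through md() as well as HTML through html(). A minimal sketch on a made-up two-column data frame; as_raw_html() is used only to inspect the generated HTML:

```r
library(gt)

# Hypothetical two-column data frame
toy <- data.frame(area = c(10.5, 22.1), temp = c(7.6, 17.9))

toyT <- toy |>
  gt() |>
  tab_header(title = md("**Example** summary")) |>  # Markdown title
  cols_label(
    area = md("**Area**"),                    # Markdown: bold text
    temp = html("Mean Annual<br>Temp. °C")    # HTML: forced line break
  )

# Render to an HTML string (without CSS inlining) to check the labels
html_txt <- as_raw_html(toyT, inline_css = FALSE)
```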
22.4.3 Text Alignment
The tab_style() function allows for styling different table elements. Arguments to the style parameter define the style changes, while arguments to the locations parameter define which part(s) of the table to modify. Below, we center the table title, column labels, and cell data. We also convert the title and column name text to bold face.
srT <- srT |>
  # Center the title, column labels, and body cells
  tab_style(
    style = list(
      cell_text(align = "center")
    ),
    locations = list(cells_title(),
                     cells_column_labels(),
                     cells_body())
  ) |>
  # Bold the title and column labels
  tab_style(
    style = list(
      cell_text(weight = "bold")
    ),
    locations = list(cells_title(),
                     cells_column_labels())
  )
srT
Sub-Region Summary
Sub-Region | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
---|---|---|---|---|---|---|---|---|---|---|---|
East North Central | 437 | 642130.1 | 47368533 | 73.76782 | 55310.0 | 243.0933 | 1000.8536 | 10.331616 | 27.23315 | 11.099251 | 39.515841 |
East South Central | 364 | 470929.0 | 19402234 | 41.19992 | 44335.5 | 187.2671 | 1412.2993 | 14.791583 | 49.12894 | 8.323377 | 9.821077 |
Mid-Atlantic | 150 | 262408.4 | 42492943 | 161.93438 | 59561.5 | 287.9525 | 1182.6845 | 9.242569 | 55.20378 | 13.734407 | 8.504453 |
Mountain | 281 | 2236621.5 | 24919150 | 11.14142 | 53790.0 | 1671.5016 | 413.3704 | 7.624000 | 18.24613 | 1.676929 | 5.871804 |
New England | 67 | 169530.3 | 15116205 | 89.16522 | 67365.0 | 147.0000 | 1253.2409 | 7.706078 | 65.94515 | 10.307294 | 1.097819 |
Pacific | 133 | 835063.4 | 51480760 | 61.64892 | 59427.0 | 641.9725 | 724.8609 | 10.462643 | 30.12613 | 5.567056 | 9.139728 |
South Atlantic | 584 | 702759.3 | 66026391 | 93.95306 | 49896.5 | 115.2225 | 1246.2440 | 15.977813 | 41.95959 | 12.385514 | 9.246431 |
West North Central | 618 | 1341099.0 | 21616921 | 16.11881 | 55388.0 | 383.5105 | 777.3056 | 9.586250 | 10.47021 | 4.712193 | 41.610768 |
West South Central | 470 | 1121618.2 | 40774139 | 36.35296 | 49180.0 | 202.6054 | 1043.6455 | 17.902725 | 17.55864 | 6.150014 | 14.567517 |
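The location helpers also accept column and row selections, which makes conditional styling possible. As a sketch with hypothetical values, the following shades and bolds only the population cells in rows where the population exceeds 40 million:

```r
library(gt)

# Hypothetical values for illustration
toy <- data.frame(
  region = c("A", "B", "C"),
  pop    = c(1.2e6, 5.4e7, 3.1e7)
)

toyT <- toy |>
  gt() |>
  tab_style(
    # cell_fill() sets the background; cell_text() changes the text
    style = list(cell_fill(color = "#FFE8D3"),
                 cell_text(weight = "bold")),
    # Only pop cells in rows where pop is greater than 40 million
    locations = cells_body(columns = pop, rows = pop > 4e7)
  )
```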
22.4.4 Number Formatting
We can now format the data. gt includes a variety of functions for formatting data; see its documentation for a full list. In our example, we use fmt_percent() to format the percent forest, developed, and crop columns; fmt_number() to format the population, population density, area, elevation, temperature, and precipitation data; and fmt_currency() for the median income data. All of these functions allow for specifying the number of decimal places. Because the percentages are already stored on a 0 to 100 scale, fmt_percent() is called with scale_values = FALSE so that the values are not multiplied by 100. Other formatting functions include fmt_integer(), fmt_scientific(), fmt_engineering(), fmt_fraction(), fmt_roman(), fmt_spelled_num(), fmt_date(), fmt_time(), fmt_datetime(), fmt_bins(), fmt_markdown(), fmt_url(), and fmt_icon(). Different formatting functions can be applied to the same column, for example by targeting different rows with the rows argument. You can also define custom formatting using fmt().
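srT <- srT |>
  # Percentages are stored as 0-100, so do not multiply by 100
  fmt_percent(
    columns = c("pFor",
                "pDev",
                "pCrop"),
    decimals = 1,
    scale_values = FALSE) |>
  # Whole numbers with thousands separators
  fmt_number(
    columns = c("pop",
                "area",
                "elev",
                "precip"),
    decimals = 0,
    use_seps = TRUE
  ) |>
  # One decimal place
  fmt_number(
    columns = c("popDen",
                "temp"),
    decimals = 1,
    use_seps = TRUE
  ) |>
  # US dollars (the default currency), no cents
  fmt_currency(
    columns = c("mInc"),
    decimals = 0
  )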
srT
Sub-Region Summary
Sub-Region | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
---|---|---|---|---|---|---|---|---|---|---|---|
East North Central | 437 | 642,130 | 47,368,533 | 73.8 | $55,310 | 243 | 1,001 | 10.3 | 27.2% | 11.1% | 39.5% |
East South Central | 364 | 470,929 | 19,402,234 | 41.2 | $44,336 | 187 | 1,412 | 14.8 | 49.1% | 8.3% | 9.8% |
Mid-Atlantic | 150 | 262,408 | 42,492,943 | 161.9 | $59,562 | 288 | 1,183 | 9.2 | 55.2% | 13.7% | 8.5% |
Mountain | 281 | 2,236,621 | 24,919,150 | 11.1 | $53,790 | 1,672 | 413 | 7.6 | 18.2% | 1.7% | 5.9% |
New England | 67 | 169,530 | 15,116,205 | 89.2 | $67,365 | 147 | 1,253 | 7.7 | 65.9% | 10.3% | 1.1% |
Pacific | 133 | 835,063 | 51,480,760 | 61.6 | $59,427 | 642 | 725 | 10.5 | 30.1% | 5.6% | 9.1% |
South Atlantic | 584 | 702,759 | 66,026,391 | 94.0 | $49,897 | 115 | 1,246 | 16.0 | 42.0% | 12.4% | 9.2% |
West North Central | 618 | 1,341,099 | 21,616,921 | 16.1 | $55,388 | 384 | 777 | 9.6 | 10.5% | 4.7% | 41.6% |
West South Central | 470 | 1,121,618 | 40,774,139 | 36.4 | $49,180 | 203 | 1,044 | 17.9 | 17.6% | 6.2% | 14.6% |
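The vec_fmt_*() companions of the formatting functions apply the same formatting to plain vectors, which is a convenient way to check what a given set of arguments produces. The sketch below reuses values from the sub-region table to illustrate scale_values in percent formatting; output = "plain" requests text without HTML markup.

```r
library(gt)

# pFor stores percentages on a 0-100 scale, so scale_values = FALSE;
# a 0-1 proportion would use the default scale_values = TRUE
from_percent    <- vec_fmt_percent(27.23315, decimals = 1,
                                   scale_values = FALSE, output = "plain")
from_proportion <- vec_fmt_percent(0.2723315, decimals = 1,
                                   output = "plain")
# both give "27.2%"

# Thousands separators and currency, as used for pop and mInc
pop_txt <- vec_fmt_number(47368533, decimals = 0, use_seps = TRUE,
                          output = "plain")
inc_txt <- vec_fmt_currency(55310, decimals = 0, output = "plain")
```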
22.4.5 Spanners
Spanners are used to group columns under a shared label. Below, we group the variables into three categories: “Environmental”, “General”, and “Population”. We also apply custom styles to the spanners using tab_style().
srT <- srT |>
  tab_spanner(
    label = "Environmental",
    columns = c("elev",
                "precip",
                "temp",
                "pFor",
                "pDev",
                "pCrop")
  ) |>
  tab_spanner(
    label = "General",
    columns = c("nCnty",
                "area")
  ) |>
  tab_spanner(
    label = "Population",
    columns = c("pop",
                "popDen",
                "mInc")
  ) |>
  # Style all spanner labels: centered, italic, maroon text
  tab_style(
    style = list(
      cell_text(align = "center",
                style = "italic",
                color = "#812F33")
    ),
    locations = cells_column_spanners()
  )
srT
Sub-Region Summary
Sub-Region | General | | Population | | | Environmental | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
| | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
East North Central | 437 | 642,130 | 47,368,533 | 73.8 | $55,310 | 243 | 1,001 | 10.3 | 27.2% | 11.1% | 39.5% |
East South Central | 364 | 470,929 | 19,402,234 | 41.2 | $44,336 | 187 | 1,412 | 14.8 | 49.1% | 8.3% | 9.8% |
Mid-Atlantic | 150 | 262,408 | 42,492,943 | 161.9 | $59,562 | 288 | 1,183 | 9.2 | 55.2% | 13.7% | 8.5% |
Mountain | 281 | 2,236,621 | 24,919,150 | 11.1 | $53,790 | 1,672 | 413 | 7.6 | 18.2% | 1.7% | 5.9% |
New England | 67 | 169,530 | 15,116,205 | 89.2 | $67,365 | 147 | 1,253 | 7.7 | 65.9% | 10.3% | 1.1% |
Pacific | 133 | 835,063 | 51,480,760 | 61.6 | $59,427 | 642 | 725 | 10.5 | 30.1% | 5.6% | 9.1% |
South Atlantic | 584 | 702,759 | 66,026,391 | 94.0 | $49,897 | 115 | 1,246 | 16.0 | 42.0% | 12.4% | 9.2% |
West North Central | 618 | 1,341,099 | 21,616,921 | 16.1 | $55,388 | 384 | 777 | 9.6 | 10.5% | 4.7% | 41.6% |
West South Central | 470 | 1,121,618 | 40,774,139 | 36.4 | $49,180 | 203 | 1,044 | 17.9 | 17.6% | 6.2% | 14.6% |
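The columns argument of tab_spanner() accepts tidyselect helpers as well as character vectors. A sketch on a hypothetical one-row data frame:

```r
library(gt)

# Hypothetical one-row data frame with made-up values
toy <- data.frame(
  region = "A", nCnty = 10, area = 500,
  pop = 2e5, popDen = 400, elev = 250, temp = 9.5
)

toyT <- toy |>
  gt() |>
  # A column range: elev:temp selects elev through temp
  tab_spanner(label = "Environmental", columns = elev:temp) |>
  # A selection helper: starts_with("pop") picks up pop and popDen
  tab_spanner(label = "Population", columns = starts_with("pop"))

html_txt <- as_raw_html(toyT, inline_css = FALSE)
```

In srD, a selector such as starts_with("p") would be too broad, since it matches pop and popDen as well as pFor, pDev, and pCrop; a range such as elev:pCrop selects exactly the environmental columns there.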
22.4.6 Using tab_options()
The tab_options() function provides a wide variety of formatting options for different components of the table. Below, we mainly use this function to change the color of the table borders and horizontal lines. See the function's documentation for a full list of modifications that can be applied.
srT <- srT |>
  tab_options(
    table_body.hlines.color = "#812F33",
    table.border.top.color = "#812F33",
    table.border.bottom.color = "#812F33",
    table_body.border.bottom.color = "#812F33",
    heading.border.bottom.color = "#812F33",
    column_labels.border.top.color = "#812F33",
    column_labels.border.bottom.color = "#812F33",
    table.font.size = 11
  )
srT
Sub-Region Summary
Sub-Region | General | | Population | | | Environmental | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
| | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
East North Central | 437 | 642,130 | 47,368,533 | 73.8 | $55,310 | 243 | 1,001 | 10.3 | 27.2% | 11.1% | 39.5% |
East South Central | 364 | 470,929 | 19,402,234 | 41.2 | $44,336 | 187 | 1,412 | 14.8 | 49.1% | 8.3% | 9.8% |
Mid-Atlantic | 150 | 262,408 | 42,492,943 | 161.9 | $59,562 | 288 | 1,183 | 9.2 | 55.2% | 13.7% | 8.5% |
Mountain | 281 | 2,236,621 | 24,919,150 | 11.1 | $53,790 | 1,672 | 413 | 7.6 | 18.2% | 1.7% | 5.9% |
New England | 67 | 169,530 | 15,116,205 | 89.2 | $67,365 | 147 | 1,253 | 7.7 | 65.9% | 10.3% | 1.1% |
Pacific | 133 | 835,063 | 51,480,760 | 61.6 | $59,427 | 642 | 725 | 10.5 | 30.1% | 5.6% | 9.1% |
South Atlantic | 584 | 702,759 | 66,026,391 | 94.0 | $49,897 | 115 | 1,246 | 16.0 | 42.0% | 12.4% | 9.2% |
West North Central | 618 | 1,341,099 | 21,616,921 | 16.1 | $55,388 | 384 | 777 | 9.6 | 10.5% | 4.7% | 41.6% |
West South Central | 470 | 1,121,618 | 40,774,139 | 36.4 | $49,180 | 203 | 1,044 | 17.9 | 17.6% | 6.2% | 14.6% |
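When the same tab_options() settings are needed for several tables, they can be wrapped in a small function. The theme_maroon() helper below is hypothetical (it is not part of gt) and simply applies the border settings from this section:

```r
library(gt)

# Hypothetical helper bundling this section's border and font settings
theme_maroon <- function(tbl, col = "#812F33", size = 11) {
  tab_options(
    tbl,
    table_body.hlines.color = col,
    table.border.top.color = col,
    table.border.bottom.color = col,
    table_body.border.bottom.color = col,
    heading.border.bottom.color = col,
    column_labels.border.top.color = col,
    column_labels.border.bottom.color = col,
    table.font.size = size
  )
}

# Apply the helper to a made-up table
toyT <- data.frame(x = 1:3, y = c(2.5, 3.1, 4.8)) |>
  gt() |>
  theme_maroon()
```

Because the helper takes and returns a gt table, it fits into a pipe like any other gt function.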
22.4.7 Colorizing Data
It is possible to colorize cells based on their values using data_color(). In our example, we define (1) the column to be manipulated, (2) the color palette to use, and (3) the domain, which defines the range of values over which the color ramp is applied. We colorize the cover type percentage columns, and each domain is defined from the range of values stored in the associated column.
srT <- srT |>
  # fn takes a function that maps values to colors (it replaces the
  # deprecated colors argument); each domain comes from srD because
  # srT is a gt object, so srT$pFor would be NULL
  data_color(
    columns = c("pFor"),
    fn = col_numeric(
      palette = c("#edf8e9", "#005a32"),
      domain = srD$pFor
    )
  ) |>
  data_color(
    columns = c("pDev"),
    fn = col_numeric(
      palette = c("#fee5d9", "#fc9272"),
      domain = srD$pDev
    )
  ) |>
  data_color(
    columns = c("pCrop"),
    fn = col_numeric(
      palette = c("#FFE8D3", "#DC8638"),
      domain = srD$pCrop
    )
  )
srT
Sub-Region Summary
Sub-Region | General | | Population | | | Environmental | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
| | # Counties | Area (km²) | Population | Population Density (per km²) | Median Income | Elevation (meters) | Total Annual Precip (mm) | Mean Annual Temp. °C | % Forest | % Developed | % Cropland |
East North Central | 437 | 642,130 | 47,368,533 | 73.8 | $55,310 | 243 | 1,001 | 10.3 | 27.2% | 11.1% | 39.5% |
East South Central | 364 | 470,929 | 19,402,234 | 41.2 | $44,336 | 187 | 1,412 | 14.8 | 49.1% | 8.3% | 9.8% |
Mid-Atlantic | 150 | 262,408 | 42,492,943 | 161.9 | $59,562 | 288 | 1,183 | 9.2 | 55.2% | 13.7% | 8.5% |
Mountain | 281 | 2,236,621 | 24,919,150 | 11.1 | $53,790 | 1,672 | 413 | 7.6 | 18.2% | 1.7% | 5.9% |
New England | 67 | 169,530 | 15,116,205 | 89.2 | $67,365 | 147 | 1,253 | 7.7 | 65.9% | 10.3% | 1.1% |
Pacific | 133 | 835,063 | 51,480,760 | 61.6 | $59,427 | 642 | 725 | 10.5 | 30.1% | 5.6% | 9.1% |
South Atlantic | 584 | 702,759 | 66,026,391 | 94.0 | $49,897 | 115 | 1,246 | 16.0 | 42.0% | 12.4% | 9.2% |
West North Central | 618 | 1,341,099 | 21,616,921 | 16.1 | $55,388 | 384 | 777 | 9.6 | 10.5% | 4.7% | 41.6% |
West South Central | 470 | 1,121,618 | 40,774,139 | 36.4 | $49,180 | 203 | 1,044 | 17.9 | 17.6% | 6.2% | 14.6% |
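To see how the palette and domain interact, the color function from scales can be called directly. A sketch assuming a fixed 0 to 100 domain (a choice made here for illustration; the chapter's code uses each column's observed values):

```r
library(scales)

# col_numeric() returns a function that maps numbers to hex colors,
# interpolating between the palette endpoints across the domain
pal <- col_numeric(palette = c("#edf8e9", "#005a32"), domain = c(0, 100))

cover <- c(0, 25, 50, 100)
cols  <- pal(cover)  # one hex color per value, lightest at 0
```

A shared 0 to 100 domain gives equal percentages the same intensity regardless of column, whereas per-column domains stretch each ramp across that column's observed values, emphasizing differences within a column.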
22.4.8 Footnotes
It is also possible to add footnotes tied to specific columns using tab_footnote(), a table caption using tab_caption(), and data source information using tab_source_note(). Below, we demonstrate the creation of footnotes.
srT <- srT |>
  tab_footnote(
    footnote = "Data from the National Land Cover Database (NLCD)",
    locations = cells_column_labels(columns = c("pFor",
                                                "pDev",
                                                "pCrop")),
    placement = "left"
  ) |>
  tab_footnote(
    footnote = "Data from US Census",
    locations = cells_column_labels(columns = c("pop",
                                                "mInc")),
    placement = "left"
  ) |>
  tab_footnote(
    footnote = "Data from PRISM",
    locations = cells_column_labels(columns = c("elev",
                                                "precip",
                                                "temp")),
    placement = "left"
  )
srT
Sub-Region Summary
Sub-Region | General | | Population | | | Environmental | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
| | # Counties | Area (km²) | Population¹ | Population Density (per km²) | Median Income¹ | Elevation (meters)² | Total Annual Precip (mm)² | Mean Annual Temp. °C² | % Forest³ | % Developed³ | % Cropland³ |
East North Central | 437 | 642,130 | 47,368,533 | 73.8 | $55,310 | 243 | 1,001 | 10.3 | 27.2% | 11.1% | 39.5% |
East South Central | 364 | 470,929 | 19,402,234 | 41.2 | $44,336 | 187 | 1,412 | 14.8 | 49.1% | 8.3% | 9.8% |
Mid-Atlantic | 150 | 262,408 | 42,492,943 | 161.9 | $59,562 | 288 | 1,183 | 9.2 | 55.2% | 13.7% | 8.5% |
Mountain | 281 | 2,236,621 | 24,919,150 | 11.1 | $53,790 | 1,672 | 413 | 7.6 | 18.2% | 1.7% | 5.9% |
New England | 67 | 169,530 | 15,116,205 | 89.2 | $67,365 | 147 | 1,253 | 7.7 | 65.9% | 10.3% | 1.1% |
Pacific | 133 | 835,063 | 51,480,760 | 61.6 | $59,427 | 642 | 725 | 10.5 | 30.1% | 5.6% | 9.1% |
South Atlantic | 584 | 702,759 | 66,026,391 | 94.0 | $49,897 | 115 | 1,246 | 16.0 | 42.0% | 12.4% | 9.2% |
West North Central | 618 | 1,341,099 | 21,616,921 | 16.1 | $55,388 | 384 | 777 | 9.6 | 10.5% | 4.7% | 41.6% |
West South Central | 470 | 1,121,618 | 40,774,139 | 36.4 | $49,180 | 203 | 1,044 | 17.9 | 17.6% | 6.2% | 14.6% |

¹ Data from US Census
² Data from PRISM
³ Data from the National Land Cover Database (NLCD)
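Captions, source notes, and cell-level footnotes follow the same piping pattern. A sketch on made-up values:

```r
library(gt)

toyT <- data.frame(region = c("A", "B"), pop = c(1200, 3400)) |>
  gt() |>
  # A caption describes the table as a whole
  tab_caption(caption = "Hypothetical population counts") |>
  # Source notes appear in the table footer without footnote marks
  tab_source_note(source_note = "Values are made up for illustration") |>
  # Footnotes can also target individual body cells
  tab_footnote(
    footnote = "Provisional estimate",
    locations = cells_body(columns = pop, rows = 2)
  )

html_txt <- as_raw_html(toyT, inline_css = FALSE)
```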
22.5 Render to table
Lastly, the gt table object can be rendered to an output file. The gtsave() function saves a table as an image (PNG), a PDF, or an HTML file that can be opened in a web browser, with the format determined by the file extension. As demonstrated throughout this chapter, tables are also rendered as part of the output when a Quarto document is rendered to a product, such as a webpage or PDF.
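A minimal sketch of saving a table with gtsave(), using a small made-up table. The PNG and PDF lines are commented out because they depend on the webshot2 package and a Chrome or Chromium browser being available:

```r
library(gt)

tbl <- data.frame(region = c("A", "B"), pop = c(1200, 3400)) |>
  gt() |>
  tab_header(title = "Example Table")

# The output format is inferred from the file extension;
# the HTML file is written to a temporary directory here
out_dir <- tempdir()
gtsave(tbl, filename = "example.html", path = out_dir)

# PNG and PDF output rely on webshot2 and a Chrome/Chromium install
# gtsave(tbl, filename = "example.png", path = out_dir)
# gtsave(tbl, filename = "example.pdf", path = out_dir)

# as_raw_html() returns the table's HTML as a character string,
# which is useful for embedding the table in another page
html_txt <- as_raw_html(tbl, inline_css = FALSE)
```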
22.6 Concluding Remarks
This was a fairly brief chapter that introduced the functionality of the gt package for generating publication- or presentation-quality tables. Combined with ggplot2, this allows for generating quality graphs and tables in R. Such output can greatly enhance your data presentation. We recommend checking out the gt documentation for more examples and to further explore the package functionality.
22.7 Questions
- In a gt table, explain the purpose of a spanner.
- In a gt table, explain the purpose of a header.
- In a gt table, explain the purpose of a row group.
- Explain the difference between a footnote, source note, and caption as implemented within gt.
- Explain the use of the following formatting functions: fmt_engineering(), fmt_scientific(), fmt_units(), and fmt_spelled_num().
- Explain the use of the following formatting functions: fmt_url(), fmt_icon(), and fmt_email().
- Explain the use of the following gt functions: cols_align_decimal(), cols_merge(), and cols_width().
- What is the purpose of the md(), html(), and latex() helper functions?
22.8 Exercises
You have been provided with an urban tree dataset for Portland, Oregon in the exercise folder for the chapter. These data are available here.
Task 1
Complete the following preprocessing tasks using the tidyverse.
- Read in portlandTrees.csv
- Select out the following columns: “Common_nam”, “Genus_spec”, “Family”, “Genus”, “DBH”, “TreeHeight”, “CrownWidth”, “CrownBaseH”, and “Condition”
- Convert all character columns to factors
- Rename the columns as follows:
- “Common_nam” = “Common”
- “Genus_spec” = “Scientific”
- “Family” = “Family”
- “Genus” = “Genus”
- “DBH” = “DBH”
- “TreeHeight” = “Height”
- “CrownWidth” = “CrownWidth”
- “CrownBaseH” = “CrownBaseHeight”
- “Condition” = “Condition”
- Remove all trees from the dataset that have a condition of “Dead”
- Create a new common name column and lump all species that have fewer than 200 samples in the dataset into an “Other” class
- Drop any rows with missing data
- Filter out only trees from the following genera: Quercus, Acer, Ulmus, Pinus, Pseudotsuga, and Fagus
Task 2
Aggregate the data as follows:
- For each genus and common name combination calculate the following:
- Count of trees
- Mean DBH
- Mean Height
- Mean crown width
- Mean crown base height
- Count the number of trees of each species in each condition class (hint: this can be accomplished using count() and pivot_wider())
- Merge the results from the last two steps into a single table
Task 3
Use gt to create a table that meets the following criteria using the results from the prior two tasks (note: the process above should yield a grouped tibble, so the genus will automatically be treated as row groups).
- Contains the following columns with the names specified and in the order specified: Common Name, Count, Height, DBH (in), Crown Width (ft), Crown Base Height (ft), Poor, Fair, Good; break the column labels across multiple lines to save space; the Poor, Fair, and Good columns should represent condition counts
- Add the title “Portland Park Trees”
- Add a spanner over the Height, DBH, Crown Width, and Crown Base Height columns with a label of “Size”
- Add a spanner over the Poor, Fair, and Good columns with a label of “Condition”
- Format the height, DBH, CrownWidth, and CrownBaseHeight columns as numbers with a comma thousand separator and one decimal value
- Format the Count, Poor, Fair, and Good columns as numbers with a comma thousand separator and zero decimal values
- Make the column labels bold and centered
- Make the table title bold and centered
- Center all cell content other than the common name
- Make the common name italic
- Make the row groups a different color and italicize the text