Customizing tables

The standard tmtyro workflow, with functions like add_vocabulary() and add_sentiment(), works easily with tabulize() to generate clean, useful tables that communicate results effectively. These tables are designed to help users focus on their work without needing to worry about formatting, presentation, or code. For those advancing beyond the tyro stage, learning to customize this output or even to create tables from scratch can be a valuable next step.

Starting from tabulize()

The tables tmtyro creates offer a good starting point for anyone interested in learning more about gt and related packages. Because tabulize() returns standard gt tables, its output can be modified using standard functions from that package or from extension packages like gtExtras.

Alignment

By default, tabulize() centers text columns such as doc_id. To adjust this alignment, use gt’s cols_align() function.

library(dplyr)
library(gt)
library(tmtyro)
corpus_dubliners <- get_gutenberg_corpus(2814) |> 
  load_texts() |> 
  identify_by(part) |> 
  standardize_titles() |> 
  select(doc_id, word)

# Choose just 5 stories
some_docs <- unique(corpus_dubliners$doc_id)[c(1:3, 12, 15)]

corpus_dubliners <- corpus_dubliners |> 
  filter(doc_id %in% some_docs)

# tabulize() typically centers text columns
corpus_dubliners |> 
  tabulize()

	words
The Sisters	3,113
An Encounter	3,257
Araby	2,345
Ivy Day in the Committee Room	5,249
The Dead	15,731


# cols_align() adjusts alignment
corpus_dubliners |> 
  tabulize() |> 
  cols_align(
    align = "left",
    columns = doc_id)

	words
The Sisters	3,113
An Encounter	3,257
Araby	2,345
Ivy Day in the Committee Room	5,249
The Dead	15,731
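The same function can set alignment for several columns at once. The following is a minimal sketch on a small hand-built data frame rather than a tabulize() table, so it runs without downloading a corpus. The object names counts and aligned are placeholders introduced here.

```r
library(gt)

# Small stand-in table; the counts match values shown elsewhere in this article
counts <- data.frame(
  doc_id = c("The Sisters", "The Sisters", "An Encounter"),
  word = c("the", "and", "the"),
  n = c(171, 118, 181)
)

aligned <- counts |>
  gt() |>
  # c() selects more than one column in a single call
  cols_align(align = "left", columns = c(doc_id, word)) |>
  cols_align(align = "right", columns = n)

aligned
```

Because tabulize() returns a gt table, the same cols_align() calls should apply to its output as well.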

Themes

Outputs can be highly customized using themes built into packages like gtExtras.

library(gtExtras)
corpus_dubliners |> 
  tabulize() |> 
  gt_theme_excel()

	words
The Sisters	3,113
An Encounter	3,257
Araby	2,345
Ivy Day in the Committee Room	5,249
The Dead	15,731

Many theme options are available, adjusting coloring, font face, and text size. They’re easy to add with functions beginning gt_theme_...():

corpus_dubliners |> 
  tabulize() |> 
  gt_theme_538()

	words
The Sisters	3,113
An Encounter	3,257
Araby	2,345
Ivy Day in the Committee Room	5,249
The Dead	15,731

For more theme options, see the gtExtras documentation online.
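For anyone who prefers not to install an extension package, gt itself ships a small set of preset looks through opt_stylize(). The sketch below assumes a reasonably recent version of gt and uses a hand-built data frame in place of a tabulize() table. The style number and color are arbitrary choices.

```r
library(gt)

# Stand-in for a table of word counts
stories <- data.frame(
  doc_id = c("The Sisters", "Araby", "The Dead"),
  words = c(3113, 2345, 15731)
)

styled <- stories |>
  gt() |>
  fmt_integer(words) |>
  # style takes a number from 1 to 6; color picks one of gt's preset palettes
  opt_stylize(style = 3, color = "gray")

styled
```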

Titles and Summary Rows

The examples shown here barely scratch the surface of options available with gt. Summary rows, added with grand_summary_rows(), make it easy to share corpus statistics. Titles and subtitles, added with tab_header(), help clarify conclusions and main takeaways:

corpus_dubliners |> 
  tabulize() |>
  cols_align("left", columns = doc_id) |> 
  grand_summary_rows(
    fns = list("avg" ~ mean(.) |> 
                 scales::label_comma(accuracy = 0.1)()), 
    columns = "n") |> 
  tab_header(
    title = md("Word counts in *Dubliners* stories"),
    subtitle = "“The Dead” is more than 2.5 times the average length.") |> 
  opt_align_table_header("left")

		words
Word counts in Dubliners stories
“The Dead” is more than 2.5 times the average length.
	The Sisters	3,113
	An Encounter	3,257
	Araby	2,345
	Ivy Day in the Committee Room	5,249
	The Dead	15,731
avg	—	5,939.0
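Figures quoted in a title or subtitle are worth checking before they go into the table. The sketch below rebuilds the word counts by hand, computes the average and the ratio for “The Dead”, and adds two summary rows in one call. The fmt argument assumes a recent gt release, and the names stories, avg_words, and dead_ratio are placeholders introduced here.

```r
library(gt)

# Word counts copied from the table above
stories <- data.frame(
  doc_id = c("The Sisters", "An Encounter", "Araby",
             "Ivy Day in the Committee Room", "The Dead"),
  n = c(3113, 3257, 2345, 5249, 15731)
)

# Check the arithmetic behind any claim made in a subtitle
avg_words <- mean(stories$n)
dead_ratio <- stories$n[stories$doc_id == "The Dead"] / avg_words

summary_tbl <- stories |>
  gt() |>
  fmt_integer(n) |>
  grand_summary_rows(
    columns = n,
    # Each formula pairs a row label with an aggregation
    fns = list(
      "avg" ~ mean(.),
      "max" ~ max(.)),
    # One formatter applied to every summary row
    fmt = ~ fmt_number(., decimals = 1))

round(dead_ratio, 2)
summary_tbl
```

Here the average comes to 5,939 words, and “The Dead” is roughly 2.65 times that length.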

Going further

Combining these methods with those explained in greater depth in gt’s documentation can allow for truly customized tables. The default table, for instance, is functional but not necessarily pretty. Customization makes it possible to aim for something clean like this:

corpus_dubliners |> 
  tabulize() |> 
  tab_style(
    style = cell_borders(
      sides = "all", 
      color = NULL),
    locations = cells_body()) |> 
  tab_style(
    style = cell_text(size = pct(70)),
    locations = cells_column_labels()
  ) |> 
  cols_align(
    align = "right",
    columns = doc_id) |> 
  opt_css(
    css = ".gt_col_headings {border-bottom-color: #FFFFFF !important;}"
  )

	words
The Sisters	3,113
An Encounter	3,257
Araby	2,345
Ivy Day in the Committee Room	5,249
The Dead	15,731
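Much of the same effect can be approached with tab_options(), which exposes many border and font settings as named arguments. This is a sketch rather than an exact equivalent of the code above. It uses a hand-built data frame, and the rendered result may differ in small details.

```r
library(gt)

# Stand-in for a table of word counts
stories <- data.frame(
  doc_id = c("The Sisters", "Araby", "The Dead"),
  words = c(3113, 2345, 15731)
)

minimal_tbl <- stories |>
  gt() |>
  fmt_integer(words) |>
  cols_label(doc_id = "") |>
  tab_options(
    # Drop the rules around and inside the table
    table.border.top.style = "none",
    table.border.bottom.style = "none",
    table_body.hlines.style = "none",
    table_body.border.bottom.style = "none",
    column_labels.border.top.style = "none",
    # Blend the rule under the headers into a white background
    column_labels.border.bottom.color = "white",
    # Shrink the column labels
    column_labels.font.size = pct(70))

minimal_tbl
```

Named options tend to be easier to read and maintain than raw CSS, while opt_css() remains available for anything tab_options() does not cover.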

Starting with gt

tmtyro’s tabulize() works only within the standard workflow built on functions like add_vocabulary() and add_frequency(). Similar tables can be prepared manually with packages like gt or tinytable. A few methods for creating and modifying gt tables are shown below; more can be found in the package documentation.

Corpus details

By default, a corpus prepared by tmtyro will tabulize() into a table showing word counts for each document. A simple version of this can be prepared by hand with very little effort:

gt_details <- corpus_dubliners |> 
  count(doc_id) |> 
  gt()

gt_details

doc_id	n
The Sisters	3113
An Encounter	3257
Araby	2345
Ivy Day in the Committee Room	5249
The Dead	15731

Once the table is prepared, gt allows for further tweaking—for instance, to format word counts for readability, hide the doc_id column header, and rename n as words:

gt_details |> 
  fmt_integer(n) |> 
  cols_label(
    doc_id = "",
    n = "words")

	words
The Sisters	3,113
An Encounter	3,257
Araby	2,345
Ivy Day in the Committee Room	5,249
The Dead	15,731

Word frequencies

The standard tmtyro workflow for a polished table of high-frequency words, add_frequency() |> tabulize(), shows a few of the most-used words in each document. To build a similar table by hand, use get_frequency() inside mutate() to add a column of word counts, then chain distinct() |> slice_max() to reduce the result to a summary table. Once it’s ready, gt() will do the rest.

dubliners_count <- corpus_dubliners |>
  group_by(doc_id) |> 
  mutate(
    n = get_frequency(word)) |> 
  ungroup() |> 
  distinct() |> 
  slice_max(
    order_by = n, 
    by = doc_id, 
    n = 3) # show three words each

gt_counts <- dubliners_count |> 
  # limit to three stories for a shorter display
  filter(doc_id %in% c("The Sisters", "An Encounter", "The Dead")) |> 
  gt()

gt_counts

doc_id	word	n
The Sisters	the	171
The Sisters	and	118
The Sisters	to	94
An Encounter	the	181
An Encounter	and	107
An Encounter	he	101
The Dead	the	866
The Dead	and	570
The Dead	of	396
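To see what this chain is doing, a roughly equivalent summary can be sketched with dplyr alone. This assumes that get_frequency() amounts to counting each word within its document. The toy corpus is invented for illustration, and with_ties = FALSE is added so that exactly two rows come back per document.

```r
library(dplyr)

# Toy corpus, invented for illustration: one row per word
toy_corpus <- tibble(
  doc_id = rep(c("A", "B"), times = c(6, 5)),
  word = c("the", "the", "the", "cat", "cat", "sat",
           "a", "a", "dog", "dog", "dog")
)

top_words <- toy_corpus |>
  # One row per document-word pair, with its count in n
  count(doc_id, word) |>
  # Keep the two most frequent words in each document
  slice_max(
    order_by = n,
    by = doc_id,
    n = 2,
    with_ties = FALSE)

top_words
```

The resulting data frame has the same shape as dubliners_count, so it can be passed to gt() in the same way.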

The cols_label() function from gt can adjust headers, and tmtyro’s collapse_rows() function hides repeated values in a column:

gt_counts |> 
  cols_label(doc_id = "") |> 
  collapse_rows(doc_id)

	word	n
The Sisters	the	171
The Sisters	and	118
The Sisters	to	94
An Encounter	the	181
An Encounter	and	107
An Encounter	he	101
The Dead	the	866
The Dead	and	570
The Dead	of	396

Choosing to adjust things manually introduces a steeper learning curve, but it also allows for greater customization:

dubliners_count |> 
  filter(doc_id %in% c("The Sisters", "An Encounter", "The Dead")) |> 
  gt(groupname_col = "doc_id") |> 
  cols_label(
    word = "") |> 
  data_color(columns = n, palette = "PuBuGn") |> 
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_row_groups())

	n
The Sisters
the	171
and	118
to	94
An Encounter
the	181
and	107
he	101
The Dead
the	866
and	570
of	396

Dictionary matches, including for sentiment, follow the same pattern.
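As a sketch of that pattern, the example below counts sentiment labels per document and groups the table by document. The data frame and its sentiment column are hypothetical stand-ins for whatever a dictionary-matching step produces; only the count-then-gt() shape is the point.

```r
library(dplyr)
library(gt)

# Hypothetical matches: one row per matched word, labeled by sentiment
matches <- tibble(
  doc_id = c(rep("Story A", 4), rep("Story B", 3)),
  sentiment = c("positive", "negative", "negative", "positive",
                "negative", "negative", "positive")
)

# Summarize first, as with word frequencies
sentiment_counts <- matches |>
  count(doc_id, sentiment)

gt_sentiment <- sentiment_counts |>
  gt(groupname_col = "doc_id") |>
  cols_label(sentiment = "") |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_row_groups())

gt_sentiment
```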

Vocabulary richness

A similar manual workflow can be used to prepare tables of vocabulary richness. Without customization, gt() prepares a table that isn’t as clear as it could be:

dubliners_vocab <- corpus_dubliners |> 
  filter(doc_id %in% c("The Sisters", "An Encounter", "The Dead")) |> 
  group_by(doc_id) |> 
  summarize(
    words = n(),
    vocab_count = sum(is_new(word)),
    ttr = last(get_ttr(word)),
    hapax_count = sum(is_hapax(word)),
    htr = last(get_hir(word))) |> 
  ungroup()

gt_vocab <- dubliners_vocab |> 
  gt()

gt_vocab

doc_id	words	vocab_count	ttr	hapax_count	htr
The Sisters	3113	903	0.2900739	552	0.17732091
An Encounter	3257	980	0.3008904	620	0.19035923
The Dead	15731	2746	0.1745598	1557	0.09897654
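The ratio columns are easier to read once the arithmetic behind them is clear. The sketch below computes both ratios in base R for a toy vector of words, then checks the same division against the counts printed for “The Sisters”. It assumes that ttr is distinct words divided by total words and that htr is single-use words divided by total words, which is consistent with the figures in the table.

```r
# Toy token vector, invented for illustration
words <- c("the", "cat", "and", "the", "dog", "and", "the", "bird")

tokens <- length(words)           # total words: 8
types <- length(unique(words))    # distinct words: 5
hapax <- sum(table(words) == 1)   # words used exactly once: 3

ttr <- types / tokens             # type-token ratio
htr <- hapax / tokens             # hapax-token ratio

# The same division applied to the counts shown for "The Sisters"
sisters_ttr <- 903 / 3113
sisters_htr <- 552 / 3113

c(ttr = ttr, htr = htr)
c(sisters_ttr = sisters_ttr, sisters_htr = sisters_htr)
```

Longer texts tend to repeat more words, which helps explain why “The Dead” shows lower ratios than the shorter stories.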

Here, tab spanners can be added to approximate the version created by a typical tmtyro workflow:

gt_vocab |> 
  tab_spanner(
    label = "vocabulary",
    columns = c("vocab_count", "ttr")) |> 
  tab_spanner(
    label = "hapax",
    columns = c("hapax_count", "htr")) |> 
  cols_label(
    vocab_count = "total",
    ttr = "ratio",
    hapax_count = "total",
    htr = "ratio") |> 
  fmt_number(c(ttr, htr), decimals = 3)

doc_id	words	vocabulary		hapax
doc_id	words	total	ratio	total	ratio
The Sisters	3113	903	0.290	552	0.177
An Encounter	3257	980	0.301	620	0.190
The Dead	15731	2746	0.175	1557	0.099

Starting with tinytable

Of course, many other options exist in R for preparing tables to communicate findings. One of these, the tinytable package with its tt() function, is worth considering. A few methods for preparing tinytable tables are shown here; more can be found in the package documentation.

Corpus details

The standard function for using tinytable is tt():

library(tinytable)
details_tt <- corpus_dubliners |> 
  count(doc_id) |> 
  tt()

details_tt

doc_id	n
The Sisters	3113
An Encounter	3257
Araby	2345
Ivy Day in the Committee Room	5249
The Dead	15731

Adjusting this output is straightforward with a few functions that share a common syntax. Each references rows with the argument “i” and columns with the argument “j”. Data format is adjusted using format_tt(), and output style is modified with style_tt(). For instance, to change the number format in the “n” column shown here, use format_tt() like this:

details_tt <- details_tt |> 
  format_tt(
    j = 2,
    digits = 0,
    num_mark_big = ",")

details_tt

doc_id	n
The Sisters	3,113
An Encounter	3,257
Araby	2,345
Ivy Day in the Committee Room	5,249
The Dead	15,731
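The “i” and “j” arguments can be combined to target a single cell. The sketch below rebuilds the counts by hand so that it runs on its own, then bolds the count for “The Dead”. The object names stories and highlighted are placeholders introduced here.

```r
library(tinytable)

# Word counts copied from the table above
stories <- data.frame(
  doc_id = c("The Sisters", "An Encounter", "Araby",
             "Ivy Day in the Committee Room", "The Dead"),
  n = c(3113, 3257, 2345, 5249, 15731)
)

highlighted <- stories |>
  tt() |>
  # j = 2 formats the whole second column
  format_tt(
    j = 2,
    digits = 0,
    num_mark_big = ",") |>
  # i = 5 with j = 2 picks out one cell: the count for "The Dead"
  style_tt(
    i = 5,
    j = 2,
    bold = TRUE)

highlighted
```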

Column names are adjusted using the standard colnames() or setNames() functions from R:

colnames(details_tt) <- c("", "words")

details_tt

	words
The Sisters	3,113
An Encounter	3,257
Araby	2,345
Ivy Day in the Committee Room	5,249
The Dead	15,731

Properties like column alignment can be adjusted with style_tt():

details_tt |> 
  style_tt(
    j = 2,
    align = "r"
  )

	words
The Sisters	3,113
An Encounter	3,257
Araby	2,345
Ivy Day in the Committee Room	5,249
The Dead	15,731

Word frequencies

While tmtyro offers collapse_rows() to hide repeated values in gt tables, tinytable needs them merged manually, using the rowspan argument in style_tt():

dubliners_count |> 
  group_by(doc_id) |> 
  slice_head(n = 3) |> 
  ungroup() |> 
  tt() |> 
  style_tt(
    i = c(1, 4, 7, 10, 13), 
    j = 1, 
    rowspan = 3, 
    alignv = "t")

doc_id	word	n
The Sisters	the	171
The Sisters	and	118
The Sisters	to	94
An Encounter	the	181
An Encounter	and	107
An Encounter	he	101
Araby	the	190
Araby	i	96
Araby	and	70
Ivy Day in the Committee Room	the	320
Ivy Day in the Committee Room	a	144
Ivy Day in the Committee Room	mr	141
The Dead	the	866
The Dead	and	570
The Dead	of	396

Unfortunately, this process of manually indicating rows is fiddly and prone to error. Any miscount will make the table misrepresent the data. As an alternative, consider adjusting the underlying table before using tt() to cut out repeating values, using mutate(), case_when(), and lag():

dubliners_count |> 
  group_by(doc_id) |> 
  slice_head(n = 3) |> 
  ungroup() |> 
  mutate(
    doc_id = case_when(
      doc_id == lag(doc_id) ~ "",
      TRUE ~ doc_id
    )) |> 
  tt()

doc_id	word	n
The Sisters	the	171
	and	118
	to	94
An Encounter	the	181
	and	107
	he	101
Araby	the	190
	i	96
	and	70
Ivy Day in the Committee Room	the	320
	a	144
	mr	141
The Dead	the	866
	and	570
	of	396
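If this blanking step is needed in more than one table, it can be wrapped in a small function. The helper below, blank_repeats(), is hypothetical and not part of tmtyro, dplyr, or tinytable. It converts its input to character first, so factor and string columns behave the same way.

```r
library(dplyr)

# Hypothetical helper: replace a value with "" when it repeats the row above
blank_repeats <- function(x) {
  x <- as.character(x)
  previous <- lag(x)
  # The first row has no previous value, so it is never treated as a repeat
  repeated <- !is.na(previous) & x == previous
  if_else(repeated, "", x)
}

counts <- tibble(
  doc_id = c(rep("The Sisters", 3), rep("An Encounter", 2)),
  word = c("the", "and", "to", "the", "and"),
  n = c(171, 118, 94, 181, 107)
)

counts_blanked <- counts |>
  mutate(doc_id = blank_repeats(doc_id))

counts_blanked$doc_id
```

The result keeps “The Sisters” and “An Encounter” on the first row of each group and leaves the other rows empty, ready to pass to tt().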

Alternatively, use automatic grouping, indicating rows with group_tt():

count_table <- dubliners_count |> 
  group_by(doc_id) |> 
  slice_head(n = 3) |> 
  ungroup()

# Drop the doc_id column with select(), then reference it in group_tt()
my_tt <- count_table |> 
  select(-doc_id) |> 
  tt() |> 
  group_tt(i = as.character(count_table$doc_id))

my_tt

word	n
the	171
and	118
to	94
the	181
and	107
he	101
the	190
i	96
and	70
the	320
a	144
mr	141
the	866
and	570
of	396
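group_tt() also accepts a named list that maps each label to the row where its group begins. The sketch below builds that list from the data instead of typing row numbers by hand. It assumes the rows for each document are contiguous, and the names counts, starts, and group_rows are placeholders introduced here.

```r
library(tinytable)

counts <- data.frame(
  doc_id = c(rep("The Sisters", 3), rep("An Encounter", 3)),
  word = c("the", "and", "to", "the", "and", "he"),
  n = c(171, 118, 94, 181, 107, 101)
)

# Row where each document first appears
starts <- which(!duplicated(counts$doc_id))

# Named list: label = row position where the group label is inserted
group_rows <- setNames(as.list(starts), unique(counts$doc_id))

grouped <- counts[, c("word", "n")] |>
  tt() |>
  group_tt(i = group_rows)

grouped
```

Deriving the positions from the data avoids the miscounting that comes with typing row numbers by hand.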

To format these group rows, we can read their row numbers from the slot my_tt@group_index_i:

my_tt |> 
  style_tt(
    i = my_tt@group_index_i, 
    bold = TRUE,
    background = "lightgreen")

word	n
the	171
and	118
to	94
the	181
and	107
he	101
the	190
i	96
and	70
the	320
a	144
mr	141
the	866
and	570
of	396

Vocabulary richness

Tables for reporting vocabulary richness often need a lot of customizing. By default, tt() prepares a table that leaves a lot to be desired:

tt_vocab <- dubliners_vocab |> 
  tt()

tt_vocab

doc_id	words	vocab_count	ttr	hapax_count	htr
The Sisters	3113	903	0.2900739	552	0.17732091
An Encounter	3257	980	0.3008904	620	0.19035923
The Dead	15731	2746	0.1745598	1557	0.09897654

Among other things, we might want to adjust number formatting with format_tt(), set alignment with style_tt(), rename columns using colnames() or setNames(), and add labels over column groupings with group_tt():

tt_vocab |> 
  group_tt(
    j = list(
      "vocabulary" = 3:4,
      "hapax" = 5:6)) |> 
  format_tt(
    j = c(2:3, 5),
    digits = 0,
    num_mark_big = ",") |> 
  style_tt(
    j = c(2:3, 5),
    align = "r") |> 
  format_tt(
    j = c(4, 6),
    digits = 3,
    num_fmt = "decimal",
    num_zero = TRUE) |> 
  setNames(c("", "words", "total", "ratio", "total", "ratio"))

		vocabulary		hapax
	words	total	ratio	total	ratio
The Sisters	3,113	903	0.290	552	0.177
An Encounter	3,257	980	0.301	620	0.190
The Dead	15,731	2,746	0.175	1,557	0.099

In the end, none of this is overwhelming, and results can be clearly prepared for communication.
