plot_vocabulary() visualizes vocabulary growth as new words are introduced in each document.

Usage

plot_vocabulary(
  df,
  x = progress_words,
  by = doc_id,
  identity = NULL,
  descriptive_labels = TRUE,
  labeling = c("point", "inset", "inline", "axis")
)

Arguments

df

A tidy data frame, potentially containing columns called "doc_id" and "word"

x

A column showing the cumulative count of words. Defaults to progress_words.

by

A grouping column for colors and labels. Defaults to doc_id.

identity

A grouping column for lines

descriptive_labels

Whether to show descriptive labels for progress_percent on the X-axis. Set to FALSE to disable them.

labeling

Options for labeling groups:

  • "point" labels the final value

  • "inline" prints the label within a smoothed curve

  • "axis" prints labels where a secondary Y-axis might go

  • "inset" prints a legend within the plot area

  • Anything else prints a legend to the right of the plot area.

Value

A ggplot object
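
Because the return value is a ggplot object, it should accept ordinary ggplot2 layers, labels, and themes. The sketch below illustrates this with a stand-in plot built from toy data instead of real plot_vocabulary() output. The data frame `toy`, its `vocabulary` column, and the titles are hypothetical and serve only as illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the object returned by plot_vocabulary()
toy <- data.frame(
  progress_words = c(1, 2, 3, 1, 2, 3),
  vocabulary     = c(1, 2, 2, 1, 2, 3),
  doc_id         = rep(c("a", "b"), each = 3)
)
p <- ggplot(toy, aes(progress_words, vocabulary, color = doc_id)) +
  geom_line()

# Any ggplot object can be extended with further labels and themes
p_custom <- p +
  labs(title = "Vocabulary growth", x = "Words read", y = "Unique words") +
  theme_minimal()
```

The same `+` pattern would apply to a plot returned by plot_vocabulary(), assuming no later layer conflicts with the labeling option chosen.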

Examples

dubliners <- get_gutenberg_corpus(2814) |>
  load_texts() |>
  identify_by(part) |>
  standardize_titles()

dubliners_measured <- dubliners |>
  add_vocabulary()

dubliners_measured |>
  plot_vocabulary(progress_percent)


dubliners_measured |>
  plot_vocabulary()


if (FALSE) { # \dontrun{
  get_micusp_corpus(
    discipline %in% c("Physics", "Economics")) |>
    load_texts() |>
    add_vocabulary() |>
    plot_vocabulary(by = discipline)
} # }
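
To make the plotted quantity concrete, here is a minimal base R sketch of vocabulary growth on toy data. It is not the package's implementation: the column name `vocabulary` and the toy words are assumptions made for illustration, and only `doc_id`, `word`, and `progress_words` come from the documentation above.

```r
# Toy tidy data: one row per word, in reading order (hypothetical values)
toy <- data.frame(
  doc_id = c("a", "a", "a", "a", "b", "b", "b"),
  word   = c("the", "cat", "the", "dog", "a", "a", "bird")
)

# Running word count within each document
toy$progress_words <- ave(seq_along(toy$word), toy$doc_id, FUN = seq_along)

# Running count of distinct words: a word adds 1 the first time it
# appears in its document and 0 on every repeat
toy$vocabulary <- ave(
  as.integer(!duplicated(toy[c("doc_id", "word")])),
  toy$doc_id,
  FUN = cumsum
)
```

Plotting `vocabulary` against `progress_words`, one line per `doc_id`, gives the kind of growth curve this function draws: the curve flattens wherever a document repeats words it has already used.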