plot_hapax() plots a sample of hapax legomena (words that occur only once) on faceted curves of vocabulary growth over the course of each document.

Usage

plot_hapax(
  df,
  prop = 0.01,
  x = progress_words,
  y = vocabulary,
  by = doc_id,
  descriptive_labels = TRUE,
  feature = hapax
)

Arguments

df

A tidy data frame, potentially containing columns called "doc_id" and "word"

prop

The proportion of hapax legomena to sample. Defaults to 0.01. The chart can become illegible with proportions over ~1%.
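To make the effect of prop concrete, here is a minimal base R sketch of flagging hapax legomena and sampling a proportion of them. It is an illustration under assumptions, not the function's actual implementation: the toy data frame, the per-word counting, and the rounding rule are all hypothetical.

```r
# Hypothetical toy data: one row per token, in reading order.
tokens <- data.frame(
  doc_id = "toy",
  word = c("the", "cat", "saw", "the", "dog", "and", "the", "bird", "sang"),
  stringsAsFactors = FALSE
)

# A hapax legomenon is a word that occurs exactly once.
# (Assumption: counts are taken within a single document.)
counts <- table(tokens$word)
tokens$hapax <- as.vector(counts[tokens$word]) == 1

# Keep only a proportion of the hapax rows, as prop does conceptually.
# A large prop is used here because the toy text is tiny.
prop <- 0.5
set.seed(1)
hapax_rows <- which(tokens$hapax)
n_keep <- ceiling(prop * length(hapax_rows))
sampled <- tokens[sample(hapax_rows, n_keep), ]
```

With six hapax legomena in the toy text and prop = 0.5, three rows are kept. On a full-length text, even 1% of the hapax legomena can amount to many labels, which is why larger proportions crowd the chart.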

x

The progress column to show on the X-axis. Defaults to progress_words, as shown in the usage above; progress_percent is also appropriate.

y

The Y-axis variable to chart. Defaults to vocabulary, the cumulative vocabulary size.

by

A grouping column, such as doc_id, used to facet the chart

descriptive_labels

Whether to show descriptive labels for progress_percent on the X-axis. Set to FALSE to disable them.

feature

The column to check for new features. Defaults to hapax, but the function might also be used with new_word instead to plot a sample of new additions to documents' vocabularies.
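As a rough illustration of the columns these arguments refer to, the base R sketch below builds progress_words, new_word, and vocabulary for a toy text. The column names come from this page, but how they are derived here is an assumption and may differ from what the package computes.

```r
# Hypothetical toy text, one token per element, in reading order.
words <- c("the", "cat", "saw", "the", "dog", "and", "the", "cat")

toy <- data.frame(
  doc_id = "toy",
  word = words,
  # Assumed meaning: position of each token in the document.
  progress_words = seq_along(words),
  # Assumed meaning: TRUE the first time a word appears.
  new_word = !duplicated(words),
  stringsAsFactors = FALSE
)

# Cumulative vocabulary size: a running count of first appearances.
toy$vocabulary <- cumsum(toy$new_word)
```

Here vocabulary rises from 1 to 5 and stays flat wherever a word repeats, which is the growth curve the plot draws. With several documents, the same steps would be applied within each doc_id.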

Value

A ggplot object

Examples

if (FALSE) {
  dubliners <- get_gutenberg_corpus(2814) |>
    load_texts() |>
    identify_by(part) |>
    standardize_titles()

  dubliners_measured <- dubliners |>
    add_vocabulary()

  dubliners_measured |>
    plot_hapax()
}