plot_vocabulary() visualizes vocabulary growth as new words are used in each document.
Usage
plot_vocabulary(
  df,
  x = progress_words,
  by = doc_id,
  identity = NULL,
  descriptive_labels = TRUE,
  labeling = c("point", "inset", "inline", "axis")
)
Arguments
- df
A tidy data frame, potentially containing columns called "doc_id" and "word"
- x
A column showing the cumulative count of words
- by
A grouping column for colors and labels
- identity
A grouping column for lines
- descriptive_labels
A toggle for disabling descriptive labels of progress_percent on the X-axis
- labeling
Options for labeling groups:
"point" labels the final value
"inline" prints the label within a smoothed curve
"axis" prints labels where a secondary Y-axis might go
"inset" prints a legend within the plot area
Anything else prints a legend to the right of the plot area.
See also
Other visualizing helpers: change_colors(), plot_bigrams(), plot_doc_word_bars(), plot_doc_word_heatmap(), plot_hapax(), plot_htr(), plot_tf_idf(), plot_topic_bars(), plot_topic_distributions(), plot_topic_wordcloud(), plot_ttr(), visualize()
Examples
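# Load Project Gutenberg text 2814, identified by part
dubliners <- get_gutenberg_corpus(2814) |>
  load_texts() |>
  identify_by(part) |>
  standardize_titles()

# Add vocabulary measurements
dubliners_measured <- dubliners |>
  add_vocabulary()

# Plot against progress_percent on the X-axis
dubliners_measured |>
  plot_vocabulary(progress_percent)

# Plot with the default X-axis, progress_words
dubliners_measured |>
  plot_vocabulary()

if (FALSE) { # \dontrun{
  # Color and label by discipline instead of doc_id
  get_micusp_corpus(
    discipline %in% c("Physics", "Economics")) |>
    load_texts() |>
    add_vocabulary() |>
    plot_vocabulary(by = discipline)
} # }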
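To make the idea of vocabulary growth concrete, the following base R sketch computes, for a toy tidy data frame, the running count of words and the running count of distinct words within each document, then draws one line per document. It is an illustration only and is not taken from the package: the helper measure_growth() and the column vocab_size are hypothetical names, and whether add_vocabulary() computes its columns this way is an assumption.

```r
# Toy tidy data frame: one row per word, in reading order.
toy <- data.frame(
  doc_id = c("a", "a", "a", "a", "b", "b", "b"),
  word   = c("the", "cat", "the", "dog", "a", "a", "bird"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: within one document, count words read so far
# and distinct words seen so far.
measure_growth <- function(d) {
  d$progress_words <- seq_len(nrow(d))             # cumulative count of words
  d$vocab_size     <- cumsum(!duplicated(d$word))  # cumulative count of new words
  d
}

# Apply the helper to each document separately and recombine.
toy_measured <- do.call(rbind, lapply(split(toy, toy$doc_id), measure_growth))
rownames(toy_measured) <- NULL

# Draw one growth curve per document with base graphics.
plot(
  toy_measured$progress_words, toy_measured$vocab_size,
  type = "n", xlab = "progress_words", ylab = "vocab_size"
)
for (id in unique(toy_measured$doc_id)) {
  one_doc <- toy_measured[toy_measured$doc_id == id, ]
  lines(one_doc$progress_words, one_doc$vocab_size)
}
```

A curve rises only at rows where a word appears for the first time in its document, so repeated words show up as flat stretches.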