make_topic_model() moves a table of texts through the necessary steps of preparation before building a topic model. The function applies seven steps:
- identifies text divisions by the doc_id column
- divides each of the texts into same-sized chunks of sample_size words (default is 1000 words)
- unnests text table into a table with one word per row
- removes stop words and proper nouns (identified as any word that only appears with a capitalized first letter)
- counts word frequencies for each chunk
- converts the table of frequencies into a document term matrix
- builds a topic model with k topics
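The preparation steps above, up to the document term matrix, can be sketched in base R on toy data. This is only an illustration under stated assumptions, not the package's implementation: the texts data frame, the tiny stop_words vector, the chunk_id naming, and the choice to keep a shorter final chunk are all hypothetical.

```r
# Hypothetical toy input: one row per document, nested text in a "text" column.
texts <- data.frame(
  doc_id = c("a", "b"),
  text = c(
    "Holmes looked at the lamp. The lamp was cold and the room was dark.",
    "Watson lit the lamp and the fire. Holmes watched the fire burn low."
  ),
  stringsAsFactors = FALSE
)

sample_size <- 5                             # tiny chunks so the toy data yields several
stop_words  <- c("the", "and", "was", "at")  # stand-in for a real stop-word list

# Split each document into words (one word per row) and label chunks of
# sample_size words. The last chunk of a document may be shorter here.
word_rows <- do.call(rbind, lapply(seq_len(nrow(texts)), function(i) {
  words <- unlist(strsplit(texts$text[i], "[^A-Za-z']+"))
  words <- words[nzchar(words)]
  chunk <- ceiling(seq_along(words) / sample_size)
  data.frame(
    chunk_id = paste(texts$doc_id[i], chunk, sep = "_"),
    word = words,
    stringsAsFactors = FALSE
  )
}))

# Treat a word as a proper noun when every occurrence starts with a capital.
lower <- tolower(word_rows$word)
is_capitalized <- grepl("^[A-Z]", word_rows$word)
always_capitalized <- tapply(is_capitalized, lower, all)
proper_nouns <- names(always_capitalized)[always_capitalized]

# Drop proper nouns and stop words, then count words per chunk.
keep <- !(lower %in% proper_nouns) & !(lower %in% stop_words)
dtm <- table(chunk = word_rows$chunk_id[keep], term = lower[keep])

print(dtm)
```

The resulting table has one row per chunk and one column per remaining term, which is the shape a document term matrix takes. Fitting the model itself is left out of the sketch; a routine such as topicmodels::LDA() is one common way to do that step once the counts are in the input format it expects.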
Arguments
- df
A data frame with nested text in a "text" column.
- by
The column for identifying each document. By default, the "title" column will be used.
- sample_size
The sample size for each document chunk. By default, samples will include 1000 words.
- k
The number of topics to search for. By default, 15 topics will be sought.
- cache
Whether to cache the resulting model as an RDS file in the "data/" folder. Set to TRUE by default. Delete this RDS file to create a new model.
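The caching behavior described for cache can be pictured with a generic RDS pattern. The sketch below is an assumption about how such a cache might work, not the function's actual code: cached_fit(), the file name example_lda.rds, and the stand-in fit() function are hypothetical, and a temporary directory is used in place of the "data/" folder.

```r
# Hypothetical helper illustrating an RDS cache; not the package's own code.
cached_fit <- function(path, fit_fun) {
  if (file.exists(path)) {
    return(readRDS(path))  # reuse the stored result
  }
  result <- fit_fun()      # stand-in for the expensive model fit
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, path)
  result
}

cache_file <- file.path(tempdir(), "data", "example_lda.rds")
unlink(cache_file)         # start from a clean state

calls <- 0
fit <- function() {
  calls <<- calls + 1      # count how often the "model" is actually built
  list(k = 10)
}

first  <- cached_fit(cache_file, fit)  # builds the result and writes the file
second <- cached_fit(cache_file, fit)  # reads the file; fit() is not called
unlink(cache_file)                     # deleting the file forces a rebuild
third  <- cached_fit(cache_file, fit)
```

Under this pattern, changing an argument such as k would have no effect while a cached file exists, which is why the file has to be deleted before a new model is created.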
Examples
mysteries <- load_texts("mystery-novels", word = FALSE)
mysteries_lda <- mysteries |>
  make_topic_model(k = 10)