Splits a column of text with tidytext::unnest_tokens(), flattening the table into one token per row and omitting any token that appears only in capitalized form.

Usage

unnest_without_caps(df, output = "word", input = "text", to_lower = TRUE)

Arguments

df

A data frame

output

Output column to be created.

input

Input column to be split into words.

to_lower

Whether to convert the resulting words to lowercase.

Value

A data frame with one token per row.

Examples

if (FALSE) { # \dontrun{
mysteries <-
  load_texts("mystery-novels",
             to_lower = FALSE) |>
  unnest_without_caps()

# `load_texts()` already incorporates
# `unnest_without_caps()`, so calling
# it directly is often unnecessary.
mysteries <-
  load_texts("mystery-novels",
             remove_names = TRUE)
} # }
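
The filtering described at the top of this page can be illustrated with a small, self-contained sketch. It is not the package's implementation: the toy data, the intermediate names (`docs`, `tokens`, `seen_lower`, `kept`), and the exact comparison rule are assumptions made for illustration. The idea is to tokenize with case preserved, note which tokens occur somewhere in all-lowercase form, and keep only tokens whose lowercase form was seen.

```r
library(dplyr)
library(tidytext)

# Toy input (assumed for illustration); the column name `text`
# matches the documented default for `input`.
docs <- tibble(
  text = c("Holmes lit the Lamp.", "The lamp lit the room for Watson.")
)

# Step 1: tokenize without lowercasing so capitalization stays visible.
tokens <- docs |>
  unnest_tokens(word, text, to_lower = FALSE)

# Step 2: collect every token that occurs somewhere in all-lowercase form.
seen_lower <- tokens |>
  filter(word == tolower(word)) |>
  pull(word) |>
  unique()

# Step 3: keep tokens whose lowercase form was seen. Tokens that appear
# only capitalized ("Holmes", "Watson") are dropped. The final mutate()
# mirrors the documented `to_lower = TRUE` default.
kept <- tokens |>
  filter(tolower(word) %in% seen_lower) |>
  mutate(word = tolower(word))

kept
```

In this sketch, "The" and "Lamp" survive because "the" and "lamp" also occur in lowercase elsewhere in the text, while "Holmes" and "Watson" never do. A word that happens to occur only at the start of sentences would be dropped as well, which is a known trade-off of this kind of heuristic.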