Divide documents into equal lengths
Usage
add_partitions(
  data,
  size = 1000,
  overlap = 0,
  minimum = 0.25,
  by = doc_id,
  feature = word,
  character = FALSE,
  label = NULL
)
Arguments
- data
A tidy data frame, potentially containing a column called "word"
- size
Size of each partition
- overlap
Amount by which consecutive partitions should overlap. If a value between 0 and 1 is used,
overlap will be calculated as a fraction of size.
- minimum
Minimum partition size. If a value between 0 and 1 is used,
minimum will be calculated as a fraction of size.
- by
A column containing document grouping
- feature
The feature to partition by in each document
- character
Whether to return a partition column as a character vector with zeroes added for padding. This feature may be helpful if using identify_by() to consider partition when defining documents in a corpus.
- label
Whether to label variables added to data frame
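The fractional form of overlap and minimum described above can be illustrated with a small base R sketch. The helper resolve_against_size() is hypothetical and not part of the package. It assumes that only values strictly between 0 and 1 are treated as fractions and that no rounding is applied. Neither assumption is confirmed by this page.

```r
# Hypothetical helper (not part of the package): turn a setting given either
# as an absolute count or as a fraction of `size` into an absolute count.
# Assumption: only values strictly between 0 and 1 count as fractions.
resolve_against_size <- function(value, size) {
  if (value > 0 && value < 1) {
    value * size
  } else {
    value
  }
}

size <- 1000
minimum_abs <- resolve_against_size(0.25, size)  # default minimum: 250
overlap_abs <- resolve_against_size(0, size)     # default overlap: 0
overlap_100 <- resolve_against_size(100, size)   # absolute value is kept: 100
```

Under these assumptions, the defaults size = 1000 and minimum = 0.25 would correspond to a minimum partition size of 250.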
Examples
if (FALSE) { # \dontrun{
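  # Build a corpus from Project Gutenberg text 2814, treating each part as a document
  dubliners <- get_gutenberg_corpus(2814) |>
    load_texts() |>
    identify_by(part) |>
    standardize_titles()

  # Partition each document with the default settings and preview the first rows
  dubliners |>
    add_partitions() |>
    head()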
} # }
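As a rough illustration of what character = TRUE implies, the base R sketch below pads partition numbers with zeroes. It does not call the package. The variable names and the choice of padding width are assumptions made for illustration. One plausible benefit of padding is that the labels sort in numeric order even when compared as text.

```r
# Illustration only: zero-pad partition numbers so they sort correctly as text.
partition <- c(1L, 2L, 10L)

# Pad to the width of the largest number (an assumption for this sketch).
width <- max(nchar(partition))
padded <- formatC(partition, width = width, flag = "0")  # "01" "02" "10"

# method = "radix" sorts in the C locale, so the result does not depend on
# the local collation settings.
unpadded_sorted <- sort(as.character(partition), method = "radix")  # "1" "10" "2"
padded_sorted <- sort(padded, method = "radix")                     # "01" "02" "10"
```

Without padding, "10" sorts before "2" as text, which would scramble the order of partitions if they were later used as part of a document identifier.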
