Add ngram columns
Arguments
- df
A tidy data frame, potentially containing a column called "word"
- n
A range defining the extent of an ngram, for instance from word 1 to word 3. Alternatively, a single number sets the number of words to include in each ngram. The default value of 1:2 will produce bigrams.
- feature
The feature to use when constructing ngrams
- keep
Whether to keep the original feature column
- collapse
Whether to join the ngram parts into a single column called "ngram"
- by
A grouping column identifying a document, such as doc_id.
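The arguments above can be illustrated with a small base R sketch of how ngram columns might be built from a one-word-per-row table. This is not the package's implementation: `sketch_ngrams()` and `shift_up()` are hypothetical helpers written for illustration, the toy data is invented, and the sketch assumes that ngrams do not cross the document boundary given by `by` and that only a single number is passed for `n`.

```r
# Toy tidy table: one row per word, grouped by a document id.
words <- data.frame(
  doc_id = c("a", "a", "a", "a", "b", "b", "b"),
  word   = c("there", "was", "no", "hope", "for", "him", "this"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): shift a vector up by k
# positions, padding the tail with NA.
shift_up <- function(x, k) {
  if (k == 0) return(x)
  if (k >= length(x)) return(rep(NA_character_, length(x)))
  c(x[-seq_len(k)], rep(NA_character_, k))
}

# Hypothetical sketch of ngram construction; the real add_ngrams() may differ.
sketch_ngrams <- function(df, n = 2, feature = "word", by = "doc_id",
                          collapse = FALSE) {
  pieces <- lapply(split(df, df[[by]]), function(doc) {
    # Add one column per ngram position: feature_1, feature_2, ...
    for (i in seq_len(n)) {
      doc[[paste0(feature, "_", i)]] <- shift_up(doc[[feature]], i - 1)
    }
    # Drop rows whose window runs past the end of the document.
    doc <- doc[!is.na(doc[[paste0(feature, "_", n)]]), , drop = FALSE]
    # Assumed behaviour: the original feature column is not kept.
    doc[[feature]] <- NULL
    doc
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  if (collapse) {
    # Join the parts into a single space-separated "ngram" column.
    parts <- paste0(feature, "_", seq_len(n))
    out$ngram <- do.call(paste, unname(as.list(out[parts])))
    out[parts] <- NULL
  }
  out
}

bigrams   <- sketch_ngrams(words, n = 2)
collapsed <- sketch_ngrams(words, n = 2, collapse = TRUE)
```

Here `bigrams` holds the columns `doc_id`, `word_1` and `word_2`, while `collapsed` replaces the two word columns with one `ngram` column. Grouping before shifting is the design choice worth noting: without it, the last word of one document would pair with the first word of the next.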
See also
Other n-gram helpers: combine_ngrams(), plot_bigrams(), separate_ngrams()
Examples
if (FALSE) { # not run
  my_corpus <- load_texts()
  # n = 3 yields three-word ngrams
  my_trigrams <- my_corpus |>
    add_ngrams(3)
}
dubliners <- get_gutenberg_corpus(2814) |>
  load_texts() |>
  identify_by(part) |>
  standardize_titles()
dubliners |>
  add_ngrams(2) |>
  head()
#> # A tibble: 6 × 6
#>   doc_id      title     author       part        word_1 word_2
#>   <fct>       <chr>     <chr>        <chr>       <chr>  <chr>
#> 1 The Sisters Dubliners Joyce, James THE SISTERS there  was
#> 2 The Sisters Dubliners Joyce, James THE SISTERS was    no
#> 3 The Sisters Dubliners Joyce, James THE SISTERS no     hope
#> 4 The Sisters Dubliners Joyce, James THE SISTERS hope   for
#> 5 The Sisters Dubliners Joyce, James THE SISTERS for    him
#> 6 The Sisters Dubliners Joyce, James THE SISTERS him    this