
Package index
-
get_corpus() - Prepare a corpus or corpora of texts
-
get_gutenberg_corpus() - Build and load a corpus from Project Gutenberg
-
get_micusp_corpus() - Get a MICUSP corpus
-
download_once() - Download a file once
-
micusp_metadata() - Get MICUSP metadata
-
parse_html() - Read HTML headers and text from file
-
load_texts() - Load a folder or data frame of texts
-
move_header_to_text() - Move a header column to text
-
identify_by() - Choose a new doc_id column
-
standardize_titles() - Standardize document titles
-
unnest_without_caps() - Split text into words and drop proper nouns
Measure Text Features
Functions for measuring features of texts and being choosy about how you do it.
-
add_dictionary() - Add values from a dictionary
-
add_frequency() - Add frequency of words or other features
-
add_index() - Index document row numbers
-
add_ngrams() - Add ngram columns
-
add_partitions() - Divide documents in equal lengths
-
add_sentiment() - Add sentiment markers
-
add_tf_idf() - Compare usage across a corpus
-
add_vocabulary() - Measure lexical variety
-
drop_na() - Drop rows containing missing values
-
drop_stopwords() - Remove stopwords
-
summarize_tf_idf() - Compare usage across a corpus
-
expand_documents() - Convert data frame from long tidy format to wider format
-
combine_ngrams() - Combine ngram columns
-
separate_ngrams() - Separate one word per column
-
make_dictionary() - Create a lexicon
-
load_topic_model() - Load (or cache and load) a topic model
-
make_topic_model() - Construct a topic model
Explore Results
Generic functions make it easy to share results with an audience (or keep them to yourself).
-
contextualize() - Show a term in context
-
tabulize() - Prepare a table of data
-
visualize() - Visualize output
-
collapse_rows() - Collapse gt rows in the style of kableExtra
-
change_colors() - Choose other colors
-
get_cumulative_vocabulary() - Cumulative total of vocabulary size
-
get_frequency() get_tf() - Get frequencies of values in a vector
-
get_hir() - Cumulative hapax introduction ratio
-
get_htr() - Cumulative hapax-token ratio
-
get_idf_by() - Get inverse document frequencies of values in one vector, x, categorized by another vector, by
-
get_match() - Get dictionary matches of values in a vector
-
get_sentiment() - Get sentiment matches of values in a vector
-
get_tf_by() - Get term frequencies of values in one vector, x, categorized by another vector, by
-
get_tfidf_by() - Term frequency–inverse document frequency
-
get_ttr() - Cumulative type-token ratio
-
is_hapax() - Check for hapax legomena
-
is_new() - Check for new words in a vocabulary
-
pos_tags - Part of speech tags
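
The cumulative vocabulary measures listed above (new words, cumulative vocabulary size, type-token ratio, hapax legomena) can be illustrated with a minimal base R sketch. It does not call the package's functions and is not the package's implementation. The variable names and the toy word vector are hypothetical, and the definitions are assumed from the function titles: a word is "new" at its first occurrence, the type-token ratio is the number of distinct words so far divided by the number of words so far, and a hapax is a word that occurs exactly once in the vector.

```r
# Toy input: one word per element (hypothetical data for illustration only)
words <- c("the", "cat", "saw", "the", "dog", "and", "the", "bird")

# TRUE at the first occurrence of each word, i.e. when it is new to the vocabulary
first_seen <- !duplicated(words)

# Running count of distinct words seen so far
cumulative_vocab <- cumsum(first_seen)

# Cumulative type-token ratio: distinct words so far / total words so far
cumulative_ttr <- cumulative_vocab / seq_along(words)

# TRUE for words that occur exactly once in the whole vector (hapax legomena)
counts <- table(words)
hapax <- as.vector(counts[words]) == 1

# Collect the measures side by side for inspection
data.frame(words, first_seen, cumulative_vocab, cumulative_ttr, hapax)
```

The sketch works on a plain character vector, which matches the "values in a vector" wording of the entries above. How the package handles grouping by document, case, or punctuation is not covered here.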