Package index
-
get_corpus()
- Prepare a corpus or corpora of texts
-
get_gutenberg_corpus()
- Build and load a corpus from Project Gutenberg
-
get_micusp_corpus()
- Get a MICUSP corpus
-
download_once()
- Download a file once
-
micusp_metadata()
- Get MICUSP metadata
-
parse_html()
- Read HTML headers and text from file
-
load_texts()
- Load a folder or data frame of texts
-
move_header_to_text()
- Move a header column to text
-
identify_by()
- Choose a new doc_id column
-
standardize_titles()
- Standardize document titles
-
unnest_without_caps()
- Split text into words and drop proper nouns
Measure Text Features
Functions for measuring features of texts and being choosy about how you do it.
-
add_dictionary()
- Add values from a dictionary
-
add_index()
- Index document row numbers
-
add_ngrams()
- Add ngram columns
-
add_partitions()
- Divide documents into equal lengths
-
add_sentiment()
- Add sentiment markers
-
add_vocabulary()
- Measure lexical variety
-
drop_na()
- Drop rows containing missing values
-
drop_stopwords()
- Remove stopwords
-
summarize_tf_idf()
- Compare usage across a corpus
-
count()
- Count values in one or more columns
-
expand_documents()
- Convert data frame from long tidy format to wider format
-
combine_ngrams()
- Combine ngram columns
-
separate_ngrams()
- Separate one word per column
-
make_dictionary()
- Create a lexicon
-
load_topic_model()
- Load (or cache and load) a topic model
-
make_topic_model()
- Construct a topic model
Explore Results
Generic functions make it easy to share results with an audience (or keep them to yourself).
-
contextualize()
- Show a term in context
-
tabulize()
- Prepare a table of data
-
visualize()
- Visualize output
-
collapse_rows()
- Collapse gt rows in the style of kableExtra
-
change_colors()
- Choose other colors
-
pos_tags
- Part of speech tags