Prepare Texts

Functions for collecting, loading, and cleaning a corpus of texts.

Collecting Texts

get_corpus()
Prepare a corpus or corpora of texts
get_gutenberg_corpus()
Build and load a corpus from Project Gutenberg
get_micusp_corpus()
Get a MICUSP corpus
download_once()
Download a file once
micusp_metadata()
Get MICUSP metadata
parse_html()
Read HTML headers and text from file

Loading Texts

load_texts()
Load a folder or data frame of texts

Cleaning Text and Metadata

move_header_to_text()
Move a header column to text
identify_by()
Choose a new doc_id column
standardize_titles()
Standardize document titles
unnest_without_caps()
Split text into words and drop proper nouns

Measure Text Features

Functions for measuring features of texts and being choosy about how you do it.

add_dictionary()
Add values from a dictionary
add_index()
Index document row numbers
add_ngrams()
Add ngram columns
add_partitions()
Divide documents into equal lengths
add_sentiment()
Add sentiment markers
add_vocabulary()
Measure lexical variety
drop_na()
Drop rows containing missing values
drop_stopwords()
Remove stopwords
summarize_tf_idf()
Compare usage across a corpus
count()
Count values in one or more columns
expand_documents()
Convert data frame from long tidy format to wider format
combine_ngrams()
Combine ngram columns
separate_ngrams()
Separate ngrams into one word per column
make_dictionary()
Create a lexicon

Model Topics

Model complex relationships in a corpus.

load_topic_model()
Load (or cache and load) a topic model
make_topic_model()
Construct a topic model

Explore Results

Generic functions make it easy to share results with an audience (or keep them to yourself).

contextualize()
Show a term in context
tabulize()
Prepare a table of data
visualize()
Visualize output

Adjusting Tables and Figures

collapse_rows()
Collapse gt rows in the style of kableExtra
change_colors()
Choose other colors

Data

Data Included

pos_tags
Part of speech tags