tmtyro 0.6

  • Variables now gain descriptive labels with new options to toggle label use. Turn off labels locally by setting options(tmtyro.use_labels = FALSE) within a document or turn them off for a profile or project by setting the environment variable TMTYRO_USE_LABELS to FALSE. get_data_dictionary() prepares an explanatory data dictionary using these labels, set_data_dictionary() modifies labels from a provided data dictionary, and drop_labels() removes labels.
  • Optional logging now records steps for most tmtyro functions that operate on data frames, with options to toggle use of this log. Turn off logging locally by setting options(tmtyro.use_log = FALSE) within a document or turn it off for a profile or project by setting the environment variable TMTYRO_USE_LOG to FALSE. Helper functions get_methods_log(), set_methods_log(), and add_methods_log() manage the log and add manual entries to it.
  • New narrativize() function uses the log to print a narrative describing the methods used.
  • Support for ggplot2 version 4.0+.
  • New theme_tmtyro() function extracts and modularizes theme defaults with smart grid lines using S7 methods.
  • New percent argument in get_tf_by() for consistency with get_tf().
  • New get_df_by() function for getting document frequencies in correspondence with get_tf_by().
  • New html parameter in contextualize() ensures HTML output.
  • Improvements to how get_gutenberg_corpus() handles file downloads:
    • Cached files can now be used without a network connection
    • New download argument directs handling of file downloads and location
  • Improvements to parse_html():
    • New headers argument limits headers to user-assigned range
    • New standardize_headers argument allows keeping header tags for transparency
    • New standardize_headers() function for managing standardization of column names from HTML tags
  • add_vocabulary() now adds fewer columns. Chain this function with add_progress() to regain those that have been dropped.
  • add_index() is now a thin wrapper for add_progress(), a new function that adds support for measuring progress by percentage and for specifying units used in labels.
  • tabulize() returns better formatted tables for every type.
  • A new italicize_titles() generic function simplifies formatting of the doc_id column (or other columns) for tables made with tabulize() and for figures made with visualize().
  • Where possible, visualizations from plot_doc_word_bars() avoid reprinting Y-axis values across small multiples when the Y-axis isn’t rearranged.
  • More unit tests added to increase coverage
  • REMOVED: Setting labeling = "axis" is no longer supported because of deprecations in ggh4x, which is no longer imported
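
The label and logging toggles described in this release can be sketched as follows. This is a minimal example using only base R's options() and Sys.setenv(); the option and variable names come from the notes above, while passing the value as the string "FALSE" is an assumption about how tmtyro reads the environment variables.

```r
# Turn off labels and logging locally, for the current session or document
options(tmtyro.use_labels = FALSE)
options(tmtyro.use_log = FALSE)

# Alternatively, set the environment variables for the running session.
# Assumption: tmtyro treats the string "FALSE" as turning the feature off.
Sys.setenv(TMTYRO_USE_LABELS = "FALSE", TMTYRO_USE_LOG = "FALSE")

# Confirm the current settings
getOption("tmtyro.use_labels")
Sys.getenv("TMTYRO_USE_LOG")
```

For a profile or project, the same variables would more typically be written to an .Renviron file so that they are set each time R starts.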

tmtyro 0.5

tmtyro 0.4.1

tmtyro 0.4.0

  • New function contextualize() shows terms in a window of context
  • New function add_index() adds a column showing word indices within each document
  • load_texts() can now keep original capitalization and punctuation alongside the tokenized word column with the keep_original argument. This process does not work in all instances, so the option defaults to FALSE.
  • add_dictionary() includes an option to keep original terms. This is useful for n-gram dictionaries, where a match might otherwise span multiple rows.
  • add_ngrams() supports negative ranges, for building context windows
  • add_partitions() supports overlapping partitions
  • standardize_titles() capitalizes words after terminal punctuation

tmtyro 0.3.0

tmtyro 0.2.0

  • New function add_partitions() adds a partition column, useful for getting same-sized samples
  • identify_by() now works with multiple columns, and it keeps existing metadata columns. This is especially useful with the new add_partitions() column, using something like my_corpus() |> add_partitions() |> identify_by(title, partition) before continuing to work with partitioned documents. To return framing to unpartitioned data, use identify_by(title) or whatever other column is most relevant.
  • New visualization and tabulization methods for expand_documents()
  • Functions now imported: count() and drop_na()
  • When the ggraph package is loaded, plot_bigrams() now uses a color scale on edges, rather than spot color on nodes, with full support for change_colors()
  • Improved documentation with website articles for customizing colors and showing code comparisons

tmtyro 0.1.0

  • First “public” release! 🎉
  • Unnecessary components removed and dependencies reduced
  • Examples standardized and made reproducible
  • change_colors() now works with plot_bigrams()
  • change_colors() now includes a “dubois” colorset
  • tabulize() documentation is now improved for online output
  • standardize_titles() now works with factors
  • Added default behavior for visualize() on a corpus
  • Part-of-speech tagging should now work for more texts

tmtyro (development version 0.0.8.9000)

tmtyro (development version 0.0.7.9000)

  • get_gutenberg_corpus() now retrieves HTML versions of texts from Project Gutenberg and parses header tags for section markers
  • New function parse_html() for reading headers in an HTML file
  • New function move_header_to_text() for converting header to text
  • New function identify_by() to simplify using something other than doc_id
  • Improved internal linking within documentation

tmtyro (development version 0.0.6.9000)

  • visualize() now works better as a generic function with supported methods
  • Improved change_colors() with added support for the Okabe-Ito colorset and the option of starting with something other than the first color of a palette. With these changes, color options have been removed from other visualization functions to consolidate them within change_colors().
  • When a data set includes only one unique doc_id, visualizations are no longer divided into facets.
  • In an effort to reduce the number of dependencies, many packages have been removed from “Imports” (geomtextpath, ggrepel, glue, NLP, openNLP, plotly, RColorBrewer, stopwords, textstem, wordcloud). Where appropriate, these have been shifted to “Suggests” or dropped entirely.