Once installed, stylo2gg works with data recorded by the stylo package. The examples below introduce its functionality using the eighty-five Federalist Papers, originally published pseudonymously in 1787 and 1788.
Principal component analysis
As called here, the stylo package limits words to those common to at least 75% of the texts (using the `culling...` arguments), saves the data in an object called `federalist_mfw`, and plots the texts based on their word usage with principal component analysis:
library(stylo)

federalist_mfw <-
  stylo(gui = FALSE,
        corpus.dir = system.file("extdata/federalist", package = "stylo2gg"),
        analysis.type = "PCR",            # PCA based on a correlation matrix
        pca.visual.flavour = "symbols",
        analyzed.features = "w",          # analyze words
        ngram.size = 1,                   # one word at a time
        display.on.screen = TRUE,
        sampling = "no.sampling",
        culling.max = 75,                 # keep words found in at least
        culling.min = 75,                 # 75% of the texts
        mfw.min = 900,                    # start from the 900 most
        mfw.max = 900)                    # frequent words
This visualization places each part by its frequencies of 120 of the most frequent words, chosen from among words appearing in at least three-fourths of all papers. The chart shows that the texts whose authorship had once been in question, shown here with red Xs, have frequency distributions most similar to those by James Madison, shown here with green crosses.
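The culling step described above can be illustrated in base R. This is only a sketch of the idea, not stylo's internal code: the matrix `freqs`, its made-up counts, and the helper name `share_present` are all assumptions introduced for illustration.

```r
# Toy word counts: rows are texts, columns are words (made-up values).
freqs <- matrix(
  c(5, 3, 0, 1,
    4, 2, 0, 0,
    6, 0, 2, 1,
    5, 1, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("text", 1:4), c("the", "of", "upon", "whilst"))
)

# Share of texts in which each word occurs at least once.
share_present <- colMeans(freqs > 0)

# Keep only the words found in at least 75% of the texts.
culled <- freqs[, share_present >= 0.75, drop = FALSE]
colnames(culled)
```

In this toy matrix, "upon" occurs in only one of four texts, so it is dropped, while the other three words survive the 75% threshold.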
By default, the `stylo2gg()` function uses both the data and visualization settings from `federalist_mfw`:
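Assuming the stylo2gg package is installed and `federalist_mfw` was created by the `stylo()` call above, that default call can be sketched as a single pipe, mirroring the dendrogram example later in this section:

```r
library(stylo2gg)

# Assumes federalist_mfw exists from the earlier stylo() call.
federalist_mfw |>
  stylo2gg()
```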
Using selected `ggplot2` defaults for shapes and colors, the visualization created by `stylo2gg` nevertheless shows the same patterns of style, presenting a figure drawn from the same principal components. Here, the disputed papers are marked by purple diamonds, and they seem closest in style to the parts known to be by Madison, marked by blue Xs.
Other settings are explained in the article on principal component analysis.
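For readers who want to see what a principal component analysis of word frequencies involves, here is a minimal base R sketch. It uses `prcomp()` on random toy data rather than stylo's own routines, and the object names (`freqs`, `pca`, `coords`, `var_explained`) are assumptions made for illustration. Scaling each word before the analysis corresponds to working from a correlation matrix, which is how stylo's documentation describes the "PCR" option.

```r
set.seed(1)

# Toy data: 6 texts by 4 word frequencies (random values, illustration only).
freqs <- matrix(runif(24), nrow = 6,
                dimnames = list(paste0("text", 1:6), paste0("word", 1:4)))

# center and scale. standardize each word, i.e. PCA on the correlation matrix.
pca <- prcomp(freqs, center = TRUE, scale. = TRUE)

# Coordinates of each text on the first two components,
# the values a two-dimensional PCA chart would plot.
coords <- pca$x[, 1:2]

# Proportion of total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```

Texts with similar word-usage profiles end up with nearby rows in `coords`, which is why points for stylistically similar texts sit close together on the chart.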
Hierarchical clustering
In addition to two-dimensional relationships with principal components, stylo can also draw a dendrogram for cluster analysis, showing texts’ relationships based on their distances from each other.
federalist_mfw2 <-
  stylo(gui = FALSE,
        corpus.dir = system.file("extdata/federalist", package = "stylo2gg"),
        custom.graph.title = "Federalist Papers",
        analysis.type = "CA",
        analyzed.features = "w",
        ngram.size = 1,
        display.on.screen = TRUE,
        sampling = "no.sampling",
        culling.max = 75,
        culling.min = 75,
        mfw.min = 900,
        mfw.max = 900)
Dendrogram of hierarchical clusters, prepared by stylo.
This `federalist_mfw2` object can then be piped into `stylo2gg()`:
federalist_mfw2 |>
  stylo2gg()
As with principal components analysis, `stylo2gg()` function defaults will recreate the chart made by `stylo()`.
Alternatively, the unnumbered `federalist_mfw` object from earlier yields a similar cluster analysis when passed the option `viz="CA"`:
federalist_mfw |>
  stylo2gg(viz = "CA",
           shapes = FALSE)
Function arguments simplify exploration without necessitating additional calls to `stylo()`.
Additional settings for visualizing clusters with dendrograms are explained in the article on hierarchical clustering.
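The distance-based clustering behind these dendrograms can be sketched in base R with `dist()` and `hclust()`. This is an illustration under stated assumptions, not a reproduction of stylo's method: the toy matrix `freqs` is made up, and Euclidean distance with Ward linkage is chosen here only for simplicity, whereas stylo offers its own distance measures.

```r
# Toy word frequencies for four texts by two hypothetical authors, A and B.
freqs <- matrix(
  c(5.0, 3.0, 1.0,
    5.1, 2.9, 1.1,
    2.0, 6.0, 3.0,
    2.1, 6.2, 2.9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A1", "A2", "B1", "B2"), c("the", "of", "upon"))
)

# Pairwise distances between texts (Euclidean is an assumption here).
d <- dist(freqs, method = "euclidean")

# Agglomerative clustering: repeatedly merge the closest texts or groups.
tree <- hclust(d, method = "ward.D2")

# Draw the dendrogram; the height of each join reflects the distance merged.
plot(tree)

# Cut the tree into two groups to read off cluster membership.
groups <- cutree(tree, k = 2)
```

With these values, the two A texts join first and the two B texts join first, so `groups` places each pair in its own cluster.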