MONK basics

MONK (an acronym for Metadata Offer New Knowledge) was created by the developers of WordHoard.

Start with this tutorial. Here are some notes:

  • MONK’s capabilities are summed up in the word “metadata,” which essentially means data about data. Parts of speech and lemmas are different examples of metadata.
  • For example, in the phrase “the Thames ran softly,” we know that ran is a verb (specifically, the past tense of to run); that softly is an adverb, modifying ran; and that the Thames is a noun (specifically, a proper noun naming a river in southern England).
  • The tutorial tells us that MONK treats all texts as “bags of words.” Think of these like bags of Scrabble tiles, but where every word is copied onto multiple tiles.
  • A word like ran appears in three bags: [1] a lemma bag labelled “to run” (along with running and runs); [2] a much larger part of speech bag labelled “verbs” (along with every other verb!); and [3] a spelling bag labelled “ran” (along with ranne and other variants). [There are other bags, but for simplicity’s sake we can ignore them.]
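
If it helps to see the bags concretely, here is a minimal Python sketch of the idea. Everything in it is an assumption made for illustration: the tokens, the tag names, and the build_bags function are hand-written and hypothetical, and none of it reflects how MONK actually stores its data.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-tagged tokens (not MONK's real data or tag set):
# (spelling as printed, standardized spelling, lemma, part of speech)
TOKENS = [
    ("the", "the", "the", "determiner"),
    ("Thames", "Thames", "Thames", "noun"),
    ("ran", "ran", "run", "verb"),
    ("softly", "softly", "softly", "adverb"),
    ("ranne", "ran", "run", "verb"),
    ("runs", "runs", "run", "verb"),
    ("running", "running", "run", "verb"),
]


def build_bags(tokens):
    """Drop a copy of every printed spelling into three kinds of bags."""
    bags = {kind: defaultdict(Counter) for kind in ("spelling", "lemma", "pos")}
    for printed, standard, lemma, pos in tokens:
        bags["spelling"][standard][printed] += 1  # "ran" and "ranne" share a bag
        bags["lemma"][lemma][printed] += 1        # every form of "to run"
        bags["pos"][pos][printed] += 1            # every verb, every noun, ...
    return bags


bags = build_bags(TOKENS)
print(sorted(bags["spelling"]["ran"]))
print(sorted(bags["lemma"]["run"]))
print(sorted(bags["pos"]["verb"]))
```

Notice that the single word ranne lands in all three kinds of bags at once, which is the point of the Scrabble-tile image: one word, several tiles.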

The Getting Started tutorial covers essential steps to begin working with MONK: getting an account, starting a project, and working with both worksets (texts) and toolsets (more on those below).

The Define Worksets tutorial is next. It shows you how to select Shakespeare as the author and Hamlet as the text you would like to work on.

  • Scroll past the sections on searching for linguistic features like lemmas or spellings. They let you search a wider array of texts for particular contents, which is not relevant to English 203.
  • Your first goal is to define a workset that includes the complete text of Hamlet. (In later steps, you can add worksets that include single acts, or all of Shakespeare’s plays, or all of his tragedies, or other combinations of works.) Follow the tutorial to do this, beginning with step 6.

So what “new knowledge” can MONK’s metadata offer us? That’s where the Worksets Comparison tutorial comes in. (You should also watch the video about comparing worksets.)

  1. Frequency Comparison: This allows you to compare the frequency of words (either unique spellings or related lemmas) in two worksets. For example, if you want to compare the adjectives Shakespeare used in his comedies to those in his tragedies, the tutorial will show you how to do that. If you want to see what words appear more often in Act 4 than in the rest of Hamlet, first define worksets with the texts you’re comparing and then return here to run a Frequency Comparison on them. In general, though, you will get better results from this tool when comparing larger worksets, because it lists only spellings or lemmas that are more numerous in one or the other.
  2. Dunning’s Log Likelihood: This allows you to compare the likelihood of a word appearing in one workset as compared to another. One is your analysis workset (the text/s you are studying), and the other is your reference workset (the text/s you are comparing to). So if you wanted to compare the likelihood of Shakespeare using certain words in his tragedies as opposed to his comedies, you would choose “genre: play-comedy” for the First Workset, and “genre: play-tragedy” for the Second Workset; and then choose “Dunnings: second workset as analysis set,” which makes the tragedies your analysis workset and the comedies your reference workset.
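
To get a feel for what the log likelihood score measures, here is a minimal Python sketch of the two-term version of the statistic that is common in corpus linguistics. It is an assumption that MONK's calculation resembles this; the tutorial does not spell out its formula. The function names and the tiny word lists are invented for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Score a word seen a times among c analysis-set words
    and b times among d reference-set words."""
    expected_a = c * (a + b) / (c + d)  # count we would expect if usage were even
    expected_b = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:  # a zero count contributes nothing to the sum
        total += a * math.log(a / expected_a)
    if b > 0:
        total += b * math.log(b / expected_b)
    return 2 * total


def compare(analysis_words, reference_words):
    """Return (word, analysis count, reference count, score, marker) rows.

    The marker is "+" when the word is relatively more frequent in the
    analysis set and "-" otherwise. Both word lists must be non-empty.
    """
    analysis, reference = Counter(analysis_words), Counter(reference_words)
    c, d = sum(analysis.values()), sum(reference.values())
    rows = []
    for word in set(analysis) | set(reference):
        a, b = analysis[word], reference[word]
        marker = "+" if a / c > b / d else "-"
        rows.append((word, a, b, log_likelihood(a, b, c, d), marker))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Invented toy worksets, far too small for real conclusions.
tragedy = ["death"] * 3 + ["love"]
comedy = ["love"] * 3 + ["death"]
for row in compare(tragedy, comedy):
    print(row)
```

A score of zero means the word is used at the same rate in both worksets; the higher the score, the less likely the difference is an accident. Plain counts alone (the Frequency Comparison) cannot tell you that, which is why the two tools complement each other.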

Once you’ve mastered these MONK basics, you’re ready to explore its more advanced functions:

  1. Searching the concordance
  2. Using other toolsets beyond these suggested two
