Polonius’ sentiments about Hamlet’s ‘recent behaviour’ were perhaps echoed in our MONK group today.
Being met with frustration on our first day of collectively learning and mastering MONK was, I believe (though my teammates may disagree), both beneficial and disconcerting. MONK, amongst its other capabilities (albeit extremely limited ones), immediately bonded us in a united effort to overcome its barricades to text analysis. That united effort made only a modest amount of progress, but progress nevertheless. Our processes, and the obstacles that MONK hurled our way, depicted and described below, have revealed to us the limitations of MONK’s capabilities.
To begin, I will set aside my emphasis on MONK’s limitations long enough to explain what its capabilities are; the limitations that follow will then carry much more significance and clarity. In general overview, MONK is an acronym for “Metadata Offer New Knowledge.” It operates on a ‘bag of words’ model, in which a digital text is reduced to the words it contains and their counts, interpreted as numerical values. These ‘bags of words’ (called worksets from here on) are compared with other bags in order to provide frequency comparisons between texts. It is an analytic tool: we enter data so that the tool can give data back. In summary, then, MONK can search concordances by lemma, part of speech, or spelling, all of which serve as inputs for Dunnings. It can also compare the frequency of any of these three between two worksets through the use of toolsets. Those who are interested in further details, or who feel that my explanation leaves much to be desired, may proceed to the MONK Tutorial. Those interested in Dunnings, and the analytics behind it, may proceed here.
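As a toy illustration of the bag-of-words idea (my own sketch, not MONK’s actual code), a text can be reduced to nothing but word counts, with all ordering discarded. The sample lines are Gertrude’s from 3.4:

```python
from collections import Counter

def bag_of_words(text):
    # Lowercase and split on whitespace; a real pipeline would also strip
    # punctuation and lemmatize (e.g. "speak'st" -> "speak").
    return Counter(text.lower().split())

scene = ("o hamlet speak no more "
         "these words like daggers enter in mine ears "
         "no more sweet hamlet")
bag = bag_of_words(scene)
print(bag["hamlet"], bag["no"], bag["daggers"])  # 2 2 1
```

Once two texts are bags like this, comparing them is a purely numerical exercise, which may be why the tool can report frequencies but not line numbers or locations.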
We defined our worksets as chunks of text, rather than as lemmas, parts of speech, or spellings, so as to suit our purpose of analyzing Hamlet 3.4. The worksets I am currently attempting to work with are the complete text of Hamlet, Act III, and Act III scene iv.
We began our first session by exploring our tool in an attempt to grasp its full analytic potential. Though never stated aloud, I imagine the question we sought to answer was: ‘What can MONK do to provide me with more insight than I could get from simply reading the text?’ With this general aim in mind, we started by searching general concordances in Hamlet just to practice using the tool. In the concordance search bar, we entered “mother n” to search for the frequency at which “mother” appears throughout the text as a noun:
As you would guess, “mother” as a noun does not appear this many times in sequence throughout Hamlet. The problem presented here, and one we continued to experience, was that the findings do not provide any line numbers or references to acts. We are left with only a general picture of how many times the word “mother” appears.
Regardless, we continued on to see whether the “compare worksets” toolset would provide more insight into the significance of frequencies in Dunnings than the concordance of an isolated text could. Upon saving our worksets, we entered the tool and, before even starting to use it, were faced with another problem: what could we compare Hamlet 3.4 with in order to obtain useful results?
Because MONK is a comparison tool, we determined that the best ways to establish the significance of Hamlet 3.4 within Hamlet in general were to compare 3.4 to the entire text of Hamlet, and 3.4 to Act 3 (excluding 3.4). At this point we took our own experimental paths, continuing to share with one another what we found, what problems we experienced, and how we might push a result into further analysis. The following is what I found in my own attempts to use MONK. (The problems described here, however, are ones that all five of us encountered.)
First, the feature comparison has several analysis methods available in the drop menu:
On the left-hand side of the screen, I have set Act 3.4 as the first workset and the full Hamlet text as the second. The ‘Analysis Methods’ drop menu contains the options “Dunnings: first workset as analysis,” “Dunnings: second workset as analysis,” and “Frequency Comparison.” The remaining two I have yet to venture into.
The results on the right came from selecting “Dunnings: first workset as analysis,” then choosing ‘Lemma’ as the feature, 30 as the minimum frequency, and ‘nouns’ as the feature class. These inputs returned the results on the right, in which the left-hand column displays the numerical frequency values and the right-hand column displays a visual guide: grey words are underused, and black words overused. The font size reflects the extent of over- or underuse; the bigger the grey text, the greater the underuse, and vice versa.
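MONK does not expose its internals, but assuming the Dunnings scores are the standard Dunning log-likelihood statistic (G²), a single word’s score can be sketched from its counts in the two worksets. The counts below are hypothetical, not real figures from Hamlet:

```python
import math

def dunning_g2(count_a, total_a, count_b, total_b):
    """Dunning's log-likelihood (G^2) for one word, given its count in each
    workset and the total number of word tokens in each workset."""
    combined = count_a + count_b
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    score = 0.0
    if count_a:
        score += count_a * math.log(count_a / expected_a)
    if count_b:
        score += count_b * math.log(count_b / expected_b)
    return 2 * score

# Hypothetical counts: a word appearing 8 times in a 1,200-token scene
# versus 30 times in a 30,000-token play. It is "overused" in the scene
# because 8 is well above the count expected from the play-wide rate.
score = dunning_g2(8, 1200, 30, 30000)
print(round(score, 2))  # roughly 15.37

# Note the statistic is symmetric in the two worksets: swapping them changes
# only the over/underuse direction, not the score itself.
print(abs(dunning_g2(30, 30000, 8, 1200) - score) < 1e-9)  # True
```

If this resembles what MONK computes, the font size would correspond to the magnitude of the score, while grey versus black would encode only the direction of the deviation.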
This is where my problems began. To stop myself from rambling, I will mention only in brief that the problems I experienced in comparing 3.4 to the Act 3 workset were the same, if not worse.
In comparing 3.4 to Hamlet as a whole, whether I altered the analysis method, changed the minimum frequency, or switched from lemma to spelling in the feature drop menu, very few noticeable changes appeared in the frequencies on the right-hand side.
This was the result of the following parameters:
- First Workset: Hamlet 3.4
- Second Workset: Hamlet (full)
- Analysis Method: Dunnings: First workset as analysis
- Minimum Frequency: 20
- Feature: Lemma
(Please note the bold grey letters, as the list reflects those letters.)
- First Workset: Hamlet 3.4
- Second Workset: Hamlet (full)
- Analysis Method: Dunnings: Second workset as analysis
- Minimum Frequency: 20
- Feature: Lemma
As you can see, the words are exactly the same whether you use the first or the second workset as the analysis. I assure you, the results are equally baffling. The logic behind our thinking was that 3.4, as a significantly smaller body of text, would return different results depending on whether it was the text being analyzed or the text being compared against.
This was just one example of the various parameter combinations I manipulated in order to generate results, and the problem was one we all experienced as a group. In an attempt to determine whether we were missing something or otherwise doing it incorrectly, we used the same tool to compare Hamlet to the genre of tragedies available in the MONK database. The results varied greatly with this search.
This is what we realized:
MONK is capable of producing very interesting data on the frequencies of words and lemmas within texts, but only for large, substantial amounts of text. This comparison technique is useful for comparing genre to genre, since it looks to the general significance of frequencies. The frequencies that exist within one scene, one act, or even one play, however, are difficult to use in establishing an argument. MONK is designed to be used across the broad spectrum of language that Shakespeare employs.
Because of this, when we tried to analyze smaller bodies of text, results became increasingly difficult to establish as significant.
In the MONK tutorial, the section titled “Basic Facts on Common and Rare Words” explains Zipf’s Law, and notes that the rarest words will be the most interesting and significant, as opposed to the more common ones.
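Zipf’s Law can be illustrated in a few lines (a toy example using a familiar passage, not the tutorial’s own demonstration): word frequency falls off roughly in proportion to rank, so a handful of function words dominate any text while most words occur only once.

```python
from collections import Counter

text = ("to be or not to be that is the question "
        "whether tis nobler in the mind to suffer "
        "the slings and arrows of outrageous fortune")
counts = Counter(text.split())

# A few common words account for a large share of the tokens...
print(counts.most_common(3))

# ...while the majority of word types occur exactly once.
singletons = sum(1 for freq in counts.values() if freq == 1)
print(singletons, "of", len(counts), "distinct words appear once")
```

This would seem to be why the tutorial steers attention toward rare words: the common function words behave almost identically in any two worksets and so tell us little.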
This being the case, it has been difficult (as of now) for us to look past the limitations and difficulties of MONK and embrace the potential it may have, since the words that distinguish 3.4 from Hamlet as a whole are bound to be among the rare ones, simply because of the difference in content.
Nevertheless, as Hamlet says, “There is nothing either good or bad, but thinking makes it so.”
I believe our next step is to ask: in what ways can we manipulate MONK, using it innovatively, to draw insight from Dunnings frequencies and workset comparisons in studying Hamlet 3.4?
Perhaps there are some ideas here.
Innovation: that’s what the Digital Humanities is all about, right?