MONK: Truly, “more matter with less art!”

The last time I wrote, my post focused on the frustrations I had experienced with MONK’s limitations, and on the difficulty of even approaching my starting question: what could our tool tell us about Hamlet 3.4 that we couldn’t get from just reading the text? Needless to say, this post is very different.

At our team meeting today, we prepared to deliver our presentation on MONK and its capabilities, and to explain how it led us to new understandings of Hamlet 3.4. When we divided the topics, I was assigned the task of explaining the classification methods that MONK uses, Naive Bayes and decision tree induction, and how MONK uses them to produce useful knowledge. Since these were concepts I had a grasp of (a slippery grasp, at that), I felt comfortable explaining to my teammates the information I had absorbed from the previous night’s reading.

Well, as I began talking and explaining my findings by walking through the actual process of using the methods, I realized I hardly understood what I was talking about, or where my vague and unconfident sentences were taking me. After that meeting I sat down and furiously (or rather, with committed fervour) researched, practiced, and practiced again until I understood exactly how these methods could help our analysis. The following is what I found.

Text mining, sometimes called data mining, is, in the shortest possible explanation, a process of pure mathematical data analysis: it returns statistical measures and probabilities based on patterns and sequences observed in the data. MONK, which uses Naive Bayes and decision tree induction, is one such text-mining tool.

The tutorials for Naive Bayes and decision tree induction provide detailed, technical explanations of what these analytics are and how they work. In my attempt to understand them better, I started with those tutorials. For those of you who read them: when I say detailed and technical, I mean that it looks like English, but there were moments when I doubted that it really was.

This section (below) is only half English.

This one is most definitely not English.

So, I turned to where all students turn for short, quick explanations: Wikipedia. There are a few terms I must address first in order for the brief descriptions that follow to be coherent.

  • Training set – a set of data used to discover parameters that can predict relationships between two or more sets of data.
  • Test set – a set of data used to assess the strength of the predictions learned from the training set.
  • Overfitting – a crucial risk with training sets: it occurs when a statistical model (such as those in MONK) captures the minor fluctuations and random errors in the training data instead of the relevant relationship, usually because there are more parameters than there are observations.
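To make the first two terms concrete, here is a tiny sketch of my own (not MONK’s actual code, and the labeled snippets are invented) of dividing a labeled corpus into a training set and a test set:

```python
import random

# A tiny labeled corpus of (text, class) pairs. These snippets are
# invented illustrations, not real MONK data.
corpus = [
    ("the king is slain by poison", "tragedy"),
    ("the lovers are wed at last", "comedy"),
    ("the prince mourns his father", "tragedy"),
    ("the fools jest in the garden", "comedy"),
    ("the queen drinks the cup and dies", "tragedy"),
    ("the twins are happily reunited", "comedy"),
]

random.seed(42)        # fixed seed so the split is repeatable
random.shuffle(corpus)

split = int(len(corpus) * 2 / 3)
training_set = corpus[:split]  # used to learn word/class relationships
test_set = corpus[split:]      # held back to check what was learned

print(len(training_set), "training texts;", len(test_set), "test texts")
```

The held-back test set is what lets you notice overfitting: a model that scores perfectly on the training set but poorly on the test set has memorized rather than learned.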
Naive Bayes is a classification method that uses two or more “classes” assigned to training sets. It builds knowledge and “learns” the differences between the classes, and applies them to classify an unknown text. It is useful for three things:
  1. Categorizing a text.
  2. Finding features that stand out in a text.
  3. Finding characteristics of one text that are common to a larger body of texts, like a genre.
The MONK tutorial points out that the most interesting results from Naive Bayes are often those we would consider “misclassifications.” In this way, Naive Bayes is useful for making a hypothesis and testing it, or for confirming something you believe you already know.
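The core idea can be sketched in a few lines. This is a minimal Naive Bayes classifier of my own construction (with invented training snippets and Laplace smoothing), not MONK’s implementation: it counts word frequencies per class, then picks the class with the highest probability for an unseen text.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_texts):
    """Count word frequencies per class from the training set."""
    word_counts = {}          # class -> Counter of word frequencies
    class_counts = Counter()  # class -> number of training texts
    vocab = set()
    for text, cls in labeled_texts:
        class_counts[cls] += 1
        counts = word_counts.setdefault(cls, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Pick the class with the highest (log) posterior probability."""
    total = sum(class_counts.values())
    best_cls, best_score = None, float("-inf")
    for cls in class_counts:
        # log prior: how common the class is in the training set
        score = math.log(class_counts[cls] / total)
        n_words = sum(word_counts[cls].values())
        for word in text.lower().split():
            # Laplace smoothing (+1) so an unseen word cannot zero out a class
            p = (word_counts[cls][word] + 1) / (n_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

training = [
    ("the king dies by poison", "tragedy"),
    ("the prince mourns and dies", "tragedy"),
    ("the lovers wed in the garden", "comedy"),
    ("the fools jest and the lovers wed", "comedy"),
]
word_counts, class_counts, vocab = train_naive_bayes(training)
print(classify("the queen dies", word_counts, class_counts, vocab))  # tragedy
```

The “naive” part is the assumption that each word contributes to the probability independently of the others; crude, but it works surprisingly well on text.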
Decision tree induction takes the classifications provided by Naive Bayes and determines the attributes or characteristics that produced them. Below is a simplified, understandable image of the basic concept of a decision tree, provided by the MONK tutorial.
This is the process applied in the decision tree’s data analysis. It determines which attributes are present and which are not, and then logically produces a ‘tree’ of information that leads to probabilities.
This is where overfitting becomes crucial. When the model grows too complex, it fits the training data too closely, making it essentially useless for analyzing texts outside the training set. Instead of ‘learning’ the general relationship between the ideas, it memorizes that particular training set and attempts to apply it elsewhere.
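A single step of decision tree induction can be sketched like this. The code below is my own rough illustration (invented snippets again, not MONK’s algorithm): it tests every word as a possible split and keeps the one that misclassifies the fewest training texts, which is exactly the “lowest error rate” criterion MONK reports for its chosen word.

```python
def best_split_word(labeled_texts):
    """Find the single word whose presence/absence misclassifies the fewest
    training texts -- one step of decision tree induction (a 'stump')."""
    vocab = {w for text, _ in labeled_texts for w in text.lower().split()}
    best_word, best_errors = None, len(labeled_texts) + 1
    for word in vocab:
        have = [cls for text, cls in labeled_texts if word in text.lower().split()]
        lack = [cls for text, cls in labeled_texts if word not in text.lower().split()]
        # Predict the majority class on each side of the split; count mistakes
        errors = 0
        for side in (have, lack):
            if side:
                majority = max(set(side), key=side.count)
                errors += sum(1 for cls in side if cls != majority)
        if errors < best_errors:
            best_word, best_errors = word, errors
    return best_word, best_errors

training = [
    ("the king dies by poison", "tragedy"),
    ("the prince mourns and dies", "tragedy"),
    ("the lovers wed in the garden", "comedy"),
    ("the lovers jest and wed", "comedy"),
]
word, errors = best_split_word(training)
print(word, errors)  # errors is 0: the chosen word cleanly separates the classes
```

A full tree repeats this split on each branch; letting it repeat too many times is precisely how a tree overfits, so real implementations limit the depth.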
I explain the analytics behind the tool because once I understood what the tool was searching for, and how it searched, it became far easier to understand how to use it. With a tool that compares one body of texts to one or more other bodies of texts, it is extremely difficult to decide what to look for that could be significant. Being given the probabilities and frequencies of words in texts is, despite how simple it may sound, a difficult place to start, because there are just too many words.
Nevertheless, this is what I learned.
In general, using the classification tools that MONK offers, and practicing using them correctly, did not further my understanding of Hamlet 3.4 as much as I had hoped. It did, however, confirm some things I believed, surprise me with beliefs that turned out to be wrong, and open a door into the digital humanities by showing me its vast capabilities. For example:
In terms of Hamlet 3.4, I attempted to analyze the scene against all the tragedies, to find what in this scene was characteristically tragic in Shakespeare’s language. Unfortunately, given the way worksets are defined, the closest I could get to this kind of analysis was Hamlet compared to all of Shakespeare’s tragedies, and 3.4 compared to the remainder of Act 3. There I faced another problem: what parameters do I assign each scene in order to find out something useful about 3.4?
In the section where it says “click to rate,” you are setting a parameter. If you filled in “love,” “death,” and “betrayal” as themes of the first three scenes, and hit ‘continue,’ MONK would return the theme that scene 4 best fits, according to the probability determined by Naive Bayes. Unfortunately, doing this returned no substantial results, as the interactions within the individual scenes varied too much from scene to scene.
In attempting to compare the nature of Hamlet to the tragedies, I did the following:
After hitting continue, I set the following parameters:
These parameters returned the following classifications, using the Naive Bayes algorithm:
The intensity of the red next to the title of the play indicates the level of confidence (the lowest probability of error) that its classification is correct. The predicted rating is the classification that Naive Bayes provides, based on the two classes (historical and fictional) that I set for it. From this, Naive Bayes shows me that it is fairly certain, based on the data I provided and the data it analyzed, that there is a certain probability that Hamlet is a fictional play.
When I click Hamlet and then continue, MONK shows me the data behind its confidence level.
The nouns in the far right column are those that gave the Naive Bayes algorithm its confidence. The “Avg. Freq. Training” column is the number of times the word appears in the ‘parameter’ plays that I labelled before, and the “Avg Freq Test” column is the number of times it appears in the plays I left to be classified.
The confidence is not vibrant red in the predictions, however, because of the infrequent words that appear below:
When I click “Decision Tree,” the image that pops up displays the process by which the analytics worked through the tree to determine which word could act as a classifier.
The results displayed above provide the probability of error when the word “unkindness” is used as the basis of the classification. The decision tree states that, in terms of probability, this word had the lowest error rate and the highest predictive performance.
Therefore, from this data, I can conclude that Naive Bayes and the decision tree have determined there is a higher probability that Hamlet is a play of fiction rather than of history.
In conclusion, despite the group’s various frustrations and the little we picked up about 3.4 specifically, through Naive Bayes and decision tree induction I have learned that classification is a great place to start. Comparing texts in order to determine aspects of one based on another CAN show you something you never knew, or prove you wrong, and in doing so give you some idea of what to look for or what research criteria to change.
In terms of research, as we’re doing in ENGL203, learning and being wrong…I think that’s a great way to start.




6 thoughts on “MONK: Truly, ‘more matter with less art!’”

  1. YOU’VE FIGURED IT OUT! Thank you so much for spending all that time trying to make sense of this tool. I really appreciate your detailed explanation of the steps you took and the terms used within the tool.

  2. It literally took me hours, but I finally got it! I’ve been trying to do an act-to-act analysis, figuring out how we can define different acts to get useful data for single acts, but that’s been tough. It is narrow, though not as narrow as scene-to-scene analysis, and I’m still having trouble figuring out how to classify each act in order to contribute in Phase 2. Hopefully it’ll work out. I posted this with the hope that it makes enough sense for you ladies to read, and maybe to try out different parameters in your own experiments with at least some success. Perhaps we can continue sharing anything we find useful in Phase 2 with one another, and keep things ‘interconnected’ that way as well!

    • Really impressive research here, April. I learned a lot from this, and have to say it goes further into the statistical principles than I expected from anyone in this class!

      What you say about helping each other in Phase 2 fills my pedagogical heart with joy. That was one of my ambitions from the beginning.

      • I didn’t anticipate that I would be going into statistical research when I began using this tool with my group! However, when I finally decided that there was a possibility that I wasn’t getting ANY useful data because I just wasn’t using it even a little correctly, I decided I needed to do some investigating.
        I believe I am correct in my understanding of these statistical principles (or of what little I know of their vast, ‘non-English’ complexities, at least), as I have been able to slowly but surely refine my searches into more and more useful ones with this classification toolset.

I must admit I was skeptical about the blogging, and about what I could possibly say that might be even remotely useful, but I truly think that writing out my processes and the trouble I experienced was the easiest way to reflect on what I needed to work on. Reading my teammates’ blog posts gave me ideas about things I could try differently as well, because while writing, I think, it is easier to recall the specific frustrations and discoveries that get forgotten in conversation.

  3. It took me a while to actually get to write this question, but after having heard you speak of MONK in your presentation and read your blog post, I’m left curious:
    Although “or” is a common term with highly ambiguous connotations within the play, would you be able to use MONK to make a tree like the one in the screenshot above and see the choices that fork from it? Or perhaps, would it be possible to reverse the tree in such a way that you could see what leads up to the specific word?
    Great presentation by the way!

    • I’m glad you asked! I realized that this was something I forgot to address in our presentation in regards to word frequencies and probabilities in the decision trees process of analyzing a classification.

      Words such as “or,” “and,” “but,” etc. are all words that are, for the most part, necessary to complete sentences. So, as you might expect, they appear in huge abundance throughout the play, and throughout all of Shakespeare’s plays in general. As we mentioned during our presentation, MONK depends on a broad database of works in order to compare, contrast, and classify a smaller (but still relatively large) body of text.
      Using ambiguous and abundant words such as these would return ambiguous and abundant data. The processes outlined in this blog post depend on the parameters you set; MONK follows those parameters to find similar instances throughout the text. Thus, if the parameters you set are vague, then the data returned as analysis is also vague.

      It would be possible to create a decision tree from Naive Bayes using words such as “or,” and it is possible to classify a text based on such words. However, the tree it creates wouldn’t give you much reason for its classification, or for the probability of a text being one thing more than another. If you look at the statistical data MONK provides in the last screenshot, out of 11 instances of “or,” for example, it is likely that it would be classified correctly 11 times out of 11, leaving no probability of error. It would be a sure determination with a 0% margin of error; however, it would be a sure determination that provides far less information than could be sought from a word such as “unkindness,” “death,” or “vow.”

      MONK does not display the process of building the decision tree, as shown in the “play or not to play” decision tree image provided at the beginning of the post. I’m sure it would be interesting to see that process, in the words it eliminated and used in order to reach its conclusion. MONK, however, will only give you the word that provided the “most certain” basis for its conclusion, not all the other words analyzed along the way.

      I hope this answers your question!
