How to Write a Blog Post

Writing blog posts as assignments for classes is, to most, a foreign concept. Raised on the standard five-paragraph essay, complete with an introduction, body paragraphs, a conclusion, and a concise thesis, some are intimidated by the idea of writing anything different. Well, fear not! Blog posts are fundamentally very similar to the standard essay that many of us have written time and time again. Like essays, your posts have a reader, and you introduce, explain, and conclude your topic of discussion. The only difference is that, in an online community, your posts do not just have a reader; they have readers, who will all be interested in what you have to say because you will explain and explore your ideas. How do I know they will be interested, you ask? Here's how:


Concluding on My Introductory Experience in the Digital Humanities

Introductory Conclusions

As an English major, a lover of the literary, historical, and symbolic, I walked away with a celebratory slide from anything that involved numbers in any shape or form. I suppose in my mind it was a celebratory slide; to my math, physics, and chemistry teachers, it must have resembled a frantic, scrambling flight for the door. This is, I think, something that the majority of my fellow classmates in ENGL203 can attest to: the mistrust of anything that would take a piece of literature and suggest, "Sometimes, a river is just a river. The river moves with this speed, this velocity, because the water demonstrates this amount of viscosity, and it moves in this direction." As students of the literary, I suppose in response we would go on our rants and tangents about the river representing a winding and continuous process of life. My point here is that there has been an innate and inherent hatred, for some of us if not most of us, toward the mathematical and statistical aspects of the world, and toward the way those aspects take away from the symbolic values that have been metaphorically scattered throughout the universe.

Throughout the course of ENGL203, however, in the midst of my introduction to the world of the Digital Humanities, my understanding of the statistical, quantitative aspects of a literary text such as Hamlet has enriched my qualitative findings about the text. Digital Humanities was, in my mind, the best example of an oxymoron I had ever heard. I began this course with the question, "What could I possibly gain from knowing how many times a word shows up in a text?" I have concluded the course with the question, "In what different ways could these statistics and probabilities be applied to this text, or to a wide array of texts, to provide me with the best kind of data to answer a series of research questions?"

Working with MONK throughout the semester to analyze Hamlet, I have acquired a new appreciation for the mathematical aspects of the world. I say 'appreciation' not to imply that I have begun to appreciate mathematics itself (I continue to harbour a lingering suspicion toward it), but to mean that I can see the value it provides in analyzing a text such as Hamlet. Ben Schmidt's article "Treating Texts as Individuals vs. Lumping Them Together" has provided additional insight into my perspective on the tools that can be used to analyze texts, such as Hamlet, in the Digital Humanities.

It is my perspective, and my argument, that although the traditional close reading we have been taught throughout the years as lovers of the literary has much to offer in an analysis of literary texts such as Hamlet, the tools available in the Digital Humanities that provide statistical data and probabilities complete our qualitative understanding with the quantitative. I believe that the precedence we place on the qualitative, though understandable, is misguided. The numerical values our tools provide, though frightening and confusing for us English majors, complete our analysis in a way that makes the digital a valuable and effective method of text analysis.


The Quantitative

MONK, despite its glitches and imperfections, did not fail to teach me a lesson about the Digital Humanities and the value of statistical data. In the beginning, I suppose I did not feel very different from Queen Gertrude when she responded to Polonius' melodramatic ramblings with "More matter with less art" (2.2.95). I found MONK to be spewing numbers, statistics, and probabilities at me that provided nothing valuable whatsoever.

The images below provide a pretty clear picture of what I had been 'fleeing' since the start of my university career:

THIS, after the entire course, is still lost to me:

I initially believed that I would understand nothing about these tools and flunk out of the course; it was comforting to find that I was wrong.

An aspect of MONK that I found particularly interesting for the way it contributed to my analysis of Hamlet was the classification tool, with its Naive Bayes analytics and decision tree as methods of analysis. By using word frequencies from a variety of texts, MONK is able to classify texts into categories.

My immediate understanding of Hamlet, just from reading it, is that it is particularly tragic in its subject matter. Hamlet mopes around the entire text and quips like a madman with incredible mood swings, while everyone around him is scheming against one another, only to have everyone die eventually. This plot, as ridiculous as I have made it seem in my summary, can be read as nothing but tragic. However, the classification tool that MONK provides revealed that Hamlet's word frequencies are more comedic than tragic. By comparing it to a wide array of different texts, I was able to discover that Hamlet, like other texts such as Othello, is anomalous within the tragic genre of Shakespeare's texts. The question to be considered here is: would I have reached these conclusions from just a traditional reading of the text? I doubt it.
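To make the classification idea concrete, here is a toy version of Naive Bayes genre classification from word frequencies, in the spirit of what MONK does. All the word lists and labels below are invented for illustration; MONK's real training data and feature handling are far richer.

```python
from collections import Counter
import math

# Invented training data: short word lists standing in for plays of known genre.
training = {
    "tragedy": "blood death grave swear death blood night".split(),
    "comedy":  "love jest smile wit love marriage smile".split(),
}

def train(training_sets):
    """Per-class word probabilities with add-one (Laplace) smoothing."""
    vocab = {w for words in training_sets.values() for w in words}
    models = {}
    for genre, words in training_sets.items():
        counts = Counter(words)
        models[genre] = {w: (counts[w] + 1) / (len(words) + len(vocab))
                         for w in vocab}
    return models

def classify(models, text_words):
    """Pick the class whose word probabilities best explain the text.

    Class priors are omitted because both training sets are the same size;
    words outside the training vocabulary are simply ignored.
    """
    scores = {genre: sum(math.log(probs[w]) for w in text_words if w in probs)
              for genre, probs in models.items()}
    return max(scores, key=scores.get)

models = train(training)
print(classify(models, "smile love wit".split()))     # comedy-leaning words
print(classify(models, "blood death swear".split()))  # tragedy-leaning words
```

Run against a real test text, the same routine produces MONK-style verdicts such as 'comedy' for a play whose surface frequencies lean comedic, whatever its plot.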

The emphasis here is not on my lack of ability in close-reading texts…but on the acute abilities of the text-mining strategies of tools such as MONK. From word frequencies, the quantitative values of Hamlet, I was able to discern a qualitative aspect of it: that it is less tragic than the classic Shakespearean tragedy.

The Qualitative

In his article "Treating Texts as Individuals vs. Lumping Them Together," Ben Schmidt explores and describes the strengths and weaknesses of various methods of analytics and their use in answering questions in text analysis. He states that the key consideration in using tools that employ these methods is "how to treat the two corpuses we want to compare. Are they a single long text? Or are they a collection of shorter texts, which have common elements we wish to uncover?" Interested in analyzing hundreds of texts, Schmidt is aware of the imperfections that arise from any division of this large number of texts. He poses the question, "how far can we ignore traditional limits between texts and create what are, essentially, new documents to be analyzed?" At the end of the article, he provides lists of the appropriate uses of Dunning's log-likelihood, Mann-Whitney, and TF-IDF comparisons in texts.
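Of the comparisons Schmidt lists, Dunning's log-likelihood is the simplest to sketch. The function below computes the usual G-squared statistic for one word across two corpora; the counts in the example are invented, not drawn from MONK.

```python
import math

def dunning_log_likelihood(count_a, total_a, count_b, total_b):
    """Dunning's log-likelihood (G^2) for one word across two corpora.

    count_a / count_b: occurrences of the word in each corpus;
    total_a / total_b: total word counts of each corpus.
    A larger value means the word's frequency differs more surprisingly
    between the two corpora than chance would suggest.
    """
    combined_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    ll = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented example: "swear" appearing 40 times in 30,000 words of tragedies
# versus 10 times in 30,000 words of the remaining plays.
print(round(dunning_log_likelihood(40, 30_000, 10, 30_000), 2))
```

Identical rates in both corpora score exactly zero; the lopsided counts above score around 19, flagging 'swear' as distinctive of the first corpus.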

From working with TF-IDF as well as Dunning's log-likelihood in MONK, it was interesting to find that I reached the same conclusions that Ben Schmidt reaches in his analysis of the tools. Attempting to use these analytics in MONK to analyze Hamlet alone was a difficult and arduous task, as the text being analyzed was simply too small. Hamlet as an individual text, in comparison to the huge array of texts available in the MONK program, hardly returned information that could prove useful in a text-mining analysis of Hamlet. As many MONK users have noted, Hamlet on its own was too narrow a data set to yield meaningful results from a broad, wide-scale analysis method such as MONK's. As suggested in Schmidt's article:

Each tool that uses and provides quantitative data has individual strengths and weaknesses. The valuable lesson to be taken from Ben Schmidt's article is that a certain amount of care must be put into using tools such as Dunning's log-likelihood and TF-IDF comparisons, and that even with that care, sometimes these tools cannot be applied to the line of inquiry being pursued. In short, these tools cannot always be relied on, and should not be the absolute basis of argumentation when it comes to text analysis. The mistrust that all of us share toward the numeric values that can pervade the literary, though extreme at times, is not unfounded. There is value in the qualitative meaning we gather from traditional readings of texts when the quantitative simply does not make sense.
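TF-IDF can be sketched just as briefly, and the sketch hints at why care is needed: with a tiny document or corpus, the weights collapse. The mini-corpus below is invented.

```python
import math
from collections import Counter

def tf_idf(term, doc_words, corpus):
    """tf-idf of a term in one document against a corpus of documents.

    Assumes the term occurs in at least one document of the corpus.
    """
    tf = Counter(doc_words)[term] / len(doc_words)
    docs_with_term = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf

corpus = [
    "the prince must swear revenge".split(),
    "the lovers smile and jest".split(),
    "the king holds the court".split(),
]
doc = corpus[0]
print(tf_idf("the", doc, corpus))    # appears everywhere: weight 0
print(tf_idf("swear", doc, corpus))  # unique to this document: weighted up
```

A word found in every document is weighted down to nothing, while a word unique to one document stands out; when the 'corpus' is effectively a single short text, there is too little contrast for the weights to mean much.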

THEREFORE

I have learned that, in a sense, neither the traditional reading nor the digital statistics of texts are completely trustworthy.

With the traditional reading, I concluded, without being absolutely correct, that Hamlet was entirely a tragedy, and that there was simply no other type of text it could be.

With the digital statistics, I discovered that although I was returned data, the methods I was attempting to use were very particular about the type of data I was inputting, and could return skewed conclusions if I did not use them with the utmost care. (Which I don't believe I did all the time.)

However, in both circumstances, I was able to use the digital to correct my traditional reading, and use the traditional reading to double-check my digital findings.

My purpose in writing all of the above, therefore, is to show that there is much value to be gained from both methods of analysis. Each method on its own is, in some sense, incomplete. The Digital Humanities, in all the tools it offers for the statistical analysis of probabilities in texts, through methods such as word frequencies, has provided not only a valuable but a legitimate method of analyzing literary texts such as Hamlet. Our fear of the numbers in statistics and probabilities, and the automatic assumption that they will not be useful in a literary analysis of a text, though understandable, is misguided. As Hamlet begs of his friends, "Nay then, I have an eye of you. If you love me, hold not off" (2.2.255-257). It is a request many would make of their endeavours with digital tools: that they not hesitate to reveal the value they have uncovered beneath the text. The trick is in recognizing, to begin with, that there is in fact value; it simply must be uncovered and laid in plain view for analysts to use.

However, once it is found…there is a great amount of valuable knowledge to be gained that can contribute to our analyses as a whole.

For example,

The river does indeed represent the continuous winding and progression of life, and the numerical values of its speed, direction, and viscosity tell me that this metaphorical river of life flows at a rapid pace, in one direction decided by destiny, at a speed determined by the hardships and challenges innate to its path. Thus I am provided with a well-rounded, complete analysis of the way of life, combining the symbolic qualitative meaning with the numerical quantitative data.

Works Cited

Shakespeare, William. Hamlet. Ed. Ann Thompson and Neil Taylor. London: Arden Shakespeare, 2006. Print.

MONK’s “Tragic” Words: A continuation

As a continuation of my last post

In my attempt to discover words that may participate in MONK's classification of Act V as more tragic, I found myself led in another direction: figuring out why MONK insisted on classifying Hamlet as a 'half-tragedy' in comparison to the other works. My discoveries in individual word frequencies were interesting, as they would seem to contradict the 'half-tragedy' classification that MONK previously made. In other words, MONK seems to have contradicted itself.

In comparing the tragedies to all of Shakespeare's plays, MONK returned the following data:

The first verb that MONK listed as appearing most frequently in the tragedies, compared to the rest of Shakespeare's plays, was "swear."

Upon selecting the word to see the breakdown of frequencies, I was provided with the following information:

“Swear,” as it appears in all of Shakespeare’s tragedies, appears most frequently in Hamlet.


To satisfy my own curiosity, I scrolled further down the list and selected a word that seemed less likely to appear in a tragedy, and one I did not remember encountering that frequently in my own reading of Hamlet. Selecting 'smile,' I was provided with the following chart:

In terms of the number of times the word “smile” appears in the tragedies, it appears most frequently in Hamlet.


I assure you, this pattern remains consistent throughout the list of frequencies that MONK has provided me.

I remain uncertain whether these results are being affected by the glitches and malfunctions that MONK has been experiencing as of late, but they do raise an interesting question:

If MONK's data hasn't been affected by its recent problems, where does this leave us in understanding Hamlet's classification as a tragedy?

If the words MONK identifies as occurring most frequently in Shakespeare's tragedies, in comparison to the rest of his plays, all appear most frequently in Hamlet, why is it that Hamlet is the play most frequently classified as only a 'half-tragedy'?

This is a question beyond MONK's or my own understanding to fully grasp, and so it is my hope that my group members' tools can take this information and analyze it further, bringing us closer to an understanding of what this all means for Hamlet as a whole.

Perhaps it is not these tragic words that can be the basis for our classification of Hamlet as a tragedy. Perhaps we must take the comedic words used in Hamlet to understand why MONK refuses to accept it fully as a tragedy?

These are all questions I hope to have answered in my next blog post, as I believe that these answers will guide me to an interesting discovery about Act V in relation to Hamlet as a whole.


MONK: To be, or not to be?

In all of the discoveries that I have almost made, it seems that MONK has made its decision to ‘not be.’

Unable to create worksets that could be compared for word frequencies, which my group discussed as a good initial focus today, I found myself at a loss for anything useful to blog about other than how this program has refused to co-operate with me. However, it occurred to me today that, for the sake of my group, perhaps I should force MONK to hand me something useful.

Yes, I do mean force.

In the interest of figuring out what classifies Act V as ‘more tragic’ than Hamlet, I began to use the preset corpus and genre worksets in order to determine which words were frequently used by Shakespeare in his tragedies. The following is what I learned in this endeavour.

It is worth mentioning, I think, that those of you familiar with MONK know it has an irritating, stubborn habit of refusing to remember the options you have selected to search with when you hit 'previous,' so this process was a long and arduous one.


To begin, I chose the preset worksets to be compared: all of Shakespeare's plays against his tragedies, in order to determine which words were unique to the tragedies. I was returned with these:

The words provided in this list are those that appear most frequently in the comparison between all of Shakespeare's plays and all of the tragedies.

When I select the word "justify," I am provided with a graph of the frequency of that word across the time span of Shakespeare's writings:

I found it interesting that the year the word "justify" peaked was roughly around the time Hamlet was written, and so I hit 'continue' in order to see the plays in which this word occurs and in which play it occurred most frequently.

The circulation period I was most interested in was between the years 1600 and 1610. Finding that time frame on the list, this is what I discovered:

The word ‘justify’ occurs more in Hamlet than it does in any other play in this time period.

It also appears more in Hamlet than in any other play overall, and all the plays on this list, across all the time periods, were tragedies.

Going through the list, I found similar words of interest in the tragedies (not just in Hamlet). For example, the word 'rehearse' appears only, or most frequently, in tragedies in this comparison.

Words like these, I think, will be of interest to our group in analyzing Act V.


I believe that because Act V was classified by MONK as more tragic than the rest of the play, these words will be helpful in assessing why MONK has made this classification, and they will provide a starting point for the other frequency-analyzing tools in gathering further interesting analysis of Act V.

MONK: Hilarious Hamlet

In the first stages of phase 2 of our group projects, I find, in earnest (but not unfounded) honesty, that I am more intrigued by MONK than I had been in phase 1. As promoted in the MONK group's blog posts and throughout our presentation, MONK, as a text mining tool that focuses on statistical analysis and word frequencies, appears to be more cooperative in answering questions about a broader range of data. Though Act V is not as broad as MONK seems to wish it could be, I have found that I am indeed learning new information about Act V of Hamlet that I had not known before.

My initial purpose in embarking on this analytical journey was to discover what was unique about Act V that I could not deduce from reading, but could learn from using MONK's analytics.

In my blog posts from phase 1, I was left pondering the question, "Why does MONK, in comparison to all other tragedies, continuously notify me that it is only half confident that Hamlet is a tragedy?" With this question in mind, I endeavoured to determine whether Act V participated in this strange inconsistency.

To begin, I defined my workset to contain As You Like It, The Rape of Lucrece, Hamlet, Julius Caesar, Much Ado About Nothing, and Act V.

Then, selecting my classification toolset and the newly created workset, I began to rate the training and test sets. As can be seen in the image below, I rated As You Like It and Much Ado About Nothing as the comedy training sets, and The Rape of Lucrece and Julius Caesar as the tragedy training sets. I left Hamlet and Act V with blank ratings, thus making them my test sets.

This is what I was returned with:

From this image it is easy to have one's attention redirected to the fact that, according to these queries, Julius Caesar is not a tragedy.

However, MONK's lack of confidence in Julius Caesar's classification as a tragedy notwithstanding, the attention must be drawn (as it took me a while to realize) to the fact that, in a statistical analysis of the plays present, MONK has classified both Hamlet and Act V as comedies.

Feeling uneasy about my results, I went back to the user ratings, and removed those anomalies that MONK was picking up, and forced MONK to recognize Hamlet as a tragedy by rating it so.

These were the results I was returned with:

Both analyses were conducted on the basis of nouns.

In classifying Hamlet as a tragedy and leaving Act V as the test set, MONK returned its classification that, with a 0% probability of error and 100% confidence, Hamlet is not a tragedy.

However, MONK does believe that Act V is a tragedy.

The words I was most interested in, among the data it used to determine its confidence in the ratings, were words like 'blood.'

The first number displayed, 26.1241, represents the average frequency with which the word appeared per 10,000 features in the test set, Act V. The second number is the average frequency with which the word occurred per 10,000 features in the training sets.
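The normalization behind those two numbers is easy to reproduce. The counts below are invented stand-ins, not MONK's actual figures for 'blood.'

```python
def freq_per_10000(count, total_features):
    """Average frequency per 10,000 features, as MONK reports it."""
    return count / total_features * 10_000

# E.g., a word occurring 13 times among 5,000 noun features:
print(freq_per_10000(13, 5000))  # about 26 occurrences per 10,000 features
```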

From words such as 'blood,' MONK has determined that, based on average frequency, Act V can be classified as a tragedy.


It was interesting to find that, based on word frequencies and a statistical analysis of noun features, in comparison to other works of Shakespeare, Act V can be classified as a tragedy and Hamlet cannot. Though it would be a worthwhile endeavour to attempt to figure out why MONK refuses to agree that Hamlet is definitely a tragedy, I find (it being my responsibility as a member of the Act V group for phase 2) that I am led to research the cause of Act V being classified as more of a tragedy than Hamlet itself.

Because of the subject matter and the words Shakespeare uses in telling the tale of Hamlet's tragic story, it is difficult for me to understand its classification as anything but a tragedy. I have thereby reached another understanding of MONK that I did not have when attempting to analyze 3.4. I wanted, so desperately, for MONK to see and understand Hamlet 3.4 the way I read it. I wanted to force it to read the words on the page in the order they are in, and take each sentence for what it means.

However, it is this reading that we do as sensible, feeling people that leads to an analysis which is incomplete without tools such as MONK, and it is that same reading that completes the pure numerical data, which on its own is blind to any symbolic possibilities that exist in literature.

I digress.


MONK, being a tool that uses pure data (and not emotion) in providing a classification, has yet to reveal to me the statistical reasoning for Act V being more tragic than Hamlet as a whole. My point here is not that MONK is unable to show me, but that I have yet to fully understand the reasons it has provided.

Reading the subject matter, it is rather simple for me to determine why Act V is tragic. The entire cast being wiped out is indeed quite tragic. However, reading that same subject matter across Hamlet, I cannot comprehend why the play ISN'T tragic. Hamlet losing his father, having his mother marry his uncle, discovering that his uncle-father murdered his father, and much more, is devastatingly tragic! My point, then, is that my reading and comprehension is not, and cannot always be, correct. I would assume, as a university student living in Canada where all people have equal rights, that Othello is a tragedy. However, the audience that Shakespeare wrote for, not knowing a thing about racial equality, would consider Othello a comedy.

The evidence of this is in the words, and in the probabilities that MONK discovers. It will classify Othello as a comedy on the basis of words, and in that same way it will classify Hamlet as a 'half-tragedy.'

It is my hope that we, as the group analyzing Act V, can determine (undeterred by emotional bias) the true nature of Act V in relation to Hamlet by combining the various data we get from our digital tools.

I will from here, endeavour to determine why MONK tells me that Act V is so significantly more tragic than the entire text of Hamlet.


MONK: Truly, “more matter with less art!”

The last time I wrote, my post focused on the frustrations I had experienced with the limits of MONK's capabilities, and the difficulty I had even approaching my starting question: what could our tool provide us about Hamlet 3.4 that we couldn't get from just reading the text? Needless to say, the content of this post is very different.

For the duration of our team meeting today, we prepared to deliver our presentation on MONK and its capabilities, and to explain how it led us to new understandings of Hamlet 3.4. When dividing the topics to be discussed, I found myself assigned the task of explaining the classification methods that MONK uses, Naive Bayes and decision tree induction, and how MONK uses them to provide useful knowledge. These being concepts I had a grasp of (a slippery grasp at that), I felt comfortable explaining to my fellow team members the information I had absorbed from my reading the night before.

Well, as I began talking and explaining my findings by referring to the actual process of using the methods, I realized I hardly understood what I was talking about or where my vague, unconfident sentences were taking me. It was after that meeting that I sat down and furiously (or rather, with committed fervour) researched, practiced, and practiced again until I understood exactly how these methods were to be helpful to our analysis. The following is what I found.

Text mining, also called data mining, is, in the shortest possible explanation, a process that revolves around pure mathematical data analytics, returning statistical data and probabilities based on patterns and sequences observed in the data. MONK's Naive Bayes and decision tree induction are among these text mining methods.

The tutorials for Naive Bayes and decision tree induction provide detailed, technical explanations of what these analytics are and how they proceed. In my attempt to get a better understanding of them, I started with these tutorials. For those of you who read them, you will see that when I say detailed and technical, I mean that it looks like English, but there were moments when I doubted that it really was.

This section (below) is only half English.

This one is most definitely not English.

So, I turned where all students turn for short and quick explanations: Wikipedia. In my brief descriptions to follow, there are terms that I must first address in order for the explanations to be coherent.

  • Training set – a set of data used to discover parameters that can provide a probability of predictable relationships between two or more sets of data.
  • Test set – a set of data used to assess the strength of the probability given by the training sets.
  • Overfitting – a crucial hazard for training sets; it occurs when statistical models (such as those in MONK) emphasize and display the minor fluctuations and random errors in the data instead of the relevant relationship, because there are more parameters than there are observations.
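A bare-bones picture of the over-fitting idea, with entirely invented data: a 'model' that keeps one parameter per training example fits its training set perfectly and generalizes to nothing.

```python
# Invented word-to-genre examples.
train_set = {"blood": "tragedy", "grave": "tragedy",
             "jest": "comedy", "smile": "comedy"}
test_set = {"death": "tragedy", "wit": "comedy"}

def memorizer(word):
    """Looks up the exact training example; knows nothing about unseen data."""
    return train_set.get(word, "unknown")

train_accuracy = sum(memorizer(w) == g for w, g in train_set.items()) / len(train_set)
test_accuracy = sum(memorizer(w) == g for w, g in test_set.items()) / len(test_set)
print(train_accuracy, test_accuracy)  # perfect on training, useless on test
```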
Naive Bayes is a classification method that uses two or more "classes" assigned to training sets. It builds knowledge, "learns" the comparisons between the classes, and applies them to classify an unknown text. It is useful for three things:

  1. Categorizing a text.
  2. Finding features that stand out in a text.
  3. Finding characteristics of one text that are common to a large body of texts, like a genre.

The MONK tutorial points out that the interesting aspects that can be seen using Naive Bayes are those that we would consider "misclassifications." In this way, Naive Bayes is useful for making a hypothesis and testing it, or for going through the process to confirm something you believe you already know.
Decision tree induction takes the classifications provided by Naive Bayes and uses them to determine the attributes or characteristics that produced them. Below is a simplified, understandable image of the basic concept of a decision tree, provided by the MONK tutorial.

This is the process that is applied to the data analytics of the decision tree. It determines which aspects are present and which are not, and then logically produces a 'tree' of information that leads to probabilities.

This is where overfitting becomes a crucial concern. When the model grows too complex, the training data is fitted in too much detail, making it essentially useless for analyzing texts other than the training set itself. Instead of 'learning' the general relationship between the ideas, the model memorizes that particular training set and attempts to apply it elsewhere.
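The root of such a tree can be sketched as choosing the single word whose presence best separates the classes, scored by error rate, the same criterion by which a word like 'unkindness' gets singled out. The documents and labels below are invented.

```python
def best_split(documents):
    """Pick the word whose presence/absence best separates two classes:
    the root test of a decision tree, chosen by lowest error rate."""
    vocab = {word for words, _ in documents for word in words}
    best_word, best_errors = None, len(documents) + 1
    for word in vocab:
        errors = 0
        # Predict by majority class on each side of the split.
        for side in ([label for words, label in documents if word in words],
                     [label for words, label in documents if word not in words]):
            if side:
                majority = max(set(side), key=side.count)
                errors += sum(label != majority for label in side)
        if errors < best_errors:
            best_word, best_errors = word, errors
    return best_word, best_errors

documents = [  # invented word sets with known classes
    ({"blood", "swear", "night"}, "tragedy"),
    ({"grave", "blood", "death"}, "tragedy"),
    ({"smile", "jest", "love"}, "comedy"),
    ({"wit", "smile", "marriage"}, "comedy"),
]
print(best_split(documents))  # a word that splits the classes with zero errors
```

A full induction would repeat this split recursively on each branch; stopping early, or pruning, is what keeps the tree from over-fitting as described above.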
The purpose of my explaining the analytics behind the tool is that once I understood what the tool was searching for, and how it searched, it became far easier to understand how to use it. With a body of text, and a tool that compares one body of texts to one or more other bodies, it is extremely difficult to determine what to look for that could be significant. Being given the probabilities and frequencies of words in texts is, despite how simple it may sound, a difficult place to start, because there are simply too many words.
Nevertheless, this is what I learned.

In general, using the classification tools that MONK had to offer, and practicing using them correctly, did not further my understanding of Hamlet 3.4 as much as I had hoped; however, it did confirm some things I believed, surprise me with beliefs that were wrong, and open a door for me into the Digital Humanities by showing me its vast capabilities. For example:

In terms of Hamlet 3.4, I attempted to analyze the scene in comparison to all the tragedies in order to find what in this scene was characteristically tragic in Shakespeare's language. Unfortunately, given the way worksets are defined, the closest I could get to this kind of analysis was Hamlet compared to all of Shakespeare's tragedies, and 3.4 compared to the remainder of Act 3. There I faced another problem: what parameters do I assign each scene in order to find out something useful about 3.4?

In the section where it says "click to rate," you are setting a particular parameter. If you filled in "love," "death," and "betrayal" as the themes of the first three scenes and hit 'continue,' it would return the conclusion of which theme scene 4 best fit, according to the probability determined by Naive Bayes. Doing this unfortunately returned no substantial results, as the interactions within the individual scenes were too varied from scene to scene.
In attempting to compare the nature of Hamlet to the tragedies, I did the following:

After hitting continue, I set the following parameters:

These parameters returned to me the following classifications using the Naive Bayes algorithm:
The intensity of the red next to the title of the play indicates the level of confidence, or the lowest probability of error, that its classification is correct. The predicted rating is the classification that Naive Bayes provides, based on the two classes (historical and fictional) that I have set for it. From this, Naive Bayes shows me that it is fairly certain, based on the data I have provided and the data it has analyzed, that there is a certain probability that Hamlet is a fictional play.

When I click Hamlet and then continue, MONK shows me the data it has found, which explains its confidence level.

The nouns that appear in the far right column are those that have given the Naive Bayes algorithm its grounds for confidence. The "Avg. Freq. Training" column is the number of times the word appears in the 'parameter' plays that I labelled before, and the "Avg. Freq. Test" column is the number of times the word appears in the plays I left to be classified.

The reason the confidence is not a vibrant red in the predictions, however, is the infrequent words that appear below:
When I click "Decision Tree," the image that pops up displays the process by which the analytics worked through the tree to determine which word could act as a classifier.

The results displayed above provide the probability of error in using the word "unkindness" as the basis of that classification. This decision tree states that, in terms of probability, this word had the lowest error rate and the highest predictive performance.

Therefore, from this data I can conclude that Naive Bayes and the decision tree have determined that there is a higher probability that Hamlet is a play of fiction rather than one of history.
In conclusion, despite the various frustrations the group has experienced and the little that we picked up about 3.4 in particular, through Naive Bayes and decision tree induction I have learned that classifications are a great place to start. Comparing texts to determine aspects of one based on another CAN show you something you never knew, or prove you wrong, and in doing so give you some idea of what you need to look for or what research criteria you need to change.
In terms of research, as we’re doing in ENGL203, learning and being wrong…I think that’s a great way to start.
MONK’s “pranks…too broad to bear with”

Polonius’ sentiments about Hamlet’s ‘recent behaviour’ were perhaps approached in our MONK group today.

Being met with frustration on our first day of collectively learning and mastering MONK was, I believe — though my teammates may disagree — both beneficial and disconcerting. MONK, amongst its other (albeit extremely limited) capabilities, immediately bonded us in a united effort to overcome its barricades to text analysis; a united effort that made a modest amount of progress, but progress nevertheless. Our processes, and the obstacles that MONK hurled our way, as depicted and described below, have revealed to us the limitations of MONK’s capabilities.

To begin, I will set aside my emphasis on the limitations of MONK’s capabilities and explain what those capabilities are; the limitations that follow will then be of much more significance and clarity. In general overview, MONK is an acronym for “Metadata Offer New Knowledge.” It functions on a ‘bag of words’ model, in which it takes a digital text and interprets the words of the entire text as numerical values. The ‘bags of words’ (called worksets from here on) are compared with other bags in order to provide a frequency comparison between texts. It is an analytic tool: we enter data so that the tool can give data back. In summary, MONK is able to search concordances by lemma, part of speech, and spelling, all of which are inputs for Dunning’s log-likelihood statistic. It is also able to compare the frequency of any of these three between two worksets through the use of toolsets. Those who are interested in further details, or feel that my explanation leaves much to be desired, may proceed to the MONK Tutorial. Those who are interested in Dunning’s statistic, and the analytics behind it, may proceed here.
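The ‘bag of words’ idea is simple enough to show directly: word order is discarded and only counts survive. Here is a minimal sketch, with two tiny invented “worksets” standing in for real texts:

```python
from collections import Counter

def bag_of_words(text):
    """Reduce a text to word counts, ignoring order and punctuation."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return Counter(w for w in words if w)

# Two tiny invented 'worksets' (not actual MONK data).
a = bag_of_words("A river is a river, and a river moves.")
b = bag_of_words("The river of life winds on.")

print(a["river"])                     # → 3
print(a["river"] / sum(a.values()))   # relative frequency of "river" in a
```

Everything MONK does downstream — concordance counts, frequency comparisons, Dunning’s statistic — operates on bags like these rather than on the text itself.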

We defined our worksets as chunks of text, instead of as lemmas, parts of speech, or spellings, to suit our purposes of analyzing Hamlet 3.4. The worksets that I am currently attempting to work with are the complete text of Hamlet, Act III, and Act III scene iv.

We began our first session by exploring our tool in an attempt to grasp its full analytic potential. Though not verbally stated, I imagine the question we sought to answer was: ‘What can MONK do to provide me with more insight than what I could get from simply reading the text?’ With this general aim in mind, we started by searching general concordances in Hamlet just to practice using it. We entered “mother n” in the concordance search bar, to search for the frequency at which “mother” appears throughout the text as a noun:

As you would guess, “mother” as a noun does not appear this many times in sequence throughout Hamlet. The problem presented here, which we continued to experience, was that the findings do not provide any line numbers or references to acts. We are left with only the general picture of how many times we see the word “mother.”

Regardless, we continued on to see if perhaps the toolset “compare worksets” would provide us more insight into the significance of Dunning frequencies, as opposed to the concordance of just an isolated text. So, upon saving our worksets, we entered into the tool, and before even starting to use it, we were faced with another problem: what could we compare Hamlet 3.4 with in order to obtain useful results?

Because MONK is a comparison tool, we determined that the best ways to establish the significance of Hamlet 3.4 within Hamlet were to compare 3.4 to the entire text of Hamlet, and 3.4 to Act 3 (excluding 3.4). At this point we took our own experimental paths, continuing to share with one another what we found and what problems we experienced, and questioning what we could do to take a result to further analysis. The following is what I found in my own attempts to use MONK. (However, the problems described here are ones that all five of us encountered.)
First, the feature comparison has several analysis methods available in the drop menu:

On the left-hand side of the screen, I have set the first workset as Act 3.4 and the full Hamlet text as the second. The ‘Analysis Methods’ drop menu contains the options “Dunnings: First workset as analysis,” “Dunnings: Second workset as analysis,” and “Frequency Comparison.” The remaining two I have yet to venture into.
The results on the right came from selecting “Dunnings: First workset as analysis,” then selecting ‘Lemma’ as the feature, 30 as the minimum frequency, and ‘nouns’ as the feature class. These inputs returned the results shown, in which the left-hand column displays the numerical values of the frequencies, and the right displays a visual guide in which grey words are underused and black words overused. The size of the font reflects the extent of over- or underuse: the bigger the grey text, the greater the underuse, and vice versa.
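Dunning’s log-likelihood statistic itself is straightforward to compute for one word: compare its observed count in each workset against the count you would expect if the word were spread evenly across both. The sketch below uses hypothetical counts (the figures for “mother” are invented, not MONK’s output):

```python
import math

def dunning_g2(a, b, c, d):
    """Dunning's log-likelihood (G-squared) for one word.
    a = word count in workset 1, c = total words in workset 1,
    b = word count in workset 2, d = total words in workset 2."""
    e1 = c * (a + b) / (c + d)  # expected count in workset 1
    e2 = d * (a + b) / (c + d)  # expected count in workset 2
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

# Hypothetical counts: "mother" in 3.4 versus in the full play.
print(round(dunning_g2(20, 60, 1200, 30000), 2))  # → 45.06
```

A large value signals that the word is used far more (or less) in one workset than chance would predict — which is what MONK’s over/underuse display is visualizing with font size and colour.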

This is where my problems began. To stop myself from rambling, I will just mention in brief that the problems I experienced in comparing 3.4 to the Act 3 workset were the same, if not worse.

In comparing 3.4 to Hamlet as a whole, whether altering the analysis method, changing the minimum frequency, or switching from lemma to spelling in the feature drop menu, very few changes could be noticed in the frequencies on the right-hand side.

For example:

This was the result of the following parameters:

  • First Workset: Hamlet 3.4
  • Second Workset: Hamlet (full)
  • Analysis Method: Dunnings: First workset as analysis
  • Minimum Frequency: 20
  • Feature: Lemma

**Please note the bold grey words; the list reflects those words.
Then:

The parameters set for this second analysis:

  • First Workset: Hamlet 3.4
  • Second Workset: Hamlet (full)
  • Analysis Method: Dunnings: Second workset as analysis
  • Minimum Frequency: 20
  • Feature: Lemma

As you can see, the words are exactly the same whether you are using the first or second workset as analysis. I assure you, the results are equally baffling. The logic behind our thinking was that 3.4, as a significantly smaller body of text, would return different results depending on whether it was the text being analyzed or the text being compared against.

This was just one example of the various parameters I manipulated in order to generate results, and a problem that we all experienced as a group. In an attempt to determine whether we were missing something or otherwise mistaken, we used the same tool to compare Hamlet to the genre of tragedies available in the MONK database. The results varied greatly with this search.
This is what we realized:

MONK is capable of establishing very interesting data on the frequencies of words and lemmas within texts, but only across large and substantial amounts of text. This comparison technique is useful for comparing genre to genre, as it looks to the general significance of frequencies. However, the frequencies that exist within one scene, one act, or even one play are difficult to use in establishing an argument. MONK is designed to be used across the broad spectrum of language that Shakespeare employs.

Because of this, when trying to analyze smaller bodies of text, results became increasingly harder to establish as significant.

In the MONK tutorial, the section titled “Basic Facts on Common and Rare Words” explains the concept of Zipf’s Law, and notes that the words that occur most rarely are the ones that will be the most interesting and significant, as opposed to the more common ones.
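Zipf’s Law says that the r-th most frequent word appears roughly 1/r as often as the most frequent one, so rank × frequency stays roughly constant — until you reach the long tail of rare words. A toy sketch, with an invented frequency profile (not real Hamlet counts):

```python
from collections import Counter

def rank_frequencies(words):
    """Rank words by frequency; under Zipf's law, rank * frequency
    stays roughly constant down the list."""
    ranked = Counter(words).most_common()
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]

# An invented frequency profile that roughly follows Zipf's law.
words = (["the"] * 60 + ["and"] * 30 + ["king"] * 20
         + ["ghost"] * 15 + ["unkindness"] * 1)

for rank, word, freq, product in rank_frequencies(words):
    print(rank, word, freq, product)
```

The common words at the top of the list carry little distinguishing information; it is the words at the bottom, like the lone “unkindness” here, that end up doing the interpretive work — which is exactly why a tiny workset like 3.4 is so hard for MONK to handle.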

This being the case, it has been difficult (as of now) for us to look past the limitations and difficulties of MONK and embrace the potential it may have, as a comparison of word frequencies in 3.4 against Hamlet as a whole is bound to turn up mostly rare words, due to the difference in content.

Nevertheless, as Hamlet says, “There is nothing either good or bad, but thinking makes it so.”

I believe our next step is to question: “In what ways can we manipulate MONK to use it in innovative ways, drawing insight from Dunning frequencies and workset comparisons to study Hamlet 3.4?”

Perhaps there are some ideas here.

Innovation: that’s what the Digital Humanities is all about, right?