The Darker Secrets of the Digital Humanities

Another semester comes to an end, and for the first time ever I’ve spent more quality time with my computer than with a good old-fashioned book in order to complete my English class. Twitter, WordPress, and WordHoard have consumed my life and have completely flipped the world of Shakespeare around for me. I’ve never been a huge fan of the Bard and I’m still not super interested in investigating him any further than I’m required to. Having the Internet and the various digital tools there to aid me definitely made this semester a lot more enjoyable than the fall semester, where it was strictly reading Shakespeare’s works (with a hint of Twitter).

The way we chose to investigate Hamlet this semester was by strictly looking for answers to our own questions. The problem with this is that we eliminated anything we found that didn’t necessarily fit our hypothesis; we also tended to eliminate things that we didn’t find interesting. Scott B. Weingart (the scottbot irregular), in his blog entry entitled Avoiding Traps, discusses sampling bias, selection bias, data dredging, cherry picking, confirmation bias, p-values, positive results bias, the file drawer problem, and HARKing. I believe that all of the above are crucial to understanding the digital humanities fully, and to making sure we don’t make broad or incorrect assumptions about Shakespeare’s literature.




Beware of Biases

Weingart defines a selection bias as “an error in choosing the individuals or groups to take part in a scientific study”, and notes of a sampling bias “that it undermined the external validity of a test (the ability of its results to be generalized to the rest of the population)”. So, for both of those to make sense in our classroom, we would treat our digital tools (WordHoard, WordSeer, Monk, TAPoR, and Voyeur) as the individuals taking part in our study, and the sampling bias would simply be the results we garner from them. As we learnt throughout the semester, some tools are simply not designed to analyze specific portions of the text. Some are better at looking at a specific scene and act (Phase 1), others are better at looking at whole scenes (Phase 2), and there are still some that, I get the sense, are not great at working at either phase and would be better suited to comparing the whole text to other works.

I worked with WordHoard for the entirety of the course and personally I felt like it worked well during both phases. I was able to gain the information that I needed relatively quickly; however, I did notice that when I presented my findings to other users who were not using WordHoard, they were confused by my findings and screenshots (I even tried kicking it old school and presenting my findings on sticky notes, as seen in my fourth blog post, to no avail). My findings fit perfectly into the concept of sampling bias since they’re unreadable to non-users of WordHoard, making it hard for them to reach a wide audience.


To use all the information, or to not use all the information, that is the question

The Internet is filled with more information than one person will ever need. With our work in the digital humanities we’re just expanding the information that is out there, and for me this is a terrifying idea. When I first started elementary school, which was only in 1997, we still did all our research with books; the Internet was still considered “new”. Now, we live in a digital age where anything we want or need to know can be typed into nearly any device and we’ll receive an answer in seconds or less. We must be wary of the answers we receive from the Internet, as a good portion of them are misleading or false. The Internet is full of “trolls” (which Urban Dictionary users define as “Someone who is purposefully posting on a forum/message board/site with the sole aim to irritate the regular members”); in a sense Hamlet could be considered the troll of his day.


So what do we do with all this information? Are we just adding fuel to the fire without even realizing it? Are our assumptions and conclusions trolling the digital humanities community and Shakespearean aficionados?

Weingart’s concern about data dredging resonates with me a great deal. For me, this was the most terrifying part of the process. Data dredging is the idea that with all the information out there for us it’s “tempting to find correlations between absolutely everything”. I fell victim to data dredging when I trusted Monk’s findings (HA, why did I ever trust Monk?). In my most recent blog post I talked about using April’s results and testing them in mine. I guess Monk scoured its database and came up with the results below but when I tested them in my tool it came up with zero results.

(April’s Results) 
(My Results)

Weingart was talking about human data dredging, but in the case of Monk versus WordHoard, I fell victim to Monk’s data dredging, which gave me false positives. Monk trolled me.


Information Everywhere!

We all want to come off as intelligent individuals who know what they’re talking about, so we tend to share only our most solid and interesting information. We are all guilty of being cherry pickers (cherry picking isn’t just for sports anymore); we continuously cut away information until we get the strong hypothesis or conclusion that we were searching for.

For example, I looked up the word “love” in WordHoard and it told me that it appeared 65 times in the play. Great! Now, if it strengthened my argument, I could make the general assumption that love was used in the Oxford English Dictionary sense of “a feeling or disposition of deep affection or fondness for someone” in all 65 occurrences. That would be cherry picking. Looking further into the results, however, I see it’s not always used in that context:

Hamlet: As love between them like the palm might flourish, (5.2.40) ✔

Gertrude: For love of God, forbear him. (5.1.276) ✖

Hamlet uses the word love in the proper context of the OED definition, but Gertrude simply uses it as an expression with no significance behind it.
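For anyone who wants to do this kind of check outside WordHoard, here is a minimal sketch in Python of pulling every line that contains a word so each use can be read in context rather than just counted. The function name `lines_with_word` and the tiny sample are my own assumptions for illustration. This is not how WordHoard works, and it matches only the exact word form, not the lemma.

```python
import re

def lines_with_word(lines, word):
    """Return the lines that contain `word` as a whole word (case-insensitive)."""
    # \b keeps "love" from matching inside longer words such as "lovely".
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]

# Tiny sample: the two lines quoted above plus one line without the word.
sample = [
    "As love between them like the palm might flourish,",
    "For love of God, forbear him.",
    "The rest is silence.",
]

hits = lines_with_word(sample, "love")
for line in hits:
    # Each hit still has to be read and judged by a person.
    print(line)
```

The count alone (two hits here) says nothing about the sense of the word; the printed lines are what let you sort the ✔ from the ✖.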

After you’re done cherry picking and data dredging, you’re left with about 5% of all the information you’ve collected, because that is all you’ve deemed worthy enough to be presented and shared. This is called the positive results bias. All the other information that is left over from your research is discarded, creating the file drawer problem.

The file drawer problem is an issue because without sharing our failures, or inconclusive results, we’re leaving other people to go down the same path. If we worked together as a community and published all our results, the good and the bad, we’d be able to see what works and what doesn’t and be able to provide better feedback and support.


Going Forward

Going forward, new and old digital humanists need to be aware of what their work is doing and how it’s helping or not helping others. Acknowledging the biases that form when we do our research, and consciously trying to stop them, is important. If we can stop publishing only our positive results and start sharing our other trials too, which the majority of English 203 did this semester in their blog posts due to all the frustrations and headaches our tools created, we can help and foster one another’s learning.

Data dredging and cherry picking are harder to stop doing because we’re drawn to those results. They’re the ones that bring us closer to our goal and the purpose of our research. Sometimes other alleys and opportunities should be looked into before sticking simply to those first positive results.

Weingart also mentioned confirmation bias, p-values and HARKing, which I did not touch on either because I don’t have enough knowledge on the subject (p-values), or I felt that they didn’t quite fit into our classroom (confirmation bias and HARKing). However, from what I read, I do believe they are still important and vital to sustaining and fostering the growing digital humanities. As an individual who is addicted to her computer and the Internet, I hope they’re here to stay and get worked into more of the University’s courses.

Tediously Gaining Results

Since my last post, where I blindly searched words that I thought resembled those most likely found in Shakespeare’s comedies and tragedies, I’ve done some further investigation thanks to my lovely group members. They provided me some words that their programs deemed tragic or that they noticed in their past readings of Shakespeare’s tragedies. This was exactly what I needed to help me investigate further because WordHoard requires you to know exactly what you’re looking for.


I used April’s previous blog post to start off with. Monk generated a list of words for her that were most often seen in tragedies, and given her investigation of the word ‘justify’ I had high hopes for the results I would get in return. My hopes began to dwindle around search number ten, where I still had zero results in my lemma search, and search number twenty-five crushed me, as I still had no results. I painstakingly built thirty-four searches in total to find lemmas that were associated with April’s results, and every one of them came back with zero results.

Lovely. How come Monk was super confident that the word ‘justify’ appeared more in Hamlet than in all of Shakespeare’s other tragedies, yet when I searched for its lemmas or just its simple spelling in WordHoard I got zero results? This led me to bring up the entire Hamlet text on different sites on the Internet and do a simple ctrl+f, or ⌘+f in my case, to look for the word ‘justify’, but still no results…strange.


Next, I moved on to the comment that Dane had left me on my previous blog post about words that he thought resembled the tone of a tragedy. Thank goodness some of his words garnered me results, or I may have gone mad just like Hamlet. I searched for twenty-three different lemmas from the words Dane had provided me; of those, seven had matches, with nine total appearances in act five.

Beast(n), duty(n), fall(n), fall(v), revenge(n), slay(v), and wretched(j) were the golden tickets I needed to start making my conclusions.

All but one of them appear in 5.2, which leads me to assume that the first scene of the act is lighter, or more comedic, than the second scene, which is dark and tragic (but I could assume this already since everyone dies in this scene….). But if I had not read Hamlet before and was simply going off WordHoard’s answers to my queries, that’s what I would assume.


This led me to think about how unique these words were to act five; it turns out only fall(n) is unique. The other six words appear more frequently elsewhere. These “tragic” words appear seventeen times in act 4, fifteen times in acts one and two and nine times in act 3. So if I were not accounting for the number of words and the actual context they were used in, I would assume that act 4 was the most tragic, acts one and two were in the middle, making it possibly a tragic comedy, and acts three and five were the least tragic, possibly even comedic. Strange, isn’t it?

Hamlet: Tragic or Comical?

Is Hamlet truly a tragedy or can it be considered more of a comedy? We’ve noticed, as a group, that when we ask Monk to classify Hamlet as either Comedy or Tragedy it continually deems it comedic.

But why is this? To investigate further, we’ve decided to compare Hamlet to Macbeth, Titus Andronicus, Merchant of Venice and As You Like It. Macbeth is a tragedy through and through, while Titus Andronicus was Shakespeare’s first tragedy, making them two good candidates for comparative texts. On the comedy side, we chose As You Like It because it’s a classic comedy and very well known, while Merchant of Venice provided us with a bridge between comedy and tragedy, since it is commonly known as a tragic comedy…maybe Hamlet can be a tragic comedy too?

I’m not really a huge reader of Shakespeare, so the only thing that I knew differentiated a tragedy from a comedy was that a tragedy ended in death, normally numerous deaths, while a comedy normally ended in marriage or marriages. I looked on Wikipedia….which I know is not the most reputable source, but I just needed a quick reference on the differences between the two. They describe a tragedy as linked to “Aristotle’s precept[ion] about tragedy: that the protagonist must be an admirable but flawed character, with the audience able to understand and sympathize with the character.” A comedy has a “happy ending, usually involving marriages between the unmarried characters, and a tone and style that is more light-hearted than Shakespeare’s other plays”. WordHoard isn’t capable of showing me a relationship or qualities in a person to help me decode a tragedy; however, I can look at key words, adjectives and the use of the negative to gauge the tone of a comedy.

Knowing the limitations of my tool, I turned back to my last analysis, where I searched the lemmas of love and death, as well as the use of the negative. I used this method on the four additional plays, as well as on Hamlet as a whole and on just Act 5. I soon realized that the results I received could be misleading because I just got the number of results back and not a percentage. Since not all the plays are the same length, if the word love appears 200 times in play X and play Y it will not be the same percentage or concentration. So I also had to get WordHoard to calculate the total number of words in each play.
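To show why the raw counts needed adjusting, here is a minimal sketch in Python of turning a count into a rate per 1,000 words. The function name `rate_per_thousand` and the play lengths are made up for illustration; they are not my actual WordHoard numbers.

```python
def rate_per_thousand(hits, total_words):
    """Return how many times a lemma appears per 1,000 words of text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 1000 * hits / total_words

# Hypothetical numbers, NOT real WordHoard results:
# the same raw count of 200 in two plays of different lengths.
play_x = rate_per_thousand(200, 20000)
play_y = rate_per_thousand(200, 30000)

print(f"Play X: {play_x:.2f} per 1,000 words")
print(f"Play Y: {play_y:.2f} per 1,000 words")
```

Even though both plays have 200 hits, the shorter play has the higher concentration, which is why comparing raw counts across plays of different lengths can mislead.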

These are the results I got (organized on paper so it’s easy to understand and follow):

The results weren’t overly surprising; “love” had a higher appearance in comedies, while “death” had a higher concentration in tragedies. The negative seems to appear more often in comedies than tragedies and this may be a linguistic choice of Shakespeare, but I’m not sure.

My findings show that Hamlet, as a whole play, falls in the middle between tragedy and comedy when it comes to the lemma “love”, and it’s right in line with tragedy for the lemma “death”, but when you look at the negative it appears to be a comedy. That makes the whole play a confusing mix of tragedy and comedy, a tragic comedy…

When I single out just Act 5, I can see that it lends itself more to tragedy in both lemma categories and sits in between the two categories when we look at the negative. Since tragedy appears twice, I can label Act 5 as a tragedy.

I think some help from my other group members about synonyms or other words that are comedic or tragic will help me utilize my tool further in uncovering this mystery. Maybe different scenes are more comedic and others are more tragic?



Phase 2 and still no light on the capabilities of WordHoard

So begins a new adventure in phase 2, trying to uncover a deeper meaning to Hamlet. I’m very interested to see how well the blending of tools will aid the understanding of the play, and specifically whether my tool will actually become useful.

To begin the process, my group decided to each do our own general search of Act 5 so we would have starting-off points for an analysis. Playing around with WordHoard is always fun….ha! Not quite sure what to start looking for, I played around with lemmas and decided that death and love are more than appropriate for this act, as there is a funeral and, well, everyone dies, as in Shakespeare’s classic tragedies. Surprisingly, ‘death’ is only seen eight times across both scenes and ‘love’ is seen ten times. One thing that was annoying when searching for the ‘love’ lemma was that I had to do two individual searches: once for it as a noun, and once for it as a verb. Nothing surprising came up when I searched these lemmas though, so that was a dead end for deeper exploration from my perspective.

So I turned my searches towards looking at negatives and adjectives. I already knew about the anger, sadness, and death that occurred, so the fact that there were fifty-nine instances of the word not (or the negative) in the act was not surprising. What it did cause me to notice was that this program calls the gravediggers clowns. Weird, I know. They are given the description of clowns in the character list in our hardcopies, but their role is that of gravediggers. It would be interesting to see what source this program pulled the text from, because I’ve never read an edition with clowns in it. Anyways, Hamlet leads the way in his use of the negative by saying it 35 times, which just reiterates my analysis of him from 3.4.

Adjectives on the other hand surprised me. Knowing that the act was a darker one I figured it would be hard to find good or positive adjectives but it was the contrary. The beginning was filled with positive adjectives and it was hard to find negative ones. In the middle there was a constant wave of positive and negative adjectives used amongst the characters. Finally at the end my initial thoughts were confirmed and the negative adjectives poured out during the final battle.

The last thing I looked into was the amount the speakers spoke in the act. Hamlet spoke 44% of the words, while the gravediggers (or clowns according to WordHoard) spoke an astounding 17% of the act. The other nine players spoke the remaining 39% of the act but none of them spoke more than 8% of those remaining words. I hope that makes sense and isn’t overly confusing….
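In case the percentages are confusing, here is a minimal sketch in Python of how a speaker’s share of the act can be worked out from word counts. The function name `speaker_shares` and the word counts are hypothetical; I picked round numbers that reproduce the percentages above, since WordHoard gave me the shares rather than a table like this.

```python
def speaker_shares(word_counts):
    """Map each speaker to the percentage of all words they speak."""
    total = sum(word_counts.values())
    if total == 0:
        raise ValueError("no words counted")
    return {speaker: 100 * n / total for speaker, n in word_counts.items()}

# Hypothetical word counts chosen only to mirror the percentages above.
act_five = {
    "Hamlet": 4400,
    "Gravediggers (clowns)": 1700,
    "Other nine speakers combined": 3900,
}

shares = speaker_shares(act_five)
for speaker, pct in shares.items():
    print(f"{speaker}: {pct:.0f}%")
```

The shares always add up to 100%, so the 39% left over is split among the other nine speakers, none of whom has more than 8%.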

Anyways, I still had the same annoying problems with WordHoard: having endless windows open, and having to tediously build my searches because I’m not special enough to have an account. Hopefully my group members can help me find a use for my tool because my initial findings aren’t very helpful or deep.

(only half of the windows I had opened, scary)

WordHoard, simply not a fan

WordHoard is very misleading at first glance. It presents itself as this tidy little program that is going to be super easy to use and will be a powerful tool to help you prove your arguments and theses. However, on closer inspection I found out that it’s in fact a limited and temperamental program. The more I dig into it, the more I feel like I’m becoming lost in an endless maze, not being able to find the information I require.


Minor Inconveniences with WordHoard

After attending tutorials on how to use all of the tools I’m very thankful I ended up with WordHoard. It has Hamlet loaded into it already and the rest of Shakespeare’s works, as well as works from Spenser, Chaucer, and Early Greek Epics. It allows me to (somewhat) easily look up:

  • Which parts of the text are narration and which parts are speech
  • Who’s speaking
  • The difference between male and female voices
  • Speaker mortality (I think it’s really cool for a program to single out supernatural figures from mortal figures, perfect for Greek Epics)
  • If it’s written in prose or verse
  • Lemmas (which I had to look up what this meant…)

Other than that everything is a whole lot more complicated.
