Endless Context: the Future of the Digital Humanities Ringing in the Digital World

An Introduction

Every time I hear the words “Digital Humanities” I cannot help but think of it as some little subset of the DigiWorld. As I have already mentioned in this course, the Digital World of Digimon is the product of massive amounts of information being packed into data, until there is eventually enough information to simulate a world of its own. In my opinion, this is not so far-fetched. Take a moment to think about the Internet. Nothing else can hold such a massive amount of knowledge while remaining accessible to virtually any person at the speed and ease of the modern digital world. The knowledge comes directly from people who write about life, the planet, its functions, and everything their imagination can contribute beyond that. What the Digital Humanities actually is would be the branch between literature and technology. It has existed for ten years, maybe more. Literature, on the other hand, has been around since almost the beginning of recorded human history. It has had thousands of years of development in style and use, but also in cultural development. People have always had personal and historical inspirations for writing, and because of this the context of even a single piece of literature is practically endless. Before the Internet, this context and background information was only accessible in physical form or within a great memory. However, through the incredible developments of technology, the Digital Humanities were born, turning years’ worth of physical texts into easily accessible data. Suddenly a text from approximately four hundred years ago is instantly available, and so are its history, interpretations, context and author’s biography, with a few simple clicks of a button.
This is what Sharon Leon expresses in her post: if the Digital Humanities continue to expand information for countless users, then they will soon become the main resource for study in a given field; however, they will never replace the human aspect of comprehension.

Something New

The introduction to the Digital Humanities was a bit of a shock. For someone who simply adores books and hours of cross-referencing, it was almost unpleasantly simple to find a text in seconds, immediately followed by various tools of text analysis. What would be gained by leaving all the work up to the computer and only using our gift of understanding to analyze Hamlet? However, it was no easy task. There were many searches to perform, and many results to be had, but the problem was what to do with them! From word frequencies to comparing Shakespeare’s entire opus, we learned to read data. The best example is the NaiveBayes/Word tree analysis. You input a text, and the meaning you predict will come from it. And you get…

What exactly? At first glance this looks like a jumble of gradients and ratios. It looks like maths with visuals. The reactions were of course:

  1. What is Maths doing in the Humanities?
  2. What does it all mean?
  3. Why does it have to mean anything when we could just read instead?

In fact the word tree and NaiveBayes are ratios, probability, and percentages! As a group full of English majors we were both fascinated and terrified. (Link to second blog post) We had not yet deciphered this information, and we had not earned it; therefore, we did not understand it.

Luckily for us, Dr. Ullyot explained in our first few laboratory classes how words and speakers can be tagged. Voila! Instant understanding gained = instant credibility! This never caused a great tragedy, but we did need to learn how to link that data with our understanding. Eventually, and with a lot of perseverance, we did. One of the very cool things we found out was how to compare word counts between Shakespeare plays using the “comparison tool.” Imagine doing that by reading!

In other words, Monk served us very well despite being the professed prototype of DH pioneers, one that was soon forgotten due to frustration. The unavoidable thing about frustration, however, is that it tends to lead to broken things… and the great thing about it is that broken things father ingenuity! Phase two was the reveal of all the ingenuity that followed Monk:


Then Monk met TaPOR, Voyeur, Wordhoard and Wordseer. The great discovery then was that each tool, whether cryptic or simple, supplemented the inabilities of the next, as seen here: <http://engl203.ucalgaryblogs.ca/2012/03/naive-and-decisive-actually-sums-up-a-lot-of-monk/> Each partner had learned to link the data being uncovered to understanding, and we successfully delved into a theme of Hamlet that particularly interested each of us. The theme of Hamlet’s madness enabled us all to use our tools’ strengths, whether that was searching for a speaker, for someone described, or for how much madness was in a particular act. Our Act 3 ended up being the maddest of them all, including lines like: “That I essentially am not in madness, / But mad in craft” (Act 3, scene 4). We ended up using such discoveries from Monk as a starting point. The NaiveBayes tool provided us with direction on what, through association, could relate to our search about Hamlet’s madness, and the Monk concordance searches could find a bit of context. From there we could use one of the searches in Voyeur or Wordseer to place it in the text. One of the most intriguing results we found this way was that the tongue and the sword were both spoken of as weapons. This started a whole project on poisoning the ear with words, and the damage caused by lies and words. Obviously: “The courtier’s, soldier’s, scholar’s, eye, tongue, sword…” (Act 3, scene 1). In the end it turned out that a bunch of English students could learn to leave the searching up to the tools, and to focus on comprehending the results.

400+ years

What would Shakespeare have done if he were to find there was a technology to break down his entire works into categories, word count, or frequency? If there were something to link all his meanings together? He would probably rewrite his plays to make them that much more cryptic!
The wondrous thing about his plays is that they were complicated even for the time they were created. Nowadays there are scholars who devote their lives to discovering the meanings of Shakespeare and the voices of Shakespeare. The article I chose, by Sharon Leon, speculates on the future of museums and archives: whether it will be possible to provide enough information on-site to let the average viewer read a work of art or historical artefact like an expert. She imagines a world where information is immediately available to those who seek it. It sounds like the future for those who would take the time to pursue it. I believe this is the future of the Digital Humanities. It would not be that, while reading Hamlet, notes appear at the side of the text to divine meanings. There are already books suited especially for that. Instead, this would have the power of the internet behind it. All of the searches we can do in the five tools deliver what you are searching for in its location, location, location. Sometimes you can even figure out who delivers the line, how often, and if similar words were delivered. It is up to us to understand it. The difference, and what I believe is the future, would be to deliver context to the seeker: not just the immediate ability to see the context and meanings of Shakespeare in notes by previous scholars, but also maps of discovery. What this would mean is that a person would find Hamlet in a digital tool, and not just find a word. The word would come with its initial definition and how that definition has evolved since Shakespeare’s time, any idiomatic references of the 1600s it may contain, and the option to dive deeper into what other scholars think about it. From this, our understanding would no longer be limited to our own experience. Sharon Leon wrote:

“The difference here is in the effort to bring together evidence in a user interface that allows for the consideration of many perspectives and multiple causality, as opposed to offering a single perspective that simplifies the past” (http://www.6floors.org/bracket/2012/02/18/content-and-context-visualizations-for-the-public/).

This would be the ultimate information sharing. Anyone could learn about anything. It would open the flood-gates for textual analysis over the internet. The amount of information eventually becomes its own little official world of Shakespeare. If you remember one of my previous posts, Digimon and Divination (http://engl203.ucalgaryblogs.ca/2012/03/digimon-and-divination/), this is a continuation of my theory. Not necessarily that the DH world will become a different dimension where small monsters run around (even if this sounds accurate for some of the plays), but that any structure of a certain size becomes official. For proof, just look at the recent additions to the dictionary (e.g. to heart as seen here: http://www.inquisitr.com/101669/omg-lol-fyi-oed/). The digital humanities may very well become the official source for literary scholars. Although the Humanities, like everything, will become digital, they will never actually lose their footing in the physical world. The world of Digital monsters careened out of control because it lost its basis in the physical realm when the programmers abandoned it. Really though, the Humanities will always exist through humans because that is where the value lies. Besides all that, the Digital Humanities will never lose their base as long as books still exist in paper… and let us face it: are there really any humanities scholars who do not adore an old fabric-bound, gold-edged novel from a by-gone era?

For the Love of the Digital

The next question may be… one I have already asked. “Where does the world end and data begin?”
The most shocking thing about computers is really how ridiculously simple they are at their very base. They just constantly make decisions. 1 or 0? Seriously, that is all they are in essence. So what is so complicated about that? Well, you should see the extent that it goes to! Have you ever seen a software engineer’s homework? I have, and it does not look like it has ANYTHING to do with 1s and 0s. What I do understand about computers and the Internet is, of course, the humanity of it all. I quote myself:
The Internet and the Digital Humanities “must hold significant portions of the literature that shapes the world we live in. Literature is made in the image of the earth and of human experience, and the characters that inhabit it are in the image of its creatures. The depth that it reaches to is too far to count. Is it too far a stretch to say that the universe of data is alternate to the universe of reality?” (http://engl203.ucalgaryblogs.ca/2012/03/digimon-and-divination/)

This is where the appearance of math I mentioned earlier meets that of the humanities. People are fantastic at taking literature and finding meaning in it. Computers are simply made to learn the basics of our patterns of association, so both must contribute. Monk, TaPOR, Wordseer, Wordhoard, and Voyeur can show us what they find, but without knowing how, the user cannot appreciate the results. Although we are miles away from writing these programs ourselves, at least we now understand the power we are accessing.
It is incredible how much can be stored in virtually no physical space. It probably would blow Shakespeare’s mind. However, this wonderful thing has its demons. If people can burn books and art to erase ideas, then how hard could it be to highlight and delete…? Fight Club had a point: if you erase all proving data of debt, does it still exist? Banks already lend money that does not exist, making 100% plus interest of it back, effectively stealing from you for using a service.

Anyway, that was a tangent, but hopefully it gets the point across. Things that do not have root in the physical world have no credibility, and the Humanities will never survive without human interpretation. A computer can do whatever it is told, but at present, it will not understand why or how. You can tell it how to find the word “cowardly” and that sometimes “yellow” will mean the same thing, but it will not be able to distinguish when. Nuances are another thing that might never be known to a computer, along with idiomatic meanings, connotative meanings, emotional effect, and so the list goes on. Even in the Monk workbench, you can search for lemmas of “madness” and you will be lucky if it comes up with anything about possession. However, delivering results in context, as Sharon Leon (“Content and Context”) describes, will be the future. Even then the computer will not care. There are in fact many businesses and services that have gone digital beyond the need of human input. Luckily, this will not be one of them. The humanities have always been rooted in the realm of human experience, in passion, and in literature. As you can tell by the very word “humanities,” they will never extend out of the influence of human intervention. The “digital humanities” depend fully on the cooperation of the digital and physical, the computer binary and the abstract human brain, and the fabrications of both. Thus at least there will always be the credibility, and always be the earned knowledge.

And So…

The introduction of the Digital Humanities has been like no other experience. Having comprehension transcend physical books was a scary idea, but I understand now that neither the DH nor literature can exist without the other. Reading will always have understanding and relation to fuel it, and the digital humanities will have its massive stores of data and the ease of accessing it to continue with. Thus the use of the five tools has become a triumph, and it will continue. Since understanding will always be required, the Humanities can march on to provide endless rounds of data association with works of art, literature and artefacts, and no meaning will be lost. Hopefully this is the pure future. Information will be accessible to everyone who chooses to find it, and not just through months of study. There will never be any loss of credibility because only some will choose to understand it fully. And the parallel universe made up of our data about the world we live in will never materialize with digital monsters and a doomsday prediction because… actually I cannot promise that one.

Works Cited:

Ann Thompson and Neil Taylor, eds. 2006: Hamlet. The Arden Shakespeare. 3rd Series. London: Thomson Learning. 613 + xxii pp. ISBN 1-904271-33-2

Digimon and Divination?

It is a grey day. Warm with snowflakes like glitter. Someone down the hall seems to be having a workroom party, which they are all quite content with; you can just tell by the laughter. But we are instead lost in a different world; a digital world, if you will. One with so much information compiled and cross-linked that it encompasses the realm of human experience, and encodes the most significant events, works, and experiences as data. It is a place where you can indulge in the works of a man who lived in the 1600’s, and divine new secrets 400 years later. When you really think about it, it is fantastic; unbelievable almost.
Yet, at the same time it is another new day in the Humanities, with a lot of planning done. Today was devoted to pre-project-planning (say that three times fast). Although there were not too many new discoveries, there was the exploration and expansion of the old ones. Monk, of course, is a mining tool, meaning that the more you work the more you will discover. As it is, I have been finding more uses for the NaiveBayes and decision tree tools. They might be unconventional, and a little hit-or-miss, but the results are pretty exciting!
In the classification tool you can find NaiveBayes, under which you load your worksets and rate them. I found that rating each scene with a theme will give me the words that make the predicted theme true or false. Thus, searching for confirmation of the theme “madness” elicits words that have some cryptic connection with that theme, such as the word “armour,” which has to do with the armour of the mind… From there, you have to make some good old-fashioned English major connections and argue your findings; something that we are all experts at. My idea is that the armour of the mind refers to its sanity… which is slowly broken down by lies. Etc.
Anyway, you get the point. This is what my program is best at in comparison to the other programs. They have the frequency, concordance, and description tools, but this seems to be a unique feature of Monk. The biggest question now is if it can be useful enough to present. That is the question for next time.
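For anyone wondering what “the words that make the predicted theme true or false” could look like under the hood, here is a minimal sketch in Python. It is only my guess at the general idea of Naive Bayes, not Monk’s actual code: the function name `indicative_words` is made up, the four “scenes” are invented toy strings rather than real Hamlet text, and the smoothing choice (adding one to every count) is an assumption.

```python
import math
from collections import Counter

def indicative_words(docs, labels, top=3):
    """Rank words by log P(word | theme) - log P(word | not theme).

    Counts are Laplace-smoothed (add one) so unseen words do not divide by zero.
    """
    pos, neg = Counter(), Counter()
    for text, has_theme in zip(docs, labels):
        (pos if has_theme else neg).update(text.lower().split())
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    scores = {
        w: math.log((pos[w] + 1) / pos_total) - math.log((neg[w] + 1) / neg_total)
        for w in vocab
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]

# Hypothetical toy "scenes" (NOT real Hamlet text), each rated for "madness".
scenes = [
    "mind armour madness mind",
    "madness armour night",
    "king crown feast",
    "king feast night",
]
rated_mad = [True, True, False, False]

top_words = indicative_words(scenes, rated_mad)
print(top_words)
```

The words that float to the top are simply the ones that turn up far more often in the scenes rated “true” than in the ones rated “false,” which is why a word like “armour” can appear next to “madness” without the computer knowing why.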

The words are supposed to be suggestive in conjunction to "madness"

It is not the most succinct method of analysis, but there is still time to work with it, and it does prove to be interesting every time. For example, “black” appears five times in Act 3, and it is always in a very negative context:

Results for "Black"

In case you were wondering about the title and the bit of writing at the beginning, it just occurred to me that the premise of one of my absolute favourite childhood shows has an abstract relation to the Digital Humanities. That show was Digimon (I know, I know), in which there existed an alternate dimension that housed a world made in the image of the earth, with fictional-type-monster inhabitants. If you know the show you might remember that the digital world was created by the compilation of data that is stored in computers and over the internet. First the foundations were laid, and “Over the ensuing years, through the continued growth of the electronic communications network on Earth, the Digital World continued to expand and grow,” (http://digimon.wikia.com/wiki/Digital_World). It’s a little bit silly, but it is an accurate depiction of not only the information amassed on the internet, but of the Digital Humanities itself, which must hold significant portions of the literature that shapes the world we live in. Literature is made in the image of the earth and of human experience, and the characters that inhabit it are in the image of its creatures. The depth that it reaches to is too far to count. Is it too far a stretch to say that the universe of data is alternate to the universe of reality?
Just a thought.

Where does the world end and data begin?

Marry, this’ miching malicho; it means mischief.

This is easily one of my favourite lines in any Shakespeare play. Why? Because the words befit the meaning in a style that is all their own. And I cannot help but think that if Shakespeare himself knew that twenty-five young adults were set free with the power of technology to analyze his plays, he might think that a mischief all its own.
In our own little sect of madness we got off to a bumpy start. We were all “masters” of our respective programs, but how do we compare them? How can we link each advantage and rate them? How many of the tools overlap in use? And what becomes overshadowed by a newer, better tool?
Most of all, how can we find out?
We needed a common ground: something inside Hamlet that every person can identify. That is of course madness, something every hard-working university student has met with at least once; but besides all that, it is a theme within Hamlet that everyone will decipher differently. Is he sane and acting? Is he crazy from the start? Is he driven mad by his own efforts? Hamlet will always be a mystery so long as space-time continues.

Where we are now:
Since we had a goal in mind, we were able to find the means. Within different programs frequency searches, Naive Bayes, concordance searches, “described as” searches have all proved useful. We are able to track down suggestive words through Naive Bayes, and then put them into other searches to divine meanings. The other cool thing that we have been finding is the ability to compare Hamlet to other Shakespeare tragedies. “Madness” appears in Hamlet 22 times! The next most frequent is probably Romeo and Juliet at 11 times. That is a huge jump. So we know that Hamlet is focused on madness, now we just need to find subtle hints, recurring themes and general meanings that can help to indicate the true madness of Hamlet, or the play he puts on for everyone.
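To give a sense of what a cross-play frequency comparison involves, here is a small Python sketch. It is an illustration only: the two “plays” are invented one-line strings, and the function names `count_word` and `rank_works` are hypothetical, not anything taken from Monk or the other tools.

```python
import re
from collections import Counter

def count_word(text, word):
    """Count case-insensitive occurrences of a whole word in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)[word.lower()]

def rank_works(works, word):
    """Return (count, title) pairs, most frequent first."""
    pairs = ((count_word(text, word), title) for title, text in works.items())
    return sorted(pairs, reverse=True)

# Hypothetical stand-ins for full play texts.
toy_works = {
    "Play A": "Madness in great ones. This madness must be watched.",
    "Play B": "No madness here, only love.",
}

ranking = rank_works(toy_works, "madness")
print(ranking)
```

Raw counts like these favour longer plays, so a fairer comparison would probably divide each count by the total number of words in the play.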

The uniqueness of our Act has been coming out slowly as well. We know (not necessarily because of the digital humanities) that our Act contains much of the most important action in the play. The “To Be or Not To Be” speech appears, as well as “Get thee to a nunnery,” the play performed for Claudius, the confrontation of Gertrude, the murder of Polonius, etc.! There is simply a ton of stuff to research and a lot to discover.
Most importantly for next time we must study:
The use of “poison in the ear” as a metaphor.
Any reference to the mind such as:

Every instance that describes a character as “mad.”
And really anything else we can think of.
So that is about it for part 1 of Phase 2. We have a strong Act 3 team, with only a few hiccups, and some illness 🙁 and hopefully there will be more success to report on the next post. Right now there are just too many questions! It’s pretty amazing what we can do though. What has taken minutes on MONK or Voyeur, etc., would have taken months in the traditional way. Could you imagine going through every Shakespeare tragedy and noting the use of the words “mad” or “madness”? It sounds crazy, and yet that is what the creators of these programs have done for us. We are grateful 🙂

Naive and Decisive actually sums up a lot of MONK!

Phase 2, and a new light… hopefully.

Being the expert on MONK is a tough job. Luckily the bond that comes from quizzically hitting buttons and keys for 9 hours is not an easy one to break. My project screen looks well used and familiar-

The results go on and on. Do we know what all of them mean? Not really 🙂 but we like them.

Meeting the new group in person really revealed how much the other groups liked or disagreed with their tools as well, and the hope is that what one tool lacks, the others will fill. So far we have had an easy time agreeing on regulations and sharing stories, so things are looking good for acing this presentation in a different way than the first, (though my phase one group was completely amazing, and I will miss them).

As for MONK – let’s just say not much has changed, except – the Act! Act 3 is my personal favourite act. Insanity, insults, murder, confrontation, blood, more ghosts, and much more! Really though, it just always seems like the most action packed of all the Acts!
Monk is doing its best to help me support this idea. The word “madness” shows up nine times alone in this Act! Although I did discover a slight annoyance again. I could not get the program to look through a whole act, only through the scenes. So far this is only in “Edit Worksets,” so it could just be a glitch.

Other words that show up quite a bit? “Time,” which shows up 10 times, and “Heaven,” which shows up 10 times! “Action” – 6. “Go” – 17. Death and murder show up quite a bit too, but of course the words pertaining to the future, and action-y words, show up more often, which at the very least could tell us that this Act appears in the middle of the play.

I had a very cool discovery too! In the classification tool with NaiveBayes and Decision tree (which you either understand or you do not, there is not much in between) I was able to load my Act 3 workset, which features each scene of act 3 as a different document meaning I can compare them! This is perfect for this Phase!

I rated each scene as either comedy or tragedy:

As you can see, scene 1 and 2 have slightly comedic tendencies, and scene 3, being of course about sending a man to heaven or hell, is not a comedy at all… and scene 4 is an absolutely confirmed tragedy, go figure. Anyway, I think this is brilliant! Let us continue…

Now all I have changed is scene 3 from comedy to tragedy:

This is amazing because it seems like Naive Bayes uses the documents as points of comparison. Scene 1 is supposed to be less of a comedy than before if scene 3 is a strong tragedy. That makes sense! In conjunction with plotting the murder of the “King,” the word “King” in the first scene seems to be associated with much deeper, darker meanings… Intriguing…
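The effect described above, where re-rating one scene shifts the verdict on another, can be reproduced with a tiny hand-rolled Naive Bayes in Python. This is a sketch under my own assumptions and not Monk’s implementation: the four “scenes” are invented word lists, the functions `train` and `posterior` are hypothetical names, and I am assuming add-one smoothing and that the scene being scored is also part of the training documents.

```python
import math
from collections import Counter

def train(docs, labels):
    """Collect per-label word counts, the shared vocabulary, and label priors."""
    counts = {}
    for text, label in zip(docs, labels):
        counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for c in counts.values() for w in c}
    priors = {lab: labels.count(lab) / len(labels) for lab in counts}
    return counts, vocab, priors

def posterior(text, model, label):
    """P(label | text) under multinomial Naive Bayes with add-one smoothing."""
    counts, vocab, priors = model
    log_joint = {}
    for lab, c in counts.items():
        total = sum(c.values()) + len(vocab)
        log_joint[lab] = math.log(priors[lab]) + sum(
            math.log((c[w] + 1) / total) for w in text.split()
        )
    norm = sum(math.exp(v) for v in log_joint.values())
    return math.exp(log_joint[label]) / norm

# Hypothetical toy "scenes" (NOT real Hamlet text).
scenes = [
    "jest king play",       # scene 1
    "jest play song",       # scene 2
    "king prayer heaven",   # scene 3
    "king blood death",     # scene 4
]

ratings_first = ["comedy", "comedy", "comedy", "tragedy"]
ratings_second = ["comedy", "comedy", "tragedy", "tragedy"]  # only scene 3 changed

before = posterior(scenes[0], train(scenes, ratings_first), "comedy")
after = posterior(scenes[0], train(scenes, ratings_second), "comedy")
print(round(before, 3), round(after, 3))
```

In this toy setup, scene 1 shares the word “king” with scene 3, so once scene 3 is rated a tragedy, “king” starts counting as tragic evidence and scene 1 looks a little less comedic, even though nothing about scene 1 itself changed.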

I could honestly go on about this forever, but I doubt every one of my findings would be as interesting for everyone. In summary this just means that I have a way to directly look at all of the scenes together, and that is worth a lot! Anyway, our self assigned homework for the weekend was to read all of each other’s blog posts, and see what we deduce from them, what would work with each other’s programs, etc. The hope is that through self-education we will have a breakthrough in compatibility capabilities… if that makes any sense. I am looking forward to exploring more of my new discovery, and am really going to think about how it can help my group members; that is my self assigned homework for the week. At the very least I can show off my new discovery next time and hope that they think it is as cool as I do.

Until next time, Kelsey ^.^

Monk: A Greater Understanding and a Bigger Hurdle

Since the last post, the Monk group has met twice. We have made significant advances with the tools of the program, but have also made a crucial and unfortunate discovery to humble our success.

First, however, our discovery. In the “compare” toolset there is an analysis method that we had not managed to figure out before. It is called “IDF” and it allows you to select a training set. Once you manage to fulfill all of the options to the program’s liking, you are advanced to a screen much like any other one, where you can select a work, view it and type in the concordance you desire. Most of the toolsets get to this page and end there. However, for this tool, you are allowed to take the workset you nominated as a “training set” (we recommend selecting the all-encompassing “plays: tragedies” and “plays: comedies” or something for the most options) and from there to re-select a mix of both full plays and even individual scenes and save it as its own workset. (Minimum 3 selections.)
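We never found Monk’s own explanation of “IDF,” so the following is a sketch that assumes it stands for inverse document frequency, the usual meaning of the abbreviation in text analysis. The idea is that a word found in every document of the training set tells you little, while a word found in only one is distinctive. The function name `idf` and the three toy documents are invented for illustration.

```python
import math

def idf(documents):
    """Map each word to log(N / number of documents that contain it)."""
    n = len(documents)
    doc_freq = {}
    for text in documents:
        for word in set(text.lower().split()):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n / df) for word, df in doc_freq.items()}

# Hypothetical three-document "training set".
toy_docs = ["the king speaks", "the queen speaks", "the ghost walks"]

weights = idf(toy_docs)
for word in ("the", "speaks", "ghost"):
    print(word, round(weights[word], 3))
```

Under this assumption, a training set with more documents gives the measure more to work with, which fits our experience that selecting the broad “plays: tragedies” and “plays: comedies” sets gave the most options.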

As usual you hit a dead end on the concordance page, but uniquely, your saved workset becomes useful. Take your new workset with its many parts and load it into the “Classification” Toolset.

From here you must give each document a rating and follow the continue button…

This is the part of the program that Monk specializes in. Naive Bayes and Decision trees. The explanation of which will be one of the major parts of our presentation. After selecting your method you can insert a prediction if desired and…. Voila! You get a complicated rating system of “confidence” and “frequency.”

Very cool – now for the sad part. This tool, from what we understand, is basically used for the identification and classification of authors’ works. It particularly focuses on entire plays and their characteristics. Poor little Act 3, Scene 4 does not much register on the scale, and for the part that does, we of course already know its origin and its characteristics as a Shakespeare play. So how can we use Monk’s most defining tool as an aid in discovering Act 3, scene 4? That is our current mission. As well as explaining to you all this lovely piece of analysis:

Also since our last posts we have done more research into the purpose and uses of Monk…
We found out that Monk is one of the first of the Digital Humanities programs, almost a prototype for Wordhoard. Through different group members’ findings we have determined that the Classification, Frequency of words and the Concordance searches are specifically meant for analyzing large-scale works such as entire plays or collections, to find themes throughout historical moments, between writers, or characteristics of the writers themselves. As it is, we are not sure how useful it is as a tool to analyze one scene in one play. Our greater understanding of the tool itself has further clarified this. Monk is great at finding certain things within a text, any text, of any size. When it comes to comparing them, though, it is harder for a small document such as a scene to provide enough information to represent itself against other documents.

For the remaining days, our work will be centered on figuring out how Monk can directly provide insight into Act 3, Scene 4 specifically, and to see if it is possible to use the tool in any depth without comparing the scene to the entire works of Shakespeare’s tragedies –
Because as interesting as our tool can get, our focus must be on the one scene, and we are trying to be optimistic about getting it to work for us!

So, till next time, I leave you with this excerpt from the Monk help buttons.

Monk Workbench: Either the most simple or the most complicated tool in the Digital Humanities.

Kelsey Judd, First post.

Today was our first group confrontation of the program MONK.
It began well, with each member contributing what they had learned over the last week, and with all of us piecing together our separate knowledge to unravel the mysteries of the work tools. Within an hour we had discovered all the ins and outs of the program’s most useful components, which I will try to explain: “Define Worksets” for finding concordances in lemmas or spelling, and “Compare” for finding frequency and Dunning’s analysis. Unfortunately, soon after this we hit something like an impassable brick wall. Either due to our lack of experience or to something we cannot quite figure out in the program, there does not seem to be all that much more to it beyond “Define” and “Compare,”…

The define feature is fairly straight forward once you realize one main point: it does not seem to keep an actual record of your “worksets.”

You can choose a tool on its own, or add a workset to work with.

We found that when you choose the “define worksets” tool it does not affect a tool if you choose a workset to go with it. Either way you come up with this page

It goes here whether you have a workset selected or not.

From here there are only two options. You can create a workset, which is basically searching Shakespeare’s works, or various works of American fiction and then saving your search and naming it. The second option is to search for lemmas, spelling or parts of speech; however, this does not seem to do anything. Whenever we try it, it will still ask you for which work you are searching in, even if you defined Hamlet or act 3.4 as your workset on the main work page or within the tool previously.

From this page, when you have selected Act 3, scene 4 comes a very simple little tool where you can search concordance. All you need is for the text to appear in the “advanced viewer” and to of course search on the concordance tab below it. Simple and straightforward. The only problem with this was that while it tells you all of the words or lemmas in which the word appears, and tells you how often they appear, it does not provide the speaker or location of the line, so it is mostly up to context. Now, I am sure there must be more to use in the define/edit worksets tools, but for some reason the five of us could not find it. Sounds like we still have a lot of exploring to do.
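For readers who have not met a concordance before, here is a rough Python sketch of the idea: find every occurrence of a word and show a few words on either side. Unlike what we saw in Monk, this version also reports the line number, which is the piece we were missing. The function name `concordance`, the `width` parameter and the three sample lines are all made up for illustration.

```python
def concordance(lines, word, width=3):
    """Return (line number, left context, matched token, right context) for each hit."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        tokens = line.split()
        for i, tok in enumerate(tokens):
            # Ignore surrounding punctuation and case when matching.
            if tok.strip(".,;:!?\"'").lower() == word.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((line_no, left, tok, right))
    return hits

# Hypothetical sample lines (NOT real Hamlet text).
sample = [
    "The king is mad, they say.",
    "No one is here.",
    "Mad or not, he speaks.",
]

results = concordance(sample, "mad")
for hit in results:
    print(hit)
```

If each line also carried a speaker tag, the same loop could return it alongside the line number, which would take care of the “mostly up to context” problem.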

The other very useful tool is the “compare worksets” tool. It allows you to pull up specific texts, for example Hamlet as a whole, compared to just Act 3, scene 4.
It allows you to see the frequency or do a Dunning’s analysis of a word or a lemma, with the two variables being Shakespeare’s other works or works within a text. We found this works much better when used on a larger scale, such as comparing Hamlet to another play, or the whole of Shakespeare’s works.
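Dunning’s analysis is usually described as a log-likelihood test: it asks whether a word turns up in one text more often than you would expect given how often it turns up in a comparison text. Below is a minimal Python sketch of the two-term form of that statistic as it is commonly used in corpus comparison. I am assuming, without having checked, that Monk computes something along these lines; the function name `log_likelihood` and all of the counts are invented.

```python
import math

def log_likelihood(count_a, total_a, count_b, total_b):
    """Dunning-style log-likelihood (G2) for one word across two texts.

    count_a / count_b: occurrences of the word in text A / text B
    total_a / total_b: total number of words in text A / text B
    """
    combined = count_a + count_b
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical counts: same rate in both texts, then double the rate in text A.
same_rate = log_likelihood(10, 1000, 20, 2000)
different_rate = log_likelihood(22, 30000, 11, 30000)
print(round(same_rate, 3), round(different_rate, 3))
```

A score near zero means the word is used at about the same rate in both texts, and larger scores mean a bigger imbalance. Because the score grows with the amount of text, a single scene rarely produces large values, which may be part of why the tool worked better for us on whole plays.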

Beware: the words on the far right run together sometimes, so you end up getting excited on finding the new word "actairbed."

As you can see the strange feature of this is that the words sometimes run together, so you think you have found a cool word: “actairbed,” when it’s really just the three close together. Amateur mistake of course. Clicking on the words will take you back to the spelling search and you will once again see the context and frequency with which they are used. The frequencies are quite a neat discovery, I think one of our next projects will be on how to use this tool to discover new and exciting themes in Hamlet act 3, scene 4.

End of the line?

Overall the experience with MONK has been a lot of trial and error, but rewarding when we do manage to find something new. The biggest problem we are having is the feeling that we are missing something crucial; we just seem to be going in circles. Upwards of three hours may not seem like a lot, but it has been quite a journey despite the time. Of course we will be pretty excited when we can successfully report back about new findings, most of all when we figure out how to save results… but for now figuring out the concordance and frequency tools has been rewarding.