philosophi.ca : Voyant Experiments

Sherlock Holmes Voyant: Cirrus View

When playing with Sinclair and Rockwell's Voyant, I decided to take a look at the entire Sherlock Holmes corpus in order to view trends that spanned the collection. I experimented with the different formats of data visualization (the most interesting of which was the bubbles, which made a shrill, high-pitched noise as it went through the entire text word by word), but the most useful one from my perspective was the default Cirrus (word cloud) format. With Cirrus, you can see a giant word cloud of the most frequently used words, with the size of each word corresponding to its number of occurrences in the Sherlock Holmes corpus. After filtering out the common and irrelevant words (by selecting the Taporware option in the "Stop Words List"), the user is left with a more relevant view of the text. One of the first things I noticed was the relatively large word "man." Thinking back to our class discussion regarding Arthur Conan Doyle's perspective on women (à la "Scandal in Bohemia"), I found it interesting that the word "man" was so much more prevalent than any word pertaining to women. The largest I could find was "lady," which at 176 instances was dwarfed by both "sir" (at 323 instances) and "man" (weighing in at a whopping 902 instances).
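The kind of count that sits behind a word cloud can be approximated in a few lines of Python. This is only a rough sketch, not how Voyant itself works: the stop-word set is a tiny hypothetical stand-in for the much longer Taporware list, and `top_words` and `sample` are names invented for the illustration.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list (assumption); the real
# Taporware list used by Voyant contains many more entries.
STOP_WORDS = {"the", "and", "of", "to", "a", "i", "in", "that", "was", "it"}

def top_words(text, stop_words=STOP_WORDS, n=10):
    """Return the n most frequent words in text that are not stop words."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

# Made-up sample sentence, not taken from the corpus.
sample = "The man spoke to the lady, and the man left. Sir, the man was gone."
print(top_words(sample, n=3))
```

Run on the full corpus instead of the sample, the same function would give a ranked list comparable to the "man", "sir", and "lady" counts above, although the exact numbers depend on how words are split and which stop words are removed.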

The Cirrus view also maintains the Words in Documents window, which allows the user to see exactly how many times a word occurred in each document. I found it interesting that, out of 36 stories, the word "lady" did not come up in 12. I feel like this reveals a little more about Arthur Conan Doyle's perspective on women: although he may respect them (by making one a protagonist who outsmarts Holmes in "Scandal in Bohemia"), he still grew up in a patriarchal society that was mainly concerned with reading about men instead of women. There was one thing I would have liked to see, though, which is more than one word on the Word Trends graph. I'm sure Voyant has this capability, and I'm relatively sure I've gotten it to work before, but for some reason tampering with it this time did not yield the same results as last time.

Sherlock Holmes in Voyant

After exploring many different tools used in the Digital Humanities, such as VisualEyes and Voyant, I have come to realize that the use of these tools is mainly, as said in our assignment, "surfing and stumbling", which is creative and challenging. Other than a few difficulties in using the program, such as not being exactly sure how to export the Holmes stories, I thought the exploration of this program was pretty interesting. While using Voyant and trying out the specific tools, certain questions and ideas also came to mind. I can see that a lot of people using this program would be frustrated by trying to identify a need for such online tools, but sometimes screwing around with text through a program can just be for generalizing text or data, or for any specific need.

When I explored Voyant with A Study in Scarlet, I really enjoyed trying the different tools and seeing what they would do when I selected them. The tools in the screenshot above are just a few examples of the flexibility of the program. Voyant gives the user many different options, from showing where certain words are and graphing them, to a tool with bubbles that brings up different words as it runs through the entire story. Voyant Tools, with a motto of "Reveal Your Text", does just that! Voyant reveals text depending on how you ask it, and what you ask it, to reveal. After exploring the program, I did have a few questions, though, that I thought could allow the user to get more information from the text. The main one was, "Is there a way to extract only dialogue or quotes from the text?" The ability to collect only the dialogue would be an extremely useful tool. So, after messing around, or should I say surfing and stumbling my way through this online tool, I found many things extremely creative, useful, and amenable.

Assignment 4

Voyant is another interesting text analysis tool that brings literature and technology together. The tool seemed useful in breaking down longer novels into smaller bits. Like tools we used in class in the past, Voyant transforms text into simpler visualizations such as graphs and clouds. The first visualization is the word cloud, which highlights the most common words, such as "was", "the", and "and". The word cloud therefore seemed irrelevant when analyzing a story. The second box below it, the summary and words in the entire corpus, followed the same pattern as the word cloud and focused mostly on the frequency of words in each story.
The word trends tool, being a dynamic query, seemed quite useful, as I could scroll through the story and select any word, which would then bring up the relative frequencies for that word. Keywords in context seemed most helpful when trying to understand the story, because it gathered every occurrence of the selected word and included the context around each one.

In the Sherlock Holmes stories, death seemed to be a popular theme and topic, which led me to choose it as one of the keywords for Voyant to analyze. This allowed me to view where the word "death" occurred in each of the stories. I could easily navigate to and view the paragraph where the word was located, and what was most impressive was that it updated the corpus reader to the spot of each occurrence of the word. Another search I went through was the word "mystery". This led me to contexts where Sherlock was in the midst of solving a mystery or had already solved one. Voyant is a useful tool for searching through the story. Compared with an actual hard copy of a book, it is much easier to navigate and manipulate. However, since the tool is primarily focused on search queries, the user needs to know what to search for. So it seems more helpful for those who have already read the story and know what to search for when going back to it.

Voyant Analysis

Having previously read only one of Doyle's Sherlock Holmes works, the novel The Hound of the Baskervilles, I felt compelled to read the first novel, A Study in Scarlet. I initially began reading within Voyant, but found the tool too clumsy and annoying for "conventional" reading. Unless you've scrolled through the entire novel to cache it, the text won't load fluidly as you read. The tool waits until you scroll, then pauses for an annoying moment to load a new section, then jerkily jumps to a seemingly arbitrary spot, forcing you to find where you left off reading. And any inadvertent click on a word momentarily freezes the tool as it highlights that word throughout the entire text, and then you have to go and enter nothing into the search bar to clear the highlight. Maybe I'm inept and impatient, but I switched to a Project Gutenberg version about 1/3 of the way into the novel. Voyant is clearly not meant to be used as a sophisticated text reader; it's only for statistical text analysis and the like.
(It also doesn't appear to preserve text formatting like italics, though that might have just been a problem with the source text.) My Voyant attempts at "surfing and stumbling" through A Study in Scarlet weren't much more successful. I don't think there's much value in trying to figure out the contents of a single short novel, one that can be read in a few hours, by monitoring fluctuating word occurrences, though the graphs are fun to look at, I suppose. The best part about Voyant is the word frequency count that appears when you hover the cursor over words. It's neat to instantly see how often different words are used, and which words are used only once. I noticed "brownish" appears three times, something you wouldn't otherwise notice. I'm sure it's relevant somehow.

The possibilities for comparing a larger body of works are much broader. Unfortunately, I was disappointed to see that the Voyant link to the "entire Sherlock Holmes corpus" contained only 36 short stories, well under the 56 stories that are undisputedly canon, and was missing the 4 novels (even A Study in Scarlet).

Sherlock Holmes Analysis

I have only had the chance to read Sherlock Holmes once. Even while reading the novel, I had some difficulty getting into the story, despite being very familiar with Holmes and Watson. Even then, I never noticed which words were used the majority of the time. I guess when reading, you focus more on the story of the book than on the text as a whole. Using the Voyant Tools website really helped me see which words you would find regularly in the text. For example, the words "the", "of", "to", "that", "and", and "a" were the most common. When we first started this exercise, I was clueless, but after doing it together in class, I felt more comfortable. Reading the text online and using this website allows anyone to type in a specific word and see whether it is used less or more. I chose to type the words "Sherlock" and "Watson." Surprisingly, a good number of results came up, since they are the text's main protagonists. (Sorry, I had difficulty copying the graphs to show. I am not a computer whiz, so this is all I could do.)

Type: Sherlock

Document                       Raw count   Relative frequency
20) greek_interpreter          19          27.16
10) scandal_in_bohemia         11          12.92
 2) twisted_lip                10          10.87
13) red-headed_league          10          10.98
23) five_orange_pips           10          13.67
32) boscombe_valley_mystery    10          10.40
33) blue_carbuncle             10          12.80
 5) speckled_band               9           9.18
 7) six_napoleans               9          10.83
15) norwood_builder             8           8.69
14) priory_school               7           6.12
16) noble_bachelor              7           8.64
31) case_of_identity            7          10.04

It turns out that "Sherlock" is used quite a lot when you type his name in. I found it remarkable how many times his name is used. But when I typed "Holmes" in, I think "Holmes" was spotted a little more than "Sherlock." Basically, throughout the stories, Watson and the other characters refer to him as "Holmes" instead of "Sherlock." (Bear with me. It's not the graph, but it's the only thing I could copy over to show.)

[Graph: relative frequencies of "Watson" across the 36 stories in the corpus]

What I found for Watson was most interesting as well. Both are partners in solving mysteries and fighting crime, so why wouldn't their names show up a lot? This was a great exercise and website for truly understanding DH better. I am not going to lie: when it comes to watching movies, I can understand the story a lot better than when reading a book, because I'm very visual and it usually takes a while for me to get into a written story. So when we watched the TV series Sherlock, it really did help me understand the story and the characters a lot better, even though it wasn't based on the novel but was a more updated version of Sherlock, which I liked. Exploring the Voyant Tools website likewise helped me see the words differently, because I am a visual person.

Sherlock Name Analysis

It took me a while to settle on what I wanted my searches to focus on. For about 30 minutes I was just typing in arbitrary words like "clue", "fight", "gun", etc. to try to find the faster-paced parts of the story. I will note that the word trend function makes it easier to find a specific part of a book, like a deduction scene, chase scene, romantic scene, etc. I finally tried typing in the names of each of the characters to see how often they came up in conversation. Interestingly enough, the word "Holmes" showed up more than the word "Sherlock". Also, the marked incline of "Holmes" after the 4th section might be symbolic of Sherlock Holmes getting closer to solving the mystery. On the bottom is Watson's name: he doesn't come up nearly as often as the main character (naturally), and is absent during a majority of the 2nd half of the section. I then switched over to the whole corpus view and did the same name search, but limited it to only "Sherlock", "Watson", and a character from A Study in Scarlet named "Gregson". This way, I would be able to see which of the stories mentioned any particular character a lot and which mentioned them only a few times. As you can see from the "Sherlock" and "Watson" graphs below, each character is mentioned a varying number of times; there are clear upper and lower boundaries. I included the last "Gregson" example as a means of displaying how this application can help you find even the most specific characters in a story. I ran across "Gregson" in the first story I read and then wanted to see if he showed up in any other stories in the corpus. The last pane shows you that Gregson only shows up in A Study in Scarlet.
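A name trend like the ones described in this post can be roughed out by cutting a text into equal slices and counting the name in each slice. The sketch below is an approximation for illustration only; Voyant's own segmentation may differ, and `word_trend`, `segments`, and the sample sentence are all assumptions made up for the example.

```python
import re

def word_trend(text, term, segments=10):
    """Count occurrences of term in consecutive, roughly equal slices of text.

    Returns one count per slice; there are at most `segments` slices.
    """
    words = re.findall(r"[a-z']+", text.lower())
    term = term.lower()
    # Ceiling division, so that the slices cover every word.
    size = max(1, -(-len(words) // segments))
    return [words[i:i + size].count(term) for i in range(0, len(words), size)]

# Made-up sample text, not a quotation from the stories.
story = "Holmes Watson Holmes said Holmes to Gregson Holmes"
for name in ("Holmes", "Watson", "Gregson"):
    print(name, word_trend(story, name, segments=2))
```

Plotting the returned list for each name against the slice number gives a simple version of the "Holmes" versus "Watson" comparison, and running it once per story would show which stories mention a character such as Gregson at all.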

text analysis

Voyant is a tool that, in my personal opinion, is very useful for people who are not used to reading long novels. (I base this conclusion on my own case.) By using the text analysis tool, I can know immediately who is in the story, and I can also get a glimpse of the personalities of these characters, by looking at the sentences and where they appear in the story, without flipping pages back and forth for a long time. It is easier for me to understand the story if I know the characters first. (I always appreciate publishers that include a brief introduction of the characters at the beginning of the book.) By examining the word trends, I also found that the frequencies of some characters overlap, which helps me understand the relationships between characters. When I click on the actual book section, Voyant shows me the conversations between characters, so that I can understand how these characters relate to each other and get to know them through their conversation. One thing I really like about this tool is that in the word trend section, you can hide a search term by clicking it. I think this is very informative, because you can compare the curves of different words. Reading in English always takes me a long time, because I am such a slow reader. Usually I have to take some time to understand the text, then look up the vocabulary and sentences, then read it all over again so that I can understand the meaning underneath the surface. I think a text analysis tool can save slow readers a lot of time and get them into the story faster, since it gives them a small preview of the book. It does not really matter what kind of words are being searched, and that is also one of the things I like about the text analysis tool. Users can type in anything they are interested in to find the hints they want about the book.
I think this is exactly the purpose of reading: people view and understand stories differently, and different points of view yield various opinions which can be fully discussed. The only question I have after using this tool is: if a person has already read the whole book, is this tool going to help him or her find out something else, instead of just screwing around? Is the text analysis tool not helpful enough for someone who already knows the story and probably does not have any profound purpose in mind?

Elementary, my dear Watson!

I started this exercise by opening the entire corpus of Holmes stories in the Voyant viewer and setting the word cloud to take out all of the extraneous words. This left a cloud of the most used, but also more significant, words in the stories. "Holmes" immediately popped up in the center as the largest word, framed by all the other words.

It took just a bit to locate "Watson" in the crowd, a good deal smaller and underneath "Holmes". This got me thinking about Watson and how he is represented in the stories, so I clicked on his name and started to browse through instances of it throughout the stories. The first occurrence of his name was in "Yellow Face", the first Holmes story in the corpus. It is said by Sherlock, talking to Watson, who is the narrator. I continued down through the story until my eyes flicked past Holmes uttering the sentence "My dear Mr. Grant Munro". This immediately sparked a thought: "Elementary, my dear Watson!" I had been on the track of Watson when I flicked past Holmes saying "my dear" to someone else, which reminded me of the aforementioned catchphrase. Having never read one of Arthur Conan Doyle's actual stories, I think of that phrase as one of the first things associated with Sherlock Holmes. I have heard it in many different movies, and from other people quoting it in certain situations. That has to be in the stories for sure! So I immediately searched for "elementary" in the corpus and was pretty surprised at what I found: the word "elementary" occurs only twice throughout all the books, and neither occurrence is coupled with the famous catchphrase. So I then searched for "my dear Watson", and was a little bit reassured by what I saw:

This part of the phrase is much more prevalent, appearing in almost two-thirds of the stories, and multiple times in many of them. So this catchphrase, which I had previously considered synonymous with Sherlock Holmes himself, never really existed in the canonical Holmes stories. I thought this an interesting find that shows an interesting relationship between various media. Having never read a Holmes story before, I based my knowledge of the character on movies and TV series that starred or featured him, or even movies that had other characters quoting the famous line. So, in my mind, this phrase was a big part of the character and one of my biggest associations with him, even though it was actually bred by multimedia incarnations of Holmes that were not written by the creator of the character. I came across this concept-shattering find by stumbling through the stories and linking words and phrases in ways that I might not have been able to see without this method. While not a particularly literary find, it was very interesting to look through these stories and watch as my view of the character shifted.

Sherlock Analysis, Priya K

Voyant, like other tools we have used in this class, takes a little time to become familiar with and a little more time to begin to actually know what you're doing. Cirrus is an interesting text analysis tool that aggregates a lot of text into a myriad of statistics about that text. The most interesting aspects I found while exploring Voyant and Cirrus were the Word Trends and Words in Documents sections. After removing the most common words, it was easier to look through the Words in Documents list and trace which words Sir Arthur Conan Doyle used the most in some of his stories. While many of these words are still very common, the tool is useful in identifying an aspect of Sir Arthur Conan Doyle's writing style. I feel like this tool would be especially helpful for texts whose authorship is still in question.
The tool makes it significantly easier to identify phrase and word trends by a particular author. While exploring the Sherlock texts, I used both Word Trends and Words in Documents to explore how Doyle used certain words in his stories. I tried to focus on words that stood out in the context of the genre. Words like "suddenly", "rushed", and "strange" are common, but, in relation to the mystery/thriller/detective genre, they provide some insight into how Doyle moves the plot along and how he builds anticipation in his stories.

For these words, tracing their trend graphs also illustrates how their frequency in the text increases sharply mid-story. The trend graphs run almost parallel to the dramatic structure of the Sherlock stories (Exposition, Rising action, Climax, Falling action, and Dénouement). Another way Words in Documents and Word Trends can be used is in tracing the frequency of certain characters throughout the story. Many authors of mysteries like introducing their criminal early in the story and then refraining from discussing that character until the mystery is solved at the end and the reader has nearly forgotten about that character. Keeping that disappearing trend in mind, these same tools can be used to focus only on names and characters and may reveal some secrets of the mystery, regardless of whether or not the reader has finished the story. While exploring Voyant, one tool that I could not find a potential use for was the vocabulary density underneath the summary. I'm interested in seeing how others used Voyant and whether anyone else found a use for the vocabulary density tool.

Assignment 4

Voyant is a text analysis tool that is used to analyze the word composition of a text. It uses a variety of information visualizations to show quantitative and qualitative data about each unique word within the text. I used it to run an analysis on five different Sherlock Holmes stories, which are all eighteen pages long. They are The Adventure of the Noble Bachelor, The Adventure of the Beryl Coronet, The Man with the Twisted Lip, The Boscombe Valley Mystery, and The Red-Headed League. The initial results of the analysis show that all five stories have around two thousand unique words. In addition, the words "the", "and", "to", "I", "a", and "of" are the most frequent words used in each individual story. The frequency counts of those words within each story are approximately the same.
All of these words are within the top six most used English words, with the exception of the word "I" (http://www.duboislc.org/EducationWatch/First100Words.html). This may be a result of the author's inspiration for writing the Sherlock Holmes stories. The author once said that the character of Sherlock Holmes was inspired by Dr. Joseph Bell, for whom he had worked as a clerk at the Edinburgh Royal Infirmary. The author tells the stories from the perspective of Holmes's assistant, Dr. Watson. Therefore the frequent usage of the word "I" may be the author's personal projection of his real-life experience of observing how Dr. Joseph Bell worked while serving as his assistant.


The role of senses in "A Study in Scarlet"

I decided to examine the role that senses play in Sherlock Holmes. Due to its nature as a mystery novel, the senses are invaluable as a tool to Sherlock as he solves the cases. There are two interesting things I noticed in my search: the gradual reveal of information, and the realistic ratios of sensory contribution that the story portrays.

Revelation of Information

As in any mystery novel, the gradual revelation of information in the form of clues keeps the reader engaged. The feeling that they, along with Sherlock, are solving the mystery is a unique and appealing sensation. A mystery starts with the character, and the reader by extension, knowing nothing about the case and gradually adds to their knowledge, so that by the end of the story enough has been seen that the main character, Sherlock in our case, can draw the mystery to a conclusion. I found that graphing "looking" and "saw" illustrated this process nicely:

The story begins with a lot of looking for clues, and as the story progresses, the looking begins to pay off in the form of seeing. By the end, Sherlock is no longer looking because he has seen everything he needed.

Realistic Portrayal of Senses

I found a document that outlines how humans learn and details the importance of each sense. It ranks the senses by the percentage that each contributes to learning. The percentages were as follows:

Seeing: 83 percent
Hearing: 11 percent
Touching: 3½ percent
Smelling: 1½ percent
Tasting: 1 percent

I graphed each of the senses to see how they related to each other throughout the story and found something surprising: the senses are ranked in exactly the same order as their respective real-world importance, and with similar percentages! I did not expect this. I doubt it was intentional on Doyle's part. It makes me wonder if humans naturally express their senses in proportion to how much they learn from them!

Sebastian Marulanda, HW 4

I initially went about the assignment on a purely visual whim; the Cirrus (word cloud) caught my attention. Bright words artfully arranged seemed like a good way to present textual information. However, I soon realized that the word cloud was emphasizing the most common words in the text ("the", "and", "of", "was", etc.) and wasn't going to yield any interesting results. Many of the other widgets offered similar information, displaying only the most popular words. There were option lists available to narrow the scope down, but nothing that significantly reduced the common words that didn't interest me. I then decided to simply navigate through the text by double-clicking words I found interesting in the Corpus Reader. This allowed me to view all the instances of a word, follow those instances, and choose new words close to the old ones. It was like following a bread crumb trail until I found an interesting passage. I finally ended on the section describing an old 17th century book, "De Jure inter Gentes". Holmes seems a bit intrigued by the inscription "Ex libris Guiliomi Whyte", whose name he translates as William Whyte. He concludes that, due to his pragmatic writing, he must have been some type of lawyer. What interested me most, though, was one sentence in that passage: "Here comes our man, I think." I may be stretching a bit, but this is very reminiscent of a line from Shakespeare's "Romeo and Juliet", when Tybalt is speaking of Romeo in Act III Scene I: "Here comes my man". Although this is most likely a coincidence, or maybe even an homage (stretching a bit), I would not have found this interesting tidbit had I not taken the "surfing and stumbling" approach. I would never have considered searching for Shakespeare quotes in Sherlock Holmes; it was a happy coincidence.
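The click-and-follow navigation described above amounts to looking up every occurrence of a word or phrase together with its surroundings. A minimal keyword-in-context sketch is shown below; it is not Voyant's implementation, and the function name `keyword_in_context`, the `width` parameter, and the choice to match plain substrings regardless of case are all assumptions for the illustration.

```python
import re

def keyword_in_context(text, phrase, width=30):
    """Return each occurrence of phrase with up to `width` characters on each side."""
    results = []
    # re.escape treats the phrase as literal text; matching ignores case and
    # does not check word boundaries, so "man" would also match inside "woman".
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        results.append(text[start:end].replace("\n", " "))
    return results

# The sentence quoted in the post, with a made-up attribution added for context.
passage = '"Here comes our man, I think," said Holmes.'
for hit in keyword_in_context(passage, "comes our man", width=10):
    print(hit)
```

Searching the same way for a short phrase such as "here comes" across several texts would be one way to look for echoes like the Shakespeare line on purpose rather than by accident.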