Note: these are my live notes on the inaugural Texas Digital Humanities Conference, which is being held at the University of Houston from April 10th to the 12th, 2014. They are being written live so there will be all sorts of problems. Please write me if there are inaccuracies or other unfortunate flaws.

Erez Lieberman Aiden: How to Read Five Million Books: A Big Data Adventure

Erez Lieberman Aiden, who was one of the people behind the Google N-Gram Viewer, gave the opening keynote. He started with ways of looking at the historical record. One way is close reading, but that isn't a practical way to read widely. Humanists therefore read a selection of books and secondary literature.

He then talked about how the 120 million or so books are being checked out (of libraries) and scanned by Google. They have scanned some 30 million books so far. Aiden and his collaborator Michel got access to the Google Books corpus and developed the N-Gram Viewer.

The beauty of the N-Gram Viewer is its simplicity and power.

He gave an interesting example of using the N-Gram Viewer to look at how years are discussed. Like other years, 1950 is talked about most in 1950 itself and for a couple of years after, and then usage drops rapidly. What is changing is that the drop-off is more rapid for more recent years. Why?

He talked about people and the fame curves of people. Apparently people today get famous faster and drop off faster too. He compared types of people (political figures vs authors vs actors and so on.) Political figures are the most famous of the groups. He showed how the signal for someone like Chagall differs between English and German.

He showed one of the anomalies introduced by Google's scanning. If you graph "best" and "beft" you find the two switch around 1800, when the long "s" that OCRs as "f" was discontinued.

Then he showed Bookworm, which does more than the N-Gram Viewer. You can define corpora to work on. It also has a tool for creating tools, including a tool for creating a bookworm. You upload a corpus and they will create a specialized tool like the N-Gram Viewer.

Erez Lieberman Aiden then moved to the question of whether you could predict the movement of words from trends. He showed an XKCD cartoon showing that by 2109 all sentences will be only the word "sustainable". Obviously words that trend up must at some point plateau or drop (or we will be using only that word.)

He concluded with some summary points:

Historical record is going digital.
Visualizing the record is going to change how we think about the record.

Responding to a question about statistics he made an interesting point about how you don't really need statistics to get a lot of insight from the N-Gram viewer. He uses statistics to validate results, but not to generate them.

Friday: April 11th

I presented a keynote on Friday morning on the subject of "Hermeneutica: In Praise of Small Interpretations."

Goodwin and Laudun: Computing Folklore Studies

Laudun and Goodwin presented research around using topic modelling in folklore studies. Preliminary results are available at DOI: 10.1353/jaf.2013.0063

They have developed a really neat way of visualizing topics as sparklines and classifying them by whether they rise or decline. They found that the ones rising late had to do with performance. Then they tried citation analysis using data from JSTOR and Web of Science.
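
A minimal sketch of how topics might be sorted into rising and declining ones. This is only an illustration of the idea, not the presenters' method: the least-squares slope, the tolerance value, and the topic names and weights are all invented assumptions.

```python
def trend_slope(values):
    """Least-squares slope of values against their position (0, 1, 2, ...).

    Needs at least two values.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def classify(values, tolerance=0.001):
    """Label a series as rising, declining, or flat by its overall slope."""
    slope = trend_slope(values)
    if slope > tolerance:
        return "rising"
    if slope < -tolerance:
        return "declining"
    return "flat"


# Hypothetical topic weights per period (invented numbers, not their data).
topics = {
    "performance": [0.02, 0.03, 0.05, 0.09, 0.14],
    "ballad texts": [0.15, 0.12, 0.08, 0.05, 0.03],
    "methods": [0.05, 0.05, 0.05, 0.05, 0.05],
}

labels = {name: classify(series) for name, series in topics.items()}
for name, label in labels.items():
    print(name, label)
```

The same per-period numbers that feed a sparkline feed the slope, so the label and the picture stay in step. A single slope cannot tell a late riser from a steady one; that would need something like comparing the first and second halves of the series.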

Andrew Higgins: Two Network Driven Views of Philosophy

Higgins is trying to develop large scale philosophical maps. He is using PhilPapers to build a network of articles and topics to then create a network graph. The resulting graph was a big hairball, but fascinating. He then showed other visualizations using modularity measures.

I wonder how PhilPapers might distort things? To some extent he was analyzing their taxonomy.

Anne Chao: The Use of Network Analysis Software in Tracing Chen Duxiu's Political Radicalization

Chao is using network analysis to understand Chen Duxiu, who was important to the founding of the Chinese Communist Party. She had these lovely social network diagrams that followed Chen Duxiu's connections through various organizations/magazines as he moved to Japan and back. She used her network diagrams to tell a story of Chen Duxiu's radicalization.

She used Gephi to generate graphs and then used Illustrator to edit the graphs to create a historical circle. We had some interesting discussion about her deliberate massaging of the data.

Posters and Minute Madness

We had a poster session preceded by a one minute madness where each poster presenter had one minute to pitch their poster. This was great as one got an overview of the projects. Here is the list with a comment or two.

Chinese Commercial Advertisement Archive: Neat database of ads with tags about the images in ads that look like they could be searched.
Online legacy preservation for humanities researchers: humanists in New Zealand put lots of work into building personal libraries of what they studied that are now no longer valued as everything has been digitized and thus available.
Liberate the Text! in 18thConnect and TypeWright - A partnership to clean up dirty OCR.
Advanced research consortium: building bridges between academic networks - ARC is built on networks like NINES and 18thConnect
Digital Acting Parts - an interactive tool for learning about acting Shakespeare.
Libraries as Digital Humanities Partners on Campus - They are thinking through how they can support humanists
The Subjunctive in Other Contexts - Mathematical analysis of the subjunctive.
Digital Archives Enhance Undergraduate Studies in World Literature -
Six Degrees of Cyrus: Applying Network Analysis to Herodotus - Neat visualizations
Rhyme Networks - Showed visual illustrations of rhyming patterns from past and talked about
Need to Reconceptualize the Ontology of Digital Humanities Praxis - Three Axes that conceptualize DH
Corpus linguistics of the vernacular
Aristotle Versus Ramus: Who won? Using topic modelling to figure out

Kathryn Holland: Reading Literary Networks through Digital Networks: OVis and Cross-Generational Links in Victorian and Modernist Literature

Holland is looking at a family and its network across generations. She is using the Orlando project. The multi-generational family is a pivotal issue in the novel. Modernists thought they were breaking with families, but one can trace Victorian ideas through multi-generational families like the Strachey family of feminists. She showed interesting network graphs like "A Tangled Mesh of Modernists" from a collection published in 1990. She then showed OVis (Orlando Vision) and talked about how it changed her perspective on the family.

Elisa Beshero-Bondar: Anti-Social Networking with Robert Southey: Place-Time in Poetry and Paratext

Beshero-Bondar began by talking about how poems can be thought of as networks. She talked about how difficult it can be to recover the rich network of allusions of a poem as they would be understood at the time. Poems are themselves machines that juxtapose . She is building an anti-social network edition. The 1801 print edition had the author's notes on the page (later they were moved to endnotes.)

Her graph had neat little whirligigs - or clusters like a cluster involving vampires. She finds it interesting how metaplaces (imaginative places like Hell) hold things together. Cytoscape, which she is using, produces very different graphs from Gephi - I wonder if it is her work or her visual choices.

Kathryn Beebe: Medieval Networks, Digital Humanities, and Observant Reform

Beebe talked about dealing with "tiny" data in Medieval Studies. She is doing social network analysis of medieval texts. She described a text of a journey to Jerusalem that was for cloistered nuns - giving them a chance to virtually travel on the pilgrimage. When she created a network graph she found the text traveled linking centers of observant reform. The nuns travelled virtually.

Tanya Clement: The Shunt Yard: Developing Infrastructures for Meaning Making in Information and Sound Studies

Clement started with some history and how she collaborated with someone who analyzed sound when on the MONK project. She realized how rarely we work with sound in the humanities despite the fact that there is a lot of recorded sound. Her question is can we build the sort of infrastructure for sound that we have for text.

HiPSTAS is the project she is working on now to analyze spoken word audio. They are using a tool called ARLO that uses spectrograms that show a map of energy. Then machine learning is used to train the system to recognize things. She showed a fascinating list of questions they are trying to train the system to help them with.

Clement made an interesting comment about how we, the users/trainers, have to learn literacies in order to use the systems. They have had to learn to read the spectrograms. Now they can see features, and if one can see them then the machine can probably be trained to find them.

I got the feeling that what the machine can do is not what her participants initially wanted it to do. The use of signal processing introduces new ideas for how one can query the materials. What is neat about the project is that she interviewed every participant before letting them play with ARLO and then she follows how their interaction changes their questions. She is studying how spoken word researchers are exploring this new way of studying their data. It is as if she is watching them discover distant listening.

Clement is looking at early information studies theories of meaning (think Shannon and Weaver). She introduced a theorist new to me, Donald MacKay, who talked about meaning making or "information as construction." See Information, Mechanism, and Meaning. Meaning making is like a shunt yard (railroad) where meaning comes partly from what is not selected. While I didn't follow the thread, it struck me as interesting that she is finding she has to go back in time for theories that can help her.

She was asked a great question about whether these techniques are answering traditional questions or generating new questions. The questioner was wondering if there was something in between. She had a number of answers:

She pointed out that one community that cares about sound archives are the archivists who want their archives to be used. They have questions different from the end researchers that can be served by a new model of infrastructure.
She pointed out that some traditional questions you can get traction on, but it takes a while to learn the literacies and to formalize the questions.
The seeing of sound provokes conversations that are neither traditional nor new in the sense that the whole seeing makes sound strange.

I must admit that I don't like the binary opposition of computers answering traditional questions vs forcing a shift in questions. The opposition assumes that there is a fixed set of "traditional" questions that don't shift. Lines of questioning evolve for all sorts of reasons and constraints of the scholarly infrastructure have always been part of what shapes questions. The question bears assumptions about what we should be doing (not letting cold computers distract us) which is why a bunch of us started talking about the value of playing with computers. Then there is the issue of whether we really operate by questions in research or if that is just a fantasy of rigour that we pitch to our students. Perhaps we are like artists struggling with stubborn materials. Perhaps we welcome

That said, I liked the questioner asking for a middle way. One middle way I can think of is that the formalization (modelling) of questions lets us better understand what we thought our questions were.

Saturday, April 12, 2014

Elijah Meeks: Neotopology: Principled Interloping for a Big Tent Network Science

Elijah Meeks started by nicely introducing the challenges of interdisciplinary work like what he does. He isn't a computer scientist, he isn't a network scientist, and he isn't a designer - and he needs to be all of these. The only thing he thinks he is an expert on is "imposter syndrome." (The quality of his work belies his modesty.) He then reflected on interloping. Many of us are interlopers - especially those of us in the digital humanities who are wandering into other fields (trading zones) to gather useful tools/ideas.

He sees network visualizations as a way of publishing or showing information. A network graph is an elegant way to show certain things.

He showed Orbis, which is one of the coolest projects ever and which has a new interface that allows one to do what Meeks calls Minard graphs.

He titled the talk "neotopology" because of the spatial turn to mapping culture. Neotopology is not GIS, it comes from neogeography. Neogeography is "new geography" where people are using cartographic techniques in all sorts of alternative ways like the LOTR Project which is mapping Middle Earth.

Humanists are not the only interlopers in the field of geography. Journalists, designers, and information scientists are all using geographic/cartographic ideas to visualize things.

He mentioned Christian Swinehart and his visualizations of choose-your-own-adventures.

He talked about the spatial turn and how we now have the network turn. "Past Time, Past Place" was published in 2002. We don't have a comparable book for the network turn. What we have, however, is the statue of Gephi which welcomes people. The network turn is taking place after the spatial turn so it can draw on the thinking of how space can be used even if it is not geographic space.

A network is fundamentally simple - it is an annotation of a connection. It has objects and connections. A network can be a supplemental view into your system that shows structure or connections.

He discussed some principles:

Classes Connect Classes
Interactivity - the interactivity is often part
Structure is Often Evidentiary - in the humanities we often have uncertain evidence
Embrace Information Visualization -
Don't Just Use Force-Directed Algorithms - network visualizations don't need
Understand Basic Network Statistics
Invent Your Own Centrality Measures
Understand Topology
Travel Between Topology, Biography, Geography, and Demography
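
To make the points about basic network statistics and inventing your own centrality measures a little more concrete, here is a minimal sketch. It is not from Meeks's talk: the people, the ties, the tie kinds, and the weights are all invented, and `weighted_degree` is just one hypothetical example of a home-made measure.

```python
from collections import defaultdict

# Hypothetical edge list: (person, person, kind of tie). All invented.
edges = [
    ("Ada", "Ben", "kin"),
    ("Ada", "Cat", "letter"),
    ("Ben", "Cat", "kin"),
    ("Cat", "Dan", "letter"),
]


def degree(edges):
    """Basic statistic: how many ties each node takes part in."""
    counts = defaultdict(int)
    for a, b, _kind in edges:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


def weighted_degree(edges, weights):
    """A made-up measure: each tie counts by the weight given to its kind."""
    scores = defaultdict(float)
    for a, b, kind in edges:
        scores[a] += weights[kind]
        scores[b] += weights[kind]
    return dict(scores)


deg = degree(edges)
kin_heavy = weighted_degree(edges, {"kin": 2.0, "letter": 0.5})

print(max(deg, key=deg.get))              # most ties overall
print(max(kin_heavy, key=kin_heavy.get))  # most central when kinship counts more
```

With these invented ties the two measures pick different people, which is the argument for thinking about what a measure should count before trusting a ranking.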

Many of his examples were from Kindred Britain - yet another fabulous project.

He concluded by reflecting on the field of network science. We are, when we use networks, interlopers that are developing a pidgin for a trading zone, but as network science tries to show, a field thought of as a network is about connections. What can network science say about network science? Could it scientifically prove that network science is not a science?

Needless to say, I asked how network science is a science. In many ways I prefer his view that the field is a trading zone. During the break a bunch of us, including Maximilian Schich, had an energetic conversation about the politics of calling a field science or not. Schich has been at the intersections of art, art history, and science and gave as an example Arts, Humanities and Complex Networks. He had good reasons for calling what he does science, though he appreciates that there is a continuum.

Yannick Rochat: Character networks in Les Confessions from Jean-Jacques Rousseau

Yannick talked about the challenges of Rousseau's autobiography. Then he talked about ways of quantifying plot and character. He used the index to build his network, which I thought was a neat hack. Using different measurements of centrality he has been teasing out the people that Rousseau thought were plotting against him (like Diderot.) He showed a fascinating animation of the mini-networks of people evolving over the course of the book. You see the shift in the second volume when these smaller networks explode.
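
The index hack can be sketched roughly as follows. This is an assumption about how such a network could be built, not a description of Rochat's actual procedure: here two people are linked whenever the index lists them on the same page, and the page numbers are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical index entries: name -> pages where the name appears (invented).
index = {
    "Diderot": [10, 42, 43, 90],
    "Grimm": [42, 43, 91],
    "Mme de Warens": [10, 11, 12],
}


def cooccurrence_edges(index):
    """Link two names once for every page on which both are indexed."""
    # Invert the index: page -> set of names on that page.
    pages = {}
    for name, page_list in index.items():
        for page in page_list:
            pages.setdefault(page, set()).add(name)

    # Count each unordered pair of names that share a page.
    edges = Counter()
    for names in pages.values():
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)] += 1
    return edges


edges = cooccurrence_edges(index)
for (a, b), weight in edges.items():
    print(a, "-", b, weight)
```

The edge weight is the number of shared pages, so the resulting list could be loaded into a tool like Gephi as a weighted, undirected network.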

Ayse Gursoy: Online networks of Games Criticism

Gursoy started with the issue of whether games are art. There is a conception of the critic as someone who sits at the back of the room and says if something is art or not. The critic historically is an identifiable persona who writes in a newspaper or magazine. Critics often have multiple roles and can become advocates for a medium. Criticism is not review. A review is about whether you should consume something. Criticism is more about ways of engaging with something. Dear Esther was a game that was a catalyst for games criticism because many people ended up discussing what it was. People argued about whether it was a game. She talked about the issue of intertextuality in games criticism. People have a need for these conversations. Critical Distance is one site trying to curate a conversation.

She has been studying how Critical Distance is a community.

Neal Audenaert: Towards Large Scale Analysis of Visual Features

Audenaert talked about a project that is looking at how they can study the non-linguistic features of Victorian texts. Books have three types of information:

Bibliographic information - about the paper, binding, and typeface
Visual and graphic features - arrangement of text, images, and white space
Linguistic features - the syntactic and semantic features

They use Tesseract and some custom analyzers to find blocks on the page. They are looking at how much text is on the page, spacing, number of lines and so on. Then they can look at how features change over time. Where they want to go is to develop some image analysis tools.

He compared histograms of line widths for two works. In one the width of the lines of poetry was consistent, in the other it changed at the end. Then there are patterns of line indentation which seem standardized in some types of poetry.
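
A rough sketch of what a line-width histogram involves. The project measures lines on page images; here the character count of each text line stands in for its width, which is an assumption made for illustration, and the sample "page" is invented filler.

```python
from collections import Counter


def line_width_histogram(lines, bin_size=5):
    """Count non-blank lines by width in characters, grouped into bins.

    Each key is the lower edge of a bin, e.g. 20 covers widths 20-24
    when bin_size is 5.
    """
    hist = Counter()
    for line in lines:
        width = len(line.rstrip())
        if width == 0:
            continue  # skip blank lines between stanzas
        hist[(width // bin_size) * bin_size] += 1
    return dict(sorted(hist.items()))


# Invented page: three lines of similar width, a blank, and one long line.
page = ["a" * 22, "b" * 24, "", "c" * 23, "d" * 41]

hist = line_width_histogram(page)
for bin_start, count in hist.items():
    print(f"{bin_start:>3} {'#' * count}")
```

A work with consistent line widths piles up in one or two bins, while a change at the end shows up as a second cluster.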

They are trying to figure out how to structure and study the visual and graphic features. They are now collaborating with the HathiTrust. They are trying to figure out which image analysis components will be useful at scale. They also want to develop a system that is not closed, but which others could use in different ways.

I must add that his slides were lovely.

Hackathon

Elijah Meeks ran a hackathon on network analysis. We started with the question of alternatives to force-directed graphs. Here are some of the alternatives he showed:

Arc diagrams - http://bl.ocks.org/emeeks/9458332
Adjacency matrix - http://bl.ocks.org/emeeks/9441864 - bost.ocks.org/mike/miserable
Scatter plots
Scripted layout - where you provide rules or a script for laying out nodes
Plotted layout - plot the points on some dimension
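
As a small illustration of one of these alternatives, here is a sketch that turns an edge list into an adjacency matrix and prints it as a grid. It is not taken from Meeks's D3 examples; the node names and ties are invented, and the network is assumed to be undirected.

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 matrix: a 1 in row i, column j means a tie."""
    position = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        i, j = position[a], position[b]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: mirror the tie
    return matrix


# Hypothetical nodes and ties (invented for illustration).
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("C", "D")]

matrix = adjacency_matrix(nodes, edges)

# Text rendering: one row per node, '#' for a tie and '.' for none.
print("  " + " ".join(nodes))
for node, row in zip(nodes, matrix):
    print(node + " " + " ".join("#" if cell else "." for cell in row))
```

Unlike a force-directed layout, position here carries no accidental meaning: every possible pair has exactly one cell, and the order of rows and columns is a choice you make explicitly.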

Meeks has a neat teaching tool at that lets you see issues.

A common problem with network visualizations is that viewers assume that nodes near each other visually are near to each other topologically.

Then he shifted to talking about D3 (Data Driven Documents) and he showed us Blocksplorer as a way of exploring D3. We talked about how to teach this.

He has a book on D3 coming out from Manning. He has a bunch of examples here.

Then we tried Gephi using data prepared from DH 2011 for a workshop at Stanford. There are tutorials here.