Conference Report on DH2009

These are my notes on the papers I attended at the Digital Humanities 2009 conference held at the University of Maryland.

Note: this conference report was written on the fly. I have done some superficial editing, but it is what it is. For other perspectives see the dh09 thread on Twitter.

The conference was smoothly and well organized by the MITH team. They deserve a lot of credit. The venue was nice, there were a lot of participants, and the quality of the papers was extremely high.

Monday, June 22

Lev Manovich Plenary Talk

Lev talked about work he and his colleagues are doing on Cultural Analytics (CA). CA exploits the availability of large amounts of data to do "big humanities". We now have very large datasets, all sorts of mining tools and visualization ideas from the media arts community. This combination allows us not to have to choose between depth and breadth. Traditionally the social sciences worked with broad data that was shallow while the humanities worked with a narrow slice in depth. Now we can potentially work broadly and in depth. It's not either a broad survey or deep biography. Some of the features of the new data landscape are:

We have lots of data about some phenomena
Much of this data is created not by professional gatherers, but by the people themselves.
Instead of statistical samples of the phenomenon we have the full data in some cases
We also have lots of tools that grab data all the time - we can observe differently
For some phenomena we have different resolutions of data

Jeremy Douglas presented rapidly about experiments to use analytics to look at web comics, video, and gaming. Lots of interesting ideas about how to visualize different features of each. I was interested in the creative ways they are trying to analyze non-textual data, something much harder than text analysis.

Lev framed part of the discussion by showing images of a very large tiled high-resolution display that you can see on the Projects page of the Software Studies site. The Software Studies site provides a thorough collection of documents and links.

Tuesday, June 23

Animating the Knowledge Radio

Stéfan Sinclair presented a paper for both of us on animating text analysis. We presented models we have been developing for presenting information over time - animating it. Stéfan showed The Big See, the Ticker in Voyeur, and the Lava interface.

Text Analysis of Large Corpora Using HPC

Gerhard Brey presented HiTHeR, a project funded by JISC at King's College London. The Humanities have been underrepresented in e-science so they are being wooed. The aim of the project was to create a campus grid that could connect to the NGS (National Grid Service). Their test project for the campus grid was the Nineteenth-Century Serials Edition. They used Olive software to scan the originals and deliver them. They experimented with techniques for semantic tagging and document similarity identification. Gerhard distinguished between HPC (High Performance Computing) and HTC (High Throughput Computing). HTC doesn't use a supercomputer, but lots of affordable computers, including underutilized ones, with Condor distributing the processing. They had nice ideas about document similarity - finding documents that are similar to one that fits your interests. They experimented with overcoming bad OCR by using n-grams.
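To make the n-gram idea concrete, here is a minimal sketch of my own, not HiTHeR's code. It assumes character trigrams compared with Jaccard overlap, which is only one of several ways n-grams can be used; the function names and the sample sentences are invented for illustration. The point is that an OCR error damages only the few n-grams that touch it, so a noisy page still looks similar to its clean counterpart.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams in a lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard_similarity(a, b, n=3):
    """Share of n-grams the two texts have in common (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty texts are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Invented sample sentences; the "noisy" one imitates typical OCR confusions.
clean = "The parliament met on Tuesday to debate the reform bill."
noisy = "Tbe parliarnent met on Tuesdav to debate the reforrn bill."
unrelated = "Fresh herring sold daily at the harbour market."

print(jaccard_similarity(clean, noisy))      # high despite the OCR errors
print(jaccard_similarity(clean, unrelated))  # much lower
```

Whole-word matching would treat "parliarnent" as a completely different word from "parliament"; the trigram sets still share most of their members.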

Appropriate Use Case Modeling of Historical Texts: A Software Engineering Perspective

John Keating and Aja Teehan presented on use case modeling to minimize the effort of developing software. John and Aja are from CS so they needed to contextualize the process and understand the humanities. They started by positioning their theoretical perspective. Activity Theory (Leont'ev and Rubinshtein) says subjects engage environment through tools for outcomes. Nardi argues that artefacts have been ignored. Community needs to also be taken into account.

I'm not sure humanities work is so instrumental. I don't know if we develop tools only for outcomes. Their theory is itself a tool and one which sees humanities scholarship as instrumental.

John talked about the TEI from their perspective - they see the TEI as too proscriptive - too limited to certain use cases. I wonder if they understand how the TEI represents a long conversation of digital humanists and may not be as limited as they think. They gave an example of an account book that was itself an encoding in the 17th century of accounts. They suggested that they want to model interaction which seemed interesting, but I'm not sure exactly what they meant. They also showed how they wanted the account books to be used like spreadsheets.

The cool idea was the embedding of interaction into the process so they can preserve documents with their functionality. The code is generated from the repository along with the encoded document. This project is about ethnographically recording what you are thinking so it can be used. They did ethnography on the community and that is kept along with their theorization.

Creating a Composite Cultural Heritage Artifact - the Digital Object

Fenella G. France presented on hyperspectral imaging of key documents in the Library of Congress for preservation. Hyperspectral imaging (HSI) is non-destructive and can therefore be used on artefacts that can't be touched. The idea is shooting at a variety of wavelengths to show all sorts of things that can't be seen. They can characterize, through spectral response, things like inks, environmental damage and material substrate. They keep the metadata (including photo and macro info) for the different wavelengths in the files. ImageJ is the software they use - it is developed by the NIH and well maintained. They are then letting users play with different wavelengths.

Neat new word for the day = scriptospatial - overlaying information scripts on the visual space of manuscripts. Like Google maps they can add annotations with information about different spectra to the images.

The digital object for them is a combination of the physical and the layers of digital information.

On-site Scanning of 3D Manuscripts

Timoth H. Brom talked about the EDUCE (Enhanced Digital Unwrapping for Conservation and Exploration) project. They use a high-resolution computed tomography (CT) scanner that shoots X-rays through artefacts to see inside them. They then use the computer to calculate a 3-D image (voxel set). The computation is a reconstruction. Computers can then find structure in the dataset. Processing of the high-resolution layers is an issue if you don't have HPC so they created a 4 box HPC that could be transported in a minivan. They tried a gaming graphics card and it outperformed the 4 system unit. Then they ran into problems of portable scanners. There are now portable high-resolution CT scanners.

The Ghost in the Manuscript: Hyperspectral Text Recovery and Segmentation

Patrick Shiel also presented on HSI for manuscript scanning. HSI takes images under monochromatic light (single wavelengths) along with other types of light, many of which we can't see. The images taken at multiple wavelengths can then be combined in different sets to show different things. HSI is used in forensic work. HSI lets you see and hide different features, subtract features to let others through, see hidden information and recover very faint information.
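As a rough illustration of "subtracting features to let others through", here is a toy sketch of my own and not the method presented. It assumes two already aligned bands stored as small grids of reflectance values, where a hypothetical band_a records a stain plus faint ink and band_b records only the stain. The numbers and names are invented, and real hyperspectral processing involves calibration and registration that are ignored here.

```python
def subtract_band(target, background, weight=1.0):
    """Subtract a weighted background band from a target band, pixel by pixel (clamped at 0)."""
    return [
        [max(0.0, t - weight * b) for t, b in zip(t_row, b_row)]
        for t_row, b_row in zip(target, background)
    ]


def stretch(band):
    """Rescale values to the range 0..1 so that faint differences become visible."""
    flat = [value for row in band for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in band]
    return [[(value - lo) / (hi - lo) for value in row] for row in band]


# Invented 2x3 "images": the stain dominates band_a, the ink adds only 0.1.
band_a = [[0.5, 0.6, 0.1],
          [0.5, 0.1, 0.2]]   # stain plus faint ink
band_b = [[0.5, 0.5, 0.1],
          [0.5, 0.1, 0.1]]   # stain only

recovered = stretch(subtract_band(band_a, band_b))
for row in recovered:
    print([round(value, 2) for value in row])
```

After subtraction and stretching, the two inked pixels stand out against a background of zeros even though they were barely distinguishable in band_a.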

I'm amazed how little I know about light. There is reflected and fluoresced light. They use raking lights to show almost 3D topographic information.

Ubiquitous Text Analysis, T-REX, and Mashing Text

A bunch of us presented three papers around text analysis models and activities. I presented about embedded tools or ubiquitous tools - small panels that can be dropped into blogs, etext streams and other forms of online publishing. See TAToo for example.

Steve Downie, Patrick Juola and I presented about the T-REX (TADA Research Evaluation and eXchange) project that organized an evaluation exchange.

Peter Organisciak presented about the JiTR project (or Mashing Texts project). Peter talked about the Personas, Scenarios, Wireframes, and Graphic Design process we used. This is a usability design process that builds on stories about imagined users (personas), from which scenarios are developed and used for designing. It should be of interest to humanists as it uses stories about fictional people, not statistics, to drive design decisions.

Design as a Hermeneutic Process

Stan Ruecker (and Alan Galey) presented on experimental interface design and book history. He wanted to mash experimental design theory and editorial hermeneutical theory. He quoted Lev from a previous conference that prototypes are theories and we shouldn't be embarrassed by them. Materialist hermeneutics would look at artifacts and how they are interpreted and interpret the world. Stan was very interesting on how an object makes an argument. To argue, designed digital objects need to:

Be situated
Allow unpacking and testing
Accommodate objections

Arguments are typically contestable, defensible, and substantive. Stan showed examples from Stefanie Posavec and others and asked how they were arguments.

I wonder if any visual object can make a formal argument without text or annotation. Likewise it might be impossible to make an argument with text only and graphic elements. Another question I have is about the very translation of argument. Stan was translating the examples he showed and then asking if they were arguments. The translation or interpretation wasn't, however, always so straightforward. The visual works seemed to me open to multiple interpretations in a way that is different from how a text works.

Lev Manovich added that software objects present a view of what is important or not. A text analysis tool presents a view of the world where typography doesn't matter, for example. Objects argue by hiding and showing things.

There was an interesting discussion about time at the end.

What is transcription? Part 2

Michael Sperberg-McQueen and others talked about formalizing transcription. When you try to formalize markup you get to a point where many tags point to transcription. Michael talked about the "assertions model" that starts from the observation that we read a transcription about something and learn about that thing. One interesting thing is that assertions can underspecify - be vague about the transcription. Michael talked about the "Readings Level" - readings attribute assertions to an exemplar. A language constrains what can be asserted. You get problems when you can't distinguish from other marks and when documents are equivocal - you don't know what the tokens are. The claim that any token maps to one and only one type is clearly wrong. The readings level solves the problem of contradictions in assertions. You can say that reader X reads a region as Y and another reader reads it as Z.

Burying Dead Projects

Shawn Day and I presented on how to close and deposit a project, specifically our work on depositing the Globalization Compendium. This project is being documented openly (though that documentation doesn't always catch up with what we have done.) See http://tada.mcmaster.ca/view/Main/ProblemOverview for our documentation. This talk seemed to get the most reaction of those I was involved in.

Google Book Search

Jon Orwant from Google Book Search came to talk to us about possibilities of supporting scholarship. See my blog entry on this.

MIHS - Text Mining Historical Sources Using Factoids

Sharon Webb is an An Foras Feasa fellow working on factlets and factoids. She is interested in the development of Irish nationalism. She talked about "othering" where labeling and categorizing of a group creates an "other". MIHS (Mining Interactive Historical Source) is the software Sharon and others are creating that extracts information by developing factoids which are composed of factlets. Text mining can generate clusters of factoids. Factoids are ways to represent and manage information and connections between different types of structured information. (See Bradley & Short, 2005.) Facts are argued to either be in the data or in interpretations.

Factlet - the fact as it exists independently - an assertion by a source independent of an interpreter
Factoid - the fact that exists through interpretation
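One way to picture the distinction is as a small data structure. This is my own sketch and not the MIHS data model: the class names, fields, and sample records are all hypothetical, and the sources are deliberate placeholders rather than real documents. A factlet keeps an assertion tied to its source, while a factoid records an interpreter's claim together with the factlets it rests on.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Factlet:
    """An assertion as made by a source, kept independent of any interpreter."""
    source: str     # where the assertion was found
    passage: str    # the wording in the source
    assertion: str  # structured restatement of what the source says


@dataclass
class Factoid:
    """An interpretation, attributed to someone, that draws on one or more factlets."""
    interpreter: str
    claim: str
    factlets: list = field(default_factory=list)

    def sources(self):
        """List the distinct sources the interpretation rests on."""
        return sorted({factlet.source for factlet in self.factlets})


# Placeholder records, invented for illustration only.
f1 = Factlet("Pamphlet A", "they are not of this nation", "group X described as foreign")
f2 = Factlet("Newspaper B", "strangers among us", "group X described as outsiders")

factoid = Factoid(
    interpreter="Historian (hypothetical)",
    claim="Group X is being constructed as an 'other'",
    factlets=[f1, f2],
)
print(factoid.sources())  # ['Newspaper B', 'Pamphlet A']
```

Keeping the interpreter on the factoid and the source on the factlet means two historians can build different factoids from the same factlets without overwriting each other.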

They seem to be generating factlets from source texts though it looks like there must be a lot of interpretation. The factlet will preserve the source narrative and add deduction. Factoids are generated from lots of factlets. She showed some interesting interfaces for text mining documents using factoids and factlets. It looks like a historian would collect a lot of factlets as a form of structured notes and then manage/analyze them. She had a cute quote about how, "as a fact, that the fact, is not a fact!"

Sentiment Analysis of Fictional Characters Based on Entity Profiles

Rohini Srihari talked about sentiment analysis that tries to find the opinion of a person/organization on a particular topic. Another use is to find positive or negative opinions about an object. One can also look at how opinions change over time.

She talked about analyzing social groups and their behaviour. She gave as an example the Mumbai Terrorism incident and mining of it. What's being discussed by whom and how is it changing? Is manipulation, persuasion, coercion happening?

She is now trying Jane Austen to validate automated sentiment analysis. Fictional characters in novels can act as a testbed for training. I get the feeling they are using fiction to train tools for counter-terrorism mining (as in Carnivore.) There is a lot of interest in commercial and security circles for sentiment analysis. Brand tracking and so on.

She talked about the "bag-of-words" (BoW) model and how that is used in many mining techniques. BoW works for a lot (Google is built on it) but it won't work for certain tasks. She showed how their system generates an entity profile from text - info about name of org, members, events, and description. They can then extract the sentences that a profile is based on and that is a synthesis document that can be mined.

For sentiment they have developed a lexicon of words that have positive and negative sentiments. They traverse WordNet in different ways to get synonym networks of adjectives. They assign sentiment to a character by looking at the sentences related to an entity profile. They see how many of the words in the sentences are in the positive or negative lists of their lexicon.
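The counting step can be sketched in a few lines. This is my own simplified illustration, not their system: the tiny word lists are hand-made stand-ins for a lexicon they build by traversing WordNet, matching on a character's name stands in for their entity profiles, and the sample text is invented rather than quoted from a novel.

```python
import re

# Hand-made stand-ins for a sentiment lexicon (assumption, not their word lists).
POSITIVE = {"lively", "charming", "agreeable", "kind"}
NEGATIVE = {"selfish", "vain", "careless", "cold"}


def sentences_mentioning(text, name):
    """Split text into sentences and keep those that mention the name."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if name.lower() in s.lower()]


def sentiment_score(sentences):
    """Return (positive - negative) / (positive + negative), or 0.0 if no lexicon words occur."""
    positive = negative = 0
    for sentence in sentences:
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word in POSITIVE:
                positive += 1
            elif word in NEGATIVE:
                negative += 1
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total


# Invented sample text for illustration.
text = "Mary was lively and charming. Edmund walked home. Later Mary seemed vain."

mentions = sentences_mentioning(text, "Mary")
score = sentiment_score(mentions)
print(mentions)
print(score)
```

Running the same count over successive chapters rather than the whole text would give the kind of change over time she described.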

For example, Mary Crawford, the character in Austen's Mansfield Park, can be tracked through the novel to see what sentiment is associated with her. Crawford starts as positive but comes to be described in a neutral way.

The Artificial Intelligence Hermeneutic Network

Fox Harrell presented about a new approach to intentional systems. They are looking at artificial intelligence systems like Voyager. Intentional systems is a new term for AI where we are interested in how we think about systems as having intentions. It is a shorthand for systems that are described as having intention whether informally ("my laptop is thinking about it") or formally.

Intentionality according to Searle is "that property of many mental states and events by which they are directed at or about the world". It is the aboutness that is important. The traditional view is that artifacts cannot have intentions except metaphorically. This is being challenged so that some seriously consider the intentionality of artifacts.

The Eliza Effect is when we want to believe that an artifact has intentionality.

Dennett proposes different stances. Physical stance, design stance and intentional stance. Physical stance predicts behavior based on laws of physics while intentional predicts based on beliefs (and stances.)

"System intentionality arises from a hermeneutic process" is Harrell's proposition. AI tends to see intentionality in complexity and knowledge representation. HCI folk see it in the interface. Their framework introduces system author and the publications surrounding a program. Software studies looks at code like novels - what about the authors, the context, and the surrounding documents.

Harrell then looked at Copycat as an example of the AI Hermeneutic Network. He applied his approach to the AI system Copycat. He did technical-social-cultural analysis, content analysis, and ideological analysis. In effect they are doing the interpretation and bibliography on software where they look not just at the AI system, but also at the politics, the surrounding documentation and language of intentionality. It seemed he was doing something similar to the "software studies" that Lev preached though not through visualization.

Great Papers I Missed

As I was giving a number of papers this year I missed a number like Melissa Terras on Digital Curiosities. There is also a good set of live blogs at http://titania.stockton.edu/sjcdh .