These are my notes about DH2014 in Lausanne, Switzerland. Twitter at @DH2014Lausanne #dh2014 .

Note: These are being written live so they will be full of errors and misunderstandings.

Monday and Tuesday, July 7th & 8th

Monday I was in the ACH Exec meeting all day. On Tuesday I had a workshop on My Very Own Voyant. You can see the script for this workshop here.

@ADHOrg has a storify of Day 2

The President of EPFL, Patrick Aebischer, gave one of the short talks of welcome. He talked about creating a DH institute. At the EPFL they have a Digital Humanities Laboratory run by Frédéric Kaplan, one of the co-organizers of DH2014. Claire Clivaz is the other and she runs Laboratoire de cultures et humanités digitales.

Bruno Latour: Rematerializing Humanities Thanks to Digital Traces

Latour gave the opening keynote. (The keynotes are listed at http://dh2014.org/program/keynotes/ .)

His talk has two parts. First, he discussed some principles. Then, he talked about some of his digital experiments.

His preliminary, and dogmatic, points

"The more people talk about the virtual, the more material it is." He made the point that the cloud is real and material. The cloud is the more real and the more expensive.
"The more people talk about cognitive functions, the more they describe a socio-technical environment." There is a lot of infrastructure to the cognitive and virtual.
"Digital is not a native function, but the result of a redundancy in the computer institution." He mentioned Brian Cantwell Smith and how the digital is built on the analogue.
The closed-book fallacy - "The more people talk about the difference between book and web, the more they look alike when the whole network is taken into account." We have moved from the scriptorium to the screentorium. The book is complex and the scriptorium or scholar's study is even more complex. We are just working through the complexity.
What is observable? "Segments of trajectories through distributed sets of material practices only some of which are made visible through digital traces."

He gave some examples of the changes in social theory. He mentioned three ages:

Statistics and the idea of society
Polls and the idea of opinion
Digital traces and the idea of vibrations - the big data approach to tracking the social

He showed a screencast of the E.A.T. Datascape. He showed an example of computational legal studies and some of Moretti's distant reading.

He summarized with some bottom lines:

Most segments of networks remain invisible through digital tools, yet their multiplication helps rematerialize all cognitively complex sites
Digital humanities are the attempt to underline some segments to make others easier to grasp
The net result is a rematerialization of analogical and materialist accounts of cognitive functions: thought is re-engineered

His second part was a tour through a project he is part of.

Is it possible to relocate some of the skills necessary not for distant massive reading but for the close reading of difficult texts through the web? Latour and his colleagues have an experiment at Inquiry Into Modes of Existence.

The ideas behind this experiment include:

A high-tech interface for a highly traditional argumentative process
Restricted site, but free access
Collaborative co-inquiry but on a limited set of questions
Open inquiry but protected from comments
Close reading, but distributed in space and time
Set of digital devices to facilitate thick description

He and colleagues are developing a set of practices he calls digital fortune. They had an (Italian) designer who tried to keep the same tone between paper and web. They wanted to be able to read closely, but also be able to have all the networked information.

He argued that close reading requires closure and for that reason they didn't want free floating commentaries. Instead they had mediators that edited contributions.

The whole thing felt like a highly structured colloquium that produced a book (or, as he called it, a report).

He talked about how one can get out of the fake linearity of the book. They invented some new interfaces for browsing the book and following arguments.

He talked about how they went back and read the book together to check things. They did a bunch of meetings that were taped as new materials for rewriting the book. This is a digitally mediated community of inquiry. They can then follow the community of readers with digital tools.

He joked about being a member of the digital humanities and soon being a professor (the EPFL Professor mentioned that they would be hiring).

His conclusions:

There is not much difference between the older scriptorium and the newer screentorium. The more we work in the digital the more we realize how complex traditional practices were.
The more thinking and interpreting become traceable, the more the humanities can merge with other disciplines.
The digital is not a domain, but an entry into the socio-technical.
"A little digitally inclines man's mind to virtuality, but depth in digitally brings men's mind about materiality." The digital is a way of getting deeper into the material.

Some points made in answer to questions:

A set of "repliques" - the aftershock of something
He talked about the importance of closure. Is that a matter of habits or traditions or is it necessary to thinking?
The word "digital" in digital humanities reminds us how material this all is. He sees the digital as reminding us of some of the material features that are processed. (What a lovely turn of the term away from what we usually understand.)

I was struck by how he read "digital" in a generous and different materialist way. He could very well have critiqued us for falling for the virtual myth. Instead he chose to assume that we were with him on the materiality of the digital.

That said I wanted to ask him about real abstractions like "justice", "love", "France". Everything is (sort of) material, but you can't understand France by looking just at the territory. Likewise the "digital" is an abstract, but still real idea.

The Book Lab: Rolex Learning Center

The Book Lab is a project that explores where the book is going. Swiss designers participated in this exhibition. Books are not dead, the digital has extended them. This was curated by Swissnex San Francisco and the exhibit was opened to us.

Wednesday, July 9th

I chaired a session of long papers.

Erhard Hinrichs: CLARIN: Resources, tools, and Services for Digital Humanities Research

Hinrichs presented a paper co-written with Steven Krauwer about CLARIN, an EU research infrastructure focused on language resources. CLARIN has 10 national members that also fund it. It is a networked federation of certified European data repositories for easy access to language data. Some of the services:

Single sign-on
Discovery, acquisition and curation of data - the Virtual Language Observatory harvests data and lets people search the metadata. They offer help and user guides on standards for encoding and curating. They have ingestion systems that extract metadata from data submitted and help with fixing.
Data mining, analysis - They have a CLARIN-D Federated Content Search that searches the separate collections. You can also search individual collections, but the federated search crosswalks. WebLicht allows projects to enrich their collections with linguistic annotations and so on.
Visualizing - They have general visualization tools
Data sharing - They are doing neat stuff with ownCloud where they can save searches and stuff and share with other users. They are collaborating with EUDAT to publish through Simple Store.
Archiving of data - They have a data seal of approval to exercise quality control

He walked us through some examples. A lot of the corpora are national literary and linguistic corpora. He had a nice visualization of linguistic variants.

He mentioned that CLARIN is not a closed shop and they welcome other centers.

We had an interesting talk about infrastructure and funding.

Lukas Rosenthaler and Claire Clivaz: Navigating the Storm: eMOP, Big DH Projects, and Agile Steering Standards

Rosenthaler and Clivaz presented a paper also written with Peter Fornaro. Rosenthaler started by asking what digital data was - any information stored in a code from a limited alphabet of symbols. Binary data is a special case - you can call it + 3 volts or - 3 volts or spin up or spin down. We have had digital data for a long time. We know that copies and distribution (redundancy)

He then had to disappoint us. Digital data doesn't exist. There is a material layer that is analogue and the bits rot over time. Hardware becomes obsolete. Software also becomes obsolete. This is the real problem. Then we have the loss of research data. Kepler's laws were based on Tycho Brahe's data in his notebooks.

The generic solution we have heard is Open Data Migration. The generic problem is money. The solution is to create a new institution that can act the way the library does - an institution to preserve data. The Swiss National Data Curation Centre is their decentralized centre and they have 2 years of funding. They are focusing primarily on the preservation of complex structures like databases, not just documents. The digital sources are fairly easy - it is the complex structures that are hard.

Migration never stops and always costs. Migration doesn't give you anything new - but you have to migrate to survive. A lot of projects are dead and people

They are moving all information to open data. They think they can emulate any data model through linked data.
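
As a rough illustration of the claim that linked data can emulate other data models, the sketch below flattens a relational-style record into subject-predicate-object triples and rebuilds it. The record, the field names and the `ex:` identifiers are invented for illustration and are not taken from their system.

```python
# Sketch: emulate one table row as triples and rebuild it.
# All identifiers (ex:photo42, the field names) are hypothetical.

def row_to_triples(subject, row):
    """Turn one record (a dict) into (subject, predicate, object) triples."""
    return [(subject, f"ex:{field}", value) for field, value in row.items()]


def triples_to_row(subject, triples):
    """Rebuild the record for one subject from a list of triples."""
    prefix = "ex:"
    return {p[len(prefix):]: o for s, p, o in triples if s == subject}


row = {"title": "Lausanne harbour", "year": 1912, "medium": "glass plate"}
triples = row_to_triples("ex:photo42", row)
for triple in triples:
    print(triple)

# The round trip loses nothing, which is the sense of "emulation" used here.
assert triples_to_row("ex:photo42", triples) == row
```

A real migration would also need shared vocabularies and datatypes, which this sketch leaves out.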

SALSAH = System for Annotations & Linkage of Sources in Arts & Humanities is the name of their system. They can prove that it works. Their experience:

Do not impose a data model, but work with the researchers
Support standards only if it doesn't hinder research
Don't expect clean data
Try to understand the objects as the projects understand them - listen

They have 5 steps:

Understand needs
Create ontology and rights
Create scripts that migrate

Then Claire Clivaz took over. She answered questions that she had got from reviewers:

It is important to collaborate across disciplines
SALSAH allows them to experiment and gives flexibility
They are trying to look ahead rather than just preserve what they know
They need mirror servers
They need to deal with copyright issues
They are trying to not work top down
Researchers in the humanities need to accept to let their data go

Navigating the Storm: eMOP, Big DH Projects, and Agile Steering Standards

Elizabeth Grumbach, Matthew Christy, Laura Mandell, Clemens Neudecker, Loretta Auvil, Todd Samuelson, and Apostolos Antonacopoulos were the co-authors of this presentation. The presentation surveyed

eMOP is the Early Modern OCR Project, which is trying to improve OCR engines so they can handle early modern fonts. There are all sorts of problems with OCR of early modern texts, especially with the images from poor scanning typical of the microfilming that happened back then. It would be nice to rescan, but many libraries figure EEBO and ECCO (based on microfilmed texts) are good enough.

Their dream was to train for different fonts using Tesseract, but it has been very hard. Giving lots of examples didn't work. Instead they need to provide the ideal of each letter. They use Aletheia to identify glyphs and match to unicode. They then use Franken+ to choose the best glyph so as to create a "monster" text that can then train tools.

Then we had the case of IMPACT (IMProving ACcess to Text) - a European equivalent to eMOP. This project again started with a dream and the reality was a collection of very diverse tools, algorithms, etc. Instead of producing one solution for all they built a bunch of smaller tools that can be composed in different ways. They also had the interesting problem of crossing disciplines. They created a buddy program to match a computer scientist with a librarian to overcome the different perspectives.

Note to self: the buddy system sounds like a great way of encouraging team members to learn about those in other fields.

Now IMPACT has turned into a Center of Competency, see http://www.digitisation.eu/ .

The PRImA Lab at University of Salford, Manchester was the third case study. Some of what they learned was:

You need to really understand the problem. Everyone thinks they can do everything in the beginning. It is hard to focus on one thing and solve it. He gave the example of the problem of layout. It isn't enough to just recognize words if you don't recognize layout of words.
You need to really understand what the different stakeholders know and think. You need to understand stakeholders and their roles. Often stakeholders don't really understand what you think they do. The DH researchers were the catalysts.

At the end they talked about the importance of failure. They have produced interesting results, even if they haven't "solved" the OCR problem. There was some discussion about crowdsourcing difficult passages. They are going to release a Facebook game that sounds really neat.

Arianna Ciula and Cristina Marras: Circling around texts and language: toward 'pragmatic modelling' in Digital Humanities

They want to look at models as icons. They started with theory and then moved to a pragmatic and metaphorical model (of modelling?). They showed a semiotic structure of the model relation. They claim models are icons because they are symmetrical (I'm not sure what was meant by that). The model represents "reality".

Modelling in experimental practices is pervasive. DH is attracted by the practice of models and modelling because it anchors scholarship to a practical dimension. The approaches of DH are also comparable to empirical practices in the techno-sciences. So what is modelling in DH?

Modelling process is part of what is being modelled. (Flanders)
What model of modelling can be considered "adequate" to DH?
Reflection on objects, practices (modelling) and languages

They focused mostly on textual complex and open objects. One can think of texts as artefacts, material documents, or as abstractions. This connects to debate about models as fictions. There is a wide sense of modelling in DH. The multidimensionality (SAHLE 2006, 2012) of text, the historicity, even the identity of the modeller are issues.

Looking now at role of models in real practices there is, in the sciences, a physical system that is represented in simulation. The physical system is stable. The connection between physical and experiment is supposed to be causal. There is no measurement without models. Models are mediators between theory and experiment (Morrison 2009). Models can take on a life of their own.

In DH the computer simulation adds a level of materiality to humanistic interpretation. (While in science the opposite is true - the simulation moves away from material.)

For Sperberg-McQueen (data) modelling is about surfacing one's assumptions. You

They then presented their pragmatic (metaphorical) model. You have an object (text) and Observer with a theory. Language and metaphors are what you model with. Models mediate between object and observer.

Metaphorical language is not purely descriptive nor purely functional. It adapts to the richness. Metaphor is not just an ornament for them, but a use of language that structures things.

They talked about the pragmatic aspect. I didn't quite get it, but I think they are right that in DH there is a pragmatism compared to the dream of theoretical presence of the humanities.

Theirs is a "deflationary" account of modelling. Within DH the models are themselves objects and have to be studied themselves.

Mark A. Finlayson: Computational Models of Narrative: Using Artificial Intelligence to Operationalize Russian Formalist and French Structuralist Theories

Finlayson talked about how Propp's functional theory of folktale makes a good theory of narrative in a text that he as a cognitive scientist can try to model. He talked about Propp's morphology where there are pairs of complications and replications. He came up with the functions with which tales can be built. Sack showed how a story could be coded

Finlayson wanted to see if a computer could learn Propp's morphology. Here are the steps:

You have narratives
The stories are annotated
Then you generate computer-readable representations
Then a learning algorithm
This can then be tested

Some think that Propp's morphology is too fuzzy, possibly incomplete. They tested whether annotators could be trained on Propp to mark up a text reliably: they would train people, have two of them annotate the same text, and then test for inter-annotator reliability. The answer is "yes" - you can train annotators to agree on Propp. He feels that this makes a strong statement about Propp's applicability.

Then they had to enrich texts using a tool developed for human annotation. The result is a large and deeply annotated narrative corpus. Then the question is whether Propp is learnable. Can they create an algorithm that can recognize Propp's functions from the enriched corpus? Their answer is yes and he walked us through how his algorithm works. For many types of functions they got good results.
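
To make the inter-annotator test concrete, here is a minimal sketch of one common agreement measure, Cohen's kappa, for two annotators. The talk did not say which measure was used, so kappa is my assumption, and the label sequences below are made up (the labels are names of Propp functions, but the data is not from the study).

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of items on which the annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six passages by two trained annotators.
annotator_1 = ["villainy", "departure", "struggle", "victory", "return", "departure"]
annotator_2 = ["villainy", "departure", "struggle", "return", "return", "departure"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Values near 1 indicate agreement well beyond chance; values near 0 indicate agreement no better than chance.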

He has a workshop on computational models in narrative: http://narrative.csail.mit.edu/cmn14/ - a workshop in Quebec City at the end of July for those interested.

ACH Meeting

I got to the ACH meeting late, but made it in time for the Job Slam, which is a fabulous event where people with jobs and people looking for jobs each have 30 seconds to pitch. This is a great way to get a sense of both the types of jobs coming open and the richness of the new scholars looking for opportunities.

Stéfan Sinclair, the incoming President, talked about conferences upcoming. The next DH is in Australia and then in Poland. Does the membership want the ACH to organize smaller meetings in other regions?

Glenn Roe: Discourses and Disciplines in the Enlightenment: Topic Modelling the French Encyclopédie

Roe gave a paper co-written with Clovis Gladstone and Robert Morrissey. He started by talking about how the Encyclopédie was organized. There was an alphabetical organization, a tree of knowledge, maps and so on. Is the Encyclopédie a map, tree or graph? He talked about the classes and how some classes are only used once.

They used algorithms to classify the articles and then compared their automatic classification to the original classification.
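
A small sketch of the comparison step, assuming both classifications are available as mappings from article to class. The article identifiers and class assignments below are placeholders, not data from the project, and the classifier itself is out of scope here.

```python
from collections import Counter


def compare_classifications(original, predicted):
    """Return the agreement rate and a count of (original, predicted) mismatches."""
    shared = original.keys() & predicted.keys()
    if not shared:
        raise ValueError("the two classifications share no articles")
    matches = sum(original[a] == predicted[a] for a in shared)
    mismatches = Counter(
        (original[a], predicted[a]) for a in shared if original[a] != predicted[a]
    )
    return matches / len(shared), mismatches


# Hypothetical labels: editors' class vs. the class assigned by an algorithm.
original = {"article_a": "Grammaire", "article_b": "Grammaire",
            "article_c": "Histoire naturelle", "article_d": "Droit naturel"}
predicted = {"article_a": "Grammaire", "article_b": "Droit naturel",
             "article_c": "Histoire naturelle", "article_d": "Droit naturel"}

agreement, mismatches = compare_classifications(original, predicted)
print(agreement)
print(mismatches.most_common())
```

The mismatch pairs are the interesting part: an article filed under one class by the editors but placed elsewhere by the algorithm is a candidate for closer reading.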

Then he talked about revisiting Pêcheux's notion of an "Automatic Discourse Analysis". Clovis then talked about topic modelling and gave examples of topics identified for articles. He gave examples of topics that don't make as much sense. They wanted to think about topics as discourse. Are there topics that aren't primary for classification, but which underlie a large number of articles? They found some examples like what they call "droit naturel". Articles, like some under Grammaire, seem to have droit naturel as a topic. This may have been because Diderot put problematic articles under innocuous headings like Grammaire. Clovis talked about how Diderot would weave contentious issues into seemingly unproblematic articles. Morality issues were woven into other articles.

I asked about the "renvois" and how they might be using them.

John Montague: Seeing the Trees & Understanding the Forest

Montague gave a talk that a bunch of us worked on. He asked how we can use visualization to understand things we didn't before. This was an adaptation of a question from Matt Jockers. You can see some of the prototypes he talked about at http://analytics.artsrn.ualberta.ca/viz/

He talked about how he surveyed a large number of visualizations and identified 5 types of visualizations:

Histograms
Word clouds
Scatter plots
Network graphs
Dendrograms

He talked about using the exploratory power of combining visualizations and using interactivity. We want clarity (does the reader get it) and understanding (do they learn from it.) Some of the features we can use include: colour, structural features, ornaments and others I didn't type in.

He showed some of the prototypes we developed like the dendrogram viewer and prototypes for new word clouds.

Trading Consequences: A Case Study of Combining Text Mining & Visualisation to Facilitate Document Exploration

This paper was by Hinrichs, Uta; Alex, Beatrice; Clifford, Jim; and Quigley, Aaron. Trading Consequences is a 2-year Digging Into Data project looking at large-scale commodity trading in the 19th century. They had more than 200,000 historical documents focused on Canadian natural resources.

They had to develop a lexicon of commodities. They had to deal with a lot of noisy data as they had poorly OCRed text. They mined their textbase to extract entities like commodities, dates, and locations. They grounded them to other resources. Then they looked for links between commodities, dates and places.
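
The pipeline described above (lexicon lookup, entity extraction, then linking commodities to places and dates) can be sketched very roughly as follows. The lexicon, gazetteer and sentences are invented, and this toy version ignores the OCR noise and the grounding to external resources that the project had to handle.

```python
import re
from collections import Counter
from itertools import product

COMMODITIES = {"timber", "cod", "fur", "wheat"}   # hypothetical lexicon
PLACES = {"Halifax", "Quebec", "Liverpool"}       # hypothetical gazetteer
YEAR = re.compile(r"\b18\d{2}\b")                 # 19th-century years only


def extract(sentence):
    """Return the commodities, places and years mentioned in one sentence."""
    words = re.findall(r"[A-Za-z]+", sentence)
    commodities = {w.lower() for w in words if w.lower() in COMMODITIES}
    places = {w for w in words if w in PLACES}
    years = set(YEAR.findall(sentence))
    return commodities, places, years


def link(sentences):
    """Count (commodity, place, year) combinations that co-occur in a sentence."""
    links = Counter()
    for sentence in sentences:
        commodities, places, years = extract(sentence)
        for triple in product(sorted(commodities), sorted(places), sorted(years)):
            links[triple] += 1
    return links


sentences = [
    "In 1843 timber was shipped from Quebec to Liverpool.",
    "Cod from Halifax fetched a poor price in 1851.",
    "The timber trade at Quebec revived in 1843.",
]
for triple, count in sorted(link(sentences).items()):
    print(triple, count)
```

Sentence-level co-occurrence is a crude stand-in for relation extraction, but it shows where noise enters: any OCR error in a commodity or place name silently drops the link.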

Then the question was how to explore the data. They provided search, but they wanted interlinked visualizations. She showed examples then of their linked viz tool. You can filter down in different ways.

They then talked about a workshop where historians tried the visualization. They liked the links, but were worried by the noise in the database. They liked the document list, but wanted to see some of the document. The current tool has performance issues that make a real difference to using it.

She talked about how noise in the data shows up. The noise in the visualization reminds us of the noise in the data. Crowdsourcing could help.

Another problem is how to show what is not in the database. Every dataset has its problems. Can the visualization show the problems or the limitations of the dataset, and not just lead to new insights?

They created a second prototype based on the feedback which was quite interesting. See their site.

Thursday, July 10th

Eide, Øyvind: Sequence, Tree and Graph at the Tip of Your Java Classes

Øyvind started off a session of long papers with a paper on the mismatch between the XML editing tools like Oxygen that allow us to mark up links and the network visualization tools. He has developed a tool called GeoModelText. It is developed by one developer, for one user, for one dataset, and for one screensize.

You have the linear text where the tags can be turned on and off. You have entities and spatial relationships and co-references between place names where there is more than one name for the same place.

As a text editor, you create a tree model of a text. As a programmer, a DOM object is just an object. Objects can link to whatever you want - you get a triple structure. This means you have graph objects, DOM objects and you can get text out. As a programmer he knows this, but how can it be made available to the text tool user who only sees the linear XML in a text editor?

How can we operationalise object manipulation for a tool user, rather than a programmer? Databases are not the solution. He wants to get back to the triple nature. He wants to let people move between formalisms.
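
A minimal sketch of the three views (tree, graph, sequence) over one encoded text. His tool is written in Java; this uses Python's standard `xml.etree.ElementTree` instead, and the element names, attributes and the `southOf` relation are my own inventions rather than anything from GeoModelText.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: place names with ids and a co-reference attribute.
XML = """<text>
  <p><place id="p1">Christiania</place> lies south of <place id="p2">Trondhjem</place>.
  <place id="p3" sameAs="p1">Oslo</place> is the later name.</p>
</text>"""

root = ET.fromstring(XML)

# Tree view: the place names are element objects inside the document tree.
places = {el.get("id"): el for el in root.iter("place")}

# Graph view: once elements are objects, they can be linked by triples.
triples = [
    (el.get("id"), "sameAs", el.get("sameAs"))
    for el in places.values()
    if el.get("sameAs")
]
triples.append(("p1", "southOf", "p2"))  # spatial relation added by hand

# Sequence view: the linear text with the tags turned off.
plain = " ".join("".join(root.itertext()).split())

print(plain)
for subject, predicate, obj in triples:
    print(places[subject].text, predicate, places[obj].text)
```

The point of the sketch is that all three views come from the same objects, so moving between formalisms is a matter of interface rather than of data conversion.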

Michael Sperberg-McQueen questioned the idea that a text is a sequence of characters. He made a very funny point that the notions of character (or the particular implementations we have) that we are working with were all decided by committees and committees rarely made decisions for scientific purposes.

Geoffrey Rockwell and Stéfan Sinclair: Towards an Archaeology of Text Analysis Tools

Next I presented a paper with Stéfan Sinclair. As I was giving the paper I couldn't take notes.

Stefan Jänicke: 5 Design Rules for Visualizing Text Variant Graphs

Stefan presented TRAViz (Text Re-use Alignment Visualization), a tool at http://taviz.vizcovery.org/ . He collaborated with Annette Geßner on this.

He talked about the introduction of the text variant graph. CollateX is a tool that aligns and visualizes variants producing just such a graph. They found the design is not appropriate to the information they have. TRAViz produces a better graph.

They developed some "design rules" for how graphs should be visualized that guided them in improving on the CollateX variant graphs.

Rule 1: Vary vertex label sizes
Rule 2: Abolish backward edges
Rule 3: Remove the edge labels - there can be large gaps between vertices (which makes it hard to read) - they draw differently coloured edges instead of putting labels
Rule 4: Bundle major edges - they bundle bunches of edges into cords
Rule 5: Insert line breaks so the graph isn't one long scrolling tape - users are not used to scrolling horizontally - so they break the flow
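
To get a feel for what a variant graph is and how Rule 1 might work, here is a toy sketch. It assumes the witnesses are already aligned column by column (with `None` for a gap), which is the hard part that tools like CollateX do; the sizing formula and the sample witnesses are mine, not TRAViz's.

```python
from collections import defaultdict


def build_variant_graph(aligned):
    """aligned maps witness id -> token list; all lists share a length, None is a gap."""
    if len({len(tokens) for tokens in aligned.values()}) != 1:
        raise ValueError("all witnesses must be aligned to the same length")
    vertices = defaultdict(set)  # (column, token) -> witnesses reading that token
    edges = defaultdict(set)     # (vertex, vertex) -> witnesses taking that path
    for witness, tokens in aligned.items():
        previous = None
        for column, token in enumerate(tokens):
            if token is None:
                continue
            vertex = (column, token)
            vertices[vertex].add(witness)
            if previous is not None:
                # Edges only run to later columns, so none point backward.
                edges[(previous, vertex)].add(witness)
            previous = vertex
    return vertices, edges


def label_size(vertex, vertices, base=10, step=4):
    """Rule 1 as a formula: more witnesses sharing a token, larger label."""
    return base + step * len(vertices[vertex])


aligned = {
    "A": ["the", "quick", "brown", "fox"],
    "B": ["the", None, "brown", "fox"],
    "C": ["the", "quick", "red", "fox"],
}
vertices, edges = build_variant_graph(aligned)
for vertex in sorted(vertices):
    print(vertex, sorted(vertices[vertex]), label_size(vertex, vertices))
```

The witness sets on the edges are what would drive the colouring in Rule 3 and the bundling in Rule 4.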

He then showed a screencast of the resulting interactive visualization. Then he showed different uses of the visualization.

What is Modeling and What is Not?

In the next session I was part of a panel on modelling organized by Joris Van Zundert with Fotis Jannidis, Ted Underwood, Mike Kestemont, Julia Flanders and Michael Sperberg-McQueen.

July 11th, 2014

Ève Paquette-Bigras: A vocabulary of the aesthetic experience for modern dance archives

Ève Paquette-Bigras presented research on dance archives that she worked on with Dominic Forest. Without good descriptions of dance archives people can't find them and if they can't find them, then it is as if they didn't exist. Therefore Forest and Paquette-Bigras are trying to build a vocabulary for describing dance archives. They then made an interesting move - they looked at the writings of Mallarmé whose descriptions of dance are exemplary. They text mined his collections Divagations (1897) and Poésies (1899). The mining found clusters with their characteristic words. These words could provide a vocabulary for describing dance archives. Some of the themes include: body, ideal, nature, nudity, purity, laughter, solitude ... (all in French). She gave an example of how they used their vocabulary to describe a contemporary dance work choreographed by Dave St-Pierre.

In conclusion the Mallarmé vocabulary "describes some experience somehow and something happening on stage beyond the story."

Hartmut Skerbisch – Envisioning association processes of a conceptual artist

Martina Semlak presented on a genetic edition she is developing of the artist's notebooks of Skerbisch. His notebooks go from 1968 to 2008 - lots of non-linear fragments (2100 pages.) It has been transcribed and encoded in TEI. The notebooks contain lots of drawings, text organized on page, newspaper clippings.

She gave a couple of examples each of which have Joyce quotations. She can make connections between actual art works by Skerbisch, the works he quoted, the verbal connections, music and so on. This is where the semantic enrichment can take place. A genetic edition tries to trace the development of a work. She is trying to trace the genesis of the ideas for art works in the notebooks through texts and sketches. A quote in the notebook is related to a book and to an artwork which is then related to an exhibition. She showed the RDF for enriching her edition. The RDF is stored in a triple store for querying.
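
The chain from quote to book to artwork to exhibition is the kind of path a triple store makes easy to query. Below is a plain-Python sketch with an in-memory set of triples standing in for the triple store; the identifiers and predicate names are hypothetical, and a real edition would use RDF and SPARQL rather than this hand-rolled matcher.

```python
# Hypothetical triples in the spirit of the edition's enrichment.
TRIPLES = {
    ("ex:quote1", "ex:quotedFrom", "ex:bookA"),
    ("ex:quote1", "ex:inspired", "ex:artwork1"),
    ("ex:artwork1", "ex:shownAt", "ex:exhibition1"),
    ("ex:quote2", "ex:quotedFrom", "ex:bookA"),
}


def match(pattern, triples=TRIPLES):
    """Yield triples matching a (subject, predicate, object) pattern; None is a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


def exhibitions_for_book(book):
    """Follow book <- quote -> artwork -> exhibition."""
    found = set()
    for quote, _, _ in match((None, "ex:quotedFrom", book)):
        for _, _, artwork in match((quote, "ex:inspired", None)):
            for _, _, exhibition in match((artwork, "ex:shownAt", None)):
                found.add(exhibition)
    return found


print(exhibitions_for_book("ex:bookA"))
```

Note that the second quote from the same book leads nowhere, which is itself informative in a genetic edition: not every reading left a trace in an artwork.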

For Skerbisch texts were "the departure point for my work, I accept the view that my work is a commentary"

Active Authentication through Psychometrics

John Noecker Jr. works at Juola and Associates (as in Patrick Juola) and started by talking about stylometry. "Every individual has a unique "language fingerprint"" - a stylome. Active Authentication is about verifying that a given person is who we think he is. We use passwords for this. The goal of active authentication is continuous ongoing verification that the person is still the person we think. This is useful if someone steps away from a computer.

They brought in 80 full time temps. They logged everything to see if they could track personality and maintain active authentication. This raised ethical considerations. They used Myers-Briggs Type Indicator and MIDAS (which is a measure of intelligence.) Their goal wasn't really to do psychological profiling so much as authentication.

They used JGAAP on the keylogger information to see if they could reliably profile a user with about 30 minutes of computer usage. They seem to have fairly good correlation between logging processing and the results of the formal MBTI and MIDAS tests.

They are combining this with other measurements to create a system for DARPA that could reliably, within a fairly short amount of time, authenticate someone. Can it also provide a profile?

I asked a question about the ethics as this has really interesting and spooky ethical implications. I asked why they were bothering with a psychological profile for authentication. Why couldn't they just use the activity to create a typing profile? Patrick Juola had an interesting answer which I don't have the time to record. We had a great conversation later about the ethics of doing research that could be misused.

Others had good questions about genre and type of activity. If someone gets up for a coffee, comes back and changes the type of document they are creating would that create a false impression that the person had changed?

Crowdsourcing Performing Arts History with NYPL's ENSEMBLE

Doug Reside presented work done with Ben Vershbow. OMNES is an ensemble and a world wide performing arts database. The arts aren't like textual disciplines where we have good archives of the texts. For performing arts a lot of the work is ephemeral and what can be archived are the ephemera like programmes, posters, and other documentation. At the NYPL they have a very large collection of artefacts. They serve as an analogue database of performance history. Alas it is hard to search this and much is falling apart. They started a crowdsourcing project, ENSEMBLE, that is a collaboration with Zooniverse. The paper discussed the prototype project. They scanned some microfilmed materials that are in the public domain. Other crowdsourcing projects had great participation, but this one had lower participation numbers. The issues might be:

It is still in beta
The task is complex and more time-consuming.
They didn't advertise it extensively

They plan to revise the game and give newbies the simpler tasks. They also plan to have multiple people do the same tasks so they can compare results and use reliability as a metric for deciding whether to put the data into the final database.
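
The idea of having several people do the same task and using agreement as the gate for the final database could look something like the sketch below. The thresholds, the normalisation and the sample transcriptions are all my assumptions; the talk did not give details of how reliability would be computed.

```python
from collections import Counter


def consolidate(transcriptions, min_votes=3, min_agreement=0.6):
    """Return the majority transcription if enough contributors agree, else None."""
    if len(transcriptions) < min_votes:
        return None  # not enough independent attempts yet
    # Ignore differences in case and spacing before comparing.
    normalised = [" ".join(t.split()).casefold() for t in transcriptions]
    value, count = Counter(normalised).most_common(1)[0]
    if count / len(normalised) >= min_agreement:
        return value
    return None  # too much disagreement: send the task out again


# Hypothetical transcriptions of one performer name from a programme.
print(consolidate(["Jane Doe", "jane  doe", "Jane Dow"]))
print(consolidate(["Jane Doe", "June Dee", "Jane Dow"]))
```

Items that come back as `None` are good candidates for routing to more experienced contributors, which fits the plan of giving newbies the simpler tasks.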

They hope that the final database could allow one to follow people's careers, especially those about whom we don't have biographies. They could begin to map performance.

They are now trying to build a consortium that is world wide.

Starting the Conversation: Literary Studies, Algorithmic Opacity, and Computer-Assisted Literary Insight

Aaron Plasek and David Hoover looked at claims about literary text analysis by Steve Ramsay. They looked at some of Ramsay's analytic results and showed how one can get very different results. Hoover argued that Ramsay doesn't find the interesting stuff because all he is interested in is a provocation. Hoover feels Ramsay doesn't care enough about the algorithm because it is provocation he is after. Looking at deformed text we are obscuring the algorithm that generated it. He doesn't explore the algorithms far enough. Can one keep the provocation while also caring more about the algorithm? Hoover and Plasek are interested in not just any deformance, but to do deformance which is literary studies. Much more attention needs to be paid to testing the algorithms. The algorithms are the beginning of a conversation, not the end.

We had a neat discussion of how to have a conversation about the algorithms. Plasek made a good point about looking at code aesthetically, even if I don't think our community is ready for that. Hoover made the point that one can have a conversation through the topic or assertions. Thus we can discuss the algorithms through the claims made from the algorithms. In some ways this paper was proof of that point - they believe that Ramsay was wrong about

Elisabeth Losh: Visualizing Global News

Losh collaborated with Lev Manovich on this project. They are interested in news because it is an important draft of history and it shapes political engagement. What does it mean to know the facts? What can they do with news feeds:

Shot tracking and trying to study visual narrative
Look at differences between news from different cultures

The problem is that the digital humanities are not really part of the research into image and video recognition (computer vision).

We have all sorts of problems:

Massive scale - the news itself mixes media, which complicates the phenomenon
Linguistic diversity - even in Switzerland there are four different languages with different gestures
Heterogeneity of provenance - commercial shot footage, video news releases, anchor shots, field shots, b-rolls, and government public relations materials, witness journalism - all of this is remixed without recognition of the provenance. We are looking at database cinema.
Copyright and IP - it is illegal in some countries to report certain things (Nazi stuff in Germany).
How to build interfaces for this - there are interesting different ways to represent the patterns. There are different experiments.
Limits to algorithms - computer vision can do some things, but not others. It can recognize cat videos, but not french fries.

Remix culture affects our news! The news industry is using all sorts of media strategies and experiments constantly.

She had an interesting example of how Obama presents himself and how he gets remixed. What do different stations do to contextualize Obama? Concept detection has gotten fairly good (American flags can be recognized and they show up a lot in Obama videos). Obama's weekly video speeches are used all around the world. There is a history to these speeches. She showed all sorts of things they can do with this video:

They can compare shots before and after news outlets remix them
We can recognize objects in the stream
We can look at shot narrative
We can do audio analysis
We can look at responses to the video

She argues that this work can address the media studies / DH divide. I asked about how to reach out to media studies. She suggested that ideas like database culture and remix culture can be shared. Another participant suggested that media studies is very wide and is undergoing shifts, so it should be possible to find folk. They talked about a lot of "turns", like the turn from thinking of media as instrumental to looking at the ecology of media.

I wonder if media studies is struggling with how to recognize new forms of academic response the way we are. Are media studies folk creating media works as academic research?

Jane Hunter: Extracting Relationships from an Online Digital Archive about Post-War Queensland Architecture

Hunter collaborated with John Macarthur, Deborah Van der Plaat and a number of others on this Digital Archive of Queensland Architecture. The project also works with a number of archives. They developed new tools to extract the tacit knowledge from oral histories and to link it to tangible knowledge. Their methodology:

Conduct interviews and public forums - they focused on themes
They recorded, transcribed, and aligned recordings to the transcriptions
Developed an ontology
Developed tagging and annotation services
Text processing of transcripts, articles, books - and named entity recognition of people, firms ...
Generate interactive objects and visualizations

Their system was based on Omeka. They used N3 reasoning for the analysis of the assertions made. They now have a SPARQL interface to a Sesame backend. They made some really neat tools for researchers to create tools with the data. You can create your own maps, timelines, word clouds,
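To give a feel for what querying assertions in a triple store looks like, here is a toy pattern matcher in plain Python. It is only a sketch: the triples, the predicate names, and the functions `match` and `colleagues` are placeholders I made up, not data or code from the archive. A real setup would send queries to the triple store itself.

```python
# Hypothetical triples; names and predicates are placeholders, not archive data.
TRIPLES = {
    ("architect_a", "workedFor", "firm_x"),
    ("architect_b", "workedFor", "firm_x"),
    ("architect_a", "designed", "building_1"),
}

def match(pattern, triples=TRIPLES):
    """Return sorted triples matching (subject, predicate, object); None is a wildcard."""
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

def colleagues(person, triples=TRIPLES):
    """People recorded as working for the same firm as `person`."""
    firms = {o for _, _, o in match((person, "workedFor", None), triples)}
    people = {s for f in firms for s, _, _ in match((None, "workedFor", f), triples)}
    return sorted(people - {person})

shared = colleagues("architect_a")
```

Chaining simple patterns like this is one way a relationship that nobody stated directly (two people sharing a firm) can be read off the stated assertions.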

They allow people to connect things. You can drag and drop objects from different databases and link them to create a compound object graph that can be saved. One can also share things in a Gallery. I didn't really get this, but it seemed very cool.

They worked with architects to identify what they wanted to get out of the database. The architects are interested in influence. They are trying to figure out how they can infer influence.

Another issue is how they can get the general public and the architectural community engaged with the site. They have an exhibition that just started called "Hot Modernism" to try to bring in the public.

ADHO and centerNet Annual Meeting

Neil Fraistat opened the annual meeting. We had a short panel of speakers on Global DH. From Russia we had Inna Kizhner of the Siberian Federal University, who talked about the Russian National Corpus. Then Sinai Rusinek talked about DH in Israel. She talked about the Hebrew Historical Lexicon, which was one of the first projects. There is also the Genizah Project, among others, and there is really advanced computational linguistics. She talked about the Digital Humanities Israel project, which is doing all sorts of community-building things in Israel. They now have space, people, and money. Then Alex Gil from Go::DH talked about Caribbean DH. The history of Caribbean DH is interesting, starting with poet Kamau Brathwaite using a dot-matrix printer for his poetry. The Digital Library of the Caribbean project is a big collaboration. They have had THATCamp Caribe, which was a great success.

Then we had the centerNet report. centerNet is now legally registered. There is a call for proposals for Day of DH. There are neat new initiatives. Ryan Cordell introduced DH Commons. They are calling for project statements. They had a project slam to get things going.

Macro-Etymological Analyzer - http://jonreeve.com/dev/etym/etym.php
Expanding the Republic of Letters: India and the Circulation of Ideas in the 18th Century
Kate Jones: BombSight - http://www.bombsight.org - she is trying to map where bombs fell on London and connect them to stories. They have an augmented reality app too.
White Violence, Black Resistance - Amy Earhart and Toniesha Taylor - they are trying to figure out how to work across divides and create a more diverse canon. Project is pedagogically centered.
Photogrammar - http://photogrammar.yale.edu - Tilton and Leonard project organizing photos from US Farm Security Administration.
Brian Croxall: Archive of Belfast Street Poets - lots of letters are sealed so they are taking metadata to study the BSP.

Sukanta Chaudhuri: Tagore and Beyond: Looking at the Large Literary Database

He talked about the Online Tagore Variorum. Tagore wrote in Bengali, a major language which is not a world language. Tagore wrote a lot in both Bengali and English, and there are many variants of many of the documents. A full-scale digital variorum is thus a major undertaking. What is even more impressive is that they did it in two years with an Indian team of mostly humanists trained in India. They were bricoleurs, not engineers.

He then talked about the issues of OCR. Bengali has a syllabic alphabet like Tamil. This makes it hard to do OCR. They ended up transcribing in plain text with no markup. They used some simple characters available on the keyboard like [{( to indicate things like deletions and so on. That gives them a strange e-text that needs to be filtered.
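The kind of filtering such an e-text needs might look something like this. I'm guessing at the conventions here: the talk didn't say which bracket marks what, so treating `[...]` as a deletion and `{...}` as an insertion is purely my assumption, as are the function names and the sample line.

```python
import re

# Assumed conventions for illustration only; the notes do not record which
# bracket the project used for which kind of revision.
DELETION = re.compile(r"\[([^\]]*)\]")   # [text] = struck out by the author
INSERTION = re.compile(r"\{([^}]*)\}")   # {text} = added by the author

def final_reading(line):
    """Drop deletions, keep insertions, and tidy the spacing."""
    line = DELETION.sub("", line)
    line = INSERTION.sub(r"\1", line)
    return " ".join(line.split())

def first_draft(line):
    """Keep deleted words, drop insertions: the text before revision."""
    line = INSERTION.sub("", line)
    line = DELETION.sub(r"\1", line)
    return " ".join(line.split())

sample = "the [old] {new} house by the river"
revised = final_reading(sample)
original = first_draft(sample)
```

The same plain-text transcription can then yield both a clean reading text and the earlier state, which is presumably what makes the lightweight bracket scheme workable without full markup.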

He argued that all text processing relies on search and collation. He showed some collation tools. He argued that variation is rich in text and that advances in big data of textual material
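For what it's worth, a bare-bones word-level collation of two versions can be sketched with Python's standard `difflib`. This is not one of the tools he showed; the function name `collate` and the two sample witnesses are invented for illustration.

```python
from difflib import SequenceMatcher

def collate(witness_a, witness_b):
    """List the word-level variants between two versions of a passage."""
    a, b = witness_a.split(), witness_b.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record the kind of change and the differing words on each side.
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants

# Hypothetical witnesses of the same line.
manuscript = "the bird sings at dawn"
printed = "the bird sang at early dawn"
variants = collate(manuscript, printed)
```

Each tuple names the kind of change (replace, insert, delete) and the words involved in each witness; a variorum would need to do this across many versions at once, not just two.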

Can we look at every text or word as a unique work while doing big data? Can we reconcile the idiographic and big data? Can we contextualize every word? I'm not sure I understood what he was saying at this point. He talked about data mining helping us study language across time and space. He talked about a hyperconcordance and hypercollation.

He talked about how the digital humanities forces us to talk about the humanities as much as about the digital.

Closing Stuff

Marie Saldana got the Fortier Prize for her paper "An Integrated Approach to the Procedural Modeling of Ancient Cities and Buildings." See http://dharchive.org/paper/DH2014/Paper-322.xml

Paul Arthur talked about the DH 2015 conference to be held at the University of Western Sydney next year. We heard about the conference in 2016 in Poland which looks beautiful. Do Krakowa!

We ended with seeing our faces. The logo of the conference was the network graph and the ending image was smiling faces ... ours. Nice touch. See http://dh2014.org/affiliated-events/photos/photos-thu/

DH 2014