
Digital Cultures Big Data And Society

These are my notes on the Digital Cultures, Big Data and Society conference organized by Emilie Pine and others at UCD.

As with all of my conference notes, these are written during the talks so they may have errors and omissions. Sometimes my battery runs low or I need to focus on the talk.

Day 1 at the Royal Irish Academy

Panel One: Digital Books

S.J. Ker (Maynooth University): Enhanced not "Distant": Examining Independence in the Novels of Austen, Edgeworth and Owenson.

Ker started by talking about the never-ending character of interpretation. The challenge for the interpreter is to make things strange. In her case she wants to do this by moving to a corpus level of interpretation.

She also proposed a greater openness to texts, tools, and interpretation that she called "enhanced" reading, rather than close or distant. Too many set close and distant reading in opposition. Close reading has operated not as a method, but as the very definition of interpretation. How then can one understand distant reading, which defines itself in opposition? Ker asks how we can combine both.

Ker's case study uses a medium-sized corpus of 28 novels that explore independence - conduct, education, and rank. She then showed some fascinating visualizations of conduct, education and rank across her authors. Education of an appropriate nature underlies all three authors. We can see that in collocations for education. There is a clear message that preparing women only for decoration has dangerous consequences.

She showed a neat word-embedding visualization exploring Austen. She talked about how her authors were presenting a world different from the world of aristocracy.

She closed by talking about spatial concepts/metaphors as used to interpret literature. She talked about maps as an analogue for close and distant reading. You can zoom in and out, switch to satellite view (or street view), and overlay other information. She talked about how important it is to share techniques in order to not make them invisible. Literary studies often pretend that only close reading of texts matters, which is why we need to make visible the other approaches.

Marie-Louise Coolahan (NUIG): At-scale Questions: Patterns in the Reception and Circulation of Early Modern Women's Writing

Coolahan directs the RECIRC (Reception and Circulation of Early Modern Women's Writing) project. They have four work packages:

  • Transnational religious networks (i.e. nuns)
  • Scientific correspondence networks
  • The manuscript miscellany - a bound volume with a mix of texts
  • Transmission trails and book owners

She showed the entry forms for tracking reception. They had researchers going around the world finding stuff and entering it online. They are now in data cleaning hell.

She talked about the traditional model of the humanities and the use of a database to enable teamwork that could be made visible on a larger scale.

She showed some preliminary visualizations of results and how they are throwing up questions. They are asking if their results may be due to what they were looking for. She showed a neat heat map of where the reception was taking place.

The work on the nuns is trying network analysis. They have studied a collection of letters about one particular conference of nuns. She had another neat visualization of reception of three major 17th century women authors.

Justin Tonra (NUIG): Quantitative Bibliography and the Digital Book Trades

Tonra talked about catalogues of books printed between 1473 and 1800 and the challenge of using this sort of bibliographic data to study book history. There is a spectrum of resources being put together: some are primarily resources, others are works of intellectual history. Many in the humanities are suspicious of this sort of enumerative (quantitative) work. The increasing emphasis on the digital humanities upholds the work of book history, which often takes digital form.

Why do we need to defend quantitative bibliography? Because of the dangers of statistics, of which humanists are often suspicious. Statistics are often lies that show important truths. We need quantitative databases to make sure that our conclusions about book history are not biased.

He talked about Pollard's work of synthesis about the Dublin book trade. They are now trying to digitize Pollard's dictionary. The challenge is capturing the discursive elements.

He talked about using linked open data for datafication of these bibliographic records. We need ontologies that let us crosswalk different datasets. None of this works unless standardized and open. The large scale open work can support small scale close reading.


There was a discussion about the archiving of the data of all these neat projects. Who in Ireland will store these projects?

Panel Two: Digital Witnesses: Industrial Memories

Emilie Pine, the Director, introduced the Industrial Memories project and the speakers involved in the project. This project is looking at the abuse of children in institutions in Ireland. There was a Commission to Inquire into Child Abuse (Ryan Report 2009) that produced a lot of data that needs to be read. They are digitizing the report in different ways.

"There are still witnesses who never encountered an auidence capable o listening to them or hearing what they have to say..." Paul Ricouer

The project wants to act as a witness and listen to the report.

  • What form is the narrative taking?
  • How do we understand the story being told?
  • How can we tell potentially different stories?

There are other reports and responses on such industrial schools from reports on other national scandals (like the TRC in Canada) to dramatic responses.

They are thinking about the digitization not just as a memory project, but also as an artistic intervention.

She then talked about the problems with the existing PDF/narrative report. By turning it into a database they could do systematic analysis.

Then Mark Keane talked about "Analyzing the Ryan Report: Methodologies." Keane was hoping they could analyze the different voices, but found it is mostly in one voice, i.e. that of the reporter. So they resorted to basic methods like providing a search, counting words, classification, and some mining of associations.

He talked about the particular problem of tracking the anonymized abusers through the narrative. They could search for words about transfer, but a thesaurus wouldn't identify church words like "dispensation" (for sacking). They created histories for a number of these abusers and then looked at what was happening generally using social network analysis.

He then talked about the technologies they used. They used Google's Word2Vec technique to build domain vocabularies from vector spaces. He also talked about using paragraph classifiers as a way of finding paragraphs about transfer. Lastly he talked about association rule analysis.
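Word2Vec itself trains a small neural network; as a minimal stdlib sketch of the underlying distributional idea (words with similar contexts get similar vectors, so a seed term can be expanded into a domain vocabulary), here is a count-based approximation. The toy corpus and seed word are invented for illustration, not from the project's data.

```python
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Build simple count-based context vectors, one per word."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest(word, vectors, n=3):
    """Words whose contexts are most similar to `word`'s."""
    scores = [(cosine(vectors[word], vec), w)
              for w, vec in vectors.items() if w != word]
    return [w for _, w in sorted(scores, reverse=True)[:n]]

# Invented toy sentences with institutional vocabulary.
corpus = [
    ["the", "brother", "was", "transferred", "to", "another", "school"],
    ["the", "brother", "was", "moved", "to", "another", "institution"],
    ["the", "boy", "was", "sent", "to", "the", "school"],
]
vectors = cooccurrence_vectors(corpus)
print(nearest("transferred", vectors))  # "moved" ranks first
```

On a real corpus one would use Word2Vec proper (e.g. via gensim), which scales far better; the ranking logic is the same.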

He talked about the toxicity of the data and dealing with it.

Next, Susan Leavy talked about developing the project web site. They built a site that lets one explore things like themes gathered by text mining. For example, classifiers were used to find paragraphs about "Abuse Events". By preprocessing the data they could provide a far more usable database.

Tom Lane talked about the locative audio app they created for people who want to explore the actual schools. You have to literally go to the sites if you want to hear the dramatized readings. While the audio plays there are different graphics with overlay materials.

John Buckley from the IADT (Institute for Art, Design and Technology) talked about the "I.S. Complex" which is part of the IADT campus and was an industrial school run by the Christian Brothers before becoming an art school. They are exploring the history of their buildings that have since been renovated.

The Christian Brothers in 1954 dispersed the residents and turned the site into an industrial training school.

They are creating a VR reconstruction of the original site to juxtapose past and present.

Emilie Pine ended with the challenge of how to "allow the knowledge of past atrocity to touch us without paralyzing us?" What ways are there to approach this history? She ended with the wise observation that "The digital is not just a new way of reading, but is also a new way of witnessing!"

Keynote: Professor Alison Booth (University of Virginia): Biographical Networks, Gendered Types, and the Challenge of Mid-Range Reading

Margaret Kelleher introduced Booth and the importance of her work on the prosopography of women. She commented on how Booth provoked us to think about a feminist digital humanities and how we needed new tools.

Booth talked about the challenges of the typological gendered history. She also talked about how hard it is to hold together the critical work and the digital work.

She talked about how we set up English women as the most advanced women because the English are the most advanced. Yet the historical women themselves may refuse to be typecast and enrolled in social projects. Our prosopographies need metadata, but that metadata is structured by power relations. How do we get more inclusive corpora that recover forgotten and marginalized peoples?

She talked about Recovery 3.0 examples like Colored Conventions, Orlando, Women Writers Online. She hopes to bring together many of these projects through linked data.

She then talked about the history of the Collective Biographies of Women project which "burst out of the book." It is not just about women writers.

Many women's projects construct categories like national categories (the American woman vs the French woman). Kelleher has written about these constructions. What is striking is the national character of many of these projects. If you stick to English sources you get certain national constructions.

Booth asked "Is assigning types, names and nations ethical?" Famous Irish women are now being reclaimed as belonging to other nationalities. Their irishness is associated in London with certain attributes like "celtic passion." Collections tell us about the assumptions of typology of their time. What will be discovered about the CBW?

Nation building and public humanities motivated many of the collections that have been put together over time. Booth then talked about Lola Montez who is associated with Ireland. Montez had all sorts of stories told about her which she encouraged.

Booth talked about how they are joining CBW and SNAC so that people can go from one to the other. CBW has accomplished getting more women into SNAC which is mostly male.

Now they are focusing on two cohorts of women, African-Americans and writers. Booth is planning collaboration with Susan Brown, Laura Mandell, and Julia Flanders to link their data.

She then talked about "strategic typologies." You can't just add women to the record and stir. A truly rich social history will be more. We need to ask who is a woman and how sure are we of our type casting. She then talked about BESS markup used for team curated reading of events. This will let one pull out events in the different biographical collections associated with different women. This will allow mid-range reading where the text analysis is combined with the precision of close reading.

Day 2 at UCD

Panel Three: Digital Tools in the Humanities

Sandrine Peraldi (UCD): Integrating corpus-based tools in translation practices: methodological and professional implications

Peraldi gave a fascinating paper explaining the impact of computing on translation practice. She ran a study looking at why translators, though using computing extensively, were not using corpora.

Mark Stansbury and David Kelly (NUIG): The future of collaboration

Stansbury and Kelly showed and talked about a neat classics project. They talked about the collaboration between a classicist and technical person.

Padraig McCarron (Maynooth University): 1916 Degrees of Separation

McCarron talked about social network analysis of a corpus of letters. His title had "191" under erasure to pun on six degrees of separation.

Panel Four: Humanities in the Digital Age

Michelle Doran (TCD): The Two Cultures in the Digital Age

Doran is obsessed with the question of what is the digital humanities. Definitions often emphasize diversity and movement. She did a project looking at DH laboratories. What contributed to their proliferation? What are the underlying epistemological perspectives? She quoted Svensson on laboratory spaces.

For her MA she did a study of labs, but not maker spaces or virtual labs. It was hard to code the labs for study due to the instability or fluidity of the landscape. She talked about whether a lab that doesn't identify with DH can still be called a DH lab.

  • DH lab as an activist venue in an academic setting
  • DH lab as an institutional structure for collaboration
  • DH lab as a model for cyberinfrastructure

She then returned to Snow's two cultures debate. The culture of the humanities is now deeply woven with science practices, at least in DH labs. But there are dangers, like the Trojan horse problem.

Victoria Garnett (TCD): Humanities and Linguistic data in the Digital Age

Garnett looked at perceptions of research practices. How do we talk about analogue and digital humanists?

She was part of the Europeana Cloud project which ran from 2013 to 2016. They were trying to find out what the problems were with Europeana and propose solutions. They did case study work.

She then talked about the SPARKLE projects that tried to understand what humanists actually do. Now she is working on the PARTHENOS project that brings large EU infrastructure projects together to figure out user requirements, data policies, standards, the creation of a platform for research, training, and practices of research communication. PARTHENOS is looking at acculturation in training beyond just courses.

She showed an example animation with Mork and Tork.

Keynote: Thinking-Through Big Data in the Humanities

I gave the afternoon keynote. In it I argued that the humanities had much to offer to big data. I touched on three ways we should contribute:

  • First, I outlined how the humanities have a history of dealing with big data. As we all know, ideas have histories, and we in the humanities know how to learn from the genesis of these ideas.
  • Second, I illustrated how we can contribute by learning to read the new genres of documents and tools that characterize big data discourse.
  • And lastly, I turned to the ethics of big data research, especially as it concerns us as we are tempted by the treasures at hand.

Update: they have now posted a podcast of my talk.

Panel Five: Literature, Society and Cultural Analytics

Derek Greene (UCD): Constructing Social Networks of Irish and British Fiction, 1800-1922

Greene talked about social network analysis of a collection of novels. He talked about the history of the method and its genealogy. Their corpus is 200 or so novels that were selected by an expert committee. They are trying to get a balance.

They had to hand annotate the mentions of characters because of entities like "second daughter." They then created a list of characters, and for each character they have a dictionary with attributes. They build character networks for each chapter.
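The chapter-level network construction can be sketched very simply: once mentions are annotated, two characters get an edge weighted by the number of chapters in which both appear. This is a minimal sketch with invented mention lists, not the project's actual pipeline.

```python
from collections import Counter
from itertools import combinations

def character_network(chapters):
    """Weight the edge between two characters by the number of
    chapters in which both are mentioned."""
    edges = Counter()
    for mentions in chapters:
        # sorted() gives a canonical order so (a, b) == (b, a)
        for a, b in combinations(sorted(set(mentions)), 2):
            edges[(a, b)] += 1
    return edges

# Invented per-chapter mention lists for illustration.
chapters = [
    ["Elizabeth", "Darcy", "Jane"],
    ["Elizabeth", "Darcy"],
    ["Jane", "Bingley"],
]
net = character_network(chapters)
print(net[("Darcy", "Elizabeth")])  # co-occur in 2 chapters
```

From the weighted edge list one can compute the usual network measures (centrality, communities) or feed it to a visualization tool.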

He then showed visualizations of the structure of these networks, including word clouds of attributes.

There are two web sites for the project.

Karen Wade (UCD): Reconsidering the fictional letter in 19th century novels

Wade talked about a project on fictional letters. She went through a collection of novels and extracted the letters in the novels. That let her look at the letter as a genre of writing as it was represented. In 1840 there was postal liberalization that made sending mail affordable to the middle classes. Was that reflected in the fictional letters? It seems that after 1840 the letters get shorter.
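The before/after-1840 comparison boils down to grouping letter lengths by publication period and comparing means. A minimal sketch with invented data (the records below are not Wade's):

```python
from statistics import mean

# Invented records: (publication year, letter length in words).
letters = [(1820, 410), (1835, 380), (1845, 210), (1860, 190), (1870, 250)]

before = [n for year, n in letters if year < 1840]
after = [n for year, n in letters if year >= 1840]
print(mean(before), mean(after))  # shorter letters after 1840 in this toy data
```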

Francesca Benatti (Open University): Style and authorship in the Edinburgh Review

Benatti talked about a project in which she is collaborating with David King from computer science. They are trying to figure out who the authors of the Edinburgh Review were, since at the time they were hidden. She talked about the challenges of digitizing their corpus.

They want to find the style of the authors, so they eliminated quotations. They then needed to find features to track; n-grams seemed to work. They are using the Delta method developed by Burrows.
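Burrows' Delta compares texts by the z-scores of their relative frequencies for the corpus's most frequent words: Delta(a, b) is the mean absolute difference of those z-scores. Here is a minimal stdlib sketch; the three toy texts and the feature-word list are invented for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

def relative_freqs(tokens, features):
    """Relative frequency of each feature word in one text."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in features]

def burrows_delta(texts, features):
    """Pairwise Burrows' Delta: mean |z_a(w) - z_b(w)| over the
    feature words, with z-scores taken across the whole corpus."""
    profiles = [relative_freqs(t, features) for t in texts]
    zscores = []
    for col in zip(*profiles):          # one column per feature word
        mu, sd = mean(col), pstdev(col)
        zscores.append([(v - mu) / sd if sd else 0.0 for v in col])
    z = list(zip(*zscores))             # one z-vector per text
    n = len(texts)
    return {(i, j): mean(abs(a - b) for a, b in zip(z[i], z[j]))
            for i in range(n) for j in range(i + 1, n)}

# Invented toy texts; real use takes the top ~150 most frequent words.
texts = ["the the the of and".split(),
         "the the of of and and".split(),
         "the the the the of and".split()]
features = ["the", "of", "and"]
d = burrows_delta(texts, features)
print(min(d, key=d.get))  # the stylistically closest pair
```

Lower Delta means more similar style, so an anonymous article is attributed to the candidate author whose known texts yield the smallest Delta.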

General Thoughts

We had a good conversation about self-identifying as a digital humanist. I was struck by the ongoing concerns people have about representing themselves as digital humanists. What does it say about inclusiveness? Can one be a not very technical digital humanist?

Part of the problem is that on technical subjects some people don't want to claim expertise they don't feel they have. This is true not just of digital humanities but of fields like philosophy. Who feels comfortable claiming expertise in Kant, Derrida, Heidegger?

Emilie Pine wrapped up the conference. Thank you Emilie Pine for such a gem of a conference. The Industrial Memories project will have their big coming out party in a week or so.




Page last modified on March 22, 2018, at 02:26 PM