Here are my notes from a couple of days at the European Summer University in Digital Humanities. The schedule is at:

http://www.culingtec.uni-leipzig.de/ESU_C_T/node/728

These notes are being written live and will therefore be full of errors and omissions. Forgive me and send me corrections.

Projects

Here are some of the projects that were presented on Tuesday the 20th:

Konstantin Freybe: READ: Recognition and Enrichment of Archive Data

Konstantin Freybe works for a music institute in Leipzig and talked about the project "READ: Recognition and Enrichment of Archive Data". His slides are at http://slides.com/kfreybe/deck/live

READ has a large grant from the EU and is led from the University of Innsbruck. Their mission is to increase accessibility of archival documents and revolutionize the recognition of manuscripts. Their user groups are the archives that have documents and the scholars that want access. They aim to have 3 million page images accessible with thousands of pages of documentation.

Konstantin talked about his research on the snippets inside musical instruments. He is developing a workflow to identify the documents. The idea is to digitize and then use machine learning to

He talked about Transkribus, a web API for trainable software. Once trained, the software should be able to recognize who wrote something. ScanREAD is a second tool that helps with the digitization of documents. It lets you use a smartphone to scan something and then upload it for processing. They also have an e-Learning-App that is designed to help users decipher historical documents.

See their web sites:

READ platform: http://transkribus.eu/
READ homepage: http://read.transkribus.eu/
User Guide: http://transkribus.eu/wiki/

Dinara Gagarina: System of (History-Oriented Information) Systems

Dinara Gagarina presented on "System of (History-Oriented Information) Systems." She is from a new DH institute that specializes in the history of science at Perm State University in Russia. See http://digitalhistory.ru. What is interesting is that they deal with the history of systems.

They have a catalogue of historical information systems.
They have old newspapers at http://permnewspapers.ru
They have information about ethnic units of the Russian Army in World War 1.
They have a parliamentary history of pre-revolutionary Russia.

What are history-oriented information systems? They are systems developed to store, organize and provide access to historical information along with analytical processing. They seem to have a database of such systems. If I understood her, they are documenting old databases. This gives researchers a tool for finding resources.

Randa El Khatib: TopoText 2.0: Prototyping Modes of Interactive Mapping

Randa El Khatib gave a paper about how they are trying to create better mapping tools. The first version, TopoText 1.0, had an interface for location detection. It used the Stanford NER and connected to Google Maps. It also provided a word cloud of the collocates for a particular location.

TopoText 1.0 was limited in that it couldn't support an ongoing, collaborative mapping project. It wasn't connected to a gazetteer and it was a bit of a "black box." The tool also hogged the data and forced the researchers to work within a specific platform. A lot of tools don't let you export data - they lock you in.

She talked about how the web should let us collaborate at a distance, and about social knowledge creation. The annotations in TT 2.0 can be exported and shared. It also works with the GeoNames API and uses Leaflet rather than Google Maps Engine. It exports to open CSV documents. You can also share the annotated maps with others. I think this is the URL for the project: https://github.com/mohamadjaber/topotext

This project grew out of a pedagogical experiment between digital humanities and computer science.

Geoffrey Rockwell: Replication as a way of knowing in the digital humanities

I gave a paper talking about our experiments in replicating important historical methods. Here is the abstract:

Much new knowledge in the digital humanities comes from the practices of encoding and programming, not through discourse. These practices can be considered forms of modelling in the active sense of making by modelling or, as I like to call them, practices of thinking-through. Alas, these practices and the associated ways of knowing are not captured or communicated very well through the usual academic forms of publication which come out of discursive knowledge traditions. In this talk I will argue for “replication” as a way of thinking-through the making of code. I will give examples and conclude by arguing that such thinking-through replication is critical to the digital literacy needed in the age of big data and algorithms.

The two IPython notebooks that I talked about are here:

• Mendenhall’s Characteristic Curve: https://github.com/sgsinclair/epistemologica/blob/master/Mendenhall-CharacteristicCurve.ipynb
• John B. Smith and Imagery: https://github.com/sgsinclair/epistemologica/blob/master/Smith-Imagery.ipynb
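To give a flavour of what replicating Mendenhall involves, here is a small Python sketch of a word-length "characteristic curve": the number of words of each length per thousand words of text. This is not the code from the notebook above. The function name, the cap on word length and the tiny sample string are assumptions made for illustration.

```python
import re
from collections import Counter

def characteristic_curve(text, max_len=15):
    """Return words-per-thousand for word lengths 1..max_len.

    Hypothetical helper; words longer than max_len are counted in the last bin.
    """
    words = re.findall(r"[A-Za-z]+", text)
    total = len(words)
    if total == 0:
        return [0.0] * max_len
    lengths = Counter(min(len(w), max_len) for w in words)
    return [1000 * lengths[n] / total for n in range(1, max_len + 1)]

# Toy input: one 1-letter word, two 2-letter words, one 3-letter word.
curve = characteristic_curve("a bb bb ccc")
print(curve[:3])
```

Plotting such curves for two authors on the same axes is the kind of comparison Mendenhall had in mind, though a real replication would need texts of many thousands of words.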

Workshop: Stylometry

Maciej Eder was running a workshop on stylometry for the summer school. He has a nice site about computational stylistics at: https://sites.google.com/site/computationalstylistics/

In the morning he gave us a historical overview that started with Valla and the Donation of Constantine. He talked about William Benjamin Smith (aka Conrad Mascol, 1850-1934) and his work on the authorship of the Pauline Epistles. He also talked about Wincenty Lutoslawski (1863-1954) doing work on the chronology of Plato's dialogues. Lutoslawski was the first to use the term "stylometrie".

Eder then defined stylistics. He talked about the idea of an authorial fingerprint being:

undiscoverable with the naked eye
beyond authorial control
resistant to imitation, plagiarism and parody

The popular solution is to look at frequencies of the most frequent words.
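A minimal Python sketch of that idea, assuming nothing about the tools used in the workshop: pick the n most frequent words across a small corpus, then describe each text by the relative frequency of each of those words. The function names, the tokenizer and the two toy texts are all made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def mfw_profiles(texts, n=100):
    """Return (most frequent words, {name: relative frequencies of those words}).

    `texts` maps a label to a raw text. Hypothetical helper for illustration.
    """
    tokenized = {name: tokenize(t) for name, t in texts.items()}
    corpus_counts = Counter(tok for toks in tokenized.values() for tok in toks)
    mfw = [word for word, _ in corpus_counts.most_common(n)]
    profiles = {}
    for name, toks in tokenized.items():
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero for an empty text
        profiles[name] = [counts[w] / total for w in mfw]
    return mfw, profiles

# Toy corpus; real studies use whole books and a hundred or more words.
texts = {
    "a": "the cat and the dog",
    "b": "the bird and the fish and the tree",
}
mfw, profiles = mfw_profiles(texts, n=2)
print(mfw, profiles)
```

Each text becomes a vector of numbers, and it is the distances between such vectors that stylometric methods go on to compare.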

He talked about signals in a collection of texts that one wants to tease out. These include:

Noise
Literary quality
Chronology
Translation
Genre
Education
Gender
Authorship
Literary tradition

It is easy to mix up these different signals.

In the later sessions he took us through an excellent hands-on workshop on using Gephi for network analysis of stylistic results.

Small Garden Allotment Museum

In the afternoon we took a canal boat trip and then visited the Small Garden Allotment museum: http://kleingarten-museum.de/english_guide