Modelling Cultural Processes

These are my live notes from an invitational workshop about modelling cultural processes at the NII Shonan Village Center in Japan. This is a research retreat centre modelled on Schloss Dagstuhl in Germany. It was organized by Mits Inaba, Martin Roth, and Gerhard Heyer from Leipzig and Ritsumeikan. You can see the description here.

Day 1

The first day of the workshop coincides with the anniversary of the 2011 Tōhoku earthquake and tsunami.

Gerhard Heyer opened the workshop by introducing the theme of modelling cultural processes. Cultural processes have certain features:

  • High degree of parallel processes
  • Unpredictable
  • Feedback loops

Some of the questions:

  • What is the theoretical basis?
  • What data do we need? How can we access it?
  • How can we analyze it?
  • How does analysis lead to new insight?
  • How can we visualize the processes and models?

Martin Roth then talked about the organization of the workshop. They imagine three workgroups.

  • identity in the information age;
  • shifting contents, shifting meanings across media and across time;
  • constructions of culture.

The workshop is a mix of lectures and working groups that then report back.

Sugimoto, Shigeo: Modelling Culture

Dr. Sugimoto gave the opening talk. He is involved in the Media Art Database and does research on metadata and digital archives. He started by defining how he uses "digital archive" and "metadata". He takes a

He talked about the Great East Japan Earthquake Archive "Hinagiku" by the National Diet Library. He asked about what lessons we have learned. How can we archive community memory?

He was initially skeptical of community archives. The disaster changed his mind about the importance of community digital archives.

Another issue is how to link materials. We might have individual photos, but how do we turn them into linked data? The underlying data model is crucial.

He talked about the variety of cultural objects from castles to intangible cultural objects like Washi (Japanese paper making). How do we preserve this variety?

He talked about conceptual instances (Harry Potter as opposed to a particular novel). The conceptual instance as opposed to the embodied (or material) instance. Libraries are item (embodied) centric. Encyclopedias, by contrast, provide descriptions of conceptual instances. He gave examples, including some of intangible cultural practices.

He then divided the archiving process between the Knowledge World (conceptual and intangible objects) and Embodied World. He then distinguished objects from events. Events and ephemeral objects (performances) need to be recorded. These recordings become the surrogate object. He talked about primary and secondary objects. A secondary object would be the metadata. What we archive are descriptions and recordings. What is often missing is contextual data and connections.
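
To make these distinctions concrete, here is a minimal sketch of how one might encode them as a data model. This is my own illustration (the class names are mine, not Sugimoto's), covering the conceptual/embodied split, recordings as surrogates, and metadata as secondary objects:

    # Illustrative only: not Sugimoto's formal model.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class ConceptualInstance:
        """An object in the Knowledge World, e.g. Harry Potter as a work."""
        name: str

    @dataclass
    class EmbodiedInstance:
        """An item in the Embodied World, e.g. one printed copy of the novel."""
        identifier: str
        realizes: Optional[ConceptualInstance] = None

    @dataclass
    class Recording:
        """A surrogate for an event or ephemeral object such as a performance."""
        medium: str        # e.g. "video", "photograph"
        records: str       # the event or practice being recorded

    @dataclass
    class MetadataRecord:
        """A secondary object: a description of a primary object plus its context."""
        about: object                                       # any primary object above
        description: str
        links: List[object] = field(default_factory=list)   # the often-missing connections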

He showed a slide that made it look like the conceptual object (the Platonic form) creates the instance.

In terms of process, there is digitization, then archiving, and then "physicalizing" the data for human appreciation. Finally, we have to preserve what has been archived.

Some questions:

  • What are the costs of archiving community memories? How does one curate or limit the potentially infinite amount of memory to preserve?
  • Does archiving actually help future generations? Or does it silence them?
  • Does digital archiving actually work? Or should I ask, does institutional archiving work?
  • How is the archive a form of culture itself? How do archives become surrogates for the thing?

A participant pointed out that the traditional way of preserving culture (especially intangible culture) is to teach children.

Another pointed out how certain skills get replaced by technologies. Do we preserve the skill of long division when everyone has a calculator?

Breakout Group on Shifting Content

I joined the shifting contents group. We kept our notes at: https://yourpart.eu/p/s5rekwxCYs

We started by talking about the problems of shifting content. What is shifting?

  • Shift in content
  • Shift in rhetorical style
  • Shift in genre of document

I'm intrigued by the theoretical issues:

  • How do you know when it is the content changing and not something in the gathering processes and technologies?
  • Can we talk about a stack of surrogates or metadata?
  • When/where does metadata start?
  • Can we distinguish types of data?
  • Is a photo in a menu a form of metadata?
  • How do machines distort or enhance interpretation?
  • How do groups that create metadata and archives change their interpretations? How do metadata standards change?
  • How should changes to metadata standards be documented?
  • Can we automate metadata generation?
  • Can we identify what types of metadata can be generated automatically and what can't?
  • What happens when metadata becomes data?

We can identify different types of shifts:

  • Shift in the phenomenon itself
  • Shift in the content that represents the phenomenon
  • Shift in recording technology
  • Shift in metadata creation processes
  • Shift in interpretation
  • Shift in tools and how they encode
  • Shift in users and their interpretative strategies

Research questions:

  • What is the role of the automation of metadata tagging?
  • What methods can be used in the humanities to disambiguate types of shifts?
  • How can we design systems that are open about their commitments?

Cathleen Kantner: Between Scylla and Charybdis: Trade-Offs between the Need for Generic Tools and the Need for Hermeneutic Sensitivity in the Digital Humanities

Kantner started by talking about "social facts" and social science. She talked about how there isn't direct access to the social world. It is always an interpretation. We have to transform phenomena into scientific objects of research in order to study them. The creation of social-scientific research then has a feedback effect back onto the phenomenon. The objects of research are themselves "social facts" that are constructed by our theoretical frames. Different theoretical views will lead to different forms of data.

She made the interesting point that we need different theoretical frames and therefore different data corpora for the same phenomena. It is like

She then talked about different methods:

  • Inductive method starts from data and hypotheses emerge from the data. Individual cases lead to abstraction.
  • Hypothetico-deductive method supporters think there is always a pre-existing framework. Hypotheses are freely stated. Theories are always already part of the process of generating experimental data. Data is interpreted in light of theories and theories in light of data.

Theories:

  • Influence the choice of relevant data = ontological level
  • Influence the ordering and explanation = epistemological level
  • Influence the orientation and application of knowledge in practice = normative level

So ... we need to make all these things explicit.

We need generic tools when we start projects but we also need to then customize them for hermeneutic sensitivity.

She mentioned the issue of understanding tools one uses, especially difficult tools like topic modelling.

She imagines an environment that is like a visual programming environment so that people can adapt tools or pipelines.

At Stuttgart they developed an Exploration Work Bench and the Complex Concept Builder. She described the corpus they created of newspaper stories that touch on war or conflict. She talked about using topic modelling and then machine learning to find new articles on a topic.

She showed visualizations including one of just the number of articles and how they jump with different wars. She also showed a process visualization of a type of workflow for research.

What is interesting is how they are trying to generate complex concept tools where they create subcorpora that ideally can be used to train a machine that can then identify more articles of the same type.
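
The general pattern, as I understood it, is sketched below: a hand-curated subcorpus serves as training data for a classifier that then flags further candidate articles for review. This is a minimal illustration with hypothetical data, not the Stuttgart implementation:

    # Minimal sketch of the pattern, not the Complex Concept Builder itself.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical curated subcorpus: 1 = belongs to the concept, 0 = does not.
    labelled_texts = ["troops crossed the border last night",
                      "the new opera season opened in the capital"]
    labels = [1, 0]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(labelled_texts, labels)

    # Score unseen articles and keep likely members of the concept for human review.
    unseen = ["air strikes reported near the border town"]
    scores = model.predict_proba(unseen)[:, 1]
    candidates = [t for t, s in zip(unseen, scores) if s > 0.5]
    print(candidates)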

She then talked about the online coding system and more generally the coding process. She talked about how they guarantee the reliability through continuous annotator controls.

Her conclusions:

  • We need tools for non-programming scholars
  • We need a balance of generic tools and customizable tools
  • We need a combination of distance and close reading data and methods
  • Software often meant for one field will not have exports or features for other fields
  • We need tools to handle multi-lingual research.

Her project is a good example of a computational project that generated results of interest to political scientists that could not have been generated without computing.

She gave a brilliant short explanation of identity theory.

Her English book is available (for U of A members) at https://search.library.ualberta.ca/catalog/7913249

Hiroshi Yoshida: Metagaming and Ecologies of Video Games

Hiroshi started by talking about how games often generate other games outside and inside - metagames. Video games remediate all other media. Metagame is an idea from Richard Garfield. Any game generates more games around it. Garfield defines 4 types of metagames - about/within/around/without games. Games have become equipment for other games. See also https://www.upress.umn.edu/book-division/books/metagaming .

Hiroshi talked about two directions of metagaming:

  • Outward - games about games or games without games
  • Inward - games about games and games within games

He gave some examples like machinima - Red vs. Blue. This would be an outward metagame. He gave Super Paper Mario as an example of an inward metagame.

He believes that metagaming is an aspect of videogames that is specific to the medium which is why he is studying it.

The computer can be thought of as a meta-medium. Videogames can remediate all other media including videogames. He talked about reflexivity and self-reference.

He gave an example of Retro Game Challenge that makes a game of GameCenter CX. It is a game about playing retro games that uses the two screens of Nintendo DS.

Then he talked about the interrelationship of games. Each game that comes out is in a network of other games through intertextuality. We compare each game to other games. We get ecologies of games.

He made a point that only videogames can be meta; that card and board games can't be meta or aren't meta in the same way. Because videogames are on computers, which are general purpose machines, they can subsume other media as they are digitized.

Day 2

Mitsu Inaba: Implementing Platforms of Cultural Construction

Professor Inaba gave the opening talk of Day 2. He started by talking about Vygotsky's Triangle of Culture or Cultural-Historical Activity Theory (CHAT). Vygotsky also talked about scaffolding as the way that elders help youth learn. Then Inaba talked about Engestrom's model of culture as an "activity system" for generating outcomes. When different cultures encounter each other there are contradictions generated by activity systems. These contradictions or dialogue can lead to innovations.

Inaba then showed his cultural learning virtual space where an "elder" or Japanese student will show someone new to Japan around a virtual Shinto shrine developed in Second Life. He talked about how the relationship between elder and student can change through the dialogue and visit. The new person can provoke reflection and lead to changed relationships. The virtual visit led to contradictions that led to collaborative understanding. He gave another example of children brought together to develop ideas for a town.

Gerhard Heyer: Text Mining

Gerhard Heyer talked about text mining. Text mining is a combination of methods and tools for the (semi-)automatic construction of structured data from language. He then talked about different types of text mining (supervised or unsupervised; interactive or static). He talked about 3 approaches:

  • Frequentist approach - word frequencies, co-occurrence analysis, vector similarity (a minimal sketch follows this list)
  • Bayesian approach - topic modeling
  • Neural nets
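
As mentioned in the first item, here is a minimal sketch of the frequentist approach (my own toy example, not Heyer's code): sentence-level co-occurrence counts are turned into word vectors and compared by cosine similarity:

    from collections import Counter, defaultdict
    from itertools import combinations
    import math

    # Toy corpus; real systems like Wortschatz work at far larger scale
    # and add statistical significance measures.
    sentences = [
        "the archive preserves community memory",
        "the community builds a digital archive",
        "digital tools preserve memory",
    ]

    cooc = defaultdict(Counter)
    for s in sentences:
        words = set(s.split())
        for a, b in combinations(sorted(words), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1

    def cosine(u, v):
        dot = sum(u[w] * v[w] for w in set(u) & set(v))
        norm = math.sqrt(sum(x * x for x in u.values())) * \
               math.sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    # Words with similar co-occurrence profiles get a high score.
    print(cosine(cooc["archive"], cooc["memory"]))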

He showed Wortschatz, a very cool corpus exploration tool that comes with different corpora. The tool is married to a text corpus collection from Leipzig.

He showed how you can use Wortschatz to graph a word at different times to compare its evolution. He also talked about topic modelling and neural nets.

This was followed by a discussion of overinterpretation from topic modelling.

Christian Kaulman: NLP Toolbox in Action

Kaulman talked about the NLP Toolbox he and colleagues are developing. He started with Word2Vec and gave examples. He talked about an interesting context change algorithm that uses a sliding window to calculate context over time and graph it. This led to the "Overton window", the hypothesis that there is a window of statements that are tolerated on a topic. The viability of political ideas depends on any idea falling within the window. If you can nudge the window in a direction you can make ideas that weren't tolerable acceptable.
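
A minimal sketch of the underlying idea as I understood it (my own toy version with hypothetical data, not the Toolbox algorithm): build a context profile for a target word in each time slice and measure how much the profile changes between adjacent slices.

    from collections import Counter

    def context_profile(docs, target, window=5):
        """Count the words that appear within `window` tokens of `target`."""
        profile = Counter()
        for doc in docs:
            tokens = doc.split()
            for i, tok in enumerate(tokens):
                if tok == target:
                    lo, hi = max(0, i - window), i + window + 1
                    profile.update(t for t in tokens[lo:hi] if t != target)
        return profile

    def context_change(p, q, top=20):
        """1 - overlap of the two top-N context word sets (0 = stable, 1 = changed)."""
        a = {w for w, _ in p.most_common(top)}
        b = {w for w, _ in q.most_common(top)}
        union = a | b
        return 1 - len(a & b) / len(union) if union else 0.0

    # Hypothetical time slices of documents mentioning the target word.
    slices = {
        2014: ["refugees welcomed at the station by volunteers"],
        2016: ["refugees blamed in the debate about crime"],
    }
    years = sorted(slices)
    for y1, y2 in zip(years, years[1:]):
        p1 = context_profile(slices[y1], "refugees")
        p2 = context_profile(slices[y2], "refugees")
        print(y1, "to", y2, round(context_change(p1, p2), 2))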

He then gave an example of how they are exploring context change in statements about immigration using their first pass technique. They think that they can get better results with social media data than newspapers that may have a more controlled vocabulary. They merged datasets to create a 3d model of the game space with the messages placed. It is a different perspective than playing that lets them study the geospatial distributions of the message system.

Workgroup Meeting

We had workgroup meetings and then reports back from the workgroups. Our group was looking at shifting contents. We talked about different types of shifts:

  • The shifts in diachronic corpora. We can use different techniques to try to identify when shifts take place (one such technique is sketched after this list). This is the most obvious form of shift.
  • Shifts in approaches to evidence. One could say that in some of the humanities we are seeing a shift to data-driven techniques. The change to data changes the questions one can ask and the practices of the fields. We could call this datafication.
  • Shifts in the metadata added to data. Researchers can add layers of metadata to enable different interpretations. This is like the accretions of commentary on a manuscript. With important data we could see evolving accretions of data and metadata that become shifts worth studying. Some of the metadata added may itself have metadata/documentation about the choices made. At some point the tradition of interpretation through metadata becomes an object of study (data) itself.
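
As noted in the first item, here is a minimal sketch of one way to locate shifts in a diachronic corpus (my own example, not a technique the group settled on): compare the word distributions of adjacent time slices and flag unusually large jumps.

    from collections import Counter
    import math

    def distribution(texts):
        counts = Counter(w for t in texts for w in t.split())
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def js_divergence(p, q):
        """Jensen-Shannon divergence between two word distributions."""
        vocab = set(p) | set(q)
        m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
        def kl(a):
            return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)
        return 0.5 * kl(p) + 0.5 * kl(q)

    # Hypothetical time slices of a diachronic corpus.
    slices = {
        1990: ["war coverage dominates the front page"],
        1991: ["war coverage dominates the front page again"],
        1992: ["peace talks and trade fill the papers"],
    }
    years = sorted(slices)
    for y1, y2 in zip(years, years[1:]):
        jump = js_divergence(distribution(slices[y1]), distribution(slices[y2]))
        print(y1, "to", y2, round(jump, 2))   # the largest jump marks a candidate shift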

Martin Roth: Dark Souls

Martin began their talk on the subject of community and defining it. They are looking at community formation in Dark Souls (the videogame) through ingame practices and paratexts. In particular they are looking at memes, both in the game and outside. They have drawn on some datasets created by fans to have a 3d model of where all the messages are left.

Then they talked about meme-based community construction. They talked about "Praise the Sun", a gesture that has been picked up by the community, but which doesn't have a necessary meaning. It is used as a sign of being part of the community. See https://www.pcgamer.com/why-we-praise-the-sun-the-story-of-dark-souls-most-famous-gesture/ . Now you get ASCII versions of the gesture in comments elsewhere.

They have some observations for studying YouTube. There is an API that can be used to get all sorts of information, but not much about relationships between information. They are looking at what channels are related to other ones, like right-wing ones. The network of related channels can show some of the politics of individuals.
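
A minimal sketch of the analysis step, assuming the channel-to-channel relations have already been harvested (the YouTube API calls are not shown and the edge list is hypothetical):

    import networkx as nx

    # Hypothetical (channel, related channel) pairs collected beforehand.
    related = [
        ("channel_a", "channel_b"),
        ("channel_a", "channel_c"),
        ("channel_b", "channel_c"),
        ("channel_d", "channel_a"),
    ]

    g = nx.DiGraph()
    g.add_edges_from(related)

    # Channels that many others point to may anchor a cluster, and the
    # connected components hint at separate communities.
    print(sorted(g.in_degree, key=lambda kv: kv[1], reverse=True)[:3])
    print(list(nx.weakly_connected_components(g)))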

Stefan Janicke: Visualization and Digital Humanities

Stefan started with a collage of his visualizations. See http://vizcovery.de

His entry into digital humanities was a tool called GeoTemCo - a tool to study geospatial data. The DARIAH Geo-Spatial tool came from GeoTemCo. They don't seem to have given him credit.

He then showed a visualization of the increase in visualizations in DH. He also showed a diagram of the way humanities use visualizations. He talked about and showed examples of distant and close reading.

He then talked about "Overview first, zoom and filter, then details on demand" (from Shneiderman). He talked about how visualization should be designed.

He showed eXChange, a technique-driven development project to create better tag clouds. The risk of a technique-driven development approach is that you may not involve the humanists as you are just playing with the technique. Problem-driven development is more productive as it starts from the humanities problem.

He talked about the trustworthiness of a visualization. How does one communicate the nature of the DH data to casual users?

Day 3

Workgroup Meeting

In our workgroup we dropped out of the clouds of theory to talk about research processes. We began by wandering and asking questions:

  • One gap is that between the theory (like what is a community) and the tools (like a classifier). How does one operationalize a theory about something like what a community is?
  • We need tools that watch us and learn from what we do manually so they can enhance what we do. What tasks lend themselves to (semi)automation?
  • It would be interesting to have visual analytical or editing tools - visual ways of creating or painting data rather than representing it.
  • We talked about how addicting topic modelling is. Why is it that we keep returning to topic modelling? Is there a version of Godwin's law to the effect that any digital humanities discussion about tools will eventually return to topic modelling? I'll call this "Rockwell's guideline."

We then shifted to working on a model for how data-driven research works in the humanities and social sciences. This would allow us to identify different stages where different types of tools might be used. It also allowed us to identify at what stages content shifts and how it does. Here are the phases in the ideal model:

  • Corpus formation
  • Corpus exploration and enrichment
  • Passage identification, segmentation, and classification and coding
  • Synthesis and interpretation (visualization)
  • Documentation

We called this the "Shonan Model".

This is, in many ways, a simplification. It should also be noted that there is a lot of iteration and looping back in this process.
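
To make the phases and the looping concrete, here is a minimal sketch of the model as a pipeline of placeholder functions (the names and bodies are illustrative only, not an agreed implementation):

    # Placeholder functions for the five phases; not an actual implementation.
    def form_corpus(sources):
        return list(sources)

    def explore_and_enrich(corpus):
        return corpus                          # e.g. add metadata, clean, normalize

    def segment_and_code(corpus):
        return [(doc, []) for doc in corpus]   # passages paired with codes

    def synthesize(coded):
        return {"passages": len(coded)}        # e.g. visualize or summarize

    def document(step, result):
        print(step, "->", result)              # keep a record of choices made

    corpus = form_corpus(["doc1", "doc2"])
    for iteration in range(2):                 # research loops back rather than running once
        corpus = explore_and_enrich(corpus)
        coded = segment_and_code(corpus)
        findings = synthesize(coded)
        document(f"iteration {iteration}", findings)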

In the afternoon we had an excursion to Kamakura.

Day 4

Workgroup Results

We then heard reports from the workgroups.

The Constructing Culture group talked about how they first discussed what culture is and what questions to ask about a culture (and perspectives on one). They then chose use cases and focused on the culture of the multiplayer online battle arena game, League of Legends. They talked about how there is an e-sport culture around League of Legends. They talked about griefing and how players are disciplined. Can players who grief others be re-educated? There is also a spreading of toxic behaviour. If one player is toxic then others might also shift to toxic behaviour. There is also an interesting set of questions around people playing for skins that make no difference to their competitiveness in the game. The skins are, in some sense, cultural products.

They then talked about methods for studying and changing game culture. Can machine learning and text mining be used to measure behaviour? Or do we need close reading of games? This led to the issue of what sorts of data we can get.

We talked about the difference between studying game culture and studying the construction of culture in games. The study of construction depends on the study of culture. Further, we have the opportunity in the academy to actually construct the culture of game criticism and game design. The field is fluid so we can make a significant difference.

Shifting Contents - I presented the 5 step Shonan model (see above) so I couldn't take notes. We had a fascinating discussion about the validation/verification of computing tools before they are used. Computer scientists test tools created for fields (like medical informatics) before letting the users use them. That doesn't seem to happen in the digital humanities. Why shouldn't humanities tools be treated as seriously as tools for other fields? Are there differences in humanities research practices from other sciences (medicine or social science) that make validation immaterial? Perhaps validation is of interest to the scientist and is what distinguishes the science from development for the computer scientist.

The Identity group talked about how the CS view of modelling is different from the humanist view. For the computer scientist the computer model is what they want - it is the end of research, while for others the computer model is just part of a larger process and not the end.

The computer scientist doesn't really consider the computer to be real. A computer model is a representation of the "real world" rather than being part of it.

They also talked about expressions of identity. How is identity expressed and can that be datafied for research? They used Gamergate in Japan as a use case. This led to reflections back on the process of research and how that can be iterated.
