philosophi.ca
Linked Infrastructure For Networked Cultural Scholarship Team Meeting 2019

These are my notes about the Linked Infrastructure For Networked Cultural Scholarship (LINCS) Team Meeting in Banff, Alberta. Note that these notes are erratic. I write when the battery is charged and when things make sense. There are official notes at https://tinyurl.com/lincsnotes

Saturday, Sept. 14, 2019

Peter Patel-Schneider

Patel-Schneider started us off. He now works for Samsung Research on semantic web technologies. He talked about how there are commercial knowledge graphs, like the Google KG, that ingest everything.
Wikidata is a knowledge graph full of triple facts. It also has ranks (facts can be deprecated or preferred). It has qualifiers (things attached to facts, as in "in the year 1897 the population of Berlin is X," where the year is a qualifier). There are references (where the information came from). There is a simple ontology language (RDFS). There are lots of tools, and there is the culture of Wikipedia. In his opinion it is big but sprawling. He showed a demo, the entry for Douglas Adams. He commented on the complexity and issues in Wikidata. It is multilingual, but that means there are special ID numbers. Wikidata gets used in the Wikipedias. It isn't RDF, but there is an official RDF dump, so it fits in the Linked Open Data Cloud. The job at hand is to:
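To make the statement model concrete, here is a minimal sketch (not the real Wikidata JSON schema) of how a Wikidata-style statement bundles a triple with qualifiers, a rank, and references. The IDs follow Wikidata conventions (Q64 is Berlin, P1082 population, P585 point in time), but the population figure is illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    subject: str          # e.g. "Q64" (Berlin)
    prop: str             # e.g. "P1082" (population)
    value: object
    qualifiers: dict = field(default_factory=dict)   # e.g. {"P585": "1897"} (point in time)
    references: list = field(default_factory=list)   # where the information came from
    rank: str = "normal"  # "preferred", "normal", or "deprecated"

stmt = Statement(
    subject="Q64", prop="P1082", value=1_700_000,    # illustrative figure, not sourced
    qualifiers={"P585": "1897"},
    references=["Statistisches Jahrbuch 1898"],      # invented reference string
)

def best_values(statements):
    """Prefer 'preferred'-ranked statements; ignore deprecated ones."""
    kept = [s for s in statements if s.rank != "deprecated"]
    preferred = [s for s in kept if s.rank == "preferred"]
    return preferred or kept

assert best_values([stmt]) == [stmt]
```

The point of the ranks is visible in `best_values`: consumers can keep old or disputed claims in the graph while still getting a sensible default answer.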
He then talked about powerful ontologies. If you use a powerful ontology language like OWL that supports inference, you get a good representation, but you have to figure out the tokens/nodes. He then got to what he wanted to say: how can we develop a logic that captures the intuitions in data like Wikidata? If we can build a useful logic then we can do useful reasoning. I asked whether it was possible to have one logic; the history of philosophy seems to be one of discovering how good ideas about logic go wrong. He answered by saying that we have two different problems:
My sense is that the humanities are those disciplines that deal with the idiographic (the unique cases and exceptions), not the nomothetic (the regular, lawful cases). We therefore often focus on exactly what won't work with a logic. It gets even worse in that we want to compare the idiographic to the nomothetic. We want an exception to logic and a standard logic too. We want to compare the exceptional epistemology to whatever passes as everyday epistemology. Peter mentioned Cyc as a project that spent 30 years developing a master ontology of common knowledge which is still incomplete.

Susan Brown: LINCS Overview

Susan Brown introduced us to the project and the team meeting. LINCS was imagined to do things like
We are a $5 million cyberinfrastructure project (CFI and partner funding). It lasts 3 years with 6 university partners, 48 humanities researchers, and an additional technical team. We anticipate training 200 HQP. She showed a number of useful visualizations of the project from different perspectives, including a flow chart that captures what the project has to do:
How will this be maintained? She then showed a systems diagram with data sources at the bottom flowing up to interfaces. We then talked about the Project Charter and Membership Guidelines. A good bibliographic reference is Ruecker, S. and M. Radzikowska (2008). "The Iterative Design of a Project Charter for Interdisciplinary Research." Designing Interactive Systems, ACM Press. We talked about what has to be done and who has to do it. A lot of this came from the proposal and the letters of agreement. She mentioned LOUD (Linked Open Usable Data) as what we want. We were reminded that we want to administer ourselves in a respectful and non-hierarchical fashion.

Research questions panel: Deanna Reder, Janelle Jenstad, Diane Jakacki, Stacy Allison-Cassin, Jon Bath

Stacy Allison-Cassin talked about the challenges of abstract ideas in libraries, like what a "work" is. She talked about how this is especially a challenge in music. She works with the Mariposa Folk Festival to maintain an archive. Janelle Jenstad talked about the projects she is involved with, like the Map of Early Modern London. Jon Bath at the U of Saskatchewan started by talking about building silos; he now wants to stop building silos. Deanna Reder at SFU is the PI of The People and the Text, a project about Indigenous literatures widely defined. Diane Jakacki is the lead of REED London, which is using CWRC. They then talked about what questions LINCS might help them answer:
They had an interesting discussion about openness and how we often don't want it. Archives show and hide things. Some of the issues:
I believe the value of LOD is in supporting claims. One way we check a claim is to follow the entities named in it. What do we need to learn about the technologies:
Ichiro Fujinaga: SIMSSA and LOD project

SIMSSA stands for Single Interface for Music Score Searching and Analysis. Fujinaga talked about the challenges for music. In textual disciplines we have Google Books, but there is no search for music; we don't even really know what to search for. There are music recognition tools that can add information to images. IIIF provides an image interoperability framework. They have standards for music encoding (MEI). They have various tools. They hope that if they build it, others will come. OMR (Optical Music Recognition) is a core technology that converts an image of a score into a computer-readable music file. He showed Cantus Ultimus, which lets one search across distributed score collections. He showed a search for a sequence of pitches, which was very cool. Then he showed MusicLibs, which lets one search all sorts of music. He talked about provenance and the challenges of different types of provenance: the provenance of the musical work, the provenance of the source of an instance, the provenance of computer files, and who did the cataloguing. To handle this they use RDF quads - named graphs that connect sources. Then he talked about feature extraction - how they can extract neat sets of features and create study sets. The types of questions that they can ask are really cool, as in "select from fugues printed in London those that modulate to G major." He talked about how he now wants to link to external data like prosopographical databases. I asked about the cool queries that he described and what the interface was to design queries. He pointed me to a paper about jSymbolic 2.2.

Heather Dunn: Canadian Heritage Information Network (CHIN) LOD initiatives

Dunn talked about CHIN, which has a lot of data about Canadian artefacts. A lot of their data is flat. They have lots of data which is not online as it is not bilingual. Artists in Canada is one database they have that might be useful. They have developed a Nomenclature that is used for museum cataloguing.
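The quads idea above can be sketched in a few lines: each triple is tagged with the named graph it belongs to, and provenance statements attach to the graph name rather than to every triple. The graph names, predicates, and identifiers below are invented for illustration, not SIMSSA's actual vocabulary.

```python
quads = [
    # (subject, predicate, object, graph)
    ("score:fugue42", "mei:key", "G major", "graph:cantus-import"),
    ("score:fugue42", "dc:title", "Fugue in G", "graph:cantus-import"),
    # provenance about the named graph itself:
    ("graph:cantus-import", "prov:wasAttributedTo", "cataloguer:jsmith", "graph:meta"),
]

def triples_in(graph, data=quads):
    """All triples asserted in a given named graph."""
    return [(s, p, o) for s, p, o, g in data if g == graph]

def provenance_of(graph, data=quads):
    """Statements made about the named graph (who catalogued it, etc.)."""
    return [(p, o) for s, p, o, g in data if s == graph]

assert ("score:fugue42", "mei:key", "G major") in triples_in("graph:cantus-import")
assert provenance_of("graph:cantus-import") == [("prov:wasAttributedTo", "cataloguer:jsmith")]
```

The design point is that one provenance statement covers the whole import batch, rather than being repeated on each of its triples.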
There is an overlap in object names between their different projects; they haven't yet connected their own data. They are using PoolParty to manage their Nomenclature. They are trying to figure out how to scale up and make decisions about semantic data. She talked about the Records Data Model that is meant to eventually cover all museums. They are starting with "agents" - beyond just the artists. They are basing what they do on CIDOC-CRM, which is widely used and is event-based (which is good for historical objects and actors). She showed a complex data pipeline; they are trying to figure out how to make it simpler for museums. She was asked about reconciliation when you have different ontologies.

Deb Stacey: Ontology policy/strategy

Stacey started by talking about what an ontology is: a specification of a shared conceptualization. It should have a shared vocabulary used in a coherent and consistent manner. It can guarantee consistency. The ontologies we know and love:
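A rough sketch of why event-based modelling (as in CIDOC-CRM) suits historical data: instead of a flat "artist" field on a record, a Production event links object, actor, and time span, each of which can be qualified or left uncertain. The class names echo CIDOC-CRM (E12 Production, E21 Person, E22 Human-Made Object), but this is an illustration of the idea, not a CRM implementation.

```python
production = {
    "type": "E12_Production",
    "produced": {"type": "E22_Human-Made_Object", "label": "Untitled landscape"},  # invented object
    "carried_out_by": {"type": "E21_Person", "label": "Unknown artist"},
    "timespan": {"begin": 1890, "end": 1910},  # uncertain dating fits naturally as a range
}

def active_in(event, year):
    """Was the event's time span open during a given year?"""
    ts = event["timespan"]
    return ts["begin"] <= year <= ts["end"]

assert active_in(production, 1900)
assert not active_in(production, 1950)
```

Because the event is a first-class node, statements like "attributed to X by cataloguer Y" attach to the event rather than overloading a single column in a flat record.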
Some of the problems with ontologies are that they are hard to keep simple, and people end up picking and choosing and disagreeing. We should reuse the ontologies of others. There is a balance between simplicity and semantics. We need to pick ontologies that are well known in a community of practice. Ontologies provide structure that lets you see some things but hides others. Does this fit with the exceptionalism of the humanities? Reasoning is important and hard. If you get it right you can do neat reasoning, but there is a limit to the current technology of reasoning. Some of the issues include:
And reasoning issues are evolving. With reasoning we get certain things:
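One concrete thing reasoning buys is entailment of facts nobody typed in. A tiny forward-chaining sketch of RDFS-style subclass reasoning, with a made-up musical vocabulary: from "a fugue is a polyphonic work" and "BWV 578 is a fugue," we entail "BWV 578 is a musical work."

```python
facts = {
    ("Fugue", "subClassOf", "PolyphonicWork"),
    ("PolyphonicWork", "subClassOf", "MusicalWork"),
    ("bwv578", "type", "Fugue"),
}

def closure(facts):
    """Apply type/subclass inference rules until nothing new is entailed."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "type":
                for s2, p2, o2 in facts:
                    if p2 == "subClassOf" and s2 == o:
                        new.add((s, "type", o2))       # instance inherits superclass
            if p == "subClassOf":
                for s2, p2, o2 in facts:
                    if p2 == "subClassOf" and s2 == o:
                        new.add((s, "subClassOf", o2))  # subclass is transitive
        if new <= facts:
            return facts
        facts |= new

inferred = closure(facts)
assert ("bwv578", "type", "MusicalWork") in inferred
```

The limits Stacey mentioned show up quickly: this naive loop is fine for toy graphs, but expressive ontology languages make the equivalent computation far more costly.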
She talked about shapes and SHACL (Shapes Constraint Language). This allows you to constrain and then validate graph properties. A shape is a way to identify metadata about a particular type of resource: you can describe what has to be there and what is optional. She talked about foundational ontologies that are basic, upper ontologies describing superclasses that everyone agrees about; you then build on top of them. CIDOC, DOLCE, and SUMO are big examples. Dynamic ontologies are driven by the structuring of the data: as one edits the data, it can trigger changes to the ontology. This can be a version of versioning. It can be community driven, but it is also expensive in ways. Above all, we have to accept that changes will happen and we will have to change our ontologies. We had a discussion about how much an ontology might force on us and whether we need them. To some extent you always have some structure if you have structured data; an ontology makes it explicit. A great question asked was whether we should have so many people together in this project. What is the advantage of a large project?

Constance Crompton: Is Less More?

Constance Crompton discussed ontologies that could cross-walk with the TEI. She has been looking at what to pull from the TEI and how it fits with other ontologies. There are a number of challenges coming up:
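The shape idea can be sketched without the SHACL language itself: declare which properties a resource of a given type must or may carry, then validate data against the declaration and report violations. This hand-rolled validator and the person shape are illustrations of the concept, not SHACL syntax.

```python
person_shape = {
    "required": {"name", "birthDate"},
    "optional": {"deathDate", "occupation"},
}

def validate(resource, shape):
    """Return a list of violation messages (empty list = the resource conforms)."""
    props = set(resource)
    violations = [f"missing required property: {p}"
                  for p in sorted(shape["required"] - props)]
    allowed = shape["required"] | shape["optional"]
    violations += [f"unexpected property: {p}"
                   for p in sorted(props - allowed)]
    return violations

assert validate({"name": "Ada", "birthDate": "1815"}, person_shape) == []
assert validate({"name": "Ada"}, person_shape) == ["missing required property: birthDate"]
```

Real SHACL adds cardinalities, datatypes, and value constraints, but the workflow is the same: shapes make the expectations explicit so bad data fails loudly at ingest rather than silently downstream.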
There are different options for TEI contributions. We could have tools that take people through the transformations needed to fit into LINCS - that would be for the most involved. Or we could have lighter ways for people to connect. Or we could just run automated tools on TEI that is open. She then presented questions:
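The lightest of the options above, automated extraction from open TEI, can be sketched with the standard library: pull out persName elements with their ref attributes as candidate entities for linking. The TEI snippet and the #atwood pointer are invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    A letter from <persName ref="#atwood">Margaret Atwood</persName>.
  </p></body></text>
</TEI>"""

def extract_persons(tei_xml):
    """Collect (name text, ref attribute) pairs from persName elements."""
    root = ET.fromstring(tei_xml)
    return [(el.text, el.get("ref"))
            for el in root.iter(f"{TEI_NS}persName")]

assert extract_persons(sample) == [("Margaret Atwood", "#atwood")]
```

Any real pipeline would then reconcile those refs against an authority before minting linked data, but even this crude pass shows why open TEI is the cheapest entry point.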
I'm convinced that what we need is just-in-time markup, where we can add interpretative markup and use it with other layers of markup. Constance suggested that the paratextual information may be more important: what people want to know is about Atwood, not line numbers in a text.

Lisa Goddard and Stacy Allison-Cassin: Storage and Publishing Platforms

We then had an important discussion about storage and platforms - important, as we need to make up our minds soon. Stacy Allison-Cassin started by talking about why she got involved in Wikidata. She didn't have a grant and needed something existing that she could add to. She found that there wasn't a lot of Canadian information in Wikipedia, so she ran a Music in Canada @ 150 Wikimedia project. It was a year-long project with pre-conference workshops, editathons, and outputs. She talked about how you have to be invested in the community relationships with other metadata folk; you can't just use it. She reminded us that there will always be mistakes in the data, and showed us how a mistake icon shows up automatically. She talked about the Witchfinder General project, where they tried to see what they could draw from Wikidata about witches in Scotland. They documented all sorts of problems they found. She discussed how one can propose new properties, which get voted on. She then showed some tools like Reasonator and Mix'n'match integration, and then a bunch of projects, including projects using Wikibase like Rhizome. Lisa Goddard talked about how we have to make a decision, and soonish. She believes we should use Wikibase. What are the strengths and weaknesses? She talked about what we need (criteria):
She talked about APIs and whether we may have to extend the Wikidata API. She walked us through some of these criteria and how they play out in Wikibase. The Wikidata model is nice and humanly readable, but it doesn't do qualifications as well. Provenance is at the statement level. Wikibase ranks give a simple way to indicate certainty. Lisa asked about what the alternatives might be. I don't know much about platforms, but here are some options:
Lisa talked about reification. This is obviously a complex issue. The key is how we can make the decision. What do we need to know for this to work? Here are some things we think we need:
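Reification, in the RDF sense, means turning a statement itself into a resource so you can say things *about* it (its source, its date, its certainty). A sketch of the classic rdf:Statement pattern, with generic ex: identifiers invented for illustration:

```python
def reify(stmt_id, s, p, o):
    """Expand one triple into the four rdf:Statement bookkeeping triples."""
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]

graph = reify("ex:stmt1", "ex:personA", "ex:memberOf", "ex:groupB")
# provenance now attaches to the statement node, not to personA:
graph.append(("ex:stmt1", "dcterms:source", "ex:someArchive"))

assert ("ex:stmt1", "rdf:subject", "ex:personA") in graph
```

The cost is visible: every asserted fact becomes four or more triples, which is one reason platforms like Wikibase build statement-level provenance and ranks into the data model instead of using raw reification.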
Denilson Barbosa: Conversion and Diffbot Text Understanding Quasi-Demo

Denilson talked about how we will always need to convert or extract metadata. There are datasets like the Internet Archive that it would be great to be able to use for extraction. He then talked about using machine-learned models for identifying entities and disambiguating them against a reference knowledge graph. We need one or more reference knowledge graphs. We are partnering with Diffbot, which has clients like the NSA, eBay, and Amazon. He talked about Diffbot's mission, which is ambitious. They will make an NER and disambiguation tool open for us, and they may give us limited access to their Knowledge Graph. He then demoed a simple tool that does NER using the Diffbot KG and returns entities. The tool was good at figuring out who "she" and "her" were.

Lightning talks
Sunday, Sept. 15, 2019

Access - Tools session
Breakout Sessions

We then had reports from the breakout groups. I was in a Tools group. We came up with the following:
My sense is that we will have the following types of tools:
At this point I had to leave.
Page last modified on September 15, 2019, at 12:08 PM - Powered by PmWiki