philosophi.ca |
Pathways To SEASR

In January of 2009 I attended the Tools for Data-Driven Scholarship meeting funded by Mellon and hosted by the NCSA.

Note: These notes were written during the conference. They are biased and cover only what I had time to type. As a result, when things got interesting I stopped taking notes.

Day 1

Thursday, January 15, 2009.

Introductions

The first session included three introductions. Chris Mackie gave us a picture of what he hoped for SEASR and how it is different from other tool projects. For Chris the difference is sustainability, and the sustainability of SEASR will come from its adoption by others. Hence the purpose of Pathways is to get others (us) to learn about SEASR, try our ideas out, and then be willing to install SEASR locally and extend it. Mackie encouraged us to think of ourselves as co-owners, co-navigators, and co-developers. I suspect that eventually we will see a consortial model, possibly as part of Bamboo, but for the moment it seems less formal.
Michael Welge gave an overview of SEASR and the workshop. He expanded on the sustainability model, talking about the three social elements in SEASR.
Michael also described SEASR as a data-driven environment, by which they mean that the arrival of data triggers things (as in a data-flow environment). I think there is a weak association to the hermeneutical principle that the text should drive the interpretation.

Zotero

Xavier Llora gave an interesting demo of a proof-of-concept using Zotero to connect to SEASR. I love the idea of plug-ins. The idea is that you can create an application in SEASR that can be a plug-in to Zotero, which you can then run on any text (or collection) you have saved. Zotero can thus become an interface to other tools, and one could build a set of favorite tools from various places. Xavier also talked about a connection with Fedora, which I believe allows one to save a collection from Zotero to a repository that could then be processed by SEASR.

UIMA

UIMA stands for Unstructured Information Management Architecture and is an IBM analytical engine for unstructured data, from phone conversations to e-mail. They gave an example of a flow of UIMA modules (chains) for part-of-speech tagging. They describe the flow (chain?) in XML. It seems to me similar to SEASR, but without the visual flow programming environment. I'm assuming that UIMA has a whole mess of modules that are useful to SEASR, like "sentiment analysis". The presenter gave an example using Mark Twain. He joked that it was, "How to cheat English literature with computer science." Next we saw an example of sentiment analysis using SYNnet, a tool that follows synonym connections in a thesaurus. I like the idea of using a thesaurus to map words so you can find a sentiment like "joy" by finding its synonyms. The point was to show how one can integrate other tools into SEASR, just as the Zotero demo showed how SEASR can be integrated into other things.

NESTER

Steve Downie demoed NESTER (Networked Environment Sonic-Toolkits for Exploratory Research) and NEMA (Networked Environment for Music Analysis).
Before that he gave some background on the MIREX evaluation exchange and the model of having virtual labs that can analyze proprietary datasets (i.e., lots of copyrighted music) without the researchers having access to the data. The idea is really smart: people can submit algorithms that can be plugged into a SEASR framework, which are then run behind the copyright firewall here, and the results are sent to the researchers. I think Steve has demonstrated the value of M2K (built on D2K, the predecessor to SEASR). I have seen Steve's demos, for example at the SHARCNET workshop on humanities and HPC, and the link between the society (IMIRSEL), the exchanges, and the tools is compelling. Here he showed slides from a bird song project where the system can be trained to recognize songs. SEASR gives them the ability to put together web services in the visual programming environment.

Meandre

In the category of great names for software is "Meandre", which doesn't stand for anything, but does sound like what you can do with it: meander through ideas of text. It is first of all a data-flow visual programming environment for SEASR (and other) components. It implements the idea John Bradley and I had in Eye-ConTACT. What is important is the standardization of how we define a component and how we define a flow. In principle Meandre could disappear over time if the standardization is done right. RDF is used to describe flows and share them. Then one can do reasoning on top of this. Meandre's metadata is an important development that builds on Dublin Core (for texts), adding component flow descriptions. Stéfan was working on TAML and TARL to do things like this.

You can experiment with Meandre at http://seasr.org/download/ . This worked quite smoothly on my Mac; I was impressed. I was up and generating word clouds before the demo was over.

ZigZag is a scripting language based on Python that lets one write flows and then run them elsewhere. ZigZag has automatic parallelization if you have access to a cluster.
Both Meandre and ZigZag output the RDF descriptors of a flow. One way to think of these is that one is a visual programming environment and the other is a scripting environment. The RDF descriptor is presumably then run by some engine. They have a MAU file (Meandre Archive Unit) that bundles the flow and components together into executables.

Community Hub

Loretta Auvil talked about the community hub where components and flows can be discovered, shared, and executed. The community hub has some ManyEyes features, but isn't quite working yet. Then things got really technical as we were walked through running Meandre.

Eclipse Plug-In

Amit Kumar introduced an Eclipse Plug-In for managing components on the server and creating new ones.

Adoption

John Unsworth talked about adoption. The Pathways project is meant to help us adopt SEASR. John went on to talk about the HathiTrust. They hope to have at UIUC a repository of texts from the HathiTrust, Google Books, and so on (which would be millions of books). This captive collection may only be usable at UIUC or through some trusted mechanism. The idea is that this captive collection will need various tools to be accessible, and SEASR could be the way cool tools are developed.

Day 2

MONK

Stéfan Sinclair demonstrated MONK (Metadata Offers New Knowledge). MONK offers a more accessible interface to SEASR for academics. They have also been adapting SEASR so that it can be combined with other projects into applications. Tools get gathered into Toolsets that can be gathered into an Application. The Applications look like what you would share with colleagues, but even the Workbench is usable. Part of MONK is a preprocessing system that gets the XML texts into a common format and "adorns" them with part-of-speech tags. The philosophy of MONK is to pay attention to the metadata and encoding of texts in order to get better knowledge from analysis.
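As a rough illustration of what "adorning" a text could mean, here is a toy sketch in Python. This is not MONK's preprocessing system. The `adorn` function, the `w` element, the `pos` attribute, and the three-word lexicon are assumptions made up for the example, and a real system would use a trained tagger rather than a lookup table.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {"the": "DT", "cat": "NN", "sat": "VBD"}


def adorn(xml_text, lexicon=TOY_LEXICON, default="UNK"):
    """Wrap each word of an element's leading text in <w pos="...">.

    Simplification: only the text directly after an opening tag is
    adorned; text that follows a child element (tail text) is left alone.
    """
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the new <w> elements are not revisited.
    for elem in list(root.iter()):
        if not elem.text or not elem.text.strip():
            continue
        words = elem.text.split()
        elem.text = None
        for i, word in enumerate(words):
            w = ET.Element("w", {"pos": lexicon.get(word.lower(), default)})
            w.text = word
            # Keep a single space between words, none after the last one.
            w.tail = " " if i < len(words) - 1 else None
            elem.insert(i, w)
    return ET.tostring(root, encoding="unicode")


adorned = adorn("<p>The cat sat</p>")
print(adorned)
```

Once every word carries its tag in the markup, later analysis tools can query the tags directly instead of re-tagging the text each time.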
FutureLens

FutureLens is based on FeatureLens, a visualization tool from Maryland. One neat feature is the ability to combine terms. If you see two patterns that are the same phenomenon you can combine them and then see the distribution of the combination. Thus it lets you cluster patterns into themes.

VUE

Anoop Kumar of the Visual Understanding Environment team showed their project, which allows one to create visual concept maps that can then be used for presentations. His presentation was created with VUE and he basically ran an animated step-through of his map. Very neat.
Page last modified on January 26, 2009, at 10:02 AM