
Northwestern Computational Research Day

These are my live notes on the Northwestern Computational Research Day.

Note: They are being written live and will therefore be full of typos and other problems. Write me if you want something corrected.

Jay Walsh: Introductions

Walsh is the VP for Research. He talked about new infrastructure being developed to support research.

Geoffrey Rockwell: Big Data and the Humanities

I gave the opening keynote on how the humanities should engage in and with data science.

Sylvester Johnson: Encoding and Complex Searches for a Scholarly Edition of an Early Modern Text: the Purchas Project

Johnson talked about Purchas his Pilgrimage (1626), a large (1100 pages) work surveying world religions that is full of maps. The project is creating a carefully curated TEI/XML edition along with search and visualization tools. They are using Named Entity Recognition to tag people, places, and organizations, and MorphAdorner (which was created at Northwestern) to tag linguistic features.
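To make the tagging step concrete, here is a minimal sketch of how recognized entities could be wrapped in TEI markup. It is not the Purchas Project's code: the function name, the entity type labels, the example sentence, and the character offsets are all assumptions for illustration. Only the element names (persName, placeName, orgName) are standard TEI.

```python
from xml.sax.saxutils import escape

# Mapping from (hypothetical) NER labels to standard TEI element names.
TEI_TAGS = {"PERSON": "persName", "PLACE": "placeName", "ORG": "orgName"}

def tag_entities(text, spans):
    """Wrap non-overlapping (start, end, kind) spans in TEI elements.

    Text outside and inside the spans is XML-escaped.
    """
    out = []
    pos = 0
    for start, end, kind in sorted(spans):
        out.append(escape(text[pos:start]))
        tag = TEI_TAGS[kind]
        out.append(f"<{tag}>{escape(text[start:end])}</{tag}>")
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)

# Invented sentence and offsets, standing in for real NER output.
sentence = "Purchas wrote of Goa & Ormuz."
spans = [(0, 7, "PERSON"), (17, 20, "PLACE"), (23, 28, "PLACE")]
print(tag_entities(sentence, spans))
```

A real edition would also need to handle overlapping or nested entities and attach identifiers to each name, which this sketch leaves out.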

One of the interesting things about the Purchas work is its intersectionality - Purchas weaves together religion, race, and anthropology in the work.

Luís Amaral: Finding the "Right" Data

Amaral does science of science. He looks for situations that lead to creativity. We know that certain structures seem to support creative work, so he looks for data that lets him study them. He also likes to look at the flow of knowledge between sciences. Some of his results:

  • Teams have become much more important to the production of science - does the way these teams are formed affect the output?
  • Teams with more experienced researchers tend to publish more high impact papers.
  • Teams that don't incorporate new people tend to publish in low impact journals - adding new collaborators seems to lead to higher quality.
  • There is pressure on female researchers to have a higher than average impact.

He talked about various evaluation methods like crowdsourcing, expert opinion ... He looks at how these correlate in IMDB as a way of testing them. That allows them to identify the evaluation metrics that correlate well and so might carry over to research. Citations actually seem to be one of the best metrics.
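One common way to check how well two evaluation methods agree is a rank correlation. The sketch below computes Spearman's rank correlation in plain Python. It is an assumption that something like this fits the comparison described; the talk did not say which statistic was used, and the scores are invented.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented scores for five films under two evaluation methods.
crowd = [7.1, 8.3, 6.0, 9.2, 5.5]
expert = [6.5, 8.0, 6.8, 9.0, 5.0]
print(spearman(crowd, expert))
```

A value near 1 means the two methods rank the items in nearly the same order; a value near 0 means they barely agree.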

He is also looking at research in action. What works within research teams? To get at this they used soccer, looking at the density of passing and its correlation with scoring. Teams like Spain that pass a lot win a lot. Can one generalize to research? He showed interesting visualizations of collaborations in his teams. What is true for soccer seems to be true of research teams.

William Kath: This is your brain on QUEST: Computational modeling of neurons

Kath started by talking about how neurons work and how they compute. He showed some fascinating animations of dendrites and their complexity and connections. Then he talked about brain activity and showed an interesting video of a rat making decisions with sonification of the neuron firings. Imaging is just coming online and transforming our understanding.

He talked about QUEST - an HPC system at Northwestern that he uses to model how neurons fire and then run simulations of the model. They have modeled an AND gate that makes decisions based on comparison.
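To illustrate what an AND gate means for a neuron, here is a minimal sketch using a simple threshold unit: it fires only when both inputs are active. This is a textbook abstraction, not Kath's biophysical model, and the function names, weights, and threshold are assumptions.

```python
def threshold_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted input sum reaches the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def and_gate(a, b):
    # Each input alone contributes 1.0, which is below the threshold of 2.0,
    # so the unit fires only when both inputs are active.
    return threshold_neuron([a, b], weights=[1.0, 1.0], threshold=2.0)

# Print the full truth table.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_gate(a, b))
```

Models run on an HPC system would track membrane voltages and dendritic structure over time; the threshold unit only captures the logical behavior.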

Obama has funded the BRAIN initiative. As long as we can't easily experiment on brains we will need computational models.

Eric Huls: Guest Keynote: How Analytics is Changing Everything, Including Insurance

Huls is a Senior VP for Allstate who leads a big data analytics team. They have a partnership with Northwestern hiring students. His talk was on the analytic revolution.

In his field analytics is understanding patterns in data to help make decisions or to create a product that others can use to make decisions. Data science is the intersection of all the skills that are needed including domain knowledge, hacking skills, and then math and stats. In practice few people have all the skills so it is teams that do data science. They look for people strong in one or two fields who can work with others.

The history of analytics: Lloyd's of London started as a coffee shop where people would meet to negotiate insurance for ships. The analytics back then was about looking at the characteristics of the ship and voyage to decide how to underwrite a trip. Early uses of analytical technology included cryptography, simulations for the atom bomb, and so on. Business uses got started in the 50s and 60s. In the 1970s they got real-time analytics. In the 2000s it took off (big bang). It has become ubiquitous.

What they see now is mass personalization (of advertising and other services), real-time advice, and self-tracking.

He talked about the analytic decision making process. Analytics is a buzzword and many have unreasonable expectations. You need to have goals in mind. Then you take the constraints into account. You have tactics and data. All of these allow you to create or use models. If you can define the goals and constraints then you can try to solve the problem mathematically.
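As a toy illustration of turning goals and constraints into something solvable, the sketch below searches every way of spending a budget across two tactics and keeps the plan with the highest value. Everything here is hypothetical: the function name, the numbers, and the choice of exhaustive search are not from the talk.

```python
from itertools import product

def best_plan(values, costs, budget, max_units=3):
    """Exhaustively search unit counts per tactic.

    Goal: maximize total value. Constraint: total cost within budget.
    """
    best, best_value = None, float("-inf")
    for plan in product(range(max_units + 1), repeat=len(values)):
        cost = sum(n * c for n, c in zip(plan, costs))
        if cost > budget:
            continue  # this plan violates the budget constraint
        value = sum(n * v for n, v in zip(plan, values))
        if value > best_value:
            best, best_value = plan, value
    return best, best_value

# Two invented tactics: value per unit, cost per unit, and a total budget.
plan, value = best_plan(values=[5, 4], costs=[3, 2], budget=7)
print(plan, value)
```

Exhaustive search only works for tiny problems; real cases would use linear or integer programming, but the structure (an objective plus constraints) is the same.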

A lot of analytics is around retail and information gathered by retail stores. Even if you don't connect to a store's wifi they can pick up data from your cell phone and track your path (as web sites do). Then they can optimize store layout. When you pay with a credit card they can then connect to financial data. He talked about the Target/pregnancy story.

Then he talked about love and how it (on dating sites) often starts in a Hadoop cluster. He quoted a statistic to the effect that twice as many relationships that start online result in marriage as those that start offline. Marriages that started online report fewer divorces and greater satisfaction.

He then showed us the variety of ways analytics is used in insurance. He talked about how much growth there is in the field. They plan to hire some 40 data science folk at Allstate.

He was asked about ethics. He answered that a) there are a lot of regulations that they have to pay attention to, and b) internally they review all new products and think about the implications. They also have clear controls on personally identifiable information.

He then talked about how they now want to improve our driving rather than just insure us. It is about more than correlation - now with car data they can help people drive more safely.

Somebody asked about driverless cars. Allstate doesn't want any one person to have an accident, but if people don't have accidents then no one will buy insurance. Driverless cars may cut down the need for driving insurance, but there are other opportunities for fleet insurance that could open up.

Allstate has more data than just about anyone other than State Farm.

I asked what information they would like to have and he answered that it would be useful to have biometric data for life insurance and that it would be useful to have predictive information about life changes that would allow them to market products at key moments. For example, when people get married they go from two policies to one. Allstate would like to know beforehand that you are getting married so they can be the one.

Jillana Enteen: Technologies, Transitions, Translations: Analyzing Gender Reassignment Surgeries and the Thai Medical Tourism Industry

Jillana Enteen is the co-director of the NU DH Lab. She started by talking about the Thai Medical Tourism Industry and one outfit in particular. They (Yanhee International Hospital) recently started advertising to the west on the web and have had to use the language of the audience. The 2003 model was an all-inclusive package of tourism and medical tourism. By 2009 it was called Sex Reassignment Surgery and it was not about customizing a package. In 2014 they made a parallel site called Dream Come True and it has a special site for Sex TransGender Surgery. They now talk about western standards and use language they think a western audience would appreciate.

She talked about the challenges she has grabbing web sites (sometimes with Flash) and creating a research database.

I was struck by the interesting challenges of her project. She needs better scrapers. She needs ways of structuring her data for research. How do you study interface changes?

Emily Curtis Walters: Motion Pictures and Pictures in Motion: Reconstructing Imaginative Landscape of the WWII Generation in Leeds, 1915 - 1940

Walters is a PhD student looking at how people talked about moving pictures. She gave an example of someone who claimed to be a conscientious objector based on seeing films like All Quiet on the Western Front. Films resurfaced many times as evidence for being a pacifist. This led to debates about the effects of cinema on perceptions of war. WWI led to a war boom of books and other media about war.

These phenomena are but one way that the generation of the Great War passed on their experiences. "Daddy, what did you do in the Great War?"

She is planning to create a database of all works of film, theatre and other works of art that communicate about WWI. She is mapping out the stories about the war that people would have experienced.

She then walked us through how she is recovering the reception history of a play/movie about WW!II which she is mapping. She uses the Torq system to visualize the life of stories.




Page last modified on April 15, 2015, at 07:52 AM