These are very rough notes on the Digital Infrastructure Summit 2014 in Ottawa, January 28th and 29th. The Twitter hashtag is #LCDI. These were written live so there are lots of misunderstandings, typos and gaps.
This summit was organized by the DI Leadership Council, of which I was a member. You can see more about the Council at http://digitalleadership.ca/. In particular, see the documents prepared on the Canadian DI Environmental Scan and the International DI Scan at http://digitalleadership.ca/resource/di-summit-2014-charge-participants/. These documents give an excellent picture of the state of digital research infrastructure in Canada at this point and of the challenges ahead.
One of the key messages of this summit should be that the Leadership Council was just a gathering that should, as soon as possible, be replaced by a coordinating group that can actually get things done. We were a ladder, not the roof.
Jay Black (SFU) & Steven Liss (Queen's): Co-Chairs
Jay and Steven talked about our goals for the summit:
- Top 5 actions that need to be taken
- Identify how stakeholders can come together
- What role do we see for a policy framework – who should write it, what should be in it, how should it be updated
- We need a timeline and roadmap
- We need commitments from stakeholders on how to advance agenda
Chad Gaffield (SSHRC)
Chad talked about how we are at a time of deep conceptual change. We have gathered a unique coalition of the like-minded.
Chad talked about how we want something robust that includes many forms of learning (not just research). It needs to be accessible across the country and different … It needs to be sustainable. It needs to be an ecosystem that embraces collaboration and coordination. It needs to be innovative in the sense that it is not just a better way, but a different way of doing things. We want to embrace new ways to think about learning, research and interaction.
What are we going to do? Our mission is to take steps forward. We will do that through engagement, coordination, and collaboration. The TC 3+ has been leading on data management.
He talked about a T shape that is guided by the coalition of the like-minded so that individual stakeholders can build depth.
Initially it should be funded by existing resources. It can be built.
Now we have to identify next steps: the who, where, and why. Chad compared this to the building of the first transcontinental railway to unify the country.
The Director of Platforms and Strategic Investments from CIHR talked about what they are doing. The Governing Council of CIHR has asked them to increase funding to investigator-initiated research. The Governing Council has also asked them to look at transformative and transparent strategic investments. And thirdly, they want to collaborate. They have a strategy for patient-oriented research (see http://www.cihr-irsc.gc.ca/e/41204.html). CIHR is working with partners on data platforms and services. They want data made available to researchers in an accessible, standardized, and linked fashion. They see large comprehensive datasets as a research platform; longitudinal data is research infrastructure for CIHR. How can they get data from different provinces standardized and combined? There are also interesting privacy and ethical issues.
He ended by saying that they are very open to thinking about things in a very different way.
Pierre Charest, NSERC
Pierre talked about collaboration, such as the Consultation Document from the TC 3+ (see http://theoreti.ca/?p=4983). He talked about the supportive environment. The ultimate goal is to support research and generate new knowledge (not to create infrastructure). New knowledge then becomes infrastructure for others. They see a cycle from data to datasets to insights that, when shared, create new data.
A representative from Genome Canada talked about investments in genomics. They need:
- High-speed networks
- Ability to handle really large data-sets (1000 trillion letters of code)
- Ability to analyze big data
- Ability to integrate different datasets with genomic data (like data about human cells or bacteria in us)
- Ability to make data available in ways that are connected to clinical data so that it can be interpreted for our health
These types of data are not just important to humans. We also have to study animal and plant data. Finally we have to be able to partner/keep up with international projects.
A representative from CFI talked about what they are doing, including a $56 million award in 2012 to stabilize Compute Canada. They are partnering with the different granting councils.
He talked about a Cartesian approach - a rational top down approach that we dream of, but rarely implement (and which is not very Canadian). Instead we have an approach where different stakeholders take leadership in different ways and different times. This may seem disorganized, but can be closer to needs of researchers.
Again he asserted that DI is not an end, it is about enabling world-class research to generate new knowledge and understanding. CFI is broadening their approach of what research infrastructure is. We need to balance needs for institutional resources and shared resources. They prefer shared resources (through Compute Canada), but recognize the value of local resources.
CFI is working on a Cyberinfrastructure Initiative for 2015. They want to build on achievements since 2006. They want to refresh the national platforms. There could be "thematic" initiatives around things like data. They like to see proposals based on long-range research plans.
They are listening for priorities and actions that have the potential to have lasting impact.
Questions and Answers
We had a session of questions from the floor. Some of the questions asked (with answers):
- What should be done now? (Start doing it and tweak later.)
- How do we deal with emergent complexity? (As we build it we will find new problems, but those are the problems of success.)
- What is the role of the private sector? (Sometimes industry is less interested in developing raw data that is needed. They get involved later.)
- Are there objective metrics for measuring success of digital infrastructure? How would we know it was a worthwhile investment?
- What about small datasets?
- Should my project be the model for all to follow? Something we all feel. (We all want to get more for our projects.)
Reflections: Some of the messages I heard in the first sessions include:
- Infrastructure is not an end in itself.
- Location of (digital) infrastructure doesn't matter.
- There is huge potential in government open datasets.
- A real temptation is to argue that your pet project should become infrastructure because then you get ongoing funding.
Janet gave us an overview, starting with what Digital Infrastructure is. You can see her slides at http://digitalleadership.ca/resource/di-summit-2014-presentations/. She gave a brief history. She talked about a long international scan out of which 3 things came:
- National DI initiatives are often framed by government policies - policy frameworks are important
- Multiple stakeholders have to be involved
- Early emphasis on physical infrastructure is changing to data focus
- More middleware/tools
- Rewards and incentives for good RDM
- Stronger innovation system
- Management and stewardship of data needs to be rewarded
- Provincial involvement
- Extracting value from international involvement
- No coordination
- Insufficient leadership
- Lack of coordination/leadership
- Lack of comprehensive policy
- Weakness in data management
Greg talked about broader social media engagement. He felt that the Leadership Council did a good job of establishing a voice and crossing sectors, but still needs to be broader. He led an initiative to build a digital presence at http://digitalleadership.ca/ where you can see a crowdsourcing campaign.
Dennis talked about how to move forward. A critical issue is WHO leads DI and HOW leadership operates. He argued for a coalition with working groups including groups that:
- Refine funding system
- Address weaknesses in data management
- Articulate value proposition upstream and downstream - what is it for?
Government is going to be very important in setting out high-level policy and funding frameworks. The private sector will gain innovation advantages from research findings.
The Leadership Council assumes acceptance of the problem statement. We have had a lot of talk; now it is time to make advances.
We want to make sure we have the top 5 issues that need to be addressed. We need recommendations for stakeholders. We need to flesh out the policy framework. We need to ask what we (organizations) can do.
What can you do now!
At this point we started working groups. I had to facilitate one, so I couldn't transcribe notes.
Working Group 1
Here are some of the priorities or actions that came out of the first working group session:
- Policies and standards appropriate to disciplines
- Better links between libraries, researchers, granting councils on standards
- A "system" that makes it easy for researchers to be compliant
- Best practices on standards and interoperability is needed
- Coordination of CANARIE, Compute Canada, RDC
- Environmental scan of resources that exist
- National data service
- Longer term staffing and support for DI
- Need for skills training in data science (and interpretation)
- Dialectic between policy and implementation - there needs to be an understanding of rules and expectations
- Need for tools and sharing of tools
- Better definition of the relationships of the whole, but who should lead?
- Further definition of data classification - this should flow to
- Communities of research are international - we need to be able to work with others
- At the local level, VPRs, libraries, computing units and researchers should be collaborating to develop institutional
- TC 3+ should issue call for proposals for studies/scans of issue
- Once one looks at data, we need a broader sense of industry that gets beyond the IT industry. Perhaps industry needs to be trained.
- Shared responsibility is no responsibility - who will get operational authority? A secretariat that supports coalition. TC 3 has policy authority. Industry Canada?
- Sustainable funding for infrastructure (project funding will not work.)
- Build on what others are doing?
- National data management strategy
- Perhaps Minister Clement should become champion
- Publicly accessible data is most important outcome
- Policy recommendations to government should provide examples of real behaviour that made a difference
- TC 3+ should just pull the trigger
Who should lead?
There seemed to be some consensus that a specific organization should lead on developing a more detailed policy document that would identify who does what rather than writing policy in a committee. The TC 3+ seemed to emerge as the one entity that could force movement from others. If the councils impose change through policies to get grants they can force change in researchers who will then prod other institutions to change.
Someone had the bright idea to identify a bunch of things that have been done and call them successes.
Working Group 2
The next working group round had to do with what issues or gaps are critical.
- No concierge: researchers have nowhere to turn for advice.
- We need long term funding appropriate for infrastructure. Industry Canada needs to provide sustainable funding.
- We need education as to what is already there.
- We need someone or entity that "owns" the issue of digital infrastructure
- We need to make clear the value of DI to researchers and citizens
- We need funding for research infrastructure experiments
- We need more than just digital research infrastructure and it isn't all data. There are all sorts of other types of stuff and processes that are needed.
- For many faculty the issue now is the renewal of the disciplines in the face of falling funding. Many are more worried about the survival of their fields than expensive infrastructure. We need to convince people that this is as important as graduate funding and postdocs.
- We need support for local (university) infrastructure. This should take the form of increased Indirect Costs for Research from TC 3+ for libraries and DI.
- There is an issue around retaining and training people.
- There are no easy ways into the issues. Research narratives could help explain infrastructure to others.
- We need to articulate the economic value of data and show the correlation between where there is DI and where there is lively activity
- We need to be aware of the history and sociology of infrastructure and learn from other big infrastructure movements (railroads, roads)
- Do we need one DI, or could there be a need for more than one?
- This is time of shrinking resources - how can we provide DI in time of tight budgets?
- There is need to have transparent reviews of bodies, especially DI bodies to make sure that they are really answering needs of research
- There is need for ongoing consultation with researchers and community researchers
- Scholarly organizations need to smash the egos and articulate community needs
- We need to build trust with industry. How can we leverage industry support?
- Can we create a service model where researchers don't need to know about plumbing?
- Plight of small researchers - they stand to gain the most with the provisioning of good DI
- DI could be a national asset that is funded centrally or a distributed good that is supported by many
- Need to move infrastructure prototypes forward to production systems
What if we gave all researchers DI coupons that they could give to other bodies who provide them the infrastructure they value?
The question was asked as to how much this would cost. One figure that was shared was that $250 million more a year would place us in the middle of the G8 in terms of expenditure per capita on HPC.
At the end of the day we heard about some interesting test projects that emerged out of tables. There are disciplines that are ready to go.
There was real energy coming out of the first day! This was a real change from the first summit. Everyone wants stuff to happen.
Michael Ridley began by summarizing some of what happened yesterday. Some of the things that came up:
- Developing a culture of stewardship around data and DI
- DI as a national resource
- Lots of discussion around the carrots and sticks - how to encourage and enforce stewardship
- Need for collaboration
- General agreement that we need sustainable leadership - need for champions (or concierge)
- Sustainable and predictable funding
- Talking about more than pipes and tubes
- Let's just do it - we know enough to move forward
Then he proposed that we divide into groups to discuss the following topics:
- DI Services, Tools, Software
- Possible Recommendations re Funding
- Data Management Plans (implementation)
- Priority Pilot Projects
- Metrics and Evaluation
- Inculcating a Culture of Data Stewardship
- Private Sector: Future Discussion
I was on the Metrics and Evaluation breakout group and we spent a lot of time discussing evaluation of researchers and how to recognize good data stewardship. The breakout groups were asked to also comment on what sort of coordinating body is needed.
Services, Tools, Software, Infrastructure
- Assumption: Good research infrastructure for research
- Guiding Principle: Develop anew only when nothing exists
- Open Standards
- What is doable in the short term = Development of a service catalogue
- People, software, processing, and storage
Approach is to have a group that would develop service catalogue which involves a number of stakeholders. It is important that universities are involved.
- Need for long-term, strategic funding.
- Need for flexibility for funding
- Few felt we needed more, but first step is spending existing money more effectively aligned with strategic approach
- CANARIE and Compute Canada should be merged or be
- Need to fund not just tubes, but tools and people
- One can't put expectations on universities unless there is more indirect costs of research
- We don't need more administrative layers. Before there is a coordinating body, there needs to be leadership and build consensus
- More useful to develop vision and then identify solutions, but need to get started on some solutions
- Interesting to look at Compute Ontario
- Tri-councils shouldn't be involved in funding infrastructure as then DI could compete for research dollars
- The challenge is what federal gov, provinces, and universities should fund. What should be funded at what level. Standards, international agreements should be dealt with at the federal level
- Hesitations around user pay model
- Though there is a good idea about responsibilities around the network, there isn't an understanding around storage, processing or people
- The Leadership Council was for problem definition, now we need a short term governance unit and then long term governance unit
- Accountability is important
- Advocacy agency to establish co-aligned services
- Perhaps hitch on to existing agency like CANARIE
- Leadership can no longer be done off the side of the table. We need a funded group to do it.
Data Management Plans
- Raise awareness and educate on data management plans
- Keep standards and requirements simple
- Need different plans for different disciplines - need for templates
- Replicate Compute Canada model with network of analysts that can support different communities
- How to get started? Do we make requirements or pilot projects?
- Data access could be connected to ethics review
- Importance of institutions and peer review in data management policies
- A draconian requirement will be a paper tiger - we need to manage the introduction of the emerging requirements
- I believe that data management plans can be presented as a form of publication - explain to academics how this is way of ensuring their research is recognized and used
- There are IP, legal, and privacy issues
- Data management is not just for researchers, but institutions too
- Open data is not the same thing - don't mix them up
- Coordinating body doesn't need to be heavy handed - it should evolve and have carrots
- Interest groups already exist - how can they be leveraged
Pilot Project Group
- CARL could network to provide some common services across universities
- Take astronomy and build on foundation of CADC to expand to a couple of other disciplines
- They talked about service development rather than equipment development
- What methods/solutions could be applied across different organizations
- A lot of groups have it in their mandate. Let's not reinvent the wheel. Why get rid of the Leadership Council when it's doing a good job?
Metrics and Evaluation
This is the group I was on.
- If we have a goal, we have to have a way to evaluate achievement
- We need a logic model of inputs and outputs
- The evaluation model will need to be adapted as we see how measurement
- Measuring should also communicate what can be done
- Big data analytics could be part of the measuring
- Evaluation should make a difference to researchers - change the tenure and promotion culture so that good data stewardship is recognized
- Build on examples where evaluation is working
- Scholarly associations can be part of changing the culture
- Genome Canada is a model for what can be done
- The act of assessment is itself a recognition of importance
- We proposed a specific action = for RDC to host a multi-sector working group with a 6 month and 2 year mandate to report on the evaluation of data stewardship
- Larger body needs to be modular with input from TC 3+
Culture of Stewardship
- Need for policy from TC 3+
- Citation and recognition of data stewardship
- Will need quality assurance programs
- Need to locate value proposition
- The group doesn't want to add culture requirements without removing anything - don't add tasks without support
- Coordinating body needs to be an academic body that can guide training, education and policy
- Coordinating body should be small, agile, doers, have buy in, ...
- Need to look at tenure and promotion process
- Need for HQP education
Private Sector, Government, and Academia Continuum
- Need to look at an ecosystem that is larger and more inclusive
- Industry is both service provider and user of data - how can business be grown with data?
- Design process to involve industry right at beginning
- Training of HQP is important to industry
- Industry has its own solutions, like the Open Geospatial Consortium
- Financial sector is also important - they handle a lot of data - what can we learn from them and their investment teams
- How to get government interested? Government has lots of data too.
- Think outside the box on coordination - set it up as a think tank at the highest levels
- Leverage the bodies in play today as there is a cost to building completely new coordinating bodies
- Look at short term actions that can be achieved by a coordinating body
Chad Gaffield was our final speaker. He began by thanking the people who prepared background papers and organized the summit.
He addressed the issue of why SSHRC is so interested in these issues. The single most important thing happening in the social sciences and the humanities is the paradigm change into digital scholarship. It is not a technology-only issue. It is about people, concepts, and processes. The excitement is that we have reached a point where we can make progress in understanding the digital age. We have learned a lot and have a lot of success stories. We have faced similar challenges (like the TC 3 ethics protocol). We are learning that in the digital age there will be filtering, whether merit review, peer review, archival accessioning, and so on. Similarly on the curation side, we have a lot of expertise in curation - let's apply that to the digital.
We can't forget that research cultures have traditions and that not all people see the value of DI. We need training, education, dialogue with different traditions. There need to be incentives and recognition of digital work.
Collaboration isn't easy. It has costs, even on local campus level. We need to get provosts and presidents on board. Digital is horizontal - it cuts across sectors. Industry is very important - they have a role beyond just provisioning. They are involved in co-creation and talent.
TC 3+ is committed to coherent and cohesive approach to digital research infrastructure. Not just on policy side, but on open data side.
He talked about drift in the word infrastructure and how he sees it as including data and literacy. We need a robust ecosystem where literacy and data are part of infrastructure (not just the pipes and compute cycles).
He ended on the question: Can we build a robust digital ecosystem in Canada? Yes - we can do it for the benefit of everyone now and for our descendants.
Steven Liss ended the day by calling us to not let this opportunity pass us by. Rubber now has to hit the road (or ice in the case of Edmonton.) We have to do something or we will look back on this Summit as another wasted gathering.