Thoughts on open science -- my response to Eli Kintisch

Eli Kintisch (@elikint) just wrote a very nice article on "Sharing in Science" for Science Careers; his article contained quotes from my MSU colleague Ian Dworkin as well as from me.

When Eli sent me an e-mail with some questions about open science, I responded at some length (hey, I type fast and have lots of opinions!). Having been through this before, I knew that he would use maybe four sentences at most, but that seemed like a waste! So I told him I'd like to post the entire e-mail response once his article came out. Here it is! I've edited a few things to make them clearer or 'cause I felt like it.


  1. How has sharing code, data, R methods helped scientific research in one or two specific examples?

One personal story -- we have a technical approach (called digital normalization) that we have yet to publish. We posted a paper draft on a preprint site and made the software available almost 3 years ago, however. The approach makes certain kinds of difficult sequence analyses much easier, and makes some other previously impossible analyses now possible. (We've published some of these results -- see www.pnas.org/content/early/2014/03/13/1402564111.abstract -- just not the approach itself.)

Since we made the approach available, hundreds or thousands of people have used it. A derivative approach is now part of two major software packages (the Trinity and Mira assemblers), it's been used in about 15 publications, and is soon to appear in dozens more, and it's basically been quite a success. All prepub.

It's hard to cite another clear example in my area of expertise, in part because in genomics data and software tend to be freely available prepub, so it's more the default. There are big wins like the human genome project and the ENCODE project, where the data (and code, in the case of ENCODE) were made freely and openly available, and this has obviously accelerated science.

Hmm.

This is a good article on the state of things in my field (the top bit, at any rate):

http://massgenomics.org/2013/06/data-sharing-embargo.html

  2. How has data sharing impacted your career? Do you think it could help you get tenure? Some say that faculty committees do not consider this.

My career has developed in large part because I've been open about everything. In addition to making my software methods open, and posting my papers, and making all my educational materials available, I blog and tweet about everything. I'm pretty well known largely because of this.

So I would make three points.

One is, going forward, data (and code) sharing will be an expectation. Program managers and funding organizations are eager to see their funding put to maximum use, and this includes making as much data and code available as possible.

Second -- because of this, it will soon be the norm for new faculty. You will no longer stand out if you do it (like I do now); instead, you will stand out if you don't do it (because then you will no longer get published or receive grants). It may take 5-10 years for this to be the new norm but I think it's inevitable. Of course, many details need to be worked out, but the incentives for doing so are going to be there.

Third -- everyone knows what tenure and promotion committees care about. It's funding and reputation (which usually means papers and invitations to speak). These can be beneficiaries of openness, and in fact I would argue that my success so far is a direct example: I have received grants where the reviewers cited my openness, and many of my invitations to go speak (i.e. reputation builders) are from my online presence. So I don't expect what committees care about to change, but I do expect the paths to those reputational results to expand in number and variety to include openness as one of the core mechanisms.

  3. What barriers exist in your mind to more people sharing their work in this way?

Culture and training are the two main ones. The culture discourages people from doing it, because it's new and edgy and people don't know how to think about it. The training is a block because even people who are ideologically disposed towards openness (which is, in fact, most scientists, I believe) don't know where to get started.

  4. Is R Open Science important and if so why?

I'm the wrong person to ask :). rOpenSci are doing things right, as far as I can tell, but I don't actually use R myself so I've never used their software. So from my perspective their main utility is in showing that their approach isn't fatal. Which is awesome.

  5. Any practical tips to scientists you'd share on making sharing easier?

Just ... get started! Post figures and posters to figshare. Make your published data openly available and easy to use, i.e. in the format that you would have wished for it to be delivered in for your paper. Make your methods explicit. Post preprints at bioRxiv if the journal is ok with it. If there's a technical barrier that prevents you from doing something, note it and move on, but don't let it stop you from doing something else.

--titus
