Nik Sultana, a postdoc in Cambridge, asked me some questions via e-mail, and I asked him if it would be OK for me to publish them on my blog. He said yes, so here you go!
- How is the quality of scientific software measured? Is there a "bug index", where software loses points if it's found to contain serious bugs in, say, a 6-month period? If not a "bug index" (with negative points) then something like an "openness index" perhaps, with points gained by more better-quality code being available?
There is no formal measurement. I've put some points here -- http://ivory.idyll.org/blog/ladder-of-academic-software-notsuck.html. Happy to discuss more, but there's nothing formal ;)
- The attendee list for the CW14 workshop seemed to include more bio places represented. Are there any subjects/communities within science that are more conscious of the need to share code and replicate computations than others, in your experience?
Hmm, genomics and bioinformatics are actually pretty good. I think the code quality sucks, but the point of sharing is obvious to genomicists -- so much of what we know about biology comes not from de novo analysis but rather from comparative analysis across species and experiments.
Chemists seem to be worse.
- I only came across initiatives such as SSI, myexperiment, figshare, datadryad, etc after I submitted my thesis. Do you think I should ask the uni (and previous places where I studied) for my money back? (No just kidding, please skip to the next question.)
- What reassurance would you give to members of the public regarding the state of reproducibility in science? In recent writing on scientific reproducibility it is often pointed out that there have been a spate of rather high-profile retractions and corrections. How do we know there isn't some research-equivalent of Bernie Madoff somewhere?
I've seen many assurances that most unreproducible stuff seems to be unintentional - the result of sloppy work habits, etc. Moreover, in my experience, most scientists are clearly aware of the goal of reproducibility, they just don't know how to do it (esp in computational work). That having been said, anyone who fabricates data or results is going to keep very quiet about it, and I've certainly heard a number of stories.
A Bernie Madoff-style scandal is tough in a field where no one really has that much money or that high a profile. That having been said, look up Hendrik Schoen and Diederik Stapel...
- You've done a PhD, and have worked on various initiatives on establishing good practices in science since then. What one-sentence piece of advice would you give to young researchers starting out in science, on the theme of reproducibility of their experiments?
Just do it; your life and career will be richer for it.
- Do you think that the currently established system in academia or industrial research takes openness + sharing into account for career advancement? What incentives are in place, or what incentives should be made in place?
No. This is a longer discussion -- I can send you another discussion I had with a Science magazine journalist if you're interested (since posted -- see my response to Eli Kintisch) -- but everything is indirect. My openness has been great but only in an indirect fashion, in the sense that my primary metrics (grants and papers) have benefitted.
For true change, the funding agencies and journal article reviewers need to provide the incentives. NIH is starting to step up. Also see the new REF guidelines in the UK re open access -- you can see how the incentives are working there.
Even if we had stable and painless-to-use technology and procedures for sharing code + data, there might be hold-back for two reasons:
- Commercial interests might be at conflict with openness, since the latter can disadvantage commercial exploitation.
- Scientists might fear giving other scientists an advantage, or having a mistake found in their work.
Are these fundamental limitations of the "human science system", as it were, or are there ways around them do you think?
For (a), openness does not conflict with intellectual property rights. So I think this is simply not a concern.
For (b), those scientists seem to miss the point of science, which is to build upon others' work. This is where incentives to be open can be most useful.
- Do you track, or have you come across or heard of, similar infrastructure needs in humanities subjects? Presumably these subjects do need storage and collaboration technology, but maybe don't need computation as much as scientific subjects.
Digital Humanities. Talk to Ethan Watrall.