Nik Sultana, a postdoc in Cambridge, asked me some questions via e-mail,
and I asked him if it would be OK for me to publish them on my blog. He
said yes, so here you go!
- How is the quality of scientific software measured? Is there a "bug
index", where software loses points if it's found to contain serious
bugs in, say, a 6-month period? If not a "bug index" (with negative
points), then perhaps something like an "openness index", with points
gained as more high-quality code is made available?
There is no formal measurement. I've put some points here --
http://ivory.idyll.org/blog/ladder-of-academic-software-notsuck.html. Happy
to discuss more, but there's nothing formal ;)
- The attendee list for the CW14 workshop seemed to include a larger
representation of bio-related institutions. Are there any
subjects/communities within science that
are more conscious of the need to share code and replicate
computations than others, in your experience?
Hmm, genomics and bioinformatics are actually pretty good. I think the
code quality sucks, but the point of sharing is obvious to genomicists --
so much of what we know about biology comes not from de novo analysis
but rather from comparative analysis across species and experiments.
Chemists seem to be worse.
- I only came across initiatives such as SSI, myexperiment, figshare,
datadryad, etc. after I submitted my thesis. Do you think I should
ask the uni (and previous places where I studied) for my money
back? (No just kidding, please skip to the next question.)
Yes. :)
- What reassurance would you give to members of the public regarding
the state of reproducibility in science? In recent writing on
scientific reproducibility, it is often pointed out that there has
been a spate of rather high-profile retractions and corrections.
How do we know there isn't some research-equivalent of Bernie
Madoff somewhere?
I've seen many assurances that most unreproducible stuff seems to be
unintentional -- the result of sloppy work habits, etc. Moreover, in
my experience, most scientists are clearly aware of the goal of
reproducibility; they just don't know how to do it (esp. in
computational work). That having been said, anyone who fabricates
data or results is going to keep very quiet about it, and I've
certainly heard a number of stories.
A Bernie Madoff-style scandal is tough in a field where no one really
has that much money or that high a profile. That said, look up
Hendrik Schoen and Diederik Stapel...
- You've done a PhD, and have worked on various initiatives on
establishing good practices in science since then. What
one-sentence piece of advice would you give to young researchers
starting out in science, on the theme of reproducibility of their
experiments?
Just do it; your life and career will be richer for it.
- Do you think that the currently established system in academia or
industrial research takes openness + sharing into account for
career advancement? What incentives are in place, or what
incentives should be put in place?
Hahahahahahaha!
No. This is a longer discussion -- I can send you another discussion
I had with a Science magazine journalist if you're interested (since
posted -- see my response to Eli Kintisch) -- but
everything is indirect. My openness has been great but only in an
indirect fashion, in the sense that my primary metrics (grants and
papers) have benefitted.
For true change, the funding agencies and journal article reviewers
need to provide the incentives. NIH is starting to step up. Also see
the new REF guidelines in the UK re open access -- you can see how the
incentives are working there.
- Even if we had stable and painless-to-use technology and procedures
for sharing code + data, people might still hold back, for two reasons:
(a) Commercial interests might conflict with openness, since
the latter can disadvantage commercial exploitation.
(b) Scientists might fear giving other scientists an advantage, or
having a mistake found in their work.
Are these fundamental limitations of the "human science system", as
it were, or are there ways around them, do you think?
For (a), openness does not conflict with intellectual property rights.
So I think this is simply not a concern.
For (b), those scientists seem to miss the point of science, which is
to build upon others' work. This is where incentives to be open can
be most useful.
- Do you track, or have you come across or heard of, similar
infrastructure needs in humanities subjects? Presumably these
subjects do need storage and collaboration technology, but maybe
don't need computation as much as scientific subjects.
Digital Humanities. Talk to Ethan Watrall.
--titus