tl;dr? A while back I wrote that there are three uses of research
replication, reproduction, and reuse. The world of computational
science would be better off if people clearly delineated whether or not
they wanted anyone else to reuse their software, and I think it's a
massive mistake to expect that everyone's software should be reusable.
A few months back, I reviewed a pretty exciting paper - one I will
probably highlight on my blog, when it comes out. The paper outlined
a fairly simple concept for comparing sequences and then used that to
develop some new ultra-scalable functionality. The theory seemed novel,
the computational results were pretty good, and I recommended acceptance
(or minor revisions). This was in spite of the fact that the authors
stated quite clearly that they had produced largely unusable software.
Other reviewers were not quite so forgiving, however -- one reviewer
declined to review the paper until they could run the software on
their own data.
This got me thinking - I think I have a reputation as wanting people
to release their software in a useful manner, but I've actually shied
away from requiring it on several occasions. Here was a situation
that was a pretty direct conflict: neat result, with software that was
not intended for reuse. Interestingly, I've drawn this line before,
although without much personal comment. In my blog post on review
criteria for bioinformatics papers,
there's nothing there about whether or not the software is reusable
- it must just be legally readable and executable. But I'm also
pretty loud-mouthed about wanting good quality (or at least better
quality) software out there in the bioinformatics world!
So what gives? I felt that the new theory looked pretty awesome,
and would be tremendously useful, while the implementation was (as
stated) unlikely to be something I (or others) used. So what? Publish!
I think this highlights that there are two different possible goals
for bioinformatics papers. One goal is the standard scientific goal:
to demonstrate a new method or technique, whether it be mathematical
or computational. The other goal is different, and in some ways much
harder: to provide a functioning tool for use and reuse. These should
have different review standards, and that maybe the authors should be
given the opportunity to distinguish clearly between the two goals.
There's actually a lot of commonality between what I would request of
the software from either kind of paper, a technique paper or a tool paper.
- Both need to be accessible for download and viewing - otherwise, how
can I understand the details of the implementation?
- Both types of software need to be usable enough to reproduce the
results in the paper, in theory (e.g. given sufficient access to
- Both should be in a publicly accessible and archived format, to
avoid loss of the software
from personal Web sites, etc.
- Both should show evidence of decent principles of basic software
engineering, including the use of version control, some form of
testing (albeit unit testing or functional testing or even just
defined input with known good output), release/version information,
installation/dependency information, and the like.
However, there are some interesting differences. Off the top of my head,
I'm thinking that:
Crucially, the software from the technique paper would not need to
be open source - by the OSI definition, the technique code would not
need to be freely modifiable or re-sharable.
(To be clear, I know of neither any formal (journal) requirements
nor ethical requirements that a particular implementation be
modifiable or redistributable.)
Nor need the software from the technique paper be written in a
general way (e.g. to robustly process different formats), or for
broader re-use. In particular, this means that documentation and
unit/functional tests might be minimal - enough to support replication
but no more.
The software from the technique paper should be accessible to basic
review, but should not be subject to code review on style or
implementation - correctness only.
Software from a "tools" paper, by contrast, should be held to much
higher standards, and be subject to code review (in parts) and
examination for documentation and installation and ... oh, heck,
just start with the sustainability evaluation checklist at
I'm aware that by having such relaxed constraints on technique
publication I'm more or less directly contradicting myself in my
earlier blog post on automated testing and research software
- all of that definitely holds for software that you hope to be
I'm not sure how or where to draw the line here, exactly. It's
certainly reasonable to say that software that doesn't have unit tests
is likely to be wrong, and therefore unit tests should be required -
but, in science, we should never rely on a single study to prove
something anyway, so I'm not sure why it matters if software is wrong
in some details. This is where the difference between
"replicability" and "reproducibility" becomes
important. If I can't replicate your computation (at least in theory)
then you have no business publishing it; but reproducing it is
something that is a much larger task, outside the scope of any given
I want to quote David States, who wrote a comment two years ago on my blog:
Too often, developers work in isolation, and this creates a high
risk for code errors and science errors. Good code needs to be
accessible and this includes not just sharing of the source code
itself but also use of effective style, inclusion of tests and
validation procedures and appropriate documentation.
I think I agree - but what's the minimum, here, for a technique paper
that is meant to be a demonstration of a technique and no more?
One final point: in all of this we should recognize that the current
situation is quite poor, in that quite a bit of software is simply
inaccessible for replication purposes.
(This mirrors my personal experiences in bioinformatics review, too.)
Improving this situation is important, but I think we need to be
precise about what the minimim is. I don't think we're going to get
very far by insisting that all code be held to high standards; that's
a generational exercise (and part of why I'm so bullish on Software
So: what's the minimum necessary for decent science?
p.s. In case anyone is wondering, I don't think our software really meets my own criteria for
tool publication, although it's getting closer.
p.p.s. Drawing this distinction leads in some very good directions for
publishers and funding bodies to think about, too. More on that in
another blog post, if I get the chance.
p.p.p.s. My 2004 paper (Brown and Callan) has a table that's wrong due
to a fencepost error. But it's not seriously wrong. shrug
There are comments.