Note: this post is a guest post by Jerome Kelleher. Please also
see his letter to Bioinformatics
on this topic.
Overview
- Many journals provide a mechanism to publish computer
programs in the form of application notes or software
articles. These notes
are subject to peer review, but there is no explicit
requirement that the quality of the application be
taken into account.
- Even if reviewers examine the source code, there is no
reason to expect that they can provide an informed
opinion on its quality. Reviewers are experts in the
application domain but not necessarily in the programming
language(s) in question.
- A system of lightweight code review would greatly improve
the quality of the applications available to the
community, and incentivize well-written code. Along with
the existing scientific review, source code for an application
should be reviewed by an expert programmer who would provide
an opinion on the overall quality of the application, as well
as feedback to the developers on how it may be improved. The
code reviewer would not be expected to understand the application
in any deep sense, and would provide feedback only on the
obvious signals of quality, such as structure, style,
consistency, installation instructions, portability,
user interface design and so on.
- Since code reviewers do not need to understand the application
domain, they can be drawn
from the wider developer community. Ideally, contributors
to free and open source projects should be recruited, since these
are people with verifiable expertise and a history of contributing
their time to a common good. Reviews should not be too
time-consuming (a maximum of one hour, say), and there should
be public acknowledgement of a reviewer's contribution to a
journal.
- The benefits would include an immediate improvement in the quality
of published applications. The majority of problems in scientific
applications are not caused by fundamental errors in the
methods, but by superficial issues that arise through
inexperience. Introducing this limited form of code review
would also yield valuable insights for the larger goal
of making all scientific code subject to peer review.
The problem
A couple of years ago, I reviewed an application note for Bioinformatics.
As part of the review process, I downloaded the Python source code and examined
it to get a feel for the quality of the application. It was so poorly written
that I immediately formed the judgment that the application could not
be trusted, and that the note should be rejected.
The source code showed clear evidence of being written by
an unsupervised novice programmer. For example, one function
took 25 parameters and was 1130 lines long. Within this function,
there were 107 conditional statements, indented by up to 9 levels.
The same 18 lines of code were repeated (complete with
identical comments) 36 times, with only integer constants
changing between each repeated block.
Basic mistakes such as these had been made throughout the
code base.
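To make this concrete, here is a minimal hypothetical sketch in Python of
the repetition anti-pattern described above, together with the obvious
refactoring. The function names and constants are invented for
illustration; they are not taken from the reviewed application.

    # Anti-pattern: the same block copied for each case, with only the
    # integer constants changing between copies (invented example).
    def score_case_1(values):
        total = 0
        for v in values:
            if v > 10:  # threshold for case 1
                total += v * 2  # weight for case 1
        return total

    def score_case_2(values):
        total = 0
        for v in values:
            if v > 20:  # threshold for case 2
                total += v * 3  # weight for case 2
        return total

    # ... and so on, one near-identical copy per case ...

    # Refactoring: move the constants into data and write the logic once.
    CASES = {1: (10, 2), 2: (20, 3)}  # case -> (threshold, weight)

    def score(values, case):
        threshold, weight = CASES[case]
        return sum(v * weight for v in values if v > threshold)

    print(score([5, 15, 25], 1))  # 80, the same answer as score_case_1

Spotting this kind of duplication, and suggesting the fix, requires no
knowledge of the application domain at all.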
The application note was rejected, as the other reviewers also had
serious issues with the work. Neither of them, however,
brought up the issue of the quality of the code, which in my opinion
was sufficient grounds for rejection, regardless of whether
the application appeared to work or not. The peer review system
worked in this instance, but it had failed before. This particular
application had been published by another journal (the authors
had been seeking a "version 2.0" publication at Bioinformatics),
and had been used and cited in several papers.
Application notes play an important role, as
they provide a forum for researchers to publicize useful code.
Most importantly, they provide a mechanism by which researchers can
be given due credit for the work involved in developing these
important tools. Without the reward mechanism of application note
publications and citations, there is simply no incentive to develop tools that
are useful to the wider community.
These tools, however, are often of lamentably low quality. Installation
can be difficult, and documentation sparse and poor. User interfaces
(command line or graphical) rarely follow established
conventions, making each tool a challenge to use.
The quality of source code is often
rather poor, making any reuse or modification very difficult.
The problem is that there is no requirement that the reviewers
of an application note examine the source code.
Most reviewers will download the application and ensure
that it installs and runs on example data. The main goal
of a review is to assess the novelty and importance
of the methods, not the quality of the software.
Because there is no assessment of quality,
there is no incentive for quality.
I believe that this is a major flaw in the review
process for scientific software, and so
I wrote a letter to the editors of Bioinformatics
in which I suggested the introduction of a system of code
review for application notes. The editors declined to
publish this
letter.
Lightweight code review
My suggestion is to introduce a system of lightweight code review for
application notes. For many open source projects,
code review is an intrinsic part of the development process. A
developer submits a patch implementing some new feature
for review. Other developers then assess this
patch, to ensure that it meets the required standards, and
only after it has been reviewed will the code be committed.
There are other forms of code review, but these usually
involve one developer trying to fully understand
the code another has written in order to improve code quality.
This type of code review would not be feasible in the context
of application notes. It would be far too laborious, because it
would require a reviewer to fully understand all aspects of
an application they have never seen before.
Instead, I suggest a much more lightweight review process. Code
should be assessed for the more obvious signals of
quality: readability, consistency, documentation, structure
and packaging.
Code should follow the idioms of the language involved
and should consistently follow a well-known coding style. It
should be well documented and appropriately commented.
The structure of the code should be clear, and it should
be straightforward to find where a particular piece of
functionality is implemented.
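As a hypothetical illustration of these signals (the function and data
are invented for this post), compare a fragment that ignores the idioms
of Python with one that follows them:

    # Harder to review (invented example): manual indexing, a redundant
    # comparison against True, and no docstring.
    def get_names_bad(records):
        names = []
        for i in range(0, len(records)):
            if records[i]["active"] == True:
                names.append(records[i]["name"])
        return names

    # Easier to review: direct iteration, a comprehension, a docstring.
    def get_active_names(records):
        """Return the names of all active records, in order."""
        return [r["name"] for r in records if r["active"]]

    records = [{"name": "a", "active": True}, {"name": "b", "active": False}]
    print(get_active_names(records))  # ['a']

Both versions behave identically; the difference between them is visible
to any fluent Python programmer, without any domain expertise.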
There is a problem, of course, in finding people to
perform these reviews. It would seem that we need reviewers who are
experts both in the application domain
and in the programming language(s) in question,
and such people might be very difficult to find.
The solution to this problem is to split the
responsibility: experts in the application domain review
the manuscript and application for novelty and utility
to the community, and expert programmers review the source code
in terms of its apparent quality.
We are then free to choose code reviewers from anywhere, even
from outside of science. An expert programmer could perform
a lightweight review in less than an hour,
and so the demands on their time should not be too great.
Careful guidelines would be required for reviewers to ensure
that reviews remain helpful. For example, it should be made
quite clear that disagreements on aesthetic grounds are
not appropriate. Once code is consistent, idiomatic and
follows some well-known style, it does not matter which
particular style it follows.
Reviewers must be experts in the language that the application
is written in, and must have a proven track record of
delivering high-quality software. Convincing such people
to donate their time to review scientific code is the
most difficult part of this process, but there are
encouraging signs that it may be possible.
In a recent pilot study,
Mozilla engineers reviewed snippets of
source code from previously published PLOS Computational Biology papers.
This study found that
the engineers were happy to perform the reviews, but
were frustrated by how shallow their reviews had to be,
given their lack of domain expertise.
All of the reviewers indicated that they were willing
to continue to participate in scientific code review
"provided they felt they could contribute something of value."
Such shallow (or lightweight) reviews are
precisely what is required for application
notes. By writing reviews, pointing out the weaknesses of
applications and indicating how to fix them, reviewers can enter
into a dialogue with application authors. Their expertise would
be highly valued, and would provide a great service to the academic
community, helping to improve the quality of the software
used to progress science.
The benefits
It may seem that there is little point in such a lightweight
code review process. If a reviewer is only going to spend
an hour browsing the code and writing a report,
then they will surely miss all
sorts of bugs. This is true, and bugs will
remain in applications regardless of how much review
is done. The point is not to make the software published
in application notes perfect, as this is impossible. The point is
to try to ensure that published software is
reasonably good, as this would be a major improvement.
There would be many benefits. The first would be an
increase in the quality of the applications published.
Through the influence of the expert reviewers, software
would be easier to install and use. Feedback through
reviews may also result in improved performance, since
reviewers may point out ways in which implementation
could be improved. Ultimately, reviewers may choose to
contribute code directly to a project, if they
see that their help would make a difference and be
appreciated. This influx of experience and knowledge
would be a huge boon for science.
Another benefit would be an increased realization that
significant applications cannot be written by untrained
novice programmers. Applications submitted that are of very
low quality (such as the example discussed above)
should be immediately rejected. A few manuscript
rejections may be required before the message filters
through, but the message will surely be received
eventually. Hopefully, this will help foster a culture
in which students are trained to program effectively
as a matter of course.