My review: "Troubling Trends in Scientific Software Use"

Late last year, inspired by a review I did of a Science submission, I wrote a blog post asking what people thought of the Insight Journal. This was in response to the submission's mention of Image Processing On Line.

The Science paper is finally out -- actually, I missed it; it came out in May -- and there's even a set of correspondence about it in the July letters.

Read this paper! It's really important. It even raises some of the same issues that were raised by the Assemblathon 2 paper, which just came out.

My review

Troubling Trends in Scientific Software Use, Science, 17 May 2013, Vol 340, pp. 814-815. doi:10.1126/science.1231535.

Note: this review is of the submission, not what was published. A skim suggests that it hasn't changed significantly since I reviewed it, but if you see something off, let me know -- it's probably my fault for not re-reading the paper in depth and adjusting this review accordingly.

This paper asks the question, "Why do scientists choose specific software?", by analyzing the results of a survey given to scientists working in a subdomain of ecology. The result of this survey is that many scientists rely on non-scientific reasons when choosing software: in particular, they rely on a presumption that published software has been peer reviewed and is hence more likely to be correct. The authors believe this indicates an ignorance of the process of peer review, which rarely, if ever, looks at the software. Based on this, the authors develop several policy recommendations, which I infer to be: first, that software should be peer reviewed, and second, that scientists should be educated in this area.

The writing is clear, the survey (and its interpretation) are straightforward, and I think the conclusion is exceptionally important for the future of science, which is increasingly reliant on computation.

The "implications" section of this paper, and especially the last two paragraphs in which they discuss the social software recommendations and the (failed) replicability of a particular package, is truly excellent. This is the best and most concise description of the fundamental social and educational problems underlying present-day computational science that I've yet seen.

The biggest downside I see to this paper is the narrowness of the survey results -- they apply to only one subfield, and I cannot tell how broadly they will apply. Personal experience suggests that they will apply quite broadly, unfortunately... I believe that this paper is necessary to establish a beachhead that can then be exploited by others to widen the investigation into mechanisms of software choice in science.

I note an irony that the paper does not mention: there is relatively little agreement across programming practice in general, let alone within scientific computing, as to what is "good software engineering practice." In fact, relatively little scientific work has been done to validate or compare specific approaches, and there is virtually no consensus within computer science. We could usefully learn from industry here, I think, which has taken a very pragmatic approach to minimum software engineering standards.

The authors dance around the question of requiring "open source" software licensing (which specifically allows for modification and redistribution) vs a more relaxed policy on published software, in which the source code must be available for review and replication but need not be open source. This necessarily impacts the social problems of requiring that software be peer reviewed. I do not know of a useful review in this area, unfortunately. I also note that venues for explicit publication of source code are still few and far between, and that many open source code repositories are not archival; this has been an active discussion in the online academic/open source community.

I welcome the result that more and better education is not only needed but desired; however, I am concerned that the authors overinterpret the aspect of the survey that indicates that scientists know they need it. The question asked is too narrow to support the general conclusion they reach, and, in my experience, "time/opportunity" is precisely what most scientists are lacking.

--titus

Living in an Ivory Basement Stochastic thoughts on science, testing, and programming.
