The khmer team is pleased to announce the release of khmer version 1.0.
khmer is our software for working efficiently with fixed length DNA
words, or k-mers, for research and work in computational biology.
khmer v1.0 is the culmination of about 9 months of development work by
Michael Crusoe (@biocrusoe). Michael is the software engineer I hired
on our NIH BIG DATA grant,
and he's spent at least half his time since being hired wrangling the
project into submission.
Why did we do all this work!?
The short answer is, we like it when our software works. khmer is
becoming increasingly broadly used, and it would be good if the
software were to continue working.
A slightly longer answer is that we are continuing to improve khmer --
we're making it faster, stronger, and better -- while we're also doing
research with it. We want to make sure that the old stuff keeps working
even as we add new stuff.
But it's not just that. There are now about five people working on
fixing, improving, and extending khmer -- several of them are graduate
students, working on kick-ass new functionality as part of their PhDs.
By having testing infrastructure that better ensures the stability and
reliability of our software, each graduate student can work out on their
own long branch of functionality, and -- when they're ready -- they can
merge their functionality back into the stable core, and we can all take
advantage of it.
Actually, it's even more. We're building an edifice here, and we
need to have a stable foundation. One very important addition that
just came last weekend was the addition of khmer acceptance testing -- making sure
that khmer not only works as we expect it to, but integrates with
other tools. For this, we turned to the khmer protocols, our
nascent effort to build open, integrated pipelines for certain kinds of
Our acceptance testing consists of running from raw data through the
assembly stage of the Eel Pond mRNAseq assembly protocol, albeit with
a carefully chosen data subset. This takes less than an hour to
run, but in that hour it tests some of the core functionality of
khmer at the command line, on real data. We hope to extend this
into the majority of our functionality over time -- for now we're
mostly just testing digital normalization and read manipulation.
Having passing acceptance tests before a major release is both
extraordinarily reassuring and quite useful: in fact, it caught
several last minute bugs that we'd missed because of either incomplete
unit and functional tests, OR because they were bugs at the integration
level and not at the unit/functional level.
And one interesting side note -- our acceptance tests encompass Trimmomatic,
fastx, and Trinity. That's right, we're passively-aggressively testing other
people's software in our acceptance tests, too ;).
Better Science Through Superior Software
Ultimately, this improved infrastructure and process lets us confidently
move forward with new features in khmer, lets my group work in concert
on orthogonal new features, enables larger processes and pipelines
with less fear, uncertainty, and doubt, and -- ultimately -- should
result in significant time savings as extend our research program. My
firm belief is that this will allow us to do better science as we move
Watch This Space. We'll let you know how it goes.