My review of "Determining the quality and complexity of NGS..."

I was a reviewer on "Determining the quality and complexity of next-generation sequencing data without a reference genome" by Anvar et al. (PDF here). Here is the top bit of my review.

One interesting side note: the authors originally named their tool kMer, and I complained about it in my review. They renamed it to kPal, which is much less confusing!


The authors show that a specific set of low-k k-mer profile analysis tools can identify biases and errors in sequencing samples, as well as compute distances between metagenomic samples. All of this is done independently of reference genomes/transcriptomes, which is very important.

The paper is well written and quite clear. I found it easy to read and easy to understand. The work is also novel, I believe.

Highlights of the paper for me included a solid discussion of k-mer size selection, a thorough exploration of how to compare various k-mer-based statistics, and the excellent quality evaluation bit (Figure 3).

I was a bit surprised by the shift from quality assessment to metagenomic analysis, but there is an underlying continuity in the approach that makes this a reasonable transition. There might be a way to update the text to make this transition easier for the non-bioinformatic reader.

It's hard to pick out one particularly important result; the two biggest results are (a) k-mer-based, reference-free quality evaluation works quite well, and (b) k-mer analysis does a great job of grouping metagenome samples. The theory work on transitioning between k-mer sizes is potentially of great technical interest as well.
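
For readers who have not worked with k-mer profiles, here is a minimal sketch of the general idea behind result (b): count short k-mers in each sample, normalize the counts to frequencies, and compare the resulting profiles directly, with no reference genome involved. This is my own illustration, not kPal's implementation. The function names (`kmer_profile`, `profile_distance`) are made up for this example, and the plain Euclidean distance is an assumption that stands in for whatever distance measures the paper actually uses.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(reads, k):
    """Return normalized k-mer frequencies over all 4**k DNA k-mers.

    Hypothetical helper for illustration; k-mers containing anything
    other than A, C, G, T (e.g. N) are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    # Include every possible k-mer so that profiles are directly comparable.
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {m: (counts[m] / total if total else 0.0) for m in all_kmers}


def profile_distance(p, q):
    """Euclidean distance between two profiles built with the same k.

    An assumed, simple metric; not necessarily the one used in the paper.
    """
    return math.sqrt(sum((p[m] - q[m]) ** 2 for m in p))


if __name__ == "__main__":
    # Toy "samples": real inputs would be reads parsed from FASTQ/FASTA files.
    sample_a = ["ACGTACGTAC", "CGTACGTACG"]
    sample_b = ["AAAAATTTTT", "TTTTTAAAAA"]
    pa = kmer_profile(sample_a, 3)
    pb = kmer_profile(sample_b, 3)
    print(profile_distance(pa, pa))  # identical profiles
    print(profile_distance(pa, pb))  # dissimilar profiles
```

With a small k the profile has only 4^k entries, so pairwise distances between many samples stay cheap to compute, which is part of what makes the low-k approach attractive.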
