Here is the top bit of a review I wrote of a very nice paper by Itai Sharon et al., from Jill Banfield's lab, on using Illumina TruSeq long reads (a.k.a. Moleculo) to look at complex metagenomes.
The paper is newly available here, although it is behind a paywall ;(.
Citation: Sharon I, et al. Accurate, multi-kb reads resolve complex populations and detect rare microorganisms. Genome Res. gr.183012.114. Published in Advance February 9, 2015. doi: 10.1101/gr.183012.114
This is an excellent application of new long-read technology to further illuminate the characteristics of several medium-to-high complexity microbial communities. The methods are expert, the results are fascinating, and the discussion is well done.
Objectives:
- test the efficacy of assembling Moleculo reads to improve short-read contigs;
- evaluate the accuracy of curated short-read assemblies;
- look at organisms present at very low abundance levels;
- evaluate levels of sequence variation & genomic content in strains that could not otherwise be resolved by short-read assembly.
Results:
- Long-read data revealed many very abundant organisms...
- ...that were entirely missed in short-read assemblies.
- Genome architecture and metabolic potential were reconstructed using a new synteny-based method.
- The "long tail" of low-abundance organisms belongs to phyla represented by highly abundant organisms.
- The diversity of closely related strains & rare organisms accounts for a major portion of the communities.
The portion of the results that is most novel and most fascinating is the extensive analysis of rare sequences and the disparity between observations from Illumina (assemblies) and Moleculo (long reads and assemblies). The basic results are, on first examination, counter-intuitive: many long-read sequences come from abundant organisms that simply don't show up in Illumina short-read assemblies. The authors attribute this to strain variation in the community, i.e. Illumina assemblies are fragmented by strain variation, and this blocks the observation of the majority of the community. This is to some extent borne out by the low mapping percentages (the primary evidence the authors offer), and it also matches our own observations.
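To make the strain-variation argument a bit more concrete, here is a toy sketch of how a single SNP between two strains can fragment a short-read assembly. It assumes a simple de Bruijn graph assembler that outputs unitigs (maximal non-branching paths); this is not the paper's pipeline, and real assemblers add error correction and bubble popping that can collapse some variation. The helper names `build_graph` and `unitigs`, the 30-bp "genome", and k=7 are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def build_graph(seqs, k):
    """Hypothetical toy de Bruijn graph: nodes are (k-1)-mers, edges are k-mers seen in seqs."""
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            out_edges[kmer[:-1]].add(kmer[1:])
            in_edges[kmer[1:]].add(kmer[:-1])
    nodes = set(out_edges) | set(in_edges)
    return nodes, out_edges, in_edges

def unitigs(seqs, k):
    """Hypothetical helper: return maximal non-branching paths (unitigs) as strings.

    Simplification: isolated cycles made only of 1-in/1-out nodes are ignored.
    """
    nodes, out_e, in_e = build_graph(seqs, k)

    def is_simple(n):
        # A node can sit inside a contig only if it has exactly one way in and one way out.
        return len(in_e[n]) == 1 and len(out_e[n]) == 1

    contigs = []
    for start in nodes:
        if is_simple(start):
            continue
        # Start a contig along every edge leaving a source, sink, or branch point.
        for nxt in out_e[start]:
            path, cur = start, nxt
            while is_simple(cur):
                path += cur[-1]
                cur = next(iter(out_e[cur]))
            path += cur[-1]
            contigs.append(path)
    return contigs

strain_a = "GATTACACCGTAGGCTTAAGCTCGATCCAG"
strain_b = strain_a[:15] + "G" + strain_a[16:]  # identical except for one SNP at position 15

print(unitigs([strain_a], k=7))                 # one strain: a single 30-bp contig
print(len(unitigs([strain_a, strain_b], k=7)))  # two strains: the SNP bubble yields 4 contigs
```

With one strain the toy graph is a single unbranched path, so it assembles into one contig. Adding a second strain that differs at a single base creates a bubble, and the longest contig drops to 15 bp; with many strains and many variable sites, this kind of fragmentation is one plausible reason short-read contigs can miss abundant organisms.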
--titus