A short note -- the lamprey genome (P. marinus) paper is finally out!
You can see the paper
and the Michigan State University press release.
(The press release isn't too bad, but I would like to point out that I
had no part in the sentence talking about how this could lead to an
understanding of "when humans evolved jaws, matching arms and legs, an
adaptive immune system and more." <sigh>)
The lamprey is a jawless vertebrate that diverged from the jawed
vertebrate lineage around 550 mya. Lampreys, together with hagfish,
represent the last extant vestiges of the evolutionary lineage
bounded by invertebrate chordates -- organisms without vertebrate
features -- and jawed vertebrates, which include fish, frogs,
The genome was a gigantic pain in the butt. We (and by "we" I mean
Jeramiah Smith, the first author) could only assemble 800 Mbp, a
maximum of 2/3 of the estimated complete genome (which is in the range
of 1.2-1.6 Gbp, depending on which estimates you believe). This is
partly because the genome has a bunch of really annoying GC-rich
repeats that confounded much of our BAC sequencing and hence much of
The other reason for the incompleteness of the genome is much less
common and more problematic: we constructed our sequencing libraries
from liver, which, in the lamprey, means that we're missing 20% or more
of the genome. This is because the lamprey genome undergoes
lineage-specific loss of genomic DNA. (At
this point you should say "WHAT? WHY!?" and/or lament the cost of
sequencing and analyzing a subset of the germ line genome :).
Remember, genomics is out to get you.
What's the single most interesting take-home observation?
The main section to read, I think, is "Duplication structure of the
genome." Here, we (again, mostly Jeramiah, with a lot of input from
others) argue that synteny analysis shows
"the most recent (two-round) whole-genome duplication event likely
occurred in the common ancestral lineage of lampreys and
Other things discussed are:
we (i) provide genome-wide evidence for two whole-genome
duplication events in the common ancestral lineage of lampreys and
gnathostomes, (ii) identify new genes that evolved within this
ancestral lineage, (iii) link vertebrate neural signaling features
to the advent of new genes, (iv) uncover parallels in immune
receptor evolution and (v) provide evidence that a key regulatory
element in limb development evolved within the gnathostome lineage.
So, overall, pretty cool.
My main involvement in the nitty gritty of this paper was a sadly
failed attempt to use protein domain alignment to determine the
duplication structure of the genome. Because the initial assembly we
had was not very good (it was considerably worse than the one that
finally got published!), I tried to develop a novel approach using PFAM
models to drive gene/domain alignment, followed by automatic tree
examination. This approach unambiguously indicated that there had
been no 2R. However, a few months later, after I did some QC and ran
some models, it turned out that the approach was extraordinarily
sensitive to gene loss. This occasioned a very embarrassing e-mail
to the lamprey genome list, sigh.
(Genomics really is out to get you.)
The syntenic 2R analysis on a new Jeramiah-generated assembly turned out
to be much better and argued for the pre-divergence 2R scenario.
Are you still working on lamprey?
The lamprey genome is one of two projects that launched my research
into assembly; digital normalization was, in large part, driven by the
desire to assemble approximately 5 billion mRNAseq reads produced by
Weiming Li's lab. We were driven to do this by the poor quality of
the initial lamprey genome, and the newer revelation that large
portions of the genome are simply missing. (In general, it seems like
the genomics research community is starting to realize that mRNAseq is
a complementary approach to genome assembly, which is often quite
Our paper on assembling massive mRNAseq is still in the process of
being written. Preliminary results from that work indicate that we do
see about 20-30% of transcribed & conserved genes missing from the
lamprey genome, but we're still nailing down the numbers -- large
transcriptome assemblies turn out to be really messy!
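
For readers who haven't run into digital normalization before, here is a minimal sketch of the core idea: stream through the reads, estimate each read's coverage as the median count of its k-mers among the reads kept so far, and throw the read away if that estimate has already reached a cutoff. This is a toy illustration, not the code we actually ran on the lamprey data. The function names, the exact in-memory counter, and the default values for `k` and `cutoff` are all assumptions made for the example; a real implementation would need a memory-efficient approximate counter and would treat a k-mer and its reverse complement as the same thing.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k=20, cutoff=20):
    """Keep a read only if its estimated coverage is still below cutoff.

    Coverage is estimated as the median count of the read's k-mers among
    the reads kept so far. The k and cutoff defaults are illustrative only.
    """
    counts = Counter()  # exact counts; fine for a toy, too big for real data
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to estimate from
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
    return kept


if __name__ == "__main__":
    # Ten copies of a high-coverage read plus one rare read.
    reads = ["ACGTACGTAC"] * 10 + ["TTTTGGGGCC"]
    for read in digital_normalization(reads, k=4, cutoff=3):
        print(read)
```

The point of the exercise is that redundant reads from highly expressed transcripts get dropped once they are well covered, while reads from rare transcripts are retained, which is what makes assembling billions of mRNAseq reads tractable at all.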
(Note: fixed 800 Gbp => 800 Mbp; hat tip to Daniel Standage for noticing!)