The grim future for sequencing centers

In conversation with a colleague the other day, I found myself making a surprising prediction: the age of the big sequencing centers (Broad Institute, WUSTL, Baylor, DOE JGI, etc.) is coming to an end. In 5 years they will no longer exist.

This prediction is obvious in hindsight.

That is all.

Hah! No, seriously, I've had a number of interactions with sequencing centers over the last decade, and I feel that many of them are failing to make the transition from hugely-funded centers containing lots of cloning expertise and bajillions of ABI Sanger sequencing machines, to centers of genome expertise and analysis. The new reality of holy-cow-everyone-can-sequence-whatever-they-want, brought on by Roche 454, Illumina GA, ABI SOLiD, and soon Pacific Biosciences, is driving this. It is now possible to sequence entire animal genomes in private facilities funded by single-investigator grants, which takes away the primary raison d'être of big sequencing centers... so what next?

The new challenge of sequencing is in assembly and analysis of the data, and I think everyone is just overwhelmed here. Certainly when I talk to people at sequencing centers, they are capable of generating far more new sequence than they are of assembling or analyzing the sequence they generated last week. For example, the latest lamprey genome assembly was done by Jeramiah Smith in Chris Amemiya's lab, not by WUSTL; and the basic gene set is being constructed by Carson Holt in Mark Yandell's lab in Utah, not by Ensembl. The wait time to get into the assembly and analysis queues, and the iteration time needed to integrate new mRNAseq data into the gene set, are simply too great at the centers. Analysis of a large soil metagenomics project (200 gb and counting!) in collaboration with the JGI is running into machine access issues: none of us has ready access to machines capable of running the analyses quickly, although I appear to be the closest because of the MSU HPC.

Contrast this situation with what I see elsewhere. On my recent trip to Mississippi State, I had a great conversation with a graduate student who is assembling a brown mold genome, all on her own, on a lab machine, with no prior computational experience. Or take some friends at Caltech, who have sequenced, assembled, and analyzed both the genome and transcriptome of a worm -- all on their own, with no center involvement. I mean, these people are all ridiculously smart and competent, but I think there are a lot of such people in academia. They just needed cheap sequencing to challenge them!

I wish I could blame the centers for lack of vision or something, but honestly I think they're just the biggest targets for everyone at the moment. People are used to the "mainframe model" of sequencing, where you go to the sequencing center with your genome in hand and beg the high poobahs to sequence, assemble, and annotate it for you; but their funding for computer power and analysis hasn't kept up with the sequencing bonanza (nor could it have), so now they are simply the most visible people failing to keep up with analysis. Unfair but whatcha gonna do?

Are there centers that are keeping up? It's hard for me to say, since I'm not in the rarefied bajillion-dollar-PI meetings (note: I'm available for such meetings, folks; I bring 20 years of computational experience, a corresponding deep cynicism, and 10 years of bioinformatics to the table, plus a taste for expensive scotch. Reserve me today!). But I note that the Beijing Genomics Institute has a distressing habit of publishing "firsts", including the short-read panda genome paper and a Human Microbiome Project. I have concerns about their long-term viability but that will have to wait for another blog post...

OK, so what's the future, mr. smarty pants? Damned if I know. Paul Sternberg has a great quote that is my touchstone, though: the biggest, most exciting advances come from the sharpshooter on the hill rather than the army toiling across the plain. I've never been excited by large collaborations, which tend to get embroiled in management issues and politics; while there are some places (like HPCs) where centralization is good, lots of individual investigators are much more likely to generate the diversity of approaches that I think we need.

And did I mention training? Whoops, so silly of me to forget that.

Regardless, I think we're in for a wild and wooly ride on the next-gen sequencing train, and the next few years should be incredibly exciting. It's great to be a (computational) biologist!


Legacy Comments

Posted by sm on 2010-05-19 at 13:19.

An important point is that these genome centers are at the forefront of
a lot of analytic development too! However thinly, WashU, Broad, and
Sanger have been leading a lot of software development on packages
for alignment, assembly, analysis, etc. Just my thoughts...

Posted by Titus Brown on 2010-07-06 at 16:42.

Also see: