I'll be visiting the University of Arizona (the one in Tucson) in a week, Mon Aug 20th - Wed Aug 22nd. I offered to give one of three talks and they said "how about all three?" Uh, ok, I guess...
If you are at U of A and want to meet up, let me know. I can see if there is room, especially during meals; some of my breakfasts look suspiciously non-meetinged :).
Without further ado:
Monday, 3:30-5pm, BioSciences West rm 208.
Visiting the Undiscovered Country: exploring non-model organisms and microbial communities with next-gen sequencing
(No abstract requested, but I blog about it a lot :)
Tuesday, 10:30am-12pm, room TBD.
Streaming lossy compression of biological sequence data using probabilistic data structures
In recent years, next-generation DNA sequencing capacity has completely outstripped our ability to computationally digest the resulting volume of data. Driven by the need to actually analyze the data, our lab has developed a suite of novel data structures and algorithms for graph compression and data reduction; in addition to being darned efficient on their own, our approaches make use of probabilistic data structures that enable substantially lower memory usage than the best possible exact approach. Using these approaches we have been able to scale de novo data assembly approaches down to cloud computing infrastructure, and we have also completed some of the largest de novo assemblies of metagenomes ever done. Last but not least, these approaches show the way to essentially infinite de novo assembly of environmental microbial data.
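For anyone wondering what "probabilistic data structure" means in practice, here is a minimal sketch of one common example, a Bloom filter used to remember which k-mers have been seen. This is an illustration only, not our lab's actual code: the `BloomFilter` class, the `kmers` helper, the SHA-256-based hashing, and the sizes are all assumptions chosen for readability. The key property is that it may report a k-mer as present when it is not (a false positive), but it never forgets a k-mer it has stored, and it uses a fixed amount of memory no matter how many k-mers go in.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash functions (illustrative sketch)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "probably seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


if __name__ == "__main__":
    seen = BloomFilter(num_bits=10_000, num_hashes=3)
    for kmer in kmers("ACGTACGTGG", 4):
        seen.add(kmer)
    print("ACGT" in seen)  # a stored k-mer is always reported as present
```

The memory saving comes from storing only bits, never the k-mers themselves; the price is a tunable false positive rate that depends on the number of bits and hash functions.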
Wednesday, 8:30am - 10am, Bio5 rm 103.
Sensitive detection of splice isoforms from mRNAseq data
We can now deeply and quantitatively sample transcriptomes with relative ease, but analyzing the data is more challenging: while before researchers could largely ignore splice variants, mRNAseq has shed quite a bit of light on the large diversity of splice isoforms present in vertebrate tissue. Recovering these splice variants from the data is difficult even with a good reference genome, but many model organisms do not have sufficiently high-quality genome sequences for reference-based approaches to work. We have been working on analyzing transcriptomes from chick development (neural crest) and disease (Marek's Disease) as well as lamprey nerve cord regeneration, and have developed a range of approaches for improving the recovery of isoforms from mRNAseq data. I will also talk about underacknowledged challenges in quantification, tissue-specific mRNA isolation approaches, and detection of genome misassemblies using mRNAseq.
On the plus side, I will finally be able to show off the range of work my lab actually does. On the minus side, I am clearly schizophrenic - or would that be "possessed of multiple personalities"?