An assessment report on ANGUS 2017.
We just finished teaching a second version of our two-day shotgun metagenome analysis workshop, this time at UC Santa Cruz (the first one was in October 2016, at Scripps Institution of Oceanography). Harriet Alexander led the workshop and Phillip Brooks and I co-taught; Luiz Irber, Shannon Joslin, and Taylor Reiter …
As part of our Summer Institute in Data Intensive Biology, we will be running nine week-long computational workshops from July 10 to July 17 at the University of California, Davis.
Week 1: July 10-15
Our two-week summer workshop (announcement, direct link) is shaping up quite well, but the application deadline is today! So if you're interested, you should apply sometime before the end of the day. (We'll leave applications open as long as it's March 17th somewhere in the world.)
Some updates and expansions …
As part of our Summer Institute in Data Intensive Biology, we will be running a week-long instructor training from June 18 to June 25 at the University of California, Davis.
The instructor training will include the following --
I am pleased to announce that we will be running a two-week summer workshop on analyzing high-throughput sequencing data! This workshop will run from June 26 to July 8, 2017, and it is a continuation of the two-week NGS workshop run at Michigan State University since 2010. (You can read about the …
We just finished teaching a two-day workshop at the Scripps Institution of Oceanography down at UC San Diego. Dr. Harriet Alexander, a postdoc in my lab, and I spent two days going through cloud computing, short read quality and k-mer trimming, metagenome assembly, quantification of gene abundance, mapping of …
Every year since 2010, I've been the primary organizer for a summer workshop on Analyzing Next Generation Sequencing Data. In 2010 and 2011, I was funded internally by the Gene Expression in Disease and Development group at MSU; since 2012, I've had a $50,000/yr grant from the NIH …
On June 11th, 2010, I remember dropping the last workshop attendee off at the Kalamazoo train station, turning the car towards home, and nearly sobbing in relief that the workshop was over and done and I could finally get some sleep. That workshop was the first of a series of …
Here at the Lab for Data-Intensive Biology (TM) we are constantly trying to explore new ideas for advancing the practice of biological data sciences. Below are some ideas that originated with or were sharpened by conversations with Greg Wilson (Executive Director, Software Carpentry) and Tracy Teal (Project Lead, Data Carpentry …
Two weeks ago, I ran a workshop at UC Davis on mRNAseq analysis for semi-model organisms, which focused on building new gene models ab initio -- with a reference genome. This was a milestone for me - the first time I taught a workshop at UC Davis as a professor there! My …
I was a reviewer on Determining the quality and complexity of next-generation sequencing data without a reference genome by Anvar et al., PDF here. Here is the top bit of my review.
One interesting side note - the authors originally named their tool kMer and I complained about it in my …
As I mentioned, I am hoping to significantly scale up my training efforts at UC Davis; it's one of the reasons they hired me, it's a big need in biology, and I'm enthusiastic about the whole thing! A key point is that, at least at the beginning, it may replace …
I just finished reading Svante Paabo's autobiography, Neanderthal Man: In Search of Lost Genomes. The book is perfect -- if you're a biologist of any kind, you'll understand most of it without any trouble, and even physicists can probably get a lot out of the story (heh).
The book describes Svante …
The fifth annual Analyzing Next Generation Sequencing Data workshop just finished - #ngs2014. As usual the schedule and all of the materials are openly available.
tl;dr? Good stuff.
We've been running this thing since 2010, and we now have almost 120 alumni (5 classes of roughly 24 students each). The …
A few months back, we submitted a paper, These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure, to PLoS One. We got the (signed) reviews back in December, and I asked the reviewers if I could post their reviews publicly. They …
We've just posted a new paper to arXiv: "These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure." We'll be submitting it to PLoS One after we wait a few days for comments from the Twittersphere and/or on Haldane's Sieve.
The …
So, we've been running this course on NGS data analysis. And it's been fun and all. But a lot of work.
About a year ago, I thought hard about whether or not I wanted to apply for renewal, and ended up applying again. You can see the final grant if …
So, I got this grant. And, um, it looks like khmer has a future, which means... so does my lab.
What is khmer?
khmer is my lab's software for doing various things to sequencing data, and is largely focused on providing good demo implementations of low-memory data structures …
At the "What to Teach Biologists about Computing" meeting (discussed here, a bit) we received a strong message from Our Dear MozSciLabLeader, Kaitlin Thaney. The message was this: if we want to maximize reuse and remixing of educational materials, we should explicitly license them under CC0. (See her talk and …
(This blog post was mightily helped by Qingpeng Zhang, the first author of the paper; he wrote the pipeline. I just ran it a bunch :)
We have been benchmarking k-mer counters in a variety of ways, in preparation for an upcoming paper. As with the diginorm paper we are automating …
Leslie Babonis, an attendee at the 2013 NGS course, posted the following on Facebook. I'm reposting with permission. --titus
an ode to my lab bench...:
i've returned, my dear friend, after a fortnight away
delighted to find you, in just the same way
your tube racks still brilliant in hues …
We just posted yet another pre-submission paper to arXiv.org:
Assembling large, complex environmental metagenomes
Authors: Adina Chuang Howe, Janet Jansson, Stephanie A. Malfatti, Susannah Tringe, James M. Tiedje, and C. Titus Brown
Abstract:
The large volumes of sequencing data required to deeply sample …
We just posted another pre-submission paper to arXiv.org:
Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets
Authors: Adina Chuang Howe, Jason Pell, Rosangela Canino-Koning, Rachel Mackelprang, Susannah Tringe, Janet Jansson, James M. Tiedje, and C. Titus Brown
Abstract:
Sequencing errors and …
I gave a talk last Wednesday at U. Michigan in the DCMB program where I included a slide estimating how much DNA sequencing (in base pairs) was needed for good de novo assembly of sequences from various biological environments or problems. The slide was there to motivate the challenges of …
An increasing number of people are asking about using our assembly approaches for things that we haven't yet written (or posted) papers about. Moreover, our assembly strategies themselves are also under constant evolution as we do more research and find ever-wider applicability of our approaches.
This has been moved to …
At BOSC 2012, we heard a report from Richard Holland on the Pistoia Alliance Sequence Squeeze competition. I'd run across this a couple of times before -- most notably in the Quip paper -- and was interested in hearing the results.
What was the problem being tackled? To quote,
The volume of …
One of my favorite in-class exercises is The Assembly Exercise, in which I provide "shotgun sequence" from some English text and ask the students to assemble it. Normally I provide a printout of about 10-20 pages of reads with a range of read lengths, error rates, and single/paired end sequences …
The IPython Notebook (or 'ipynb' for short) is one of the most exciting technologies for teaching and research that I've seen in recent years. It is a completely open source, well architected, and fairly stable system for scientific computing and data exploration.
I've now been using it for teaching for …
We held the 2012 workshop on Analyzing Next Generation Sequencing Data from June 4 to June 15, at the Kellogg Biological Station in western Michigan, about 30 minutes north of Kalamazoo.
(This is a long delayed blog post. :)
The goal of the workshop is to take biologists with little in …
At our 2012 course on Analyzing Next-Generation Sequencing Data, we talked quite a bit about future sequencing technologies, as well as about what analyses are reasonably cookbook (and which ones aren't).
Here are my thoughts -- yours welcome!
The basic conclusions about sequencing tech were these:
As part of the 2012 Analyzing Next-Generation Sequencing Data course, I've been trying out ipython notebook for the tutorials.
In previous years, our tutorials all looked like this: Short read assembly with Velvet -- basically, reStructuredText files integrated with Sphinx. This had a lot of advantages, including Googleability and simplicity; but …
I'm going to pick on Mick Watson today. (It's OK. He's just a foil for this discussion, and I hope he doesn't take it too personally.)
Mick made the following comment on my earlier Big Data Biology blog post:
I do wonder whether there is just a bit too much …
I'm out at a Cloud Computing for the Human Microbiome Workshop and I've been trying to convince people of the importance of digital normalization. When I posted the paper the reaction was reasonably positive, but I haven't had much luck explaining why it's so awesome.
At the workshop, people were …
We just posted a pre-submission paper to arXiv.org:
A single pass approach to reducing sampling variation, removing errors, and scaling de novo assembly of shotgun sequences
Authors: C. Titus Brown, Adina Howe, Qingpeng Zhang, Alexis B. Pyrkosz, and Timothy H. Brom
Paper Web site, with source code …
The 2012 MSU Next-gen Sequence Analysis course application period just closed, and we received 168 applicants. Last year, we received 133, and the year before that we received 33.
We can take 24.
I was also invited to go teach a ~1 week workshop at two other universities on these …
(updated to point to http://arxiv.org/).
Authors: Jason Pell, Arend Hintze, Rosangela Canino-Koning, Adina Howe, James M. Tiedje, C. Titus Brown
Abstract:
The memory requirements for de novo assembly of short-read shotgun sequencing data from complex microbial populations are an increasingly large practical barrier to environmental studies. Here we …
(and some related thoughts on reproducibility in computational science)
In a recent news article on the "data deluge" in biology, I was quoted as saying "It's not at all clear what you do with that data. Doing a comprehensive analysis of it is essentially impossible at the moment." So, naturally …
I'm just on my way back from a JGI workshop on metagenome informatics, and I thought I'd take the opportunity to write up a short review.
The workshop was, frankly, excellent. We saw a bunch of talks on metagenome assembly (my current interest) as well as single-cell sequencing approaches, and …
There's been a lot of hooplah in the last year or so about the fact that our ability to generate sequence has scaled faster than Moore's Law over the last few years, and the attendant challenges of scaling analysis capacity; see Figure 1a and 1b, this reddit discussion, and also …
During our next-gen course, a "student" (really a professor from Australia ;) asked me if I could provide some guidance on what computational infrastructure was necessary to handle next-gen sequencing data. While we used Amazon Web Services during the course, she was interested in finding out if they could use their …