Slithering your way into bioinformatics with snakemake, round 2.
I was one of the reviewers of the Salmon paper by Patro et al., 2017, Salmon provides fast and bias-aware quantification of transcript expression, and I posted my review in large part because of Lior Pachter's blog post levying charges of intellectual theft and dishonesty against the Salmon authors. More …
I was one of the reviewers of the Salmon paper by Patro et al., 2017, Salmon provides fast and bias-aware quantification of transcript expression. I was asked to review the paper on September 14, 2016, and submitted my review (or at least stopped getting reminders :) soon after October 20th.
The …
This blog post stems from notes I made for a 12 minute talk at the Oregon State Microbiome Initiative, which followed from some previous thinking about data integration on my part -- in particular, Physics ain't biology (and vice versa) and What to do with lots of (sequencing) data.
My talk …
We just finished teaching a second version of our two-day shotgun metagenome analysis workshop, this time at UC Santa Cruz (the first one was in October 2016, at Scripps Institution of Oceanography). Harriet Alexander led the workshop and Phillip Brooks and I co-taught; Luiz Irber, Shannon Joslin, and Taylor Reiter …
As part of our Summer Institute in Data Intensive Biology, we will be running nine week-long computational workshops from July 10 to July 17 at the University of California, Davis.
Week 1: July 10-15
Our two-week summer workshop (announcement, direct link) is shaping up quite well, but the application deadline is today! So if you're interested, you should apply sometime before the end of the day. (We'll leave applications open as long as it's March 17th somewhere in the world.)
Some updates and expansions …
So I've been invited to Imagining Tomorrow's University, and they have this series of questions they'd like me to answer.
(Note that you can follow the conversation at #TomorrowsUni on Twitter.)
Conveniently I already answered many of these questions in my "What is Open Science?" blog post. I've copy/pasted …
As part of our Summer Institute in Data Intensive Biology, we will be running a week-long instructor training from June 18 to June 25 at the University of California, Davis.
The instructor training will include the following --
I am pleased to announce that we will be running a two-week summer workshop on analyzing high-throughput sequencing data! This workshop will run from June 26-July 8th, 2017, and it is a continuation of the two-week NGS workshop run at Michigan State University since 2010. (You can read about the …
As part of my Moore Foundation Data Driven Discovery grant, I have to put together annual reports. (This is more or less standard for grants. ;) You can read my annual report narrative, here, and my (ancillary, not required) breakdown of projects in the lab, here.
We are currently soliciting applications for computational postdoctoral fellows to undertake exciting projects in computational biology/bioinformatics jointly supervised by Dr. Titus Brown (http://ivory.idyll.org/lab/) and Dr. Fereydoun Hormozdiari (http://www.hormozdiarilab.org/) at UC Davis.
UC Davis is a world class research institution with a strong …
This is another blog post on MinHash sketches; see also:
Note: This is the fifth post in a mini-series of blog posts inspired by the workshop Envisioning the Scientific Paper of the Future.
This post was put together after the event and benefited greatly from conversations with Victoria Stodden, Yolanda Gil, Monya Baker, Gail Peretsman-Clement, and Kristin Antelman!
Note: This is the fourth post in a mini-series of blog posts inspired by the workshop Envisioning the Scientific Paper of the Future.
This is an outline of the talk I didn't give at Caltech, because I decided that Victoria Stodden and Yolanda Gil were going to cover most of …
Note: This is the third post in a mini-series of blog posts inspired by the workshop Envisioning the Scientific Paper of the Future.
I've been struggling to put together an interesting talk for the workshop, and last night Gail Clement (our host, @Repositorian) and Justin Bois helped me convince myself …
Note: This is the second post in a mini-series of blog posts inspired by the workshop Envisioning the Scientific Paper of the Future.
An important yet rarely articulated assumption of a lot of my work in biological data analysis is that data implies software: it's not much good gathering data …
Note: This is the first post in what I hope to be a mini-series of blog posts inspired by the workshop Envisioning the Scientific Paper of the Future.
Even preprints go through some review before they're posted, just to make sure they're …
I'm writing this up for the rOpenSci call on Codes of Conduct that I'm participating in today.
My lab has a lab Code of Conduct.
We adapted it from https://github.com/confcodeofconduct/confcodeofconduct.com. So the "how" was easy enough :).
Key points I want to make:
One of the uses that we are most interested in MinHash sketches for is the indexing and search of large public, semi-public, and private databases. There are many specific use cases for this, but the basic goal is to be able to find data sets by content queries, using sequence …
This is an update to last week's blog post, "Efficiently searching MinHash Sketch collections".
Last week, Thanksgiving travel and post-turkey somnolescence gave me some time to work more with our combined MinHash/SBT implementation. One of the main things the last post contained was a collection of MinHash signatures of …
There is an update to this blog post: please see "Quickly searching all the microbial genomes, mark 2 - now with archaea, phage, fungi, and protists!"
Note: This blog post is based largely on work done by Luiz Irber. Camille Scott, Luiz Irber, Lisa Cohen, and Russell Neches all collaborated on …
Update: Zenodo will remove content upon request by the owner, and hence is not suitable for long-term archiving of published code and data. Please see my comment at the bottom (which is just a quote from an e-mail from a journal editor), and especially see "Ownership" and "Withdrawal" under Zenodo …
We just finished teaching a two day workshop at the Scripps Institution of Oceanography down at UC San Diego. Dr. Harriet Alexander, a postdoc in my lab, and I spent two days going through cloud computing, short read quality and k-mer trimming, metagenome assembly, quantification of gene abundance, mapping of …
Our first JOSS submission (paper? package?) is about to be accepted and I wanted to enthuse about the process a bit.
JOSS, the Journal of Open Source Software, is a place to publish your research software packages. Quoting from the about page,
The Journal of Open Source Software (JOSS) is …
(This is an invited chapter for a memorial book about my father. You can also read my remembrances from the day after he passed away.)
Dr. Gerald E. Brown was a well known nuclear physicist and astrophysicist who worked at Stony Brook University from 1968 until his death in 2013 …
I just left Woods Hole, MA, where I spent the last 6 and a half weeks taking the Microbial Diversity course as a student. It was fun, exhausting, stimulating, and life changing!
The course had three components: a lecture series, in which world-class microbiologists gave 2-3 hrs of talks each …
As I wrote last week my latest enthusiasm is MinHash sketches, applied (for the moment) to RNAseq data sets. Briefly, these are small "signatures" of data sets that can be used to compare data sets quickly. In the previous blog post, I talked a bit about their effectiveness and showed …
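To make the "small signatures" idea concrete, here is a minimal sketch of a bottom-k MinHash in Python. It is an illustration only, not the implementation used in the posts: the function names, the choice of md5 as the hash, and the default `k` and `size` values are all assumptions made for this example.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (md5 is an arbitrary, deterministic choice)."""
    return int.from_bytes(hashlib.md5(kmer.encode()).digest()[:8], "big")


def minhash_sketch(seq, k=21, size=50):
    """Keep the `size` smallest distinct k-mer hashes as the signature."""
    hashes = {hash_kmer(km) for km in kmers(seq, k)}
    return sorted(hashes)[:size]


def jaccard_estimate(a, b, size=50):
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the `size` smallest hashes of the combined sketches and count
    how many of them appear in both sketches.
    """
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    # Hypothetical toy sequences, short enough to need a small k.
    s1 = minhash_sketch("ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATCCTAGG", k=5, size=10)
    s2 = minhash_sketch("ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGTTCCTAGG", k=5, size=10)
    print(jaccard_estimate(s1, s2, size=10))
```

The point of the design is that two data sets are compared through their fixed-size sketches rather than through their full k-mer sets, which is what makes the comparison quick.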
(I gave a talk on this on Monday, April 11th - you can see the slides here, on figshare.
This is a Reproducible Blog Post. You can regenerate all the figures and play with this software yourself on binder.)
So, my latest enthusiasm is MinHash sketches.
A few weeks back …
So, there's this fairly large collection of about 700 RNAseq samples, from 300 species in 40 or so phyla. It's called the Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP), and was funded by the Moore Foundation as a truly field-wide collaboration to improve our reference collection for genes (and more …
I'm writing a proposal to the Sloan Foundation for about $20k to support a workshop to hack on mybinder. Comments solicited. Note, it's, umm, due today ;).
(I know the section on "major related work" is weak. I could use some help there.)
If you're interested in participating and don't mind …
This March, Andreas Hejnol invited me to give a talk in Bergen, Norway, and as part of the trip I arranged to also give a trial workshop on "computational reproducibility" at the University in Oslo, where my friend & colleague Lex Nederbragt works.
(The workshop materials are here, under CC0.)
The …
Over the last few months, I've been playing with hypothes.is and thinking about how to use it to further my scientific work. This resulted in some brainstorming with Jon Udell and Maryann Martone about, well, lots of things. And now we're putting in an open science prize entry!
tl …
If you haven't seen mybinder.org, you should go check it out. It's a site that runs IPython/Jupyter Notebooks from GitHub for free, and I think it's a solution to publishing reproducible computational work.
For a really basic example, take a look at my demo Software Carpentry lesson. Clicking …
We're running three half-day workshops for your remote viewing pleasure! All three will be live-streamed via YouTube (well, Hangouts on Air).
Today 2/17, at 9:15am PT, Tiffany Timbers (Simon Fraser U.) is going to be going through regular expressions in Python - see the workshop description.
On Friday, 2 …
CORRECTION: I mistakenly linked to Geoff Bilder, Jennifer Lin, and Cameron's piece on infrastructure in the first posted version, rather than Cameron's post on culture in data sharing. Both are worth reading but the latter is more relevant to this post, and I also wanted to make sure I correctly …
Today at 9:15am PT, Raniere Silva will be giving a lesson on advanced git usage, including bisect, branches, pull requests, and rebasing.
This lesson will be broadcast via YouTube as a Hangout on Air - if you're interested in watching it, we will post the link on the workshop page …
I wrote the below in response to someone who e-mailed me about trying out our partitioning approach for metagenome assembly.
yes, the original partitioning approach worked only on low coverage data sets. The main reason is that highly connected regions (repeats, from biology; and some kinds of sequencing errors) are …
A while back, Kai Blin (via Nick Loman) asked Michael Barton:
If we containerize all these things won't it just encourage worse software development practices; right now developers still need to consider someone other than themselves installing the software.
and Michael Barton's response, transcribed, was:
"It's a good point. Ultimately …
We just finished the second day of a workshop on Docker at the Berkeley Institute for Data Science. I was invited to organize the workshop after some Berkeley folk couldn't make our Davis workshop in November, and so I trundled on down for two days to give it a try …
Recently I was asked by someone at a funding organization about the term "hardening software"; I wrote a blog post asking others what they thought, and this got a number of great comments (as well as spurring Dan Katz to write a blog post of his own). I'd already written …
I just received an e-mail from someone in the funding world who thinks a lot about software, and they were interested in any thoughts I might have on the term "software hardening", and its practice. To quote,
This is about making research software more robust, more easily usable and possibly …
When I decided to move to UC Davis, I was also seizing the opportunity to dramatically expand my training efforts. At Davis, it was made clear to me that I could substitute skills training -- in whatever guise I chose -- for my for-credit teaching; in consequence, I'd asked for money for …
On Friday, Emily Dolson, a doctoral student at Michigan State University in Charles Ofria's lab, walked a bunch of us through d3.js for data visualization. Crucially, she did this from Michigan, and in addition to a local classroom, taught three other classrooms -- one in Florida, one in Virginia, and …
Every year since 2010, I've been the primary organizer for a summer workshop on Analyzing Next Generation Sequencing Data. In 2010 and 2011, I was funded internally by the Gene Expression in Disease and Development group at MSU; since 2012, I've had a $50,000/yr grant from the NIH …
As part of our Docker hands-on workshop earlier this month, I learned a lot about building Dockerfiles, running Docker containers on remote hosts with docker-machine, and using data volumes to manage data in remotely hosted Docker containers.
During and after the workshop, I put together Docker images (and, more importantly …
What are the most concretely useful, interesting, awesome or neat things about Project Jupyter?
Over here at the Lab for Data Intensive Biology, we're putting on a workshop on Project Jupyter notebooks (the notebook system formerly known as IPython Notebook). This will be a two-day hands-on workshop, Carpentry-style, with the …
I've often wanted to mark up arbitrary Web sites with annotations and reminders, and it's always been puzzling to me that this is missing from the Web. In recent years, I've heard more and more about a small non-profit called Hypothesis, which provides this general functionality via both a Chrome …
This version of the site policy was posted in Oct 2015 and applies going forward from then. I'll provide an updated link whenever I update the policy.
The text was lifted from Captain Awkward's site policy at the URL http://captainawkward.com/site-policies-and-faqs/ on Oct 19, 2015, and then modified …
Pubwication. Pubwication is what bwings us togethew today. Pubwication, that bwessed awwangement, that dweam within a dweam. And authorship, twue authorship, wiww fowwow you fowevah and evah. So tweasuwe youw authorship.
Last week, our software paper on khmer 2.0 was published on F1000Research. We intend this paper to be …
Note: at the Lab for Data Intensive Biology, we're trying out a new journal club format where we summarize our thoughts on the paper in a blog post. For this blog post, Lisa Cohen wrote the majority of the text and the rest of us added questions and comments; Lisa …
I just heard the sad news that Eric Davidson, my PhD advisor, passed away.
Eric was a giant in the field of developmental biology and gene regulatory networks. His work spanned more than fifty years, and had an indelible impact on gene regulation studies. (You can read up on his …
On June 11th, 2010, I remember dropping the last workshop attendee off at the Kalamazoo train station, turning the car towards home, and nearly sobbing in relief that workshop was over and done and I could finally get some sleep now. That workshop was the first of a series of …
Just as I was moving to UC Davis, a funding call for a training coordination center came out. I got partway down the path of applying for it before realizing that I was overwhelmed with the move, but I did generate some text that I thought was OK. Here it …
Note: at the Lab for Data Intensive Biology, we're trying out a new journal club format where we summarize our thoughts on the paper in a blog post. For this blog post, Luiz wrote the majority of the text and the rest of us added questions and comments.
The paper …
Note: A year ago, I wrote this in response to an editorial request. Ultimately they weren't interested in publishing it, and I got distracted and this languished on my hard disk. So when I remembered it recently, I decided to just push it out to my blog, where I should …
Note: Last week, I submitted my review of Stephen R. Piccolo, Adam B. Lee, and Michael B. Frampton's paper, Tools and techniques for computational reproducibility. Soon after, Dan Katz wrote a blog post about notebooks, and in a comment I mentioned Piccolo's paper; and, after dropping a note to Dr …
This is a response to (parts of) Dr. Lior Pachter's post, "The myths of bioinformatics software". (You can also see my post on bioinformatics software licensing for at least some of the background arguments.)
I agree with a lot of what Lior says: most bioinformatics software is not very good …
If a piece of bioinformatics software is not fully open source, my lab and I will generally seek out alternatives to it for research, teaching and training. This holds whether or not the software is free for academic use.
If a piece of bioinformatics software is only available under the …
(This is a review of Large-Scale Search of Transcriptomic Read Sets with Sequence Bloom Trees, Solomon and Kingsford, 2015.)
In this paper, Solomon and Kingsford present Sequence Bloom Trees (SBTs). SBT provides an efficient method for indexing multiple sequencing datasets and finding in which datasets a query sequence is present …
We just submitted our review of the paper Large-Scale Search of Transcriptomic Read Sets with Sequence Bloom Trees, by Brad Solomon and Carl Kingsford.
The paper outlines a fairly simple and straightforward way to query massive amounts of sequence data (5 TB of mRNAseq!) in very small disk (~70 GB …
Yesterday morning, we announced a Software Carpentry workshop here at UC Davis, running July 6-7 -- see the Web site for more information. I'm organizing, and Easton White and Noam Ross are co-lead instructors. (This is the first workshop I'm running since we became an affiliate!)
I'd love it if you …
Some background: I'm a white, male, tenured faculty member at UC Davis, and a 3rd generation academic. I work in relatively uncontroversial areas of science (primarily bioinformatics & genomics) at a university that is about as protective of academic freedom as you can get these days. I also live in a …
I gave a presentation at the BEACON Center's coding group this past Monday; here are my notes and followup links. Thanks to Luiz Irber for scribing!
My short slideshow: here
The khmer project is on github, and we have a tutorial for people who want to try out our development …
On Tuesday, I wrote a draft blog post in response to Michael Eisen's blog post on how Lior Pachter's blog post was a model for post-publication peer review (PPPR). (My draft post suggested that scientific bloggers aim for inclusivity by adopting a code of conduct and posting explicit site …
I'm starting to work on a grant renewal for khmer, and with a lot of help from the community, including most especially Richard Unna-Smith, I've put together the following blurb. Suggestions for things to rearrange, highlight or omit welcome, as well as suggestions for things to add. I can't make …
Note: at the Lab for Data Intensive Biology, we're trying out a new journal club format where we summarize our thoughts on the paper in a blog post. For this blog post, Camille wrote the majority of the text and the rest of us added questions and comments.
Inanç Birol …
Last week we wrote five blog posts about some previously un-publicized features in the khmer software - most specifically, read-to-graph alignment and sparse graph labeling -- and what they enabled. We covered some half-baked ideas on graph-based error correction, variant calling, abundance counting, graph labeling, and assembly evaluation.
It was, to be …
One of our long-term interests has been in figuring out what the !$!$!#!#%! assemblers actually do to real data, given all their heuristics. A continuing challenge in this space is that short-read assemblers deal with really large amounts of noisy data, and it can be extremely hard to look at assembly …
De Bruijn graph alignment should also be useful for exploring concepts in transcriptomics/mRNAseq expression. As with variant calling, graphalign can also be used to avoid the mapping step in quantification; and, again, as with the variant calling approach, we can do so by aligning our reference sequences to the …
There's an interesting and intuitive connection between error correction and variant calling - if you can do one well, it lets you do (parts of) the other well. In the previous blog post on some new features in khmer, we introduced our new "graphalign" functionality, that lets us align short sequences …
About a month ago, I took some time to try out Docker, a container technology that lets you bundle together, distribute, and execute applications in a lightweight Linux container. It seemed neat but I didn't apply it to any real problems. (Heng Li also tried it out, and came to …
After a fair amount of time thinking about software's place in science (see blog posts 1, 2, 3, and 4), and thinking about khmer's short- and long-term future, we're making some changes to our development process.
Semantic versioning: The first change, and most visible one, is that we are going …
I finally got a chance to more thoroughly read Mark Stalzer and Chris Mentzel's arxiv preprint, "A Preliminary Review of Influential Works in Data-Driven Discovery". This is a short review paper that discusses concepts highlighted by the 1,000+ "influential works" lists submitted to the Moore Foundation's Data Driven Discovery …
We just finished teaching the second of my RNAseq workshops at UC Davis -- the fifth workshop I've hosted since I took a faculty position here in VetMed. In order, we've done a Train the Trainers, a Data Carpentry, a reference-guided RNAseq assembly workshop, a mothur (microbial ecology) workshop, and a …
Note - this was an internal funding request solicited by the Center for Open Science. It's been funded!
Brief: We propose to integrate OSF into Galaxy as a data store. For this purpose, we request 3 months of funding (6 months, half-time) for one developer, plus travel.
Introduction and summary: Galaxy …
So I wrote this thing that got an awful lot of comments, many telling me that I'm just plain wrong. I think it's impossible to respond comprehensively :). But here are some responses.
In that blog post, I argued that software shouldn't …
Update - I've written Yet Another blog post, More on scientific software on this topic. I think this blog post is a mess so you should read that one first ;).
This blog post was spurred by a simple question from Pauline Barmby on Twitter. My response didn't, ahem, quite fit in …
Here are some statistics from this year's applications to the NGS course. Briefly, this is a two-week workshop on sequence analysis at the command line and in the cloud.
The short version is that demand remains high; note that we admit only 24 applicants, so generally < 20%...
Year | Number of … |
---|
I'm reading Galileo's Middle Finger by Dr. Alice Dreger (@alicedreger), and it's fantastic. It's a paean to evidence-based popular discourse on scientific issues -- something I am passionate about -- and it's very well written.
I bought the book because I ran across Dr. Dreger's excellent and hilarious live-tweeting of her son's …
tl;dr? A while back I wrote that there are three uses of research software: replication, reproduction, and reuse. The world of computational science would be better off if people clearly delineated whether or not they wanted anyone else to reuse their software, and I think it's a massive mistake …
At PyCon 2015, I had the pleasure of attending the Ally Skills Workshop, organized by @adainitiative (named after Ada Lovelace).
The workshop was a 3 hour strongly guided discussion centering around 4-6 person group discussion of short scenarios. There's a guide to running them here, although I personally would not …
(The below issues are very much on my mind as I think about how to apply for another NIH grant to fund continued development on the khmer project.)
Imagine that we have a graph of novel functionality versus software engineering effort for a particular project, cast in the shape of …
I'm at the PyCon 2015 sprints (day 2), and I took the opportunity to play around with Docker a bit.
First, I created a local docker container that contained an installed version of khmer. I ran a blank docker container:
docker run -it ubuntu
and then installed the khmer prereqs …
I'm at the PyCon 2015 sprints (day 2), and I took the opportunity to play around with named pipes.
I was reminded of named pipes by Vince Buffalo in this great blog post, and since we at the khmer project are very interested in streaming, and named pipes fit well …
Here are talk notes and links for my PyCon 2015 talk.
The talk slides are up on SlideShare.
You should definitely check out Mike Lin's great blog posts on "Blogging my genome".
I found SNPedia through this wonderful blog post on how to use 23andMe irresponsibly, on Slate …
Note: Turns out Nick Loman is a C programmer. Well, that's what happens when I make assumptions, folks ;).
Jared Simpson just posted a great blog entry on nanopolish, an HMM-based consensus caller for Oxford Nanopore data. In it he describes how he moved from a Python prototype to a standalone …
This is a stub blog post for the talk notes for my OpenCon talk on how to get tenure as an open scientist.
A few links --
More …
A few weeks back, a journalist contacted me about my old blog post comparing physics and biology, and amidst other conversation, I pointed them at my latest blog post on data and said that I thought a lot of (molecular) biologists were "culturally confused about data". The next question was …
Here at the Lab for Data-Intensive Biology (TM) we are constantly trying to explore new ideas for advancing the practice of biological data sciences. Below are some ideas that originated with or were sharpened by conversations with Greg Wilson (Executive Director, Software Carpentry) and Tracy Teal (Project Lead, Data Carpentry …
We just posted a new preprint (well, ok, a few weeks back)! The preprint title is "Crossing the streams: a framework for streaming analysis of short DNA sequencing reads", by Qingpeng Zhang, Sherine Awad, and myself. Note that like our other recent papers, this paper is 100% reproducible, with all …
I've just opened applications for the 2015 summer course on Analyzing Next Generation Sequencing Data. The course will run from August 10th through August 21st; for details, see the course information page.
This year there will also be a third week of the course that is invitation-only. This third week …
The other day I was contacted by someone whose student wants to attend the MSU NGS course in 2015, because they are interested in learning how to do data integration with (among other things) metagenome data. My response was "we don't cover that in the course", which isn't very helpful ;).
So …
On March 19th and 20th, the Center for Open Science hosted a small meeting in Charlottesville, VA, convened by COS and co-organized by Kaitlin Thaney (Mozilla Science Lab) and Titus Brown (UC Davis). People working across the open science ecosystem attended, including publishers, infrastructure non-profits, public policy experts, community builders …
On a recent west coast speaking junket where I spoke at OSU, OHSU, and VanBUG (Brown PNW '15!), I put together a new talk that tried to connect our past work on scaling metagenome assembly with our future work on driving data sharing and data integration. As you can maybe …
I'm returning from a small, excellent meeting on "Open Source, Open Science", held at the Center for Open Science in Charlottesville, VA. We'll post a brief meeting report soon, but I wanted to share my particular highlights --
First, I got a chance to really dig into what the Center for …
Two weeks ago, I ran a workshop at UC Davis on mRNAseq analysis for semi-model organisms, which focused on building new gene models ab initio -- with a reference genome. This was a milestone for me - the first time I taught a workshop at UC Davis as a professor there! My …
I've been putting together a streaming API for khmer that would let us use generators to do sequence analysis, and I'd be interested in thoughts on how to do it in a good Pythonic way.
Some background: a while back, Alex Jironkin asked us for high level APIs, which turned …
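As a rough illustration of what a generator-based streaming pipeline can look like, here is a minimal Python sketch. It does not use or reflect khmer's actual API; the function names (`read_fasta`, `drop_short`, `uppercase`) and the FASTA-only input are assumptions chosen for the example.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) records one at a time from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_short(records, min_length):
    """Pass through only records whose sequence has at least min_length bases."""
    for name, seq in records:
        if len(seq) >= min_length:
            yield name, seq


def uppercase(records):
    """Normalize sequences to upper case."""
    for name, seq in records:
        yield name, seq.upper()


if __name__ == "__main__":
    # Hypothetical in-memory input; a real file handle works the same way.
    handle = io.StringIO(">a\nacgt\n>b\nacgtacgt\n")
    for name, seq in uppercase(drop_short(read_fasta(handle), min_length=5)):
        print(name, seq)
```

Each stage consumes and yields one record at a time, so stages compose by nesting and nothing has to hold the whole data set in memory.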
A colleague who is starting their own computational lab just asked me for some advice on how to run software projects, and I wrote up the following. Comments welcome!
A brief summary of what we've converged on for our own needs is this:
everything's on github (you can have private …
Michael R. Crusoe and I are throwing a sprint!
Somewhat in the vein of last year's mini-Hackathon, Michael and I and other members of the lab are going to focus in on reviewing contributions and closing issues on the khmer project for a 5 day period.
We are pleased to announce that the Laboratory for Data Intensive Biology at UC Davis has joined the Software Carpentry Foundation as an Affiliate Member for three years, starting in January 2015.
"We've been long-term supporters of Software Carpentry, and Affiliate status lets us support the Software Carpentry Foundation in …
It may not surprise people to learn that I was one of the reviewers on the MEGAHIT metagenome assembly paper... which is now published!
Below is my review, edited to remove all of the stuff they addressed in their revision.
Please also see our first blog post on MEGAHIT and …
Today at 3pm EST, the Moore Data Driven Discovery Investigators will be answering questions on reddit, in the science "ask me anything (AMA)" series. This is an opportunity to ask us anything you want about our research, data-driven discovery more generally, or ...well, you tell us!
I participated in my second Balti and Bioinformatics on Wednesday - unlike the first one, which ended with only slightly sketchy Indian food in Birmingham, this one was entirely online. The technology worked really well and I think this is a great way to do talks!
For those that haven't seen …
A while back, someone else's graduate student asked me (slightly edited to protect the innocent :) --
I already have two independent sets of de novo transcriptome assemblies and annotations of the NGS data [...] 1) from the company who did the sequencing and analysis, and 2) from our pipeline here. It would …
On December 10th, 2014, I was formally awarded tenure at UC Davis, where I will start as an Associate Professor in the School of Veterinary Medicine on January 5th, 2015. In my research statement for my job application, I wrote:
Open science and scientific reproducibility: I am a strong advocate …
Here are three movies that I can recommend, starting with the best.
Jiro Dreams of Sushi - a great movie about the pursuit of perfection, one slice of fish at a time. Truly inspiring. Software engineers and scientists will recognize and identify with Jiro's aspirations.
The Angels' Share - a fun, laid …
I was a reviewer on Determining the quality and complexity of next-generation sequencing data without a reference genome by Anvar et al., PDF here. Here is the top bit of my review.
One interesting side note - the authors originally named their tool kMer and I complained about it in my …
The apocalypse is nigh. Soon, binary executables and containers in object stores will join the many Web-based pipelines and the several virtual machine images on the dystopic wasteland of "reproducible science."
Anyway.
I had a conversation a few weeks back with a senior colleague about container-based approaches (like Docker) wherein …
Dear <chairs>,
I am resigning my Assistant Professor position at Michigan State University effective January 2nd, 2015.
Sincerely,
CTB.
Anticipated FAQ:
Brian O'Shea (a physics prof at Michigan State) asked me the following, and I thought I'd post it on my blog to get a broader set of responses. I know the answer is "Python 3", but I would appreciate specific thoughts from people with experience either with the specific packages …
A few months ago, I wrote a short description of how we make our papers replicable in the lab. One problem with this process is that for complex pipelines, it's not always obvious how to connect a number in the paper to the steps in the pipeline that produced it …
As we think about the next few years of khmer development, it is helpful to explore what khmer is, and what our goals for khmer development are. This can provide guiding principles for development, refactoring, extension, funding requests, and collaborations.
Comments solicited!
Links:
Here's an excerpt from an e-mail to a student whose committee I'm on; they were asking me about a comment their advisor had made that they shouldn't put a result in a paper because "It'll confuse the reviewer."
One thing to keep in mind is that communicating the results _is …
A colleague just e-mailed me to ask me how I felt about journal impact factor being such a big part of the Academic Ranking of World Universities - they say that 20% of the ranking weight comes from # of papers published in Nature and Science. So what do I think?
Sean Eddy wrote an interesting blog post on how scripting is something every biologist should learn to do. This spurred a few discussions on Twitter and elsewhere, most of which devolved into the usual arguments about what, precisely, biologists should be taught.
I always find these discussions not merely predictable …
Since being chosen as a Moore Foundation Data Driven Discovery Investigator, I've been putting together the paperwork at UC Davis to actually receive the money. Part of that is putting together a budget and a Statement of Work to help guide the conversation between me, Davis, and the Moore Foundation …
Yesterday I gave my third keynote address ever, at the Australasian Genomics Technology Association's annual meeting in Melbourne (talk slides here). On my personal scale of talks, it was a 7 or 8 out of 10: I gave it a lot of energy, and I think the main messages got …
A few weeks back, Nick Loman (via Manoj Samanta) brought MEGAHIT to our attention on Twitter. MEGAHIT promised "an ultra-fast single-node solution for large and complex metagenome assembly" and they provided a preprint and some open source software. This is a topic near and dear to my heart (see Pell …
I would like to build a community site. Or, more precisely, I would like to recognize, collect, and collate information from an already existing but rather diffuse community.
The focus of the community will be academic data science, or "data driven discovery". This is spurred largely by the recent selection …
I am very, very happy to announce that I have been selected to be one of the fourteen Moore Data Driven Discovery Investigators.
This is a signal investment by the Moore Foundation into the burgeoning area of data-intensive science, and it is quite a career booster. It will provide my …
Note: the source data for this is available on github at https://github.com/ctb/dddi
Today, the Moore Foundation announced that they have selected fourteen Moore Data Driven Discovery Investigators.
In reverse alphabetical order, they are:
Dr. Ethan White, University of Florida
Proposal: Data-intensive forecasting and prediction for ecological …
At the NIH ADDS meeting, we had several breakout sessions. Michelle Dunn and I led the training session. For this breakout, we had the following agenda:
First, build "sticky-note clouds", with one sticky-note cloud with notes for each of the following topics:
In Extracting shotgun reads based on coverage in the data set, we showed how to get a read coverage spectrum for a shotgun data set. This is a useful diagnostic tool that can be used to estimate total genome size, average coverage, and repetitive content.
Uses for this recipe include …
I am pleased to announce that Dr. Greg Wilson will be giving a two-day Software Carpentry Instructor Training workshop at UC Davis, January 6-7, 2015. This will be an in-person version of the instructor training that Greg runs every quarter; see my blog post about the first such instructor training …
As I mentioned, I am hoping to significantly scale up my training efforts at UC Davis; it's one of the reasons they hired me, it's a big need in biology, and I'm enthusiastic about the whole thing! A key point is that, at least at the beginning, it may replace …
Update 3/29/15: the CAMI FAQ now includes information on reproducibility measures, and looks very promising. The data sets they are producing also seem fascinating.
If you're into metagenomics, you may have heard of CAMI, the Critical Assessment of Metagenome Interpretation. I've spoken to several people about it in …
This is a recipe that provides a time- and memory- efficient way to loosely estimate the likely size of your assembled genome or metagenome from the raw reads alone. It does so by using digital normalization to assess the size of the coverage-saturated de Bruijn assembly graph given the reads …
This recipe provides a time-efficient way to determine whether you've saturated your sequencing depth, i.e. how much new information is likely to arrive with your next set of sequencing reads. It does so by using digital normalization to generate a "collector's curve" of information collection.
Uses for this recipe …
Inspired by Sarah Bisbing's excellent post on her first year as a faculty member, here are the questions I remember asking myself during my first six years:
Year 0: What science do I want to do?
Year 1: What the hell am I doing all day and why am I …
The below is a recipe for subsetting a high-coverage data set to a given average coverage. This differs from digital normalization because the relative abundances of reads should be maintained -- what changes is the average coverage across all the reads.
Uses for this recipe include subsampling reads from a super-high …
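One simple way to picture subsetting to a target average coverage is uniform random subsampling: keep each read with probability target/current, so relative abundances are preserved on average. The sketch below assumes you already have an estimate of the current average coverage; the function name `subsample_reads` and its parameters are hypothetical, and this is not necessarily how the recipe itself does it.

```python
import random
from typing import Iterable, Iterator


def subsample_reads(
    reads: Iterable[str],
    current_cov: float,
    target_cov: float,
    seed: int = 1,
) -> Iterator[str]:
    """Yield each read with probability target_cov / current_cov.

    Every read is kept or dropped independently with the same probability,
    so high- and low-abundance sequences are thinned by the same factor.
    """
    if current_cov <= 0:
        raise ValueError("current_cov must be positive")
    # Never try to keep more than everything.
    keep_prob = min(1.0, target_cov / current_cov)
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    for read in reads:
        if rng.random() < keep_prob:
            yield read


# Hypothetical usage: thin a 100x data set down to roughly 10x.
reads = [f"read{i}" for i in range(1000)]
kept = list(subsample_reads(reads, current_cov=100, target_cov=10))
```

Because the function is a generator, it can be dropped into a streaming pipeline without holding the whole data set in memory.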
In recent days, we've gotten several requests, including two or three on the khmer mailing list, for ways to extract shotgun reads based on their coverage with respect to the reference. This is fairly easy if you have an assembled genome, but what if you want to avoid doing an …
I just finished reading Svante Paabo's autobiography, Neanderthal Man: In Search of Lost Genomes. The book is perfect -- if you're a biologist of any kind, you'll understand most of it without any trouble, and even physicists can probably get a lot out of the story (heh).
The book describes Svante …
Every month, Bjorn Ostman finds another sucker^W^W^W organizes a Carnival of Evolution blog post, that does a roundup of blogs on evolution from a previous month. This month, I'm hosting it -- it's a bit late, due to some teaching duties, so apologies!
Trigger warning: This blog post …
This past weekend, I accepted an offer to join UC Davis as an Associate Professor of Genetics in the Department of Population Health and Reproduction, in the School of Veterinary Medicine. The appointment is still pending tenure review, but I expect to join Davis whether or not they give me …
The fifth annual Analyzing Next Generation Sequencing Data workshop just finished - #ngs2014. As usual the schedule and all of the materials are openly available.
tl; dr? Good stuff.
We've been running this thing since 2010, and we now have almost 120 alumni (5 classes of roughly 24 students each). The …
Here are my talk notes for the Data Driven Discovery grant competition ("cage match" round). Talk slides are on slideshare. You can see my full proposal here as well.
Hello, my name is Titus Brown, and I'm at Michigan State University where I run a biology group whose motto is …
Our lab is part of the ongoing online conversation about how to properly credit software and algorithms; as is my inclination, we're Just Trying Stuff (TM) to see what works. Here's an update on our latest efforts!
A while back (with release 1.0 of khmer) we added a CITATION …
Note to all: this is satire... As Marcia McNutt says below, please see Science Magazine's Contributors FAQ for more detailed information.
Recently I had some conversations with Science Magazine about preprints, and when they're counted as double publication (see: Ingelfinger Rule). Now, Science has an enlightened preprint policy:
...we do …
In September, I will be visiting the NIH to "chart the next 5 years of data science at the NIH." This meeting will use an open space approach, and we were asked to provide some suggested topics. Here are five topics that I suggested, and one that Jeramia Ory suggested …
Create a github repository named something like '2014-paper-xxxx'. Ask me for name suggestions.
In that github repo, do the following:
Write a Makefile or some other automated way of generating all results from data - see
https://github.com/ged-lab/2013-khmer-counting/blob/master/pipeline/Makefile
or ask Camille (@camille_codon) what …
These are the talk notes for my opening talk at the 2014 Bioinformatics Open Source Conference.
Normally my talk notes aren't quite so extensive, but for some reason I thought it would be a good idea to give an "interesting" talk, so my talk title was "A History of Bioinformatics …
I'm at the 2014 Marine Microbes Gordon Conference right now, and at the end of my talk, I brought up the point that the function of most genes is unknown. It's not a controversial point in any community that does environmental sequencing, but I feel it should be mentioned at …
We just released khmer v1.1, a minor version update from khmer v1.0.1 (minor version update: 220 commits, 370 files changed).
Cancel that -- _I_ just released khmer, because I'm the release manager for v1.1!
As part of an effort to find holes in our documentation, "surface" any …
Eli Kintisch (@elikint) just wrote a very nice article on "Sharing in Science" for Science Careers; his article contained quotes from my MSU colleague Ian Dworkin as well as from me.
When Eli sent me an e-mail with some questions about open science, I responded at some length (hey, I …
As part of the 2-day Mozilla Science Labs hackathon in late July, the khmer project will be providing a "mentored open source contributathon" experience. This will provide an opportunity for people interested in trying out our instance of the "github flow" model, in which contributions are submitted for review using …
(or, What I Did For One Day Of My Summer Vacation.)
tl;dr? I played around with building a CountMin Sketch that is dynamic in size, based on a scalable Bloom Filter approach. I'm not sure it worked. Thoughts, suggestions, help?
In our research, we've made some hay …
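For readers who haven't met the data structure, here is a minimal sketch of a plain, fixed-size CountMin Sketch in Python. It is not the dynamically sized variant described in this post and it is not khmer's implementation; the class name, the `width` and `depth` parameters, and the use of SHA-256 for hashing are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-size CountMin Sketch: approximate counts that never underestimate."""

    def __init__(self, width: int = 1000, depth: int = 4) -> None:
        self.width = width  # counters per row
        self.depth = depth  # number of rows, one hash function each
        self.tables = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Derive a different hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        """Increment the item's counter in every row."""
        for row in range(self.depth):
            self.tables[row][self._index(item, row)] += count

    def get(self, item: str) -> int:
        """Return the smallest counter across rows.

        Collisions can only inflate a counter, so the minimum is the
        least-contaminated estimate and is always >= the true count.
        """
        return min(
            self.tables[row][self._index(item, row)] for row in range(self.depth)
        )


# Hypothetical usage with k-mers as the counted items.
cms = CountMinSketch(width=1000, depth=4)
for kmer in ["ACGT", "ACGT", "ACGT", "GGCC"]:
    cms.add(kmer)
```

Memory use is fixed at `width * depth` counters regardless of how many distinct items arrive, which is exactly the limitation a dynamically sized version would try to relax.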
About 10 days ago, I gave a talk in Manchester to Carole Goble's group, hosted by Aleksandra Pawlik. The talk title was "Six ways to Sunday: Approaches to computational reproducibility in non-model sequence analysis." I've posted the slides (here).
For the talk, I put together a list of five things …
I'm on a European trip that involves several plane flights accompanied by long airport stays, and I just used some of that time to do a bit of tedious coding on khmer.
The coding I did was to add proper exception handling to khmer's internal file loading routines (see the …
Earlier today, I posted our response to the reviewers' comments on our k-mer counting paper, "These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure."
A side note -- I was wondering how many public examples there are of the whole paper submission …
A few months back, we received some reviews for our paper on k-mer counting with khmer. After many months, we (mostly Qingpeng Zhang, the first author) have finished revising the paper. Here is our response to reviewers.
The latest (resubmitted) version of the paper is here, while the version the …
I've just made my full application to the Moore Foundation's Data Driven Discovery Investigator program available, but wanted to post an HTML version, too. You can also see a short sci-fi story about what I want to enable.
You might wonder why I'm posting this. Well, there's a snowflake's chance …
My second-round Data Driven Discovery application is due on Monday, and my first draft contained the following story. I don't think I'll include it in the actual application, but it was entertaining enough to write that I thought I'd post it here.
A vision of the future I would like …
There were lots of problems with PyCon this year. For example, the free, hi-speed wifi made you log in each day. And it was in Montreal, so one of my foreign students couldn't come because he didn't get a visa in time. The company booths were not centrally located. And …
tl;dr? The Software Carpentry train-the-trainers workshop in Toronto this past M-W was just fantastic. I can't recommend it enough.
A bit of background: Software Carpentry is a project to teach scientists to use computing more effectively. Started by Greg Wilson about 16 years ago, the project has progressed through …
I recently finished two books that really stood out in my memory of recent reading. Lighter fare than I have recommended in the past, but more representative of what I read day to day :)
The first book, The Martian, is a near-future book about an astronaut who wakes up on …
So my daughter just participated in her first science fair, at the age of 6. ("Conclusion: science can be fun! and sticky!")
Over dinner, my wife and I came up with some ideas for her next fair. She was having trouble dissolving sugar in ice water, so we suggested maybe …
Links, software, thoughts -- all solicited! Add 'em below or send 'em to me, t@idyll.org.
---
Imagine... a rolling 48 hour hackathon, internationally teleconferenced, on reproducing analyses in preprints and papers. Each room of contributors could hack on things collaboratively while awake, then pass it on to others in overlapping …
Resources:
I'm pleased to announce the publication of "Tackling soil diversity with the assembly of large, complex metagenomes", by Adina Howe, Janet Jansson, Stephanie Malfatti, Susannah Tringe, James Tiedje, and myself. The paper is openly available on the PNAS Web site here (open access).
External links:
Note: updated 2/18 with Benton Gravely's name -- he did the squid genome sequencing!
A few months back, I announced the khmer protocols project, an effort to write down an explicit, open protocol for transcriptome and metagenome assembly. This project was started during the summer of 2013 at the Woods …
This term, I'm once again teaching my upper-division CSE undergrad course in Web Dev here at MSU. For the second time, I'm requiring students to use github for their homework; unlike last year, I now understand pull requests and have integrated them into the process.
A few months back, we submitted a paper, These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure, to PLoS One. We got the (signed) reviews back in December, and I asked the reviewers if I could post their reviews publicly. They …
On January 27th, 2014, the MSU BEACON Center graduate students held a panel on how to review. The panel was organized by Emily Weigel (@choosy_female) and Jory Schossau, Bjorn Ostman (@CarnyEvolution), Kristin Parent, Rich Lenski (@RELenski), and Arend Hintze were panel members, together with me.
I put together the …
I've just posted the narrative for a recently funded USDA grant on improving the quality of the chick genome assembly on the lab's research page. The issues are laid out in detail in the grant, but, basically, the question is: how can we improve the quality of the assembly? The …
Over the past year of PyCon, #sciox, and other gender imbalance/harassment/sexism discussions that I've seen on Twitter and blogging, I've run into a few posts on these matters that stood out to me in terms of clarity, logic, and/or good instruction for me. I'm sharing them below …
In 1947 a Bedouin shepherd found a bunch of ancient scrolls in a cave near the Dead Sea. These scrolls, now known as the Dead Sea scrolls, included some of the oldest known Biblical texts as well as other Jewish religious writing. Over the next few decades, these scrolls - of …
As part of a February visit to the Whitney Marine Lab in Florida, I'm giving a talk for the public. I chose "The Genomic Revolution: How Sequencing Anything and Everything Is Changing the Way We Do Science" as the title. Basically, I want to talk about what the DNA sequencing …
Over the last year, digital normalization has occupied an increasingly privileged position in sequence analysis: it's a lightweight way to achieve an assembly, one that is computationally cheaper than almost anything else you can do; our software works reasonably well in practice; sequencing data generation capacity is only increasing; and …
(with Camille Scott, Michael Crusoe, and Leigh Sheneman; Josh Rosenthal contributed to eel-pond; and Adina Howe contributed to kalamazoo)
This summer, I spent a lot of time writing up computational protocols for both mRNAseq and metagenome assembly in the Amazon cloud.
I'm happy to announce that they are now available …
I've started to think more broadly about bioinformatics training, and after some conversations with Vicky Schneider at TGAC, Terri Atwood at GOBLET, and others, I thought I'd write down some thoughts on bioinformatics classrooms. In particular, what kind of compute infrastructure is needed?
Before I get started, my assumptions and …
I've been using EBSeq for a few things lately, and have had trouble getting some of the dependencies installed -- in particular, gplots doesn't seem to be readily available for R 2.14, 2.15, etc. Judging by my Google searches, others have been having the same problems; see e.g …
Dear <student>,
I'd be happy to, but I do have a few conditions/requests based on prior experience with students!
First, please schedule all of your meetings at least 2 months in advance :)
Second, a condition for my signing off on your thesis will be that, for any paper for …
It's not often that someone perfectly and thoroughly summarizes the challenges inherent in data science being confronted by academic institutions, but that's just what Fernando Perez did in this blog post. Just... just go read it, trust me :)
The new data driven discovery centers being funded by Sloan & Moore are …
I'm on my way back from a great week in England. I spent most of the week in Norwich at The Genome Analysis Center (t-gaaaaaaaack), hosted by Vicky Schneider-Gricar. I gave a talk, taught two workshops together with Aleksandra Pawlik -- one for biologists and one for bioinformaticians -- and met quite …
I just read Scientific Data - ultimate salami slicing publishing, in which Pedro Beltrao argues that Nature's new journal is simply another venue for them to suck money out of scientists. Maybe. But I'm strongly considering sending a lot of stuff there, and I really think Pedro is missing something very …
A recent visiting speaker, Dr. Sinead Collins from Edinburgh, mentioned in passing during her talk that she was particularly interested in mentoring and empowering women in science. I am also interested in this, but as a male in a position of power I'm wary of preaching to women on the …
I recently had the pleasure of meeting with Randy LeVeque, Bill Howe, and Steven Roberts at UW, along with Jory Schossau, after the UW bootcamp that Jory and I ran. I already knew Bill from before (see our conversation on VMs and reproducibility) and Steven had taken the workshop, but …
I just finished my third workshop in two weeks. I taught 3.5 days of microbial bioinformatics at Caltech, 2 days of intro computing for biologists at MSU, and another 2-day intro computing for biologists workshop at UW. The Caltech workshop was sponsored by CEMI, the Caltech Environmental Microbial Initiative …
I just finished reading A House in the Sky, by Amanda Lindhout and Sara Corbett. It's an excellent book about Amanda Lindhout's kidnapping and ransom by Somalian Muslims; it's very well written, engaging, and utterly terrifying and horrible to read. Highly recommended.
--titus
Erica Check Hayden at Nature News wrote this article about a Mozilla Science Lab effort to bring code review to scientific code. Code review is an important part of many open source, startup, and corporate software development cultures, and the goal of the Mozilla effort is to See What Happens …
Question: What do Nick Loman, Jared Simpson, Lex Nederbragt, and I all have in common?
Answer: We all spend way too much time thinking about assembly.
Question: What does Jonathan Eisen's lab do?
Answer: Sequence lots of really weird things that they'd like to assemble.
Motivated by …
We've just posted a new paper to arXiv: "These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure." We'll be submitting it to PLoS One after we wait a few days for comments from the Twittersphere and/or on Haldane's Sieve.
The …
The paper Howison et al., 2013, just appeared in early form in Bioinformatics. Here is my first round review, which they handily addressed in their revisions; since I was quite positive I felt I might as well post the whole thing, though.
Note that a relevant paper from Mihai Pop …
Here's a draft PyCon '14 proposal. Comments and suggestions welcome!
Title: Data intensive biology in the cloud: instrumenting ALL the things
Description: (400 ch)
Cloud computing offers some great opportunities for science, but most cloud computing platforms are both I/O and memory limited, and hence are poor matches for …
So, we've been running this course on NGS data analysis. And it's been fun and all. But a lot of work.
About a year ago, I thought hard about whether or not I wanted to apply for renewal, and ended up applying again. You can see the final grant if …
Two papers on the Haemonchus contortus genome just came out (Schwarz et al. and Laing et al.), and I'm an author on one of them (Schwarz et al.). H. contortus, or Haemonch (as I affectionately called it) is a nasty parasitic nematode that feasts on the mucosal blood of ruminants …
In my post on proselytizing version control, an underlying and implicit assumption was that version control was not fulfilling the function of a lab notebook. But I didn't make that explicit. And then someone asked in the comments. So now I'm making it explicit.
tl; dr? Version control's a really …
I gave a talk yesterday at the 2013 BEACON Congress titled "How to build an enduring online research presence using social networking and open science." The talk slides are here.
This talk was a combined survey of sites and personal perspective on how social media has helped shape the last …
Since I attended the entire STAMPS course at MBL this year, which was an entirely computational course, I had the opportunity to proselytize computational reproducibility and good practice to a number of people.
Now, with students I'm usually fairly gentle about this kind of thing, and try to get my …
Late last year, inspired by a review I did of a Science submission, I wrote a blog post asking what people thought of the Insight Journal. This was in response to the submission's mention of Image Processing On Line.
The Science paper is finally out -- actually, I missed it, it …
As the title says, I've got a new job.
But it's not really that exciting a switch, sorry :)
As of mid-August sometime, I will officially switch my appointment from 2/3 Computer Science and Engineering / 1/3 Microbiology and Molecular Genetics, to 2/3 Microbiology and Molecular Genetics, 1/3 …
(With very little apology whatsoever to Geoffrey North.)
The airplane age, in particular the advent of large, well-attended conferences, has created a brave new world of broadcasting instant criticism of scientific papers, for good or ill.
I think there is a clear "good" side, illustrated by cases where papers making …
So, I got this grant. And, um, it looks like khmer has a future, which means... so does my lab.
What is khmer?
khmer is my lab's software for doing various things to sequencing data, and is largely focused on providing good demo implementations of low-memory data structures …
I've been asked -- in several different contexts now -- whether or not the openness of my lab has had any specific impact. I blog about active research; we develop our code in the open; we post papers to arXiv; we emphasize remixability; I'm pushing open data in consortia; and we are trying …
At the "What to Teach Biologists about Computing" meeting (discussed here, a bit) we received a strong message from Our Dear MozSciLabLeader, Kaitlin Thaney. The message was this: if we want to maximize reuse and remixing of educational materials, we should explicitly license them under CC0. (See her talk and …
We all know that biology (along with other sciences) is becoming ever more data intensive. Biologists (among other scientists) are not terribly well prepared for this, because of a lack of computational culture, lack of computational training, and a lack of tools. What do they need to know?
This question …
(This blog post was mightily helped by Qingpeng Zhang, the first author of the paper; he wrote the pipeline. I just ran it a bunch :)
We have been benchmarking k-mer counters in a variety of ways, in preparation for an upcoming paper. As with the diginorm paper we are automating …
My father, Gerry Brown, passed away yesterday. He had been convalescent for some time, so this was not entirely unexpected, but it sure is final when it happens.
Gerry was a fairly well known physicist. His Wikipedia page gives a pretty good summary of his professional life, and I am …
I've been reading Peter Seibel's excellent book, Coders at Work, which is a transcription of interviews with a dozen or so very well known and impactful programmers. After the first two interviews, I found myself itching to highlight certain sections, and then I thought, heck, why not post some of …
I just finished reading The Immortal Life of Henrietta Lacks, an excellent book about the HeLa cell line cultured from cancerous cells taken from Henrietta Lacks. In addition to raising some really interesting and astonishing questions about the appropriate (mis)use of patients' tissue samples, a section about George Gey …
My advice to graduate students: blog! post garnered some interesting comments here and there. Most of the responses were positive, but then again, most anyone who reads blogs probably doesn't need to be convinced that blogging is useful. In particular, note that some of the comments at the bottom of …
Sometimes you've really got to wonder.
The Chronicle of Higher Ed just posted this article on a collaboration between A. Sean Pue, Tracy K. Teal, and myself. It's about bringing bioinformatics (or, really, CS and computational linguistics) to the study of Urdu poetic meter.
The article has two interesting flaws …
So, there's this guy, Matt Welsh. And he left Harvard to go to Google. OK.
Now he's baaaack, to point out that academia isn't that rosy.
Yep. He's not wrong.
Were I in a sarcastic mood, I would say something like "ohmigod, Matt Welsh is pointing out that …
I was a reviewer of the PLoS One paper, An Integrated Pipeline for de Novo Assembly of Microbial Genomes, and just recently came across the review again. I didn't post it at the time, but heck, why not now? ;)
Note that for our recent microbial genomes assembly workshop we wrote …
Software installation is a real problem.
I'm writing this as I return from my fourth Software Carpentry workshop, or -- if you count the one I ran at LLNL almost a decade ago -- my fifth one. This workshop was taught with Karen Cranston and Rich Enbody, both of them very experienced …
Is khmer evolving?
The khmer project is our software package to work with short reads, and it enables a lot of things like k-mer counting and de Bruijn graph exploration and modification. As data volume grows, interest in partitioning and digital normalization is also growing. But we haven't really talked …
On the angenmap mailing list, I wrote:
Overall, I see little justification for believing that our current system [ of peer review ] is particularly good. It's just comfortable, especially for the people who have been molded by it.
Chris Moran responded:
I frequently hear these assertions about weak correlations, false negative …
Continuing in the saga of "what do sequencing errors do to our de Bruijn graph density measure" (read the first post here), I have some new results.
The conclusion of the first post was that on random (non-real) genomes, both with and without repeats, we see that de Bruijn graph …
Are our reviewers correct or incorrect?
About two months ago we got back reviews for our assembly artifacts paper, in which we showed that there was a strong 3' bias in the reads towards higher graph connectivity. Since shotgun sequencing is supposed to be random, we asserted that this 3' …
The tech community is messed up in da head, yo.
Several times since Steve Holden's I'm Sorry post I've written long blog posts about my own views on codes of conduct and professional behavior, including the views informed by some of my own extraordinarily embarrassing transgressions. I never felt that …
A week or two ago, I posted a crazy idea about crowdsourcing a bioinformatics analysis pipeline. I may still try to do that. But in the meantime, here's another crazy idea.
First, some background.
I'm writing this as I fly back from …
Description
Random algorithms and probabilistic data structures are algorithmically efficient and can provide shockingly good practical results. I will give a practical introduction, with live demos and bad jokes, to this fascinating algorithmic niche. I will conclude with some discussions of how our group has applied this to …
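For readers who have not met a probabilistic data structure before, here is a minimal sketch of one well-known example, a Bloom filter. The talk description does not say which structures are covered, so this choice is an assumption; the class name, the sizes, and the salted SHA-256 hashing scheme are all illustrative rather than taken from any particular library.

```python
import hashlib

class BloomFilter:
    """Set-membership sketch: no false negatives, tunable false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item: str):
        # Derive several hash positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for kmer in ("ATGGC", "TGGCG", "GGCGT"):  # made-up k-mers
    bf.add(kmer)
print("ATGGC" in bf)  # added items are never missed
print("CCCCC" in bf)  # usually False, but a false positive is possible
```

The trade is a fixed amount of memory in exchange for occasionally answering "yes" for something that was never added.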
I'm worried about our current mRNAseq analysis strategies.
I recently posted a draft paper of ours to arXiv entitled RNA-Seq Mapping Errors When Using Incomplete Reference Transcriptomes of Vertebrates; the link is to the Haldane's Sieve discussion of the paper. Graham Coop said I should write something quick, and so …
Over the last few weeks, I've been on a bit of a Cory Doctorow kick. I started by reading Homeland, a sequel to the excellent Little Brother; these two very important books about anti-terror-enabled government suppression of liberty and free speech are very well written and extremely timely. I then …
Or, "can we crowdsource BGI?" ;)
With all of the crazy need surrounding genomic analysis -- most of it on a shoestring budget -- I am thinking about a mildly crazy idea.
What if I offered to computationally analyze people's non-model transcriptomic and metagenomic data for them, in exchange for (a) non-exclusive access …
I just finished attending a 1-day workshop on Cyberinfrastructure for Marine 'Omics down in DC. It was a meeting organized by the Gordon and Betty Moore Foundation but attended by program managers from about a dozen different agencies and divisions (NSF BIO, NSF GEO, etc.); a bunch of pretty serious …
I'm on my way down to D.C. to attend another meeting about cyberinfrastructure, this time with a bent towards metagenomics pipelines. (At least, I'm pretty sure that's why I'm invited. It's getting hard to tell these days.)
Inspired by James Watters' blog post on his "fork you" shirt, I …
Julia Gustavsen and I just finished teaching one room of a 3-room Software Carpentry boot camp at University of Washington, Seattle.
Students remained interested and vocal, even when looking exhausted.
Almost all came back the second day!
We got some good individual feedback that we'd taught useful things …
A short note -- the lamprey genome (P. marinus) paper is finally out! You can see the paper and the Michigan State University press release. (The press release isn't too bad, but I would like to point out that I had no part in the sentence talking about how this could …
Note: this is the general part of the submitted review; I left out the things that I expect might change if revisions are made.
Also see Thoughts on the Assemblathon 2 paper.
Re the Assemblathon 2 paper <http://arxiv.org/abs/1301.5406>,
Bradnam et …
(Also see Assemblathon 2 review, round 1, parts thereof)
I just finished reviewing the Assemblathon 2 paper, in which many of the extant de novo genome assembly pipelines were evaluated against three different organismal data sets. (I'll post the review when I can.) Good paper.
To me, the biggest outcome …
One of the things that I have struggled with over the years is how to teach people how to actually program -- by this I mean the minute-to-minute process and techniques of generating code, more so than syntax and data structures and algorithms. This is generally not taught explicitly in college …
I received this letter in the mail the other day. Can anyone help?
---
Dear Dr. Abby,
I am at a top-50 R1 research institution, and we are currently conducting faculty hiring searches for a number of professors in biology. The applicant pool has been stunningly good this year, and we …
I was a reviewer on Boisvert et al., Ray Meta: scalable de novo metagenome assembly and profiling, and (as with DSK: k-mer counting with very low memory usage) I thought I'd share my review.
(Sorry, it's really short. My first round review had some comments that they handily addressed in …
Why do I blog?
I've been blogging now for almost 8 years, since around when Grig Gheorghiu started the Southern California Python Interest Group. Since then I've gotten a PhD, taken a postdoc, had one child, started a faculty position, had another child, and basically gotten way, way busier. Why …
One of my graduate students and I were reviewers on Rizk et al., DSK: k-mer counting with very low memory usage, and I thought I'd share our review. At the moment I cannot easily see the entire paper so I have not modified the review to account for post-review changes …
I just spent a really fun and exciting two hours installing a piece of software that I needed to run to do a paper review. The software itself downloaded, but failed routinely on their own test data; after delving through four layers of Perl and Python, I discovered that the …
The other day, I purchased a new car from the car company down the street. This was a small boutique shop, and their marketing brochure was slick -- 0-60 in 6 seconds, heated seats, a good safety rating -- and the technical reviews were amazing -- "Never seen anything like it! Really novel …
I just left the NAS meeting on Integrating Environmental Health Data to Advance Discovery, where I was an invited speaker. It was a pretty interesting meeting, with presentations from speakers who worked on chemotoxicity data, pollution data, exposure data, and electronic health records, as well as a few "outsiders" from …
For each of the last two summers, I've returned from co-teaching our Analyzing Next-Generation Sequencing Data course, slept for 48 hours straight, and then hunkered down and bunkered up to write grants. (To be clear, sometimes this bunkering up involves travelling out to California and sitting on my in-laws' beach …
We just posted yet another pre-submission paper to arXiv.org:
Assembling large, complex environmental metagenomes
Authors: Adina Chuang Howe, Janet Jansson, Stephanie A. Malfatti, Susannah Tringe, James M. Tiedje, and C. Titus Brown
Abstract:
The large volumes of sequencing data required to deeply sample …
We just posted another pre-submission paper to arXiv.org:
Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets
Authors: Adina Chuang Howe, Jason Pell, Rosangela Canino-Koning, Rachel Mackelprang, Susannah Tringe, Janet Jansson, James M. Tiedje, and C. Titus Brown
Abstract:
Sequencing errors and …
This post can be referenced and cited at the following DOI: http://dx.doi.org/10.6084/m9.figshare.98198.
For a few months, the Trinity list was awash with discussions about how to use digital normalization to lower the memory and compute requirements for mRNASeq assembly. At some point …
I recently had the pleasure of reviewing an excellent paper that used actual data (DATA!) to argue that source code needs to be part of the review process. (When it is published I will post again about it; for now, the process of secret handshakes in smoke-filled back rooms must …
I am just returning from a trip to Southern California that included, among other things, the teaching of a two day Software Carpentry workshop at The Scripps Research Institute. There were two instructors, myself and Tracy Teal, a research scientist at MSU; and two external TAs, Qingpeng Zhang (one of …
Just over a week ago, I posted a list of wanted tech that I thought would help further open science. One item that struck a chord with a number of people on Twitter and in the comments was the idea of giving blog entries a DOI:
- An easy way to …
I gave a talk last Wednesday at U. Michigan in the DCMB program where I included a slide estimating how much DNA sequencing (in base pairs) was needed for good de novo assembly of sequences from various biological environments or problems. The slide was there to motivate the challenges of …
This is one of a bunch of posts on science and the Web. Start here for an overview.
It's been fun to watch (and occasionally help drive) science moving online and taking advantage of the Web. Here are some of my favorite examples.
Simple, easy ways of sharing process abound …
This is one of a bunch of posts on science and the Web. Start here for an overview.
The web represents an opportunity for a phase transition in terms of connectedness and openness in scientific practice, as in software development, and we're not taking much advantage of it. Why?
This is one of a bunch of posts on science and the Web. Start here for an overview.
I've been reading Michael Nielsen's book Reinventing Discovery, which is an awesome and inspirational book about (among other things) accelerating scientific discovery using the Internet. Highly recommended.
From my position within academia …
This is one of a bunch of posts on what I'm calling 'w4s' -- using the Web, and principles of the Web, to improve science. The others are:
The awesomeness we're experiencing, which provides some examples of current awesomeness in this area.
The challenges ahead, which covers some of the reasons …
This is one of a bunch of posts on science and the Web. Start here for an overview.
I don't think I can devote myself to any big projects, but I do have a bunch of ideas for relatively small projects that I think could lead to worthwhile change.
Here …
In his paper, Reproducible Research and Cloud Computing, Bill Howe asks:
What happens if you do all your work on a virtual machine hosted in the cloud? When it came time to publish, you might make a snapshot of the VM, make it public, and cite it in your paper …
Inspired by the awesomeness of disqus on my other sites, I wanted to make it possible to enable disqus on my sites on ReadTheDocs. A bit of googling led me to Mikko Ohtamaa's excellent work on the Plone documentation, where a blinding flash of awesomeness hit me and I realized …
An increasing number of people are asking about using our assembly approaches for things that we haven't yet written (or posted) papers about. Moreover, our assembly strategies themselves are also under constant evolution as we do more research and find ever-wider applicability of our approaches.
This has been moved to …
Randy Olson, who is watching amusedly from the side lines as I struggle once again to teach programming to graduate students in biology, asked a really good question (it's rare, yes, but it should be acknowledged for encouragement) --
Had a student from your class ask me today if it's typical …
At BOSC 2012, we heard a report from Richard Holland on the Pistoia Alliance Sequence Squeeze competition. I'd run across this a couple of times before -- most notably in the Quip paper -- and was interested in hearing the results.
What was the problem being tackled? To quote,
The volume of …
After yet another round of futile Twittering on the subject of research software, I thought I'd share a deeply personal story -- a story that explains some of my rather adamant stance that most research scientists need to think more critically about their code, and should adopt at least some of …
One of my favorite in-class exercises is The Assembly Exercise, in which I provide "shotgun sequence" from some English text and ask the students to assemble it. Normally I provide a printout of about 10-20 pages of reads with a range of read lengths, error rates, and single/paired end sequences …
The IPython Notebook (or 'ipynb' for short) is one of the most exciting technologies for teaching and research that I've seen in recent years. It is a completely open source, well architected, and fairly stable system for scientific computing and data exploration.
I've now been using it for teaching for …
These talk notes are for my talk at the 2012 Argonne Soil Metagenomics Workshop.
The slides are available for viewing and download here, on Slideshare.
I'm going to be talking about our assembly pipeline for soil metagenomes.
Much of this work was done by …
I just finished reading The Idea Factory: Bell Labs and the Great Age of American Innovation, by Jon Gertner, an absolutely fabulous book on Bell Labs, and their invention of the transistor, the laser, and almost everything to do with modern telecommunications and computers ;).
The final chapter is about the …
We held the 2012 workshop on Analyzing Next Generation Sequencing Data from June 4 to June 15, at the Kellogg Biological Station in western Michigan, about 30 minutes north of Kalamazoo.
(This is a long delayed blog post. :)
The goal of the workshop is to take biologists with little in …
Try out this thought experiment.
Suppose you are a bio professor, and a grad student came to you and said, "I'm trying to figure out what classes to take, and there're all these math, modeling, and computational courses that I could take. But I just don't think that math or …
I'm giving a talk at XLDB 2012 tomorrow, and I thought I'd post a bunch of accompanying links and discussion, since this audience is pretty far away from my normal audience ;).
Here's the talk itself, on slideshare: Streaming and Compression Approaches for Terascale Biological Sequence Data Analysis
Acknowledgements (slide 3 …
I'm starting to notice that a lot of bioinformatics is anecdotal.
People publish software that "works for them." But it's not clear what "works" means -- all too often either the exact parameters or the specific evaluation procedure is not provided (and yes, there's a double standard here where experimental methods …
I've been invited to present at the Extremely Large Databases (XLDB) 2012 conference as a practicing biologist who occasionally speaks with physicists, and I'm trying to come up with something to say that will explain why physicists and biologists don't often collaborate all that well.
Here are some guesses.
---
There's been a lot of discussion about PyCon talks that we do want to see. Here's a brief list of those I don't want to see, for those of you considering a submission -- in no particular order.
[ Note: I wrote the following e-mail to the Microbiology (MMG) department faculty mailing list here at MSU. I'll post any interesting responses that I get. --titus ]
---
Hi all,
I'm an unabashed proponent of Open Access publishing, as well as the idea of decoupling correctness from estimated impact of a paper …
One of the biggest problems with basic sequence analysis -- some would say the biggest problem -- is the error rate. If our sequencing reads were error-free, both assembly and mapping would be much, much easier. Alas, Illumina reads have a 0.1-1% error rate per base, and PacBio has an error …
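A rough back-of-the-envelope sketch of why even these per-base rates matter. It assumes errors are independent across bases and uses a hypothetical read length of 100 bases (neither assumption comes from the post), so the chance that a read contains no errors at all is (1 - p)^L.

```python
def error_free_fraction(per_base_error: float, read_length: int) -> float:
    """Probability that a read contains no errors, assuming independent errors."""
    return (1.0 - per_base_error) ** read_length

# Hypothetical read length; 0.1% and 1% are the ends of the range quoted above.
READ_LENGTH = 100
for rate in (0.001, 0.01):
    frac = error_free_fraction(rate, READ_LENGTH)
    print(f"per-base error {rate:.1%}: {frac:.1%} of reads error-free")
```

Under those assumptions, roughly 90% of reads are clean at a 0.1% error rate, but only about 37% are clean at 1%.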
I'm at the MBL STAMPS course, "Strategies and Techniques for Analyzing Microbial Population Structure," and one of the things I needed to address in my morning talk was the role that the k parameter plays in de Bruijn graph assemblers.
In most de Bruijn assemblers that I have used -- Velvet …
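As a small illustration of what k controls, the sketch below breaks a read into its overlapping k-mers, the fixed-length words a de Bruijn graph is built from. The read and the values of k are made up for the example, and this is not how any particular assembler is implemented.

```python
def kmers(seq: str, k: int) -> list[str]:
    """Return every overlapping substring of length k, in order."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

read = "ATGGCGTGCA"  # made-up 10-base read
for k in (3, 5):
    words = kmers(read, k)
    # A read of length L yields L - k + 1 k-mers.
    print(k, len(words), words)
```

A larger k gives fewer words per read, and each word is more specific; a smaller k gives more words that are more likely to collide between unrelated sequences.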
Suppose you have a community that has two organisms in it, at widely varying abundances. What can you do?
Partitioning takes this mixed distribution and, based on graph connectivity, splits the reads into two bins. Thus, you go from this:
to this:
where the reads are …
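A toy sketch of partitioning by connectivity: treat two reads as connected when they share at least one k-mer, then collect the connected components with a union-find structure. This is an illustrative stand-in, not the khmer implementation; the reads, the value of k, and the function name are invented for the example, and reverse complements are ignored.

```python
from collections import defaultdict

def partition_reads(reads: list[str], k: int) -> list[list[str]]:
    """Group reads into bins; reads sharing any k-mer land in the same bin."""
    parent = list(range(len(reads)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # k-mer -> index of the first read that contained it
    for idx, read in enumerate(reads):
        for j in range(len(read) - k + 1):
            kmer = read[j:j + k]
            if kmer in first_seen:
                parent[find(idx)] = find(first_seen[kmer])
            else:
                first_seen[kmer] = idx

    bins = defaultdict(list)
    for idx, read in enumerate(reads):
        bins[find(idx)].append(read)
    return list(bins.values())

# Two made-up "organisms": reads overlap within each, but not between them.
reads = ["AAAACCCC", "ACCCCGGG", "TTTTGGTT", "TGGTTGAT"]
for group in partition_reads(reads, k=4):
    print(group)
```

Note that abundance never enters the calculation: a bin with a handful of reads is separated just as cleanly as one with thousands.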
This is the story behind our PNAS paper, "Scaling Metagenome Assembly with Probabilistic de Bruijn Graphs" (released from embargo this past Monday).
Why did we write it? How did it get started? Well, rewind the tape 2 years and more...
There we were in May 2010, sitting on 500 million …
I've just posted my 2nd try at the NSF CAREER award to the lab Web site, where it joins my recent NSF BIGDATA proposal, my Moore Foundation proposal, last year's (rejected) NSF CAREER proposal, my NGS course grant, and my one big funded grant, my USDA proposal from 2009. The …
Data and materials availability All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science. All computer codes involved in the creation or analysis of data must also be available to any reader of Science. After publication, all reasonable …
I recently attended an NSF BIO directorate meeting about cyberinfrastructure needs. Here's a list of training & education challenges identified at that meeting:
- development and adaptation of tools to archive data and metadata from diverse sources to enable data mining
- integration of structured and unstructured data from heterogeneous data sources
- discussion …
Here's a data analysis question for all you Big Data folk.
A beachcomber is interested in obtaining up to 10 examples of every type of shell present on a beach. The shells are individually easy to find, but some types are really rare and some are really abundant. The beachcomber …
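One naive way to frame the collecting rule, which is not necessarily the answer the post goes on to give: walk the beach once, keep a tally per shell type, and pick a shell up only while its tally is below the cap. The shell names and the `collect` function are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator

def collect(shells: Iterable[str], cap: int = 10) -> Iterator[str]:
    """Yield shells in beach order, keeping at most `cap` of each type."""
    kept = Counter()
    for shell in shells:
        if kept[shell] < cap:
            kept[shell] += 1
            yield shell

# Made-up beach: one very abundant type, one rare type.
beach = ["cockle"] * 1000 + ["wentletrap"] * 3
basket = Counter(collect(beach))
print(basket)
```

This version needs an exact tally for every type it has seen, so its memory use grows with the number of distinct types; that cost is what makes the question interesting at scale.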
(Or, "A better way to publish bioinformatics.")
We just got word that our paper, "Scaling metagenome assembly with probabilistic de Bruijn graphs" [ arXiv ] [ github ] has been accepted for publication in PNAS. (Yay!) I just posted the final version to github, and the arXiv PDF should be updated to the third …
At our 2012 course on Analyzing Next-Generation Sequencing Data, we talked quite a bit about future sequencing technologies, as well as about what analyses are reasonably cookbook (and which ones aren't).
Here are my thoughts -- yours welcome!
The basic conclusions about sequencing tech were these:
I've just moved my blog over to Pelican, a static blog generator that takes in reStructuredText and spits out, well, this! I'm now using Disqus for commenting, too.
The main motivations for the move (apart from slightly better theming) were to escape dynamic-blog-land in favor of static-blog-land, while enabling a …
About a year ago, I came across a really interesting Science paper entitled "Rapid and inefficient isolation of single cells from soil". In it, the authors -- from a well-known lab at UC Davis -- described how they used low-percentage agarose gels to extract thousands of individual cells from a soil sample …
As part of the 2012 Analyzing Next-Generation Sequencing Data course, I've been trying out ipython notebook for the tutorials.
In previous years, our tutorials all looked like this: Short read assembly with Velvet -- basically, reStructuredText files integrated with Sphinx. This had a lot of advantages, including Googleability and simplicity; but …
I just returned from a NESCent Catalysis meeting on Cephalopod Genomics. I was invited as a bioinformatics and genomics guy, and so I spent four days in North Carolina talking about the opportunities and challenges of sequencing cephalopods.
Cephalopods are a class of the molluscs, and include squid and octopus …
This is a draft proposal of a policy to encourage pre-publication data release and data sharing within a community. This policy is based on discussions at the Cephalopod Genomics Workshop (a Catalysis workshop sponsored by NESCent).
Note, this is made available under a CC-BY-SA license permitting use and re-use with …
Greg Wilson, Ethan White and I have been talking a bit about what Responsible Conduct of Research (RCR) standards would look like for computational science. I'm having trouble coming up with more than the below standards, which are largely related to publication.
Note, if you regard these as obvious, that's …
Brad Chapman (@chapmanb on twitter) wrote and signed a nice review of my submission to the Bioinformatics Open Source Conference. In his review, he said
My only small suggestion is to include some discussion about your reproducibility work during the talk: the Amazon AMI, documentation and reproducible ipython workflows. This …
(I came across this fragmentary blog post that I wrote sometime in December. It's a fine example of a failed allegory. To what, I'll let you determine for yourself. Anyway, in case anyone wants to know what dreck doesn't make it out of my computer onto the Intarweb, well, here's …
I'm a pretty big advocate of anything open -- open source, open access, and open science, in particular. I always have been. And now that I'm a professor, I've been trying to figure out how to actually practice open science effectively.
What is open science? Well, I think of it as …
I'm going to pick on Mick Watson today. (It's OK. He's just a foil for this discussion, and I hope he doesn't take it too personally.)
Mick made the following comment on my earlier Big Data Biology blog post:
I do wonder whether there is just a bit too much …
I'm out at a Cloud Computing for the Human Microbiome Workshop and I've been trying to convince people of the importance of digital normalization. When I posted the paper the reaction was reasonably positive, but I haven't had much luck explaining why it's so awesome.
At the workshop, people were …
I'm pretty proud of our most recently posted paper, which is on a sequence analysis concept we call digital normalization. I think the paper is pretty kick-ass, but so is the way in which we're approaching replication. This blog post is about the latter.
(Quick note re "replication" vs "reproduction …
We just posted a pre-submission paper to arXiv.org:
A single pass approach to reducing sampling variation, removing errors, and scaling de novo assembly of shotgun sequences
Authors: C. Titus Brown, Adina Howe, Qingpeng Zhang, Alexis B. Pyrkosz, and Timothy H. Brom
Paper Web site, with source code …
The 2012 MSU Next-gen Sequence Analysis course application period just closed, and we received 168 applicants. Last year, we received 133, and the year before that we received 33.
We can take 24.
I was also invited to go teach a ~1 week workshop at two other universities on these …
I'm putting together a computational pipeline for a paper - a Makefile that runs a ton of stuff and outputs files, combined with an ipython notebook file that takes those output files and turns them into figures for inclusion in a LaTeX file. (Yes, very 2000, except for the ipython notebook …
At PyCon 2012, I took a tutorial on the IPython Notebook, which I'd already been using for a month or two (ever since Wes McKinney (the author of pandas) introduced me to it). The tutorial blew me away -- these guys have done a great job with the notebook so far …
If you're like me, we pretend to care about the science in bioinformatics software. But what we really do is try to find reasons not to outright loathe the software -- because, lud knows, there are usually plenty of reasons to hate it.
In no particular order, here are the top …
(updated to point to http://arxiv.org/).
Authors: Jason Pell, Arend Hintze, Rosangela Canino-Koning, Adina Howe, James M. Tiedje, C. Titus Brown
Abstract:
The memory requirements for de novo assembly of short-read shotgun sequencing data from complex microbial populations are an increasingly large practical barrier to environmental studies. Here we …
I'm writing this on my way back from Stockholm, where I attended a workshop on the 4th Paradigm. This is the idea (so named by Jim Gray, I gather?) that data-intensive science is a distinct paradigm from the first three paradigms of scientific investigation -- theory, experiment, and simulation. I was …
(and some related thoughts on reproducibility in computational science)
In a recent news article on the "data deluge" in biology, I was quoted as saying "It's not at all clear what you do with that data. Doing a comprehensive analysis of it is essentially impossible at the moment." So, naturally …
This blog post was inspired by two recent events.
First, in response to a NY Times article about the "data deluge" affecting biologists, one of my Facebook friends said something like "stop whining about how hard it is to analyze the data and do some good experiments instead!" I vehemently …
I'm just on my way back from a JGI workshop on metagenome informatics, and I thought I'd take the opportunity to write up a short review.
The workshop was, frankly, excellent. We saw a bunch of talks on metagenome assembly (my current interest) as well as single-cell sequencing approaches, and …
As sequencing gets cheaper and cheaper, one would expect the answer for how to best sequence (and assemble!) any given genome would change. Most biologists assume something along these lines: everyone else has achieved some standard coverage (say 10x, or 100x) for their genome, so all we need to do …
There's been a lot of hooplah in the last year or so about the fact that our ability to generate sequence has scaled faster than Moore's Law over the last few years, and the attendant challenges of scaling analysis capacity; see Figure 1a and 1b, this reddit discussion, and also …
One of the most important jobs a professor has is to pay it forward: that is, to teach, train, mentor, support, and open up opportunities for their students and postdocs. It's a job that is undervalued by those who focus on the short term -- the administrators and review committees that …
During our next-gen course, a "student" (really a professor from Australia ;) asked me if I could provide some guidance on what computational infrastructure was necessary to handle next-gen sequencing data. While we used Amazon Web Services during the course, she was interested in finding out if they could use their …
For anyone who actually wants to know what it is I do, I've updated my lab Web site, http://ged.msu.edu/, to be a bit more representative of what it is we're doing these days. (I wrote it over three years ago, so it's been becoming increasingly dated.) In …
I just flew back from Montreal, where I gave a talk at the International Tunicate Meeting on the Molgula project. This is a project wherein we are doing quantitative mRNA sequencing on two species of ascidians, or sea squirts -- specifically, on M. oculata (tailed), M. occulta (tailless) -- and their hybrids …
The second iteration of our bioinformatics summer course, Analyzing Next-Generation Sequencing Data, just finished. It was a great success, at least judging from the comments that people made to us personally; the evaluations aren't yet complete.
The what: a two week course on analyzing next-gen sequencing data, using the Amazon …
read moreThere are comments.
I desperately need something to run and test things at the command line, both for course documentation (think "doctest" but with shell prompts) and for script testing (as part of scientific pipelines). At the 2011 testing-in-python BoF, Augie showed us cram, which is the mercurial project's internal test code ripped …
read moreThere are comments.
First, I write a recipe file, 'metagenome.recipe', laying out my job description for, say, sequence trimming and assembly with Velvet:
fasta_file soil-data.fa
qc_filter min_length=50 remove_Ns=true
graph_filter min_length=400
velvet_assemble k=33 min_length=1000 scaffolding=True
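A recipe like this could be read into a list of pipeline steps with a few lines of Python. This is only a minimal sketch, assuming each step sits on its own line, with the step name first and any remaining tokens being either positional arguments or `key=value` parameters. The `parse_recipe` function is hypothetical and is not the tool described in the post.

```python
def parse_recipe(text):
    """Parse recipe text into a list of (step, args, params) tuples.

    Assumed layout (not confirmed by the original post): one step per
    line, step name first, then positional args and key=value params.
    """
    steps = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        name, rest = tokens[0], tokens[1:]
        # Tokens without '=' are treated as positional arguments.
        args = [t for t in rest if "=" not in t]
        # Tokens with '=' become string-valued parameters.
        params = dict(t.split("=", 1) for t in rest if "=" in t)
        steps.append((name, args, params))
    return steps


RECIPE = """\
fasta_file soil-data.fa
qc_filter min_length=50 remove_Ns=true
graph_filter min_length=400
velvet_assemble k=33 min_length=1000 scaffolding=True
"""

for name, args, params in parse_recipe(RECIPE):
    print(name, args, params)
```

Parameter values are left as strings here; converting `min_length` to an integer or `scaffolding` to a boolean would be up to whatever runs each step.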
Then I specify …
read moreThere are comments.
Looking back, the last 5 years have, collectively, been rather overwhelming.
Five years ago, I was a big-mouthed 7th year graduate student. The biggest change in my recent life was getting a cat (first) and getting married (second).
Now, I'm the father of two (adorable) daughters. I have a minivan …
read moreThere are comments.
If you've been under a rock (or indulging in arsenic yourself), you've heard about NASA's "arsenic" article, claiming the discovery of a microbial species that can substitute arsenate for phosphate. The paper was pre-announced via a press conference that then announced the results.
Immediate blogtastrophe! The paper was critically reviewed …
read moreThere are comments.
I'm just finishing up my Computational Science for Evolutionary Biologists course, and I'm finding it tricky to come up with a good high-level summary of what I would like them to take away. As you can see from the class notes they've done some reasonably neat stuff with Digital Life …
read moreThere are comments.
PLoS Biology published an article entitled Open Education, Open Minds, in which they solicited ideas for contributions to a series on life sciences education:
Contributions to the Education Series are encouraged; ideas should be sent to plosbiology@plos.org.
So I sent in an e-mail and got back
> This message …
There are comments.
I just parachuted in on (and heli'd out of?) the Beyond the Genome conference in Boston. I gave a very brief workshop on using EC2 for sequence analysis, which seemed well received. (Mind you, virtually everything possible went wrong, from lack of good network access to lack of attendee computers …
read moreThere are comments.
In thinking about open science and open communication about science, I've always been frustrated by the people who claim that the risks outweigh the benefits. Their arguments seem sound if you buy into a certain kind of logic (the creationists will try to twist whatever you say! the climate change …
read moreThere are comments.
(with Billie Swalla)
I've spent the last two weeks out at the Roscoff Station Biologique in Roscoff, France. This little port is on the northern coast of the French region of Bretagne, or Brittany. I'm here with Billie Swalla, a professor at UW Seattle, and Elijah Lowe, a Computer Science …
read moreThere are comments.
(with Adina Howe, Jason Pell, Rosangela Canino-Koning, and Arend Hintze).
A few weeks ago I blogged a bit about a k-mer filtering system, khmer, that we were using to reduce metagenomic data to a more tractable size by throwing out error-prone reads (see A memory efficient way to remote …
There are comments.
I'm a big believer in open science -- see this great polemic over at Mendeley for a good read -- but it's always interesting to think about how such things as "data release" can be perverted by clever scientists. I'm currently in France working on some ascidians with Billie Swalla -- more on …
read moreThere are comments.
Course page at: http://ged.msu.edu/courses/2010-fall-cse-891/:
This course will introduce biologists to computational thinking, practical computational techniques, and research topics in computational evolution. The course will consist of three intensive hands-on 5-week modules: computational competence in UNIX; data mining and hypothesis generation using the Avida digital life …
There are comments.
(This project is a collaboration with Jason Pell and Adina Howe)
A few weeks ago I posted about a k-mer filtering approach that we were using to remove low-abundance k-mers from metagenomic data sets, prior to assembly. This technique is working well, and we've managed to do some assembly of …
read moreThere are comments.
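For readers who want a concrete picture of abundance-based k-mer filtering, here is a toy sketch. It assumes one simple interpretation: count every k-mer across all reads, then drop any read that contains a k-mer seen fewer than `min_abundance` times. It uses an exact in-memory `Counter`, which would not scale to real metagenomic data, and it is not the implementation used in the project above; the function names and the threshold parameter are made up for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def filter_reads(reads, k, min_abundance):
    """Keep reads whose k-mers all occur at least min_abundance times.

    Counts are taken over the whole read set. Reads shorter than k
    have no k-mers and are kept unchanged.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    return [
        read for read in reads
        if all(counts[km] >= min_abundance for km in kmers(read, k))
    ]


reads = ["ACGTAC", "ACGTAC", "ACGTTT"]
print(filter_reads(reads, k=4, min_abundance=2))
```

In this example the third read carries two k-mers that appear only once, so it is the one that gets discarded.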
The Terabase Metagenomics meeting was good fun, but I most valued the computational component (because that's what I do). Rachel Mackelprang and Rob Knight and I wrote down a list of the computational issues involved in a petabase metagenomics project, and that list will help direct my future research. I'll …
read moreThere are comments.
I'm on my way back from the Terabase Metagenomics meeting in Snowbird, UT, and I'm buzzing with ideas about how to move forward in metagenomics and bioinformatics research. Metagenomics, the use of genomics approaches to study microbial communities, has been opening up as sequencing drops in price. With sequencing becoming …
read moreThere are comments.
(Everyone needs an "I'm wonderful" wall, right? Here's mine!)
My rather snarky post on data management made it into Episode 34 of the Coast to Coast Bio Podcast.
Turns out I'm now a leading evangelist for cloud computing in genomic research, which I guess is true if evangelism is relative …
read moreThere are comments.
I've spent the last few weeks working on a simple solution to a challenging problem in DNA sequence assembly, and I think we've got a nice simple theoretical solution with an actual implementation. I'd be interested in comments!
Briefly, the algorithmic challenge is this:
We have a bunch of …
There are comments.
After my recent next-gen sequencing course, which was supposed to tie into the whole software carpentry (SWC) effort but didn't really succeed in doing so the first time through, I started thinking about the Right Way to tie in the SWC material. In particular, how do you both motivate scientists …
read moreThere are comments.
Laurie Dillon just posted the SIGPLAN education board article on Why Undergraduates Should Learn the Principles of Programming Languages to our faculty mailing list at the MSU Computer Science department. One question that came up in the ensuing conversation was: what functional programming language(s) would/should we teach?
I …
read moreThere are comments.
Our sequencing analysis course ended last Friday, with an overwhelmingly positive response from the students. The few negative comments that I got were largely about organizational issues, and could be reshaped as suggestions for next time rather than as condemnations of this year's course.
The 23 students -- most with no …
read moreThere are comments.
So, I've been teaching a course on next-generation sequence analysis for the last week, and one of the issues I had to deal with before I proposed the course was how to deal with the volume of data and the required computation.
You see, next-generation sequence analysis involves analyzing not …
read moreThere are comments.
So, I'm running this summer course and I am trying to figure out how to organize the notes for students. I'd like to mix curriculum-specific notes ("here's what we're doing today, and here are some problems to work on") with tutorials (material independent of a single course, like "here's how …
read moreThere are comments.
In conversation with a colleague the other day, I found myself making a surprising prediction: the age of the big sequencing centers (Broad Institute, WUSTL, Baylor, DOE JGI, etc.) is coming to an end. In 5 years they will no longer exist.
This prediction is obvious in hindsight.
That is …
read moreThere are comments.
Dear NSF,
I am happy to respond to your request for a 2-page Data Management Plan.
First of all, let me say how enthusiastic I am that you have embraced this new field of "large scale data analysis". Ever since I started working with large Avida data sets in 1993 …
read moreThere are comments.
Just got news that the BEACON NSF Science and Technology Center for the study of Evolution in Action funded Chris Adami to come do a sabbatical here at Michigan State University for the next year. This puts me, Chris, and Charles Ofria at the same institution (now MSU, then Caltech …
read moreThere are comments.
Inspired by the brilliant mind(s) behind python-commandments.org, here's a list of ideas you can use to help newbies learn Python!
There are comments.
I've been doing some more focused bioinformatics programming recently, and as I'm thinking about how to teach biologists about data analysis, I realize more and more how much backstory goes into even relatively simple programming.
The problem: given a reference genome, and a very large set of short, error-prone, random …
read moreThere are comments.
These days, molecular biologists are dealing with lots and lots of sequences, largely due to next-gen sequencing technologies. For example, the Illumina GA2 is producing 100-200 million DNA sequences, each of 75-125 bases, per run; that works out to 20 gb of sequence data per run, not counting metadata such …
read moreThere are comments.
Analyzing Next-Generation Sequencing Data
May 31 - June 11th, 2010
Kellogg Biological Station, Michigan State University
CSE 891 s431 / MMG 890 s433, 2 cr
Applications are due by midnight EST, April 9th, 2010.
Course sponsor: Gene Expression in Disease and Development Focus Group at Michigan State University.
Instructors: Dr. C. Titus …
read moreThere are comments.
I've spent several hours in the past year trying to debug a frustrating error with the vmware Web UI for Linux. This UI relies on running iceweasel (aka Firefox) on the VM host machine. I think I finally solved the problem today, after much tooth gnashing.
Briefly, if you get …
read moreThere are comments.
The National Science Foundation just announced that the BEACON Science and Technology Center centered at Michigan State University was just funded. BEACON stands for "Bio/computational Evolution in Action Consortium" - you can check out the Web site here.
In my own nutshell, BEACON is focused on studying the evolution of …
read moreThere are comments.
A new meme was born at PyCon 2010: The Testing Goat.
Or, "Be Stubborn. Obey the Goat."
The goat actually emerged from the Testing In Python Birds of a Feather session at PyCon, where Terry Peppers used slides full of goat in his introduction. This was apparently an overreaction to …
read moreThere are comments.
On the heels of my aggressive competence post, about (among other things) my failure to outline my expectations for students, I've started putting together a page to help manage student expectations for the pony-build project, which is participating in the Undergraduate Capstone Open-Source Projects (UCOSP) course this term.
(Please comment …
read moreThere are comments.
I just finished teaching Concepts in Database-Backed Web Development for the second time -- the post-mortem from the first course is here.
In the course, the students implement a reasonably complete HTTP server from the socket library on up, and integrate CSS, JavaScript (jQuery), and a little bit of databases into …
read moreThere are comments.
I've recently turned my basilisk eye from Web testing and code coverage analysis to continuous integration, as you can see from my PyCon '10 talk and my UCOSP proposal, not to mention everyone wants a pony.
There's some confusion about what "continuous integration" means (see Martin Fowler on CI) so …
read moreThere are comments.
This last term I facilitated the participation of five MSU students in the Undergraduate Capstone Open Source Projects (UCOSP) program, in which students do distributed open source software development and receive home institution credit. UCOSP was managed out of U Toronto by Greg Wilson, and I was (and am) enthusiastic …
read moreThere are comments.
or, "those python-dev people are awesome."
My experience with the Python bug tracker has been pretty sparse and largely limited to some of the eternaissues like "make HTMLParser deal with even more broken HTML" that never really get resolved because they're not very important and don't have a champion. So …
read moreThere are comments.
Does anyone have any experience with CloudStore, formerly known as KosmosFS? From http://en.wikipedia.org/wiki/CloudStore:
CloudStore (KFS, previously Kosmosfs) is Kosmix's C++ implementation of Google File System. ... CloudStore supports incremental scalability, replication, checksumming for data integrity, client side fail-over and access from C++, Java and Python.
The …
read moreThere are comments.
(Some more meanderings on the brouhaha about diversity in the Python world.)
First, I've removed 'python' from the tags and made sure that neither Planet Python nor Advogato feed from this blog otherwise; I suspect by talking about politics and feelings in OSS I'm getting further from my normal target …
read moreThere are comments.
Since a few people have asked, here's a rough guide to the diversity discussion. No specifics allowed.
1. diversity list created to (among other things) ponder an official diversity statement for Python. List is closed-archive but open for general subscription.
2. Various diversity list discussions become heated. Some people (including …
read moreThere are comments.
As I wrote over the weekend, the Google Highly Open Participation contest (intended to get high-school students involved in open source work) may be run again this winter. I say "may", because quite a bit of work needs to be done on the GHOP hosting app, Melange.
We in the …
read moreThere are comments.
In the interests of social anthropology, I feel compelled to point Pythonistas at this fascinating discussion on the stdlib-sig on adding argparse to the Python stdlib. (Yeah, it's pretty much the only traffic that list got so far this month.)
Fascinating stuff. If there's a secret cabal out there masterminding …
read moreThere are comments.
I'm looking for examples of frustratingly simple-yet-wrong Python code, suitable for an undergrad class to debug. I'd prefer things that don't rely on tricky features of Python (like shared list references), but rather code where subtly bad logic or program flow leads to bad behavior.
Comment below, or e-mail me …
read moreThere are comments.
My wife and I were talking with my USDA collaborator about some possible chicken research, and I asked about access to animals. His response? "Chickens are not a rate limiting factor."
Did you know that 1 million chickens are slaughtered per hour, on average, in the US? Wow.
--titus
read moreThere are comments.
Very odd. I mean, it's nice to have my prayers spell-checked and all, but really, Apple? Cthulhu?
Also, jinja2 rocks. I think I'll be teaching it as a templating language this term...
And finally, people interested in using sqlite3 for shelve-like storage in Python 2.x can take a look …
read moreThere are comments.
Sarah Mei posts about teaching Ruby to high school girls. Good stuff.
While searching for some GHOP info from way back, I ran across this post asking "where are the girls among the GHOP winners?" (The statistics mentioned in the post may have been posted since, although I haven't seen …
read moreThere are comments.
So, it's nomination season for the Python Software Foundation again... and I have this niggling feeling that I'm forgetting about several people that have demonstrated significant commitment to the Python community, are good 'uns, and are otherwise people I would trust with some part of the future of Python and …
read moreThere are comments.
Courtesy of Rich Enbody, this blog post, How XML Threatens Big Data -- Dataspora, elicited a big "duh" from me.
You don't solve any of the semantic problems with data by elaborating on a textual format. You may bring them into the light, but along with the visibility comes "bureaucracy" -- technology …
read moreThere are comments.
OK, so you have a genome -- let's say it's about 1gb in size -- and you want to do ChIP-seq on a transcription factor that you think binds ~1000 places in the genome. You've measured the specificity of the transcription factor and it seems to enrich about 10-fold over background (an …
read moreThere are comments.
Victoria Laidler just announced a Pandokia release. She gave a great lightning talk on Pandokia at the PyCon '09 testing BoF and I've been looking forward to this release.
Pandokia seems like a nice way to manage test running and results analysis & reporting, and it fits a fairly unique niche …
read moreThere are comments.
After being on the new Python diversity mailing list for a bit, I've just unsubscribed. While there was an unpleasant personal incident that catalyzed my decision, I also don't think I'm a good fit for the style of discussion taking place. (YMMV ;)
That having been said, I want to give …
read moreThere are comments.
I'm nominally involved in co-mentoring or cheerleading 5 Google Summer of Code projects this summer, and several of the students have the same problem: they send me one big e-mail (or post one big blog entry), every few weeks, asking for input.
This imposes a big energetic barrier to me …
read moreThere are comments.
The last two weeks were pretty miserable, for some scientific/collaboration reasons as well as some personal reasons (visiting sick parents != fun). Two things that weren't miserable -- that were in fact quite fun -- were PyOhio and the Science 2.0 talks in Toronto.
PyOhio was a nice little community-based conference …
read moreThere are comments.
This TDD anti-pattern catalogue is truly excellent!
--titus
Legacy Comments
Posted by J Klassen on 2009-06-11 at 02:41.
hey, i used that whilst studing TDD for a CMPT 376 (tech writing) paper.
Posted by Paul Hildebrandt on 2009-06-12 at 23:06.
Great link, thanks!
There are comments.
I'd like to find an MSU student to report semi-monthly on python-dev. The student would be responsible for monitoring the python-dev mailing list and active PEPs, summarizing substantive discussions in a public forum, and integrating feedback from the community. This would be a 1 credit CSE independent study course (CSE …
read moreThere are comments.
Apparently the ipaddr module in Python 3.1 is disliked by some, and there was a reasonably robust discussion on python-dev about how it's wrong, wrong, wrong. Guido finally ruled: ixnay on the addr-pay.
This is pretty relevant given the twitstorm caused by Zed Shaw's ludicrously self-confident rants about how …
read moreThere are comments.
I just submitted a Mellon Award for Tech Collaboration nomination for the Python Buildhaus. What's that, you ask?
The Python Buildhaus is a project to systematically build, test and release Open Source Python packages on Windows, Mac OS X, and a wide array of other UNIX architectures and operating systems …
There are comments.
Just submitted this on Thursday:
Next generation sequencers are beginning to impact agricultural biology. Over the next few years, next generation sequencing will produce incredibly large datasets that will address structural (e.g., SNPs, CNVs, indels, methylation, translocations) and functional (e.g., RNA expression, transcription factor binding sites) variation in …
There are comments.
I'm writing some proposals to expand support for Python infrastructure (think cross-platform build and test farms a la Snakebite) and for the Mellon Foundation application, I'd like to find out how Python is being used in the humanities. I found NLTK, the Natural Language Toolkit; what else is big?
thanks …
read moreThere are comments.
I'd like to invite you to attend the last of the Michigan State University CSE colloquia for the 2008-2009 academic year: jointly sponsored as an AT&T Visiting Lecturer by the MSU LCT, and the CSE department, Sam Ramji will speak about
Open Source at Microsoft: The Past, Present and …
There are comments.
As part of a CiSE submission I'm working on, I interviewed the lead developer on a scientific software package today. This software package is mainly used for evolutionary studies, and has a small but devoted following - ~6 developers and ~12 users locally, plus a few dozen users outside of MSU …
read moreThere are comments.
Open source coding is like a not-so-demanding mistress: I work on it at night, surreptitiously, after my wife and daughter are asleep. twill and figleaf are like bastard children, who only get attention when I can spare it from my "real" family (my teaching, research or my actual family, depending …
read moreThere are comments.
Anyone out there used disco (http://discoproject.org/)? Comments, good/bad/neutral?
From the page:
Disco is an open-source implementation of the Map-Reduce framework for distributed computing. As the original framework, Disco supports parallel computations over large data sets on unreliable cluster of computers.
The Disco core is written in …
There are comments.
My talk this year kinda sucked -- more on that later -- and I am trying to come up with good and perhaps even non-testing talk ideas for next year.
One intriguing idea contributed by Brian Dorsey is that of giving 5 lightning talks in a 30 minute session. Since I like …
read moreThere are comments.
John Gall apparently said:
A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with …
There are comments.
Just a short note with characteristic inhumility (ahumility? abhumility?) -- for my Concepts in Database-Backed Web Programming course, I received the Withrow Award for Teaching Excellence from the students.
This means a lot to me, because I spent a huge amount of time on that course (and will have to do …
read moreThere are comments.
'tis the season, and so it's time for me to post my list of accumulated project ideas. I'll transfer these over to the wiki tomorrow, after I track down some references. I'm willing to mentor any or many of these but I'd prefer to find someone to be the primary …
read moreThere are comments.
i asked two other friends of mine, [ ... ], for recommendations about model code and about their work environment. their feedback was extremely helpful and i thought you'd be interested to hear the opinions of other good programmers. i would also love to hear your thoughts on these (extracted and scattered) comments …
read moreThere are comments.
I recently had the pleasure of being the technical reviewer for a new Apress offering, Beginning Python Visualization, by Shai Vaingast.
To quote from the apress page,
What you'll learn:
- Write ten lines of code and present visual information instead of data soup.
- Set up an open source environment ready …
There are comments.
My last post, Good code coverage: Necessity vs Sufficiency, about how you should maintain high code coverage with your automated tests, seems to have really struck a nerve in a small group of people -- I got some fantastic comments, with some great pointers. Michael Foord's comment, 'Too often "testing is …
read moreThere are comments.
I'm going to stop using my Paypal account regularly as soon as I finish transferring some money around. They're too screwed up to be reliable, and their customer service is either incompetent or dishonest.
Luckily, many places are now accepting credit cards or Google Checkout, with which I've had no …
read moreThere are comments.
I get really frustrated with posts that claim your unit tests lie to you or 100% code coverage is fallacious or there are flaws in coverage measurement. These are sensationalist headlines that encourage bad behavior, by confusing new or inexperienced or argumentative or lazy developers: "well, we all know test …
read moreThere are comments.
Over at OLPC News, Wayan Vota asks: "Do you get better FOSS code ... if the developers are paid or unpaid?"
Interesting question, especially as considered in light of the OLPC code base.
As of about a year ago, virtually everybody I talked to was shocked and stunned at the poor …
read moreThere are comments.
Infoworld did a brief writeup of Snakebite; my only quibble is that MSU comes off as more of a passive partner in the article than we actually are ;).
--titus
Legacy Comments
Posted by titus brown on 2009-02-14 at 10:46.
foo
Posted by titus brown on 2009-02-14 at 10 …
read moreThere are comments.
I'm switching several projects from darcs to either git on github, or svn on Google Code.
twill, a simple Web testing tool/HTTP driver in Python, was switched over to Google Code several months ago: see http://code.google.com/p/twill. I'll post more on twill development soon, I …
read moreThere are comments.
As Jesse writes, Trent revealed the existence of Snakebite yesterday. Snakebite is an "open network" of various machines that Trent and others (myself included) are making available to the Python community for build and debug purposes. I'm coordinating the MSU component, which basically means that I run interference for Trent …
read moreThere are comments.
A fellow prof here at MSU, Rich Enbody, whipped up the following cheat-sheet for new programmers transitioning from Python (CSE 231) to C++ (CSE 232). He welcomes comments. Here's the link:
http://web.cse.msu.edu/~cse231/python2Cpp.html
Parenthetically, he and his cohort in crime, Bill Punch, will be …
read moreThere are comments.
A friend asks,
i'm going to be recoding <x> from scratch starting next week, in python. what codebase would you recommend as good to model after?
Any thoughts on a well-formed, reasonably sized (yet not huge), and simple Python code base?
There have to be some examples somewhere! I'd suggest …
read moreThere are comments.
I've been watching Terminator: the Sarah Connor Chronicles on Hulu (best legit media site ever!), and I decided I needed to rant (mildly).
Even ignoring the whole problematic issue of time travel (central to the plot), and the generally poor plot line inherited from the Terminator movies (whatcha gonna do …
read moreThere are comments.
Google for pretencious. What an odd first result!
(I was looking for the correct spelling. Turns out it's a 't', not a 'c' or an 's', before the ious.)
--titus
read moreThere are comments.
The decision of python-dev to deprecate bsddb has left us in a bit of a pickle (hah!) over in the pygr project. We're looking for a replacement for bsddb for default storage of infrequently- (or never-) changed pickled Python objects. Some of the parameters under consideration are:
- Python version availability …
There are comments.
I don't suppose anyone knows of a low- or medium-intensity pickup frisbee game somewhere within walking distance of Waikiki...?
If so, drop me a note -- thanks!
--titus
Legacy Comments
Posted by Bill Mill on 2009-01-04 at 09:08.
E-mail Larry O'brien and ask him, his email's on the front page …
There are comments.
(This blog post is a long, rambling retrospective on my recent undergrad comp-sci course at Michigan State U., newly renamed to "Concepts in Database-backed Web Programming".)
I set out this term to teach a CS class in the way I would have wanted it taught when I was an undergrad …
read moreThere are comments.
Here's a summary of the e-mail responses I got to my lazyweb query re code review and git.
A few people (Paul Nasrat and Jeff Balogh) pointed out that Review Board supports git. So I may try that.
Charles McCreary has had good experience with Rietveld, but I don't think …
read moreThere are comments.
The latest hot shit idea for making a protein-protein interaction database leaves me lukewarm.
A few months ago I met with a genomics group, and we had a back-and-forth about genome annotation. The conversation went something like this:
them: "We have to improve the tools for annotating un-annotated genes!" me …
There are comments.
The pygr project is gearing up to do some code reviews, and we're not aware of too many (any?) mature (or even adolescent) tools that interact well with git. A Google search finds Gerrit and a blog post about Code Review -- anything else we should know about?
thanks, --titus
p …
read moreThere are comments.
We're going through the PyCon '09 review process, and participating in the process has been pretty interesting. (I joined the Program Committee in large part because I was told to put up or shut up after I critiqued PyCon '08. Ahh, the open source world... where you're encouraged to go …
read moreThere are comments.
The ongoing debate about doctests (here, and links therein) seems to me to be somewhat silly.
doctests should be assessed by their utility to you and your project, in whatever role you happen to be using them. I personally find them to be very useful in API documentation, where they …
read moreThere are comments.
As a new prof, I've been too busy to blog much. What am I doing?
Apart from all the normal academic crud (meeting with people, answering e-mail, doing paperwork, etc.) and parenting & home ownership stuff, I've been teaching my Intro to Database-Backed Web Programming course. This has been neither a …
read moreThere are comments.
This post on why academics should work to integrate their research ideas with open source software in order to actually push forward on the OSS side (and presumably vice versa) is really good. I think such things could be applied in software testing, too, where there's a gulf between how …
read moreThere are comments.
I have a habit of occasionally sending odd e-mails to my postdoc lab mailing list, for reasons that I cannot adequately explain. Here's the latest one:
Dear Bronner-Fraser Lab, I would like to thank you all for your private letters of support; between the blizzards of Colorado, the floods of …
There are comments.
My last post initiated a discussion on the biology-in-python mailing list about BioPython, among other things. (Here is a link to the discussion, which is kind of long and unfocused.)
I'm happy that the bip list is serving as a place for people to interact with the BioPython maintainers to …
read moreThere are comments.
Chris Lasher wrote a nice blog post naming me as a rabble rouser in the area of "Python in bioinformatics". His post raised a number of interesting points, some of which I'd like to discuss here on my blog.
First, why is Python not more dominant in bioinformatics? I really …
read moreThere are comments.
We have an opening for a project on which I'm collaborating:
Full-time 12 month appointment academic position for a genomics scientist. The incumbent will spend 50% time as the Associate Director of the Comparative Genomics Laboratory, with duties in directing daily activities, long-range planning and seeking extramural funding, and 50 …
There are comments.
I'm surprised I haven't seen this on planetpython yet...
...an emerging consensus in the scripting community holds that Python is the right solution for freshman programming. Ruby would also be a defensible choice.
(emphasis mine). Originally found via Lambda the Ultimate, and also passed onto me by Rich Enbody.
In …
read moreThere are comments.
I read things like this report on SciFoo and think, gawd! I'd have had a great time! I should try to beg/bully/buy/brown-nose my way into the next SciFoo so I can talk about Science 2.0 etc.!
And then I think back to the heady days of …
read moreThere are comments.
In reply to elanthis's post on Advogato,
1. I agree that the documentation could be improved, and we've been working on it. The next release should add a whole bunch of examples. Google is your friend, as is the Python Cookbook.
There are comments.
Recently the question came up: suppose you wanted to give enthusiastic people some guidance on how to help work on Python. What suggestions do you have? Surely there's a Web page on this!
Well, no: a few quick Google searches led me to discover that "contributing to Python" was answered …
Just finished the book The Fragile Light, by David Nurenberg. Good stuff; independent author. Worth reading.
Briefly, it's a SF&F novel about a world where mutants are sometimes heroes, and more often feared; where there are Herotown ghettoes full of supers; and where only licensed heroes can join in …
I finally got sick of manually schlepping BLAST files around, so I wrote something to do it for me. 'zounds' is a very simple server/client system for coordinating a bunch of 'worker' nodes through a central server; it does everything in Python with objects and pickling, so it's easy …
We've been talking about how to manage pygr resources remotely via the existing XML-RPC interface, and for that HTTPS is a requirement. I offered to track down the code necessary for running an XML-RPC server over HTTPS. Here's what I found:
It turns out that while the Python stdlib supports …
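For reference, here is a minimal present-day sketch of the idea, assuming Python 3's `xmlrpc.server` and `ssl` modules rather than whatever code the post goes on to describe. The `add` function, the `make_server` and `make_tls_context` helpers, and any certificate file names are hypothetical, chosen only for illustration; the TLS step simply wraps the server's listening socket.

```python
import ssl
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """Toy RPC method (hypothetical) so the server has something to expose."""
    return a + b


def make_tls_context(certfile, keyfile):
    """Build a server-side TLS context from a certificate and private key."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def make_server(address, ssl_context=None, bind_and_activate=True):
    """Create an XML-RPC server; if a TLS context is given, wrap its socket."""
    server = SimpleXMLRPCServer(
        address, logRequests=False, bind_and_activate=bind_and_activate
    )
    server.register_function(add, "add")
    if ssl_context is not None:
        # Replace the plain listening socket with a TLS-wrapped one, so every
        # accepted connection speaks HTTPS instead of HTTP.
        server.socket = ssl_context.wrap_socket(server.socket, server_side=True)
    return server
```

With a certificate in hand, something like `make_server(("localhost", 8443), make_tls_context("server.pem", "server.key")).serve_forever()` would serve `add` over HTTPS; the port and file names are placeholders.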
On Thursday, May 15th, I finished my post-doc position at Caltech.
On Friday, May 16th, I officially started as an Assistant Professor split between Computer Science & Engineering and Microbiology & Molecular Genetics at Michigan State University.
On Friday evening and Saturday, we hung out down at the Caltech Marine Lab and …
I've been hit by a few different e-mail-related problems over the last few months, and it's becoming intensely frustrating.
Some servers seem to randomly drop messages from me, for no obvious reason; at least, people don't get one message and then do get another, a day later. gmail may be …
(pygr is a neat bioinformatics framework in Python.)
After some commenters on my last post seemed happy to hear that pygr was the focus of some summer work, I realized I had only discussed the pygr summer work in a post to the biology-in-python list.
Whoops.
So, here's the scoop …
Dear Lazyweb, help!
I'm embarking on a number of summer projects in my new lab at MSU, and several of them focus on using pygr to do cool genomic stuff. In particular, I'm planning to build a personal genome annotation system that will let people run their own full genome …
So I'm pretty bullish on testing for maintenance reasons. It was nice to see how well it worked out for me when a user recently reported a problem with Cartwheel.
This is what happened: a third-party package (LAGAN) that the user was running through the Web interface depended on certain command-line …
I read a lot of total crap, and one of my recurring crap authors has been John Ringo. He's a total nutjob politically, but he writes good battle scenes and is an enjoyable read once you cut through the nonsense. Still, I'm having a tough time getting through the opening …
I'm having a long-running discussion with some people about threading and why using threads with simple subprocess calls is almost certainly an overcomplicated (== BAD) use of threads. Everyone seems to think I'm wrong (at least, there's either deafening silence or straight out argument ;) and I think I finally figured out …
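One reading of the argument, sketched under assumptions that the excerpt itself does not spell out: child processes already run concurrently with each other, so starting them all and then collecting their output needs no threads at all. The commands below are made-up placeholders that just print a square.

```python
import subprocess
import sys

# Hypothetical workload: three independent child processes.
commands = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(3)]

# Popen returns immediately, so all children are running at once here.
procs = [
    subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) for cmd in commands
]

# Collect results in order; each communicate() waits for that child only.
outputs = [proc.communicate()[0].strip() for proc in procs]
```

The operating system does the scheduling; the parent only has to start the children and wait for them.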
In some discussions with a moderately new Python programmer who seems to value complexity over simplicity, I may have coined a new term:
"Penis size" style of programming -- the (mistaken) belief that the more advanced programming language features you use, the more impressive your code will look.
I think it's …
Pavel Vinogradov <fastnix> has been keeping me updated on an issue he discovered while testing TCMalloc with Python as a Google Highly Open Participation (GHOP) task, task 105.
Briefly, Pavel discovered a situation in which replacing the Python memory allocator with TCMalloc resulted in really bad performance. The latest is …
At Google Campfire One, v 2.0 -- introducing AppEngine.
IT'S FREEZING. The cider ran out. Brr.
Deploying Web apps is annoyingly difficult. Technical hurdles, etc. Need machines. Blech. Costly.
AppEngine solves all these problems. Runs web apps, handles app lifecycles, apps are run on Google infrastructure and can make use of …
Please send this on to anyone who might be interested...
Disney Animation has an opening for a summer intern to work on a testing project under the supervision of Paul Hildebrandt and Dr. C. Titus Brown. The ideal candidate will have experience with a dynamic language supporting introspection (Python preferred …
From Wages or Shortage, this comment
""" A-grade engineers are unfortunately similar to Welsh longbowmen: devastatingly potent compared to their peers, but you have to start their training at age 10 or so. Simply upping the salaries of A-grade engineers won't magically create more of them. We know this, as we …
On top of dreamhost dropping off the 'net just when I posted a bunch of screencasts... our socal-piggies meeting nearly got whacked because this month's organizer uses Yahoo, and most of the messages going through my mail server (which hosts the mailing list) were filed as "spam".
Now Yahoo is …
I put together an unofficial screencast about the Google Summer of Code based on a pitch I gave to the Michigan State undergrad CSE population. Enjoy. Please forward on to anyone who might be interested...
--titus
Legacy Comments
Posted by Leo Soto on 2008-03-27 at 09:16.
idyll …
The simple application I demoed at PyCon '08 during my talk on the OLPC and testing is now available: I call it "peekaboo".
peekaboo is a way to watch your code being executed in another process using sys.settrace, figleaf, and XML-RPC.
The two screencasts below should explain it.
The …
At PyCon '08, I gave a talk on testing and the OLPC project where I referred to the "Testing Death Spiral". My accompanying slide, which aimed to be simple rather than comprehensive, had this scenario:
Write a bunch of code & manually test it.
(Good so far.)
Start adding features over …
Here are some of the materials from my PyCon activities:
The tutorial source code (which is really just a cut down version of my PyCon '07 talk's source code; see the README and my blog post from back then).
(Sorry about …
Kumar McMillan pointed me towards GUITAR, a framework built by Atif Memon and others. There's a YouTube video of him and Adam Porter talking at the Google Test Automation Conference (2007).
Looks and sounds interesting. Also nice to note that GUITAR is being open-sourced...
--titus
Legacy Comments
Posted by Kumar …
Just left PyCon yesterday; now I'm up in Michigan looking at some more houses, arranging lab stuff, talking with people, and getting ready to proselytize the Google Summer of Code to a bunch of Michigan State CSE students as well as a few professors.
Some freeflow thoughts. Feel free to …
Steve Holden and Doug Napoleone both attended our testing tutorial (as did AMK, which was a bit of a surprise!) and had fairly positive things to say about it. This was a relief, because Grig and I always wonder whether or not this stuff is useful to anyone.
Our hope …
I'm at PyCon, and I have a tip for people: don't stay at the same hotel as everyone else.
My hotel room Internet connection is great, perhaps in part because I'm not sharing it with the rest of the PyCon attendees! (Plus it's free -- the DoubleTree wireless Internet doesn't charge …
The blognet is full of people posting their own opinions, and that's a good thing. What is a little less supportable is flawed argumentation.
I recently spent some time discussing a post about software engineering; I was trying to figure out why the author thought what he did. The annoying …
I spent some time over the last week adding fairly simple motif searching to Cartwheel, my bioinformatics site for biologists doing cis-regulatory analysis of genomic sequence. The new features include the ability to define and search with IUPAC and position-weight matrix (PWM) motifs, as well as visualization of motif search …
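As a rough illustration of what IUPAC motif searching involves (a sketch under stated assumptions, not Cartwheel's actual implementation), each ambiguity code can be expanded into a regular-expression character class. The `find_motif` helper is a hypothetical name, and this version scans the forward strand only.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_motif(motif, sequence):
    """Return (start, matched_text) for every forward-strand hit, overlaps included."""
    body = "".join(f"[{IUPAC_CODES[base]}]" for base in motif.upper())
    # A lookahead keeps the match zero-width so overlapping sites are all found.
    pattern = re.compile(f"(?=({body}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


hits = find_motif("GATAR", "TTGATAAGGATAGC")
# hits == [(2, "GATAA"), (8, "GATAG")]
```

A position-weight matrix search would instead score each window against per-position base weights and keep the windows above a threshold.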
I'm having trouble with some tests of a PostgreSQL-based system. Briefly, I have a set of functional tests that
- create a new database
- populate it with a data model
- run a Web server (in-process)
- test the integrated Web server - database functionality
The tests are now slow enough that I'm averse …
Noah and Grig have been CCing me on a conversation about JoelOnChecklists and Grig's post. Noah's writing a book chapter on this stuff, and asked for some tips.
Here are mine.
First, I have a bunch of individual twill scripts in a directory that are run every hour. These scripts …
Tracy recently asked me if there were any good guidelines about how to write configuration files -- not coding-level guidelines, but guidelines on structure and content.
I was unable to come up with anything: my Google-fu failed me, and my DevonThink database was silent (although it did have some nice testing …
Via http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising, on the Nucleic Acids Res "database" issue:
As we pass the one thousand databases mark (1kDB) I wonder, what proportion of the data in these databases will never be used?
This is an unsettling thought for …
I just finished a chapter for a book, Methods in Avian Embryology, being edited by my boss, Marianne Bronner-Fraser. This chapter is intended for developmental biologists who are interested in locating regulatory modules and analyzing them for binding sites. It ended up being my outlet for a compilation of problems …
As many people have doubtless read, PyCon '08 has announced the tutorial sessions. This year, Grig and I are doing a workshop-tutorial on testing rather than a teaching-tutorial; what this means is that our tutorial will focus on actually applying testing tools effectively to your source code.
We're billing this …
Steve Yegge recently wrote a long article, "Code's Worst Enemy", about how "many lines of code" causes problems in projects.
That's obviously pretty silly. To see why, let's examine a little project I've recently started; conservatively, I estimate that it incorporates well over a million lines of code:
print 'hello …
Brett Cannon notes that GHOP is working out well, and muses about the future of the CPython test infrastructure (among other things). This is something I'm interested in as well (guess where all those testing tasks in GHOP came from? ;) and I've been confused, if not frustrated, by the apparent …
Andrew Binstock hits the nail perfectly on the head with his post, Beautiful Code vs Readable Code.
--titus
Matt Harrison's post, Gnome devs too lazy for python?, and the linked post by Thomas Vander Stichele, strongly typed, are both really entertaining and illuminating.
I preserve them here for my own reference.
--titus
heh, this applies to many fields, I think...
Luis Ibanez This presentation is a satire of the current obsession with intellectual property, innovation and originality that plagues the field of medical image analysis. The presentation makes the point that most Journals and Conferences focus on Originality and despise Reproducibility and …
So, the Google Highly Open Participation Contest is going quite well for Python, with about 25 additions/reworkings of core CPython documentation and tests and dozens of contributions to other projects.
One of the really unique opportunities that GHOP offers for the Python community is being underutilized, however, and that …
My future colleague, Bill Punch at MSU, is teaching Python to intro CS students. He asks (slightly edited):
In C++, you can write multiple constructors, each one taking a different type and/or number of arguments. Let's say we are writing a RationalNumber class. I could write 2 constructors:
class …
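One common Python idiom for this kind of question, offered as a sketch rather than as the answer the post settles on: give `__init__` a default argument and add a classmethod as an alternative constructor. The `from_string` name and the normalization details are assumptions made for illustration.

```python
from math import gcd


class RationalNumber:
    def __init__(self, numer, denom=1):
        """Accept RationalNumber(3) or RationalNumber(3, 4) via a default argument."""
        if denom == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        if denom < 0:
            # Keep the sign on the numerator.
            numer, denom = -numer, -denom
        common = gcd(numer, denom)
        self.numer = numer // common
        self.denom = denom // common

    @classmethod
    def from_string(cls, text):
        """Alternative constructor (hypothetical): parse '3/4' or '5'."""
        numer, _, denom = text.partition("/")
        return cls(int(numer), int(denom) if denom else 1)

    def __repr__(self):
        return f"RationalNumber({self.numer}, {self.denom})"
```

Each extra "constructor" gets its own descriptive name, which stands in for C++-style overloading on argument types.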
A long time ago, Kalid Azad sent me an e-mail about his Visual Guide to Version Control. Good stuff!
--titus
Legacy Comments
Posted by Kalid on 2007-12-09 at 15:01.
Hi Titus, thanks for the mention, I hope your readers find it useful.
Posted by Chris Lasher on 2007-12-09 at …
If anyone is having a Python Interest Group meeting this month, please consider devoting 15-30 minutes to coming up with random task ideas for the Google Highly Open Participation Contest.
Briefly,
- tasks must involve Python and Open Source;
- non-core pet projects are welcome;
- building screencasts, updating documentation, and adding unit …
My Computer Science department at Michigan State University is looking for an assistant professor! We are casting a fairly wide net (databases, graphics, medical imaging, and bioinformatics) but I'd really like to attract a bioinformatician.
The Computer Science department at MSU is a nice, small department …
Hi folks,
just a quick note -- Kumar McMillan has offered to take over wsgi_intercept. You can see the new project over at code.google.com, http://code.google.com/p/wsgi-intercept/.
While I will miss the income from the project, I think that Kumar will treat it well.
--titus …
I'm going through some of my saved up e-mail from the last few months, and found these two gems.
Noah Gift on grokking threads, from the testing-in-python list:
Trying to understand what massive pools of threads that spawn other massive spools of threads, that spawn other massive pools of threads …
Here are some factoids about the GHOP/Python project, from the end of the first 30 hours.
Of 63 tasks, 32 remain unclaimed.
26 tasks have been claimed by students and are being worked on; some of those are nearing completion.
5 have already been completed: all three Rosetta Code …
In trying to pull together ~50 tasks for the initial Python part of the Google Highly Open Participation Contest, I ran into some interesting issues.
First: it is not easy to find "easy" or "intro" tasks for projects.
I sent out a lot of e-mails and posted a blog request …
Based on the current "burn" rate of Python's task list for the Google Highly Open Participation Contest, I would estimate that we will need more tasks within at most two weeks. The current mentors are certainly capable of coming up with more tasks, but we would welcome input and ideas …
I recently gave an informal talk on Software Carpentry for the Caltech e-Science 101 course. Since even "Intro Software Carpentry" is a whole course of study, I obviously couldn't cover much, but I tried to motivate people to get interested. And, of course, I pushed testing. TESTING, DAMNIT!
Anyway, here …
So, next May I'm starting as an assistant professor split between the Computer Science and Microbiology and Molecular Genetics departments at Michigan State U., and I'm interested in attracting as many good CS grad applicants as I can from the open source and bioinformatics communities. (I would also like to …
So, I've been wanting to write a bit about why grad school and academia in general aren't necessarily a waste of time. Unfortunately I'm now a white male in a power position that depends on me sucking good people into grad school and then exploiting them, so this may end …
Here's a small collection of links to take the edge off my general grumpiness at how much software really does suck. Presumably more will follow.
Firefox will be released with many known bugs.
I switched away from Firefox months ago, because it was so unstable and broken. I'm now using …
Calling all Pythonistas,
What is your personal list of "modules you love and hate" -- stdlib modules that you use all the time, but that have weak documentation, poor examples, or are otherwise difficult to use the first time?
Here are five modules that I think could use some documentation help …
Hi all,
I need to find some useful projects for new, young contributors, especially in the area of 3rd party packages; we've been thinking about things like porting 3rd party packages to Py3K, adding tests to existing projects, and providing Windows binary eggs for various packages. Everything would be open …
So I'm having a little caption contest for this picture of our daughter, Amarie Rose Brown, who is now nearly 6 weeks old (but only four weeks old in this picture).
Captions so far:
charter.net, my home ISP, is intercepting name-not-found errors and replacing them with a host that answers with a Web search page. This leads to some "entertaining" problems when trying to ssh to a host that doesn't exist.
Is this actually wrong or just sleazy?
In fact, it actually breaks …
Some books make me want to leap up out of my chair and go change the world, or die trying.
Charles Stross' Halting State is not one of them.
However, it is a darn good read, and -- for those of you who are into the Internet, Python, MMORPGs, and/or …
Hey everyone,
Grig and I are thinking of doing another tutorial at PyCon '08, but we'd like to break out of the mold of "intro testing" and do something more exciting for us. Here's a promotional blurb:
Practical Agile Web Testing
---------------------------
Have Web site? Need testing? Bring your tired (code …
I'm happy to announce that the Python Software Foundation is part of a new Google Open Source program, the Highly Open Participation Contest. This contest is an effort by Google to engage pre-college students in open source programming: Google is offering prizes and awards for completing a variety of tasks …
I am in need of some test automation tools for a GTK/GNOME GUI, and after scanning the Python Testing Tools Taxonomy and doing a bit of investigating, it looks like there are three possibilities.
dogtail and the Linux Desktop Testing Project both use accessibility libraries to drive GNOME applications …
I spent a fairly satisfying two days setting up unit and functional tests for figleaf, my code coverage package. I implemented the tests using a technique that I gather is called vertical slicing -- I just called it "starting a new phase of a project" before ;). The idea is that you …
Every few days, a new hysterical screed against formal education, undergrad studies, or grad studies comes across my screen. As a result I've been mulling over the hot-button issue of academic study, both undergrad and grad, for the last few months. Why are there so many loud people decrying the …
If you use sys.settrace to set a tracing function, and that function prints to sys.stdout, then don't ever trash sys.stdout, even briefly. You will raise an invisible exception and your trace function will be removed.
(I don't know precisely what is supposed to happen when a trace …
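One way to sidestep the problem, sketched here as an assumption rather than as the fix the post arrives at, is to have the trace function write to a stream it captured up front, so it never looks at whatever sys.stdout happens to be at the moment. `make_tracer` and `traced_work` are hypothetical names for illustration.

```python
import io
import sys


def make_tracer(stream):
    """Return a trace function bound to `stream` instead of the live sys.stdout."""
    def tracer(frame, event, arg):
        if event == "line":
            stream.write(f"{frame.f_code.co_name}:{frame.f_lineno}\n")
        return tracer  # keep tracing inside this frame
    return tracer


def traced_work():
    saved = sys.stdout
    sys.stdout = None  # "trash" sys.stdout briefly
    try:
        total = 1 + 1
    finally:
        sys.stdout = saved
    return total


log = io.StringIO()
previous = sys.gettrace()  # remember any tracer that was already installed
sys.settrace(make_tracer(log))
try:
    result = traced_work()
    still_tracing = sys.gettrace() is not None
finally:
    sys.settrace(previous)
```

Because the tracer never touches sys.stdout, replacing it mid-run gives the trace function nothing to trip over, and `still_tracing` confirms it is still installed afterwards.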
I haven't seen Kyle Wilson's article, Software is Hard, making the rounds yet... worth reading!
--titus
Legacy Comments
Posted by Noah Gift on 2007-10-03 at 08:15.
Great article, at least it is a comfort that some people admit it is hard and that "software is hard" is becoming more …
There's been lots of discussion recently about the global interpreter lock, and how evil it is. (I personally don't think it's evil. More on that some day when I write code again, if ever.)
Then I read this article, on parrot and Threading Building Blocks, and something clicked. I quote …
Rob Campbell found me by google, and pointed me towards his blog, Science and Software. Funny, well written, and very apropos! Why isn't there more software, commercial or otherwise, for labs?
There has been a lot of local interest (i.e. two or three people have discussed it at various …
After our long software licensing discussion on the biology-in-python list, I realized that I wanted something different in a license for scientific software.
Specifically, I would like to attach the following clause to either a BSD or L/GPL style license:
Publications relying on derivative works of this software must …
This month the newly minted biology-in-python mailing list erupted into a discussion of licenses. There was some confusion about the goal of the discussion, for which I'm largely responsible: we didn't make it clear that we were talking about licenses for code and content posted on the bio.scipy.org …
In the spirit of cleaning up my desktop... here's a PDF of my talk on Cartwheel at SciPy 2007.
--titus
I'm now listed on the Gene Expression in Disease and Development page, as well as on the CompSci faculty page, MicroMolecularGenetics faculty page, QuantBio page, and SysBio page.
It was quite a shock to log into the CompSci cluster at MSU and see my group set as "faculty". As a …
So, a few people commented on my how to write Python code that doesn't suck post, and I thought I'd respond here rather than in the comments.
First, John Camara suggests adding the MIT license as an option. I chose the BSD because it's essentially equivalent to the MIT license …
My last post on this subject got a number of good comments, both here and on the biology-in-python mailing list, so I've amended and updated it. (Note that Brandon King is now listed as a contributor.)
I would particularly appreciate comments on the licensing section and the conclusions. Also, I'll …
Note: this is ultimately intended for the biology-in-python Wiki at http://bio.scipy.org/. I will release it under a CC license, so please feel free to use it for your own site! --titus
Here are some prescriptions for writing Python code that other Python programmers will find more usable …
I've been on the run for well over a year -- I started writing my PhD thesis in July '06, just after I got back from teaching at Woods Hole. At the time I was also interviewing for a faculty position at MSU (since offered & accepted). Since then I've defended my …
Synchronicity: Shannon ditches Firefox the same day I do, for much the same reason.
--titus
It's been a busy few weeks, in part because I've been writing a grant. Last Thursday, I submitted a grant proposal to NIH for their program announcement, Continued Development and Maintenance of Software. The proposal was to continue maintaining Cartwheel, while integrating a new visualization frontend (MUSSA) and a fast …
I was the official mentor for a Google Summer of Code student this year -- Martin von Loewis was "technical mentor" -- and I found it to be a disappointing experience. At the beginning, I felt guilty about not being more on the ball about pushing the student to do more work …
I'm in the process of writing up a "when and how to test" screed, and I discovered this:
Karl Fogel's book, Producing Open Source Software, has precisely two keyword hits to testing in the ToC.
WTF?
--titus
Legacy Comments
Posted by Ricardo Niederberger Cabral on 2007-08-22 at 17:00.
I …
So, I "organized" a Biology Birds of a Feather at SciPy 2007. This mainly consisted of posting about it and then trying to write stuff on a white board while keeping abreast of the conversation. About 15 people attended.
I didn't get everyone's name and in any case I don't …
Thanks to a kind invitation by Fernando Perez, I was alerted to a BoF on Python/testing at SciPy. He made the mistake of introducing me as "the resident expert" so I felt even less inhibited than normal, which was hopefully not too problematic...
Gael Varoquaux took notes.
Basically, this …
I wanted to write a comment on timing unittests, but that blog does not allow anonymous comments and there is no obvious place to e-mail the author.
Bummer.
(The short version of my comment is that getting the basic data out with something like nose is trivial; see my pinocchio …
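To give a flavor of how little is needed to get basic timing data, here is a sketch using only the standard library's unittest rather than nose or pinocchio (whose APIs are not shown here). `TimingResult` and `ExampleTests` are hypothetical names.

```python
import io
import time
import unittest


class TimingResult(unittest.TextTestResult):
    """Test result that records wall-clock seconds per test id."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timings = {}
        self._timing_started = 0.0

    def startTest(self, test):
        self._timing_started = time.perf_counter()
        super().startTest(test)

    def stopTest(self, test):
        super().stopTest(test)
        self.timings[test.id()] = time.perf_counter() - self._timing_started


class ExampleTests(unittest.TestCase):
    def test_fast(self):
        self.assertEqual(1 + 1, 2)

    def test_slow(self):
        time.sleep(0.01)
        self.assertTrue(True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
runner = unittest.TextTestRunner(
    stream=io.StringIO(), resultclass=TimingResult, verbosity=0
)
result = runner.run(suite)
slowest = max(result.timings, key=result.timings.get)
```

Sorting `result.timings` by value then gives a quick list of the slowest tests.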
There was another mildly amusing incident during the recent SoCal Piggies meeting.
Michael Carter was showing us his incredibly neat Web 3.0 / HTTP PUSH software, orbited, by demoing an interactive IRC client on the Web. He signed onto the ruby-on-rails IRC channel and (this being a Python meeting) asked …
To get people talking, I've created a "biology-in-python" mailing list. You can subscribe here: http://lists.idyll.org/listinfo/biology-in-python, and you can post to it at bip@lists.idyll.org once you're a member.
This list is a tool/package/library-agnostic list, for people who use Python to work …
Chris Lee and I would like to set up a Birds-of-a-Feather gathering at SciPy '07. We'll probably have an initial meeting on Thursday, August 16th, and then maybe work into sprint mode for that Saturday.
Contact me if you're interested. No reservations needed, but we should probably all plan to …
While I'm thinking about SciPy '07, here are a few other notes:
- Chris Lasher and I are thinking about doing a Software Carpentry sprint of some kind. Interested?
- I'd be up for doing a half-day tutorial on "Testing for Scientists" or "Idiomatic Python". Interested?
--titus
Legacy Comments
Posted by Chris …
I've just put up a simple lab Web site for my future lab at Michigan State U.; I'm calling it the Lab of Genomics, Evolution, and Development.
--titus
Legacy Comments
Posted by Melissa on 2007-07-10 at 05:03.
Really cool Titus! The science you are doing is very interesting ... hmmm …
It's nice to see Python come out on top for threading.
--titus
Legacy Comments
Posted by Scott Lamb on 2007-06-28 at 12:32.
Hmm, I wouldn't call "better than Perl" validation. "Easier to read the Perl, better threading support than COBOL, more productive than TriMedia VLIW assembler, faster than MUMPS …
On Tuesday (June 12), Wednesday, and Thursday I taught the course "Intermediate and Advanced Software Carpentry in Python" at Lawrence Livermore National Labs. This was intended to be an extension of some of the ideas from the Software Carpentry course.
The pre-course course advert, the handouts distributed at the course …
Does anyone know if there are any faculty programming contests out there?
It'd be fun, and I can't imagine that the competition would be as tough as the student programming contests probably are ;).
--titus
Legacy Comments
Posted by jt on 2007-06-22 at 19:56.
Um... faculty members... programming? You lost …