Fri, 31 Dec 2010
The last five years
Looking back, the last 5 years have, collectively, been rather overwhelming.
Five years ago, I was a big-mouthed 7th-year graduate student. The biggest changes in my recent life were getting a cat (first) and getting married (second).
Now, I'm the father of two (adorable) daughters. I have a minivan, a big house (plus mortgage) and a salary that is several times my grad salary (not necessarily saying that much...). I've got a PhD and am an assistant professor up for reappointment. I have six big-mouthed graduate students and four even bigger-mouthed post-docs. I've "graduated" one Masters student and an undergrad or three. I'd guess I've added almost half again to my stock of friends and acquaintances.
Very weird and fairly sudden transition!
Some thoughts:
Having kids is awesome. It's lots of work (especially for the primary caregiver, usually the mom) but a total riot of life and energy. One of the most amazing things to do with kids is to watch them sleep, which they do with the same total commitment that they give to everything else (...including throw tantrums).
Big houses with big backyards are surprisingly relaxing, if they're in reasonably good shape and require only upkeep maintenance.
Minivans? WAY AWESOME.
The job: Just like everyone else, professors get pulled between too many strong competing needs: the bureaucracy of science (running a lab, managing paperwork and money, doing reviews), teaching (billed as important but not seen as such), mentoring (fun and rewarding but a lot of work), and doing research (what we really want to do). The flip side of all that work is that we get to pick what we're going to work on, each day. It's worth it to me, but boy did it take a long time to get here - 15 years out of high school. It would really suck to take all that time to get here and find out that you didn't have a passion for research...
Actually, it's hard to overstate how fantastic the job is. Running a lab (or, more accurately, pointing it in some direction, feeding money into it, and seeing what happens) is as different from being a grad student or a postdoc as, umm, being a parent is from being single. Surprisingly fantastic good times amongst the hectic busy-ness, with a few rather miserable times (mostly paperwork). Hopefully it will get better once we actually publish some of the great stuff we have up our sleeves.
I'd love to be able to do more programming. My last in-depth programming project led to an epic inbox disaster that I'm still working through, though.
--titus
posted at: 19:05 | path: /dec-10 | 1 comments
Thu, 23 Dec 2010
What's in it for me? Thoughts on open science.
Unless you've been under a rock (or indulging in arsenic yourself), you've heard about NASA's "arsenic" article, claiming the discovery of a microbial species that can substitute arsenate for phosphate. The paper was pre-announced via a press conference that then announced the results.
Immediate blogtastrophe! The paper was critically reviewed in the blogosphere by a lot of people; I'm particularly fond of Rosie Redfield's review. Moreover, NASA has not covered itself in glory in its responses, claiming that blog reviews are not worth responding to, even when done by practicing scientists.
The Guardian has an article by Martin Robbins summing up much of the ensuing commentary, which boils down to some variation on "this paper should not have been published in Science", or "reviewer fail".
I found the Guardian article interesting, and I wanted to particularly comment on one of the concluding paragraphs:
At almost every stage of this story the actors involved were collapsing under the weight of their own slavish obedience to a fundamentally broken... well... 'system' is the right word, but I find myself toying with 'ideology'.
It's almost an article of faith online (blogs, twitter, yada) that many venerable academic institutions, including peer review and the whole scientific publication model, are basically broken. I don't disagree, although I'm hardly an expert. But I do have a comment, or rather a question, that I think is pertinent to the discussion of how to improve science through blogging, online peer review, and other methods of openness.
What's in it for me?
More generally, why should Dr. Jane Random Researcher invest any time or effort into blogging (and responding to other bloggers), writing good software, or anything else? What good does it do her? Is online networking just another (still rather poor) social networking tool?
I don't think you can use the arsenic paper as an argument for peer review in the blogosphere. The only reason we noticed the arsenic paper was that it was a .1-percenter: fascinating results with significant implications, hyped to high heaven by NASA, and reviewed quite visibly by a number of serious scientists. If the paper hadn't been publicized as heavily, little or none of the online stuff would have mattered. As it is, I can virtually guarantee that the first author is not going to be able to ride the glory of a Science paper into a faculty position, because everyone knows about the controversy now. But that's this paper, not the other thousands that will be published this year (hundreds in Science alone).
This is the problem with the online world for scientists: there's no real systematized incentive to any of this online stuff. And that makes it really tough. I'm going through Reappointment right now (that's where you fill in a lot of little boxes tallying your papers, grants, teaching, and other stuff, so that your university can decide if you're worth keeping on for another few years). Nowhere on there is there a place for "influential blog posts" -- how would you measure that, anyway? Same with software -- I listed my various software releases on the "scientific products" page of the form, and have since been asked to describe and discuss the impact of my software. Since I don't track downloads, and half or more of the software hasn't been published yet and can't easily be cited, and people don't seem to reliably cite open source software anyway, I'm not sure how to document the impact.
So, why do I release software, or blog? Well, I do what little I do because I like it. Personally, I'm ideologically bent towards openness: open source, open science, open review, etc. And I'm willing to spend some of my time investing in it, writing about it, and otherwise trying to practice it. And I've managed to make it work for me reasonably well, at least so far; more on that in future blog posts.
Having an identifiable incentive structure, however, is important. If you want people in general to change, you need to be able to show them that there's some gain in it for them - not monetary (no scientist I know is in it for the money) but economic in the academic sense. This boils down to cold hard grant cash & publications. Why? Because that's what the hiring, reappointment, promotion and tenure committees care about, so that's how you get and keep jobs -- and it's awfully hard to do research without a job.
The notion that publicizing your science leads to scientific fame and fortune is silly. The idea that additional citations are "a tangible benefit" is nonsense. The only "tangible benefits" that junior scientists care about are more time, more grants, and more publications. Writing blogs publicizing your own research is generally not going to help with that; rather, it's going to reduce the time you get to spend on doing science.
So how do you go from "online" to "grants and pubs"?
I don't know of any robust mechanisms for converting online reputation, from blogging or software release, into academic grants or publications. There are a few weak venues for software, like the NIH Software Maintenance grant program or the NSF Software Infrastructure for Sustained Innovation program, and some journals do exist to support the publication of software, but these haven't made much of an impact yet, and -- just as importantly -- seem to be largely uncoupled from software quality, at least as far as I can tell. Sean Eddy wrote a great article touching on the need for better software developer incentives that is particularly worth reading.
So why write software? Right now you only write software for the purpose of doing your own research, and there's very little incentive to make it public, much less make it good. It's rarely if ever peer reviewed, and the number of people using your software is (at best) useful for convincing grant reviewers that maybe you had some useful ideas a few years back.
In light of all of this, I'm very pleased to announce a new journal, Open Research Computation, or ORC. ORC is a journal for those of us who, for one reason or another, spend a lot of time working on the software. Cameron Neylon neyls it in his blog post:
Computation lies at the heart of all modern research. ... Open Research Computation is a journal that seeks to directly address the issues that computational researchers have. ... The primary consideration for publication in ORC is that your code must be capable of being used, re-purposed, understood, and efficiently built on.
I'm extra-specially-pleased to be on the board of editors, not least because so far it seems like this journal is trying to break significant new ground. Our ed board discussions so far have covered how to properly "snapshot" version control repositories upon publication of the associated paper (easy for DVCS... not so much for svn) and considerations for "repeat" publishing of significant new software versions, as the software matures, in order to help encourage people to actually update and release their software.
This new journal isn't a panacea, of course. It's going to take 3-5 years, or even more, to make a real impact, if it ever does. But I'm enthusiastic about a venue that speaks to a major theme of my own scientific efforts -- responsible computing -- and that could help in the struggle to place responsible computing more squarely in the scientific focus.
I also hope that this kind of journal -- providing incentives for more online interaction, if only in software -- will help convince scientists that online interaction is a Good Thing. At the least it's one more brick in the road.
--titus
p.s. Merry Christmas, all!
posted at: 15:44 | path: /dec-10 | 1 comments
Thu, 09 Dec 2010
(Some) Principles of Computational Science
I'm just finishing up my Computational Science for Evolutionary Biologists course, and I'm finding it tricky to come up with a good high-level summary of what I would like them to take away. As you can see from the class notes they've done some reasonably neat stuff with Digital Life and (separately!) next-gen sequence analysis, but the class has been somewhat random in its topics and train of thought.
Anyway, for the final class I decided I'd go slide by slide through a number of principles that they should apply if and when they find themselves doing computational science. In each case I can point to class exercises and homeworks that illustrate the points, which I think means I haven't totally failed... ;)
Anyway, here's what I have so far:
13 Principles of Computational Science:
- Computational science is just like any other science: don't trust it if you don't understand it.
Seriously. Computers aren't magic, and computational jargon isn't any more meaningful than any other jargon.
- The entire chain of evidence matters.
Keep close track of the raw data; the analysis source code; and the parameters used at each stage of data generation, processing and summarization.
Corollary: Make your raw data available. To do otherwise is just silly.
- If it's not automated, it's crrrrrap.
As soon as there's some manual step in your pipeline, you've lost track of what you're doing. You may do it differently, or not at all, or incorrectly. And you'll never know. You'll just get different results. Sometimes.
- Use version control.
If it's neither raw data (backed up!) nor generated data, put it in version control.
- Using other people's software to do science is hard.
They probably had some other use in mind that doesn't fit your needs, but you're going to try to adapt it anyway, aren't you? Good luck with that.
Corollary: using your own software to do science, 2 years after you wrote it, is hard -- because you're not you any more. (Remember, you can never step in the same stream twice.)
- No software is trustworthy.
Until you understand your software stack intuitively, have obsessed over parameter choices, and have locked down your software behavior with automated tests, don't trust it. After that, you can grudgingly extend some minimal trust to it, at least until the next version is released.
- Computation is not science.
Science is science. Computation may be one of the ways in which you do science.
- Hypotheses are good.
It's virtually impossible to analyze data without some kind of hypothesis in mind.
Corollary: Each hypothesis is only a starting point. It's probably wrong, so don't get too attached to it.
- More data is not necessarily less confusing.
The more data you have, the harder it can be to get a clean signal. Statistics help here, unless of course you have an unknown systematic bias in your data.
Corollary: You have an unknown systematic bias in your data.
- Interdisciplinary research is hard.
You need to be an expert in multiple fields, each with its own special techniques, lingo, and "commonly understood" shibboleths. Proper hypothesis testing involves mastering the first two; publication may depend on avoiding the last.
Corollary: computational science is implicitly interdisciplinary, hence hard. (If it were easy, we wouldn't need smart people like you to do it, right?)
- A lot of computing is just details.
There's very little magical about computing. An awful lot of it is just more details to remember. Running software, gathering the results, processing them, plotting them, tweaking parameters, etc.
- Look at your data.
Look at your data, and your results, in as many ways as possible. You'll often be surprised by what's actually in there.
- Above all, tell a story.
Nobody is interested in just graphs. If you don't have an interesting story, dig deeper.
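A few of these (the chain of evidence, no manual steps, automated tests) are easy enough to sketch in code. Here's a minimal Python sketch, not anything taken from the class notes: the function names (`run_analysis`, `gc_content`), the `min_length` parameter, and the toy sequences are all made up for illustration. The idea is just that every result gets written down together with a fingerprint of the raw data and the parameters that produced it, and that a test pins the answer for a known input.

```python
import hashlib
import json
import sys


def sha256_of(data: bytes) -> str:
    """Fingerprint the raw data so you can tell later whether it changed."""
    return hashlib.sha256(data).hexdigest()


def gc_content(seq: str) -> float:
    """Toy analysis step (hypothetical): fraction of G/C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def run_analysis(raw: bytes, params: dict) -> dict:
    """Run the whole (tiny) pipeline with no manual steps, and keep the
    raw-data fingerprint and parameters right next to the result."""
    # Whitespace-separated sequences; drop anything shorter than min_length.
    seqs = [s for s in raw.decode().split() if len(s) >= params["min_length"]]
    result = [round(gc_content(s), 3) for s in seqs]
    return {
        "raw_sha256": sha256_of(raw),
        "params": params,
        "python": sys.version.split()[0],
        "result": result,
    }


def test_known_input():
    """Lock down behavior: a known input must always give the same answer."""
    record = run_analysis(b"ACGT\nGGCC\nAT\n", {"min_length": 4})
    assert record["result"] == [0.5, 1.0]


if __name__ == "__main__":
    test_known_input()
    record = run_analysis(b"ACGT\nGGCC\nAT\n", {"min_length": 4})
    # Store this record alongside the output (and the script in version control).
    print(json.dumps(record, indent=2, sort_keys=True))
```

Nothing here is clever, which is sort of the point: it's just details, written down so that you (two years from now) don't have to remember them.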
I know, somewhat scattered. Any more thoughts, or pointers to similar lists?
thanks,
--titus
p.s. I plan to finish up with my (IMO very underappreciated) principles of How to be a Successful Computational Scientist, summarized here:
- Never show them your data.
- Do not, under any circumstances, communicate clearly.
- Never release your source code, either.
- Judge computational science by results, not quality.
- Use as much data as possible.
Then they get to fill out evaluations. Whee!
posted at: 21:45 | path: /dec-10 | 2 comments