Thu, 23 Dec 2010
What's in it for me? Thoughts on open science.
Unless you've been under a rock (or indulging in arsenic yourself), you've heard about NASA's "arsenic" article, claiming the discovery of a microbial species that can substitute arsenate for phosphate. The paper was pre-announced via a press conference that then announced the results.
Immediate blogtastrophe! The paper was critically reviewed in the blogosphere by a lot of people; I'm particularly fond of Rosie Redfield's. Moreover, NASA has not covered itself in glory in its responses, claiming that blog reviews are not worth responding to, even when done by practicing scientists.
The Guardian has an article by Martin Robbins summing up much of the ensuing commentary, which boils down to some variation on "this paper should not have been published in Science", or "reviewer fail".
I found the Guardian article interesting, and I wanted to particularly comment on one of the concluding paragraphs:
At almost every stage of this story the actors involved were collapsing under the weight of their own slavish obedience to a fundamentally broken... well... 'system' is the right word, but I find myself toying with 'ideology'.
It's almost an article of faith online (blogs, Twitter, yada) that many venerable academic institutions, including peer review and the whole scientific publication model, are basically broken. I don't disagree, although I'm hardly an expert. But I do have a comment, or rather a question, that I think is pertinent to the discussion of how to improve science through blogging, online peer review, and other methods of openness.
What's in it for me?
More generally, why should Dr. Jane Random Researcher invest any time or effort into blogging (and responding to other bloggers), writing good software, or anything else? What good does it do her? Is online networking just another (still rather poor) social networking tool?
I don't think you can use the arsenic paper as an argument for peer review in the blogosphere. The only reason we noticed the arsenic paper was that it was a .1-percenter: fascinating results with significant implications, hyped to high heaven by NASA, and reviewed quite visibly by a number of serious scientists. If the paper hadn't been publicized as heavily, little or none of the online stuff would have mattered. As it is, I can virtually guarantee that the first author is not going to be able to ride the glory of a Science paper into a faculty position, because everyone knows about the controversy now. But that's this paper, not the other thousands that will be published this year (hundreds in Science alone).
This is the problem with the online world for scientists: there's no real systematized incentive to any of this online stuff. And that makes it really tough. I'm going through Reappointment right now (that's where you fill in a lot of little boxes tallying your papers, grants, teaching, and other stuff, so that your university can decide if you're worth keeping on for another few years). Nowhere on there is there a place for "influential blog posts" -- how would you measure that, anyway? Same with software -- I listed my various software releases on the "scientific products" page of the form, and have since been asked to describe and discuss the impact of my software. Since I don't track downloads, and half or more of the software hasn't been published yet and can't easily be cited, and people don't seem to reliably cite open source software anyway, I'm not sure how to document the impact.
So, why do I release software, or blog? Well, I do what little I do because I like it. Personally, I'm ideologically bent towards openness: open source, open science, open review, etc. And I'm willing to spend some of my time investing in it, writing about it, and otherwise trying to practice it. And I've managed to make it work for me reasonably well, at least so far; more on that in future blog posts.
Having an identifiable incentive structure, however, is important. If you want people in general to change, you need to be able to show them that there's some gain in it for them - not monetary (no scientist I know is in it for the money) but economic in the academic sense. This boils down to cold hard grant cash & publications. Why? Because that's what the hiring, reappointment, promotion and tenure committees care about, so that's how you get and keep jobs -- and it's awfully hard to do research without a job.
The notion that publicizing your science leads to scientific fame and fortune is silly. The idea that additional citations are "a tangible benefit" is nonsense. The only "tangible benefits" that junior scientists care about are more time, more grants, and more publications. Writing blogs publicizing your own research is generally not going to help with that; rather, it's going to reduce the time you get to spend on doing science.
So how do you go from "online" to "grants and pubs"?
I don't know of any robust mechanisms for converting online reputation, from blogging or software release, into academic grants or publications. There are a few weak venues for software, like the NIH Software Maintenance grant program or the NSF Software Infrastructure for Sustained Innovation program, and some journals do exist to support the publication of software, but these haven't made much of an impact yet, and -- just as importantly -- seem to be largely uncoupled from software quality, at least as far as I can tell. Sean Eddy wrote a great article touching on the need for better software developer incentives that is particularly worth reading.
So why write software? Right now you only write software for the purpose of doing your own research, and there's very little incentive to make it public, much less make it good. It's rarely if ever peer reviewed, and the number of people using your software is (at best) useful for convincing grant reviewers that maybe you had some useful ideas a few years back.
In light of all of this, I'm very pleased to announce a new journal, Open Research Computation, or ORC. ORC is a journal for those of us who, for one reason or another, spend a lot of time working on the software. Cameron Neylon neyls it in his blog post:
Computation lies at the heart of all modern research. ... Open Research Computation is a journal that seeks to directly address the issues that computational researchers have. ... The primary consideration for publication in ORC is that your code must be capable of being used, re-purposed, understood, and efficiently built on.
I'm extra-specially-pleased to be on the board of editors, not least because so far it seems like this journal is trying to break significant new ground. Our ed board discussions so far have covered how to properly "snapshot" version control repositories upon publication of the associated paper (easy for DVCS... not so much for svn) and whether to allow "repeat" publishing of significant new software versions, as the software matures, in order to help encourage people to actually update and release their software.
This new journal isn't a panacea, of course. It's going to take 3-5 years, or even more, to make a real impact, if it ever does. But I'm enthusiastic about a venue that speaks to a major theme of my own scientific efforts -- responsible computing -- and that could help in the struggle to place responsible computing more squarely in the scientific focus.
I also hope that this kind of journal -- providing incentives for more online interaction, if only in software -- will help convince scientists that online interaction is a Good Thing. At the least it's one more brick in the road.
--titus
p.s. Merry Christmas, all!
posted at: 15:44 | path: /dec-10 | 1 comments
Fri, 03 Sep 2010
Open Science, and Risk/Benefit Analysis
In thinking about open science and open communication about science, I've always been frustrated by the people who claim that the risks outweigh the benefits. Their arguments seem sound if you buy into a certain kind of logic (the creationists will try to twist whatever you say! the climate change deniers will use your words in ways you did not intend! people will steal your research! you cannot communicate openly about what you're doing!) but I could never pin down why I felt that way. I had a eureka moment about it today, though.
When someone tells me that (for example) we should not make all BEACON research proposals fully public because they will be misinterpreted by creationists and used against us, they are saying this: in their personal opinion, the identified risks outweigh the identified benefits. They already know (and I agree that this will happen) that people will take the BEACON-funded study of -- for example -- some fascinating tailless ascidians as a scientific boondoggle, an excuse for a trip to France that won't result in anything but more incomprehensible literature about chordate origins. And they can't imagine that, without careful shaping of the message and management of the public image, this will not happen. Since there's no particularly obvious benefit to posting them publicly, the risks (of misinterpretation) outweigh the benefits (of some nebulous "open science" thingy). So halt! the publication.
Same arguments apply to climate change (but they'll just misuse/misinterpret the data!) and open science in general (but someone will just steal my data/ideas/...!).
This is fundamentally a failure of imagination. It is doing a risk analysis based on your worst fears, and neglecting a benefit analysis of your wildest hopes.
For example:
In the case of BEACON, we have a sprawling collection of 100 faculty spread across 5 institutions. I have literally no idea what more than half of them are doing. Wouldn't it be great if I could do a text search of their proposals, and even better if I could stumble across a BEACON colleague in a Google search on some topic or other? Or if we could attract students that didn't even know they were interested in "evolution in action", but came to our Web site based on Google's indexing of a rich array of research projects and then found themselves hooked?
What about the climate change skeptic (or agnostic) who suddenly gets a chance to sit down and look at all the data and can conclude that hey, this is actually really complicated? And it's probably not as simple as the skeptics claim? (Aside: I'm unbelievably pissed at the climate change community for the idiocy of their current closed-ness.)
And what about the collaborators that I could get (and am getting) from posting about some of our projects? In the worst case, I post about things and no one pays any attention; in the best case that I can think of, I make connections and establish cred that enables future collaborations, publications, and grant opportunities. (This is already happening.)
At the heart of science is an ethos that has to include openness in order to work properly. Any constriction in the flow of ideas and the interchange of opinions is a block in the very lifeblood of science itself. If we indulge those who argue against free communication, we are preventing not only some imagined negative consequences, but all of the happy coincidences that are beyond our limited imaginations.
So turn on, tune in, and don't drop out.
--titus
posted at: 20:55 | path: /sep-10 | 3 comments
Thu, 26 Aug 2010
Galileo, Open Science, and History
I'm a big believer in open science -- see this great polemic over at Mendeley for a good read -- but it's always interesting to think about how such things as "data release" can be perverted by clever scientists. I'm currently in France working on some ascidians with Billie Swalla -- more on that later -- and we've been talking about what data we plan to release, and how. During these talks (leisurely conducted over cafe au lait and chausson pomme, of course!) Billie brought up an interesting historical parallel.
The story, as I understand it, is this: when Galileo Galilei first looked through a good quality telescope and discovered Jupiter's moons, nobody believed him. Since he was the only person able to make such good telescopes, he actually made and distributed them to other scientists -- not just as a profitable sideline, but so that the other scientists could confirm his observations!
One could see this as a first step towards "open science": in order to reproduce Galileo's observations, astronomers had to have a telescope that only Galileo could make. So Galileo had to make telescopes and send them out, thus allowing others to both reproduce his observations and build upon them.
The story takes on a different aura, however, when you realize that Galileo could have just given out the actual manufacturing instructions for the telescopes, but didn't. Two possible reasons are money (he made money selling the telescopes to others) and scientific miserliness: he didn't want others to get credit for building on his results. As long as he withheld the details necessary to reproduce his instruments, he ensured that no one could build on his results, and that he would have preeminence in astronomy. (The parallels between this and source code are uncanny, no?)
It was quite a balancing act. To quote from Dr. Biagioli's "Replication or Monopoly" (pdf here),
"His primary worry was not that some people might reject his claims, but rather that those able to replicate them could too easily proceed to make further discoveries on their own and deprive him of future credit (Galilei 1989, 17). Consequently, he tried to slow down potential replicators to prevent them from becoming competitors. He did so by not providing other practitioners access to high-power telescopes and by withholding detailed information about how to build them.
But as important as it was for Galileo to keep his fellow astronomers in the dark, such negative tactics alone would not have allowed him to gain credit from his discoveries and move from his post at the university of Padua to a position at the Medici court in Florence as mathematician and philosopher of the grand duke - goals clearly on his mind in 1610. He needed proactive tactics as well. First, he did his best to make sure the grand duke saw the satellites of Jupiter (which Galileo had named "Medicean Stars") by sending detailed instructions to Florence on how to conduct these observations, and then by going to court himself at Easter time (Galilei 1890-1909, X:281, 304). Second, through the prompt publication of the Sidereus nuncius in March of 1610 he tried to establish priority and international visibility - resources he needed to impress his prospective patron, not just the republic of letters.
The Nuncius was carefully crafted to maximize the credit Galileo could expect from readers while minimizing the information given out to potential competitors."
Here you can see calculation as fine as that of any modern professor trying to decide whether to release all their data, or only some of it; all of their source code, or only a crippled version.
Billie also observes that one potential irony in this story is that Galileo, by so strongly taking sole credit for his discoveries, made himself a clear target for the Catholic Church...
An even more pernicious approach, seeking priority while avoiding embarrassment by publishing hashes (well, anagrams ;) of formulae or observations, was common in the 17th century. In The Newton Handbook, Derek Gjertsen writes:
"It was not uncommon for seventeenth-century scientists to record their more valued results in the form of anagrams. Thus, Galileo published his discovery in 1610 of the phases of Venus in a thirty-five letter anagram, Huygens announced his 1656 observation that Saturn was surrounded by a ring in a sixty-three letter anagram, while, in England, Robert Hooke and Christopher Wren resorted to similar stratagems. The advantages of the ploy are obvious. Priority was established, yet nothing was given away to potential rivals. If, by chance, the work failed to stand up to further analysis it could be quietly forgotten without the embarrassment public failures tended to incur."
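To make the "hashes (well, anagrams)" comparison concrete, here is a minimal Python sketch of both ploys. Everything in it is an assumption made for illustration: the claim text is made up (it is not one of the historical anagrams), the function names are hypothetical, and SHA-256 with a random nonce is just one reasonable modern stand-in for the anagram.

```python
import hashlib
import secrets


def anagram_commitment(claim: str) -> str:
    """Canonical anagram of a claim: its letters, lowercased and sorted.

    Any rearrangement of the same letters yields the same string, so
    publishing it stakes a claim without stating the result in readable form.
    """
    letters = [c for c in claim.lower() if c.isalpha()]
    return "".join(sorted(letters))


def hash_commitment(claim: str, nonce: bytes) -> str:
    """SHA-256 digest of nonce + claim.

    The random nonce keeps rivals from confirming a short, guessable claim
    by simply hashing their own guesses.
    """
    return hashlib.sha256(nonce + claim.encode("utf-8")).hexdigest()


def verify_hash(claim: str, nonce: bytes, published: str) -> bool:
    """Check a later-revealed claim and nonce against the published digest."""
    return hash_commitment(claim, nonce) == published


# Hypothetical claim, used only as an example.
claim = "saturn has a ring"
nonce = secrets.token_bytes(16)

published_anagram = anagram_commitment(claim)   # 17th-century style
published_hash = hash_commitment(claim, nonce)  # modern equivalent

# Later, to claim priority, reveal the claim (and the nonce for the hash).
print(published_anagram)
print(verify_hash(claim, nonce, published_hash))
```

The two share the property Gjertsen describes: if the result doesn't hold up, you never reveal the claim and nobody is the wiser. The anagram is the weaker of the two, since it leaks the letter counts and many different sentences can fit the same letters.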
One can only wonder how many one-shot awesome Science and Nature papers, using software that was and remains unavailable, are entirely unreplicable or otherwise uninteresting -- for example, I like to pick on one of Eran Segal's publications, because it's so neat and yet very very difficult to replicate without source code. (A colleague is trying.)
Compare this to the recent discussion of the (leaked) P != NP proof, now shown to be erroneous - see, e.g., Greg Baker's blog post, P != NP. Now this is the way science is supposed to work! Quick, thoughtful commentary by experts, highlighting potential problems with your work -- and allowing or enabling others to build off of it.
It seems clear that by withholding the manufacturing instructions, Galileo may well have held back astronomy as a whole. And by publishing their equations in anagram form, it's likely that Newton and the others did damage to science as a whole.
Today, intellectual reputations like that are in some ways less important (at least in my bottom-feeding scientific world). Publications and citations are more important, since they're measurable by Promotion & Tenure committees. I (and probably many other scientists) am continually worrying about the line between publishing good stuff that enables citations, and giving away all of our future research directions. It takes a real act of faith to throw yourself off the cliff and offer up your latest & greatest source code and data to the world, in the hopes that somehow the resulting "usefulness" will provide lift to your career. We'll see how that goes: road kill? Or tenure?
Back to Galileo -- I think the Galileo example is why, as wonderful as the Panton Principles are for data, for truly open science it's critical to provide not only the raw data, but the source code used to do the analysis. And not only the source code, but useful source code: documented and tested source code [1]. To do anything else would be the equivalent of selling telescopes while withholding the manufacturing instructions that would let others build on your own ideas.
Interesting stuff to think about! Now, back to science...
--titus
[1] Yeah, I realize that most scientific source code probably isn't documented or tested. Draw your own conclusions there ;).
posted at: 14:07 | path: /aug-10 | 3 comments