Via http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising, on the Nucleic Acids Research "database" issue:
As we pass the one thousand databases mark (1kDB) I wonder, what proportion of the data in these databases will never be used?
This is an unsettling thought for those (wonderful) people who spend so much time gathering and distributing knowledge, and it's an interesting problem, too. But I'm not here to be thoughtful -- I'm here to be snarky!
One comment, from Erich Schwarz (who works on WormBase, one of the bigger and more successful genome databases):
Erich's Two Rules of Database Relevance:
- If keeping a biological database working and up-to-date isn't causing you routine feelings of being overwhelmed, chances are, your database is irrelevant.
- While the converse is probably not strictly true -- feeling routinely overwhelmed doesn't guarantee your database is relevant -- it's certainly a Bayesian indicator.
And my question:
I wonder how many of them make their data available for bulk download?
(Andy Cameron, who runs the Sea Urchin Genome Project, reminds me: "data and code"?)