A few weeks back, a journalist contacted me about my old blog post
comparing physics and biology,
and amidst other conversation, I pointed them at my latest blog post
and said that I thought a lot of (molecular) biologists were
"culturally confused about data". The next question was, perhaps
obviously, "what do you mean by that?" and I wrote in response:
... for molecular biologists, "data" is what they collect piece
by piece with PCR, qPCR, clone sequencing, perturbation experiments,
image observations. It is so individual and specialized to a problem that to
share it prior to publication makes no sense; no one could understand it all
unless it were fit into a narrative as part of a pub, and the only useful
product of the data is the publication; access to the data is only useful
for verifying that it wasn't manufactured.
Sequencing data was one of the first outputs (as opposed to things like
reagents, antibodies, and qPCR primers) that was useful beyond a particular
narrative. I might sequence a gene because I want to knock it down and
need to know its leader sequence for that, but then you might care about
its exonic structure for evolutionary reasons, and Phil over there might
be really interested in its protein domains, while Kim might be looking at
an allele of that gene that is only in part of the population.
I'm probably overstating that distinction but it helps explain a LOT of
what I've seen in terms of cultural differences between my grad/pd labs
(straight up bio) and where I think bio is going.
I'm sure I'm wrong (certainly incomplete) about lots of this, but it does
fit my own personal observations. Other perspectives welcome!
I decided to write this up as a blog post because I read
Carly Strasser's excellent blog post introducing open science,
which emphasizes data, and it made me think about my response above.
I think it's interesting to think about how "data" can be interpreted
by different fields, and I'd like to stress how important it is that
we bridge the gap between these high-level views and day-to-day
practice in each subdomain - the culture and language can vary so
significantly between even neighboring fields!
Oh, and Carly Strasser is now one of the Moore Data Driven Discovery
Initiative Program Officers - I'm really
happy to see the Moore Foundation confronting these aspects of data head
on by hiring someone with Carly's experience and expertise, and I look
forward to interacting with her more on these issues!
While the author is employed by the University of California, Davis, his opinions are his own and almost certainly bear no resemblance to what UC Davis's official opinion would be, had they any.