A few weeks back, a journalist contacted me about my old blog post comparing physics and biology, and amidst other conversation, I pointed them at my latest blog post on data and said that I thought a lot of (molecular) biologists were "culturally confused about data". The next question was, perhaps obviously, "what do you mean by that?" and I wrote in response:
... for molecular biologists, "data" is what they collect piece by piece with PCR, qPCR, clone sequencing, perturbation experiments, image observations. It is so individual and specialized to a problem that to share it prior to publication makes no sense; no one could understand it all unless it were fit into a narrative as part of a pub, and the only useful product of the data is the publication; access to the data is only useful for verifying that it wasn't manufactured.
Sequencing data was one of the first outputs (as opposed to things like reagents, antibodies and QPCR primers) that was useful beyond a particular narrative. I might sequence a gene because I want to knock it down and need to know its leader sequence for that, but then you might care about its exonic structure for evolutionary reasons, and Phil over there might be really interested in its protein domains, while Kim might be looking at an allele of that gene that is only in part of the population.
I'm probably overstating that distinction but it helps explain a LOT of what I've seen in terms of cultural differences between my grad/pd labs (straight up bio) and where I think bio is going.
I'm sure I'm wrong (certainly incomplete) about lots of this, but it does fit my own personal observations. Other perspectives welcome!
I decided to write this up as a blog post because I read Carly Strasser's excellent blog post introducing open science, which emphasizes data, and it made me think about my response above. I think it's interesting to think about how "data" can be interpreted by different fields, and I'd like to stress how important it is that we bridge the gap between these high-level views and day-to-day practice in each subdomain - the culture and language can vary so significantly between even neighboring fields!
Oh, and Carly Strasser is now one of the Moore Data Driven Discovery Initiative Program Officers - I'm really happy to see the Moore Foundation confronting these aspects of data head on by hiring someone with Carly's experience and expertise, and I look forward to interacting with her more on these issues!