Doing biology, however it needs to be done

Sean Eddy wrote an interesting blog post on how scripting is something every biologist should learn to do. This spurred a few discussions on Twitter and elsewhere, most of which devolved into the usual arguments about what, precisely, biologists should be taught.

I always find these discussions not merely predictable but also rather beside the point. Statisticians will always complain that people need a better appreciation of stats; bioinformaticians will point to alignment or sequence comparison or whatnot; evolutionary biologists will talk about how important evolution is; physicists will point to more math; etc. But the truth is there's very, very little that is necessary to be a biologist.

My perspective on this is informed by my own background, which is a tad idiosyncratic. Despite being an assistant professor in a Microbiology department and (soon) a professor in a VetMed program, I have taken more physics courses than biology courses; heck, I've taught more biology courses than I've taken. I've never taken a chemistry course and know virtually nothing about biochemistry, in particular. Most of my evolutionary knowledge derives from my reading for my research on Avida; I've never been taught anything about ecology and still know very little. I failed my first qual exam at Caltech because I didn't actually know anything about cells (an event which, ahem, spurred me to learn). Despite this utter lack of formal background in biology, I spent 5-10 years of my life doing experimental molecular biology for developmental biology research (after 3-5 years learning basic molecular biology), and I've learned what I think is a reasonable amount about cell biology into the bargain. I know a fair bit about developmental biology, evo-devo, molecular biology, and genomics. My PhD is actually in Biology, with a Developmental Biology focus. But I learned it all as I needed it. (My undergrad degree is in Unapplied Mathematics.)

My weakest formal training is probably in stats, where I know enough to point out where whatever system I'm studying is violating standard statistical requirements, but not enough to point out how to rescue our approach.

Despite having "run" a bioinformatics lab for the last few years, my formal bioinformatics background is basically nil - I started with a strong programming background, learned a bunch of biology and genomics, and then realized that much of bioinformatics is fairly obvious at that point. I don't really understand Hidden Markov Models or sequence alignment (but shh, don't tell anyone!).

With all of this, what do I call myself? Well, I definitely consider myself a biologist, as do at least a few different hiring panels, including one at UC Davis :). And when I talk to other biologists, I think that at least some of them agree - I'm focused on answering biological questions. I do so primarily in collaboration at this point, and primarily from the standpoint of data, but: biology.

So here's my conclusion: to be a biologist, one must be seriously trying to study biology. Period. Clearly you must know something about biology in order to be effective, and critical thinking is presumably pretty important; I think "basic competency in scientific practice" is probably the minimum bar, but even then you can imagine lab techs or undergraduates putting in useful work at a pretty introductory level. I think there are many useful skills to have, but I have a hard time concluding that any of them are strictly necessary.

The more interesting question, to my mind, is what should we be teaching undergraduates and graduate students in biology? And there I unequivocally agree with the people who prioritize some reasonable background in stats, and some reasonable background in data analysis (with R or Python - something more than Excel). What's more important than teaching any one thing in particular, though, is that the whole concept that biologists can avoid math or computing in their training (be it stats, modeling, simulation, programming, data science/data analysis, or whatever) needs to die. That is over. Dead, done, over.

One particular challenge we are facing now is that we don't have many people capable of teaching these younger biologists the appropriate data analysis skills, because most biologists (including the non-research-active faculty who do most of the teaching) don't know anything about them. And data analysis in biology is about data analysis in biology: you can't just drop in a physicist or an engineer to teach this stuff.

At the end of the day, though, a scientist either learns what they need to know in order to do their research, or they collaborate with others to do it. As data becomes ever more important in biology, I expect more and more biologists will learn how to do their own analysis. One of my interests is in figuring out how to help biologists to make this transition if they want to.

So perhaps we can shift from talking about what you must know in order to practice biology to talking about what we're going to teach, to whom, and when, as we train the biologists of the future?

