Sean Eddy wrote an interesting blog post
on how scripting is something every biologist should learn to do.
This spurred a few discussions on Twitter and elsewhere, most of which
devolved into the usual arguments about what, precisely, biologists
should be taught.
I always find these discussions not merely predictable but rather beside
the point. Statisticians will always complain that people need a better
appreciation of stats; bioinformaticians will point to alignment or sequence
comparison or whatnot; evolutionary biologists will talk about how important
evolution is; physicists will point to more math; etc. But the truth is
there's very, very little that is necessary to be a biologist.
My perspective on this is informed by my own background, which is a
tad idiosyncratic. Despite being an assistant professor in a
Microbiology department and (soon) a professor in a VetMed program, I
have taken more physics courses than biology courses; heck, I've
taught more biology courses than I've taken. I've never taken a
chemistry course and know virtually nothing about biochemistry, in
particular. Most of my evolutionary knowledge derives from my reading
for my research on Avida;
I've never been taught anything about ecology and still know very
little. I failed my first qual exam at Caltech because I didn't
actually know anything about cells (which event, ahem, spurred me to
learn). Despite this utter lack of formal background in biology, I
spent 5-10 years of my life doing experimental molecular biology for
developmental biology research (after 3-5 years learning basic
molecular biology), and I've learned what I think is a reasonable
amount about cell biology into the bargain. I know a fair bit about
developmental biology, evo-devo, molecular biology, and genomics. My
PhD is actually in Biology, with a Developmental Biology focus. But I
learned it all as I needed it. (My undergrad degree is in Unapplied
Mathematics.)
My weakest formal training is probably in stats, where I know enough
to point out where whatever system I'm studying is violating standard
statistical requirements, but not enough to point out how to rescue our
approach.
Despite having "run" a bioinformatics lab for the last few years, my
formal bioinformatics background is basically nil - I took a strong
programming background, learned a bunch of biology and genomics, and
then realized that much of bioinformatics is fairly obvious at that
point. I don't really understand Hidden Markov Models or sequence
alignment (but shh, don't tell anyone!)
With all of this, what do I call myself? Well, I definitely consider
myself a biologist, as do at least a few different hiring panels,
including one at UC Davis :). And
when I talk to other biologists, I think that at least some of them
agree - I'm focused on answering biological questions. I do so
primarily in collaboration at this point, and primarily from the
standpoint of data,
but: biology.
So here's my conclusion: to be a biologist, one must be seriously
trying to study biology. Period. Clearly you must know something
about biology in order to be effective, and critical thinking is
presumably pretty important too; I think "basic competency in
scientific practice" is probably the minimum bar, but even then you
can imagine lab techs or undergraduates putting in useful work at a
pretty introductory level. I think there are many useful skills
to have, but I have a hard time concluding that any of them are
strictly necessary.
The more interesting question, to my mind, is what should we be
teaching undergraduates and graduate students in biology? And there I
unequivocally agree with the people who prioritize some reasonable
background in stats, and some reasonable background in data analysis
(with R or Python - something more than Excel). What's more important
than teaching any one thing in specific, though, is that the whole
concept that biologists can avoid math or computing in their training
(be it stats, modeling, simulation, programming, data science/data
analysis, or whatever) needs to die. That is over. Dead, done, over.
One particular challenge we are facing now is that we don't have many
people capable of teaching these younger biologists the appropriate
data analysis skills, because most biologists (including the
non-research-active faculty that do most teaching) don't know anything
about them, and data analysis in biology is about data analysis in
biology -- you can't just drop in a physicist or an engineer to
teach this stuff.
At the end of the day, though, a scientist either learns what they
need to know in order to do their research, or they collaborate with
others to do it. As data becomes ever more important in biology, I
expect more and more biologists will learn how to do their own
analysis. One of my interests is in figuring out how to help
biologists to make this transition if they want to.
So perhaps we can shift from talking about what you must know in order
to practice biology to talking about what we're going to teach, to whom,
and when, for the people who will be the biologists of the future?
--titus