Here at the Lab for Data-Intensive Biology (TM) we are constantly
trying to explore new ideas for advancing the practice of biological
data sciences. Below are some ideas that originated with or were
sharpened by conversations with Greg Wilson (Executive Director,
Software Carpentry) and Tracy Teal (Project Lead, Data
Carpentry) that I am interested in turning into reality, as part of my
training efforts in Data Intensive Biology at UC Davis (also see my blog
post).
Quarterly in-depth "unconferences" on developing advanced domain-specific data analyses
I spend an increasing amount of time working to teach people how to
analyze sequencing data, but in
practice we kinda suck at analyzing this data, especially when it's from
non-model systems. We need some workshops to advance those efforts,
too.
So, I am thinking of running quarterly (4x/year) week-long
unconferences, each on a different topic. The idea is to get together
a small group of people (~15-25) actively and openly working on
various aspects of a specific type of data analysis, to hang out and
collaborate/coordinate their efforts for a week. The plan would be
to intersperse a few presentations with lots of hacking and
communication time, with the goal of making progress on topics of
mutual interest and nucleating collaborative online interactions.
The following topics are areas where I can easily imagine a week-long
technical workshop making some substantive progress:
- vet/ag genome (re)annotation and curation
- non-model transcriptome analysis
- geomicrobiology/function-focused metagenome analysis
- bioinformatics training (e.g. train the trainers)
- reproducibility in genomic analysis
- computational protocols and benchmarking
- advanced statistical approaches to data integration
Importantly, these workshops would be inexpensive and largely
unfunded: I would ask participants to fund themselves, rather than
writing grants to support a bunch of people. If we can
locate them in an inexpensive place then the total cost would be in
the $1000-2000/person range, which most active research labs could
probably support. I would seek funding for scholarships to increase
diversity of participants, but beyond that my goal would be to make
these workshops so useful that active and funded researchers want to
come. (I mean, I wouldn't turn down money that dropped into my lap,
but I've had too many workshop proposals rejected as not quite what
the PMs wanted, and I don't want to get back on that merry-go-round
unless it's critical to making something super important happen.)
One non-negotiable component for me would be that everything worked on
at these meetings would be under open licenses, and already being
developed openly. Although I suppose we could have a meeting where
people interested in opening their software could get together to do
so in a guided fashion... proponents of open science, take note!
Such workshops would not need to be hosted at UC Davis, or by me; I
just want to make 'em happen and am happy to co-organize or
coordinate, and think I could do ~4 a year myself. There are a lot of
people invested in making progress on these issues who already have some
money, so one option for moving forward more generally would be to
find those people and co-opt them :).
A workshop consisting of half-day focused lessons
Last week, I ran a workshop on starting a new project with reproducibility in mind:
Reproducible Computational Analysis - How to start a new project
Description: Computational science projects, from data analysis
to modeling, can benefit dramatically from a little up-front
investment in automation; starting off with version control and
automated building of results will pay off in efficiency,
agility, and both transparency and reproducibility of the
results. However, most computational researchers have never been
exposed to a completely automated analysis pipeline. I will
demonstrate the process of initiating a new project, building a
few initial scripts, and automating the generation of results, as
well as building some graphs. While the topic will be from my own
research in bioinformatics, the overall approach should apply to
anyone doing data analysis or simulations.
Technology used will include git, IPython Notebook, and 'make'.
This is an interactive seminar intended for computational science
researchers with some experience in version control and scripting
(for example, if you've taken a Software Carpentry workshop, you
will be at a good starting point).
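For readers who haven't used 'make', the core rule it applies is simple: rebuild a result only when it is missing or older than one of its inputs. Below is a minimal Python sketch of that rule, assuming Python 3.9+. It is not the actual workshop material. The function names and the FASTA record-counting "recipe" are hypothetical, chosen only to give the rule something concrete to rebuild.

```python
from pathlib import Path
from typing import Callable, Iterable


def needs_rebuild(target: Path, deps: Iterable[Path]) -> bool:
    """Return True if target is missing or older than any dependency (make's rule)."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(dep.stat().st_mtime > target_mtime for dep in deps)


def build(target: Path, deps: list[Path],
          recipe: Callable[[Path, list[Path]], None]) -> bool:
    """Run recipe only when target is out of date; return True if it ran."""
    if needs_rebuild(target, deps):
        recipe(target, deps)
        return True
    return False


def count_fasta_records(target: Path, deps: list[Path]) -> None:
    """Hypothetical recipe: count FASTA headers ('>' lines) across inputs."""
    total = 0
    for dep in deps:
        with dep.open() as fh:
            total += sum(1 for line in fh if line.startswith(">"))
    target.write_text(f"{total}\n")
```

Running `build(Path("count.txt"), [Path("reads.fa")], count_fasta_records)` twice in a row runs the recipe only the first time. Editing `reads.fa` makes the next call rebuild. A real Makefile expresses the same dependency graph declaratively and is what the workshop actually used.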
This idea originated with Greg: he nucleated it, and
then I went ahead and tried it out. More on that workshop later, but...
why not do this a lot?
We are thinking about how to do a focused series of these ~3 hour
learning opportunities, either all demo or half-demo/half
participation, each on a different topic. For example, my lab
could do a section on k-mer analyses of large sequencing data sets, or
on GitHub Flow, or on
software testing, or on whatever; the important thing is that there
are tons of Software Carpentry instructors with deep roots in one discipline or
another, and it'd be a fun way to learn from each other while teaching
to a larger audience.
This is something we might try during the third week of our NGS
summer course; if
you're a badged SWC instructor and want to demo something related to
sequence analysis, drop me a note with a brief proposal.
Instructor gatherings for lesson development and testing
Tracy Teal, Jason Williams, Mike Smorul, Mary Shelley, Shari Ellis,
and Hilmar Lapp just ran a Data Carpentry hackathon
focused on lesson development and assessment. Riffing off of that,
what about getting instructors together to do lesson development and
testing on a regular basis, and then present it in front of a more
advanced crowd? This would be an opportunity for people to develop
and test lessons for Software Carpentry and Data Carpentry on a
tolerant audience, with other instructors around to offer help and
advice, and without the challenges of a completely novice audience for
the first time through.
This is also something we might try during the third week of our
NGS summer course; if
you're a badged SWC instructor and want to do something on sequence
analysis, please drop me a note and tell me what!
Any other thoughts on things that have worked, or might work, for advancing
training and practice in a hands-on manner?
thanks,
--titus