Here at the Lab for Data-Intensive Biology (TM) we are constantly trying to explore new ideas for advancing the practice of biological data sciences. Below are some ideas that originated with or were sharpened by conversations with Greg Wilson (Executive Director, Software Carpentry) and Tracy Teal (Project Lead, Data Carpentry) that I am interested in turning into reality, as part of my training efforts in Data Intensive Biology at UC Davis (also see my blog post).
Quarterly in-depth "unconferences" on developing advanced domain-specific data analyses
I spend an increasing amount of time working to teach people how to analyze sequencing data, but in practice we kinda suck at analyzing this data, especially when it's from non-model systems. We need some workshops to advance the practice of analysis itself, too.
So, I am thinking of running quarterly (4x/year) week-long unconferences, each on a different topic. The idea is to get together a small group of people (~15-25) actively and openly working on various aspects of a specific type of data analysis, to hang out and collaborate/coordinate their efforts for a week. The plan would be to intersperse a few presentations with lots of hacking and communication time, with the goal of making progress on topics of mutual interest and nucleating collaborative online interactions.
The following topics are areas where I can easily imagine a week-long technical workshop making some substantive progress:
- vet/ag genome (re)annotation and curation
- non-model transcriptome analysis
- geomicrobiology/function-focused metagenome analysis
- bioinformatics training (e.g. train the trainers)
- reproducibility in genomic analysis
- computational protocols and benchmarking
- advanced statistical approaches to data integration
Importantly, these workshops would be inexpensive and largely unfunded - I would ask participants to fund themselves, rather than seeking to write grants to support a bunch of people. If we can locate them in an inexpensive place then the total cost would be in the $1000-2000/person range, which most active research labs could probably support. I would seek funding for scholarships to increase diversity of participants, but beyond that my goal would be to make these workshops so useful that active and funded researchers want to come. (I mean, I wouldn't turn down money that dropped into my lap, but I've had too many workshop proposals rejected as not quite what the program managers wanted to get on that merry-go-round again, unless it's critical to make something super important happen.)
One non-negotiable component for me would be that everything worked on at these meetings be under open licenses and already developed in the open. Although I suppose we could have a meeting where people interested in opening their software could get together to do so in a guided fashion... proponents of open science, take note!
Such workshops would not need to be hosted at UC Davis, or by me; I just want to make 'em happen and am happy to co-organize or coordinate, and think I could do ~4 a year myself. There are a lot of people invested in making progress on these issues who already have some money, and so one option for moving forward more generally would be to find those people and co-opt them :).
A workshop consisting of half-day focused lessons
Last week, I ran a workshop on starting a new project with reproducibility in mind:
Reproducible Computational Analysis - How to start a new project
Description: Computational science projects, from data analysis to modeling, can benefit dramatically from a little up-front investment in automation; starting off with version control and automated building of results will pay off in efficiency, agility, and both transparency and reproducibility of the results. However, most computational researchers have never been exposed to a completely automated analysis pipeline. I will demonstrate the process of initiating a new project, building a few initial scripts, and automating the generation of results, as well as building some graphs. While the topic will be from my own research in bioinformatics, the overall approach should apply to anyone doing data analysis or simulations.
Technology used will include git, IPython Notebook, and 'make'.
This is an interactive seminar intended for computational science researchers with some experience in version control and scripting (for example, if you've taken a Software Carpentry workshop, you will be at a good starting point).
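To give a flavor of what "automating the generation of results" means, here is a minimal sketch of the rule that 'make' applies: rebuild an output only when it is missing or older than its inputs. This is not the workshop material, just a Python stand-in for illustration; the function names (`is_stale`, `build_summary`) and the file layout are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def is_stale(target: Path, sources: Iterable[Path]) -> bool:
    """True if target is missing or older than any source (the rule 'make' uses)."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(src.stat().st_mtime > target_mtime for src in sources)


def build_summary(raw: Path, summary: Path) -> bool:
    """Regenerate the summary from the raw data only when needed.

    Returns True if the summary was rebuilt, False if it was already current.
    Assumes (for this sketch) that raw holds whitespace-separated numbers.
    """
    if not is_stale(summary, [raw]):
        return False
    values = [float(token) for token in raw.read_text().split()]
    mean = sum(values) / len(values)
    summary.write_text(f"n={len(values)} mean={mean:.2f}\n")
    return True
```

Calling `build_summary(Path("raw.txt"), Path("summary.txt"))` twice in a row does the work once; the second call is a no-op until `raw.txt` changes. In a real project a Makefile would express the same dependency declaratively, and version control would track the scripts.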
This workshop format is an idea that originated with Greg - he nucleated the idea, and then I went ahead and tried it out. More on that workshop later, but... why not do this a lot?
We are thinking about how to do a focused series of these ~3 hour learning opportunities, either all demo or half demo/half participation, each on a different topic. For example, my lab could do a section on k-mer analyses of large sequencing data sets, or on GitHub Flow, or on software testing, or on whatever; the important thing is that there are tons of Software Carpentry instructors with deep roots in one discipline or another, and it'd be a fun way to learn from each other while teaching to a larger audience.
This is something we might try during the third week of our NGS summer course; if you're a badged SWC instructor and want to demo something related to sequence analysis, drop me a note with a brief proposal.
Instructor gatherings for lesson development and testing
Tracy Teal, Jason Williams, Mike Smorul, Mary Shelley, Shari Ellis, and Hilmar Lapp just ran a Data Carpentry hackathon focused on lesson development and assessment. Riffing off of that, what about getting instructors together to do lesson development and testing on a regular basis, and then present the lessons in front of a more advanced crowd? This would be an opportunity for people to develop and test lessons for Software Carpentry and Data Carpentry on a tolerant audience, with other instructors around to offer help and advice, and without the challenges of a completely novice audience for the first time through.
This is also something we might try during the third week of our NGS summer course; if you're a badged SWC instructor and want to do something on sequence analysis, please drop me a note and tell me what!
Any other thoughts on things that have worked, or might work, for advancing training and practice in a hands-on manner?