As I mentioned, I am hoping to significantly scale up my training efforts at UC Davis; it's one of the reasons they hired me, it's a big need in biology, and I'm enthusiastic about the whole thing! A key point is that, at least at the beginning, it may replace some or all of my for-credit teaching. (Note that the first four years of Analyzing Next-Generation Sequencing Data counted as outreach, not teaching, at MSU.)
I don't expect to fully spool up before fall 2015, but I wanted to start outlining my thoughts.
The ideas below came in large part from conversations with Tracy Teal, a Software Carpentry instructor who is one of the people driving Data Carpentry, and who also was one of the EDAMAME course instructors.
How much training, how often, and to whom?
I think my initial training efforts will center on Software Carpentry-style workshops, on a variety of (largely bio-specific) topics. These would be two-day in-person workshops, 9am-5pm, each focused on a specific topic.
I think I can sustainably lead one a month, with perhaps a few months where I organize two in the same week (M/Tu and Th/Fri, perhaps).
These would be on top of at least one NGS course a year, too. I also expect I will participate in various Genome Center training workshops.
The classes would be targeted at grad students, postdocs, and faculty -- same as the current NGS course. I would give attendees from VetMed some priority, followed by attendees with UC Davis affiliations, and then open to anyone. I imagine doing this in a tiered way, so that some outsiders could always come; variety and a mixed audience are good things!
On what topics?
I have a laundry list of ideas, but I'm not sure what to start with or how to make decisions about what to teach when. Suggestions welcome! (I also can't teach all of these myself, but I want to get the list of ideas down!)
I'd like to preface this list with a few comments: I've been teaching and training in these topics for five years (at least) now, so I'm not naive about how hard (or easy) it is to teach this to computationally inexperienced biologists. It's clear that there's a progression of skills that need to be taught for most of these, as well as a need for careful lesson planning, tutorial design, and pre/post assessment. These workshops would also be but one arrow in the quiver -- I have many other efforts that contribute to my lab's teaching and training.
With that having been said, here's a list of general things I'd like to teach:
- Shell and UNIX (long-running commands, remote commands, file and path management)
- Scripting and automation (writing scripts, make, etc.)
- Bioinformatics and algorithms
- "Big data" statistics
- Data integration for sequencing data
- Software engineering (testing, version control, code review, etc.) on the open source model
- Practical bioinformatics (see topics below)
- Modeling and simulations
- Workflows and replication tracking
- Software Carpentry
- Data Carpentry
I have many specific topics that I think people know they want to learn:
- Mapping and variant calling
- Genome assembly and evaluation (microbial & large genomes both)
- Transcriptome assembly and evaluation (reference-free & reference-based)
- Genome annotation
- Differential expression analysis
- Metagenome assembly
- Microbial ecology and 16S approaches
- Functional inference (pathway annotations)
- Marker development
- Genotyping by sequencing
- Population genomics
And finally, here are two shorter workshop ideas that I find particularly neat: experimental design (from sample prep through validation), and sequencing case studies (success and failure stories). In the former, I would get together a panel of two or three people to talk through the issues involved in doing a particular experiment, with the goal of helping them write a convincing grant. For the latter, I would find both success and failure stories and then talk about what other approaches could have rescued the failures, as well as what made the successful stories successful.
To what end? Community building and collaborations.
Once I started focusing in on NGS data at MSU as an assistant professor, I quickly realized that I could spend all my time in collaborations. I learned to say "no" fairly fast :). But all those people still need to do data analysis. What to do? I had no clear answer at MSU, but this was one reason I focused on training.
At Davis, I hope to limit my formal collaborations to research topics, and concentrate on training everybody to deal with their own data; in addition to being the only scalable approach, this is career-building for them. This means not only investing in training, but trying to build a community around the training topics. So I'd like to do regular (weekly? fortnightly?) "help desk" afternoons for the campus, where people can come talk about their issue du jour. Crucially, I would limit this to people that have gone through some amount of training - hopefully both incentivizing people to do the training, and making sure that some minimal level of effort has been applied. The goal would be to move towards a self-sustaining community of people working in bioinformatic data analysis across multiple levels.
Cost and materials.
Since UCD VetMed is generously supporting my salary, I am naively expecting to charge nothing more than a nominal fee -- something that would discourage people from frivolously signing up or canceling. Perhaps lunch money? (This might have to be modified for people from outside of VetMed, or off-campus attendees.)
All materials would continue to be CC0 and openly available, of course. 'cause life's too short to limit the utility of materials.
I'd love to put together a slush fund so that I can invite out speakers to run workshops on topics that I don't know that well (most of 'em).
How about a workshop focused on teaching people how to teach with the materials we put together? (I would expect most of these workshops to be cloud-based.)
p.s. In addition to Tracy, thanks to Keith Bradnam, Aaron Darling, Matt MacManes and Ethan White, for their comments and critiques on a draft.