Mon 08 September 2014
C. Titus Brown
nih adds bd2k
the NIH ADDS meeting, we had
several breakout sessions. Michelle Dunn and I led the training
session. For this breakout, we had the following agenda:
First, build "sticky-note clouds", with one sticky-note cloud with notes
for each of the following topics:
Desired outcomes of biomedical data science training;
Other relevant training initiatives and connections;
Topics that are part of data science;
Axes of issues (e.g. online training, vs in person?)
Second, coalesce and summarize each topic;
Third, refine concepts and issues;
and fourth, come back with:
three recommended actionable items that NIH can do _now_
three things that NIH should leave to others (but be aware of)
This was not the greatest agenda, because we ended up spending only a very
little time at the end talk about specific actionable items, but otherwise
it was a great brainstorming session!
You can read the output document
here; below, I expand and
elaborate on what _I_ think everything meant.
In no particular order,
Increased appreciation by biomedical scientists of the role that data science could play;
More biomedical scientists biologists using data science tools;
More data scientists developing tools for biology;
Increased cross-training of data scientists in biology;
Map(s) and guidebooks to the data science landscape;
Increased communication between biomedical scientists and data scientists;
Making sure that all biomed grad students have a basic understanding of data science;
Increased data science skills for university staff who provide services
Data science topics list
Data science topics list:
Stat & experimental design, modeling, math, prob, stochastic
processing, NLP/text mining, Machine learning, inference, viz for
exploration, regression, classification, Graphical models, data
CS - principles of metadata, metadata format algorithms, databases,
optimization, complexity, programming, scripting, data management,
data wrangling, distributed processing, cloud computing, software
development, scripting, Workflows, tool usage
Presentation of results, computational thinking, statistical thinking
We arranged these topics on two diagrams, one set on the "data
analysis workflow" chart and one set on the "practical to abstract skill"
Patches welcome -- I did these in OmniGraffle, and you can grab the
here and here.
Axes/dichotomies of training:
The underlying question here was, how multidimensional is the biomedical
training subspace? (Answer: D is around 10.)
Clinical to biomedical to basic research - what topics do you target with your training?
Just-in-time vs. background training - do you train for urgent research needs, or for broad background skills?
Formal vs. informal - is the training formal and classroom oriented, or more informal and unstructured?
Beginner vs. advanced - do you target beginner level or advanced level students?
In-person vs. online - do you develop online courses or in-person courses?
Short vs. long - short (two-day, two week) or long (semester, longer) or longer (graduate career) training?
Centralized physically vs. federated/distributed physically - training centers, to which everyone travels? Or do you fly the trainers to the trainees? Or do you train the trainers and then have them train locally?
Andragogy v. pedagogy (Full professor through undergrads) - do your target more senior or more junior people?
Project-based vs. structured - do you assign open projects, or fairly stereotyped ones, or just go with pure didactic teaching?
Individual learning vs team learning - self-explanatory.
I should say that I, at least, have no strong evidence that any one
point on any of these spectra will be a good fit for even a large
minority of potential trainees and goals; we're just trying to lay out
the options here.
Existing training initiatives and data science connections:
(The first two were from the group; the last three were from the meeting
Solicit “crazier” R25s in existing BD2K framework; maybe open up
another submission round before April.
Provide a list of topics and axes from our discussion; fund things
like good software development practice, group-project camp for
biomedical data driven discovery, development of good online data sets
and explicit examples, hackathons, data management and metadata
training, software development training, and communication training.
Solicit proposals for training coordination cores (2-3) to gather,
catalog, and annotate existing resources, cooperate nationally and
internationally, and collaborate on offering training. Also, develop
evaluation and assessment criteria that can be applied across
CTB: fund early stage more broadly; don’t just limit training to
graduate students in biomedical fields. For example, the biology
grad student of today is the biomedical post doc of tomorrow.
Mercè Crosas: Fund internships, hands-on collaborative
contributions in multi-disciplinary Data Science Labs
Leverage T-32 existing by adding req to include data
science. (Addendum: there are already supplements available
to do this)
Right now we're evaluating education grants individually; it would
be good to also have a wide and deep evaluation done
A few other comments --
Training should cover senior and junior people
About 3% of the NIH budget is spent on training
Also, Daniel Mietchen (via online document comment) said: Consider
using open formats. A good example is the Knight Challenge, currently
asking for proposals around the future of libraries:
and that's all, folks!