The NIH #ADDSup meeting on the next five years of data science at the NIH

Here is a links roundup and some scattered thoughts on the recent meeting on "the next five years of data science at the NIH"; this meeting was hosted by Phil Bourne, the new Associate Director for Data Science at the NIH.

Before I go any further, let me make it clear that I do not speak in any way, shape or form for the NIH, or for Phil Bourne, or anyone else other than myself!

Introduction and commentary:

Phil Bourne took on the role of Associate Director for Data Science at the NIH in March 2014, with the mission of "changing the culture of the NIH" with respect to data science. The #ADDSup meeting was convened on September 3rd with about 50 people attending, after the previous night's dinner and sauna party at Phil's house. (One highlight of dinner was the NIH director, Francis Collins, leading a data science singalong/kumbaya session. I kid you not.) The meeting was incredibly diverse, with a range of academic faculty attending along with representatives from funders, tech companies, biotech, and publishers/data infrastructure folk.

It's very hard to summarize the information-dense discussion in any meaningful way, but notes were taken throughout the meeting if you're interested. I should also note that (like a previous meeting on Software Discovery, #bd2kSDW) tweeting was encouraged with the hashtag #ADDSup -- here's the storify.

I ended up co-leading the training breakout session with Michelle Dunn (NIH), and I am writing a blog post on that separately.

Background links:

  1. Data and Informatics Working Group report
  2. Phil Bourne's job application statement.
  3. Phil Bourne's "universities as Big Data".
  4. Phil Bourne's 10 week report.
  5. The final report from the May Software Discovery Workshop (storify here, with video links).
  6. Uduak Thomas' excellent BioInform article (open PDF here) summarizing Phil Bourne's keynote on the NIH Commons, from the 2014 Bioinformatics Open Source Conference. Also see video and slides.

Meeting links:

  1. Cover letter for meeting
  2. Agenda
  3. Vision statement
  4. ADDS guiding principles
  5. Draft training guidelines
  6. NIH Commons

During-meeting coverage:

  1. Running notes
  2. A storify of the Twitter conversation for #ADDSup

Some fragmented thoughts:

Again, all opinions my own :).

It's on us. The NIH can fund things, and mandate things, but cannot _do_ all that much. If you want biomedical data science to advance, figure out what needs to be done and talk to Phil to propose it.

Two overwhelming impressions: the NIH moves very slowly, and the NIH has an awful lot of money.

Nobody in the academic community is interested in computational infrastructure building, unless there's a lot of money involved, in which case we will do it badly (closed source, monolithic architecture, closed data, etc.). Contractors would love to do it for us, but the odds of that working out are poor. There may be exceptions, but it was hard to think of any extramural infrastructure project that had, long term, met community expectations and been sustainable. (Counter-examples in comments, please!)

Very few people in the biomedical community are particularly interested in training, either, although they will feign interest if it supports their graduate students in doing research (see: T-32s).

Because of this, the NIH (and the ADDS more specifically) is left carrying water with a sieve. Data science depends critically on software, data sharing, computational infrastructure, and training.

Open was missing. (Geoffrey Bilder pointed this out to me midway through the morning.) That having been said, most of the meeting attendees clearly "got it", but oops?

Use cases! The NIH ADDS is looking for use cases! What do you want to enable, and what would it look like?

The point was made that the commercial data science sector is way more active and advanced than the academic data science sector. There are lots of links, of course, but are we taking advantage of them? I would also counter that this is IMO not true in the case of biomedical data science, where I am unimpressed with what I have seen commercially so far. But maybe I'm just picky.

