News from the NIH Data Commons Pilot Phase Consortium

You may recall that about a year and a half ago, I got involved in the NIH Data Commons.

Between then and now, we built a project execution plan, ran Phase 1 for six months, and then in October took a planned work moratorium to plan for the future.

Then, in February, we received word that the NIH Data Commons Pilot Phase Consortium (DCPPC) would not continue in its current form. Here's what we received:

The NIH Office of Data Science Strategy has been asked to lead the next phase of trans-NIH data ecosystem development as described in the NIH Strategic Plan for Data Science. The deliverables from the DCPPC will inform next steps, but we will not pursue a second phase of the DCPPC. New initiatives may emerge from the ODSS and/or from the ICs in response to the Strategic Plan, but they will communicate their plans as they are established.

My award finished at the end of March, and I thought it would be a good time to update y'all (especially since I've been receiving questions!)

What did the NIH Data Commons Pilot Phase Consortium achieve?

I think we achieved quite a lot in our fairly short stint! (And there's a fair amount of public material that was made available as part of it, although it's not well advertised.)

I'm going to focus on things my team helped with, because that's what I know best. There were lots of technical prototypes as well, but those were produced by other teams and are not mine to discuss. (See the list of deliverables and their reviews for more info. Happy to connect you to the authors if you're interested - drop me a line at

First off, here is the top link to the public site that we created for the end of the first Pilot Phase. There are links and documents in there that I continue to find useful, and expect to find useful for many years to come.

I'm particularly happy with how the Use Case Library effort was proceeding. I think we set a good path for collaboratively developing use cases for Phase 2, and even without a Phase 2 I will be making use of this approach and this material for other projects.

The Centillion search engine that my team built was pretty cool!! See the October writeup of it, and also the public GitHub page.

The "On Commonsing" document we wrote up after a workshop on "Data Commonses" is something that I will be coming back to regularly!

People interested in pragmatic standards development might be interested in Why Multiple Stacks are Necessary.

I continue to think the FAIRshake portal is unreasonably cool... check out the projects.

Personally, I learned a lot about interoperability and creating and growing community from this experience, and I think the same is true of most of the other participants. Completely apart from the technical and infrastructure efforts, the coordination and community aspects of this Pilot Phase seem likely to have long-term positive impacts on how many of us deal with these kinds of projects in the future.

So what's next?

I'm not sure!

I think it's fair to say that the problems the NIH Data Commons effort was tackling are not going away (you can see more about these problems in the slides from my 2018 talk at the Dutch Techcentre for Life Sciences). And the NIH and the broader biomedical research community will certainly be working on many things in this area. And I may not be involved, but I'm sure to have opinions. So, stay tuned!

