It's not often that someone perfectly and thoroughly summarizes the challenges that data science poses for academic institutions, but that's just what Fernando Perez did in this blog post. Just... just go read it, trust me :)
The new data-driven discovery centers being funded by Sloan & Moore are just the latest in a series of announcements that bear on the general topic of open science/data/source in academia, a subject of intense personal and professional interest to me. The other initiative that I'm keeping a weather eye on is the Mozilla Science Lab, a Sloan-funded effort for Mozilla to act as an honest broker in the area of open science. But there are tons of efforts, including Software Carpentry, rOpenSci, figshare, etc. (These are the ones I interact with. Additions welcome in comments below!)
In the face of all this activity, what can individual scientists do? We're not all at Berkeley, NYU, or UW Seattle (the locations of the data science centers), so we can't participate in these initiatives; and, in fact, centralizing all data science activity at these institutions would defeat the grander purpose of having these centers as seeds for the next generation. Not all of us have the ear of important people at various funding agencies or at our local institutions; and given the time demands of our daily jobs, it's not all that clear that we should spend a lot of time proselytizing anyway. So what should the little people like us be doing?
My personal belief is that we now have enough organizers, thought leaders, and policy experts working in this area. Awareness is high. What is missing is the work to connect the broad, general policy statements (open is good! reproducibility is important! good software is necessary!) to field- and sub-field-specific practice. How do we incentivize and enable molecular biologists, bioinformaticians, and ecologists, not to mention physicists, astronomers, chemists, psychologists, etc. to integrate "good enough" practices into their research? This needs to be done through a combination of top-down and bottom-up approaches, and I think we're lacking on the bottom-up side.
Here are a few suggestions for bottom-up work.
1. Do, don't just talk.
Do good, reproducible research; progressively adopt better practices, such as version control and testing, as you can; value reproducibility and good practice in your own and others' publications; and ask appropriate questions in reviews.
2. Talk, don't just do.
Share your motives. Before embarking on a new project or framework or tool for open science or data availability, take some time to articulate the reasons why you need to do something new. Your project/framework/tool may succeed or fail, but a clear discussion of your vision for it will last independently. Moreover, it may inspire others, connect you to a larger community, and lead to creative new suggestions that can help you accomplish your goals. (You can always ignore what other people think, of course. But there's enduring value in explaining why you're doing it. ;)
...blogs are good for this, as are mailing lists. I'd be happy to host guest blog posts for this purpose.
3. Learn, educate, and train.
Invest in self-improvement; teach and train others; and embark on training. Software Carpentry exists to help us do this, and welcomes your participation; moreover, they will help train you to train others (the next round of instructor training starts in mid-March 2014).
Specifically, invite Software Carpentry to come run a workshop; connect to your local community; run informal workshops to find out what people are interested in and concerned about; and provide training and educational opportunities to help "level-up" your local community.