The talk I didn't give at Caltech (Paper of the Future)

Note: This is the fourth post in a mini-series of blog posts inspired by the workshop Envisioning the Scientific Paper of the Future.

This is an outline of the talk I didn't give at Caltech, because I decided that Victoria Stodden and Yolanda Gil were going to cover most of it and I would rather talk about a random collection of things that they might not talk about. (I think I was 7 for 10 on that. ;)

This is in outline-y form, but I think it's fairly understandable. Ask questions in the comments if not!

What will the paper of the future look like?

A few assertions about the scientific paper of the future:

  • The paper of the future will be open - open access, open data, and open source.
  • The paper of the future will be highly repeatable.
  • The paper of the future will be linked.
  • The paper of the future will not depend on expensive infrastructure.
  • The paper of the future will be commonplace.
  • The paper of the future will be archivable (or will it? Read on.)

What's our experience with the paper of the future been?

My lab (and many, many others) has been doing things like:

  • Automating the entire analysis from raw data to conclusion.
  • Publishing data narratives and notebooks.
  • Using version control for the paper, the data notebooks, and the source code.
  • Anointing data sets with DOIs.
  • Posting virtual environments & execution specifications for papers.

We've been doing parts of this for many years, and while we're not always that systematic about certain parts, I can say that everything works fairly smoothly. The biggest issues we have often seem to be about the small details, such as choice of workflow engine, whether we're using AWS or an HPC cluster as our "reference location" to run stuff, etc.
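To make "automating the entire analysis from raw data to conclusion" a bit more concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything taken from a real pipeline: the function names (load_raw, analyze, run_pipeline), the toy input, and the choice to record a SHA-256 checksum of the raw data are all hypothetical. A real analysis would more likely live in a workflow engine, but the shape is the same: one entry point, no manual steps.

```python
import hashlib
import json
import statistics


def load_raw(text):
    """Step 1: parse raw measurements, one number per line (blank lines skipped)."""
    return [float(line) for line in text.splitlines() if line.strip()]


def analyze(values):
    """Step 2: reduce the data to the summary a paper might report."""
    return {"n": len(values), "mean": statistics.mean(values)}


def run_pipeline(raw_text):
    """Run every step from raw data to conclusion, with no manual intervention."""
    summary = analyze(load_raw(raw_text))
    # A checksum of the raw input lets a reader confirm they started
    # from the same data before comparing results.
    summary["raw_sha256"] = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
    return summary


if __name__ == "__main__":
    # Hypothetical stand-in for the contents of a real raw data file.
    example_raw = "1.0\n2.0\n3.0\n"
    print(json.dumps(run_pipeline(example_raw), sort_keys=True, indent=2))
```

Because the whole thing is a single function call on the raw input, rerunning it on the same data should give the same summary and the same checksum, which is roughly what "repeatable" means in practice here.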

From this experience, I see two problems:

The two big problems I see

  • Adoption!

    We need community use & experience & training; we also need funder and journal buy-in.

    The training aspect is what Software Carpentry and Data Carpentry focus on, and it's one of the reasons I'm involved with them.

  • Archivability!

    Our software stack is anything but robust, static, or archivable.

    This is a huge problem that I don't think is accorded enough attention.

This last issue, archivability, is both somewhat technical and important - so I decided to move that to a new blog post, "How I learned to stop worrying and love the coming archivability crisis in scientific software".

Concluding thoughts

In which I summarize the above :)

