tl;dr? We ran a workshop on binder. It was fun!
What is binder?
Imagine... that you are visiting the data repository for a preprint
you are reviewing, and with the click of a button you are brought to a
fully configured RStudio Server containing that data.
Imagine... you are running a workshop, and you want to introduce
everyone in the workshop to a machine-learning approach. You give
them all the same URL, and within seconds everyone in the room is
looking at their own live environment, copied from your blueprint but
individually modifiable and exportable.
Imagine... your lab has a collection of standard data analysis
protocols in Jupyter Notebooks on your GitHub site, and anyone in your
lab can, with a single click, bring them to life and run them on a
new data set.
Binder is a concept and technology that makes all of the above, and more,
tantalizingly close to everyday realization! The techie version is this:
- upon Web request, binder grabs a GitHub repository, inspects it, and builds a custom Docker image based on a variety of configuration files;
- then, binder spins up a Docker container and redirects the Web browser to that container;
- at some point, binder detects a lack of activity and shuts down the container.
All of this is (currently) done without authentication or payment of any
kind, which makes it a truly zero configuration/single-click experience for
the user.
Just as important, the binder infrastructure is meant to be widely distributed,
reusable, and hackable open source tech that supports multiple deployments and
use cases.
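The repository-to-image step of this pipeline can be tried locally with the repo2docker tool (described further below). A minimal sketch, assuming Docker is installed and running; the example repository URL here is illustrative:

```shell
# install the repo2docker command-line tool
pip install jupyter-repo2docker

# build a Docker image from a repository and launch a Jupyter server
# in a container based on it; repo2docker inspects the repository for
# configuration files (requirements.txt, environment.yml, Dockerfile, ...)
repo2docker https://github.com/binder-examples/requirements
```

This is the same build logic the hosted binder service runs on your behalf when you click a launch link.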
In 2016, I wrote a
proposal to fund a workshop on binder
to the Sloan Foundation and it was funded!! We finally ran the workshop
last week, with the following organizing committee:
Why a workshop?
Many people, including myself, see massive potential in binder, but it
is still young. The workshop was intended to explore possible
technical directions for binder's evolution, build community around
the binder ecosystem, and explore issues of sustainability.
One particular item that came up early on in the workshop was that
there are many possible integration points for binder into current
data and compute infrastructure providers. That's great! But, in the
long term, we also need to plan for the current set of endeavors
failing or evolving, so we should be building a community around the
core binder concepts and developing de facto standards and practice.
This will allow us to evolve with those endeavors as well as find new ones.
So that's why we ran a workshop!
Who came to the workshop?
The workshop attendees were a collection of scientists, techies,
librarians, and data people. For this first workshop I did my best to
reach out to people from a variety of communities - researchers from a
range of disciplines, librarians, trainers, data scientists,
programmers, HPC admins, and research infrastructure specialists.
We didn't advertise very widely, partly because of a last-minute
time crunch, and also because too many people would have been a
problem for the space we had.
As we figure out more of a framework and sales pitch for binder, I expect the
set of possible attendees to expand. Still, for hackfest-like workshops, I'm
a big fan of small diverse groups of people in a friendly environment.
What is the current state of binder?
The original mybinder.org Web site was created and supported by
the Freeman Lab, but maintenance on the site suffered when
Jeremy Freeman moved to the Chan Zuckerberg Initiative and became even
busier than before.
The Jupyter folk picked up the binder concept and reimplemented the
Web site with somewhat enhanced functionality, building the new
BinderHub software in Python around JupyterHub and splitting the
repository-to-docker code out into
repo2docker. This is now running
on a day-to-day basis on a beta site.
A rough breakdown, and links to documentation, follow:
JupyterHub - JupyterHub manages multiple instances of the single-user Jupyter notebook server. JupyterHub can be used to serve notebooks to a class of students, a corporate data science group, or a scientific research group.
Zero-to-JupyterHub - Zero to JupyterHub with Kubernetes is a tutorial to help install and manage JupyterHub.
BinderHub - BinderHub builds "binders" containing data+code from GitHub repos and then serves the binders in a custom computing environment. beta.mybinder.org is a public BinderHub.
repo2docker - repo2docker builds, runs, and pushes Docker images from source code repositories.
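To illustrate how these pieces fit together: a BinderHub launch is just a URL that encodes the repository to build. On the beta site the pattern looks roughly like this (the user/repo/branch values are placeholders):

```
https://beta.mybinder.org/v2/gh/<github-user>/<repo>/<branch>
```

Visiting such a URL causes BinderHub to invoke repo2docker to build the image (if it isn't already cached) and JupyterHub to serve the resulting container to your browser.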
Highlights of the binder workshop!
What did we do? We ran things as an unconference, and had a lot of
discussions and brainstorming around use cases and the like, with some
super cool results. The notes from those are linked below!
A few highlights of the meeting deserve, well, highlighting --
Amazingly, we got to the point where binder ran an RStudio Server
instance, started from a Jupyter console!! Some tweets of this made
the rounds, but it may take a few more weeks for this to make it
into production. (This was based on Ryan Lovett's earlier work,
which was then hacked on by Carl Boettiger, Yuvi Panda and Aaron
Culich at the workshop. I have it on good authority that Adelaide Rhodes
was asking lots of questions by way of encouragement ;).
Everyone who attended the workshop got to the point where we had our
own BinderHub instance on Google!! (We used these instructions.)
Yuvi Panda gave us a rundown on the data8 /
"Foundations of Data Science" course at UC Berkeley, which uses
JupyterHub to host several thousand users, with up to 700 concurrent users.
We came up with lots of use cases - see the (somewhat duplicative) set of notes, here.
Other stuff we did at the workshop
(All the notes are on GitHub, here)
Here is a fairly comprehensive list of the other activities at the workshop --
Issues that we only barely touched on:
- "I have a read-only large dataset I want to provide access to for untrusted users, who can do whatever they want but in a safe way." What are good practices for this situation? How do we provide good access without downloading the whole thing?
- It would be nice to initiate and control (?) Common Workflow Language workflows from binder - see the nice Twitter conversation with Michael Crusoe.
- How do we do continuous integration on notebooks??
- We need some sort of introspection and badging framework for how reproducible a notebook is likely to be - what are best practices here? Is it "just" a matter of specifying software versions etc. and bundling data, or ...??
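On the continuous integration question above, one simple approach (a sketch of one possibility, not a settled best practice) is to have a CI service execute every notebook from a clean checkout and fail the build if any cell errors out, e.g. using nbconvert from a Travis CI config; the file paths here are hypothetical:

```yaml
# .travis.yml (sketch): execute notebooks, fail on any cell error
language: python
install:
  - pip install -r requirements.txt
  - pip install jupyter nbconvert
script:
  - jupyter nbconvert --to notebook --execute
      --ExecutePreprocessor.timeout=600 notebooks/*.ipynb
```

This only checks that notebooks run end-to-end; checking that outputs are *correct* (or reproducible) is the harder, open question.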
Far-reaching issues and questions --
- It's likely that the future of binder involves many people running many different BinderHub instances. What kind of clever things can we do with federation? Would it be possible for people to run a binder backend "close" to their data and then allow other BinderHubs to connect to that, for example?
- There are many issues of publishing workflows, provenance, and legality - notes here.
- It would be super cool if realtime collaboration were supported by JupyterHub or BinderHub... it's coming, I hear. Soon, one hopes!
Topics we left almost completely untouched:
I'm hoping to find money or time to run at least two more hackfests or a
conference -- perhaps we can run one in Europe, too.
It would be good to run something with a focus on developing training materials
(and/or exemplary notebooks) - see Use Cases, above.
I'm hoping to find support to do some demo integrations with scholarly
infrastructure, as in the Imagine... section, above.
If (if) we ran a conference, I could see having some of the following:
- A hackfest building notebooks
- A panel on deployment
- A keynote on the roadmap for binder and JupyterHub
- Some sort of community fest
If you're interested in any of this, please indicate your interest in future workshops!!
Where to get started with binder
There are lots of example repositories, here:
you can click "Launch Binder" in any of the READMEs to see examples!
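If you'd like a launch button on your own repository, the usual pattern is a markdown badge in the README; this sketch assumes the beta-era badge image URL, so check the BinderHub docs for the current one, and the user/repo/branch values are placeholders:

```
[![Binder](http://mybinder.org/badge.svg)](https://beta.mybinder.org/v2/gh/<github-user>/<repo>/<branch>)
```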
There is a gitter chat channel that is pretty active and good for support:
And, finally, there is a google groups forum,
Some other links worth mentioning:
aaaand some notes on Singularity:

One way to convert Docker images to Singularity images:

```
docker run -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/image:/output \
    --privileged -t --rm singularityware/docker2singularity ubuntu:14.04
```

Another way is to simply run Docker containers directly in Singularity:

```
singularity exec docker://my/container <runcommand>
```
I have no particular conclusion other than we'll have to do this again!