Recently I’ve been reading a lot about the general area of digital commons - which includes open online resources, open source projects, and (presumably) whatever a data commons is. Most broadly defined, these are resources and projects that are open to contribution, and produce digital public goods. I’m particularly interested in the question of sustainability, which led me to Ostrom’s design principles for common pool resources, and the idea that “engaged effort” or “attention” is the common pool resource to be managed by a community for sustainability.
I’ve come across several different ideas about what the common pool resource is in digital commonses, and I wanted to explore them a bit. So, here goes.
Data cannot be the basis of a data commons: a response to the New Zealanders :)
The Data Commons Blueprint is a brilliant writeup of what a Data Commons might be, and I share most of their ideas about community governance and the importance of building communities of practice around data sharing and data analysis. But I have a pretty basic disagreement with one pillar of their argument. The authors define a data commons as being significantly founded on the availability of data:
Unless all parties feel good about sharing their data, they will be unlikely to do so. … A model where data is fenced off as private property reinforces silos of competing interests rather than data integration of sharing. (p11)
How - and why - access to and reuse of data is controlled goes a long way to explaining why creation of a commons faces real challenges. (p20)
This view of data is enshrined in their first Principle of a Data Commons: “Data is a common-pool resource.” (p31).
But there is a fundamental disconnect here with common-pool resource (CPR) theory. A CPR must be both non-excludable and rivalrous (see this matrix). Non-excludability is “easy” in the context of a Data Commons: it just means that data access is hard to restrict in practice, which is clearly true. But rivalry is harder to achieve with data, because, fundamentally, data doesn’t get “consumed” by use - and, in fact, opening up data access is an act with significant positive externalities, in that data use and reuse generate many indirect benefits for people other than the originator of the data. That’s actually kind of the point of a data commons!
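To make the matrix concrete, here is a minimal sketch of the standard two-by-two classification of goods by excludability and rivalry. The `Good` class, the `classify` function, and the example entries are hypothetical names made up for illustration; in particular, treating "maintainer attention" as non-excludable and rivalrous is an assumption that follows the framing of this post, not an established result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Good:
    """A resource described by the two axes of the goods matrix."""
    name: str
    excludable: bool  # can access be restricted in practice?
    rivalrous: bool   # does one person's use leave less for others?


def classify(good: Good) -> str:
    """Return the cell of the excludable/rivalrous matrix a good falls into."""
    if good.excludable and good.rivalrous:
        return "private good"
    if good.excludable:
        return "club good"
    if good.rivalrous:
        return "common-pool resource"
    return "public good"


# Illustrative entries only; the flags are assumptions, not measurements.
examples = [
    Good("open dataset", excludable=False, rivalrous=False),
    Good("access-controlled dataset", excludable=True, rivalrous=False),
    Good("maintainer attention", excludable=False, rivalrous=True),
]

for good in examples:
    print(f"{good.name}: {classify(good)}")
```

Under these assumptions, no setting of the excludability flag can turn data into a common-pool resource; only the rivalry flag does that, and data doesn't have it.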
An additional challenge for me, if not for the general idea of a data commons, is that the NIH Data Commons cannot open up data access to all because we are including human subjects data, e.g. the TOPMed and GTEx studies and eventually many more. Access to much of this data will never be completely open due to IRB requirements. This legal restriction would suggest that the NIH Data Commons cannot ever be a Data Commons, which seems problematic to me, at least, as someone engaged in trying to build it.
Interestingly, this intersects with another disagreement I have - this time, with Albert Wenger in World After Capital, a book that Nadia Eghbal pointed me towards. In this (fascinating and very readable) book, Wenger argues that eventually all information will be open, and has a whole section on “Getting over Privacy and Confidentiality.” While he makes many interesting arguments, I think he’s arguing past a fundamental mismatch with human psychology and how humans view risk, and that his vision would require a pretty radical shift in how human brains work. More to the point, data discoverability and interoperability seem at least as important as data access - it doesn’t matter if you can access the data if you can’t find it or work with it, and if you can find the data and believe you can work with it, you will be more motivated to seek access.
I think we can work our way past both of these disagreements if we state that data openness itself is not the key to a data commons, but rather a “good” that is managed and curated by a data commons for a purpose, e.g. knowledge production from that data. If the data is not open, it can be viewed as a club good; if it’s open, it’s a public good. But either way it requires management and curation (and presumably various kinds of infrastructure to help with knowledge production).
This brings us back around to the concept of engaged effort or attention as the resource to be managed by a Data Commons, in support of sustainability.
Source code cannot be the basis of an open source commons
I just finished my first pass reading of an argument by Schweik in Ostrom and Hess, 2007 that defines source code as the common pool resource being managed by an open source project:
In FOSS commons, groups of people act collectively to produce a public good (the software), rather than overappropriate the resource. (p279)
I also disagree fundamentally with this. There are some interesting arguments about copyleft and the GPL and management of the CPR, but at least on first reading, they fall apart when you realize that (a) a lot of FOSS uses non-copyleft licenses, and (b) a public good cannot be a common-pool resource anyway, because source code itself is not rivalrous.
This intersects with a point that Nadia Eghbal and others have made to me - that successful open source projects are at least partly about managing the maintenance effort involved in software production.
Knowledge (and publications) are probably not the basis of a scholarly commons.
Interestingly, this all connects to another discussion, this time about open access, and scientific publication more generally. Recently, William Gunn made the argument that publishers contribute quite a bit to science by “assembl[ing] thousands of people to devote their lives to producing and distributing a corpus of high quality [ … ] knowledge. … this is a valuable thing & it’s worth the money spent.” Viewed through the above lens of a commons and its sustainability, what Gunn is saying is that the effort Elsevier and others are putting into sustaining a scholarly commons is valuable.
I actually agree with that! I have at least two points of major disagreement, though. First, it is clear that Elsevier in particular is making a very handsome profit off of the scholarly commons, and that seems like an unwise use of our limited funds for science. I certainly don’t think this should be a for-profit activity, but I do see value in it and agree that someone needs to be paid to do it, somehow; and there are many models for that.
My second (more fundamental) disagreement with Gunn is that Elsevier and others’ business models depend on restricting access to information, and this impedes research. More specifically, all closed-access publishers' business models rely on “successfully monetizing inconvenience” (this wonderful phrasing is from Justin Peters at Slate!). Rephrasing this in the terms above, the academic-publishing complex is producing club goods, and publishers are profiting not only from producing these goods (see previous paragraph) but also from charging for access to them. This is bullshit, period - the marginal cost of distributing digital content on this scale is effectively zero, and closed access publishing is an absolutely absurd case of rent seeking. We know this because there is a perfect counterexample: Sci-Hub distributes scientific publications to all and sundry for free.
Circling back around once again: if a scholarly commons is partly about managing effort in pursuit of sustainability, then universities and faculty positions can be viewed as one way to pay for the effort involved, while publication fees can be viewed as another. This is quite distinct from the question of whether the knowledge produced (publications) should be club goods or public goods; whatever they should be, it’s impossible to argue that publications are a rivalrous commodity, and hence they cannot be considered as a common pool resource.
Commons should be about managing common-pool resources.
Common-pool resources are defined as resources that are non-excludable and rivalrous.
Data is non-rivalrous. Source code is non-rivalrous. Publications are non-rivalrous. Bits, in general, are non-rivalrous. So bits can’t be the common pool resource being managed by a commons.
This is fine and leads in some interesting directions!
p.s. Thanks, as always, to Nadia Eghbal for her insight! And thanks to Josh Greenberg, once again, for pointing me at club goods!