Sourmash and branchwater licensing: thoughts on extractive engagement with projects

I am helping maintain some petabase-scale genomic search infrastructure as part of the sourmash and branchwater projects. One of the questions that's frequently in the back of my mind is how to incentivize commons-style engagement rather than extractive engagement, and a key tool for this purpose is licensing.

Sourmash is BSD-licensed, which, in essence, means that anyone can do whatever they want with the code - including incorporating it unchanged into a commercial closed-source product, rebranding it as a new product, and/or changing it in incompatible ways (and then rebranding it as a new and better product). This is typically something that companies will do, although it also happens with open source forks. (See: Elasticsearch to OpenSearch; and Matrix).

Branchwater, our internal code-name for the collection of sourmash-based functionality that enables petabase-scale search, is licensed under AGPL. This means that anyone can use it however they want, as long as they release any modifications they make to the source code. In particular, this also applies to people providing a service based on the branchwater code:

Let’s say you create a software program. Another developer takes and modifies it, and then provides access to that modification to paying customers through a software-as-a-service model. Under the GPL v3, that modification would essentially become proprietary because it wasn’t technically distributed. Under AGPL, however, that developer would need to make their modified source code available for download. (link)

IIRC, there are a couple of reasons that Dr. Luiz Irber (the initial author of the branchwater code, and the originator of most of the branchwater code and supporting infrastructure) chose AGPL. One of the main ones (again, IIRC) is to discourage incompatible forks of the source code. But it also discourages many kinds of extractive behavior: a company could not, for example, take this code, modify it in sekret ways, and provide services based upon that sekrecy, without providing the modified code openly under the AGPL license.

You could argue that the AGPL license decreases certain kinds of uptake. Perhaps so, and I chose the BSD license for sourmash (with Luiz's OK, albeit in a situation where I was his supervisor...) specifically to encourage uptake, reuse, modification, and experimentation. I don't know how to evaluate the success of this choice, really, other than to say that I still don't see a blindingly obvious downside to it (as of Jan 5, 2024 :).

At the end of the day, my thoughts trend towards seeing the value in sourmash as less algorithmic innovation and more infrastructure innovation. We are maintaining and sustaining a very functional and useful piece of software, with good documentation and an ever-expanding range of use cases. And it remains very useful to me and my lab, specifically. Not only do I not care if companies extract value from it - there are many ways to skin this particular cat - but I am happy and excited that my labor as an academic is actually useful to someone else.

On the flip side, branchwater is both more niche and more difficult. There aren't many ways to do petabase-scale search, and there is a lot more infrastructure maintenance involved. I would be sad to see someone take our (collective) investment in this functionality and build upon it without returning something to the community of developers.

I'm not sure what and where the dividing line between these two situations is for me. But I think sketching out the current line is a good start :).


Comments !