What is open science?

Gabriella Coleman asked me for a short, general introduction to open science for a class, and I couldn't find anything that fit her needs. So I wrote up my own perspective. Feedback welcome!

Some background: Science advances because we share ideas and methods

Scientific progress relies on the sharing of both scientific ideas and scientific methodology - “If I have seen further it is by standing on the shoulders of Giants” (https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants). The natural sciences advance not just when a researcher observes or understands a phenomenon, but also when we develop (and share) a new experimental technique (such as microscopy), a mathematical approach (e.g. calculus), or a new computational framework (such as multiscale modeling of chemical systems). This is most concretely illustrated by the practice of citation - when publishing, we cite the previous ideas we’re building on, the published methods we’re using, and the publicly available materials we relied upon. Science advances because of this sharing of ideas, and scientists are recognized for sharing ideas through citation and reputation.

Despite this, however, there are many barriers that lie in the way of freely sharing ideas and methods - ranging from cultural (e.g. peer review delays before publication) to economic (such as publishing behind a paywall) to methodological (for example, incomplete descriptions of procedures) to systemic (e.g. incentives to hide data and methods). Some of these barriers are well-intentioned - peer review is intended to block incorrect work from being shared - while others, like closed access publishing, have simply evolved with science and are now vestigial.

So, what is open science??

Open science is the philosophical perspective that sharing is good and that barriers to sharing should be lowered as much as possible. The practice of open science is concerned with the details of how to lower or erase the technical, social, and cultural barriers to sharing. This includes not only what I think of as “the big three” components of open science - open access to publications, open publication and dissemination of data, and open development, dissemination, and reuse of source code - but also practices such as social media, open peer review, posting and publishing grants, open lab notebooks, and any other ways of disseminating ideas and methods quickly.

The potential value of open science should be immediately obvious: easier and faster access to ideas, methods, and data should drive science forward faster! But open science can also aid with reproducibility and replication, decrease the effects of economic inequality in the sciences by liberating ideas from subscription paywalls, and provide reusable materials for teaching and training. And indeed, there is some evidence for many of these benefits of open science even in the short term (see How open science helps researchers succeed, McKiernan et al. 2016). This is why many funding agencies and institutions are pushing for more science to be done more openly and made available sooner - because they want to better leverage their investment in scientific progress.

Some examples of open science

Here are a few examples of open science approaches, taken from my own experiences.

Preprints

In biology (and many other sciences), scientists can only publish papers after they undergo one or more rounds of peer review, in which 2-4 other scientists read through the paper and check it for mistakes or overstatements. Only after a journal editor has received the reviews and decided to accept the paper does it “count”. However, in some fields, there are public sites where draft versions of papers can be publicly posted prior to peer review - these “preprint servers” disseminate work in advance of any formal review. The first widely used preprint server, arXiv, was created in 1991 for physics (and soon expanded to mathematics and related fields), and in those fields preprints now often count towards promotion and grant decisions.

The advantages of preprints are that they get the work out there, typically with a citable identifier (DOI), and allow new methods and discoveries to spread quickly. They also typically count for establishing priority - a discovery announced in a preprint is viewed as a discovery, period, unless it is retracted after peer review. The practical disadvantages are few - the appearance of double-publishing was once a concern, but is no longer, as most journals allow authors to preprint their work. In practice, most preprints simply act as an extension of the traditional publishing system (but see this interesting post by Matt Stephens on "pre-review" by Biostatistics). What is viewed as the major disadvantage can also be an advantage - the work is published under the authors' names, so their reputations can be affected both positively and negatively by it. This is what some people tell me is the major drawback of preprints for them: the work is publicly posted without any formal vetting process - a process that might catch major problems that weren't obvious to the authors.

I have been submitting preprints since my first paper in 1993, which was written with a physicist for whom preprinting was the default (Adami and Brown, 1994). Many of my early papers were preprinted because my collaborators were used to it. While in graduate school, I lapsed in preprinting for many years because my field (developmental biology) didn’t “do” preprints. When I started my own lab, I returned to preprinting and submitted all of my senior author papers to preprint servers. Far from suffering any harm to my career, I have found that our ideas and our software have spread more quickly because of it - for example, by the time my first senior author paper was reviewed, another group had already built on top of it based on our preprint (see Pell et al., 2014, which was originally posted to arXiv, and Chikhi and Rizk, 2013).

Social media

There are increasingly many scientists of all career stages on Twitter and writing blogs, and they discuss their own and others’ work openly and even candidly. This has the effect of bringing people who are restricted in their travel into social circles that would otherwise be closed to them, and it accelerates cross-disciplinary pollination of ideas. In bioinformatics, it is typical to hear about new bioinformatics or genomics methods online 6 months to a year before they are even available as a preprint. For those who participate, this results in fast dissemination and evaluation of methods, and it can quickly generate a community consensus around new software.

The downsides of social media are the typical social media downsides: social media is its own club with its own cliques, however welcoming some of those cliques can be; identifiable women and people of color operate at a disadvantage here as elsewhere; cultivating a social media profile can require quite a bit of time that could be spent elsewhere; and online discussions about science can be gossipy, negative, and even unpleasant. Nonetheless there is little doubt that social media can be a useful scientific tool (see Bik and Goldstein, 2013), and can foster networking and connections in ways that don’t rely on physical presence - a major advantage to labs without significant travel funds, parents with small children, etc.

In my case, I tend to default to being open about my work on social media. I regularly write blog posts about my research and talk openly about ideas on Twitter. This has led to many more international connections than I would have had otherwise, as well as a broad community of scientists that I consider personal friends and colleagues. In my field, this has been particularly important: since many of my bioinformatics colleagues are housed in biology or computer science departments rather than in any formal computational biology program, the online world of social media also serves as an excellent way of discovering colleagues and maintaining collegiality in an interdisciplinary world, quite apart from its use for spreading ideas and building reputation.

Posting grants

While reputation is the key currency of advancement in science, good ideas are fodder for this advancement. Ideas are typically written up in the most detail in grant proposals - requests for funding from government agencies or private foundations. The ideas in grant proposals are guarded jealously, with many professors refusing to share grant proposals even within their labs. A few people (myself included) have taken to publicly posting grants when they are submitted, for a variety of reasons (see Ethan White's blog post for details).

In my case, I posted my grants in the hopes of engaging with a broader community to discuss the ideas in my grant proposal; while I haven’t found this engagement, the grants did turn out to be useful for junior faculty who are confused about formatting and tone and are looking for examples of successful (or unsuccessful) grants. More recently, I have found that people are more than happy to skim my grants and tell me about work outside my field or even unpublished work that bears on my proposal. For example, with my most recent proposal, I discovered a number of potential collaborators within 24 hours of posting my draft.

Why not open science?

The open science perspective - "more sharing, more better" - is slowly spreading, but many challenges are slowing its adoption.

One challenge of open science is that sharing takes effort, while the immediate benefits of that sharing largely go to people other than the producer of the work being shared. Open data is a perfect example of this: it takes time and effort to clean up and publish data, and the primary benefit of doing so will be realized by other people. The same is true of software. Another challenge is that the positive consequences of sharing, such as serendipitous discoveries and collaborations, cannot be accurately evaluated or pitched to others in the short term - it takes years, and sometimes decades, to make progress on scientific problems, and the benefits of sharing do not necessarily appear on demand.
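To make that effort concrete, here is a minimal sketch (in Python, assuming the pandas library) of the sort of cleanup involved in preparing a small tabular data set for sharing. The file names, column names, and units are hypothetical, chosen only for illustration:

    # Hypothetical example: tidying a small tabular data set before sharing it.
    # File names, column names, and units are made up for illustration.
    import pandas as pd

    # Load the raw measurements exported from an instrument or spreadsheet.
    raw = pd.read_csv("raw_measurements.csv")

    # Basic cleanup: drop completely empty rows and standardize column names.
    clean = (
        raw.dropna(how="all")
           .rename(columns={"Temp (C)": "temperature_c", "Sample ID": "sample_id"})
    )

    # Write out the cleaned table plus a short README describing provenance,
    # so that someone else (or future you) can actually reuse the data.
    clean.to_csv("cleaned_measurements.csv", index=False)
    with open("README_data.txt", "w") as f:
        f.write(
            "cleaned_measurements.csv: tidied version of raw_measurements.csv\n"
            "Columns: sample_id, temperature_c (degrees Celsius)\n"
            "Processing: empty rows dropped, columns renamed; no values changed.\n"
        )

Even this toy example hints at the real cost: documenting provenance, units, and processing steps is work that mostly benefits the downstream user.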

Another block to open science is that many of the mechanisms of sharing are themselves somewhat new, and are rejected out of an unthinking conservatism of practice. In particular, most senior scientists entered science at a time when the Internet was young and the basic modalities and culture of communicating and sharing over the Internet hadn’t yet been developed. Since the pre-Internet practices work for them, they see no reason to change; absent a specific reason to adopt new practices, they are unlikely to invest the time and energy to do so. This can be seen in the rapid adoption of e-mail and web sites for peer review (which made old practices faster and cheaper) in comparison to the slow and incomplete adoption of social media for communicating about science (which is seen by many scientists as an additional burden on their time, energy, and focus).

Metrics for evaluating products that can be shared are also underdeveloped. For example, it is often hard to track or summarize the contributions that a piece of software or a data set makes to advancing a field, because until recently it was hard to cite software and data at all. Moreover, there is no good technical way to track software that supports other software, or data sets that are combined in a larger study or meta-study, so many of the indirect contributions of software and data may go underreported.
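As an illustration of the kind of machine-readable metadata that makes software easier to cite and track, here is a minimal sketch that writes out a simple citation record for a hypothetical software release. The project name, authors, version, and DOI are all placeholders, and the field names are illustrative rather than any formal standard:

    # Hypothetical example: emitting simple machine-readable citation metadata
    # for a software release. All names, versions, and identifiers are placeholders.
    import json

    citation = {
        "title": "example-analysis-toolkit",              # hypothetical project name
        "authors": ["A. Researcher", "B. Collaborator"],  # hypothetical authors
        "version": "1.0.0",
        "release_date": "2016-01-01",
        "identifier": "10.5281/zenodo.0000000",           # placeholder DOI
        "repository": "https://example.org/example-analysis-toolkit",
        "license": "BSD-3-Clause",
    }

    # Writing this alongside the code gives users something concrete to cite.
    with open("citation.json", "w") as f:
        json.dump(citation, f, indent=2)

Having a citable record helps with direct citation, but it still doesn't solve the harder problem of tracking software that supports other software, or data sets folded into larger studies.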

Intellectual property law also causes problems. For example, in the US, the Bayh-Dole Act stands in the way of sharing ideas early in their development. Bayh-Dole was intended to spur innovation by granting universities the intellectual property rights to their research discoveries and encouraging them to develop those discoveries commercially, but I believe it has also encouraged people to keep their ideas secret until they know whether they are valuable. In practice, however, most academic research is not directly useful, and it costs a significant amount of money to productize, so most ideas are never developed commercially. The net effect is simply to discourage early sharing of ideas.

Finally, there are also commercial entities that profit exorbitantly from restricting access to publications. Several academic publishers, including Elsevier and MacMillan, have profit margins of 30-40%! (Here, see Mike Taylor on The obscene profits of commercial scholarly publishers.) One particularly outrageous and common practice is to charge a single lump sum each year for access to a large number of journals, and to provide access to those journals' archives only through the current subscription - in effect making scientists pay annually for access to their own archival literature. These corporations are invested in the current system and have worked politically to block government efforts towards encouraging open science.

Oddly, non-profit scientific societies have also lobbied to restrict access to the scientific literature; here, their argument appears to be that journal subscription fees support the work done by the societies. Of note, this appears to be one of the reasons why an early proposal for an open access system didn't realize its full promise. For more on this, see Kling et al., 2001, who point out that it is problematic to assume that scientific societies accurately represent the interests and goals of their constituents, or of science itself.

The overall effect of the subscription gateways resulting from closed access is simply to make it more difficult for scientists to access the literature; in the last year or so, this has fueled the rise of Sci-Hub, an illegal open archive of academic papers. The archive is heavily used even by academics with subscriptions, because it is easier to search and download from Sci-Hub than to use publishers' Web sites (see Justin Peters' excellent breakdown in Slate).

A vision for open science

A great irony of science is that a wildly successful model of sharing and innovation - the free and open source software (FOSS) development community - emerged from academic roots, but has largely failed to affect academic practice in return. The FOSS community is an exemplar of what science could be: highly reproducible, very collaborative, and completely open. However, science has gone in a different direction. (These ideas are explored in depth in Millman and Perez, 2014.)

It is easy and (I think) correct to argue that science has been corrupted by the reputation game (see e.g. Chris Chambers' blog post on 'researchers seeking to command petty empires and prestigious careers') and that people are often more concerned about jobs and reputation than about making progress on hard problems. The decline in public funding for science, the decrease in tenured positions (here, see Alice Dreger's article in Aeon), and the increasing corporatization of research all stand in the way of more open and collaborative science. And it can easily be argued that they stand squarely in the way of faster scientific progress.

I remain hopeful, however, because of generational change. The Internet and the rise of free content have made younger generations more aware of the value of frictionless sharing and collaboration. Moreover, as data sets become larger and data becomes cheaper to generate, the value of sharing data and methods becomes much more obvious. Young scientists seem much more open to casual sharing and collaboration than older scientists; it’s the job of senior scientists who believe in accelerating science to see that these younger scientists are rewarded, not punished, for this openness.

