Gabriella Coleman asked me for a short, general introduction to open
science for a class, and I couldn't find anything that fit her needs.
So I wrote up my own perspective. Feedback welcome!
Some background: Science advances because we share ideas and methods
Scientific progress relies on the sharing of both scientific ideas and
scientific methodology - “If I have seen further it is by standing on
the shoulders of Giants”
(https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants). The
natural sciences advance not just when a researcher observes or
understands a phenomenon, but also when we develop (and share) a new
experimental technique (such as microscopy), a mathematical approach
(e.g. calculus), or a new computational framework (such as multiscale
modeling of chemical systems). This is most concretely illustrated by
the practice of citation - when publishing, we cite the previous ideas
we’re building on, the published methods we’re using, and the publicly
available materials we relied upon. Science advances because of
this sharing of ideas, and scientists are recognized for sharing ideas
through citation and reputation.
Despite this, there are many barriers that lie in the way of
freely sharing ideas and methods - ranging from cultural (e.g. peer
review delays before publication) to economic (such as publishing
behind a paywall) to methodological (for example, incomplete
descriptions of procedures) to systemic (e.g. incentives to hide data
and methods). Some of these barriers are well intentioned - peer
review is intended to block incorrect work from being shared - while
others, like closed access publishing, have simply evolved with
science and are now vestigial.
So, what is open science?
Open science is the philosophical perspective that sharing is good
and that barriers to sharing should be lowered as much as possible.
The practice of open science is concerned with the details of
how to lower or erase the technical, social, and cultural barriers to
sharing. This includes not only what I think of as “the big three”
components of open science -- open access to publications, open
publication and dissemination of data, and open development,
dissemination, and reuse of source code -- but also practices such as
social media, open peer review, posting and publishing grants, open
lab notebooks, and any other methods of disseminating ideas and
methods quickly.
The potential value of open science should be immediately obvious:
easier and faster access to ideas, methods, and data should drive
science forward faster! But open science can also aid with
reproducibility and replication, decrease the effects of economic
inequality in the sciences by liberating ideas from subscription
paywalls, and provide reusable materials for teaching and training.
And indeed, there is some evidence for many of these benefits of open
science even in the short term (see How open science helps
researchers succeed, McKiernan et al. 2016). This is why many
funding agencies and institutions are pushing for more science to be
done more openly and made available sooner - because they want to
better leverage their investment in scientific progress.
Some examples of open science
Here are a few examples of open science approaches, taken from my own
experiences.
Preprints
In biology (and many other sciences), scientists can only publish
papers after they undergo one or more rounds of peer review, in which
2-4 other scientists read through the paper and check it for mistakes
or overstatements. Only after a journal editor has received the
reviews and decided to accept the paper does it “count”. However, in
some fields, there are public sites where draft versions of papers can
be publicly posted prior to peer review - these “preprint servers”
work to disseminate work in advance of any formal review. The first
widely used preprint server, arXiv, was created in 1991 for math
and physics, and in those fields preprints now often count towards
promotion and grant decisions.
The advantages of preprints are that they get the work out there,
typically with a citable identifier (DOI), and allow new methods and
discoveries to spread quickly. They also typically count for
establishing priority - a discovery announced in a preprint is viewed
as a discovery, period, unless it is retracted after peer review. The
practical disadvantages are few - the appearance of double-publishing
was a concern, but is no longer, as most journals allow authors to
preprint their work. In practice, most preprints just act as an
extension of the traditional publishing system (but see this
interesting post by Matt Stephens on "pre-review" by Biostatistics).
What is viewed as the major disadvantage can also be an advantage -
the work is published with the names of the authors, so the reputation
of the authors can be affected both positively and negatively by their
work. This is what some people tell me is the major drawback to
preprints for them - that the work is publicly posted without any
formal vetting process, which could catch major problems with the work
that weren't obvious to the authors.
I have been submitting preprints since my first paper in 1993, which
was written with a physicist for whom preprinting was the default
(Adami and Brown, 1994).
Many of my early papers were preprinted because my collaborators
were used to it. While in graduate school, I lapsed in preprinting for many
years because my field (developmental biology) didn’t “do”
preprints. When I started my own lab, I returned to preprinting, and
submitted all of my senior author papers to preprint servers. Far
from suffering any harm to my career, I have found that our ideas and
our software have spread more quickly because of it - for example, by
the time my first senior author paper was reviewed, another group had
already built on top of it based on our preprint (see Pell et al.,
2014, which was originally posted at arXiv, and Chikhi and Rizk 2013).
Posting grants
While reputation is the key currency of advancement in science, good
ideas are fodder for this advancement. Ideas are typically written up
in the most detail in grant proposals - requests for funding from
government agencies or private foundations. The ideas in grant
proposals are guarded jealously, with many professors refusing to
share grant proposals even within their labs. A few people (myself
included) have taken to publicly posting grants when they are
submitted, for a variety of reasons (see Ethan White's blog post
for details).
In my case, I posted my grants in the hopes of engaging with a broader
community to discuss the ideas in my grant proposal; while I haven’t
found this engagement, the grants did turn out to be useful for junior
faculty who are confused about formatting and tone and are looking for
examples of successful (or unsuccessful) grants. More recently, I
have found that people are more than happy to skim my grants and tell
me about work outside my field or even unpublished work that bears on
my proposal. For example, with my most recent proposal,
I discovered a number of potential collaborators within 24 hours of
posting my draft.
Why not open science?
The open science perspective - "more sharing, more better" - is slowly
spreading, but there are many challenges that are delaying its spread.
One challenge of open science is that sharing takes effort, while
the immediate benefits of that sharing largely go to people other than
the producer of the work being shared. Open data is a perfect example
of this: it takes time and effort to clean up and publish data, and
the primary benefit of doing so will be realized by other people. The
same is true of software. Another challenge is that the positive
consequences of sharing, such as serendipitous discoveries and
collaboration, cannot be accurately evaluated or pitched to others in
the short term - it requires years, and sometimes decades, to make
progress on scientific problems, and the benefits of sharing do not
necessarily appear on demand or in the short term.
Another block to open science is that many of the mechanisms of
sharing are themselves somewhat new, and are rejected out of unthinking
conservatism of practice. In particular, most senior scientists
entered science at a time when the Internet was young and the basic
modalities and culture of communicating and sharing over the Internet
hadn’t yet been developed. Since the pre-Internet practices work for
them, they see no reason to change. Absent a specific reason to adopt
new practices, they are unlikely to invest the time and energy to do
so. This can be seen in the rapid adoption of e-mail and
web sites for peer review (making old practices faster and cheaper) in
comparison to the slow and incomplete adoption of social media for
communicating about science (which is seen by many scientists as an
additional burden on their time, energy, and focus).
Metrics for evaluating products that can be shared are also
underdeveloped. For example, it is often hard to track or summarize
the contributions that a piece of software or a data set makes to
advancing a field, because until recently it was hard to cite software
and data. Moreover, there is no good technical way to track software that
supports other software, or data sets that are combined in a larger
study or meta-study, so many of the indirect products of software and
data may go underreported.
Intellectual property law also causes problems. For example, in the
US, the Bayh-Dole Act stands in the way of sharing ideas early in
their development. Bayh-Dole was intended to spur innovation by
granting universities the intellectual property rights to their
research discoveries and encouraging them to develop them, but I
believe that it has also encouraged people to keep their ideas secret
until they know if they are valuable. But in practice most academic
research is not directly useful, and moreover it costs a significant
amount of money to productize, so most ideas are never developed
commercially. In effect this simply discourages early sharing of
ideas.
Finally, there are also commercial entities that profit exorbitantly
from restricting access to publications. Several academic publishers,
including Elsevier and Macmillan, have profit margins of 30-40%!
(Here, see Mike Taylor on The obscene profits of commercial scholarly
publishers.)
(One particularly outrageous common practice is to charge a single
lump sum for access to a large number of journals each year, and only
provide access to the archives of the journals through that current
subscription - in effect making scientists pay annually for access to
their own archival literature.) These corporations are invested in
the current system and have worked politically to block government
efforts towards encouraging open science.
Oddly, non-profit scientific societies have also lobbied to restrict
access to scientific literature; here, their argument appears to be
that the journal subscription fees support work done by the societies.
Of note, this appears to be one of the reasons why an early proposal
for an open access system didn't realize its full promise. For more on
this, see Kling et al., 2001,
who point out that the assumption that the scientific societies
accurately represent the interests and goals of their constituents and
of science itself is clearly problematic.
The overall effect of the subscription gateways resulting from closed
access is to simply make it more difficult for scientists to access
literature; in the last year or so, this fueled the rise of Sci-Hub,
an illegal open archive of academic papers. This archive is heavily
used by academics with subscriptions because it is easier to search
and download from Sci-Hub than it is to use publishers' Web sites (see
Justin Peters' excellent breakdown in Slate).
A vision for open science
A great irony of science is that a wildly successful model of sharing
and innovation - the free and open source software (FOSS) development
community - emerged from academic roots, but has largely failed to
affect academic practice in return. The FOSS community is an exemplar of what
science could be: highly reproducible, very collaborative, and
completely open. However, science has gone in a different
direction. (These ideas are explored in depth in Millman and Perez
2014.)
It is easy and (I think) correct to argue that science has been
corrupted by the reputation game (see e.g. Chris Chambers' blog post
on 'researchers seeking to command petty empires and prestigious
careers')
and that people are often more concerned with job and reputation than
with making progress on hard problems. The decline in public funding
for science, the decrease in tenured positions (here, see Alice
Dreger's article in Aeon),
and the increasing corporatization of research all stand in the way of
more open and collaborative science. And it can easily be argued that
they stand squarely in the way of faster scientific progress.
I remain hopeful, however, because of generational change. The
Internet and the rise of free content have made younger generations
more aware of the value of frictionless sharing and collaboration.
Moreover, as data set sizes become larger and data becomes cheaper to
generate, the value of sharing data and methods becomes much more
obvious. Young scientists seem much more open to casual sharing and
collaboration than older scientists; it’s the job of senior scientists
who believe in accelerating science to see that younger scientists are
rewarded, not punished, for this.
Other resources and links:
"Influential works in Data Driven Discovery", by Stalzer and
Mentzel, 2015 shows how modern
data science rests, in large part, on software (not just methods) -
see my blog commentary.
The New England Journal of Medicine had several editorials on "research
parasites" that make for illuminating reading on an alternative perspective
of how science should work: see Longo and Drazen, 2016 and Longo and Drazen, 2016 (2).
Why scientists should code in the open,
by Juan Nunez-Iglesias.
How a happy moment for neuroscience is a sad moment for science, by Mark Humphries. Quote:
The release of this data took a privately funded institute. It
could not have come from a publicly-funded scientist. It is a
striking case-study in how modern science is worryingly broken,
because it prioritizes private achievement over the public good.
In defense of extreme openness, a presentation by Jake VanderPlas.
A list of open science resources (somewhat dated),
by SVAKSHA.