Preprints won’t just publish themselves: Why we need centralized services for preprints

There has been much attention recently given to preprints, the early versions of journal articles that haven't yet been peer-reviewed. While preprints have been around since before arXiv launched in 1991, fields outside of physics are starting to push for more early sharing of research data, results and conclusions. This will undoubtedly speed up research and make more of it available under open and reusable licenses.

We are seeing the beginning of a proliferation of preprint publishing services. This is a good thing. We know from the organic growth of journals that researchers often choose to publish in places that serve their own community. This will no doubt be true of preprint services as well, and offering researchers choices makes sense. Even within the very large arXiv preprint server, there are many different community channels where researchers look for their colleagues' work.

Last year ASAPbio formed with the goal of increasing preprint posting in the life sciences. There was agreement that preprints should be searchable, discoverable, mineable, linked, and reliably archived. These are all steps that the online journal publishing industry needed to take 20 years ago, and there are well-understood mechanisms in place. This is how cross-journal databases such as PubMed came to be, best practices such as assigning DOIs evolved, standards such as COUNTER were developed to ensure consistent reporting on usage, and integration with research databases such as GenBank was worked out.

These same efforts will be needed across the different preprint services to ensure that preprints are taken seriously as research artifacts. As more preprint channels arise, this infrastructure and these operating standards will only become more important. A research communication service is not necessarily the same as its underlying technology and, though people tend to equate the two, shared preprint infrastructure is actually the best way to ensure costs are kept down and standards are applied.

As the ASAPbio conversation evolved, so did the discussion of whether a central service was needed to aggregate preprints. I believe that what is needed is a collection of services, centralized in some way to ensure a low-cost and easy path to offering preprint services, that work together as effectively as possible.

The needed services include:

  1. Consistent standards applied to preprints (identifiers, formats, metadata)
  2. Reliable archiving for long term preservation
  3. A record of all preprints in one place for further research purposes (text and data mining, informatics, etc.)
  4. Version recording and control
  5. Best practices in preprint publishing applied across all services
  6. Sustainability mechanisms for existing and new preprint services

Comparable services for journals have helped to make the journal literature reliable and persistent. If we want preprints to become first-class research artifacts in the life sciences and other fields outside of physics, we need to apply some degree of the same treatment to them – and at this early stage, now is the time to plan for these services.

A centralized set of services could ensure that the efforts of preprint services that already exist are tracked and a record is kept. If they don't have DOIs, they can affordably get them. If they have DOIs, those DOIs are tracked and searchable through a central API. If the preprints are PDF-only, a version could be converted to structured data and held in a mineable database.

The sooner that the research communication community gets out in front of these support services for preprints, the less chance there is for loss of data and an incomplete record of this growing segment of research literature.




Preprints and the ASAPbio "Central" Services

Jo McEntyre, EMBL-EBI; Thomas Lemberger, EMBO; Mark Patterson, eLife; Kristen Rattan, Collaborative Knowledge Foundation; Alfonso Valencia, Barcelona Supercomputer Centre.
The use of preprints in the life sciences offers tantalising opportunities to change the way research results are communicated and reused, and the work of ASAPbio has been key in engaging the scientific community to promote their uptake. We fully support these goals, and consequently submitted a response to the recent ASAPbio Request for Applications (RFA). In light of ASAPbio’s understandable recent decision to suspend the RFA process for four months, we are making our proposal public here, to encourage and contribute to ongoing, open discussions on these matters.
Our consortium is led by the European Bioinformatics Institute (EMBL-EBI), with collaborators in the Collaborative Knowledge Foundation, the Barcelona Supercomputer Centre, eLife and EMBO. We appreciate that not everyone interested in preprints will have time to read the full proposal, so we summarise some of the main points here.
We put in a response to the RFA because we share the excitement and enthusiasm that has emerged recently around the use of preprints in the biological sciences. The reason for our excitement is simple – alongside the rapid communication of research, we see massive potential for innovation based on preprint content. We envision that the best route to enable these goals is through a reasonable number of preprint servers and services, coordinated through the operation of agreed community standards. The standards will allow content to be federated and/or aggregated across servers, depending on the use cases. This model allows a diversity of approaches to addressing the opportunities and challenges that preprints bring.
Between us, we are developing infrastructure and services for publishing processes, article enrichment, text and data mining tools, bioinformatics, and mechanisms for data integration and discovery. But more important than our singular contributions, we are also embedded in broader researcher and developer communities that are as enthusiastic as we are about the opportunities for innovation that preprints offer. Alongside the core elements in the ASAPbio RFA, the fundamental theme of our proposal is therefore to enable those communities to engage with preprint content and contribute to moving scientific communication “beyond the PDF”.
Our proposal is to combine existing and emerging open-source software and open data infrastructure to facilitate the ingestion of preprints from any source into a community archive and then share the content in different ways. This satisfies not only the scientific imperative of rapidly discoverable research results, but also creates a platform for innovation that has the promise to make information discovery faster and more effective in the future.
In short, the central services we envisage will enable any interested party to develop “plug-in” applications that can be used – optionally and in any order – in any part of the system. Some applications might work on individual documents prior to release (for example in quality control); others might work on the collection as a whole, post release; some might be fundamental “mission-critical” steps (like document conversions); and some might be more experimental. We propose to engage the developer and text- and data-mining communities through open challenges to invent new applications based on preprints. No-one knows where the next “killer app” will come from, so we want to foster broad participation and expose these developments to the wider scientific community.
The top priority is to support the uptake of preprints by the scientific community and ensure their citability and discoverability. But in order to realise transformative developments in the future, there are necessities beyond this.
Most critical among these is the ability to reuse preprints. By this, we mean not only that the content has a license that supports reuse (the CC-BY license), but also that the content is readily available as a whole, so that would-be application developers and text-miners do not have to struggle to gather content together. Most peer-reviewed literature is still subject to access and reuse restrictions and is highly distributed – with preprints we have a unique opportunity to support unrestricted and comprehensive reuse from the outset.
Secondly, quality metadata and the consistent application of standards are essential. We care about open standards like JATS for structuring the XML of full text articles, and are open to discussion about how this may evolve to support preprints in the future. Author names with ORCIDs, machine-readable data citation, and correctly identified institutions and funding sources are all critical for a connected research management ecosystem. Given these building blocks, others could develop tools that reduce the repetitive reporting burden on researchers, or services and indicators to give a wider stakeholder group a better understanding of the influence and impact of research.

Finally, a governance structure that represents the interests of the community is a necessity, as services around preprints need to remain current and address evolving user needs over time. This approach to preprint infrastructure lends itself to reuse within different disciplinary contexts, providing a basis for cross-disciplinary standards of core elements, yet allowing adaptation by those communities according to their specific scientific requirements.

Central services are a crucial part of biology today. It is hard to imagine how biology could progress without resources such as the wwPDB, or the International Nucleotide Sequence Database Collaboration. We are excited about preprints because they offer a tremendous opportunity to move science forward in parallel with these data resources, enabling integration of research outputs and knowledge discovery. We welcome comments and discussion as we move towards these shared goals, supporting science into the future.

The history of peer review, and looking forward to preprints in biomedicine


Peer review is not as old as you might think

Peer review is often regarded as a ‘touchstone of modern evaluation of scientific quality’ but it is only relatively recently that it has become widely adopted in scientific publishing. The journal Nature did not introduce a formal peer review system until 1967. Before then some papers were reviewed, others were not. Michael Nielsen suggests that with the ‘increasing specialization of science…editors gradually found it harder to make informed decisions about what was worth publishing’.

Aileen Fyfe has pointed out that ‘peer review should not be treated as a sacred cow … rather, it should be seen for what it is: the currently dominant practice in a long and varied history of reviewing practices’.

Challenging the status quo

The widespread adoption of the Internet as a means of scholarly interaction began in the mid to late 1990s. Even back then discussions raged about the benefits and disbenefits of challenging the publishing status quo. Tony Delamothe, writing in 1998, summed up the arguments thus:

At one extreme were enthusiasts for electronic preprints, who regard them not as scientific papers in evolution but as near enough finished articles. To these respondents, the current long process of peer review and paper publication is detrimental to science and the public health: any way of getting scientific advances into the public domain fast is worth supporting.

At the other extreme were respondents who thought “too much junk” was already being published. Lacking the skills to distinguish between “valuable material and garbage” journalists and the public could be misled.

More recently the realization has been growing that researchers will use electronic preprints because of their benefits—however much journals may rail against them.

The following year it seemed that the world was really changing when the US National Institutes of Health published its E-biomed proposal, but this proved too radical for many in the biomedical research community.

Scientific reports in the E-biomed repository would be submitted through either of two mechanisms… (i) Many reports would be submitted to editorial boards. These boards could be identical to those that represent current print journals or they might be composed of members of scientific societies or other groups approved by the E-biomed Governing Board. (ii) Other reports would be posted immediately in the E-biomed repository, prior to any conventional peer review, after passing a simple screen for appropriateness.

That last part seemed too big a departure from peer review, and the proposal was watered down, leading to the establishment of the PubMed Central repository for published papers. The proposal indirectly stimulated the creation of two new publishers – the commercial BioMed Central and the not-for-profit PLOS.

The early impact of open access on peer review

Early proponents of open access took pains to make it clear that their immediate goal was to improve access to research literature, and not to challenge peer review practices. They were careful not to lose the support of those who held peer review dear.

However, as new open access journals were established they did provide opportunities to experiment with enhancements to peer review. PLOS ONE, launched in 2006, famously popularized the idea of a 'megajournal' with its mission to publish "scientifically rigorous research regardless of novelty". This model was followed by a swathe of other megajournals. The Frontiers series of journals launched in 2007 and introduced 'interactive collaborative review', which aimed to turn the peer review process into a "direct online dialogue, enabling quick iterations and facilitating consensus". In 2012 eLife launched with the aim of 'taking the pain out of peer review', again through a more collaborative approach. Gradually, the role of peer review was challenged and the practice changed. PLOS ONE introduced the idea that dissemination of research was at least as important as validation of research.

Happily, these challenges have not caused the whole world of research communication to come crashing down. In recent years there has been a bit of a rash of retractions, but these are more strongly associated with high-end journals than with megajournals. The strong positions that PLOS ONE and Scientific Reports have achieved suggest that megajournals are here to stay.

A few journals have sought to modify peer review further – F1000Research and Wellcome Open Research make preprints of articles available almost immediately after submission, and then invite post-publication open peer review.



A preprint is "a scientific manuscript uploaded by authors to an open access, public server before formal peer review". Currently proponents of preprints are following the same strategy as the early OA advocates. Preprints are advocated as a route to better access rather than as a challenge to peer review. The ASAPbio initiative is all about faster access to research findings – 'Accelerating Science and Publication in Biology'. If most research articles are posted first as preprints then access to research findings becomes possible as soon as an article is completed rather than, as at present, when the article is accepted and published in a journal.

Challenges to the current practice of peer review will surely follow the wider adoption of preprints.

Preprints have been widely adopted by physicists through the arXiv server, but publishing practices and sharing cultures vary greatly between research fields, and biomedical researchers did not show much enthusiasm for preprints until recently. arXiv has provided a home for computational biology preprints, and this helped to pave the way for the establishment of bioRxiv – a preprint server for the biomedical sciences.

Most articles uploaded to bioRxiv are also submitted to journals and subsequently peer-reviewed and published. But some also see bioRxiv as a permanent home for research results. One researcher has declared that one of his bioRxiv preprints is the "final version" and that he will not submit it for publication in a journal. Partly this is because the article is a response to a previously published article, rather than a full article in its own right. But the researcher also wanted to experiment with how preprints are perceived by researchers.

Preprints are still a tiny fraction of the total output of biomedical research papers. If there is widespread adoption, and researchers become accustomed to reading research reports that have not been peer-reviewed, we may increasingly question the value of peer review as a means of screening all research reports. Bernd Pulverer has suggested that:

“If preprints should attain the dominant role they have in physics, publishing papers in journals may remain attractive only in journals that add real value to the scientific communication process.”

He suggests that it will be worthwhile only for quality journals “to invest time and effort to add reliability and reproducibility assurances to research findings through careful peer review and prepublication quality control and curation processes.”

We may be moving to a world where some research is just published 'as is', and subject to post-publication peer review, while other research goes through a more rigorous form of review including reproducibility checks. This will be a gradual process, over a period of years. New tools such as Meta and Yewno, using artificial intelligence, will help by providing new ways to discover and filter the literature. A new set of research behaviors will emerge around reading, interpreting and responding to preprint literature. The corridors of science will resound with caveat lector and nullius in verba.


Further reading

1. Baldwin, M. "In referees we trust?" Physics Today 70:2 (2017), 44.

2. "A succinct history of academic peer review." Frontiers blog (2015). Retrieved Mar 10 2017.

3. Keroso, N. H. "Open and Post Peer Review: New Trends in Open Access Publications." UA Magazine (2016). Retrieved Mar 10 2017.

4. Flier, J. S. "It's time to overhaul the secretive peer review process." STAT (2016). Retrieved Mar 10 2017.

5. Patterson, M. "Bringing eLife to life." Insights 26:3 (Nov 2013).

6. "Collaborative Peer Review." Frontiers. Retrieved Mar 10 2017.

The post The history of peer review, and looking forward to preprints in biomedicine appeared first on BioMed Central blog.

OSF Preprints and Innovation in Scholarly Communication

Brian Nosek and Rusty Speidel provide a summary of OSF Preprints and where the service is heading.

OSF Preprints is an interface built on the Open Science Framework – a scholarly commons supporting the documentation, archiving, and sharing of data, materials, and outcomes of the research lifecycle. OSF Preprints has three defining features:

  • Aggregated.  Powered by SHARE, OSF Preprints aggregates search across preprint services.  Eleven are integrated so far, including arXiv, bioRxiv, PeerJ, and RePEc, representing access to over 2 million preprints.
  • Brandable. Any group that wants to offer a preprint service can launch and manage a fully functional service for their community.
  • Open-source.  OSF Preprints and the OSF supporting it are public goods infrastructure, with a public roadmap.

OSF Preprints is available as a general preprint service that accepts submissions from any domain of scholarship. However, the real power of the public infrastructure is in supporting branded services run by communities themselves. So far, five branded services are in production: SocArXiv for the social sciences, PsyArXiv for psychology, engrXiv for engineering, the new AgriXiv for the agricultural sciences, and BITSS for social science research methodology. Across these new services, more than 2,000 preprints have been posted already and growth is accelerating.

When is a Preprint Server Not a Preprint Server?

David Crotty discusses the differences between preprints and post-publication peer review.

“The key to the definition of “preprint” is in the prefix “pre”. A preprint is the author’s original manuscript, before it has been formally published in a journal. One of the primary purposes of preprints is that they allow authors to collect feedback on their work and improve it before submitting it for formal peer review and publication.”

The post When is a Preprint Server Not a Preprint Server? appeared first on The Scholarly Kitchen.

SSP Scholarly Kitchen Webinar: The Future Of Preprints

Scholars have distributed their work as preprints since at least the early 20th century, but the last year has seen increased interest in their use, particularly in fields like biomedicine. We present a panel of experts to discuss new developments in preprints, how new technologies are changing their visibility and use, how funders are hoping to drive uptake, and what impact this may have on traditional journal publishing.


  • David Crotty, Oxford University Press
  • Richard Sever, Cold Spring Harbor Laboratory Press
  • Darla Henderson, Open Access Programs, American Chemical Society
  • Gregory J. Gordon, SSRN

What we mean when we talk about preprints

Cameron Neylon, Damian Pattinson, Geoffrey Bilder, and Jennifer Lin have just posted a cracker of a preprint onto bioRxiv.

On the origin of nonequivalent states: how we can talk about preprints

Increasingly, preprints are at the center of conversations across the research ecosystem. But disagreements remain about the role they play. Do they “count” for research assessment? Is it ok to post preprints in more than one place? In this paper, we argue that these discussions often conflate two separate issues, the history of the manuscript and the status granted it by different communities. In this paper, we propose a new model that distinguishes the characteristics of the object, its “state”, from the subjective “standing” granted to it by different communities. This provides a way to discuss the difference in practices between communities, which will deliver more productive conversations and facilitate negotiation on how to collectively improve the process of scholarly communications not only for preprints but other forms of scholarly contributions.

The opening paragraphs are a treat to read, and provide a simple illustration of a complex issue. They offer a model of state and standing that provides a clean way of talking about what we mean when we talk about preprints.

There are a couple of illustrations in the paper of how this model applies to different fields, in particular, physics, biology, and economics.

I think it would be wonderful to extend this work to look at transitions in the state/standing model within disciplines over time. I suspect that we are in the middle of a transition in biology at the moment.


Preprints are go at Crossref!

We’re excited to say that we’ve finished the work on our infrastructure to allow members to register preprints. Want to know why we’re doing this? Jennifer Lin explains the rationale in detail in an earlier post, but in short we want to help make sure that:

  • links to these publications persist over time
  • they are connected to the full history of the shared research results
  • the citation record is clear and up-to-date

Doing so will help fully integrate preprint publications into the formal scholarly record.

What’s new?

We’ve had to do some work on our own infrastructure to facilitate the inclusion of preprints, enabling:

  • Crossref membership for preprint repositories, by updating our membership criteria and creating policies for preprints
  • The deposit of persistent identifiers for preprints to ensure successful links to the scholarly record over time, via the DOI resolver.
  • Content registration for preprints with custom metadata that reflect researcher workflows from preprint to formal publication (this custom metadata will then be visible to anyone using the Crossref metadata).
  • Notification of links between preprints and formal publications that may follow (journal articles, monographs, etc.).
  • Auto-update of ORCID records to ensure that preprint contributors get credit for their work.
  • Preprint and funder registration to automatically report research contributions based on funder and grant identification.
  • Collection of "event data" that capture activities surrounding preprints (usage, social shares, mentions, discussions, recommendations, links to datasets and other research entities, etc.).
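As a rough illustration of what this registration enables, the sketch below works through a trimmed, hypothetical metadata record in the shape the public Crossref REST API returns (the DOIs are invented). Preprints are registered as the "posted-content" type, and the link to a later formal publication appears in the record's relation metadata, which is what makes the preprint-to-article chain machine-readable:

```python
# Sketch: reading the preprint-to-publication link from a trimmed,
# hypothetical Crossref-style metadata record. DOIs are invented; the
# field names follow the shape of the public REST API's JSON output.
import json

SAMPLE_RECORD = json.loads("""
{
  "DOI": "10.20944/preprints.example.0001",
  "type": "posted-content",
  "subtype": "preprint",
  "relation": {
    "is-preprint-of": [
      {"id-type": "doi", "id": "10.3390/example-0001", "asserted-by": "subject"}
    ]
  }
}
""")

def is_preprint(record):
    """Preprints are registered under the posted-content work type."""
    return record.get("type") == "posted-content"

def published_versions(record):
    """Return DOIs of formal publications this preprint points to, if any."""
    links = record.get("relation", {}).get("is-preprint-of", [])
    return [link["id"] for link in links if link.get("id-type") == "doi"]

if __name__ == "__main__":
    assert is_preprint(SAMPLE_RECORD)
    print(published_versions(SAMPLE_RECORD))
```

Because the relationship is asserted in the metadata rather than buried in a landing page, any downstream service can follow a preprint forward to its journal version (or back again) without scraping.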

Now we’re ready to go!

Early adopters

We have been working with various preprint publishers who are launching (or planning to launch) their own preprint initiatives. Preprints, operated by MDPI, is the first to successfully make preprint deposits using the dedicated schema. For example, this preprint is registered with Crossref. It is linked to a published journal article both in the online display and in the preprint's Crossref metadata record. Others are getting ready to go – will your organisation be next? (Technical documentation available here.)

Martyn Rittman, from Preprints, operated by MDPI, said: Preprints is delighted to be the very first to integrate the Crossref schema for preprints. We believe it is an important step in allowing working papers and preliminary results to be fully citable as soon as they are available. It also makes it easy to link to the final peer-reviewed version, regardless of where it is published. Thanks to the hard work of Crossref and clear documentation, the schema was very simple to implement and has been applied retrospectively to all preprints on the platform.

Jessica Polka, Director, ASAPbio adds: ASAPbio is a scientist-driven community initiative to promote the productive use of preprints in the life sciences. We’re thrilled to see Crossref’s development of a service that enables preprints to better contribute to the scholarly record. This infrastructure lays a necessary foundation for increasing acceptance of preprints as a valuable form of scientific communication among biologists.


Get in touch with any questions or comments, or join our upcoming webinar to talk about preprints, infrastructure and where we go from here.
