PLOS Collaborates on Recommendations to Improve Transparency for Author Contributions

In a new report, a group convened by the US National Academy of Sciences and including a dozen journal editors reflects on authorship guidelines and recommends new ways to make author contributions more transparent.

What does it mean to be author number seven on a twenty-five–author article?

Establishing transparency for each author’s role in a research study is one of the recommendations in a report published today in the Proceedings of the National Academy of Sciences (PNAS) by a group led by Marcia McNutt, President of the National Academy of Sciences. The recommendations issued by this group, which included one of us, were adapted based on community feedback and peer review from an original draft presented as a preprint. PLOS supports the recommendations for increased transparency and has already put some of them into practice.

Hypercompetition and journal peer review

By Chris Pickett Journal peer review is a critical part of vetting the integrity of the literature, and the research community should do more to value this exercise. Biomedical research is in a period of hypercompetition, and the pressures of hypercompetition force scientists to focus on metrics that define success in the current environment—funding, publications […]

Should scientists receive credit for peer review?

by Stephen Curry, Professor of Structural Biology, Imperial College (@Stephen_Curry) As the song goes – and I have in mind the Beatles’ 1963 cover version of Money (that’s all I want) – “the best things in life are free.” But is peer review one of them? The freely given service that many scientists provide as validation […]

Making Progress Toward Open Data: Reflections on Data Sharing at PLOS ONE

Posted May 8, 2017 by Meg Byrne in Editorial and Publishing Policy

“Since its inception, PLOS has encouraged data sharing; our original data policy (2003 – March 2014) required authors to share data upon request after publication. In line with PLOS’ ethos of open science and accelerating scientific progress, and in consultation with members of the wider scientific community, PLOS journals strengthened their data policy in March 2014 to further promote transparency and reproducibility.[1] This move was viewed as controversial by many, particularly for PLOS ONE, the largest and most multidisciplinary journal to ever undertake such a mandate. In this post, we look at our experience so far.”

A connected culture of collaboration: recognising and understanding its value for research

I contributed to the recently published Digital Science report on the Connected Culture of Collaboration. In it I explore why it is important for science to understand more about how collaboration, multi-disciplinary research and team science work to best effect, and perhaps also when collaboration might not be the best option. There is also the possibility of what MIT researchers Magdalini Papadaki and Gigi Hirsch termed ‘consortium fatigue’, whereby large-scale research may result in, for example, low productivity or a sense of redundancy.

‘Science of science’ (or as the MIT team suggested ‘science of collaboration’) always seems woefully neglected, and under-funded, given that if we knew how to optimise support for science and research, we should be able to produce many more of those outputs that funding agencies are keen to count, and accelerate their impact both within and beyond academia.

Why collaborate?

The starting point for understanding collaboration is, as Laure Haak, Executive Director of ORCID and author of the Collaboration report’s foreword, says: ‘we need to be intentional with our infrastructure’. The way research is set up, directed, executed, where, with what, with whom, and all the other things that can influence the results of an experiment at any given time, on any given day, provides the context that is likely to be pivotal in making a breakthrough, or not. Put simply, the environment, the resources and the team available for scientific research are crucial.

It is easy to find examples of multi-disciplinary teams and collaborations that have produced significant leaps forward and far-reaching impact. A recent analysis of the UK’s Research Excellence Framework (REF) found that over 80 per cent of the REF impact case studies described impact that was based upon multidisciplinary research. There is, however, more to know about when and how to forge, sustain and nurture collaboration. There is also evidence that working as part of a large team or collaboration can have a detrimental effect on the career of some individuals; particularly while research articles remain a researcher’s main currency.

Assigning authors’ roles

Original research papers with a small number of authors, particularly in the life sciences, have become increasingly rare. Author position is therefore no longer a useful way to estimate the level of each researcher’s contribution, nor does it make it easy to distinguish the role each author played. To provide an updated view of authorship and greater transparency around research contributions, the Contributor Roles Taxonomy (CRediT) was developed.

CRediT is the result of cross-sector collaboration among medical journal editors, researchers, research institutions, funding agencies, publishers and learned societies, and provides a simple taxonomy of roles that can be assigned as descriptors of individuals’ contributions to scholarly published output.

Individual contributions are captured in a structured format and stored as a piece of meta-data during an article’s submission process.  The taxonomy, going way beyond the concept of ‘authorship’, includes a range of roles such as data curation; development of design methodology; programming and software development; application of statistical or mathematical techniques to analyze data; and data visualization. Assigning these roles to those putting their name to a piece of scholarly output allows individuals to be recognised for specific skills and contributions to the research enterprise.
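As a rough illustration of what "captured in a structured format" could look like, here is a minimal Python sketch. The role names are a subset of the CRediT terms that correspond to the roles described above; the record layout, the function names and the placeholder authors are assumptions made for this example, not the schema used by any particular submission system.

```python
import json

# Subset of CRediT role names matching the roles described in the text.
# This is not the full taxonomy.
CREDIT_ROLES = {
    "Data curation",
    "Methodology",
    "Software",
    "Formal analysis",
    "Visualization",
}


def contribution_record(author, roles):
    """Build one metadata record, rejecting roles outside the known subset."""
    unknown = set(roles) - CREDIT_ROLES
    if unknown:
        raise ValueError(f"Unknown roles: {sorted(unknown)}")
    return {"author": author, "roles": sorted(roles)}


def authors_with_role(records, role):
    """Return the authors who were assigned a given role, in record order."""
    return [r["author"] for r in records if role in r["roles"]]


# Hypothetical authors, used only to show the shape of the data.
records = [
    contribution_record("Author A", ["Software", "Data curation"]),
    contribution_record("Author B", ["Formal analysis", "Visualization"]),
    contribution_record("Author C", ["Methodology", "Software"]),
]

# Serialised form that could travel with an article as metadata.
metadata_json = json.dumps(records, indent=2)
```

Because the roles are stored per person rather than inferred from author order, a question such as "who wrote the software?" becomes a simple lookup, for example `authors_with_role(records, "Software")`.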

What changes have we seen?

Since its launch in 2014, there has been considerable support for CRediT’s pragmatic way to provide transparency and discoverability to research contributions, and importantly build this into the scholarly communication infrastructure at minimal effort to researchers.  The standards organisation, CASRAI (Consortia Advancing Standards in Research Administration), is the custodian of the CRediT taxonomy, and many organisations are already using the taxonomy.  In 2016 PLOS implemented the CRediT taxonomy for authors across all its journals; Cell Press have endorsed the use of the roles amongst their ‘authors’; Aries Systems includes the taxonomy in its Editorial Manager manuscript submission system; and F1000 are implementing the taxonomy across their open research publishing platforms during 2017.

If others follow, we will be able to tie contributions to collaborations, to outputs and to impact. Collaborations are considered by policymakers and funding agencies to be increasingly crucial ways to tackle complex scientific problems and global challenges. If we can understand how collaborations work and when, we can properly incentivise the sorts of behaviours and collaborations that might make breakthroughs more commonplace and potentially speed up the translation to tangible impacts. And for ‘science of science’ enthusiasts like me, it will take us a small, but helpful, step closer to being able to understand how science and research work.

I will be talking about the ‘connected culture of collaboration’ in a webinar on Thursday 6th April. Find out more about the event and sign up here.

How big does our text-mining training set need to be?

We got some great feedback from reviewers of our new Sloan grant, including a suggestion that we be more transparent about our process over the course of the grant. We love that idea, and you’re now reading part of our plan for how to do that: we’re going to be blogging a lot more about what we learn as we go.

A big part of the grant is using machine learning to automatically discover mentions of software use in the research literature. It’s going to be a really fun project because we’ll get to play around with some of the very latest in ML, which is currently The Hotness everywhere you look. And we’re learning a lot as we go. One of the first questions we’ve tackled (also in response to some good reviewer feedback) is: how big does our training set need to be? The machine learning system needs to be trained to recognize software mentions, and to do that we need to give it a set of annotated papers where we, as humans, have marked what a software mention looks like (and doesn’t look like). That training set is called the gold standard. It’s what the machine learning system learns from. The text below is copied from one of our reviewer responses:

We came up with the number of articles to annotate through a combination of theory, experience, and intuition.  As usual in machine learning tasks, we considered the following aspects of the task at hand:

  • prevalence: the number of software mentions we expect in each article
  • task complexity: how much do software-mention words look like other words we don’t want to detect
  • number of features: how many different clues will we give our algorithm to help it decide whether each word is a software mention (eg is it a noun, is it in the Acknowledgements section, is it a mix of uppercase and lowercase, etc)
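To make the idea of per-word "clues" concrete, here is a minimal Python sketch of a feature function. The function name, the feature names and the digit feature are assumptions for illustration. The part-of-speech clue is left out because it would need a tagger, and the word2vec features mentioned later are not covered here.

```python
def token_features(token, section):
    """Return a few illustrative clues about one word in an article.

    `section` is the name of the article section the word appears in.
    """
    has_upper = any(c.isupper() for c in token)
    has_lower = any(c.islower() for c in token)
    return {
        "lowercase_form": token.lower(),
        # Mix of uppercase and lowercase letters, as in "ImageJ".
        "is_mixed_case": has_upper and has_lower,
        # Whether the word sits in the Acknowledgements section.
        "in_acknowledgements": section.strip().lower() == "acknowledgements",
        # Extra clue added for this sketch: version-like tokens contain digits.
        "has_digit": any(c.isdigit() for c in token),
    }


example = token_features("ImageJ", "Methods")
```

Each key in the returned dictionary counts as one feature, so a richer function returning about 100 keys would correspond to the feature count discussed in the estimate.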

None of these aspects are clearly understood for this task at this point (one outcome of the proposed project is that we will understand them better once we are done, for future work), but we do have rough estimates.  Software mention prevalence will be different in each domain, but we expect roughly 3 mentions per paper, very roughly, based on previous work by Howison et al. and others.  Our estimate is that the task is moderately complex, based on the moderate f-measures achieved by Pan et al. and Duck et al. with hand-crafted rules.  Finally, we are planning to give our machine learning algorithm about 100 features (50 automatically discovered/generated by word2vec, plus 50 standard and rule-based features, as we discuss in the full proposal).

We then used these estimates. As is common in machine learning sample size estimation, we started by applying a rule of thumb for the number of articles we’d have to annotate if we were to use the simplest algorithm, a multiple linear regression. A standard rule of thumb is that 10-20 datapoints are needed for each feature used by the algorithm, which implies we’d need at least 100 features * 10 datapoints = 1000 datapoints. At 3 datapoints (software mentions) per article, this rule of thumb suggests we’d need about 333 articles per domain.
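The arithmetic behind this rule of thumb can be written out as a short Python sketch. The function name is hypothetical; the inputs are the estimates given in the text (100 features, 10-20 datapoints per feature, roughly 3 mentions per article).

```python
def articles_needed(n_features, datapoints_per_feature, mentions_per_article):
    """Apply the rule of thumb: datapoints required, then articles required."""
    datapoints = n_features * datapoints_per_feature
    articles = datapoints / mentions_per_article
    return datapoints, articles


# Lower end of the 10-20 range, as used in the text.
low_datapoints, low_articles = articles_needed(100, 10, 3)

# Upper end of the range, shown for comparison only.
high_datapoints, high_articles = articles_needed(100, 20, 3)

print(f"{low_datapoints} datapoints -> about {low_articles:.0f} articles")
print(f"{high_datapoints} datapoints -> about {high_articles:.0f} articles")
```

The lower end of the range reproduces the figure of about 333 articles per domain. The same calculation at 20 datapoints per feature gives roughly twice that, which shows how sensitive the estimate is to the multiplier chosen.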

From there we modified our estimate based on our specific machine learning circumstance.  Conditional Random Fields (our intended algorithm) is a more complex algorithm than multiple linear regression, which might suggest we’d need more than 333 articles.  On the other hand, our algorithm will also use “negative” datapoints inherent in the article (all the words in the article that are *not* software mentions, annotated implicitly as not software mentions) to help learn information about what is predictive of being vs not being a software mention — the inclusion of this kind of data for this task means our estimate of 333 articles is probably conservative and safe.
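The point about implicit "negative" datapoints can be sketched in Python as follows. This is an illustration of the labelling idea only, not the project's annotation pipeline; the function name, the label strings and the example sentence are assumptions.

```python
def label_tokens(tokens, mention_spans):
    """Label every token; anything not annotated as a mention becomes "O".

    `mention_spans` holds (start, end) token indices, end exclusive.
    """
    labels = ["O"] * len(tokens)
    for start, end in mention_spans:
        for i in range(start, end):
            labels[i] = "SOFTWARE"
    return labels


# Made-up sentence with two annotated mentions: "ImageJ" and "R".
tokens = "We analysed the images with ImageJ and plotted results in R".split()
labels = label_tokens(tokens, [(5, 6), (10, 11)])

positives = labels.count("SOFTWARE")
negatives = labels.count("O")
```

Annotating only two mentions in this sentence still yields a label for every word, so a sequence model such as a Conditional Random Field sees many more negative examples than positive ones from the same annotation effort.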

Based on this, as well as reviewing the literature for others who have done similar work (Pan et al. used a gold standard of 386 papers to learn their rules, Duck et al. used 1479 database and software mentions to train their rule weighting, etc), we determined that 300-500 articles per domain was appropriate. We also plan to experiment with combining the domains into one general model — in this approach, the domain would be added as an additional feature, which may prove more powerful overall. This would bring all 1000-1500 articles to the test set.

Finally, before proposing 300-500 articles per domain, we did a gut-check on whether the proposed annotation burden was a reasonable amount of work and cost for the value of the task, and we felt it was.


Duck, G., Nenadic, G., Filannino, M., Brass, A., Robertson, D. L., & Stevens, R. (2016). A Survey of Bioinformatics Database and Software Usage through Mining the Literature. PLOS ONE, 11(6), e0157989.

Howison, J., & Bullard, J. (2015). Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. Journal of the Association for Information Science and Technology (JASIST), Article first published online: 13 MAY 2015.

Pan, X., Yan, E., Wang, Q., & Hua, W. (2015). Assessing the impact of software on science: A bootstrapped learning of software entities in full-text papers. Journal of Informetrics, 9(4), 860–871.

The post How big does our text-mining training set need to be? appeared first on Impactstory blog.
