When evaluating research, different metrics tell us different things

Science has long been accepted by policy makers as valuable; recently, however, scientists and research institutions have been asked to provide evidence justifying their research. How this evidence should be provided is grounds for lively debate.

Scientific peer review based on human judgement is time-consuming and complex. As a result, it has become commonplace to draw conclusions about the quality of research from indicators of reuse by other academics, i.e. the number of citations the corresponding articles receive. Citation impact is thus used as a proxy for quality, though there are manifold issues with this proxy. Should negative citations count? Are all citations of equal merit? At scale, there is likely a good deal of noise.

Policy makers are interested in understanding the quality (impact) of research programs and research institutions; however, this can be difficult to assess from citations of research publications alone, not least because citations take time to accrue. This is one of the main reasons why young researchers, and recently founded research institutions, cannot be evaluated meaningfully using citation impact.


Are other means of research evaluation available?

Alternative metrics (altmetrics) have been introduced to complement and supplement traditional metrics. Altmetrics is an umbrella term covering various metrics: mentions of publications on Twitter, Facebook, blogs, and news articles; bookmarks in online reference managers; references in Wikipedia; and many more. These metrics have been proposed as a way to measure the broader impact of research, whereas citations capture only impact within academia.

Many publishers – for example, Wiley, Springer Nature, F1000, and PLOS – include altmetrics alongside published articles. Researchers have also started to add these indicators of the use and reuse of their output to CVs and grant applications. However, the meaning and value of alternative metrics in understanding impact remains unclear. What kind of impact is created when a publication is mentioned in a tweet? Is it important who has sent the tweet? How should we deal with highly tweeted fraudulent research?

Various studies have found a close-to-zero correlation between citations and tweets, while others have shown that bookmark counts, especially those from Mendeley, are related to scientific impact. This is understandable, as many scientists use online reference managers to support the production of a manuscript. Against this backdrop, we conducted two studies, both published as preprints on arXiv, to further explore the potential value of alternative metrics in supporting research evaluation.


Comparing the metrics

In the first study, we analyzed how traditional and alternative research metrics correlated with a post-publication quality measure, namely an expert view of an article, drawing upon F1000Prime recommendation scores. F1000Prime is a platform that provides expert assessments of research articles – essentially ‘post-publication peer review’. By comparing both traditional metrics (citations) and alternative metrics (Twitter mentions and bookmarks in online reference managers) with judgements by experts who had rated articles on F1000Prime, we investigated the convergent validity of the metrics.

We wanted to know whether both methods of measuring quality (experts and metrics) come to the same or different conclusions on the articles. We found that mentions of publications on Twitter have a significantly weaker association with expert judgements, and thus with quality, than traditional metrics do. In contrast, bookmarks in online reference managers were much more closely correlated with traditional metrics.
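To make this kind of convergent-validity check concrete, here is a minimal pure-Python sketch that rank-correlates (Spearman) expert scores with each metric for a set of articles. All numbers below are invented for illustration only; they are not data from our studies, which used actual F1000Prime scores and much larger samples.

```python
def rank(values):
    """Assign 1-based ranks to values, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the tied positions, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented per-article data: expert score, citations, tweets, bookmarks.
experts   = [1, 1, 2, 2, 3, 3, 4, 5, 5, 6]
citations = [2, 5, 4, 9, 7, 12, 15, 20, 18, 30]  # tracks expert scores
tweets    = [40, 3, 0, 25, 1, 8, 2, 60, 0, 5]    # largely unrelated
bookmarks = [1, 4, 3, 8, 6, 9, 12, 14, 13, 22]   # also tracks experts

print(f"experts vs citations: {spearman(experts, citations):+.2f}")
print(f"experts vs tweets:    {spearman(experts, tweets):+.2f}")
print(f"experts vs bookmarks: {spearman(experts, bookmarks):+.2f}")
```

With data shaped like this, citations and bookmarks correlate strongly with the expert scores while tweets sit near zero, mirroring the qualitative pattern reported above.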

In the second study, we included other altmetrics besides Twitter mentions: data on inclusion in policy-related documents, Wikipedia, Facebook, blogs, and news articles. The results for these altmetrics corroborate those of the first study. They demonstrate that the relationship between altmetrics and assessments by experts is not as strong as the relationship between citations and assessments by experts. In fact, the relationship between citations and peer assessments is about two to three times stronger than the association between altmetrics and assessments by expert peers. The implication of our analyses is that altmetrics are much less valuable as an indicator of research quality, and therefore for research evaluation, than traditional citation-based indicators – at least in the biosciences.


Is there a silver lining for altmetrics?

Measurement of broader impact, on society as a whole or on certain societal groups unrelated to academic research, has become more important in science policy in recent years. Currently, societal impact is measured using case studies in the UK Research Excellence Framework, which can be expensive and biased, as usually only positive outcomes are reported. Quantitative indicators, which are comparatively inexpensive to obtain, are required for societal impact measurement; perhaps some of the colorful bouquet of altmetrics could be useful here.

Our research indicated that altmetrics, as currently framed, are significantly weaker indicators of research quality – as measured by expert peers’ assessments – than traditional metrics. This may be a problem for the use of altmetrics in research excellence evaluation. We would welcome feedback on our research preprints and are keen to continue the debate around what indicators of research are most helpful in supporting efficient and effective research evaluation.

The post When evaluating research, different metrics tell us different things appeared first on F1000 Blogs.
