It’s amazing what you can do with data these days. A couple of weeks ago, Digital Science held its third annual US Publisher Day. This year, one of the themes that emerged was data, scientometrics and how we, both as an industry and as Digital Science, can use them to support publishers in strategic decision making.
The traditional data types for understanding the research landscape are citations and bibliometrics. While in recent years we’ve all come to view impact as being much broader than citations, it’s only very recently that other types of data and analyses have been used for strategic planning and business intelligence.
That is what the Digital Science Consultancy team does. We apply new types of data and new analysis techniques to support funders, institutions, and publishers to make better decisions faster. During my talk at the Publisher Day, I broke down three aspects of how we go about doing that.
1) First, you need data
We have some pretty unique datasets at Digital Science. Most people are familiar with altmetrics and our portfolio company Altmetric.com. It was Altmetric that really helped define the discipline. We also have Uber Research, which created the only database of awarded research funding: Dimensions. Then there’s GRID, our open dataset of institutional identities.
Not only do we have data, but often our customers have data too. For publishers, the most obvious sources are authors, affiliations, citations, subscription information and the most underutilized data that publishers have: the full text of the articles that they publish.
2) Data is only useful if you have the right tools
Once you have data, you need the right tools to interpret it. In March 2017, Digital Science released a Digital Research Report in which we used the affiliation data from PLOS One articles dating between 2006 and 2016. Affiliation data is traditionally a challenge to work with. It’s usually a free-text field in journal submission forms, which results in a non-uniform hotchpotch of variant spellings and word orders. Sometimes, they’re in more than one language.
The GRID suite of tools contains a matcher that allows us to discover and deduplicate author affiliations with a high level of accuracy. Once we know which GRID records are the right ones, we then have a plethora of other information available, including ISNI record numbers, Crossref, relationships between parent and child institutions and, importantly, geolocation data.
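To give a feel for why free-text affiliations need a matcher at all, here is a minimal sketch in Python. It is not the GRID matcher, whose internals aren't described here. It assumes a tiny made-up registry (`REGISTRY`, with invented identifiers), normalizes each string so that case, punctuation and word order stop mattering, and then uses the standard library's `difflib.SequenceMatcher` to tolerate small spelling variants. The function names and the 0.8 threshold are illustrative assumptions only.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-registry for illustration; these are NOT real GRID IDs.
REGISTRY = {
    "inst-001": "University of Oxford",
    "inst-002": "University of Cambridge",
    "inst-003": "Massachusetts Institute of Technology",
}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so word order is ignored."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(tokens))


def match_affiliation(raw: str, threshold: float = 0.8):
    """Return (registry_id, score) for the best match, or None if below threshold."""
    target = normalize(raw)
    best_id, best_score = None, 0.0
    for inst_id, canonical in REGISTRY.items():
        score = SequenceMatcher(None, target, normalize(canonical)).ratio()
        if score > best_score:
            best_id, best_score = inst_id, score
    return (best_id, best_score) if best_score >= threshold else None


if __name__ == "__main__":
    for raw in ["Oxford, University of", "Univeristy of Cambridge", "Acme Widgets Ltd"]:
        print(raw, "->", match_affiliation(raw))
```

A real matcher would also have to cope with abbreviations, multiple languages and institutions that share a name, which is where a curated dataset earns its keep.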
In the report, we analyzed the global network of research collaborations, how collaboration is used strategically, and how it’s changing over time. The report is well worth a read if your business involves helping researchers communicate. Just click on the image below to access it.
A graph of global research collaboration, colour coded by country. For the full digital research report, click on the image.
Text mining, natural language processing and topic modelling have come into their own in recent years. The field has moved so fast that many people, even in the information space, aren’t aware of just how powerful things have gotten.
As part of a long-standing relationship, Digital Science helped the Higher Education Funding Council for England (HEFCE) analyze the results of the last Research Excellence Framework (REF). As part of the REF, universities are asked to submit a series of impact case studies that detail how the institution’s activities affect society in ways that bibliometric citations do not capture.
Those written accounts are read and scored by one of four panels (physical sciences, life sciences, social sciences, and arts and humanities). From a publisher perspective, this type of content is similar to a full text archive; it’s mostly words rather than numbers and is written for humans to read, rather than computers.
We used natural language processing to assess the similarity between case studies. Each one was plotted on a graph, colour coded by the panel that assessed it. The more similar two studies are, the closer together their dots sit.
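To make the idea of turning text similarity into distance concrete, here is a minimal sketch. It assumes a plain bag-of-words cosine similarity, which is a deliberately simple stand-in and not necessarily the technique used in the REF analysis. The three texts in `DOCS` are invented toy examples, not real case studies, and all function names are illustrative.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase word occurrences in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0 = unrelated, 1 = identical mix)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def distance(text_a: str, text_b: str) -> float:
    """Higher similarity means smaller distance, so similar texts plot closer together."""
    return 1.0 - cosine_similarity(bag_of_words(text_a), bag_of_words(text_b))


# Invented toy texts for illustration only.
DOCS = {
    "river": "flood management of river catchments and wetland restoration",
    "canal": "restoration of canal and river habitats for flood management",
    "poetry": "medieval poetry manuscripts and their public exhibition",
}

if __name__ == "__main__":
    names = list(DOCS)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            print(first, second, round(distance(DOCS[first], DOCS[second]), 3))
```

In this toy set the two waterway texts end up much closer to each other than either is to the poetry text. Feeding pairwise distances like these into a layout or clustering algorithm is what lets groups emerge without anyone labelling them in advance.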
Although we did not tell the computer anything about the structure of research in the UK, spontaneous clusters emerged from the dataset, enabling us to identify areas of excellence in UK research. The results were pretty remarkable, as you can see below.
A cluster analysis based on similarities between impact case studies submitted for the REF. The four panels are color coded. Red for physical sciences, green for life sciences, blue for social sciences and yellow for arts and humanities.
If we zoom in on a particular cluster (the nicely multi-coloured one on the bottom left edge; detail shown below), we see that it contains an interdisciplinary group of research on environmental management of waterways.
Zooming in on one particularly multidisciplinary cluster shows that it’s about environmental management of waterways.
You can play with the interactive visualization yourself, here.
The applications of these techniques for publishers are exciting, from discovery of emergent fields to consolidation of existing titles and everything in between. As I said during the Publisher Day:
Imagine what we could do together by combining your data with ours and applying our techniques.
3) The secret sauce: domain expertise
Accurate interpretation requires an understanding of the conditions that generated the data. This is an area where both Digital Science and our Consultancy come into their own.
Inside the Digital Science Consultancy, our expanding team includes, among others, a professor of bibliometrics, a world-leading data scientist, an institutional research management and libraries expert, a health care and bibliometrics analyst, a very well-known bibliometrics leader and a former academic research scientist.
Digital Science more broadly has invested in companies, sold to customers, driven progress through outreach and research reports, and even recruited many of its employees and entrepreneurs from every stage of the scholarly supply chain. This experience and depth of knowledge give us a truly unique perspective across the entire landscape.
Digital Science has products, services, and expertise along the entire scholarly supply chain.
Supporting academic publishers
The purpose of the publisher day was partly to inform our customers of the developments that we’ve been working on at Digital Science, but it was also to learn from publishers about how we can help them.
We heard from publishers who want to know what topics are emerging in their fields based on funding data. Others wanted to look across the landscape with cluster analyses and either find new emergent fields, or opportunities to consolidate titles. There was also interest in identifying emergent geographies and patterns of collaboration, as well as a desire to find authors for special issues or reviews, editors, or just leaders and rising stars in a field.
With data and metadata analysis finally coming into its own to inform strategic decision making in publishing, these are exciting times. I’m personally looking forward to seeing not only how our capabilities at Digital Science continue to grow but also how others in the industry make use of data as business intelligence.