Can we assess library electronic resource usage with citations, altmetrics & more?


Last month I wrote 4 different ways of measuring library eresource usage and argued that one can obtain usage statistics either from publishers or from one’s own analysis of EZproxy logs. One can also measure usage by downloads or by sessions, leading to a 2-by-2 matrix of measures.

However, there are in fact other ways that can provide alternative dimensions to measure usage or even help predict use.

My odd idea/insight is this: while bibliometrics and altmetrics tend to be used to measure the impact of articles/items produced by your users (in terms of cites or altmetric mentions), is it also possible to flip things around and look at what your users are citing or using (as measured by altmetrics) as a measure of use?

Can we use cites/altmetrics from your users (or all users) *to* an electronic resource as a measure of use?

A question to consider: is it better to measure use by citations from all users to the resource, or by citations from only your users? The latter takes into account local conditions (e.g. Nature might be well cited globally, but your institution might have no hard sciences people), but in many cases the former, as you will see, is much easier to obtain, because the nature of some altmetrics (e.g. tweets) makes tracking the user’s institutional affiliation difficult.

The other question is whether such measurements add dimensions beyond just looking at sessions or downloads, or whether they are correlated. Much research has, of course, been done on whether traditional citation, altmetric and usage factors correlate.

In this long post, I consider using citations from traditional citation indexes like Scopus to more unconventional sources like OpenCitations, citations from syllabi (the Open Syllabus Project / reading list software?), citations to datasets, and various altmetrics (reference managers, social media etc).

I end by considering two alternatives to EZproxy for handling authentication, OpenAthens and RemoteXs, and whether they provide easier/better usage analytics.

Usage based on citations done by your users

While knowing that your electronic resource has been accessed or downloaded by someone is useful, ultimately what you really want to know is whether the resource was used. For instance, someone might have searched your database multiple times, generating many sessions, or even downloaded a few articles, generating many downloads, but in the end not used or even read any of them.

One downstream way to measure use for journal articles would be simply to see if your users are citing from resources you offer. While not a fool-proof method (your faculty might be citing an article that you provide via subscriptions but how sure are you they accessed it via your subscription as opposed to finding a free copy?), it does provide a different dimension to just tracking downloads.

Recently I realized (with help!) that with tools like Scopus it’s fairly easy to find what journal sources are actually being cited by your faculty/researchers.

Simply go to Scopus and do an affiliation search for your institution.

Select the correct institution grouping and “show documents”. On that screen select “all references” and then “view references”. If you can’t find the “view references” link, it might be hidden under a drop-down menu, depending on your screen resolution.

To be clear, this is the reverse of the usual citation analysis we tend to do. We are not looking at who cites papers produced by our faculty; rather, we are looking at what our faculty are citing.

Do note this analysis is limited to 2,000 documents, so you may need to split up the analysis by limiting by year or subject etc. first.

In my example, I am looking at 37,337 references cited by 1,157 items indexed in Scopus (I limited to year of publication 2015-2017 to get below the 2k limit) published by my faculty.

Then you can click on “Analyse search results” to take a closer look at the 37k items.

You can do various things, but from here you can definitely see which journal sources are cited the most by your faculty. This supplements your electronic resource download and session statistics. I suppose one could even generate journal-level statistics like cost per cite or cites per download. But bear in mind the earlier point that you can’t assume a cite is the result of your providing access; also, some journals publish many more articles, leading to more possible cites, among other issues (e.g. subject differences).
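To make the journal-level idea concrete, here is a toy calculation of cost per cite and cites per download. All figures are invented; a real analysis would need to control for the caveats above.

```python
# Toy cost-per-cite / cites-per-download calculation.
# All costs, local cite counts and download counts are invented.
journals = {
    "Journal A": {"annual_cost": 5000.0, "local_cites": 250, "downloads": 1200},
    "Journal B": {"annual_cost": 8000.0, "local_cites": 80, "downloads": 900},
}

for name, j in journals.items():
    cost_per_cite = j["annual_cost"] / j["local_cites"]
    cites_per_download = j["local_cites"] / j["downloads"]
    print(f"{name}: {cost_per_cite:.2f} per cite, "
          f"{cites_per_download:.3f} cites per download")
```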

As a sidenote, I notice articles like this are already trying to measure cost per Impact Factor, Eigenfactor, and Article Influence Score. The difference between what I suggest and those measures is that the latter are global in scope and don’t take local factors into account. For example, Nature might be highly cited, but that means nothing if none of your faculty are in the hard sciences and hence don’t cite it.

Don’t have Scopus or want to go beyond Scopus? There’s now the OpenCitations initiative, which could possibly be used (I haven’t looked closely).
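For what it’s worth, OpenCitations does expose a public REST API (the COCI index), so something like the sketch below could list citations pointing *to* a given DOI. The endpoint path follows the published COCI API documentation; the counting helper and example DOI are my own.

```python
# Sketch: ask the OpenCitations COCI index for citations *to* a DOI.
# COCI returns a JSON list of records with "citing" and "cited" DOIs.
import json
from urllib.request import urlopen

def citations_to(doi):
    """Fetch COCI citation records for papers citing `doi` (network call)."""
    url = f"https://opencitations.net/index/coci/api/v1/citations/{doi}"
    with urlopen(url) as resp:
        return json.load(resp)

def count_citing(records):
    """Count distinct citing DOIs in a COCI response."""
    return len({r["citing"] for r in records})

# Example (live call, DOI chosen arbitrarily):
# print(count_citing(citations_to("10.1087/20120404")))
```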


Citing from Syllabus

Another idea would be to see which books or articles are assigned by your lecturers.
The first thing that comes to mind is the Open Syllabus Project.
Using Open Syllabus Explorer to filter to University of Georgia’s syllabi
That only works, though, if your syllabi are indexed in the project. But I’m guessing most modern library management systems like Alma and/or reading list systems like Leganto, Talis Aspire etc. can export such reading list data.
Then again, items placed on reading lists or syllabi will almost certainly generate a ton of usage anyway, so this method might not yield any new insights.

Citing from books

Though Scopus has now expanded to include books in its index, one wonders if one could mine the largest source of book full text out there, Google Books, to see what cites are made in books authored by our researchers. I’ve talked, for example, about the idea of measuring the value of special collections and services by mining Google Books for thanks and acknowledgements.
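As a sketch of what such mining could look like, the public Google Books volumes API supports exact-phrase search. The example below is my own construction, and only a rough signal since it searches Google’s index rather than the raw full text; the example phrase is hypothetical.

```python
# Sketch: count books in the Google Books index mentioning a phrase.
# "totalItems" is the documented result-count field in the volumes API.
import json
from urllib.parse import quote
from urllib.request import urlopen

def build_query_url(phrase):
    """URL for an exact-phrase search against the Books volumes API."""
    return "https://www.googleapis.com/books/v1/volumes?q=" + quote(f'"{phrase}"')

def books_mentioning(phrase):
    with urlopen(build_query_url(phrase)) as resp:  # network call
        return json.load(resp).get("totalItems", 0)

# Example (live call, hypothetical phrase):
# print(books_mentioning("with thanks to the university library"))
```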


Citing of datasets

Another type of cite that I have always thought would be interesting to measure is citations to datasets. While many datasets in STEM are free, those of you working in financial/business areas will be aware that libraries spend tons of money buying financial, company and economic datasets.

Academic libraries buy such data or provide access to it at high cost via systems and platforms like Datastream, Eikon, WRDS, Bloomberg etc.

Such data is critical to researchers; they use it to conduct research and literally couldn’t do any research without it. Yet measuring its use is difficult. After all, most researchers will request that we purchase a dataset, then download everything once, and that’s it.

But is the dataset eventually really used? The key, of course, is to see whether the papers produced by those researchers cite or mention their use of the dataset. Unlike cites to journal articles, which are often glancing or secondary, a cite to a dataset usually signals substantial use.

The problem is until recently it was very tough to check what datasets were cited or used in papers. I’ve toyed with the idea of playing with Google Scholar to somehow search or mine the data by searching full text papers for mentions of WRDS, Eikon etc.

As such, I note with great interest the following announcement by Elsevier, titled “Wharton Research Data Services, SSRN and Elsevier Announce Groundbreaking New Collaboration — Elevating Research and Researchers Who Use WRDS”.

I haven’t seen it yet but the announcement states that there will be an establishment of a WRDS Research Paper Series (RPS). It will be “a searchable repository of all papers submitted to SSRN that cite WRDS in their work.”

I’m unsure how the system determines whether a paper cites WRDS, and whether that will be automatic or manual. Nor is it clear whether the cite is nuanced enough to tell you that the data was actually used in the study.

I can imagine though librarians might use this to check what papers produced by their faculty are in this working papers series as a first cut.

Of course this only works for WRDS, so it would be nice if something similar existed for other data sources.

Elsevier (them again!) also blogged this month that they are enabling Research Dataset Linking on the Scopus document pages. 

Research data linking in Scopus

They mention something about the Data Literature Interlinking (DLI) service and Scholix, which I don’t know anything about, but will study further.


Usage based on altmetrics – reference managers

While I have always had some interest in bibliometrics, I find it curious that until recently I had a blind spot about the fact that one could use citation data to gauge usage of library resources. I think the reason is that bibliometrics tends to be outward looking. We are typically interested in who is citing us, or, as librarians, we help faculty see who is citing the papers they wrote.

Here we are looking at the reverse: what are our faculty citing? With this insight, it’s natural to wonder whether, besides traditional citation measures, one can use altmetrics to gauge usage of library resources.

Of all altmetrics, one of the most promising and most-discussed is looking at what is in reference managers, as various studies have shown this is a leading indicator for predicting cites.

Of all reference managers, Mendeley (owned by Elsevier, again) gets the most attention because getting data from it is relatively easy. Hence many have settled on the number of “readers” in Mendeley (people who have the article in their reference library) as an altmetric measure for each article.


Readers in Mendeley

So for our purposes, we want to mine Mendeley to see what articles are in the reference managers of our faculty. Leaving aside the privacy issues, how can one do so? There’s a Mendeley API but I’m not sure if it’s easily doable.
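Whatever the route to the raw data, the aggregation itself is simple. The sketch below assumes document records shaped loosely like Mendeley’s document model, where a “source” field holds the journal name; the records are invented, and actually pulling per-user libraries would require the Mendeley API and user consent.

```python
# Sketch: count which journals appear most often across a set of
# reference-manager document records. Records here are invented and
# only loosely follow Mendeley's document model ("source" = journal).
from collections import Counter

def top_journals(documents, n=3):
    sources = (d.get("source") for d in documents)
    return Counter(s for s in sources if s).most_common(n)

docs = [
    {"title": "Paper 1", "source": "Nature"},
    {"title": "Paper 2", "source": "JASIST"},
    {"title": "Paper 3", "source": "Nature"},
    {"title": "Paper 4"},  # no journal recorded
]
print(top_journals(docs))  # [('Nature', 2), ('JASIST', 1)]
```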

One sure-fire way of getting such information is to subscribe to Mendeley Institutional Edition. Subscribing gives researchers in your institution benefits such as more cloud space but, most importantly for our purposes, it gives you, the librarian, analytics.

Analytics dashboard in Mendeley for institutions (outdated?)

I don’t have access to a recent screenshot of the analytics dashboard; this is a much older one, released back when the service first became available. It shows the top journals of articles in researchers’ Mendeley libraries.

Again this gives you insight into your researchers use of journals beyond pure downloads, access  or sessions.

As an aside, I believe ProQuest’s reference manager ProQuest Flow used to offer an institutional edition with analytics similar to Mendeley’s, but now I see it has been folded into RefWorks; possibly similar statistics can still be found?


Usage based on Browzine

Many libraries today subscribe to Browzine, an app that makes browsing favourite journals easier.

As your users add journal titles to their book shelves, it probably indicates a measure of popularity.

Journal titles on book shelf in Browzine


Usage based on other altmetrics

There is a wide variety of altmetrics beyond Mendeley readers, provided by altmetrics providers like Impactstory and PlumX.

PlumX, which perhaps has the most complete set of altmetrics, classifies them into 5 broad categories.
Can we use them to assess use by our users?
I’m not the most familiar with altmetrics tools, not having looked at them recently, but I suspect that, like traditional citation tools, the general focus is on tracking altmetrics for a set of documents published by an organization or author, rather than the reverse: tracking altmetrics generated by your faculty or organization.
Even if one attempted this, it would be very tough. For example, social media mentions would need to be tracked by author and mapped to organizations.
If that were done, you could say, for example, that articles in a given journal received 50 tweets from users in your organization, were mentioned in 5 blog posts by users in your organization, and so on.
Anyone know if this can be easily done? Or is this too big brother?


Systems that measure sessions and downloads in an easier way – OpenAthens & RemoteXs

When I blogged 4 different ways of measuring library eresource usage   I lamented that getting usable statistics from ezproxy logs was a big pain.

There seem to be 2 issues here.

Firstly, and most well known, it is extremely difficult to identify which requests in the logs refer to downloads. ezPAARSE, which I blogged about before, is perhaps the best-known open project trying to do so, but it’s far from perfect and is unable to handle many common resources.

The second issue, perhaps related to the first, is that it’s not trivial to determine from the HTTP request which resource it belongs to. Some resources span multiple domains and subdomains; some are easier to parse into more granular subresources (e.g. platform -> database -> journal) while others are less clear. Even looking at what is proxied in the EZproxy configuration file is not a complete solution.
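To illustrate the second problem, here is a toy sketch of mapping the host in a proxied request back to a named resource. The domain-to-resource table is invented; a real mapping would have to cover far more domains and edge cases, which is exactly why this is hard.

```python
# Toy sketch: map a proxied URL's host to a named resource.
# The domain table is invented for illustration.
import re

DOMAIN_TO_RESOURCE = {
    "ebscohost.com": "EBSCOhost",
    "sciencedirect.com": "ScienceDirect",
    "jstor.org": "JSTOR",
}

def resource_for(url):
    host = re.sub(r"^https?://", "", url).split("/")[0]
    # Match on the registered domain so any subdomain maps to the same resource
    for domain, name in DOMAIN_TO_RESOURCE.items():
        if host == domain or host.endswith("." + domain):
            return name
    return "unknown"

print(resource_for("https://search.ebscohost.com/login.aspx"))  # EBSCOhost
print(resource_for("https://www.example.org/foo"))              # unknown
```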

Can our systems do better?

One solution that has recently been getting a lot more buzz (or so it seems to me) is OpenAthens.


My understanding of OpenAthens is still basic, so I won’t talk about what it is, but I will say this: compared to EZproxy it provides better usage statistics. As seen above, you automatically get statistics summarised by user group and, more importantly, by which resource was accessed (measured by sessions, basically).

I haven’t played with it enough to know how granular the reporting is, though: whether it is at content provider level (e.g. EBSCO), database level (e.g. Business Source Complete) or journal level (e.g. Harvard Business Review).

On the downside, compared to EZproxy you don’t get any data once users get past the authentication login. Nor do you get downloads.

The next product I will discuss, RemoteXs, handles this aspect beautifully. It’s a service I discovered at a conference in India, and I will probably blog more about it in the future.

It is essentially, I think, an EZproxy competitor built with modern requirements in mind. What makes it relevant here is the extremely rich usage data captured in its analytics dashboards.

This particular view in the dashboard blew my mind.

RemoteXs dashboard – logins/downloads by user category

You can click on the image for a closer view, but it basically displays, for the last 30 days and for each category of user, the number of logins, the number of users, total downloads, download data (in MB) and browsing data (in MB).

I’m not sure what the definitions for each of these metrics are yet, but they look really promising.

Obviously you can see the same data in other views such as by day, by resource type, even by user.


RemoteXs dashboard – logins/downloads by day


RemoteXs dashboard – logins/downloads by resource



RemoteXs dashboard – logins/downloads by individual user


The appetite for data and analytics is growing. I note the rise of a new class of services, such as Mendeley for Institutions and figshare for institutions, that offer a free service to entice researchers on board and then turn to institutions with a premium service: their users get some add-ons but, more importantly, the institution gets analytics of user behaviour on those platforms.
New services like RedLink and JISC’s Journal Usage Statistics Portal offer to help libraries create dashboards of electronic usage statistics (via COUNTER?).

It will be interesting to see how all this plays out.

All in all, using traditional metrics or altmetrics to measure use of resources by your users seems possible in some cases, though often it may not add much signal beyond the usual download statistics.
