Last month I wrote about 4 different ways of measuring library eresource usage and argued that one can obtain usage statistics either from publishers or from one’s own analysis of EZproxy logs. One can also count either downloads or sessions, leading to a 2-by-2 matrix of measures.
However, there are in fact other ways that can provide alternative dimensions to measure usage or even help predict use.
My odd idea/insight is this: while bibliometrics and altmetrics tend to be used to measure the impact of articles/items produced by your users (in terms of cites or altmetric mentions), is it also possible to flip this around and look at what your users are citing or using (as measured by altmetrics) as a measure of use?
A question to consider: is it better to measure use by, say, citations from all users to the resource, or citations from only your users? The latter takes into account local conditions (e.g. Nature might be well cited globally, but your institution might have no hard-sciences people), but in many cases the former, as you will see, is much easier to obtain, because the nature of some altmetrics (e.g. tweets) makes tracking the user’s institutional affiliation difficult.
The other question is whether such measurements add dimensions beyond just looking at sessions or downloads, or whether they are simply correlated with them. Much research has of course been done on whether traditional citations, altmetrics and usage factors correlate.
In this long post, I consider using citations from traditional citation indexes like Scopus, more unconventional sources like OpenCitations, syllabi (the Open Syllabus Project and reading-list software), citations to datasets, and various altmetrics (reference managers, social media etc.).
Usage based on citations made by your users
While knowing that your electronic resource has been accessed or downloaded by someone is useful, ultimately what you really want to know is whether the resource was used. For instance, someone might have searched your database multiple times, generating many sessions, or even downloaded a few articles, generating many downloads, but in the end never used or even read any of them.
One downstream way to measure use for journal articles would be simply to see if your users are citing from resources you offer. While not a fool-proof method (your faculty might be citing an article that you provide via subscriptions but how sure are you they accessed it via your subscription as opposed to finding a free copy?), it does provide a different dimension to just tracking downloads.
Recently I realized (with help!) that with tools like Scopus it’s fairly easy to find what journal sources are actually being cited by your faculty/researchers.
Simply go to Scopus and do an affiliation search for your institution,
Select the correct institution grouping and “show documents”. On that screen, select “all references” and then “view references”. If you can’t find the “view references” link, it might be hidden under a drop-down menu, depending on your screen resolution.
To be clear this is the reverse of the usual citation analysis we tend to do. We are not looking at who cites papers produced by our faculty, rather we are looking at what our faculty are citing.
Do note that this analysis only works for up to 2,000 documents, so you may need to split it up by first limiting by year or subject.
In my example, I am looking at 37,337 references cited by 1,157 items indexed in Scopus (I limited to year of publication 2015-2017 to get below the 2k limit) published by my faculty.
Then you can click on “Analyse search results” to take a closer look at the 37k items.
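As an aside, the same affiliation search can in principle be scripted rather than done through the web interface. Below is a minimal sketch in Python against what I understand to be Elsevier’s Scopus Search API (the endpoint and the `X-ELS-APIKey` header follow their developer documentation; the AF-ID value is invented, and you would need your own subscription-entitled API key):

```python
import json
import urllib.parse
import urllib.request

# Endpoint per Elsevier's developer documentation (assumption; check your entitlement)
SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"

def build_affiliation_query(af_id, year_from, year_to):
    """Scopus advanced-search string for one affiliation and an inclusive year range."""
    return f"AF-ID({af_id}) AND PUBYEAR > {year_from - 1} AND PUBYEAR < {year_to + 1}"

def search_affiliation(api_key, af_id, year_from, year_to):
    """Fetch the first page of matching documents (network call; needs a real key)."""
    query = urllib.parse.urlencode(
        {"query": build_affiliation_query(af_id, year_from, year_to)}
    )
    req = urllib.request.Request(
        f"{SCOPUS_SEARCH_URL}?{query}",
        headers={"X-ELS-APIKey": api_key, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["search-results"]["entry"]

# Example (hypothetical AF-ID and key):
# docs = search_affiliation("MY-KEY", "60012345", 2015, 2017)
```

The year limits in the query string mirror the manual workaround above for the 2,000-document cap.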
You can do various things from here, but you can definitely see which journal sources are cited the most by your faculty. This supplements your electronic resource download and session statistics. I suppose one could even generate journal-level statistics like cost per cite, or even cites per download. But bear in mind the earlier point that you can’t assume a cite is the result of your providing access, and also that some journals provide many more articles, leading to more possible cites, among various other issues (e.g. subject differences).
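To make the cost-per-cite idea concrete, here is a back-of-envelope sketch in Python; all the figures are invented, and the same caveats (subject differences, free copies, journal size) apply to any number it produces:

```python
def cost_per_cite(annual_cost, cites):
    """Subscription cost divided by cites from your faculty; infinite if never cited."""
    return annual_cost / cites if cites else float("inf")

def cites_per_download(cites, downloads):
    """Rough signal of how much downloaded material actually gets cited."""
    return cites / downloads if downloads else 0.0

# Invented example figures for two journals
journals = {
    "Journal A": {"cost": 5000.0, "cites": 250, "downloads": 1000},
    "Journal B": {"cost": 8000.0, "cites": 100, "downloads": 4000},
}

for name, j in journals.items():
    print(name,
          "cost/cite:", round(cost_per_cite(j["cost"], j["cites"]), 2),
          "cites/download:", round(cites_per_download(j["cites"], j["downloads"]), 3))
```

With these invented numbers, Journal B looks heavily downloaded but lightly cited, which is exactly the kind of divergence between dimensions this post is about.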
As a sidenote, I notice articles like this are already trying to measure cost per Impact Factor, Eigenfactor, and Article Influence Score. The difference between what I suggested and those measures is that the measures mentioned in the article are global in scope and don’t take into account local factors. For example Nature might be high in cites, but that means nothing if none of your faculty are in the hard sciences and hence don’t cite it.
Don’t have Scopus or want to go beyond Scopus? There’s now the OpenCitations initiative, which could possibly be used (I haven’t looked closely).
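For the curious, OpenCitations does expose a public REST API for its COCI index. A rough sketch of pulling the outgoing references of a single DOI, assuming the v1 `references` operation as documented on the OpenCitations site (no API key needed, but coverage is far from complete):

```python
import json
import urllib.request

# OpenCitations COCI REST API base (per their public documentation)
COCI_BASE = "https://opencitations.net/index/coci/api/v1"

def references_url(doi):
    """URL for the COCI 'references' operation: outgoing citations of one DOI."""
    return f"{COCI_BASE}/references/{doi}"

def fetch_references(doi):
    """Return the list of reference records for one DOI (network call)."""
    with urllib.request.urlopen(references_url(doi)) as resp:
        return json.load(resp)

# Example (network required; DOI is illustrative):
# refs = fetch_references("10.1108/JD-12-2013-0166")
# cited_dois = [r["cited"] for r in refs]
```

In principle you could loop this over the DOIs of your faculty’s papers and tally the cited journals, Scopus-style, though you would first need a DOI list from somewhere.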
Citing from Syllabus
Citing from books
Citing of datasets
Another type of cite that I have always thought would be interesting to measure is citations to datasets. While many datasets in STEM are free, those of you working in the financial/business areas will be aware that libraries spend tons of money buying financial, company and economic datasets.
Academic libraries buy such data or provide access to it at high cost via systems and platforms like Datastream, Eikon, WRDS, Bloomberg etc.
While such data is critical to researchers (they use it to conduct research and literally couldn’t do any research without it), measuring use is difficult. After all, most researchers would request that we purchase a dataset, then end up downloading everything once, and that’s it.
But is the dataset eventually really used? The key, of course, is to see whether the papers produced by those researchers cite or mention their use of the dataset. Unlike cites to journal articles, which are often glancing or secondary, a cite to a dataset usually signals substantial use.
The problem is that until recently it was very tough to check which datasets were cited or used in papers. I’ve toyed with the idea of using Google Scholar to somehow mine the data by searching full-text papers for mentions of WRDS, Eikon etc.
As such, I read with great interest the following announcement by Elsevier titled “Wharton Research Data Services, SSRN and Elsevier Announce Groundbreaking New Collaboration — Elevating Research and Researchers Who Use WRDS”
I haven’t seen it yet, but the announcement states that a WRDS Research Paper Series (RPS) will be established. It will be “a searchable repository of all papers submitted to SSRN that cite WRDS in their work.”
I’m unsure how the system determines whether a paper is citing WRDS, whether that determination is automatic or manual, or whether the classification is nuanced enough to tell you if the paper actually used the data in its analysis.
I can imagine though librarians might use this to check what papers produced by their faculty are in this working papers series as a first cut.
Of course this covers only WRDS, so it would be nice if similar arrangements appeared for other data sources.
Elsevier (them again!) also blogged this month that they are enabling Research Dataset Linking on the Scopus document pages.
They mention something about the Data Literature Interlinking (DLI) service and Scholix, which I don’t know anything about but will study further.
Usage based on altmetrics – reference managers
While I have always had some interest in bibliometrics, I found it curious that until recently I had a blind spot about the fact that one could use citation data to gauge usage of library resources. I think the reason is that bibliometrics tends to be outward-looking: we are typically interested in who is citing us, or, as librarians, we help faculty see who is citing the papers they wrote.
Here we are looking at the reverse: what are our faculty citing? With this insight, it’s natural to wonder whether, besides traditional citation measures, one could use altmetrics to gauge usage of library resources.
Of all altmetrics, one of the most promising, and the one getting the most attention, is looking at what is in reference managers, as various studies have shown this to be a leading indicator for predicting cites.
Of all reference managers, Mendeley (owned by Elsevier, again) gets the most attention because getting data from it is relatively easy; hence many have settled on the number of Mendeley “readers” (people who have the article in their reference library) as an altmetric measure for each article.
So for our purposes, we want to mine Mendeley to see what articles are in the reference managers of our faculty. Leaving aside the privacy issues, how can one do so? There’s a Mendeley API, but I’m not sure this is easily doable.
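One limitation worth noting: as far as I can tell, the public Mendeley API exposes aggregate catalog statistics (global reader counts per document), not the contents of individual users’ libraries at your institution. A hedged sketch of that aggregate lookup, assuming the catalog endpoint, `stats` view and OAuth bearer token described in Mendeley’s developer documentation:

```python
import json
import urllib.parse
import urllib.request

# Mendeley catalog endpoint (assumption based on their developer docs)
MENDELEY_CATALOG = "https://api.mendeley.com/catalog"

def catalog_url(doi):
    """Catalog lookup by DOI, requesting the stats view (includes reader counts)."""
    return MENDELEY_CATALOG + "?" + urllib.parse.urlencode({"doi": doi, "view": "stats"})

def reader_count(access_token, doi):
    """Return the global Mendeley reader count for a DOI (needs an OAuth token)."""
    req = urllib.request.Request(
        catalog_url(doi),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/vnd.mendeley-document.1+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        docs = json.load(resp)
    # The catalog may return several matching records; take the first, if any.
    return docs[0].get("reader_count", 0) if docs else 0
```

This gives you the global “readers” altmetric per article, not the per-institution view; for the latter you seem to need the institutional product discussed next.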
One sure-fire way of getting such information is to subscribe to Mendeley Institutional Edition. Subscribing gives researchers at your institution extra benefits such as more cloud space, but most importantly for our purposes it gives you, the librarian, analytics.
I don’t have access to a recent screenshot of the analytics dashboard; this is a much older one released back when the service first became available. It shows the top journals among articles in researchers’ Mendeley libraries.
Again, this gives you insight into your researchers’ use of journals beyond pure downloads, accesses or sessions.
As an aside, I believe ProQuest’s reference manager, ProQuest Flow, used to offer an institutional edition with analytics similar to Mendeley’s. Now that it has been folded into RefWorks, it’s possible similar statistics can still be found.
Usage based on browzine
Many libraries today subscribe to BrowZine, an app that makes browsing favourite journals easier.
As your users add journal titles to their bookshelves, this probably serves as a measure of popularity.
Usage based on other altmetrics
PlumX, which perhaps has the most complete set of altmetrics, classifies them into 5 broad categories.
Systems that measure sessions and downloads in an easier way – OpenAthens & RemoteXs
When I blogged about 4 different ways of measuring library eresource usage, I lamented that getting usable statistics from EZproxy logs was a big pain.
There seem to be 2 issues here.
Firstly, and most well known, it is extremely difficult to identify which of the requests in the logs refer to downloads. ezPAARSE, which I blogged about before, is perhaps the best-known open project trying to do this, but it’s far from perfect and is unable to handle many common resources.
The second issue, perhaps related to the first, is that it’s not trivial to determine which resource an HTTP request belongs to. Some resources span multiple domains and subdomains; some are easier to parse into more granular subresources (e.g. platform -> database -> journal), while others are less clear. Even looking at what is proxied in the EZproxy configuration file is not a complete solution.
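To illustrate why both problems are hard, here is a toy Python sketch of the usual approach: map request hostnames to resources via a hand-maintained table, and treat PDF requests as a crude download proxy. Both the mapping table and the “.pdf means download” heuristic are my own simplifications for illustration, not how ezPAARSE actually works:

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hand-maintained hostname-fragment -> resource table (you would build this
# yourself, e.g. from your EZproxy config stanzas; entries here are examples).
HOST_TO_RESOURCE = {
    "ebscohost.com": "EBSCOhost",
    "sciencedirect.com": "ScienceDirect",
    "jstor.org": "JSTOR",
}

# Minimal pattern for a common-log-format line: ip - user [date] "METHOD url ..."
LOG_RE = re.compile(r'^\S+ \S+ (?P<user>\S+) \[[^\]]+\] "(?P<method>\S+) (?P<url>\S+)')

def classify(url):
    """Map a requested URL to a resource name, or 'unknown' if no fragment matches."""
    host = urlparse(url).netloc.lower()
    for fragment, resource in HOST_TO_RESOURCE.items():
        if fragment in host:
            return resource
    return "unknown"

def tally(lines):
    """Count requests per resource, plus PDF requests as a rough download proxy."""
    requests, downloads = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        resource = classify(m.group("url"))
        requests[resource] += 1
        if ".pdf" in m.group("url").lower():
            downloads[resource] += 1
    return requests, downloads
```

Even in this toy version, the weaknesses are obvious: any host not in the table falls into “unknown”, and plenty of real downloads are not served as `.pdf` URLs, which is exactly why projects like ezPAARSE need per-platform parsers.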
Can our systems do better?
One solution that has recently been getting a lot more buzz (or so it seems to me) is OpenAthens.
My understanding of OpenAthens is still basic, so I won’t talk about what it is, but I will say this: compared to EZproxy it provides better usage statistics. As seen above, you automatically get statistics summarised by user group and, more importantly, by which resource was accessed (in terms of sessions).
I haven’t played with it enough to know how granular the access data is, though, whether at the content-provider level (e.g. EBSCO), the database level (e.g. Business Source Complete) or the journal level (e.g. Harvard Business Review).
On the downside, compared to EZproxy you don’t get any data once users get past the authentication login, nor do you get downloads.
The next product I will discuss, RemoteXs, handles this aspect beautifully. It’s a service I discovered when I was in India for a conference, and I will probably blog more about it in the future.
It is essentially, I think, an EZproxy competitor built with modern requirements in mind. The part most relevant to this discussion is that it captures extremely rich usage data in its analytics dashboards.
This particular view in the dashboard blew my mind.
You can click on the image for a closer view, but it basically displays, for the last 30 days and for each category of user, the number of logins, the number of users, total downloads, download data (in MB) and browsing data (in MB).
I’m not sure what the definitions for each of these metrics are yet, but they look really promising.
Obviously you can see the same data in other views such as by day, by resource type, even by user.
It will be interesting to see how all this plays out.
All in all, using traditional citation metrics or altmetrics to measure use of resources by your users seems possible in some cases, though it may often not add additional signal beyond the usual download statistics.