I tried using oaDOI.org API to find the amount of Open Access and you won’t believe what I found! (Click bait title)
By now most people would have heard of the very useful Unpaywall service that allows you to bypass many paywalled articles by sending you to free versions.
While not perfect or fully comprehensive, it’s one of the easiest ways to find out whether a free version of an article exists. The service underlying Unpaywall is oaDOI.org, which provides a nifty free API that you can use.
Note: Unpaywall itself can find more articles compared to the oaDOI.org service, because it “supplements oaDOI with other data sources, too; for instance, Unpaywall tries to parse and understand scholarly article pages as you view them. Consequently, Unpaywall’s results are a bit more comprehensive than what you’d get by calling oaDOI directly.”
How to use the oaDOI API to find out how much is free
All you need is a list of DOIs whose free availability you are curious about. Point the oaDOI.org API at them and it returns useful information, such as whether a free-to-read version exists, the URL of the free version (if one exists), and the OA color of that version.
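If you’d rather see what a single lookup looks like outside OpenRefine, here is a minimal Python sketch. It assumes the per-DOI URL pattern used later in this post (base URL + DOI + `?email=`). The endpoint may have changed since this was written, so check the current API docs. The function names are my own.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as used in this post; it may since have been superseded.
OADOI_BASE = "https://api.oadoi.org/"


def build_oadoi_url(doi: str, email: str) -> str:
    """Build the lookup URL for one DOI: base + DOI + ?email=..."""
    # Keep "/" unescaped so the DOI's prefix/suffix separator survives;
    # escape anything else (spaces etc.) that isn't URL-safe.
    path = urllib.parse.quote(doi.strip(), safe="/")
    query = urllib.parse.urlencode({"email": email})
    return f"{OADOI_BASE}{path}?{query}"


def fetch_oadoi(doi: str, email: str, timeout: float = 30.0) -> dict:
    """Call the API for one DOI and return the parsed JSON response."""
    with urllib.request.urlopen(build_oadoi_url(doi, email), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical DOI and email, just to show the URL that gets built.
    print(build_oadoi_url("10.1234/example.5678", "you@example.com"))
```

Running it prints `https://api.oadoi.org/10.1234/example.5678?email=you%40example.com`. The `@` is percent-encoded, which servers decode as normal.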
The very useful guide “Collecting Open Access information using OpenRefine and the oaDOI API” sets out the steps, but what can you really do with this?
Scenario 1: Check how much of your institution’s scholarly output is free to read.
Ever wondered how much of your institution’s scholarly output is already free to read? While it might be impossible to know with 100% certainty, you can certainly get a ballpark figure using the oaDOI.org API.
First you need a source of DOIs. Perhaps the easiest option is to run an affiliation search in Scopus or Web of Science and export the results, with DOIs, to CSV. This is what I did. Other sources of DOIs, such as your institutional repository or CRIS, work too.
If your institution’s output exceeds 2,000 results, you may need to export in several batches (for example, by year). Once you have checked the results you need, select “Export”.
In my case, I exported records from 2013 to 2017, a total of 1,968 records.
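If you end up with several export files, you need to combine them before loading. Here is a small Python sketch of that step. It assumes each CSV has a header row with a column named `DOI`, as my Scopus export did. The helper names are hypothetical. Rows without a DOI are all kept, because there is nothing to deduplicate them on.

```python
import csv


def read_rows(path):
    """Read one CSV export into a list of dicts keyed by column header."""
    # utf-8-sig drops a leading byte-order mark if the export has one
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def dedupe_by_doi(rows, doi_column="DOI"):
    """Drop rows whose DOI (compared case-insensitively) was already seen.

    Rows with a blank DOI are always kept.
    """
    seen = set()
    kept = []
    for row in rows:
        doi = (row.get(doi_column) or "").strip().lower()
        if doi:
            if doi in seen:
                continue
            seen.add(doi)
        kept.append(row)
    return kept


def merge_exports(paths, doi_column="DOI"):
    """Concatenate several exports (e.g. one per year) and remove repeated DOIs."""
    rows = [row for path in paths for row in read_rows(path)]
    return dedupe_by_doi(rows, doi_column)
```

For example, `merge_exports(["scopus_2013_2015.csv", "scopus_2016_2017.csv"])` (hypothetical file names) returns one list of rows ready to write back out with `csv.DictWriter`.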
Then I ran OpenRefine and waited for the web interface to load.
Once it had loaded, I clicked Create project -> Get data from this computer -> Choose file, then browsed to the file I had just downloaded from Scopus and selected it.
After checking that everything had loaded correctly, I clicked “Create project”.
Once everything was imported into OpenRefine, I checked how many of these records had DOIs, because the oaDOI API only works with DOIs.
To do that, go to the DOI column (click on the down arrow) -> Facet -> Customized facets -> Facet by blank.
The facet panel on the left shows that 220 records have no DOI (blank = true), so 1,748 have DOIs. The oaDOI.org API will only work on those 1,748 records, so I clicked on false in the panel. This filters down to just the records that have DOIs and ignores the blank ones.
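For comparison, here is the same blank/non-blank split as a tiny Python function. It assumes the rows are dicts with a `DOI` key, as you would get from reading the export with `csv.DictReader`. The function name is hypothetical. Whitespace-only values count as blank here, which may differ slightly from OpenRefine’s own definition of blank.

```python
def split_by_doi(rows, doi_column="DOI"):
    """Rough equivalent of 'Facet by blank' on the DOI column.

    Returns (with_doi, without_doi). Missing, empty or whitespace-only
    values all count as blank.
    """
    with_doi, without_doi = [], []
    for row in rows:
        value = (row.get(doi_column) or "").strip()
        (with_doi if value else without_doi).append(row)
    return with_doi, without_doi
```

With my export, the two lists would have 1,748 and 220 entries respectively.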
The next step is where the magic happens. On the DOI column (click on the down arrow), click Edit column -> Add column by fetching URLs.
In the expression box, type "https://api.oadoi.org/" + value + "?email=email@example.com". For each record, this builds an API call to oaDOI using that record’s DOI.
Don’t forget to name the new column; I used oadoi. No registration is needed to use the API. There are no rate limits, though staying below 100k calls per day is suggested. Do remember to replace the email address in the expression with your own.
It will take roughly 2–3 hours for the API calls to complete. When they are done, you will see the new column (in my case, oadoi) appear.
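The long wait is mostly deliberate pausing between calls. The fetch dialog has a throttle delay setting, 5,000 ms by default I believe, and 1,748 calls × 5 s comes to about 2.4 hours. Here is a minimal Python sketch of the same batch loop. It is an assumption-laden stand-in for OpenRefine, not how OpenRefine works internally. The helper names are hypothetical. `fetch` can be swapped out, so you can test the loop without hitting the API. A failed call is stored as `None` rather than stopping the whole run.

```python
import json
import time
import urllib.parse
import urllib.request


def default_fetch(doi, email):
    """Hypothetical helper: one GET per DOI against the endpoint used in this post."""
    url = (
        "https://api.oadoi.org/"
        + urllib.parse.quote(doi.strip(), safe="/")
        + "?"
        + urllib.parse.urlencode({"email": email})
    )
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_all(dois, email, fetch=default_fetch, delay_seconds=5.0):
    """Look up each DOI in turn, sleeping between calls.

    Returns {doi: parsed JSON, or None if that call failed}.
    """
    results = {}
    for i, doi in enumerate(dois):
        if i and delay_seconds > 0:
            time.sleep(delay_seconds)  # be polite: pause before every call but the first
        try:
            results[doi] = fetch(doi, email)
        except (OSError, ValueError) as exc:
            # OSError covers URLError/HTTPError and timeouts; ValueError covers bad JSON
            print(f"{doi}: {exc}")
            results[doi] = None
    return results
```

Shortening `delay_seconds` speeds things up, but keep it well within the suggested 100k calls per day.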
The data extracted is in JSON, which may look intimidating, but don’t worry: OpenRefine provides tools to handle it.
On the oadoi column, click on the down arrow -> Edit column -> Add column based on this column.
In the expression box, try value.parseJson().results.is_free_to_read
Again, remember to enter a name for the new column.
This extracts the value of the key “is_free_to_read” for each record. Records with the value “true” are those for which oaDOI found a free version.
You can repeat the process to create new columns with expressions such as:
a. value.parseJson().results.free_fulltext_url (creates a column with the URL of the free full text found)
b. value.parseJson().results.oa_color (creates a column with the OA color: Green/Gold/Blue)
Want to know more about the fields you can add beyond oa_color or free_fulltext_url? Refer to the oaDOI API documentation.
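The same parsing can be sketched in Python for one response at a time. This sketch is built only from the field names used in this post. The expressions above read those fields directly off `results`. In case the API instead wraps them in a list, the helper also accepts a one-element list, but that is an assumption on my part, not something documented here. The sample response in the tests is hypothetical.

```python
FIELDS = ("is_free_to_read", "free_fulltext_url", "oa_color")


def extract_fields(response):
    """Return the fields used in this post from one parsed oaDOI response.

    A missing response or field comes back as None, like a blank cell.
    """
    results = (response or {}).get("results") or {}
    # Assumption: results is a single object (as the expressions above read it)
    # or a list of objects; in the latter case, take the first one.
    if isinstance(results, list):
        results = results[0] if results else {}
    return {field: results.get(field) for field in FIELDS}
```

Each returned dict corresponds to one row of the new columns.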
How much of the content is free to read?
In my example above, I extracted fields such as OA_color, Free to read (“true” if a free copy was found, “false” otherwise), Free Text URL (blank if no free full text was found), Found Green OA and Hybrid available (“true” if the copy is free to read in a hybrid journal).
With this data, OpenRefine makes it easy to work out that of the 1,748 records with DOIs, 260 (14.8%) were free to read.
In other words, oaDOI.org found free-to-read copies for 14.8% of my institution’s output with DOIs. This could be an underestimate, because content in my institution’s repository occasionally has trouble being discovered by oaDOI.org.
By OA color, 27 (1.5%) were Gold, 226 (12.9%) Green and 7 (0.4%) Blue, with percentages taken over all 1,748 records with DOIs.
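The counting that OpenRefine’s facets do here can also be sketched in Python. This assumes each record is a dict with `is_free_to_read` and `oa_color` entries, which is my own naming mirroring the API fields. The hypothetical `is_true` helper accepts either real booleans or the "true"/"false" strings a CSV export would contain. Percentages are taken over all records passed in, as in the figures above.

```python
from collections import Counter


def is_true(value):
    """Treat True or the string 'true' (any case) as true; anything else as false."""
    return value is True or str(value).strip().lower() == "true"


def pct(part, whole):
    """Percentage rounded to one decimal place; 0.0 when there is nothing to divide by."""
    return round(100 * part / whole, 1) if whole else 0.0


def summarise(records):
    """Tally free-to-read records and their OA colors among records that have a DOI."""
    total = len(records)
    free_records = [r for r in records if is_true(r.get("is_free_to_read"))]
    colors = Counter(
        (r.get("oa_color") or "unknown").strip().lower() for r in free_records
    )
    return {
        "total": total,
        "free": len(free_records),
        "free_pct": pct(len(free_records), total),
        # color -> (count, percent of all records with a DOI)
        "by_color": {c: (n, pct(n, total)) for c, n in colors.items()},
    }
```

Fed the counts above (27 Gold, 226 Green and 7 Blue among 1,748 records), it returns 1.5%, 12.9% and 0.4%. For the overall share it rounds 260/1,748 to 14.9%, whereas the 14.8% figure above truncates.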
Interestingly, publication output slowly increased from 2013 to 2016 (2017 is still in progress), yet the older an article is, the more likely it is to be free to read. Is this the effect of embargoes? To confirm, I would need to match the articles and the journals they appear in against Sherpa RoMEO.
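To see this pattern by year, here is a minimal Python sketch of the per-year breakdown. It assumes each record carries a publication year under a `Year` key, which I believe Scopus CSV exports include but you should check your own file. It also assumes a free-to-read flag under `is_free_to_read`. Both key names and the function name are assumptions.

```python
from collections import defaultdict


def is_true(value):
    """Treat True or the string 'true' (any case) as true."""
    return value is True or str(value).strip().lower() == "true"


def free_share_by_year(records, year_key="Year"):
    """Return {year: (free, total, percent free)} in year order.

    Records without a year are skipped.
    """
    totals = defaultdict(int)
    free = defaultdict(int)
    for r in records:
        year = str(r.get(year_key) or "").strip()
        if not year:
            continue
        totals[year] += 1
        if is_true(r.get("is_free_to_read")):
            free[year] += 1
    return {
        y: (free[y], totals[y], round(100 * free[y] / totals[y], 1))
        for y in sorted(totals)
    }
```

Reading the percentages alongside the totals guards against over-reading a year with only a handful of articles.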
You can use the Free Text URL column to study where researchers are putting up their papers. You can even download the papers and deposit them in your own IR, if you have a practice of doing so.
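One quick way to see where the free copies live is to count the host names in those URLs. Here is a small Python sketch of that, assuming you have the Free Text URL column as a list of strings (blanks allowed). The function name is hypothetical, and stripping a leading "www." is just my choice to merge obvious duplicates.

```python
from collections import Counter
from urllib.parse import urlparse


def hosts_of_free_copies(urls):
    """Count which hosts the free full-text URLs point to, most common first.

    Blank or missing entries are skipped.
    """
    hosts = Counter()
    for url in urls:
        if not url or not url.strip():
            continue
        host = urlparse(url.strip()).hostname or ""  # hostname is already lowercased
        if host.startswith("www."):
            host = host[4:]
        if host:
            hosts[host] += 1
    return hosts.most_common()
```

The result is a list of (host, count) pairs, which makes repositories, preprint servers and personal pages easy to tell apart.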
You can go further and use OpenRefine with the Sherpa RoMEO API to check all articles, including those without free-to-read versions. This shows how much more could potentially be made open access.
Scenario 2: Check how much of a journal you are considering subscribing to or cancelling is free to read.
One of the longest-standing debates in the open access world is whether embargoes are needed. Publishers, of course, claim embargoes are needed to protect themselves. Otherwise, they argue, librarians would start cancelling subscriptions because self-archived versions are available.
Some open access advocates claim that librarians will never cancel subscriptions because of Green OA self-archiving, since that could only happen if Green OA reached 100% for the title.
My view is this.
@RickyPo Ok. I don’t know of any librarians who cancel journals based on embargo policy. Green OA levels not high enough plus hard to figure out
The “hard to figure out” part is slowly changing with commercial services like 1science’s OAfigr. With the magic of oaDOI and OpenRefine, though, you can work out a similar statistic with some effort by following the same steps as before.
The only difference is that you use the DOIs of the articles in the journal you are interested in. Again, you can get these from Scopus or, better yet, from Crossref’s API (especially if the title is not indexed in Scopus). Once you have the list of DOIs, the same steps as above apply.
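For the Crossref route, here is a rough Python sketch that collects every DOI for a journal by ISSN. It relies on my understanding of Crossref’s REST API, so verify these details against Crossref’s documentation before relying on them:
- a `/journals/{issn}/works` route;
- `rows`, `cursor`, `select` and `mailto` query parameters;
- cursor-based paging that starts from `*` and returns a `next-cursor`;
- DOIs under `message.items[].DOI`.

The ISSN in the tests is a placeholder, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed Crossref REST route for a journal's works, keyed by ISSN.
CROSSREF_BASE = "https://api.crossref.org/journals/{issn}/works"


def page_url(issn, cursor="*", rows=1000, mailto=None):
    """URL for one page of a journal's works, asking only for the DOI field."""
    params = {"rows": rows, "cursor": cursor, "select": "DOI"}
    if mailto:
        params["mailto"] = mailto  # identifies you to Crossref
    return CROSSREF_BASE.format(issn=issn) + "?" + urllib.parse.urlencode(params)


def dois_from_page(payload):
    """Pull the DOIs and the next cursor out of one parsed response page."""
    message = payload.get("message", {})
    dois = [item["DOI"] for item in message.get("items", []) if item.get("DOI")]
    return dois, message.get("next-cursor")


def journal_dois(issn, mailto=None):
    """Page through all works for an ISSN until a page comes back empty."""
    dois, cursor = [], "*"
    while True:
        with urllib.request.urlopen(page_url(issn, cursor, mailto=mailto), timeout=60) as resp:
            page, cursor = dois_from_page(json.load(resp))
        dois.extend(page)
        if not page or not cursor:
            return dois
```

The resulting list can be saved as a one-column CSV and loaded into OpenRefine exactly like the Scopus export.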
So far I have only tried this for 3 journals, focusing on LIS-related ones. What do you think I found?
What % of LIS journals do you expect to be free to read?
So far I’ve tried only 2 LIS titles, and these are the results:
Journal of Business and Finance Librarianship (Taylor & Francis): of 310 articles with DOIs (1990–2017), only 11 were free (Green), that is, 3.5%.
Journal of Academic Librarianship (Elsevier): of 1,398 articles with DOIs (1993–1996, 2001–2017), only 112 were free (11 Blue, 101 Green), that is, 8.0%.
I don’t see any particular pattern between year of publication and likelihood of being free to read. For the latter journal, though, the highest counts are in 2014 (20), 2007 (14) and 2013 (11), all in the later years. These correspond to 18.9%, 17.1% and 12.4% of each year’s output.
For reference, this is what Sherpa RoMEO says is allowed for the Journal of Academic Librarianship:
What to make of these results?
First, I was kinda surprised by the lower rate of free-to-read articles for the LIS journals compared to my institution’s rate, though of course it’s only 2 journals.
To make my institution’s output and the Journal of Academic Librarianship comparable, I restricted both to articles published from 2013 onwards. The free-to-read rate is then 14.8% vs 10.0%.
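The restriction is simple to express in code. Here is a minimal Python sketch, assuming the same hypothetical record shape as before: a `Year` key and an `is_free_to_read` flag that may be a boolean or a "true"/"false" string. Records whose year can’t be read as a number are skipped.

```python
def is_true(value):
    """Treat True or the string 'true' (any case) as true."""
    return value is True or str(value).strip().lower() == "true"


def free_share_since(records, first_year, year_key="Year"):
    """Return (free, total, percent free) for records published in or after first_year."""
    in_range = []
    for r in records:
        try:
            year = int(str(r.get(year_key, "")).strip())
        except ValueError:
            continue  # no usable year
        if year >= first_year:
            in_range.append(r)
    free = sum(1 for r in in_range if is_true(r.get("is_free_to_read")))
    total = len(in_range)
    return free, total, (round(100 * free / total, 1) if total else 0.0)
```

Calling it with `first_year=2013` on both datasets gives the like-for-like comparison described above.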
Of course, things aren’t exactly comparable because of the differences in disciplines (my institution is a mix of Economics/Business/Social Science/Law/Information Systems). Still, I expected librarians to be more aware of the possibility of self-archiving.
Another thing to note is that the oaDOI API (unlike Unpaywall) probably doesn’t find content whose legal status is unclear (e.g. on ResearchGate). However, this gap is likely to undercount non-librarians more, since they are more likely to deposit such copies.
The pattern of free-to-read articles by year for my institution from 2013 onwards is interesting: archiving rates are clearly higher the older the article is. I will probably need to study older articles to see whether this holds further back in time.
For journal titles I have the advantage of being able to study longer periods, but the results are unclear because of the smaller numbers. For the one journal I’ve studied this way so far, self-archiving rates seem higher from the 2010s onwards.
I’m not sure what to make of this; more study is needed. Libraries that subscribe to OAfigr probably have more accurate and in-depth statistics, of course.