Scholarly maps, recommenders & reference managers at Crossref Live 17

I recently attended the Crossref Live17 event in Singapore. These events tend to have a heavy publisher presence, as publishers make up most of Crossref's membership.

Still, I am a bit of a DOI nerd. I have long enjoyed watching Crossref webinars to understand what goes on in the background for DOIs to work (hint: it helps a lot for troubleshooting broken links in our discovery services), and I recently started playing with their Crossref Event Data API, so this was a good opportunity to attend a non-librarian conference. It helped that it was held just a stone's throw away from where I work and charged no registration fee. I really enjoyed it, and I am still thinking about what was presented days after the event, particularly the discovery implications.
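Incidentally, if you want to play with the Event Data API yourself, it takes only a few lines. The sketch below reflects my reading of the Event Data documentation (the endpoint, parameters and field names could change), so treat it as illustrative rather than a reference client; the email address is a placeholder, and the DOI is Crossref's well-known test DOI.

```python
import requests

# Query the Crossref Event Data API for events around a single DOI.
# Endpoint and parameter names follow my reading of the Event Data
# docs; treat this as an illustrative sketch, not a reference client.
ENDPOINT = "https://api.eventdata.crossref.org/v1/events"

params = {
    "mailto": "you@example.org",   # placeholder contact for the polite pool
    "obj-id": "10.5555/12345678",  # Crossref's test DOI; swap in your own
    "rows": 20,
}

resp = requests.get(ENDPOINT, params=params, timeout=30)
resp.raise_for_status()

for event in resp.json()["message"]["events"]:
    # Each event is a subject-relation-object assertion,
    # e.g. "<a tweet> discusses <this DOI>".
    print(event["source_id"], event["relation_type_id"], event["subj_id"])
```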

Here are some of the things that struck me as most interesting.

TrendMD recommender – the recommender you might not have heard of but have used

As someone who attends mostly librarian conferences (I really shouldn't), it was an eye-opener to attend a conference whose main audience wasn't librarians but publishers. For example, I learned about TrendMD, a recommender service that markets to journals.

I was amazed to realise that the "related articles" lists you see on many journal sites are actually generated by TrendMD widgets rather than in-house.


TrendMD on BMJ journals

This was quite surprising to me, as I am in the midst of writing a post on academic library recommender systems (Mendeley's, Ex Libris's, Springer Nature's, JSTOR's, CORE's, etc.) and had never come across TrendMD.

A small subset of the customers of TrendMD
They have an impressive customer list, including big names like Elsevier, EMBO, BMJ, AAAS, IEEE, etc. And in case you are wondering: despite the name, they cover all disciplines, not just medicine.
Admittedly, I am still trying to wrap my brain around how their business model works. It is a complicated system of credits, where inbound clicks to your content gain you credits. It feels a bit like Google AdWords to me, though….
But a wild thought crossed my mind: could libraries do this for content in their repositories? Maybe not articles (unless the library is a journal publisher), but digitized scans? Digital humanities or digital scholarship products?

Microsoft Academic talk – machine-generated metadata and "enthusiastically endorsed" recursive importance

I have been testing the new Microsoft Academic since it went into beta, and am watching it very carefully, so it was great to hear a talk on it.
I was aware of most of the features, including the newly released citation features that improve on Google Scholar, but I was most intrigued by the following points.
Firstly, the speaker started by saying he had a "philosophical difference" with the way metadata is done by Crossref and in most of the talks so far. Essentially, Microsoft Academic generates metadata using "machine labour" rather than manual human labour.


He then went into depth on what was scraped and what those numbers meant. This was followed by a video on what Microsoft Academic can do when combined with Microsoft Power BI, and a demo of the features. Most of this won't be surprising to you if you have been watching and using Microsoft Academic recently.

Still, there were some interesting bits. While talking about the new cite feature, the speaker mentioned that it doesn't show DOIs. I believe he went on to say that when scraping DOIs from references, he found the error rate in them was "not insignificant".

He also talked about how difficult it is for a human to figure out which journals are predatory, though, as with everything else, they use machine learning to handle it (maybe based on the "recursive importance" below?). He also remarked that a major weakness of just counting citations is that citation counts don't tell us whether a cite was critical to the paper or just a throwaway cite.


As such, Microsoft Academic calculates something called "recursive importance", where an entity's importance reflects how "enthusiastically endorsed" it is by other important entities.

I missed part of this, but I think the "enthusiasm" of an endorsement is based on some sort of textual analysis of the sentence before the citation (which he said earlier they extract), and probably on counting the number of times a cite appears in the paper (studies have indicated that this, and perhaps the position of the cite in the paper, correlates with the importance of the cite).

This "recursive importance" is calculated not just for articles but also for other entities like organizations and journals.
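He didn't show the algorithm, but the "recursive" part sounds a lot like PageRank-style eigenvector centrality: an entity is important if it is enthusiastically endorsed by other important entities. Here is a toy sketch of that general idea (my own illustration, emphatically not Microsoft's actual method), with edge weights standing in for how enthusiastic each endorsement is:

```python
# Toy sketch of "recursive importance": PageRank-style power iteration
# over a weighted endorsement graph. My own illustration of the general
# idea, NOT Microsoft Academic's actual algorithm.
endorsements = {
    # citing entity -> {cited entity: endorsement weight}
    # (weights might come from citing-sentence analysis or from
    # counting in-text mentions, as speculated above)
    "paper_A": {"paper_B": 2.0, "paper_C": 0.5},
    "paper_B": {"paper_C": 1.0},
    "paper_C": {"paper_A": 0.5},
}

nodes = sorted(set(endorsements) | {t for ts in endorsements.values() for t in ts})
score = {n: 1.0 / len(nodes) for n in nodes}
damping = 0.85

for _ in range(50):  # iterate until scores stabilise
    new = {n: (1 - damping) / len(nodes) for n in nodes}
    for citing, cited in endorsements.items():
        total = sum(cited.values())
        for target, weight in cited.items():
            # endorsers pass on their own importance, in proportion
            # to how "enthusiastic" each endorsement is
            new[target] += damping * score[citing] * (weight / total)
    score = new

print({n: round(s, 3) for n in nodes})
```

The interesting move is that however the weights are derived, they feed into the same recursive calculation, so a lukewarm cite from an important paper and an enthusiastic cite from a minor one can be traded off against each other.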

Talk on digital humanities

Miguel Escobar Varela from the National University of Singapore gave a talk about digital humanities in Singapore and described some of the projects being done. You can find most of them on the site http://digitalhumanities.sg/
I'm somewhat familiar with some of them, particularly the ones done in collaboration with the National University of Singapore Libraries (where I used to work).
But what struck me were the following comments.
Firstly, he mentioned that digital humanities is a good meeting place where people from different communities, e.g. researchers, publishers and librarians, can come together.
He also remarked that researchers may have been spoiled by the existing infrastructure around journal articles: researchers just publish and need not worry about sustainability, as these articles will likely persist thanks to the good preservation infrastructure around them. But when it comes to digital humanities projects, once the funding runs out……
He went on to say that, as a result, researchers in digital humanities have had to learn to become information professionals, worrying about sustainability and preservation to ensure their products remain accessible to future generations of scholars.
Conversely, he talked about how librarians should also move away from purely traditional roles and pitch in as researchers themselves.
Such thoughts are very timely, because I've recently started reading about digital humanities, the bigger tent of digital scholarship, and their relationship with librarians. One thing that struck me immediately is that a lot of the rhetoric in this area is about how librarians will no longer serve (or perhaps even support) researchers, but will instead be their partners and equals.
This gives me pause, but I’m still thinking about it.
Another thing that struck me was his intense focus on metadata for discovery.
Here's the thing: the presumption of many, including researchers and librarians, is that libraries are good at metadata. But after attending Crossref Live17, I wonder. In a sense, academic librarians have outsourced a lot of metadata management to our publishers and to the organization Crossref. Sure, we still retain some expertise in MARC/RDA for books, a little for institutional repositories (mostly Dublin Core), and, if you have them, for archives.
But when it comes to metadata for the most important objects in scholarly communication (articles), the expertise lies squarely with the publishers and Crossref. They are the ones who produce the data, and they are the ones who came up with the Crossref Event Data API, the scholarly communication map (see later), etc. Does a typical academic library have a lot of metadata expertise? I'm not sure.

PubChase and the one amazing feature in reference managers that blew my mind

Relatively early in my librarianship career, I started studying reference managers like EndNote, Zotero, Mendeley, etc. I got bored after a while, when I realised they were mostly all variants on the same theme: you pulled references into your reference library using various methods, annotated or added notes to these entries, then inserted references into your manuscript using a Word plugin with whatever predefined citation style the reference manager supported.
Mendeley, at the time, was the most interesting, as they were making noises about going deeper into the researcher's workflow, but in the years that followed they didn't seem to do as much as I expected. Though with Elsevier's spree of acquisitions of services and tools throughout the researcher workflow (including Mendeley itself) and its recent repositioning as an analytics company, this will likely change.
Still, Lenny Teytelman’s talk blew my mind.

It started innocently enough, with him talking about the need for Protocols.io to publicly record and share scientific protocols. It was an interesting enough service, making science better by ensuring protocols can be reproduced, with the expected features like versioning, forking, adding of videos, etc.

Then, without warning (or so it seemed to me), he shifted to the second part of his talk, entitled "Part II: random collisions with everything (and the opportunity to fix it)".

He started talking about PubChase, a reference manager I had heard of in the past but hadn't tried, as I had written it off as yet another reference manager (there were close to a dozen the last time I looked, and this one seems to be mostly for the life sciences).

He talked about its recommender system, which didn't quite impress me, as it was an obvious feature that Mendeley and others already had.

Then he mentioned that if a paper in a user's reference library is retracted, the user is informed.
You can get alerts of retractions via email too, if you like. If you think this is a small feature and retractions are too rare to worry about, PubChase can also tell you if there are discussions or comments about the paper on commenting sites like PubPeer.

Still not excited because you're not in the medical area? How about if the article in your reference manager is updated with improvements to data or methods (via Figshare, Dryad, Protocols.io), new versions (say, a preprint is now published), or new discussions around the article?
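To make the retraction case concrete: retractions and corrections are registered with Crossref as Crossmark "updates", and my understanding of the Crossref REST API is that you can filter for works that update a given DOI. Here is a rough sketch of the check a reference manager might run (filter and field names per my reading of the docs, so verify before relying on it):

```python
import requests

def check_for_updates(doi: str) -> list:
    """Return any Crossmark 'update' notices (retractions, corrections,
    errata, ...) registered against the given DOI.

    Uses the Crossref REST API's `updates` filter, per my reading of
    the docs; treat this as an illustrative sketch.
    """
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"filter": f"updates:{doi}",
                "mailto": "you@example.org"},  # placeholder contact
        timeout=30,
    )
    resp.raise_for_status()
    notices = []
    for item in resp.json()["message"]["items"]:
        # Each matching item is a notice whose `update-to` field points
        # back at the DOI(s) it corrects or retracts.
        for update in item.get("update-to", []):
            if update.get("DOI", "").lower() == doi.lower():
                notices.append({
                    "type": update.get("type"),  # e.g. "retraction"
                    "notice_doi": item.get("DOI"),
                })
    return notices

# A reference manager could run this periodically over every paper in
# a user's library and email the user when the list becomes non-empty.
print(check_for_updates("10.5555/12345678"))  # Crossref's test DOI
```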

Are you starting to see?

So how is this magic achieved? By pulling from many APIs? No, there's now a better solution!

The scholarly communication map

At the very beginning of the Crossref Live 17 event, Jennifer Lin talked about the Crossref Research or Article Nexus.
I'd come across an earlier version of her idea, where she described an open scholarly communication map ("Crossref & the Art of Cartography: an Open Map for Scholarly Communications"). You should really read the article, but in short it describes how Crossref is building a scholarly map of the research enterprise and making it openly available to the entire research ecosystem.
The thing is, the idea of creating a web of links or relationships is nothing new. These days, I'm often asked for my view on linked data, or on Yewno, a new discovery solution being trialled in top Ivy League university libraries that claims to improve discoverability by letting one explore less obvious connections between various concepts.
Frankly, at this stage, I'm undecided. All these ideas seem worthwhile to explore, but so far I haven't seen anything compelling enough to improve on the state of the art for discovery. So I filed this scholarly communication map idea with the others, as "interesting but …..".
But Lenny's presentation made me reassess this thinking. Call me dense, but I finally saw the light….
By assigning DOIs and appropriate identifiers to various entities (articles, datasets, code, protocols, etc.) and registering metadata with Crossref establishing relationships between them, one can do amazing things.

In the PubChase example, "journals and resource platforms must register metadata with Crossref establishing relationships between an article and its associated materials (data, code, methods, figures, etc.)".

Assuming everyone does their part (in reality they won't), one can get all sorts of information around a paper. Besides what is mentioned, one can presumably use the Crossref Event Data API to find out about altmetrics around it, and if additional relationships are created by Crossref in the future, even more associations can be made.


To make it clear: if, say, a dataset or preprint is properly linked via some of the relationships above (e.g. "has-preprint" or "is-supplemented-by"), a service like a reference manager could use this to be notified of new or changed relationships to an article already in its library.
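Concretely, those relationships surface in the `relation` block of a work's Crossref metadata, keyed by relation type. A minimal sketch of how a reference manager might poll for them (again based on my reading of the REST API's JSON, so treat the field names as assumptions):

```python
import requests

def get_relations(doi: str) -> dict:
    """Fetch the `relation` block from a work's Crossref metadata.

    Keys are relation types such as "has-preprint" or
    "is-supplemented-by"; values list the related objects. Field names
    follow my reading of the Crossref REST API, so treat as a sketch.
    """
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    return resp.json()["message"].get("relation", {})

# A reference manager could poll this for each paper in a user's
# library and alert the user when a new relation appears (say, a
# dataset, or the published version of a preprint).
for rel_type, targets in get_relations("10.5555/12345678").items():
    for target in targets:
        print(rel_type, "->", target.get("id"), f"({target.get('id-type')})")
```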

Many of these objects will also have associated metadata on funders, researchers (via ORCID), and, in the future, organizations, leading to even more cross-linking.

Or as Jennifer Lin wrote in "Crossref & the Art of Cartography: an Open Map for Scholarly Communications":

“We begin to capture relationships between all such contributing agents and objects involved in the research process. Here we find an array of entities belonging to the scholarly graph, including different types of research artifacts, publisher and journal, funders, ORCIDs, peer reviews, publication status updates (corrections, retractions, etc.), citations, license information, additional URLs (machine destinations, hosting platforms, etc.), underlying data, software and protocols, materials, discussions and blog posts, recommendations, reference work mentions, etc. “

At this point, the only thing mostly missing would be some sort of subject/topic vocabulary control.

Scholarly communication linked data/web – a more realistic target?

My understanding of metadata and linked data is woeful, so take the comments below with a huge pinch of salt, and please correct any misunderstandings in the comments.

It seems to me that, in a sense, all this is nothing new again: we are basically talking about linked data, with triples etc.
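To make that concrete, every assertion in the map can be read as a subject-predicate-object triple. A purely illustrative example (the identifiers below are made up, or well-known test/sample values, not real assertions):

```python
# Each relationship in the scholarly map is essentially a triple:
# (subject, predicate, object). All identifiers below are invented
# (or well-known sample IDs) purely for illustration.
triples = [
    ("doi:10.5555/12345678", "has-preprint",       "doi:10.5555/preprint.1"),
    ("doi:10.5555/12345678", "is-supplemented-by", "doi:10.5555/dataset.1"),
    ("doi:10.5555/12345678", "funded-by",          "funder:100000001"),
    ("orcid:0000-0002-1825-0097", "author-of",     "doi:10.5555/12345678"),
]

# Walking the graph: list everything directly linked to one article.
article = "doi:10.5555/12345678"
for subject, predicate, obj in triples:
    if article in (subject, obj):
        print(subject, predicate, obj)
```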

But unlike the much more heterogeneous web, I think the scholarly communication system is relatively organized and constrained as a domain, with a much smaller set of entities and relationships to describe.
Established organizations like Crossref and ORCID are systematically creating identifiers for different entities in the scholarly communication domain (e.g. organization identifiers are being studied next). It's still not perfect, of course, and getting researchers to adopt identifiers like ORCID is still a work in progress, but I believe it's only a matter of time before ORCID becomes a standard, just as DOIs, after over a decade, are now a standard.
So it seems to me the scholarly communication map or web is a much more realistic target, as parts of it are already showing value.
PubChase's brilliance, I think, is that it is a "killer app" showcasing the value of the scholarly communication map/web. The papers in a user's reference manager are a very strong signal of intent/interest, and you can bet most researchers will be interested in information and signals around the papers they are intending to cite.
I spoke to Lenny after the talk, and he mentioned that they track drop-out rates for these alert emails and that the rates are very low, suggesting the emails are valued.

I wonder if one could go further. Perhaps, in the reference manager library, one could opt into notifications for some papers but not others; would that produce signals showing how important a paper was to the user? The papers people wanted to be alerted about could be considered "more important", working around the issue that not all citations are equally important.

I couldn't stop thinking about this. Could libraries plug into the scholarly communication map by establishing relationships between entities in the Crossref scholarly communication map and their own content? What types of content? What relationships? Could there be novel ones specific to libraries? What implications does this have for discovery in academic libraries?
I think that, unlike the true semantic web, where anyone can publish their own relationships, the Crossref scholarly communication universe has a controlled vocabulary for relationships, so this would require feedback and communication with Crossref.

Conclusion

It was a great two days, and there were many other great talks I do not have space to cover. One of them was on Metadata 2020, whose goal is to advocate for good metadata by building good business cases. In a sense, much of Crossref Live 17 could be said to focus on the theme that good metadata is critical for discovery. As I said before, Lenny's PubChase example could be the killer use case for this.

I know it's a funny thing for a librarian to say, much less one who has studied and worked in library discovery for a while, but this event really made me think about the importance and implications of metadata for discovery.

You can probably tell from this blog post that I greatly enjoyed Crossref Live17. It gave me a lot to think about, and it confirms my view that attending conferences in allied fields that are not strictly meant for librarians can be very helpful, as you will be exposed to ideas you might not usually encounter.

So if you have the opportunity to attend a Crossref event in your area, you should definitely do so, especially given that they are free to attend.
