I recently attended the Crossref LIVE17 event in Singapore. I discovered that these events tend to have a heavy presence of publishers, who make up most of Crossref's membership.
Still, I am a bit of a DOI nerd. I have long enjoyed watching Crossref webinars to understand what goes on behind the scenes to make DOIs work (hint: it helps a lot when troubleshooting broken links in our discovery services), and I recently started playing with the Crossref Event Data API, so it was a good opportunity to attend a non-librarian conference. It helped that it was held just a stone's throw from where I work and had no registration fee. I really enjoyed it, and I am still thinking about what was presented days after the event, particularly the discovery implications.
Here are some of the things that struck me as most interesting.
TrendMD recommender – the recommender you might not have heard of but have used
As someone who attends mostly librarian conferences (I really shouldn't), it was an eye-opener to attend a conference whose main audience wasn't librarians but publishers. For example, I learned about TrendMD, a recommender service that markets itself to journals.
I was amazed to realise that the "related articles" lists you see on many journal sites are actually generated by TrendMD widgets rather than built in-house.
This was quite surprising to me, as I am in the midst of writing a post on academic library recommender systems (Mendeley's, Ex Libris's, Springer Nature's, JSTOR's, CORE's, etc.) and had never come across TrendMD.
Microsoft Academic talk – machine-generated metadata and "enthusiastically endorsed" for recursive importance
@kuansanw from @Microsoft talks about using AI to produce human-readable data for the scholarly community #CRLIVE17 pic.twitter.com/voKjcBSnKe
— Crossref (@CrossrefOrg) November 15, 2017
He then went into depth on what was scraped and what those numbers meant. This was followed by a video on what Microsoft Academic can do when combined with Microsoft Power BI, and a demo of the features. Most of this won't be surprising if you have been watching and using Microsoft Academic recently.
Still, there were some interesting bits. While talking about the new cite feature, the speaker mentioned that it doesn't show DOIs. I believe he then went on to say that while scraping DOIs from references, he found the error rate in them was "not insignificant".
He also talked about how difficult it is for a human to figure out which journals are predatory, though they use machine learning to handle that, like everything else (maybe based on the "recursive importance" below?). He also remarked that a major weakness of just counting citations is that a citation doesn't tell us whether the cite was critical to the paper or just a throwaway cite.
As such, Microsoft Academic calculates something called "recursive importance" – where an entity is important to the extent that it is "enthusiastically endorsed" by other important entities.
I missed part of this, but I think the degree of enthusiasm is based on some sort of textual analysis of the sentence preceding the citation (which he said earlier they extract), and probably on counting the number of times the cite appears in the paper (studies have indicated that this, and perhaps the position of the cite in the paper, maps to the importance of the cite).
This "recursive importance" is calculated not just for articles but also for other entities like organizations and journals.
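Microsoft Academic's actual algorithm isn't public, but the "endorsed by important others" idea is essentially eigenvector centrality, as in PageRank. A toy sketch, where the edge weights standing in for "enthusiasm" and the damping factor are purely illustrative assumptions:

```python
# Toy sketch of "recursive importance": an entity is important if it is
# endorsed by other important entities. This is a weighted PageRank-style
# power iteration; all names, weights, and the damping factor are made up.

# endorsements[a] = {b: weight} means a endorses b with that
# (hypothetical) enthusiasm weight.
endorsements = {
    "paper_a": {"paper_b": 2.0, "paper_c": 1.0},
    "paper_b": {"paper_c": 3.0},
    "paper_c": {"paper_a": 1.0},
}

def recursive_importance(endorsements, damping=0.85, iterations=50):
    """Iterate rank = (1-d)/N + d * (weighted endorsements received)."""
    nodes = set(endorsements) | {t for ts in endorsements.values() for t in ts}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, targets in endorsements.items():
            total = sum(targets.values())  # normalise each endorser's weights
            for tgt, weight in targets.items():
                new[tgt] += damping * rank[src] * (weight / total)
        rank = new
    return rank

ranks = recursive_importance(endorsements)
```

In this toy graph, paper_c ends up ranked above paper_b even though both are endorsed, because paper_c receives all of paper_b's enthusiasm on top of a share of paper_a's.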
Talk on digital humanities
PubChase and the one amazing feature in reference managers that blew my mind
It started innocently enough when he talked about the need for Protocols.io to publicly record and share scientific protocols. It was an interesting enough service, making science better by ensuring protocols can be reproduced, with the expected features like versioning, forking, adding of videos, etc.
He then talked about PubChase, a reference manager I had heard of in the past but hadn't tried, as I wrote it off as yet another reference manager (there are close to a dozen the last time I looked, and this one seems to be mostly for the life sciences).
He described its recommender system, which didn't quite impress me, as it was an obvious feature that Mendeley and others already had.
Still not excited because you're not in the medical area? How about if the article in your reference manager is updated with improvements to data/methods (via Figshare, Dryad, Protocols.io), new versions (say, a preprint is now published), or new discussions around the article?
Are you starting to see?
The Scholarly Communication Map
For the PubChase example to work, "journals and resource platforms must register metadata with Crossref establishing relationships between an article and its associated materials (data, code, methods, figures, etc)."
Assuming everyone does their part (in reality they won't), one can get all sorts of information around a paper. Besides what is mentioned, one can presumably use the Crossref Event Data API to find out about the altmetrics around it, and if Crossref creates additional relationship types in the future, even more associations can be made.
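As a rough illustration of what that Event Data lookup might look like: the sketch below just builds a query URL against the public Event Data endpoint, asking for all events whose object is a given DOI. The endpoint path, the `obj-id` parameter, and the DOI itself are assumptions from my reading of the Event Data documentation, so treat this as a sketch rather than a tested integration.

```python
# Sketch: building a Crossref Event Data query for events around a DOI.
# Endpoint and parameter names are assumptions based on the public
# Event Data documentation; the DOI below is a made-up placeholder.
from urllib.parse import urlencode

EVENT_DATA_BASE = "https://api.eventdata.crossref.org/v1/events"

def build_event_query(doi, source=None):
    """Build a query URL for all events whose object is this DOI."""
    params = {"obj-id": "https://doi.org/" + doi}
    if source:
        params["source"] = source  # e.g. "wikipedia", "twitter"
    return EVENT_DATA_BASE + "?" + urlencode(params)

url = build_event_query("10.1234/example.doi", source="wikipedia")
print(url)
```

Fetching that URL (with `urllib.request` or similar) would return a JSON list of events – tweets, Wikipedia references, and so on – which is roughly the raw material altmetrics services work from.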
To make it clear: if, say, a dataset or preprint is properly linked via one of the relationships above (e.g. has-preprint or is-supplemented-by), a service like a reference manager could use this to be notified of new or changed relationships to an article already in the reference manager library.
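To make the mechanism concrete, here is a minimal sketch of how a service might read those relationships out of a Crossref works record. The record below is a made-up sample with a hypothetical DOI; real records come from the Crossref REST API (`https://api.crossref.org/works/{doi}`) and, when relationships have been registered, carry a `relation` map keyed by relation type:

```python
# Sketch: flattening the "relation" map of a Crossref works record into
# (relation-type, identifier) pairs a reference manager could watch.
# sample_record is made up; real records come from api.crossref.org.

sample_record = {
    "DOI": "10.1234/example.article",  # hypothetical DOI
    "relation": {
        "has-preprint": [
            {"id-type": "doi", "id": "10.1234/example.preprint"}
        ],
        "is-supplemented-by": [
            {"id-type": "doi", "id": "10.5678/example.dataset"}
        ],
    },
}

def related_objects(record):
    """Flatten the relation map into (relation-type, identifier) pairs."""
    pairs = []
    for rel_type, targets in record.get("relation", {}).items():
        for target in targets:
            pairs.append((rel_type, target.get("id")))
    return pairs

for rel_type, ident in related_objects(sample_record):
    print(rel_type, ident)
```

A reference manager could re-fetch such records periodically and diff the pairs against what it saw last time: a new `has-preprint` or `is-supplemented-by` entry is exactly the "this article has been updated" signal PubChase-style notifications would need.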
Many of these objects will also have associated metadata on funders, researchers (via ORCID), and in the future organizations leading to even more cross-linking.
Or as Jennifer Lin wrote in "Crossref & the Art of Cartography: an Open Map for Scholarly Communications":
“We begin to capture relationships between all such contributing agents and objects involved in the research process. Here we find an array of entities belonging to the scholarly graph, including different types of research artifacts, publisher and journal, funders, ORCIDs, peer reviews, publication status updates (corrections, retractions, etc.), citations, license information, additional URLs (machine destinations, hosting platforms, etc.), underlying data, software and protocols, materials, discussions and blog posts, recommendations, reference work mentions, etc. “
At this point, the only thing mostly missing would be some sort of subject/topic vocabulary control.
Scholarly communication linked data/web – a more realistic target?
My understanding of metadata and linked data is woeful, so take the comments below with a huge pinch of salt. Please correct any misunderstandings in the comments.
It seems to me that, in a sense, none of this is new: we are basically talking about linked data, with triples and so on.
I wonder if one could go further. Perhaps in the reference manager library, one could choose to get notifications for some papers but not others. Would that produce signals showing how important a paper is to the reader? Papers people want to be alerted about could be considered "more important", working around the issue that not all citations are equally important.
It was a great two days, and there were many other great talks I do not have space to cover. One of the talks was on Metadata 2020, whose goal is to advocate for good metadata by coming up with good business cases. In a sense, much of Crossref LIVE17 could be said to focus on the theme of how good metadata is critical for discovery. As I said before, Lenny's PubChase example could be a killer use case for this.
I know it's a funny thing for a librarian to say, much less one who has studied and worked in library discovery for a while, but this event really made me think about the importance and implications of metadata for discovery.
You can probably tell from this blog post that I greatly enjoyed Crossref LIVE17. It gave me a lot to think about, and it confirms my sense that attending conferences in allied fields that are not strictly meant for librarians can be very helpful, as you will be exposed to different ideas that you might not usually encounter.