We can all agree that Google Scholar has many strengths, but no matter how complete or deep its indexing, how much better it is at finding free articles, or how good its presumed relevancy ranking, we librarians have always had one weakness of Google Scholar to point at. We often say, “Despite its strengths, we still have to be careful. After all, we don’t know what Google Scholar actually includes, as they refuse to provide lists of sources.”
Maybe as a librarian you yourself have never said this, but I’m sure you know of others who have.
Even if librarians don’t say this, I suspect many users intuitively think they can fully trust results from library databases.
Both librarians and users believe library discovery results are more trustworthy. Google Scholar indexes anything that looks scholarly and can be fooled into indexing fake articles to boost citation counts. Our library databases and discovery services, by contrast, have specially curated lists. We know exactly what’s in there, and even if it’s not perfect, it’s still pretty reliable, right?
But how true is that?
Trouble in PubMed land
As I write this, the article “A Confusion of Journals — What Is PubMed Now?” is making the rounds. It points out that articles from dubious predatory journals are findable in PubMed. This is shocking to many who see being findable in PubMed as a sign of quality.
Sidenote: let’s sidestep the whole debate on how we define which journals are really predatory, or whether peer review is valuable. For the purposes of this discussion, let’s say they are simply journals that everyone agrees unequivocally are bad and shouldn’t be cited.
The thing is, though, there is a difference between PubMed, PubMed Central (PMC) and Medline, and it is the last that is the highly respected curated list. Without going into too much detail on the differences, PubMed can be seen loosely as a superset of PubMed Central and Medline.
As the article points out, PubMed Central appears to provide a backdoor into PubMed. While some journal articles deposited in PubMed Central are from Medline journals, others are not.
But how many non-medical librarians know this? I myself sort of knew this because years ago I helped set up the link resolver for PubMed and resolved to learn more.
As the medical librarian Krafty Librarian points out, this isn’t new, and she wrote about it as far back as 2011. She carefully explains that articles that get into PubMed via PMC and are not from Medline journals will not be indexed in Medline, so they won’t be assigned MeSH headings and as a result aren’t searchable with MeSH.
Still, as the majority of searchers do simple keyword searches, these articles remain very discoverable.
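To make the distinction concrete, here is a toy sketch in Python. It is not how PubMed actually works. The records, field names and both search functions are made up purely for illustration, assuming only what is described above: non-Medline articles that come in via PMC carry no MeSH headings, but their titles are still there to be matched by a keyword search.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical, heavily simplified PubMed record."""
    title: str
    in_medline: bool   # indexed in Medline, so it gets MeSH headings
    in_pmc: bool       # full text deposited in PubMed Central
    mesh: tuple = ()   # MeSH headings; empty for non-Medline records


# Two invented records: one from a Medline journal, one that only
# entered PubMed through the PMC route.
RECORDS = [
    Record("Aspirin and stroke prevention: a cohort study",
           in_medline=True, in_pmc=False, mesh=("Aspirin", "Stroke")),
    Record("Aspirin cures everything: a breakthrough report",
           in_medline=False, in_pmc=True),
]


def keyword_search(term):
    """Match the term against titles, as a simple keyword search would."""
    return [r.title for r in RECORDS if term.lower() in r.title.lower()]


def mesh_search(heading):
    """Match only records that were assigned the given MeSH heading."""
    return [r.title for r in RECORDS if heading in r.mesh]


print(keyword_search("aspirin"))  # both records match
print(mesh_search("Aspirin"))     # only the Medline record matches
```

The point of the sketch is only that a MeSH search quietly filters out the PMC-only record, while the keyword search most people run does not.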
Interestingly, there are standards for inclusion in PMC, just perhaps not as high as those for Medline journals.
This has led to calls to tighten the standards on which journals can be deposited into PubMed Central.
How do library discovery services fare?
If you have a discovery service like Summon or Primo, and you have activated PubMed Central in PCI (Primo Central Index) or your knowledgebase, your discovery results will include the same problematic articles.
So if we want to “protect” users and protect the integrity of our results, we should just temporarily deactivate PubMed Central until this is sorted out, right? This is a particularly easy decision for my institution since we don’t have a big life science footprint.
Sadly things aren’t so simple.
This problem goes beyond just PubMed Central. Dubious journals have been discovered even in highly regarded indexes like Scopus. Other librarians have mentioned to me that the same issues crop up on other platforms such as ProQuest.
Honestly, this is not just a PMC problem. I have found these journal in ProQuest as well.
— Annie Johnson (@anniekjohn) September 8, 2017
A muddled mess
Unfortunately, this is just the tip of the iceberg.
A while ago, I was doing a vanity search in Primo and was surprised to see a presentation slide I had uploaded to ResearchGate pop up in the Primo results. I also found preprints I had put in arXiv, Zenodo, etc.
I really did not expect this and, truth be told, was a bit uncomfortable, as a lot of this was pretty half-baked, particularly the items in ResearchGate. But when I studied the source, I realised we had activated DataCite in Primo Central. As ResearchGate can mint DOIs from DataCite for research output that you deposit with them, these citations will eventually make their way into Primo results.
This creates a potential loophole for dubious journals to get through, if predatory journal editors want to exploit users’ trust in library search services.
There are in fact likely to be other “loopholes”.
Are library discovery results more trustworthy or authoritative than other sources?
So how are we going to react to these discoveries?
One might be tempted to say these are one-off errors that we can correct.
But think about it: library discovery services tend to aim to “discover everything”, and when you try to cover everything, you are likely to include some bad apples. This is particularly true now that discovery vendors are starting to get serious about indexing open access.
If, on the other hand, we want to be sure that we only show quality results (however that is defined), we have to be selective, similar to how Web of Science carefully screens journals before letting them into the index. Subject indexes likewise have to decide where to sit on the spectrum between completeness and curation.
Incidentally, seeing an open access journal you have never heard of claim, truthfully, that it is in such-and-such subject index might not be that informative if you don’t know how carefully that index vets its entries.
Authority is constructed and contextual?
It’s clear we don’t have the resources to police all the potentially dubious items in our huge indexes. But does that mean we just give up? Yell at content providers to have stricter standards?
Another possible response is to leave everything in there, pull out the ACRL framework and intone “authority is constructed and contextual”.
Frankly, I am not sure I fully understand what that threshold concept means, but Lisa Hinchliffe assures me I’m using it correctly.
Used correctly. Issue is that it seems ppl are constructing the PMC authority inaccurately. Inclusion doesn’t mean what ppl think it does!
— Lisa Hinchliffe (@lisalibrarian) September 8, 2017
Similarly, we have to teach users to be skeptical even when results come from library databases, and not to trust any authority blindly without context or without understanding how that authority was conferred.
As per my usual style, this is me mostly thinking aloud. What does the presence of a piece of content in your library discovery service mean? Do users know?
I would like to thank Lisa Hinchliffe for much of the discussion above via Twitter.