The evolving librarian – reconsidering the teaching of Boolean, CRAAP for fake news and calculating OA-adjusted cost per use


Life and libraries are always changing and evolving. Many of our standard practices date back decades, but as the environment changes, we librarians should always consider whether our tools or practices are in need of a change, or whether they can be reused to tackle the same problem in a different form.
I’ve recently been inspired by the arguments and evidence from various blog posts and articles to reconsider the following:
  • The effectiveness of teaching Boolean, particularly to first years
  • Teaching the CRAAP test as a tool to spot and handle fake news
  • Using levels of open access to adjust cost per use

1. Stop emphasizing use of Boolean operators unless necessary

In 2013, I wrote “Why Nested Boolean search statements may not work as well as they did“, which outlined why I felt Boolean operators were of little use and might even be potentially harmful in most cases (barring exceptional cases like systematic reviews).
I observed that Boolean operators, in particular nested Boolean of the variety (A OR B OR C) AND (D OR E OR F), might have outlived their usefulness.
I argued that in the old days (the 90s and early 2000s), we were mostly searching under the following environmental constraints:
  • databases with small aggregations of items, in specific disciplines
  • very precise searching (little or no stemming)
  • no full-text matching, as most databases didn’t have full text
As such, most searches led to very few results. In particular, typing a bunch of words as a natural language query would get you very little, if not nothing, in a strict Boolean search of metadata. Boolean operators helped reduce this problem by including only the critical words in the search and, with the help of OR operators, catching synonyms.
But does this hold up today? Today, we operate mostly in an environment of huge mega-databases, with Google Scholar and web-scale discovery as paradigm examples. They combine large amounts of data in cross-discipline searches, do full-text matching and do intelligent stemming to ensure you don’t miss obvious variants. With even the simplest keyword search you get tons of results that the relevancy ranking can barely handle.


Another way to put it is that in the past, you had great precision but poor recall, and Boolean operators helped increase recall. Now, you tend to get tons of results (across various disciplines), precision is the name of the game, and nested Boolean often leads to worse precision. Additional synonyms (which may be used differently in different disciplines) added in generous chains of ORs often lead the search results to explode, thanks to matches in full text and stemming effects.
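To make the precision/recall trade-off concrete, here is a toy calculation (the numbers and document sets are invented for illustration, not drawn from any real database):

```python
# Toy illustration of the precision/recall trade-off discussed above.
# "relevant" is the set of truly relevant documents; each search returns a set of hits.
def precision_recall(retrieved, relevant):
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {1, 2, 3, 4, 5}

# Old-style metadata-only search: few results, all on target (high precision, low recall).
strict = {1, 2}
# Modern full-text search with a generous OR chain: many results, mixed quality.
broad = set(range(1, 51))  # 50 documents, including all 5 relevant ones

print(precision_recall(strict, relevant))  # (1.0, 0.4)
print(precision_recall(broad, relevant))   # (0.1, 1.0)
```

The strict search misses most relevant documents but returns nothing irrelevant; the broad search finds everything but buries it, which is exactly why relevancy ranking now matters more than recall.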
Add to that the fact that so few people ever use Boolean, and discovery vendors are not incentivized to optimise relevancy for that style of searching.
In the original post, the argument was made mostly by a priori reasoning and perhaps an illustrative search or two demonstrating the impact of full-text matching versus metadata-only matching. But nobody had put this argument to an empirical test. Until now, kind of.
In the paper (preprint at time of blog) “The Boolean is Dead, Long Live the Boolean! Natural Language versus Boolean Searching in Introductory Undergraduate Instruction“, the authors test the impact of different types of search queries and put my theory to the test (in a way).
First, they chose typical, appropriate research queries and ran them in the following databases:
  • Academic Search Premier
  • Google Scholar
  • Lexis Nexis
  • Proquest
  • Pubmed
  • Scopus
  • Web of Science

The key thing is that they tested four different versions of each search.

First, they compared natural language searches against proper Boolean searches. For example, they tested “Television advertising AND children” vs “Effects of television advertising on children” in Academic Search Premier, Google Scholar and JSTOR.

The second factor they studied was the effect of filters, such as “peer reviewed” (ProQuest Central), “Articles” (JSTOR/Scopus/Web of Science) etc.

This resulted in four types of searches: unfiltered Boolean, filtered Boolean, unfiltered natural language and filtered natural language.

For each search, they captured the top 25 results (simulating freshman behaviour) and rated relevancy using a rubric for the four types of searches.

Guess what they found? While natural language searches generally had fewer results (which makes sense), for the top 25 results the unfiltered natural language searches had the best average relevancy score of the four.

The least relevant search? Filtered Boolean. So one could even say that the further away you moved from natural language search, the worse the results got.

To be fair, the differences are small (2.11 vs 2.08 for the top 25 results) and you can’t make a statistical claim that natural language search without filters was definitely the best. Still, it’s an interesting result.

An interesting thing to note is that this result seems to hold even for metadata-only databases like Web of Science, Scopus and PubMed. I would have expected metadata-only searches to favour Boolean over natural language, but that didn’t hold true for the searches they did. Then again, those searches were on fairly general topics.
I highly recommend you go read the whole paper (or, if you are lazy, my tweeted summary), as there are lots of interesting findings, such as comparing which databases had the lowest relevancy scores, indicating difficulty of search topic or relevancy ranking issues (no surprise that Lexis Nexis Academic was solidly bottom!), and comparing the overlap of results between natural language searches and Boolean searches to see if the two gave broadly the same set of results.
But the main point stands: natural language seems to be no worse than Boolean searching, at least for these simple searches. The authors conclude:

This study found no clear advantage in relevance of results between natural language and Boolean searching, suggesting that for introductory courses, librarians can spend less time covering the mechanical “how to” aspect of searching, and more time on other, more substantial, information literacy concepts such as topic and question development (including search terms and terminology) and source evaluation.

Further areas of study?

One must be careful not to overstate the results of this study, though. It only tests the simple AND function and has a small sample size, so the results are suggestive rather than conclusive.

Also, the study tests natural language search against key concepts chained with AND operators, not my proposal that nested Boolean is unnecessary.

There are essentially three types of query I see in search logs:

1. Straight out natural queries – “What is the effect of television advertising on children”

2. Simple Boolean – “television advertising AND children”

3. Nested Boolean – (Television OR TV) AND (Children OR Child OR Youth OR Kid)

Sorry, I lied. I almost never see #3 unless it is by a librarian, yet many librarians have taught #3 in the past and might still do so.

I am convinced that #2 in most cases will not be inferior to #3. I admit to being mildly surprised that #1 is no worse than #2, though I suspect this stops being true once you do more specific searches with far fewer results, rather than something generic like advertising and children.
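For concreteness, the relationship between the three query styles can be sketched in a few lines of Python (my own illustration; the helper names are made up):

```python
# Build the three query styles discussed above from the same concept list.
# Each "concept" is a list of synonyms; the first entry is the representative keyword.
def simple_boolean(concepts):
    # style #2: one representative keyword per concept, chained with AND
    return " AND ".join(group[0] for group in concepts)

def nested_boolean(concepts):
    # style #3: each concept becomes an OR group of synonyms, groups chained with AND
    return " AND ".join("(" + " OR ".join(group) + ")" for group in concepts)

concepts = [["television", "TV"], ["children", "child", "youth", "kid"]]

# style #1 is just the natural language phrase itself
print("effect of television advertising on children")
print(simple_boolean(concepts))   # television AND children
print(nested_boolean(concepts))   # (television OR TV) AND (children OR child OR youth OR kid)
```

The point is that #3 is mechanically derivable from the same concepts as #2; the question the study leaves open is whether all that extra machinery ever buys you anything.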

An interesting follow-up I can imagine is for someone to repeat this study with more complicated nested Boolean, and to repeat the test with “difficult” searches where the number of known correct answers is smaller, rather than a generic search.

I’m particularly curious to see whether nested Boolean with OR statements will pull ahead for metadata-only databases like Scopus, Web of Science and PubMed. It seems to me the medical librarians’ literature on systematic reviews supports this hypothesis, though PubMed isn’t quite a normal database.

In the real world, I would personally still not recommend my students do natural language searches (#1) but would recommend searching using keywords only (#2). However, I would be wary of implying that nested Boolean searches (#3) are necessarily always better.

2. Teaching of CRAAP and list-based methods to combat fake news

As always, I like to declare my lack of expertise in information literacy, but recently reading a trio of articles has led me to think about CRAAP and its purpose, and I set out my thoughts here for discussion.

The recent rise in interest in fake news has given us librarians a reason to once again loudly trumpet the value of what we do in teaching information or media literacy. Librarians were quick to establish our turf by calling out articles that mentioned information literacy without mentioning librarians.


After all, it’s our superpower, it seems.


Besides the expected library sources, pieces began to appear in mainstream outlets such as Salon, U.S. News & World Report and PBS praising the role librarians can play in fighting the rise of fake news, and many librarians were ecstatic: finally, our moment in the sun had come!
I have no doubt librarians have some role to play in this, but I often wonder how big a role we have and how best to own it.
From what I see, the first thing many librarians did was to create LibGuides on fake news. Of course, most of us recognise that this is going to be of minimal impact.



But of course, we already teach information literacy, and that seems just the right antidote to fake news, right?

But how do we best use our information literacy sessions for this purpose? One thing we are already doing for information literacy seems to leap out as ready-made to counter this issue – the CRAAP test.

If you are a librarian, you have probably heard of it. If not, it’s a test that originated at the Meriam Library – California State University, Chico, and advises users to evaluate sources using the following handy and catchy acronym, CRAAP, which stands for:

  • Currency
  • Relevance
  • Authority
  • Accuracy
  • Purpose

I am unable to trace the full history of CRAAP and exactly when it was created, but the Wayback Machine suggests the acronym appeared on the library page in 2001, though the same criteria appear in prior versions of the page.

Because of its catchy name, CRAAP is probably the most famous of the checklist-based systems that aim to help users evaluate information sources. My (wild) guess is that such checklists started becoming popular in the early 90s with the dawn of the World Wide Web, when the scarcity of information was replaced with abundance.

Given that the tool originated in the late 90s or early 2000s, it is natural to ask: can we repurpose it to deal with the current problem of fake news? Is it just the same problem in a different form?

The Stanford study

First, let’s look at an empirical test of people’s ability to detect reliable websites. In the study “Lateral Reading: Reading Less and Learning More When Evaluating Digital Information”, 45 individuals – 10 PhD historians (historians), 10 professional fact checkers and 25 undergraduates from Stanford – were asked to evaluate the reliability of a couple of websites and to find the answers to some questions with an open search.

One task for example was to evaluate reports on the topic of “bullying in schools” by looking at the following 2 websites


Bullying at School: Never Acceptable


Another task was to find out who paid for the funding of the $1.2 million legal fees for plaintiffs in Vergara v. California (about teacher tenure).

For the first task, one report was by the American Academy of Pediatrics (“the Academy”) and the other was by the American College of Pediatricians (“the College”).

Which website was more reliable or more authoritative?

Both sound pretty official and authoritative, but the trick was that in fact only the first was truly authoritative: the Academy is the largest professional organization of pediatricians in the world and publishes the flagship journal of the profession. The College is a splinter group that broke off in 2002 over the issue of adoption by LGBT couples and has only 200–500 members, one paid staff member and no journal.

So how did the historians, fact checkers and Stanford undergraduates do?


Fact Checkers vs Historians vs Undergraduates

The result was stunning. Despite all their experience, credentials and knowledge, half of the historians were fooled pretty easily. 40% of them considered both sites equally reliable, and 10% even thought the College’s site was more reliable. They did better than the undergraduates but far worse than the fact checkers, every one of whom figured out which was reliable without any trouble.
I won’t summarise the other two tasks, but basically the results were similar. The fact checkers easily avoided all the traps and homed in on reliable data, while the historians and undergraduates struggled.
Why? Again, I’m not going to summarise the whole paper, and I urge you to read it in full because it goes into fascinating detail on how fact checkers think and search compared to the other groups. But the upshot is that the fact checkers were very quick to cross-check information on other sites.
The article calls this “taking bearings”. While the historians and students spent a lot of time studying the webpages they were presented with, the fact checkers quickly started searching other websites to learn more about the webpages they were studying.
For example, the paper describes how one of the fact checkers spent a mere 8 seconds on the sites before googling the organizations that produced the two websites. This allowed him to quickly figure out the background of both organizations.
In comparison, only two of the historians used this strategy. Most of them spent a lot, if not most, of their time (10 minutes) on the websites they were evaluating. They were “often taken in by the College’s name and logo; its .org domain; its layout and aesthetics; and its “scientific” appearance, complete with abstract, references, and articles authored by medical doctors.”

Would librarians have done better? Would CRAAP have helped?

Granted, the historians were given only 10 minutes for the task, and with more time they might have got to the right answer too.
Still, it’s interesting to think about what results we would get if we replaced the historians with academic librarians. Would the results be similar, particularly given that some librarians also have PhDs in history? Would they have done better if they had tried to apply CRAAP or a similar checklist method?
Think back to CRAAP. Remember the line about historians getting fooled by official logos? I can imagine many students doing exactly that to get the answer to “Authority”.
In fact, if you look at CRAAP, a lot of the points seem to involve looking at surface details of the very source you are evaluating. For example:
  • Are there spelling, grammar or typographical errors?
  • Does the language or tone seem unbiased and free of emotion?
  • Is there contact information, such as a publisher or email address?
  • Does the URL reveal anything about the author or source? Examples: .com .edu .gov .org

Some seemingly ask almost the right questions but don’t explicitly ask the user to do cross-checking:

  • Who is the author/publisher/source/sponsor?
  • What are the author’s credentials or organizational affiliations?


I can imagine many users will just try to answer these questions by looking at the information stated in the source itself, e.g. “Hmm, the author’s biography on the site states he’s a Professor at Harvard; that’s a reliable source then”.
Yet other questions seem more geared to helping the user decide whether the information is useful to them, which reflects its main use, I suspect, as a tool for choosing sources to cite in assignments:
  • Does your topic require current information, or will older sources work as well?
  • Does the information relate to your topic or answer your question?
  • Is the information at an appropriate level (i.e. not too elementary or advanced for your needs)?
  • Would you be comfortable citing this source in your research paper?

And lastly, some seem purely subjective, with no guide on how to answer the question:

  • Is the author qualified to write on the topic?
  • Who is the intended audience?

It is possible that the CRAAP test was created in a simpler time, when the line between reliable and less reliable information was more clear-cut.

E.g. blogs were almost always less reliable than information on .gov and .edu sites or in peer-reviewed journals, and one could almost always tell easily whether the publisher was a scholarly source (compared to today, when there are predatory journals, and authoritative world-renowned experts blog).

One of the problems, I suspect, is that while the CRAAP test works to help users tell the difference between, say, blogs and published journal articles, it doesn’t work well against sources that are deliberately deceptive. A lot of the signals in CRAAP can be easily faked if they come from the source itself, and the more sophisticated fake news sources that have emerged take great pains to mimic all these signs of reliability. So you get sites that try to look like respectable think tanks (the .org domain doesn’t mean anything these days), or try to hide their ties and affiliations to lobbies, or appear as academic publishers that mimic the signs of academic prestige.

The main way to counter this is to get off the page and do cross-checking, which is intellectually challenging and time-consuming, and I suspect most users will just default to doing the easy thing – aka “evaluating” URLs, spelling and stated credentials on the page.
That said, it seems to me that the infographic IFLA created for spotting fake news is more appropriate than CRAAP for helping users spot fake news. There are more explicit tips, in particular the very first: “click away from the story to investigate the site, its mission and its contact info”.
Another problem with CRAAP – lack of strategy in evaluation
But I imagine some librarians protesting: CRAAP does tell you to do cross-validation.
Scattered among the various points you see, for example under Accuracy, questions such as “Can you verify any of the information in another source or from personal knowledge?”
True, but this is where the other weakness of checklist-based methods comes into play: they don’t give the user any specific strategy to pursue or focus on.
Mike Caulfield, in “Recognition Is Futile: Why Checklist Approaches to Information Literacy Fail and What To Do About It”, points this out clearly when he talks about the weaknesses of the “E.S.C.A.P.E. Junk News” method (a cousin of CRAAP) and why it fails.
He argues that the problem with checklist-based methods like CRAAP is not just that they don’t encourage you to quickly get off the page to cross-validate or get your bearings; the right strategy is also about “asking the most important questions first, and not getting distracted by salient yet minor details, or becoming so overloaded by evaluation your bias is allowed free rein“.
We will talk about bias later, but to him, list-based methods like CRAAP have too many questions and don’t guide the user on what the most critical questions are or the order in which to ask them.
He draws an analogy with doctors diagnosing patients. Doctors don’t ask questions in a random order or ask patients to simply list all their symptoms. Instead, they are trained to use decision trees to ask questions in a specific order to narrow down the possibilities.
As such, he gives the following specific, targeted advice for checking fake news:
  • Check for previous work
  • Go upstream to the source
  • Read laterally
  • Circle back

Notice that he doesn’t just give you a bunch of evaluation points; he tells you the order in which to do them. In particular, he follows the strategy of the fact checkers in the Stanford study and prioritizes cross-checking and validation.

Without this specific push, as I’ve argued, people will be lazy and just evaluate based on what they see in the source and on their biases (e.g. a Trump supporter will be suspicious of CNN as a source). After all, people are generally credulous and want to believe what they read (as long as it doesn’t conflict with their beliefs), so without a clear push to do cross-validation they are unlikely to do so.

Cognitive biases and librarians who agree

I should add that, in fairness, not all librarians take the simple view that CRAAP is the solution to fake news. There are in fact many librarians with more sophisticated views and a far better understanding of the issues around fake news.

One example is the philosopher and information literacy librarian Lane Wilkinson. In his wonderful post, Teaching Popular Source Evaluation in an Era of Fake News, Post-Truth, and Confirmation Bias, he sets out a very nuanced take on the issues around fake news.

First off, he rightly points out that fake news isn’t really new. His take is that the main problem is “the spread of a deep mistrust of traditional media coupled with the valorization of motivated reasoning”, aka the “post-truth” mindset.

This gets you into the realm of cognitive biases, which is something we need to address for fake news, and “a bullet pointed list of ‘ways to spot fake news’ isn’t sufficient; you need to teach in a way that avoids triggering poor cognitive processes”.

One cognitive bias he points out – directional reasoning – is a common problem I often see in students: the tendency to decide on a position, hunt for a source that supports what one has already decided is true, and then insist that the librarian find a source saying exactly what one expects to see.

It’s one thing to have a hypothesis and revise it on finding evidence or the lack of it; it is quite another to keep insisting a source must exist to support one’s point. The irony is that we librarians often tell users they don’t know how or where to search (often true), but taken to the extreme, this may lead a few students to think that if they can’t find something to source what they know to be true, it only means their searching skills are at fault, not that the evidence doesn’t exist.

Obviously this is the same type of mindset that makes fake news thrive.

The beauty of Lane’s article is that it carefully notes that cognitive biases are an issue and gives practical tips on how not to trigger them when teaching a class on fake news. How? Read his post!

His opinion of CRAAP?

“The CRAAP test makes a lot of epistemological assumptions that obscure just how difficult it really is.” The meaning of authority, and how we know what is authoritative, is actually a pretty complicated topic. But CRAAP makes it look simple, e.g. it has a .org, .edu or .gov domain, hence it’s likely reliable.

Perhaps this is also why people take the lazy superficial way out and/or biases predominate.

Another point he makes is that the CRAAP test is less about reliability and more about the usefulness of the article, and that information isn’t reliable – information sources are.

3. Measuring cost per use with adjustments for levels of open access

As levels of open access rise, the idea of possibly replacing subscriptions with Green OA versions has started to appear. See, for example, my post “Academic libraries in a mixed open access & paywall world — Can we substitute open access for paywalled articles?” or, even more directly, Ryan Regier’s “The problem with using cost-per use analysis to justify journal subscriptions”.

This idea remains controversial for many reasons – in particular, you can’t guarantee the version of the open access variant you get – but this has not stopped the author of Leveraging the Growth of Open Access in Library Collection Decision Making from making an intriguing proposal.

Of course, you are familiar with the idea of valuing subscriptions based on cost per use and using that as a factor to rank or rate journals for renewal.

The author of the paper suggests tweaking cost per use to take into account levels of open access.

For example, say it’s 2018. If the number of 2017 downloads of articles published in 2017 for the journal is 100, and 10% of the articles from that publication year (2017) are open access, the adjusted usage = 100 × (1 − 0.1) = 90 downloads.

The idea here is that because 10% of the content is open access and free, on average that usage could have been replaced by OA usage. You then use this “OA-adjusted usage” to calculate cost per use.
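The adjustment is simple enough to express in a few lines of Python (a minimal sketch; the function names and the $1,800 price are my own illustration, not from the paper):

```python
# A minimal sketch of the OA-adjusted usage calculation described above.
def oa_adjusted_usage(downloads, oa_share):
    """Discount raw downloads by the fraction of articles that are open access."""
    return downloads * (1 - oa_share)

def oa_adjusted_cost_per_use(price, downloads, oa_share):
    """Cost per use computed against the OA-adjusted usage rather than raw usage."""
    return price / oa_adjusted_usage(downloads, oa_share)

# The example from the text: 100 downloads, 10% OA -> 90 adjusted downloads.
print(oa_adjusted_usage(100, 0.10))               # 90.0
# With a hypothetical subscription price of $1,800:
print(oa_adjusted_cost_per_use(1800, 100, 0.10))  # 20.0
```

Note the effect: as the OA share of a journal rises, its adjusted usage falls and its cost per use climbs, making the subscription look progressively less attractive.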



He suggests many formulas, but this is the simplest one. Over three subscription years, he proposes discounting the price by the journal’s level of Green OA.

JR5 is the Journal COUNTER statistic for a given year of publication.

Official definition of JR5 – “Number of Successful Full-Text Article Requests by Year-of-Publication (YOP) and Journal”




He has many other formulas in the paper, for example formulas that discount older articles versus newer ones and formulas that take into account delayed OA, Gold OA etc., and he discusses in detail the strengths and weaknesses of each formula and whether the data is available for such calculations.

The author argues that this OA-adjusted cost-per-use methodology can also be used with bundles of journals. While you don’t know the exact price of each title, shifts in “the aggregate change in a relative value of the bundle or big deal the level of toll-OA overlap could be assessed”.

Actually doing the calculation

The tricky bit is that, firstly, you will need historical data to project into the future and, secondly, you will need OA levels for the journals. The author uses 1Science’s Oafigr for the latter, but what can you do if you don’t want to pay for this?
Simple: you can try using the Unpaywall API (formerly known as oadoi) and the Crossref API!
Here are the steps:
1. First, choose the journal title you want to test and the year of publication.
2. Use the Crossref API to pull out all the DOIs for that title and year of publication.
3. Run the DOIs extracted in #2 through the Unpaywall API.
I’ve mapped out how to do #3 using OpenRefine in an earlier blog post. It takes very little effort to work out the same for the Crossref API.
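If you would rather script steps 2 and 3 directly, a rough Python sketch using only the standard library might look like this. The endpoints are the public Crossref and Unpaywall APIs as I understand them, so treat the exact parameter names as assumptions and check the current API documentation before relying on this:

```python
import json
import urllib.parse
import urllib.request

EMAIL = "you@example.org"  # Unpaywall asks for a contact email on each request

def get_json(url):
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def dois_for_year(issn, year, rows=100):
    """Step 2: pull DOIs for one journal and publication year from Crossref."""
    params = urllib.parse.urlencode({
        "filter": f"from-pub-date:{year}-01-01,until-pub-date:{year}-12-31",
        "rows": rows,
        "select": "DOI",
    })
    data = get_json(f"https://api.crossref.org/journals/{issn}/works?{params}")
    return [item["DOI"] for item in data["message"]["items"]]

def unpaywall_record(doi):
    """Step 3: look one DOI up in Unpaywall (formerly oadoi)."""
    return get_json(f"https://api.unpaywall.org/v2/{doi}?email={EMAIL}")

def oa_share(records):
    """Fraction of records Unpaywall flags as open access via the is_oa field."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get("is_oa")) / len(records)

# Example (requires network access):
#   dois = dois_for_year("0028-0836", 2017)
#   share = oa_share([unpaywall_record(d) for d in dois])
```

The ISSN above is just a placeholder example; in practice you would also want to page through Crossref results rather than cap at 100, and to rate-limit the Unpaywall calls politely.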
But what if you want something simpler?
You do need to install and run Python, but even someone like me, with very little Python experience, was able to get the script running quickly. Once you run it, it will prompt you for an ISSN and then a year of publication, and you can then choose whether you want the output in CSV.

The tricky part is that this script can give you historical OA levels but not projected OA levels. But I guess you can adjust for this yourself by getting three historical years and averaging the change.
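One naive way to do that projection (my own extrapolation, not a method from the paper) is to average the year-on-year change over the historical years:

```python
# Project next year's OA level by averaging the year-on-year change
# across the historical years, as suggested in the text.
def project_oa_level(history):
    """history: OA shares for consecutive years, oldest first, e.g. [0.08, 0.10, 0.13]."""
    changes = [b - a for a, b in zip(history, history[1:])]
    avg_change = sum(changes) / len(changes)
    return history[-1] + avg_change

# Three historical years of OA shares -> projected share for the next year.
print(round(project_oa_level([0.08, 0.10, 0.13]), 3))  # 0.155
```

This is a crude linear extrapolation; OA growth curves may well flatten or accelerate, so it is at best a rough planning figure.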

Still, this is only for one journal, and it seems a lot of work when you consider how many journals we have, so currently this remains an interesting idea at best. Could such calculations be supported in systems like Alma?


Thanks for reading. Any thoughts?
