- The effectiveness of teaching Boolean operators, particularly to first years
- Teaching the CRAAP test as a tool to spot and handle fake news
- Using levels of open access to adjust cost per use
1. Stop emphasizing use of Boolean operators unless necessary
- databases with small aggregations of items, in specific disciplines
- very precise searching (little or no stemming)
- No full-text matching, as most databases didn’t have full text
- Academic Search Premier
- Google Scholar
- Lexis Nexis
- Web of Science
The key thing is that they tested four different versions of each search.
The first factor they studied was the query form: natural language searches versus proper Boolean searches.
The second factor they studied was the effect of filters – such as “peer reviewed” (ProQuest Central), “Articles” (JSTOR/Scopus/Web of Science) etc.
This resulted in four types of searches – unfiltered Boolean, filtered Boolean, unfiltered natural language, and filtered natural language.
For each search, they captured the top 25 results (simulating freshman behavior) and rated relevancy using a rubric.
Guess what they found? While natural language searches generally returned fewer results (which makes sense), for the top 25 results the unfiltered natural language searches had the best average relevancy score of the four search types.
To be fair, the differences are small (2.11 vs 2.08 for top 25 results) and you can’t make a statistical statement that natural language search without filters was definitely the best. Still, it’s an interesting result.
“This study found no clear advantage in relevance of results between natural language and Boolean searching, suggesting that for introductory courses, librarians can spend less time covering the mechanical “how to” aspect of searching, and more time on other, more substantial, information literacy concepts such as topic and question development (including search terms and terminology) and source evaluation.”
Further areas of study?
Also, the study tests natural language searches against key concepts chained with AND operators, not my proposal that nested Boolean is unnecessary.
There are essentially three types of queries I see in search logs:
1. Straight-out natural language queries – “What is the effect of television advertising on children”
2. Simple Boolean – “television advertising AND children”
3. Nested boolean – (Television OR TV) AND (Children OR Child OR Youth OR Kid)
Sorry, I lied. I almost never see #3 unless it is by a librarian, yet many librarians have taught #3 in the past and might still teach it.
I am convinced that #2 in most cases will not be inferior to #3. I admit to being mildly surprised that #1 is no worse than #2, though I suspect this isn’t true once you do more specific searches with far fewer results rather than something generic like advertising and children.
An interesting follow-up I can imagine is for someone to repeat this study with more complicated nested Boolean. Also, repeat the test with “difficult” searches where the number of known correct answers is smaller, rather than a generic search.
I’m particularly curious to see if nested Boolean with OR statements will pull ahead for metadata-only databases like Scopus, Web of Science and PubMed. It seems to me the medical librarianship literature on systematic reviews supports this hypothesis, though PubMed isn’t quite a normal database.
In the real world, I personally would still not recommend my students do natural language searches (#1) and would still recommend searching using keywords only (#2), but I would be wary of implying that nested Boolean searches (#3) are necessarily always better.
2. Teaching of CRAAP and list-based methods to combat fake news
The recent rise in interest in fake news has given us librarians a reason to once again trumpet loudly the value of what we do in teaching information or media literacy. Librarians were quick to establish our turf by calling out articles that mention information literacy without mentioning librarians.
This #fakenews article is what has lots of #librarians saying “this is what we we’re trying to tell you all along” https://t.co/NOgX2bZT4b
— steven bell (@blendedlib) December 12, 2016
After all it’s our superpower it seems.
We all know that librarians are superheroes. The latest villain they’ve been fighting is fake news. https://t.co/31Giwdzgxv
— The Harry Potter Alliance (@TheHPAlliance) January 3, 2017
“Before I read this thing my friend posted on Facebook, let me open up that helpful LibGuide in another tab.” <–No Student Ever
— Lane Wilkinson (@lnwlk) January 24, 2017
But of course, we already teach information literacy, and that seems like just the right antidote to fake news, right?
But how do we best use our information literacy sessions for this purpose? One thing that we are already doing for information literacy seems to leap out as ready-made to counter this issue – the CRAAP test.
If you are a librarian you probably have heard of it. If not, it’s a test that originated at Meriam Library – California State University, Chico and advises users to evaluate sources using the handy and catchy acronym CRAAP, which stands for Currency, Relevance, Authority, Accuracy, and Purpose.
I am unable to trace the history of CRAAP and when exactly it was created, but the Wayback Machine suggests the acronym appeared on the library page in 2001, though the same criteria appear in prior versions of the page.
Because of its catchy name, CRAAP is probably the most famous of such checklist-based systems that aim to help users evaluate information sources. My (wild) guess is that such checklists started becoming popular in the early 90s with the dawn of the World Wide Web, when the scarcity of information was replaced with abundance.
Given that the tool originated in the late 90s or early 2000s, it is natural to ask whether we can repurpose it to deal with the current problem of fake news. Is it just the same problem in a different form?
The Stanford study
One task, for example, was to evaluate reports on the topic of “bullying in schools” by looking at the following two websites.
For the first task, one report was by American Academy of Pediatrics (“the Academy”) and the other was by the American College of Pediatricians (“the College”).
Which website was more reliable or more authoritative?
Both sound pretty official and authoritative, but the trick was that in fact only the first was truly authoritative – the Academy was the largest professional organization of pediatricians in the world, publishing the flagship journal of the profession. The second was a splinter group that broke off in 2002 over the issue of adoption by LGBT couples and has only 200–500 members, one paid staff member and no journal.
So how did the historians, fact checkers and Stanford undergraduates do?
Fact Checkers vs Historians vs Undergraduates
Would librarians have done better? Would CRAAP have helped? Looking at the questions in the CRAAP test, some seem superficial and easy for a determined source to game:
- Are there spelling, grammar or typographical errors?
- Does the language or tone seem unbiased and free of emotion?
- Is there contact information, such as a publisher or email address?
- Does the URL reveal anything about the author or source? Examples: .com .edu .gov .org
Some seemingly ask almost the right questions but don’t explicitly ask the user to do cross-checking.
- Who is the author/publisher/source/sponsor?
- What are the author’s credentials or organizational affiliations?
- Does your topic require current information, or will older sources work as well?
- Does the information relate to your topic or answer your question?
- Is the information at an appropriate level (i.e. not too elementary or advanced for your needs)?
- Would you be comfortable citing this source in your research paper?
And lastly, some seem purely subjective, with no guide on how to answer the question:
- Is the author qualified to write on the topic?
- Who is the intended audience?
It is possible that the CRAAP test was created in a simpler time when the line between reliable and less reliable information was more clear-cut.
e.g. blogs were almost always less reliable than information on .gov and .edu sites or peer-reviewed journals. One could almost always tell easily if the publisher was a scholarly source (compared to today, where there are predatory journals and authoritative, world-renowned experts who blog).
One of the problems, I suspect, is that while the CRAAP test works to help users tell the difference between, say, blogs and published journal articles, it doesn’t work too well against sources that are trying to be deceptive. A lot of the signals in CRAAP can be easily faked if they come from the source itself, and the more sophisticated fake news sources that have emerged will take great pains to mimic all these signs of reliability. So you get sites that try to look like respectable think tanks (the .org domain doesn’t mean anything these days), try to hide their ties and affiliations to lobbies, or appear as academic publishers that mimic signs of academic prestige, for example.
- Check for previous work
- Go upstream to the source
- Read laterally
- Circle back
Notice that he doesn’t just give you a bunch of evaluation points, but tells you the order in which to do them. In particular, he follows the strategy of the fact checkers in the Stanford study and prioritizes cross-checking and validation.
Without this specific push, as I’ve argued, people will be lazy and just evaluate based on what they see in the source and their biases (e.g. a Trump supporter will be suspicious of CNN as a source). After all, people are generally credulous and want to confirm what they read (as long as it doesn’t conflict with their beliefs), so without a clear push to do cross-validation they are unlikely to do so.
Cognitive biases and librarians who agree
One example is philosopher and information literacy librarian Lane Wilkinson. In his wonderful post, Teaching Popular Source Evaluation in an Era of Fake News, Post-Truth, and Confirmation Bias, he sets out a very nuanced take on the issues around fake news.
First off, he rightly points out that fake news isn’t really new. His take is that the main problem is
“The spread of a deep mistrust of traditional media coupled with the valorization of motivated reasoning”, aka the “post-truth” mindset.
This gets you into the realm of cognitive biases, which is something we need to address for fake news, and means that “a bullet pointed list of ‘ways to spot fake news’ isn’t sufficient, you need to teach in a way that avoids triggering poor cognitive processes.”
One cognitive bias he points out, directional reasoning, is a common problem I often see in students: the tendency to decide on a position, hunt for a source that supports what they have already decided is true, and then insist that the librarian find them a source saying exactly what they expect to see.
It’s one thing to have a hypothesis and revise it on finding evidence or the lack of it; it is yet another to keep thinking a source must exist to support one’s point. The irony is that we librarians often tell users they don’t know how to search or where to search (often true), but taken to the extreme, it may lead a few students to think that if they can’t find a source for what they know to be true, it only means their searching skills are at fault and not that the evidence doesn’t exist.
Obviously this is the same type of mindset that makes fake news thrive.
The beauty of Lane’s article is that it carefully notes that cognitive biases are an issue and gives practical tips on how not to trigger them when teaching a class on fake news. How? Read his post!
His opinion of CRAAP?
“The CRAAP test makes a lot of epistemological assumptions that obscure just how difficult it really is” – the meaning of authority and how we know what is authoritative is actually a pretty complicated topic. But CRAAP seems to make it look simple, e.g. it has a .org, .edu or .gov domain, hence it’s likely reliable.
Perhaps this is also why people take the lazy, superficial way out and/or let their biases predominate.
3. Measuring cost per use with adjustments for the level of open access
This idea remains controversial for many reasons – in particular, you can’t guarantee which version of the open access variant you will get – but this has not stopped the author of Leveraging the Growth of Open Access in Library Collection Decision Making from making an intriguing proposal.
Of course you are familiar with the idea of valuing subscriptions based on cost per use and using that as a factor to rank or rate journals for renewal.
The author of the paper suggests that one should tweak the cost per use by taking into account levels of Open Access.
For example, say it is 2018. If the number of downloads in 2017 of articles published in 2017 for the journal is 100, and 10% of the articles from that publication year (2017) are Open Access, then the adjusted usage = 100 × (1 − 0.1) = 90 downloads.
The idea here is that because 10% of the content is open access and free, on average that share of the downloads could have been satisfied by OA copies instead. You then use that “OA-adjusted usage” to calculate cost per use.
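To make the arithmetic concrete, here is a minimal sketch in Python; the $5,000 subscription price and the function name are my own made-up illustration, not figures from the paper:

```python
def oa_adjusted_cost_per_use(downloads, oa_fraction, annual_price):
    """Discount raw downloads by the share of articles that are Open Access,
    then compute cost per use on the adjusted usage."""
    adjusted_usage = downloads * (1 - oa_fraction)  # e.g. 100 * (1 - 0.1) = 90
    return annual_price / adjusted_usage

# Example from the post: 100 downloads, 10% OA; the $5,000 price is hypothetical.
print(oa_adjusted_cost_per_use(downloads=100, oa_fraction=0.10, annual_price=5000))
# ~55.56 per use, versus 50.00 unadjusted – the journal looks a bit more expensive per use
```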
He suggests many formulas, but this is the simplest one. Over three subscription years, he proposes discounting the price by the Green OA level of the journal.
JR5 is the Journal COUNTER statistic for a given year of publication.
Official definition of JR5 – “Number of Successful Full-Text Article Requests by Year-of-Publication (YOP) and Journal”
He has many other formulas in the paper for example formulas that discount older articles versus newer ones, formulas that take into account delayed OA, Gold OA etc and discusses in detail some of the strengths and weaknesses of each formula and whether the data is available for such calculations.
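Purely as an illustration of the general idea – a rough sketch of my own, not any of the paper’s actual formulas – a per-publication-year version might discount each year’s JR5 usage by that year’s Green OA level before computing cost per use:

```python
def multi_year_oa_adjusted_cpu(annual_price, jr5_by_year, green_oa_by_year):
    """Sum JR5 usage across publication years, discounting each year's usage
    by that year's Green OA share, then compute cost per use."""
    adjusted_usage = sum(
        jr5 * (1 - green_oa_by_year.get(year, 0.0))
        for year, jr5 in jr5_by_year.items()
    )
    return annual_price / adjusted_usage

# Hypothetical JR5 counts and Green OA levels for three publication years
jr5_by_year = {2015: 120, 2016: 150, 2017: 100}
green_oa_by_year = {2015: 0.25, 2016: 0.15, 2017: 0.10}
print(multi_year_oa_adjusted_cpu(5000, jr5_by_year, green_oa_by_year))  # ~16.26
```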
Actually doing the calculation
The tricky part is that this script can give you historical OA levels but not projected OA levels. But I guess you can adjust for that yourself by getting three historical years and averaging the change.
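For what it’s worth, a crude projection along those lines might look like this (the OA levels below are invented, and the simple linear extrapolation is my own assumption):

```python
def project_next_year_oa(historical_oa):
    """Extend the average year-over-year change in OA level to the next year,
    clamping the result to the 0-1 range."""
    changes = [later - earlier for earlier, later in zip(historical_oa, historical_oa[1:])]
    avg_change = sum(changes) / len(changes)
    return min(1.0, max(0.0, historical_oa[-1] + avg_change))

# e.g. OA levels of 8%, 10% and 13% over three years project to roughly 15.5%
print(project_next_year_oa([0.08, 0.10, 0.13]))  # 0.155
```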
Still, this is only for one journal and seems like a lot of work when you consider how many journals we have, so currently this remains an interesting idea at best. Could such calculations be supported in our systems like Alma?