Summary: Controlled vocabularies are inherently subjective, arbitrary, and a more rigid semantic layer than is necessary in an age of full-text indexing and machine learning. This should not be a controversial claim, considering how the most widely-used search tool on the planet already operates.
“Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.”
— George Box
Years ago, I spent a good deal of time compiling library use statistics, primarily for database searches, but also reference desk transactions, bibliographic instruction sessions, and even paper consumption — which dropped precipitously once we started charging for printing. It was useful to have those numbers on hand when colleagues asked for them. They were usually working on a grant or a report or actually considering making an evidence-based decision.
Thanks to improvements in how we record statistics in the first place, as well as better (and fewer, due to mergers) vendor interfaces for pulling data, it’s now a lot less work, especially since I don’t spend much time running numbers unnecessarily. A more common situation nowadays is that someone wants to know how many chat questions we received last year, so then, and only then, I sign in to QuestionPoint and retrieve the relevant information.
It’s more efficient to hold off on a task until you’re sure it’s needed. There’s little purpose in determining how much a service is used if administration is incapable of considering its cancellation, for example. And as much as innovation is at times worth the risk, making decisions based on speculation, rather than demonstrated demand, can result in an unproductive workload.
That’s why we follow the best practice of putting a login prompt at the point of need, instead of needlessly gating access to free sites. I’m still waiting to see libraries that adhere to the opposite philosophy (“they might need to sign in later, so let’s require it now”) fully embrace the idea and require authentication to view their homepage.
A library could keep a record of books that are 25cm high. It sounds downright silly when you put it like that, because you can instead conduct searches in a library services platform specifying that sort of thing. The time spent on maintaining such an inventory would not only be a waste, it also wouldn’t help educate people on how searching works.
Google sure doesn’t curate a pre-coordinated index of websites about the French Revolution, apart from what it can generate on the fly to deliver results when someone searches for those words. Admittedly, the way computers present seemingly intelligent results now relies, in a rather roundabout fashion, on human behavior: link popularity, co-citations, and paired reading habits (à la “customers who looked at this item also bought…”) all influence how relevancy is calculated.
If you need a list of library items in a specific format, on a certain subject, sorted a particular way, it’s almost trivial to execute a search and retrieve those desired items. Our diminished amount of in-house documentation, reference desk traffic, and time spent in classroom settings imparting procedural knowledge partially reflect this reality.
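To make the point concrete, here is a minimal sketch of that kind of on-demand search. Everything in it is an assumption for illustration: the `Item` fields, the sample records, and the `search` function are invented, not drawn from any actual library services platform. It only shows that a list by format, keyword, or height can be produced at query time instead of being maintained in advance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    fmt: str        # hypothetical format label, e.g. "book" or "dvd"
    year: int
    height_cm: int


# Invented sample records; a real catalog would hold far more fields.
CATALOG = [
    Item("French Revolution Primer", "book", 1995, 22),
    Item("Voices of the French Revolution", "book", 1989, 25),
    Item("The French Revolution on Film", "dvd", 2005, 19),
    Item("Atlas of the French Revolution", "book", 2010, 25),
]


def search(items, keyword=None, fmt=None, height_cm=None, sort_key="title"):
    """Filter at query time, then sort; no list is stored beforehand."""
    hits = [
        item for item in items
        if (keyword is None or keyword.lower() in item.title.lower())
        and (fmt is None or item.fmt == fmt)
        and (height_cm is None or item.height_cm == height_cm)
    ]
    return sorted(hits, key=lambda item: getattr(item, sort_key))


# Books matching a keyword, oldest first.
books = search(CATALOG, keyword="french revolution", fmt="book", sort_key="year")

# The "books that are 25cm high" list, generated only when someone asks.
tall = search(CATALOG, height_cm=25)
```

The same few lines answer either question, which is why keeping a separate standing inventory for each one is hard to justify.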
We likewise don’t maintain shelf lists much anymore, although there are some exceptions. After all, it should come as no surprise that most people have a propensity for insisting that whatever they do for a living is still very much needed in its current form. The A-Z index of journals, although an inefficient method for searching, cannot be removed because some individuals want it around. Our LibGuides site is chock full of pathfinders of varying depth and currency, and the entire video collection is manually cross-listed by genre for some reason.
Last month the Open Directory Project shut down. It was one of the last human-powered endeavors using a hierarchical taxonomy to classify the entire web. Search engines have done a better job, at least as measured by popularity, of making web content discoverable. It’s mainly a matter of collective processing power: not even an army of volunteers could match the robotic might of Google. Reliance on automated crawling also eliminates the thorny problem of human subjectivity.
One of the reasons astrologers claim validity for their craft is that practitioners with the same precepts would in theory come up with a similar horoscope for a person with a given birthday. Determining the objective “aboutness” of a publication isn’t always as straightforward. Beyond measuring and describing the physical dimensions of materials, subject cataloging is prone to “eye of the beholder” sorts of judgments. Is Romeo and Juliet a tale of romantic love, or one of warring families? Once you go down the rabbit hole of interpretive literature, there’s no end of different meanings that could be applied to the same work.
I regularly put stuff back in the “wrong” location when I unload the dishwasher. Last time it was placing the 1/4 cup measuring cup in the same drawer with the 1/2 cup one, although that wasn’t where it belonged. Of course, if I knew how the utensils were previously arranged, I’d be sure where to put them. However, someone who’d never worked in the kitchen before may have a more difficult time finding everything.
What’s intuitive to some, others may yet take umbrage at. And given the nature of private experience, nobody gets to tell them otherwise. There’s no end to the different manners in which items can be categorized. Just look at what constitutes kosher food. Although, I can certainly see why, in an era with no refrigeration, staying away from shellfish would become a popular habit. The problem with limited access points (e.g., is a book about the record industry shelved in music or the business section of the library?) could just as easily apply to any system using a controlled vocabulary. At some point, cataloging is therefore more of an art than a science.
Our evolving and imperspicuous language introduces several pitfalls, as does the existence of different dialects, translations, wordplay, and allegorical speech. Then there’s the more philosophical quibbling (think Plato’s Cave) over whether or not linguistic labels can ever perfectly represent universal properties, or even whether there is such a thing. That hunk of rock floating in space called Pluto obviously didn’t empirically change much when we stopped calling it a planet, after all.
“I know it when I see it” is a famous exasperated claim regarding the legal classification of pornography. That’s pretty much what most typologies come down to. Even when there is a method to such madness, people attempt to ascribe natural kinds to things that are ontologically as arbitrary as any other social construct.
Aside from its place on the evolutionary tree, does asking something like whether a tomato is a fruit or a vegetable have any meaningful basis in reality? You invariably end up with wacky exceptions to any taxonomy. All birds can fly, except those that don’t. All mammals give live birth, except the platypus. So you instead fit the operational definition of a mammal because your ear has three bones in it. See also the concept of corporate personhood, the logical leap in calling campaign donations an exercise of free speech, or the curious case of determining if the X-Men are human.
Peculiar classification issues pervade many sports. For example, what field does a transgender athlete compete in? In the 1988 America’s Cup, the US team sneaked in with a multi-hull design that adhered to the letter, if not the spirit, of the rules on boat specifications, and it blew away the competition. While his opponents walked from hole to hole, Casey Martin got to use a golf cart on the PGA Tour, thanks to the Americans with Disabilities Act. Anthony Robles is an NCAA wrestling champion, dominating the competition to go undefeated in his senior year, an inspiring achievement considering he’s missing a leg. Anyone with a basic understanding of the sport, however, recognizes the substantial advantage in upper body strength that he has in his weight class when it comes to ground grappling. Competitive runners Aimee Mullins and Oscar Pistorius are also missing their legs. Just how long their prosthetic limbs would have to be to constitute an unfair advantage is a question of some debate.
All of these typology discussions are largely academic. Ultimately, any efforts at a simple schema give way to a type of uncertainty principle in that the more you try to make distinctions, the more gerrymandered classification criteria (as with the platypus) you get, no matter how you cut it. It’s enough to make me think it’s time to consider whether metadata, at least the kind generated by people prior to a search being done, has any future value.
If the human element in cataloging is removed, a layer of abstraction, namely subject headings, may be eliminated. But without those controls, the cataloger insists, how can we know books penned by “Richard Bachman” were written by Stephen King, much less properly classify, and thereby aid researchers in finding, a collection of haikus, or whatever Ulysses is about? There appears to be a benefit in deriving metadata external to the work itself, as with an old map of New York City titled “New Amsterdam,” to aid in the discovery process.
Categorizing a thing correctly, or at least optimally, seems to require substantial knowledge of other things outside the thing being classified. This basis is also essential for verifying alternative truths. As stated by George K. Fortescue, “is it not rather the peculiar felicity of the librarian’s calling that in whatsoever reading or study he may follow for his own sake, he is also adding steadily to his ability to carry out his daily duties?” (1901-08-27)
The more knowledge you possess, the better cataloger you become. This holds for human and computer alike. One of our profession’s many elephants in the room is the question of if and when computers will “know” how to catalog items better than we do, provided they haven’t attained this skill already. Considering machine intelligence isn’t going to decline, I’d say it’s becoming increasingly apparent that believing artificial agents will never be able to do original cataloging is nothing more than wishful thinking. Perhaps we should be working more to prepare for the future rather than romanticizing the past.
- Another Word for ‘Illegal Alien’ at the Library of Congress: Contentious
- Naming and Reframing: A Taxonomy of Attacks on Knowledge Organization
- Prejudices and Antipathies
- Resource Description and Access (RDA): Cataloging Rules for the 20th Century
- Roadmap to Nowhere: BIBFLOW, BIBFRAME, and Linked Data for Libraries
Check out my other posts for related commentary.