How libraries might change when AI, machine learning, open data, blockchain and other technologies are the norm


More than two years ago, I wrote about how academic libraries may change when open access becomes the norm, summarizing how I expected the rise of open access to diminish and eventually render obsolete some current library functions, such as fulfillment and possibly even discovery.

I still stand by what I wrote, though in hindsight it was a bit defensive and a bit light on details about what libraries could do instead. This was probably because, at the time, I had much less understanding of GIS, research data management and the like, even as I pointed to them as future areas of growth.

The other thing is that, as disruptive as open access is, it's not the only trend that will affect libraries.
Recently, I began to think about some of the potential technologies that might play a role in altering the fate of libraries. They are:
  • Machine Learning and AI
  • Linked Data
  • Blockchain technology

A lot of this might trigger a "not this again! We've been saying these are coming for years…" response, but I'm of the view that this time things might be different.

Machine learning and AI

Science fiction has always been a guilty pleasure of mine. As a teen I devoured the classic science fiction greats like Asimov and Arthur C. Clarke, along with some fantasy. In the 2000s, I ran into the ideas of transhumanism and the concept of a "Technological Singularity", where a self-modifying runaway AI would "go FOOM" and bootstrap itself to virtual godhood (relatively speaking, compared to humans).

I know most sober-minded librarians would think this sounds crazy, but to be fair, thinkers such as Elon Musk and Stephen Hawking worry about this as a serious possibility.

But even if you dismiss this possibility, you don't have to be a visionary to see that machine learning and AI will increasingly be the next big revolution in society. We have self-driving cars; IBM Watson systems encroaching on the work of doctors and paralegals; AIs beating world champions at Go, a game studied for centuries and once thought to require subtle pattern-matching skills beyond the reach of computers for the foreseeable future; and systems learning to play video games from scratch. Every day brings some announcement of an AI achievement once thought extremely difficult, if not impossible.

Even on a personal level, have you noticed how good the technology has become? Google Photos has the uncanny ability to recognize faces (even matching the same person as a small child and much later as a teen) and group them together. I am addicted to, and rely heavily on, Google Now to surface articles of interest. Having plugged heavily into the Google ecosystem, Google seems to magically know what I want to know based on what I search, what's on my calendar and what's in my email (it auto-populates my calendar from emails, among other things).

It tells me when to leave for appointments to be on time.



Google Now tells me when to leave for an appointment

Once it even told me a flight was going to be delayed before it was officially announced. The only thing preventing me from using Google Assistant more is that I'm self-conscious about speaking aloud, but when I do, it has zero problems understanding me (even though I tend to mumble).

Yes, I’m aware of the privacy trade-off but this is to illustrate how good even mere consumer level technology is becoming thanks to machine learning.

Machine learning relies on data, and the rise of open access, open data and open science is going to further accelerate this trend.

So how is this going to affect academic libraries?

Chris Bourg's What happens to libraries and librarians when machines can read all the books is perhaps the most cogent analysis of the challenges and issues facing academic libraries right now, and I highly recommend reading it if you haven't yet.

She lists the following three questions we should try to address:

“1. What role can libraries play in making sure we don’t summon the demon; or at least that we have the tools to control or tame the demon?

2. How might we leverage AI in support of our missions? How might AI help us do some of our work better?

3. How might we support AI and machine learning in ways that are consistent with and natural evolutions of the long-standing missions and functions of libraries as sources of information and the tools, resources, expertise to use that information?”

Over at the Scholarly Kitchen, David Smith muses about what future virtual assistants like Alexa might do, and refers to arxivML: An Alexa skill to read latest machine learning papers from arXiv.

To the questions and issues Chris Bourg raises, I would add the meta-question: what skills or knowledge would librarians need to help answer them?

Of all the technologies listed in this post, machine learning is the broadest and has the most potential to be disruptive. It's really difficult to sketch out every possibility, though obvious ones include chatbots, self-learning predictive analytics systems and smarter search systems. Much also depends on how capable the technology actually becomes.
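
To make one of these concrete, here is a minimal sketch of how a library chatbot might route patron questions by intent, using a tiny naive Bayes text classifier in pure Python. The intents, training phrases and sample question are all invented for illustration; a real system would need far more data and a proper NLP pipeline.

```python
import math
from collections import Counter

# Invented training data: a few example phrases per intent.
TRAINING = {
    "opening_hours": ["when does the library open",
                      "what are your opening hours",
                      "is the library open today"],
    "renew_loan":    ["how do i renew my book",
                      "renew my loan please",
                      "extend the due date of my book"],
}

def train(examples):
    """Count word frequencies per intent."""
    counts = {intent: Counter() for intent in examples}
    for intent, phrases in examples.items():
        for phrase in phrases:
            counts[intent].update(phrase.split())
    return counts

def classify(counts, question):
    """Pick the intent with the highest log-likelihood,
    using add-one smoothing for unseen words."""
    best, best_score = None, -math.inf
    for intent, words in counts.items():
        total = sum(words.values())
        vocab = len(words)
        score = sum(math.log((words[w] + 1) / (total + vocab))
                    for w in question.lower().split())
        if score > best_score:
            best, best_score = intent, score
    return best

counts = train(TRAINING)
print(classify(counts, "can I renew this loan"))  # renew_loan
```

Even a toy like this hints at why data matters so much: the classifier is only as good as the example phrases it has seen.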

More thinking is needed on this.

Linked data

This one has been on the horizon ever since I became a librarian nearly 10 years ago, and I bet many practicing librarians can vaguely remember reading or learning about RDF triples, but they probably don't see any practical use for them.

The major difference compared to 10 years ago is that library vendors are moving on it. For example, Zepheira's Libhub Initiative has sponsors and supporters such as Innovative, while Ex Libris has a white paper entitled Putting Linked Data at the Service of Libraries: The Ex Libris Vision and Roadmap and is working to enhance Alma and Primo to both consume and produce linked data.

Still, I guess for the average academic librarian (which includes me), the idea of linked data remains extremely hazy. Relatively few librarians know how to use SPARQL to query RDF, or know that Wikidata and various data sources like Europeana are available in RDF and can be queried that way. (For other sources, see datahub, the Mannheim Linked Data Catalog, or here for other linked data sources with SPARQL endpoints or RDF downloads.)
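
For readers who have never seen SPARQL, here is a small sketch of what querying Wikidata's public endpoint looks like from Python, using only the standard library. The endpoint URL is Wikidata's real one, and P31 ("instance of") and Q571 ("book") are real Wikidata identifiers, but the snippet only builds the request rather than sending it, since fetching results needs network access.

```python
import urllib.parse

# Wikidata's public SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Ask for ten items that are instances of "book" (Q571),
# with English labels supplied by Wikidata's label service.
query = """
SELECT ?book ?bookLabel WHERE {
  ?book wdt:P31 wd:Q571 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
LIMIT 10
"""

# Build the GET request URL; actually fetching it would be e.g.
# urllib.request.urlopen(url).read() on a networked machine.
url = ENDPOINT + "?" + urllib.parse.urlencode(
    {"query": query, "format": "json"})
print(url[:50])
```

The pattern-matching style of the query (variables like ?book bound against triples) is what makes SPARQL feel so different from keyword searching.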

I must admit that despite spending hours at various points in the last 10 years (whenever guilt set in that I knew nothing about linked data) trying to understand it, I always failed to see the point.

Only recently, through my adventures with OpenRefine reconciliation services (using the RDF extension) and, from another angle, my explorations of Wikidata, did a glimmer of understanding emerge.

Using OpenRefine + the RDF extension to reconcile against a SPARQL endpoint

As more practical applications emerge (e.g. linked data appearing in catalogues) and the tools for managing it become easier, I believe librarians will slowly start to see the point of learning linked data. Perhaps searching with SPARQL will be to librarians what keyword Boolean operators are today?

I’m wildly speculating of course.

Block Chain Technology

Blockchain is yet another technology I haven't been paying much attention to, beyond the fact that it is a cryptographic distributed system with no central point of control and is the basis for Bitcoin.

The great library technologist Jason Griffey (whose projects include LibraryBox and Measure the Future, a "Google Analytics for spaces") explains in the video above what blockchain technology is (starting at the 31:24 mark) and proposes three possible library-related uses of it (at the 1:01:49 mark).

A very rough explanation of blockchain technology is as follows. You have a database, or ledger, of every transaction made. Each transaction has "verified identities" (these correspond to pseudonymous identities, not necessarily actual public ones), and every transaction can be verified as having actually happened using cryptographic techniques. All of this is public and distributed across many servers, so no one organization controls the system or can do anything nefarious. One can trace the sequence of events from the present, transaction by transaction, back to the very first transaction. This creates a very robust, hard-to-disrupt system.
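
The hash-chaining at the heart of this can be sketched in a few lines of Python. This is only an illustration of the ledger idea, not a real blockchain: it omits consensus, signatures and the distributed network entirely, and the transactions are invented.

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 hash of a block's contents."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def add_block(chain, transaction):
    """Append a block that records the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transaction": transaction})

def verify(chain):
    """Recompute each link; any tampering shows up as a mismatch."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
add_block(chain, "alice lends book #42 to bob")
add_block(chain, "bob returns book #42")
print(verify(chain))                              # True

# Rewriting history breaks every hash that follows the change.
chain[0]["transaction"] = "alice lends book #42 to eve"
print(verify(chain))                              # False
```

The key property shown here is exactly the one described above: because each block commits to the hash of its predecessor, you can walk the chain back to the first transaction and detect any alteration along the way.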

He suggests blockchain technology can be used by libraries in three ways.

The first is digital provenance. For example, one could envision a research data management solution built on blockchain technology: research data would be distributed and any changes tracked publicly.

The second idea is a distributed bibliographic metadata store: essentially a distributed OCLC-style catalogue where anyone could make changes to metadata records and sign each change with their identity. Individual libraries could then decide which changes, by which identities, to acknowledge. Most importantly, no single organization such as OCLC or the Library of Congress would control the system, so there is no single point of failure.
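
The signed-change part of this idea can be sketched as follows. Note the hedges: HMAC with shared secrets stands in here for the public-key signatures a real system would use, and the signer names, record IDs and fields are all invented.

```python
import hashlib
import hmac

# Invented signing keys for two contributing libraries.
KEYS = {"library_a": b"secret-a", "library_b": b"secret-b"}

def sign_change(signer, record_id, field, value):
    """A contributor signs a proposed metadata change."""
    message = f"{record_id}|{field}|{value}".encode()
    sig = hmac.new(KEYS[signer], message, hashlib.sha256).hexdigest()
    return {"signer": signer, "record": record_id,
            "field": field, "value": value, "sig": sig}

def accept(change, trusted):
    """A library applies a change only if it trusts the signer
    and the signature checks out."""
    if change["signer"] not in trusted:
        return False
    message = (f"{change['record']}|{change['field']}|"
               f"{change['value']}").encode()
    expected = hmac.new(KEYS[change["signer"]], message,
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(change["sig"], expected)

change = sign_change("library_a", "record:12345", "title", "Foundation")
print(accept(change, trusted={"library_a"}))   # True
print(accept(change, trusted={"library_b"}))   # False: untrusted signer
```

The design point is that trust decisions move to the edges: every library sees every signed change, but each decides locally whose changes to apply.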

He points out, though, that he isn't sure whether such a system would end up with a bigger blockchain than Bitcoin's.

The third idea supports digital first sale. The main problem with ebooks is that there are no digital first sale rights: while you can sell print books you purchased, you can't sell ebooks you bought. One argument justifying this is that it's not possible to prove ownership of ebooks, and hence their sale.

With blockchain technology, one can track the ownership of an ebook (while this sounds like a privacy nightmare, as with Bitcoin I assume the identities can be at least pseudonymous). He argues that the "provable scarcity" blockchain offers might enable digital first sale rights, but he isn't sure.

He envisions a distributed system with micro-transactions plus provable first sale rights; all transaction data would be public, giving publishers lots of data.

In case you are wondering whether all this is purely theoretical, it isn't. Some systems based on blockchain are starting up today.

For example, Kubrik Engineering is working on a repository solution, the Data Management Hub (DaMaHub), based on blockchain technology.

Here's the vision:

“Now imagine a global network of document repositories in public libraries. Each one hosts all open access articles relevant to the libraries’ users, maybe even with the published research data from the articles. The repositories update and distribute newly published articles among themselves without manual interaction, serving both as an archive and a front-end library.

Articles can be stored in any format together with underlying research data. The repository’s host can decide which data to host (for example only host data for local researchers), and if a user wants to access off-site data or articles, the network will deliver them immediately from the sites that have them available. All events on the repository such as uploads or metadata changes are published on a permissioned blockchain that serves as the feed that connects all repositories and allows newly set up repositories to quickly access all requested articles and local data.”

Some of the benefits:

“A library in a war zone asks for help to save all their hosted publications? Repositories worldwide can be set to quickly mirror all publications from the endangered library, looking up all references published by that library on the blockchain.”


“All protocols and software will be under open licenses. No vendor-lock-in. Everyone with a sufficiently fast internet connection can participate. Advanced publishing and archiving solutions can be adjusted or developed by third parties. Censorship in part of the network will not affect repositories outside these areas.”

Another one I came across recently is LBRY (pronounced like "library"), a distributed, decentralised marketplace system.

The idea is that it's something like YouTube, but not controlled by any single organization. Like YouTube or other distribution channels, you can share free content or charge users for access (via micropayments), but everything is done in a decentralized, distributed manner.

Though this project has nothing to do with libraries (except the name), ideas around content sharing and distribution are definitely something libraries should watch, and perhaps even get involved in.


It is perfectly possible that machine learning/AI, linked data and blockchain technology will individually come to nothing. But it's unlikely that all of them will fail to work out.

While most of us will never be technologists at the level needed to implement these systems, it seems wise to stay updated and gain a basic understanding of new technologies that will affect how people search, access and manage data.

A common thread linking all these technologies is that they involve new or different ways of handling data. That's why I've been fascinated with the idea of software/library carpentry lately.

The idea is to teach all librarians the basics of topics such as Git/version control and handling/manipulating data with Bash, regular expressions, OpenRefine and SQL. To these I would perhaps add familiarity with SPARQL (an SQL-like query language for RDF), use of Markdown/LaTeX, and maybe even basic analytics and machine learning using tools like R with RStudio or RapidMiner. (Some of this might already be taught in library school classes.)
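
As a taste of the regular-expression end of that list, here is a small sketch of the kind of data cleaning these carpentry skills enable. The sample records are invented; the task is pulling a four-digit publication year out of messy free-text imprint fields.

```python
import re

# Invented, messy imprint strings of the kind found in catalogue data.
records = ["London : Gollancz, 1951.",
           "New York, Bantam [1953]",
           "c1967"]

# Match a plausible year: 18xx, 19xx or 20xx.
years = []
for record in records:
    match = re.search(r"1[89]\d\d|20\d\d", record)
    years.append(match.group() if match else None)

print(years)  # ['1951', '1953', '1967']
```

A few lines like this can replace hours of manual cleanup, which is exactly the argument the carpentry movement makes.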

Will these be the basic competencies of librarians in the future?
