My roundup of developments in 2017 that caught my eye.

In my final blog post of the year, I'm going to talk about some of the developments in librarianship and related domains that caught my eye. Of course, this is by necessity going to be a personal and idiosyncratic selection, made from my point of view.
“Obi-Wan: Anakin, Chancellor Palpatine is evil!
Anakin Skywalker: From my point of view, the Jedi are evil!” – Revenge of the Sith (2005)

The rise of AI – or more accurately, deep learning

So, everyone knows AI is coming; that's nothing new. But towards the end of this year it became quite personal for me.

What many of you may not know is that before I became a librarian, I was very interested in Chess as a child and teenager, and later became involved in the field of Computer Chess in the late 90s and early 2000s. Incidentally, I stopped around the time I became a librarian.

As such, the recent announcements by DeepMind about the exploits of AlphaGo Zero and AlphaZero blew my mind.


I was of course aware of the magnitude of what DeepMind achieved with AlphaGo, the first Go engine to beat a top-ranked Go player, going on to beat the world champion, particularly since Go was a much harder game for machines to master than Chess. In Chess, the top humans started losing in the late 90s (most famously Kasparov's loss to Deep Blue in 1997, though arguably Kasparov was still the stronger player at the time), and by the mid to late 2000s engines were pretty much unbeatable by any human.

Still, when I read about AlphaZero in a Reddit thread, I couldn't help but feel a big emotional reaction. It wasn't just that DeepMind's AlphaZero beat Stockfish, one of the top chess engines, in a convincing way, winning 28 out of 100 games (the rest were drawn); it was the way it was achieved.

Unlike the version of AlphaGo that beat the Go world champion, which was fed human master games to learn from, AlphaZero learned everything from scratch. It was given only the rules of the game and nothing else, and in only four hours of pure self-play and learning it trained itself to the level where it could toy with Stockfish, one of the best Chess engines in the world.
Remember, Stockfish itself is superhuman in play, yet AlphaZero makes it look dumb. It wasn't just that AlphaZero won, but the way it tore Stockfish apart with positional play and sacrifices. A commenter said it was like watching Paul Morphy (the unparalleled American Chess genius of the 1850s to whom Bobby Fischer is often compared) tear apart much weaker players.


A brilliant game by AlphaZero in which it shows it understands zugzwang, a concept modern chess engines have trouble grasping

A lot of Stockfish defenders protested the results. They pointed to the fact that Stockfish was disadvantaged (it didn't use an opening book or endgame tablebases, but then neither did AlphaZero), and while some of what they said had merit (e.g. Stockfish had only 1 GB of RAM for hash tables when typically it would have much more), it all misses the point.

What AlphaZero shows is that the old computer chess paradigm is over.

Stockfish is the latest in a long line of Chess engines built on the old computer chess paradigm dating back to the 60s and 70s. It was designed to search as deeply as possible, with a few carefully chosen heuristics (chosen by programmers to ensure that evaluating these features didn't slow the search too much) to evaluate terminal positions. It was essentially "fast but dumb" compared to humans. But by the early 2000s hardware speed became overwhelming, allowing engines to outsearch humans even though humans were much better at discarding useless moves and looking only at the critical ones.

For decades, human programmers painstakingly coded how the chess engine would search (typically some variant of alpha-beta pruning with extensions, late move reductions, etc.). They coded in heuristics for how to evaluate and score the positions reached by the search. So, for example, Stockfish would be given heuristics like a Pawn is worth 1 unit, a Bishop 3 units, and so on, or a +0.5 bonus if it had the bishop pair and the position was open. The programmers would make a change and run thousands of games against other chess engines to see if the change made the engine stronger.
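
To make that concrete, here is a minimal toy sketch in Python (my own illustration, not Stockfish's actual code) of the kind of hand-written evaluation heuristic described above: fixed material values plus a bishop-pair bonus (the "open position" condition is left out for brevity).

```python
# Toy hand-coded evaluation in the old computer chess paradigm.
# The piece values and bishop-pair bonus mirror the examples in the text;
# a real engine evaluates far more features, and far faster.
PIECE_VALUES = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0}
BISHOP_PAIR_BONUS = 0.5


def evaluate(white_pieces, black_pieces):
    """Score a position from White's point of view, in pawn units.

    Each argument is just a list of piece letters, e.g. ["P", "P", "B", "R"];
    a real engine would of course work from a full board representation.
    """
    def side_score(pieces):
        score = sum(PIECE_VALUES.get(p, 0.0) for p in pieces)
        if pieces.count("B") >= 2:  # the hand-written bishop-pair heuristic
            score += BISHOP_PAIR_BONUS
        return score

    return side_score(white_pieces) - side_score(black_pieces)


# White has the bishop pair, material is otherwise equal: evaluation is +0.5.
print(evaluate(["P"] * 8 + ["B", "B", "R"], ["P"] * 8 + ["N", "B", "R"]))
```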

AlphaZero was different. No one taught it how to play. No one hand-coded evaluation heuristics or taught it to value a Queen over a Rook. Just by playing itself, it figured out the game of Chess.

Or rather, in four hours of self-play (albeit on very fast, specialized TPUs designed for neural nets), it exceeded the best achievements of human civilization in Chess. After all, Stockfish can be viewed as the accumulation of decades of Computer Chess knowledge and centuries of human knowledge of Chess, and Stockfish already played like a god compared to any individual human – even the human world champion.

But there is a new sheriff in town, and AlphaZero did it all by self-learning, with no human guiding it!

I know I have been gushing, but it seems to me that AlphaZero's achievement shows that, in time to come, anything with clear-cut rules and goals is doomed to fall quickly to AI. Such systems don't even need to consult the current experts in the domain!

Before this, I was already interested in and playing with machine learning packages in R, trying out decision trees, random forests, SVMs, naive Bayes, KNN, etc. (also check out Rattle if you want a point-and-click GUI over R), but it seems I made a mistake and should have learnt more about neural nets, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
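
If you just want to get a feel for how such classifiers behave, a few lines of code are enough. Here is a minimal sketch using Python's scikit-learn rather than the R packages mentioned above (the workflow in R is broadly similar): fit a KNN classifier on the built-in iris dataset and check its accuracy on held-out data.

```python
# Minimal "get a feel for it" machine learning task: KNN on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small, well-known dataset and hold out 30% of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# KNN is one of the classifiers mentioned in the text.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```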

Libraries are already starting to experiment; for example, Andromeda Yelton has just released Hamlet – a system that uses "machine learning to power experimental, exploratory interfaces to MIT's thesis collection."

MIT's alpha version of Hamlet – a neural net learns how to recommend theses by similarity

I have no ambitions of becoming a data scientist. But I do think it prudent to understand roughly how these techniques work, or even, in a pinch, to run simple machine learning tasks just to get a feel for them. It allows you to get a sense of which things are pure hype and what is actually doable.

For example, I used to be quite excited by the idea of a new generation of chatbots powered by the latest machine learning techniques. That was until I actually looked at how they are trained and tried a few chatbots deployed by banks and the like; they seem to have some way to go.

The game of open access continues…..

“You were the chosen one! It was said that you would destroy the Sith, not join them. You were to bring balance to the Force, not leave it in darkness!” – Revenge of the Sith (2005)

I blogged a lot about open access in 2016, and this continued in 2017. A very big development in 2017, from my point of view, was the sudden interest in improving the discovery of open access material. I speculated that we had reached a level where the availability of open access was too big to ignore, and this trend continued in the waning months of 2017, with Web of Science linking to Green OA using oaDOI (the breakout star of the year) and CORE partnering with Summon and Primo.

Other big news included the buyout of Bepress by Elsevier, part of its continued ambition to get into user workflows and position itself as an "analytics company", and the growing realization by open access activists (at least the subset that is hostile to traditional publishers and/or wants to "reclaim the scholarly infrastructure" or solve the affordability issue) that they have made serious strategic mistakes in the past.

Here’s a typical example.

The game of open access continues…


Data skills, Digital Humanities, Digital Scholarship and Research Data Management

As open access advances, the role of academic libraries as the "wallet" that pays for subscriptions is projected to diminish. This has to be replaced by other services, particularly with a greater focus on services based on expertise.

In particular, academic libraries have been moving into digital humanities, digital scholarship and research data management at varying speeds.

I've been studying some of the technical skills associated with these areas, and one of the more interesting developments this year is the rise of the Library/Data/Software Carpentry movements, which provide easily digested lesson plans for learning the basics of everything from Git to OpenRefine.

Linked Data is also an area that I've struggled to understand, over and over again, for years. Beyond the technicalities (RDF serializations, SPARQL, OWL, etc.), I have struggled to see the point of Linked Data, as a lot of tutorials don't make this clear and the few demos I've seen don't look too compelling (e.g. mockups of Ex Libris linked data in Primo records).

I've tended to agree with this critic of library Linked Data, who thinks a lot of Linked Data work in libraries is just doing Linked Data for the sake of Linked Data.

I'm slowly starting to see the light, though. In particular, Wikidata is perhaps the best answer I know to "Why Linked Data?".

A very clear lecture on what Wikidata is and how to query it

By representing data from Wikipedia as structured linked data, you can answer questions like "How many painters are sons of painters?" or "What are the largest cities with a female mayor?" using SPARQL.
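
For instance, here is a minimal sketch of how the first of those questions might be asked against the public Wikidata SPARQL endpoint from Python. The identifiers used (Q1028181 for "painter", P106 for "occupation", P22 for "father") are standard Wikidata ones, but treat this as an illustrative query rather than a polished one.

```python
import requests

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Count painters (occupation = painter, Q1028181) whose father (P22)
# is also a painter.
QUERY = """
SELECT (COUNT(DISTINCT ?painter) AS ?count) WHERE {
  ?painter wdt:P106 wd:Q1028181 .   # occupation: painter
  ?painter wdt:P22  ?father .       # father
  ?father  wdt:P106 wd:Q1028181 .   # father's occupation: painter
}
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "linked-data-demo/0.1 (example script)"},
)
response.raise_for_status()

count = response.json()["results"]["bindings"][0]["count"]["value"]
print(f"Painters whose father is also a painter: {count}")
```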


In the coming year, I might actually muster up the courage to write a post trying to demystify Linked Data for people who, like me, are confused, and to consider some potentially useful use cases for Linked Data.

Beyond all the technicalities, a more interesting question is how academic librarians are going to cope with these new areas of expertise. For instance, how much of all this is an average liaison expected to know? It's all very well to get new hires who already have these skillsets, but how do you include people like me who did not come in with these skills?

In a way, it's not a new question. Even in my not-so-long 10-year career in academic librarianship, I have had to learn and master diverse knowledge areas and skill sets, like reference managers, bibliometrics and open access, which are now expected competencies for academic librarians.

Still, there are limits to this expansion of knowledge, and it's hard not to be daunted by the new areas of research data management, digital humanities and digital scholarship…

Conclusion

I guess a lot of what I posted isn't mind-blowing, even if you haven't been paying a lot of attention to the world around us.

Still, dear readers, I hope you have found some bits of it useful.

I would like to take this opportunity to thank all loyal readers for your support of this blog, and I'll see you in 2018!
