In my final blog post of the year, I’m going to talk about some of the developments in librarianship and related domains that caught my eye. Of course, this is by necessity going to be a personal and idiosyncratic selection, from my point of view.
“Obi-Wan: Anakin, Chancellor Palpatine is evil!
Anakin Skywalker: From my point of view, the Jedi are evil!” – Revenge of the Sith (2005)
The rise of AI – or more accurately deep learning
Everyone knows AI is coming; that’s nothing new. But towards the end of this year it became quite personal to me.
What many of you may not know is that before I became a librarian, I was very interested in chess as a child and teenager, and later became deeply involved in the field of computer chess in the late 90s and early 2000s. Incidentally, I stopped around the time I became a librarian.
This is insane. I used to follow computer chess closely in late 90s and early 2000s and I really think the era of human experts is coming to a close. These algorithms are starting to work without human domain expertise
— Aaron Tay (@aarontay) December 6, 2017
Still, when I read about AlphaZero in a reddit thread, I couldn’t help but feel a big emotional reaction. It wasn’t just that DeepMind’s AlphaZero beat Stockfish, one of the top chess engines, in convincing fashion, winning 28 out of 100 games (the rest were drawn); it was the way this was achieved.
A brilliant game by AlphaZero, showing it understands zugzwang, a concept conventional chess engines have trouble grasping
A lot of Stockfish defenders protested the results. They pointed out that Stockfish was disadvantaged (it didn’t use an opening book or endgame tablebases, though neither did AlphaZero), and while some of their complaints had merit (e.g. Stockfish had only 1 GB of RAM for hash tables when it would typically have much more), it all misses the point.
What Alpha Zero shows is that the old computer chess paradigm is over.
Stockfish is the latest in a line of chess engines built on the old computer chess paradigm dating back to the 60s and 70s. It was designed to search as deep as possible, using a few carefully chosen heuristics (chosen by programmers to ensure that evaluating these features didn’t slow the search too much) to evaluate terminal positions. It was essentially “fast but dumb” compared to humans. But by the early 2000s, hardware speed had become overwhelming, allowing engines to outsearch humans even though humans were much better at discarding useless moves and considering only the critical ones.
For decades, human programmers would painstakingly code how the chess engine searched (typically some variant of alpha-beta pruning with extensions, late move reductions, etc.). They also coded heuristics for evaluating and scoring the positions reached by the search: for example, Stockfish would be told that a pawn is worth 1 unit and a bishop 3 units, or to add a +0.5 bonus for having the bishop pair in an open position. They would make changes and run thousands of games against other chess engines to see if a change made the engine stronger.
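To make the old paradigm concrete, here is a minimal sketch, in illustrative Python, of its two ingredients: a hand-coded material evaluation and alpha-beta search over a toy game tree. This is nothing like how Stockfish is actually written (the values and the toy tree here are my own, for illustration only), but the division of labour is the same: humans choose the heuristics, the machine searches fast.

```python
# Toy sketch of the old computer chess paradigm: search deep, then
# score leaf positions with hand-coded heuristics. Illustrative only.

# Hand-coded material values in pawn units -- a classic heuristic
# that a human programmer chose, not something the engine learned.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def evaluate(position):
    """Score a position from White's point of view by counting material.
    Uppercase letters are White pieces, lowercase are Black."""
    score = 0
    for piece in position:
        if piece.upper() in PIECE_VALUES:
            value = PIECE_VALUES[piece.upper()]
            score += value if piece.isupper() else -value
    return score

def alphabeta(node, depth, alpha, beta, maximizing):
    """Alpha-beta pruning over a game tree.

    `node` is either a leaf (a position string scored by `evaluate`)
    or a list of child nodes."""
    if depth == 0 or not isinstance(node, list):
        return evaluate(node)
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:  # prune: the opponent won't allow this line
                break
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best

# A tiny two-ply tree of positions; the search picks the branch whose
# worst-case material outcome is best for White.
tree = [["QPP", "RPP"], ["PP", "QQpp"]]
print(alphabeta(tree, 2, float("-inf"), float("inf"), True))  # → 7
```

Everything here, from the piece values to the pruning rule, is something a human chose and tuned by hand; that is precisely the part AlphaZero dispensed with.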
AlphaZero was different. No one taught it how to play. No one coded a search algorithm or taught it to value queens over rooks. Just by playing against itself, it figured out the game of chess.
Or rather, in four hours of self-play (albeit on very fast, specialized TPUs designed for neural nets), it exceeded the best achievements of human civilization in chess. After all, Stockfish can be viewed as the accumulation of some 70 years of computer chess knowledge and centuries of human knowledge of the game, and Stockfish played like a god compared to any individual human, even the human world champion.
But there is a new sheriff in town, and AlphaZero did it all by self-learning, with no human guiding it!
I know I have been gushing, but it seems to me that AlphaZero’s achievement shows that, in time to come, anything with clear-cut rules and goals is doomed to quickly fall to AI. The machines don’t even need to consult current experts in the domain!
Before this, I was already interested in machine learning and playing with packages in R, experimenting with decision trees, random forests, SVMs, naive Bayes, KNN, etc. (also check out Rattle if you want a point-and-click GUI over R), but it seems I made a mistake and should have learnt more about neural nets, such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs).
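As an illustration of how approachable some of these classic techniques are, here is a minimal KNN classifier in pure Python (a toy sketch with made-up data, not a substitute for a proper package in R or elsewhere):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(
        train,
        key=lambda pair: math.dist(pair[0], query)  # Python 3.8+
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy data: two obvious clusters, labelled "A" and "B".
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_classify(train, (1.1, 1.0)))  # a point near the first cluster → "A"
```

The whole algorithm is a dozen lines: measure distances, take the k closest training points, and vote. Seeing it laid out like this makes it much easier to judge vendor claims about “machine learning” features.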
Libraries are already starting to experiment: for example, Andromeda Yelton has just released Hamlet, a system that uses “machine learning to power experimental, exploratory interfaces to MIT’s thesis collection.”
I have no ambitions to be a data scientist. But I do think it prudent to roughly understand how these techniques work, or even, in a pinch, to run simple machine learning tasks just to get a feel for them. It helps you get a sense of what is pure hype and what is actually doable.
For example, I used to be quite excited by the idea of a new generation of chatbots powered by the latest machine learning techniques. That is, until I actually looked at how such bots are trained and tried a few chatbots deployed by banks and the like; the technology seems to have a way to go.
The game of open access continues…
“You were the chosen one! It was said that you would destroy the Sith, not join them. You were to bring balance to the Force, not leave it in darkness!” – Revenge of the Sith (2005)
I blogged a lot about open access in 2016, and this continued in 2017. A very big development that broke out in 2017, from my point of view, was the sudden interest in improving the discovery of open access material. I speculated that the availability of open access had grown too big to ignore, and this continued in the waning months of 2017, with Web of Science linking to Green OA using oaDOI (the breakout star of the year) and CORE partnering with Summon and Primo.
Other big news included Elsevier’s buyout of Bepress, its continued ambitions of getting into user workflows and positioning itself as an “analytics company”, and the growing realization by open access activists (at least the subset hostile to traditional publishers and/or wanting to “reclaim the scholarly infrastructure” or solve the affordability issue) that they have made serious strategic mistakes in the past.
Here’s a typical example.
The scientific community and stakeholders made 3 mistakes cleverly induced by publishers: 1) accept the APC concept with unregulated prices; 2) call it “Gold” although it is not; 3) encourage or even mandate “OA publishing” instead of simply (Green) archiving. I agree with Erin. https://t.co/S87vVBlh8b
— Bernard Rentier (@bernardrentier) December 25, 2017
The game of open access continues…
Data skills, Digital Humanities, Digital Scholarship and Research Data Management
As open access advances, the role of the academic library as a wallet is projected to diminish. This has to be replaced by other services, particularly ones based on expertise.
In particular, academic libraries have been moving into digital humanities, digital scholarship and research data management at varying speeds.
I’ve been studying some of the technical skills associated with these areas, and one of the more interesting developments this year is the rise of the Library/Data/Software Carpentry movements, which provide easily digested lesson plans for learning the basics of everything from Git to OpenRefine.
Linked Data is also an area I’ve struggled to understand over and over again for years. Beyond the technicalities (RDF serialization, SPARQL, OWL, etc.), I have struggled to see the point of Linked Data: a lot of tutorials don’t make it clear, and the few demos I’ve seen don’t look too compelling (e.g. mockups of Ex Libris Linked Data in Primo records).
I’ve tended to agree with this critic of library Linked Data, who thinks a lot of Linked Data work in libraries is just doing Linked Data for the sake of Linked Data.
I’m slowly starting to see the light, though. In particular, Wikidata is perhaps the best answer I know to “Why Linked Data?”
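For those who, like me, find the tutorials opaque: at its core, Linked Data is just statements expressed as subject-predicate-object triples, which can then be queried by pattern matching with wildcards, which is roughly what SPARQL does over RDF. Here is a toy sketch in plain Python; the identifiers are made up for illustration (real Wikidata uses opaque IDs like Q42 and P50):

```python
# Statements as (subject, predicate, object) triples -- the core idea
# behind RDF / Linked Data. Identifiers are invented for illustration.
triples = [
    ("Douglas_Adams", "instance_of", "human"),
    ("Douglas_Adams", "author_of", "Hitchhikers_Guide"),
    ("Hitchhikers_Guide", "instance_of", "book"),
    ("Hitchhikers_Guide", "publication_year", "1979"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching a pattern; None acts as a wildcard,
    much like a ?variable in a SPARQL query."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# "What did Douglas Adams write?" -- loosely analogous to the SPARQL
# pattern: SELECT ?work WHERE { :Douglas_Adams :author_of ?work }
print(query(subject="Douglas_Adams", predicate="author_of"))
```

The payoff of Linked Data comes when those identifiers are shared globally, so triples published by different institutions can be joined; that is what makes Wikidata a compelling answer.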
In the coming year, I might actually muster up the courage to write a post trying to demystify Linked Data for people who, like me, are confused, and to consider some potentially useful use cases for it.
Beyond all the technicalities, a more interesting question is how academic librarians are going to cope with these new expertise areas. For instance, how much of all this is an average liaison expected to know? It’s all very well to get new hires who already have these skill sets, but how do you bring along people like me who did not come in with them?
In a way, it’s not a new question. Even in my not-so-long 10-year career in academic librarianship, I have had to learn and master diverse knowledge areas and skill sets, like reference managers, bibliometrics and open access, which are now expected competencies of academic librarians.
Still, there are limits to this expansion of knowledge, and it’s hard not to be daunted by the new areas of Research Data Management, Digital Humanities and Digital Scholarship…
I guess a lot of what I posted isn’t mind-blowing if you have been paying attention to the world around us.
Still, I hope, dear readers, that you have found some bits useful.
I would like to take the opportunity to thank all loyal readers for your support of this blog and I’ll see you in 2018!