“The feels” – Sentiment analysis for altmetrics

One of the central aspects of what we do at Altmetric is processing and storing large quantities of data, whether that is publication metadata or online attention in its various formats (news stories, Facebook or Twitter posts, etc.). This allows us to occasionally have a bit of fun doing our own research to test assumptions and hypotheses that we or others may hold.

(Img: http://jiffyclub.github.io/digital-demography-2014/)

This year, as part of our participation at the 4AM altmetrics conference in Toronto, Canada, Stacy and I decided to engage in a small-scale project to explore sentiment analysis.

Why sentiment analysis?

(Img: https://mrprintables.com/emotion-flashcards.html)

Sentiment analysis is a computational technique that uses automated classification algorithms (a combination of natural language processing (NLP), machine learning (ML), and statistics) to produce a numerical score intended to approximate a human judgement of the attitude or general feeling of a segment of text. As a practical example, a sentiment analysis algorithm might give the sentence “What a pathetic piece of writing!” a score of -0.6, meaning strongly negative, while “What an amazing paper!” would receive an opposite score of 0.6, or strongly positive.
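To make this concrete, here is a minimal sketch of scoring sentences like those with NLTK’s VADER analyser (the tool we describe further below); it assumes nltk is installed and the vader_lexicon resource has been downloaded, and the exact scores will of course vary from tool to tool.

# Minimal sketch: scoring short sentences with NLTK's VADER analyser.
# Assumes nltk is installed; the vader_lexicon resource is a one-off download.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")
sid = SentimentIntensityAnalyzer()

for sentence in ["What a pathetic piece of writing!", "What an amazing paper!"]:
    # polarity_scores returns 'neg', 'neu', 'pos' and a normalised 'compound' score in [-1, 1]
    print(sentence, sid.polarity_scores(sentence))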

One of the current general issues in the field of altmetrics is that they quantify online attention for academic publications, but with little insight into the nature of this attention, which can of course be both positive and negative. Sentiment analysis would thus be an extremely helpful addition in further qualifying the nature of altmetrics and would add a great deal of value into the mix.

We decided to see whether adding this extra value is presently technically possible by looking in particular at Altmetric Twitter data around a set of academic fields. Twitter is particularly relevant here because, for the academic areas we looked at as part of our project, it was the platform most used to discuss research online.

Because we also wanted to test some of our own assumptions, we decided to focus our inquiry in particular on the field of Gender studies as, anecdotally, it tends to be an area of great contention and polarization, with fairly extreme attitudes often displayed in social network posts. Specifically, the assumption we wanted to test was that Gender studies research is subject to more negative Twitter attention than other research areas.

What we did (the short version)

The process we used was pretty standard for data science projects, namely extracting the data in bulk, cleaning it up, analyzing it and then printing out some (not terribly exciting) graphs and drawing some sensible conclusions.

In terms of selecting and exporting our data, we decided to use all of the Twitter posts in the Altmetric dataset that mention any 2016 publication from the top 10 Scimago-ranked journals in the field of Gender studies, along with Cultural studies (humanities) and Paleontology (STEM) as comparisons.
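For illustration only, the selection step boils down to something like the following; the file name, column names and journal titles here are invented stand-ins rather than the actual Altmetric export schema.

# Illustrative selection step; file name, column names and journal titles are hypothetical.
import pandas as pd

tweets = pd.read_csv("altmetric_tweet_export.csv")   # one row per tweet mention
top10_journals = {"Journal A", "Journal B"}           # the top 10 Scimago-ranked titles per field

subset = tweets[(tweets["publication_year"] == 2016)
                & (tweets["journal_title"].isin(top10_journals))]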

The cleanup of the data set, as with most data science projects, was the most difficult and involved step and we will leave a discussion of this for later in this post.

The sentiment analysis of the cleaned tweet data was performed comparatively using NLTK’s (Natural Language Toolkit) VADER Python module and the SentiStrength software suite.
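The VADER side of this reduces to something like the sketch below, using the commonly recommended ±0.05 cut-offs on the compound score to bucket tweets into positive, neutral and negative; SentiStrength was run separately through its own tooling, and the tweet texts here are invented examples.

# Sketch of the VADER scoring step; the tweet texts are invented examples.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

def classify(text, cutoff=0.05):
    # Map VADER's compound score onto positive/neutral/negative labels.
    compound = sid.polarity_scores(text)["compound"]
    if compound >= cutoff:
        return "positive"
    if compound <= -cutoff:
        return "negative"
    return "neutral"

cleaned_tweets = ["Really impressive new study on fossil dating.",
                  "This paper is deeply flawed."]
labels = [classify(t) for t in cleaned_tweets]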

We were wrong, but we’re happy!

(Img: http://mimiandeunice.com/about/)

Results from the two tools were statistically very similar and both showed, contrary to our assumptions, that the predominant sentiment in the Gender studies data set was overwhelmingly neutral rather than negative, with more positive than negative attention overall. The same applied to the comparison Cultural studies and Paleontology data sets.

Even though this went against the assumptions we had going in, it was very good to see that the Twitter attention landscape around Gender studies is not as bleak as we had assumed, and that individuals are sharing their own work and/or judging others’ in a fairly balanced manner; the same applied to the comparison fields as well.

Sentiment analysis for altmetrics is hard

The other very obvious conclusion we were able to draw from this project is that cleaning up Twitter data for sentiment analysis is hard, and that easy, off-the-shelf methods for implementing it are not feasible.

Sentiment analysis accuracy

>>> from nltk.sentiment.vader import SentimentIntensityAnalyzer
>>> sid = SentimentIntensityAnalyzer()
>>> sid.polarity_scores("Your mother was a hamster and your father smelt of elderberries!")
{'pos': 0.0, 'neg': 0.0, 'compound': 0.0, 'neu': 1.0}

As it turns out, machines aren’t perfect (yet) and pre-trained sentiment analysis tools are not all that fantastic at accurately scoring every type of human-produced text. While they do OK with the more straightforward cases, automated sentiment analysis can have a lot of trouble with more nuanced expressions like irony and sarcasm, which are very present in today’s meme-rich style of expression that permeates social networks in particular (we lovingly named this “the feels”). There are also the Winograd schemas and similar problems where sentiment analysis algorithms will typically have low accuracy. Finally, there is the issue of terms like ‘cancer’, which in some sentiment analysis tools carry strongly negative scores and will thus be misinterpreted even when the actual context is positive or neutral (e.g. “I beat cancer!” is scored as strongly negative). Handling such terms requires manual intervention to adapt the sentiment analysis lexicon and override term values, or even to build a lexicon from scratch, which is itself not a trivial task.
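To make the lexicon point concrete: VADER keeps its lexicon as a plain dictionary on the analyser object, so individual term valences can be overridden, as in the sketch below (the override value of 0.0 is purely illustrative).

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
print(sid.polarity_scores("I beat cancer!"))  # 'cancer' drags the compound score negative

# VADER stores its lexicon as a plain dict of token -> valence, so domain-specific
# terms can be neutralised or re-weighted; the value 0.0 here is purely illustrative.
sid.lexicon.update({"cancer": 0.0})
print(sid.polarity_scores("I beat cancer!"))  # re-scored with 'cancer' no longer contributing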

We did not employ any manual coding of our dataset in our project, which was a bit of a shortcut and so we do not have a good measure of the accuracy of the tools we used on our dataset. This remains something we would need to follow up with when developing a more robust sentiment analysis platform.

Language detection

Sentiment analysis, in its current state of the art, has primarily been built around the English language. Although some other languages are sometimes supported, sentiment analysis is currently only realistically feasible on English text. This introduces the further challenge of mixed-language data sets, such as the Twitter data set we worked on. Language detection libraries do exist, but their accuracy, much like that of sentiment analysis itself, is not 100%; so if there is no way to pre-filter the data export to include only English-language text, manual intervention is needed to support automated language detection, which can be very time consuming for large data sets.
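As an example of what a first-pass language filter can look like, here is a sketch using the langdetect package, which is just one option among several; as noted, accuracy on short, emoji-heavy tweets is far from perfect, hence the need for manual follow-up.

# Best-effort English filter using the langdetect package (one option among several);
# short or emoji-heavy tweets are frequently misclassified, hence the manual follow-up.
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

def is_english(text):
    try:
        return detect(text) == "en"
    except LangDetectException:  # raised for empty or undetectable text
        return False

tweets = ["Great new open access paper!", "Un article fascinant sur la paléontologie"]
english_only = [t for t in tweets if is_english(t)]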

Removing duplicate tweets

One of the main challenges in cleaning up our data set was the large number of duplicate tweets present (an estimated 64% of the original data set) and whether these should be removed or not. On the one hand, a visual inspection of our data set made it clear that the vast majority of the duplicated tweets were in fact authors and other associated entities promoting their own papers, and most of them contained little text aside from the paper title and a link. In some cases duplicates can also be the result of networks of Twitter bots spamming a certain tweet to either promote or denigrate a publication. This justified our final decision to remove duplicates from the data set, as doing so removed a source of neutral skew in our case (based on a spot analysis of some of these self-promotional tweets). On the other hand, there is also the possibility of having removed positive or negative tweets legitimately retweeted en masse by other individuals. Even though the sentimental valence of such a retweet is debatable, removing such sets of duplicated tweets would effectively introduce a form of bias instead of removing one. So the issue of duplicated material remains a contentious one.
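For reference, once you do decide to de-duplicate, the mechanics are straightforward; the sketch below (with invented tweet texts) collapses tweets that differ only by their shortened links, which is one reasonable definition of a duplicate but certainly not the only one.

# Sketch of one possible de-duplication pass: normalise the text, then drop duplicates.
import re
import pandas as pd

df = pd.DataFrame({"text": [
    "New paper: Gendered labour markets in 2016 https://t.co/abc123",
    "New paper: Gendered labour markets in 2016 https://t.co/xyz789",
]})

def normalise(text):
    # Strip URLs and collapse whitespace so retweets with different shortened links match.
    text = re.sub(r"https?://\S+", "", text)
    return re.sub(r"\s+", " ", text).strip().lower()

df["norm"] = df["text"].apply(normalise)
deduped = df.drop_duplicates(subset="norm")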

Removing titles

Titles of publications will, in most cases, be judged by sentiment analysis tools to be of neutral sentiment. Their presence in the data set, which is very common in our Twitter data where individuals often tweet or retweet little more than a paper title and a link, therefore constitutes a strong neutral bias. We looked at two approaches for removing titles: the first attempts to match and replace the title within the tweet as a whole; the second tokenizes the title and the tweet (each word in the title is treated as a “token” and, if it appears anywhere in the related tweet, it is removed, in theory leaving behind only the words that add value) and subtracts one from the other. Each of these has its own problems: the first has a very low hit rate because of human-introduced idiosyncrasies; the second has the unfortunate side-effect of removing tokens/terms that were not actually part of an article title but part of the tweet text itself, thus mangling the original tweet and confusing any subsequent sentiment analysis. It is likely that the title removal process could be improved with further work on regular expressions and fuzzy matching, but this again shows that something apparently trivial can turn out to be quite challenging when doing data cleanup.
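The two approaches correspond roughly to the hypothetical helpers sketched below (not our production code): the first fails quietly whenever the tweet does not contain the title verbatim, while the second happily deletes ordinary words that merely happen to appear in the title.

import re

def strip_title_exact(tweet, title):
    # Approach 1: remove the title only when it appears (case-insensitively) as a whole.
    return re.sub(re.escape(title), "", tweet, flags=re.IGNORECASE).strip()

def strip_title_tokens(tweet, title):
    # Approach 2: drop every tweet token that also appears in the title; higher hit rate,
    # but it can mangle the surrounding tweet text.
    title_tokens = set(title.lower().split())
    return " ".join(w for w in tweet.split() if w.lower() not in title_tokens)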

Project & code

Our poster from 4AM and code are available on GitHub: https://github.com/thisisjaid/4am
