Qualitative Data Repository Teams with Hypothesis to Develop Annotation for Transparent Inquiry (ATI)

Originally published 12 May 2017 on the QDR blog by Sebastian Karcher.

Scholars are increasingly being called on – by journal editors, funders, and each other – to “show their work.” Social science is only fully understandable and evaluable if researchers share the data and analysis that underpin their conclusions. Making qualitative social science transparent poses several knotty problems. The Qualitative Data Repository (QDR) and Hypothesis have partnered to meet this challenge by developing a new way to cite, supplement, and share the data underpinning published work.

The Challenge: Achieving Transparency in Qualitative Research

Three aspects of qualitative inquiry complicate transparency. First, qualitative data are multi-format and non-numeric (text, audio, video, pictures). Second, they are analyzed and used to support claims individually or in small groups: each insight drawn from one or a handful of cited sources (e.g., books, archival documents, interview transcripts, newspaper articles, video clips, etc.) serves as a distinct input to the analysis. Third, data, analysis, and conclusions are typically densely interwoven across the span of a book or article.

Qualitative Research – Individual Pieces of Data

Quantitative social science does not face the same challenges. Quantitative work involves the computational analysis of numeric data arranged in a matrix and approached as an aggregate body of information. The analysis is typically summarized in tabular form in the text or appendix of published work. To make quantitative publications transparent, scholars share the study dataset (and relevant information about its creation) and supplemental materials such as the code used for analysis.

Quantitative Research – Matrix Data

Making qualitative research similarly transparent requires resolving at least two problems: safely sharing non-numeric data that may come in multiple forms, and placing those data adjacent to the claims and conclusions in the text that they support. Traditionally, qualitative researchers showed at least some of their work in extended footnotes in which they cited the data they relied upon; provided supplemental information about how the data were analyzed and support their points; and provided extracts from those materials. Traditional footnotes are a sub-optimal solution, however. Tight space constraints severely limit what can be included, a problem made even more acute by the increasing use of in-text citation styles. Moreover, even where extracts of the evidence are included in long-form footnotes, there is no systematic way to ensure that the underlying sources are held and curated in ways that make them accessible and useful to scholars.

The Solution: Annotation for Transparent Inquiry (ATI)

Annotation for Transparent Inquiry (ATI), developed through a partnership between QDR and Hypothesis, uses author-generated web annotations on academic publications. Annotations provide information about data analysis, excerpts from data sources, and links to underlying sources, housed in a data repository. The approach harnesses the power of open web annotations, displayed by Hypothesis. Authors annotate their work and deposit underlying data sources with QDR. The repository curates these deposits and converts them into a set of web annotations on the published article, and creates a data project (the aggregate of the underlying data sources). The annotations can be viewed alongside the article using the Hypothesis client, and interested readers can access the underlying data sources archived at QDR.
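Because ATI builds on open web annotations, each annotation can be represented as a standard W3C Web Annotation. The sketch below is purely illustrative: the URLs, selector text, and analytic note are hypothetical placeholders, assuming an annotation that anchors to a quoted passage in a published article and links out to a data source archived at the repository.

```python
# Illustrative sketch of an ATI-style annotation as a W3C Web Annotation.
# All URLs and text values below are hypothetical placeholders.
ati_annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    # The target anchors the annotation to a passage in the published article
    "target": {
        "source": "https://example.org/working-paper.pdf",
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "party system polarization increased sharply",
        },
    },
    # The body carries the analytic note plus a link to the
    # underlying data source archived at the repository
    "body": [
        {
            "type": "TextualBody",
            "value": "Analytic note: this claim rests on interview evidence.",
            "purpose": "commenting",
        },
        {
            "type": "SpecificResource",
            "source": "https://data.qdr.example.org/data-source-1",
            "purpose": "linking",
        },
    ],
}
```

Structuring annotations this way is what lets any standards-aware client, not just one vendor's viewer, display the analytic note alongside the article and follow the link to the archived source.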

Annotation for Transparent Inquiry

The new collaboration between Hypothesis and QDR is already bearing fruit. You can see an example of scholarship annotated using ATI here: a working paper by Sam Handlin (Department of Political Science, University of Utah), “The Politics of Polarization: Governance and Party System Change in Latin America, 1990-2010,” published by the Kellogg Institute at the University of Notre Dame. The annotations you see on the side are served by Hypothesis. QDR curated the annotations and provides access to the underlying files, e.g. for this annotation.

Further, working with the Agile Humanities Agency, QDR has developed the ability to append #annotations:query:<search phrase> to a link so that only a subset of annotations is shown on a given page via the Hypothesis proxy service. QDR uses this feature to present links to the set of annotations that make up the qualitative data underlying an article, by limiting the view to annotations created from QDR’s Hypothesis account. You can see this at work in the link to Sam Handlin’s paper above.
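Such filtered links can be assembled mechanically. The helper below is a minimal sketch: the proxy host and fragment syntax follow the behavior described above, while the target URL and query value are hypothetical examples.

```python
from urllib.parse import quote

def filtered_annotation_link(target_url: str, query: str) -> str:
    """Build a Hypothesis-proxy link that shows only annotations
    matching `query` (e.g. a depositing account's username)."""
    # Route the page through the proxy, then append the client's
    # query fragment so only matching annotations are displayed.
    return f"https://via.hypothes.is/{target_url}#annotations:query:{quote(query)}"

link = filtered_annotation_link("https://example.org/paper.html", "user:qdr")
```

The fragment never reaches the server; it is interpreted client-side by the Hypothesis client, which filters the annotation list accordingly.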

Looking ahead, QDR will hold two workshops in late 2017 and early 2018 focused on evaluating and further developing ATI. The workshops are funded by a grant that the Robert Wood Johnson Foundation has awarded to the Qualitative Data Repository to pilot and promulgate ATI and to encourage its use.

Further, QDR and Hypothesis are hoping to address the challenges created by the large share of academic literature in the hard and social sciences that resides behind a paywall, where access is granted only to IP ranges known to be associated with institutions that pay for it. Finding user-friendly solutions for viewing annotations on paywalled material is therefore high on our agenda. We hope to draw on our partnership with a wide range of academic publishers in the “Annotating All Knowledge” coalition to develop those solutions. While our immediate interest is in rendering qualitative research transparent, the annotation of academic literature will benefit a much broader scholarly community.

QDR and Hypothesis will also work towards facilitating third-party authentication to the Hypothesis platform. For QDR, the ability to authenticate users against its own user base is critical to limit access to sensitive material that may be stored in annotations, e.g. in the form of interview excerpts.

Remarq Goes Well Beyond Annotation


Remarq goes beyond annotations to create an entire system of engagement around journal articles, with levels of engagement that users can use as they see fit:

  • Private engagement with content – highlighting and private annotations
  • Semi-public engagement – article-sharing, following articles, polls, profiles
  • Public engagement – qualified comments, post-publication reviews, and author and editor updates

This combination of features delivers what David Worlock described succinctly in a blog post after seeing Remarq demonstrated at the recent UKSG Meeting in Harrogate, UK:

“Remarq . . . enable[s] any publisher to create community around annotated discussion and turn it into scholarly exchange and collaboration.”

By offering a full-featured service, Remarq is built to help publishers compete with ResearchGate and Academia.edu. Remarq gathers features readers have found valuable on these platforms – profiles, article-sharing, annotations, comments – and combines these with the strengths publishers offer, including editorial and author involvement, the version of record, post-publication reviews, and article-sharing.

Remarq’s design fits quietly into any web site, requiring no expensive redesigns or unattractive design compromises. Outsell recently noted the strengths of Remarq in a May 10, 2017, Insight:

“Taking on the likes of ResearchGate and Academia.edu means matching (or exceeding) their offerings in terms of simplicity and ease of use – which Remarq does.”

Remarq enables all of these features in ways publishers prefer. For instance, Remarq’s sophisticated commenting feature ensures that commenters are qualified in the fields the journal covers. If the system has not registered expertise via the user’s publication record, educational background, professional position, or professional memberships, comments are held and the user can add more information.

One pain point for publishers is that article-sharing in ResearchGate and Academia.edu removes usage from their sites. Article-sharing via Remarq occurs via the publisher’s site, so usage counts in the standard ways publishers prefer.

We think Remarq represents an important leap forward for online tools available for editors, authors, and readers – the constituents publishers serve. By allowing publishers to quickly become competitive in the scholarly collaboration space, Remarq can solve many strategic conundrums simultaneously, as well.

You can find out more at https://remarqable.com.

Weaving the annotated web

In 1997, at the first Perl Conference, which became OSCON the following year, my friend Andrew Schulman and I both gave talks on how the web was becoming a platform not only for publishing, but also for networked software.

Here’s the slide I remember from Andrew’s talk:


The only thing on it was a UPS tracking URL. Andrew asked us to stare at it for a while and think about what it really meant. “This is amazing!” he kept saying, over and over. “Every UPS package now has its own home page on the world wide web!”

Understanding Collaborative Tools: An Interview with PaperHive


In the last few years, there has been an increase in the number of collaborative publishing tools available to researchers. Each of these tools has its own unique features and shortcomings. As part of our interview series on Connecting Scholarly Publishing Experts and Researchers, we had the opportunity to speak with Alexander Naydenov, co-founder and Head of Marketing at PaperHive. PaperHive is an online scientific collaboration platform that enables researchers to simplify research communication and make reading more interactive, social, and productive.


HighWire and Hypothesis Partner to Bring Annotation to Publishers

Today Hypothesis and HighWire Press are announcing a partnership to bring a high quality, open annotation capability to over 3,000 journals, books, reference works, and proceedings published on HighWire’s JCore platform.

Annotation is a fundamental activity of researchers and scholars everywhere—from taking notes, collaborating with peers, and performing pre-publication reviews, to engaging in conversations with the broader community. Until now, solutions for journals have been limited, proprietary, and siloed in ways that significantly constrain their utility. With the advent of a standards-based, open source, and interoperable annotation paradigm, that is now changing.

Hypothesis, a non-profit annotation technology organization launched in 2011, is working with publishers, educators, researchers, and journalists to enable annotation across the internet. Within scholarship, use cases include post-publication annotation and community review; authors’ notes over their own work, including updates to previous articles; invited discussions; pre-publication peer review; enhanced footnotes; corrections and errata; and more. More than 70 major publishers, platforms, and technology organizations have come together in support of this interoperable vision under the Annotating All Knowledge coalition.

Through this partnership, HighWire publishers will be able to implement and control their own annotation layers, moderated, branded, and visible by default over their publications. Annotations can be made either under existing publisher user accounts or within the Hypothesis namespace.

Dan Whaley, Hypothesis CEO and Founder, will be presenting as part of the Partner Showcase at the HighWire Publisher’s Meeting on 5 April 2017 and will be available for more information at the Partner Reception.

“Hypothesis is excited to work with HighWire to deliver a powerful toolchain across publisher content,” says Whaley. “By making annotation native to scholarly content at the platform level we stand the best chance of fulfilling the vision of an interoperable collaborative layer over all scholarship.”

HighWire publishers that are interested in bringing Hypothesis annotations to their publications should contact Heather Staines, Director of Partnerships.

About Hypothesis

The Hypothes.is Project is a San Francisco-based, non-profit software company focused on enabling humans to reason more effectively together through a shared, collaborative discussion layer over all knowledge. Learn more about Hypothesis online.

About HighWire Press

A leading ePublishing platform, HighWire Press partners with independent scholarly publishers, societies, associations, and university presses to facilitate the digital dissemination of more than 3,000 journals, books, reference works, and proceedings. HighWire also offers a complete manuscript submission, tracking, peer review, and publishing system for journal editors, Bench>Press. HighWire provides outstanding technology and support services, and fosters a dynamic and innovative community, enhancing the strengths of each of its members. For more info, visit highwire.org online.

Annotating TV news

Join us May 3-6 in San Francisco at I Annotate 2017, the fifth annual conference for annotation technologies and practices with a keynote from Esther Dyson. This year’s themes are: increasing user engagement in publication, science, and research, empowering fact checking in journalism, and building digital literacy in education.

The Internet Archive’s TV News Archive is a remarkable resource that provides video clips of TV news shows since 2009, text-searchable by means of their closed captions. Annotation of that caption text enables anyone to zoom in on specific moments and language in the TV timeline, bookmark it, and start a conversation linked to text and video. It’s a great way to use TV news as a primary source in education, journalism, and research.

For example, here’s a claim that Politifact rated as four Pinocchios:

Right now, Libya, as you know, has fantastic oil, some of the finest oil in the world. Who has the oil? ISIS has the oil. Do we blockade it, do we bomb it, do we do anything? No. ISIS is making a fortune now in Libya.

— Donald Trump, interview on NBC’s Today show, April 21, 2016

And here’s that quote in context at the TV News Archive. That link, which Politifact could have cited (but didn’t), takes you to the segment of the show that contains the quote. The encircled red checkbox on the timeline tells us that Politifact has evaluated a claim made there.

That’s awesome! But wait, there’s more. I can annotate that selection and hand you a Hypothesis direct link that not only takes you to the segment in context, but also highlights the quote and enables us to discuss it in the annotation layer.

But wait, there’s still more! Now suppose we are writing the story using the Hypothesis toolkit for fact checkers. First I’ll capture that quote — along with the rest of the evidence we’re gathering for the story — and assign it a (toolkit-controlled) Hypothesis tag that binds related annotations to the story.
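Gathering all the evidence bound to one story then amounts to a single query against the public Hypothesis search API, filtered by that shared tag. The sketch below only constructs the request URL; the tag value is hypothetical, and a real toolkit would fetch the URL and read the matching annotations from the JSON response.

```python
from urllib.parse import urlencode

HYPOTHESIS_API = "https://api.hypothes.is/api/search"

def story_evidence_url(story_tag: str, limit: int = 50) -> str:
    """Build the search-API URL that returns every annotation
    carrying the story's controlling tag."""
    return f"{HYPOTHESIS_API}?{urlencode({'tag': story_tag, 'limit': limit})}"

url = story_evidence_url("factcheck-libya-oil-claim")
```

Because the tag is controlled by the toolkit rather than typed freehand, every annotation captured for the story is retrievable with one request, and nothing unrelated leaks in.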

Now, in my editing tool, I’ll grab that direct link from the embedded Hypothesis viewer, using the Copy to Clipboard button.

And I’ll connect it to a statement in the story as a Hypothesis direct link. The link (as above) takes you to the quote in context at the Internet Archive. But now the page also pulls the quote into the story’s footnotes, and connects the linked statement (‘calling it a “Hillary Clinton deal”’) to that footnote.

As a writer (or publisher) this is exactly how I want things to work. The story links directly to both the location in the video and to a highlighted quote within its caption stream. Importantly, it’s never copied and pasted. Rather, the text of the quote is included from a canonical source.

As a reader this is also exactly how I want things to work. I can follow that direct link to explore the context surrounding the quote. Or I can quickly assess the quote — along with all the other supporting evidence — directly within the story.

Hypothesis provides a core capability for fact checking. Hypothesis-powered writing and publishing tools can extend that capability, streamlining the process for writers who gather and organize evidence, publishers who present it, and readers who evaluate it. Here’s a screencast that shows such tools in action.

When researchers, analysts, or students can spend less time and effort wrangling source material, using power tools like these, they’ll be able to invest more in what really matters: the analysis.

Will this way of annotating the TV News Archive be superseded, now that there’s a well-defined model for annotation of video using standard time-based selectors? Not at all! Text-based and time-based annotation will happily coexist. When text is available, it’s an easy and natural affordance for annotators working with video content.

How shared vocabularies tie the annotated web together

I’m fired up about the work I want to share at Domains 2017 this summer. The tagline for the conference is Indie Tech and Other Curiosities, and I plan to be one of the curiosities!

I’ve long been a cheerleader for the Domain of One’s Own movement. In Reclaiming Innovation, Jim Groom wrote about the need to “understand technologies as ‘potentiality’ (to graft a concept by Anton Chekhov from a literary to a technical context).” He continued:

This is the idea that within the use of every technical tool there is more than just the consciousness of that tool, there is also the possibility to spark something beyond those predefined uses. The only real way to galvanize that potentiality is to provide the conditions of possibility — that is, a toolkit for user innovation.


SciLite – an open annotation platform for sustainable curation

Scientific publications are the main medium for sharing scientific results and assertions supported by observational data. Consequently, bioinformatics resources depend on the research literature to keep their content updated, a task carried out by curators, who extract information from articles and transfer its essence to the corresponding resources.

Advances in high-throughput technology have resulted in a tremendous growth of biological data, increasing the number of research papers being published. This poses a great challenge for manual curation, which relies on finding the right articles and assimilating the facts described in them. Therefore, services that support researchers and curators in browsing the content and identifying key biological concepts with minimal effort would be beneficial for the community.


What is SciLite?


We at the literature services group, EMBL-EBI, host Europe PMC, a database for life science literature, a partner in PubMed Central International. Europe PMC hosts a large variety of content and provides free access to over 32 million abstracts (27 million from PubMed) and 4 million full-text articles.

Our goal is to develop Europe PMC as an open community platform for new developments that improve our interaction with the scientific literature. As part of this effort we have recently launched a new Europe PMC tool, SciLite, which we present in our Software Tool Article published in Wellcome Open Research. SciLite presents an opportunity for text miners to showcase their work to a wider public: it exposes text-mined annotations and provides deep links to related data for a wide audience of scientists and curators, as well as other interested stakeholders.


How does SciLite work?


SciLite links text-mined annotations from the literature to the corresponding data resource and highlights those outputs on full-text articles and abstracts in Europe PMC. Using the checkboxes on the right-hand side of article pages, readers can select the types of concepts they are interested in, and matching annotations will be highlighted in the article text as below. Clicking on a highlighted term opens a popup with information about the given annotation, such as a link to the related database record and the source of the annotation.
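The same annotations are also available programmatically through Europe PMC's Annotations API. The sketch below just builds a request URL for one article's annotations; the article ID is a placeholder, and the endpoint path and parameter names are our reading of the API and should be checked against the current Europe PMC documentation.

```python
from urllib.parse import urlencode

# Assumed Annotations API endpoint; verify against Europe PMC docs.
ANNOTATIONS_API = (
    "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"
)

def scilite_annotations_url(article_id: str, concept_type: str) -> str:
    """URL for the text-mined annotations of one Europe PMC article."""
    params = {
        "articleIds": article_id,   # e.g. "PMC:4340844" (placeholder)
        "type": concept_type,       # concept class to retrieve
        "format": "JSON",
    }
    return f"{ANNOTATIONS_API}?{urlencode(params)}"

url = scilite_annotations_url("PMC:4340844", "Gene_Proteins")
```

A client would fetch this URL and walk the returned JSON to recover, for each annotation, the matched text span and the link to the related database record.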


What types of annotations are available?


SciLite annotates articles by identifying concepts, such as gene/protein names, organisms, diseases, Gene Ontology terms, chemicals, and accession numbers, as well as biological events (e.g. phosphorylation). The latter annotations are provided by the National Centre for Text Mining. SciLite also displays gene function annotations (GeneRIF – Gene Reference into Function) contributed by the Bibliomics and Text Mining group at the University of Applied Sciences, Geneva.


Are all annotations correct?


Although text-mining algorithms have greatly improved over the years and are being actively used in real-world applications, inaccuracies do occur. To counteract this, we have introduced a user-driven mechanism to refine the annotations. While reading a paper, users of Europe PMC can validate or report an erroneous annotation (see example below). Such feedback helps ensure the quality of the provided annotations and improves the text-mined outputs.


How is SciLite useful?


For the reader, SciLite makes it very easy to skim-read articles, focusing on highlighted terms and concepts to quickly understand what a given article is about. Those annotated entities are linked to the corresponding resources, so the reader can get to the underlying data in a straightforward way. In addition, SciLite could be useful for spotting related concepts in the text, as annotations highlighted in close proximity might signal a functional relationship between those terms, e.g., a gene-disease association.


What are the future plans for SciLite?


We believe SciLite has the potential to further enhance the reading experience of scientific articles by developing applications that improve full text searching, filtering and integration with biological data. We have taken an initial step towards this for Protein Data Bank (PDB) accession numbers with the BioJS application. For a given PDB accession number it fetches the coordinate information and displays the corresponding 3D molecular structure, serving as an interactive visualiser (see below). Similar applications could be developed to display relevant information for a given annotation type in the context of the article.


How can you contribute to SciLite?


We encourage text-mining and other associated communities to share annotations on the SciLite platform. We have set up a participation page to assist interested groups in submitting annotation data. Furthermore, the annotations on SciLite are modelled on the Web Annotation Data Model specification, and the open nature of the format allows other platforms, such as journal publisher websites or other content aggregators, to fetch these annotations from SciLite and reflect them on their own resources.
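Because the annotations follow the Web Annotation Data Model, any consumer can do a quick structural sanity check before ingesting them. The sketch below tests only a handful of properties the model requires; it is a minimal illustration, not a full validator, and the sample annotation's body text and target URL are hypothetical.

```python
def looks_like_web_annotation(ann: dict) -> bool:
    """Minimal structural check against the W3C Web Annotation model."""
    return (
        ann.get("@context") == "http://www.w3.org/ns/anno.jsonld"
        and ann.get("type") == "Annotation"
        and "target" in ann  # what the annotation is about
    )

# Hypothetical sample in the Web Annotation shape
sample = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "phosphorylation event"},
    "target": "https://europepmc.org/article/PMC/placeholder",
}
```

It is exactly this shared, checkable shape that lets publisher websites and aggregators consume SciLite annotations without any bespoke integration work.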


If you would like to find out more about developments at Europe PMC, visit https://europepmc.org/Roadmap or follow @EuropePMC_news on Twitter.
