Scientific publications are the main medium for sharing scientific results and assertions supported by observational data. Consequently, bioinformatics resources depend on the research literature to keep their content up to date. This task is carried out by curators, who extract information from articles and transfer its essence to the corresponding resources.
Advances in high-throughput technology have resulted in a tremendous growth of biological data and, with it, of the number of research papers being published. This poses a great challenge for manual curation, which relies on finding the right articles and assimilating the facts described in them. Services that help researchers and curators browse the content and identify key biological concepts with minimal effort would therefore benefit the community.
What is SciLite?
We at the literature services group at EMBL-EBI host Europe PMC, a database of life science literature and a partner in PubMed Central International. Europe PMC hosts a wide variety of content and provides free access to over 32 million abstracts (27 million of them from PubMed) and 4 million full-text articles.
Our goal is to develop Europe PMC as an open community platform for new developments that improve our interaction with the scientific literature. As a part of this effort we have recently launched a new Europe PMC tool – SciLite, which we present in our Software Tool Article published on Wellcome Open Research. SciLite presents an opportunity for text miners to showcase their work to a wider public. SciLite exposes text-mined annotations and provides deep links with related data to a wide audience of scientists and curators, as well as other interested stakeholders.
How does SciLite work?
SciLite links text-mined annotations from the literature to the corresponding data resources and highlights them on full-text articles and abstracts in Europe PMC. Using the checkboxes on the right-hand side of an article page, readers can select the types of concepts they are interested in, and the matching annotations for that article are highlighted in the article text. Clicking on a highlighted term opens a popup with information about the annotation, such as a link to the related database record and the source of the annotation.
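To make the idea concrete, here is a minimal sketch of highlighting by concept type. It is not SciLite's actual code: the Annotation class, the find_annotation and highlight helpers, the bracket markers and the example.org links are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the annotated term begins
    end: int    # character offset just past the annotated term
    kind: str   # concept type, e.g. "gene" or "disease"
    link: str   # URL of the related database record (placeholder here)


def find_annotation(text, term, kind, link):
    """Build an Annotation for the first occurrence of term in text."""
    start = text.index(term)
    return Annotation(start, start + len(term), kind, link)


def highlight(text, annotations, selected_kinds):
    """Wrap annotations of the selected kinds in [kind: term] markers."""
    chosen = sorted(
        (a for a in annotations if a.kind in selected_kinds),
        key=lambda a: a.start,
    )
    parts = []
    cursor = 0
    for ann in chosen:
        if ann.start < cursor:
            continue  # skip an annotation that overlaps the previous one
        parts.append(text[cursor:ann.start])
        parts.append(f"[{ann.kind}: {text[ann.start:ann.end]}]")
        cursor = ann.end
    parts.append(text[cursor:])
    return "".join(parts)


text = "Mutations in BRCA1 increase the risk of breast cancer."
annotations = [
    find_annotation(text, "BRCA1", "gene", "https://example.org/gene/BRCA1"),
    find_annotation(text, "breast cancer", "disease",
                    "https://example.org/disease/breast-cancer"),
]

# Equivalent to ticking only the "gene" checkbox.
print(highlight(text, annotations, {"gene"}))
```

Passing {"gene", "disease"} instead highlights both terms, which mirrors ticking two checkboxes. In a real page the link field would feed the popup rather than being ignored.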
What types of annotations are available?
SciLite annotates articles by identifying concepts, such as gene/protein names, organisms, diseases, Gene Ontology terms, chemicals, and accession numbers, as well as biological events (e.g. phosphorylation). The latter annotations are provided by the National Centre for Text Mining. SciLite also displays gene function annotations (GeneRIF – Gene Reference into Function) contributed by the Bibliomics and Text Mining group at the University of Applied Sciences, Geneva.
Are all annotations correct?
Although text-mining algorithms have greatly improved over the years and are actively used in real-world applications, inaccuracies do occur. To counteract this, we have introduced a user-driven mechanism to refine the annotations. While reading a paper, users of Europe PMC can validate an annotation or report it as erroneous. Such feedback helps to maintain the quality of the annotations provided and to improve the text-mined outputs.
How is SciLite useful?
For the reader, SciLite makes it easy to skim-read articles: focusing on highlighted terms and concepts helps to quickly understand what a given article is about. The annotated entities are linked to the corresponding resources, so the reader can get to the underlying data directly. In addition, SciLite could be useful for finding related concepts in the text, as annotations highlighted in close proximity might signal a functional relationship between those terms, e.g. a gene-disease association.
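The proximity idea can be sketched in a few lines. This is only an illustration, not a SciLite feature: the tuple layout, the nearby_pairs helper and the max_gap threshold are assumptions, and a pair found this way is a hint to follow up rather than evidence of a relationship.

```python
from itertools import product


def nearby_pairs(annotations, kind_a, kind_b, max_gap=100):
    """Pair up terms of two concept types that sit close together.

    annotations is a list of (start, end, kind, term) tuples with
    character offsets; max_gap is the largest number of characters
    allowed between the two annotated spans.
    """
    firsts = [a for a in annotations if a[2] == kind_a]
    seconds = [a for a in annotations if a[2] == kind_b]
    pairs = []
    for a, b in product(firsts, seconds):
        # Characters between the two spans, whichever comes first.
        gap = max(a[0], b[0]) - min(a[1], b[1])
        if gap <= max_gap:
            pairs.append((a[3], b[3]))
    return pairs


# Hypothetical offsets for three annotations in one article.
annotations = [
    (13, 18, "gene", "BRCA1"),
    (40, 53, "disease", "breast cancer"),
    (400, 404, "gene", "TP53"),
]

print(nearby_pairs(annotations, "gene", "disease", max_gap=50))
```

With a gap of 50 characters only BRCA1 is paired with breast cancer; TP53 is mentioned too far away. A sentence-based window would probably be a better choice than a character count, but it needs sentence boundaries that this sketch does not have.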
What are the future plans for SciLite?
We believe SciLite has the potential to further enhance the experience of reading scientific articles through applications that improve full-text searching, filtering and integration with biological data. We have taken an initial step in this direction for Protein Data Bank (PDB) accession numbers with a BioJS application. For a given PDB accession number, it fetches the coordinate information and displays the corresponding 3D molecular structure, serving as an interactive visualiser. Similar applications could be developed to display relevant information for a given annotation type in the context of the article.
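As a rough sketch of the first half of that step, the snippet below turns a PDB accession number into the address of a coordinate file. It is not the BioJS application: the pdb_coordinate_url helper is hypothetical, the default download location is an assumption (the article does not say which service the visualiser uses), and the pattern assumes classic four-character identifiers made of a digit followed by three alphanumeric characters.

```python
import re

# Assumed format of a classic PDB identifier: one digit, then three
# letters or digits, e.g. "1TUP".
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def pdb_coordinate_url(accession, base="https://files.rcsb.org/download"):
    """Return the URL of the coordinate file for a PDB accession number.

    The base URL is an assumption; pass a different one to point at
    another PDB mirror.
    """
    if not PDB_ID.fullmatch(accession):
        raise ValueError(f"not a PDB accession number: {accession!r}")
    return f"{base}/{accession.upper()}.pdb"


print(pdb_coordinate_url("1tup"))
```

Validating the identifier before building the URL keeps text-mining false positives, such as a gene name mistaken for an accession number, from triggering a pointless download. A visualiser would then fetch the file and render the structure.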
How can you contribute to SciLite?
We encourage the text-mining community and other associated communities to share their annotations on the SciLite platform. We have set up a participation page to help interested groups submit annotation data. Furthermore, the annotations on SciLite are modelled on the Web Annotation Data Model specification. The open nature of this format allows other platforms, such as journal publisher websites or other content aggregators, to fetch the annotations from SciLite and display them on their own resources.
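For groups wondering what such an annotation might look like, here is a minimal sketch that builds one following the Web Annotation Data Model. The to_web_annotation helper and the example.org URLs are made up for illustration, and which fields SciLite itself expects is an assumption; the participation page is the place to check the real submission format.

```python
import json


def to_web_annotation(article_url, exact, prefix, suffix, record_url):
    """Describe one text-mined term as a Web Annotation (JSON-LD)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        # The body is the database record the term is linked to.
        "body": record_url,
        "target": {
            # The target is the article in which the term occurs.
            "source": article_url,
            # A TextQuoteSelector locates the term by its exact text
            # plus a little context before and after it.
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }


annotation = to_web_annotation(
    article_url="https://example.org/articles/123",
    exact="BRCA1",
    prefix="Mutations in ",
    suffix=" increase the risk",
    record_url="https://example.org/gene/BRCA1",
)

print(json.dumps(annotation, indent=2))
```

Locating the term by quoted text rather than by character offset means the annotation can still be placed when another platform renders the same article with different markup.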