Don’t forget JATS4R community call – Thursday 1st March 2pm BST/10am ET. Agenda and call in details found here: http://docs.google.com/document/d/1lV
The draft recommendation for the tagging of Clinical Trials is open for comment. Please add comments to the Google Doc version: https://docs.google.com/document/d/1Oa9IAUWfHzBuhWevKKipLihH07pLrGuKv7FS2dRslE4/edit?usp=sharing … The deadline for adding comments is Friday, March 9th, 2018.
Content mining, machine learning, text and data mining (TDM) and data analytics all refer to the process of obtaining information by machine-reading material. Working faster than a human possibly could, machine-learning approaches can analyze data, metadata and text content; find structural similarities between research problems in unrelated fields; and synthesize content from thousands of articles to suggest directions for further research. Given the continually expanding volume of peer-reviewed literature, the value of TDM should not be underestimated. Text and data mining is a useful tool for developing new scientific insights and new ways to understand the story told by the published literature. Continue reading “Unrestricted Text and Data Mining with allofPLOS”
In this interview, Bruce Rosenblum explores the origins of JATS & some potential futures http://www.inera.com/_blog/news/post/jats-where-is-it-going-where-has-it-been/ …
2:30pm today: Join Inera CEO Bruce Rosenblum for session 7.1 ‘JATS & BITS’ #CSE2017
@jats4r is restructuring so we can do more & get more input from many people https://docs.google.com/document/d/1QijrR6G9JPCvxbPMHP2hI1vcEgxFk8d9fix_l01-w8c/edit …
When scaling great heights, sometimes you need a place to rest before moving on.
That’s one analogy for XSweet, a toolkit under development by the Coko Foundation. It offers a set of stylesheets for extraction and refinement of data from MS Office Open XML (.docx) format, producing HTML for editorial workflows.
XSweet developer Wendell Piez offered that parallel in a recent presentation at JATS-Con 2017. The two-day conference centers on the Journal Article Tag Suite (JATS), an XML format for marking up and exchanging journal content.
The toolkit offers a new path to document conversion: instead of heading first to a format like JATS, XSweet delivers the document into HTML, the lingua franca of the web. Once the document is in HTML, it can be processed in a web-based workflow, progressively improved using browser tools and easily exported to other formats from there. What was once a tedious trek becomes a journey where collaborators focus on what matters: editing and determining the details of publishing. Details of his talk are available as part of the conference proceedings.
XSweet offers “refuge” from the slog of conversion because, instead of immediately trying to produce structured JATS from unstructured .docx, it produces a faithful rendering of a Word document’s appearance, translated into a vernacular HTML/CSS.
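As a rough illustration of the "appearance first" idea, here is a minimal Python sketch. It is not XSweet itself, which is a set of stylesheets, and the function names `run_to_html` and `docx_to_html` are hypothetical. It relies only on two facts about the format: a .docx file is a zip archive, and its main text lives in `word/document.xml` as paragraphs (`w:p`) made of runs (`w:r`). The sketch keeps only bold and italic and ignores everything else a real converter would need to handle.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def run_to_html(run):
    """Render one w:r run, keeping only its bold/italic appearance."""
    text = "".join(t.text or "" for t in run.iter(W + "t"))
    if not text:
        return ""
    html = escape(text)
    props = run.find(W + "rPr")
    if props is not None:
        # Simplification: the mere presence of w:i / w:b is treated as "on";
        # an explicit w:val="false" is not checked in this sketch.
        if props.find(W + "i") is not None:
            html = "<i>" + html + "</i>"
        if props.find(W + "b") is not None:
            html = "<b>" + html + "</b>"
    return html


def docx_to_html(docx_bytes):
    """Return one <p> per w:p paragraph, preserving run-level formatting."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        body = "".join(run_to_html(r) for r in para.iter(W + "r"))
        paragraphs.append("<p>" + body + "</p>")
    return "\n".join(paragraphs)
```

The output makes no claim about document structure: a heading that the author formatted by hand as bold text simply comes out as a bold paragraph, which is the kind of faithful but unstructured result described above.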
In a 45-minute session titled “HTML First? Testing an alternative approach to producing JATS from arbitrary (unconstrained or “wild”) .docx (WordML) format,” Piez walked the audience through a mini-editorial process: taking a Word .docx file sent by an author and pushing it through XSweet to produce an HTML file. “The few hours it took me to produce BITS from the docx original, that was both faithful and also better for further editing and application, were minimal in comparison to the time we were then able to spend on things that really mattered,” Piez said.
Piez is pleased about how the talk went. “A number of audience members approached me afterwards, many of whom had themselves looked this problem in the face before and were willing to confirm the sense of the problem and approaches to it.”
Texture XML informed by @jats4r recommendations…. http://jats4r.org https://twitter.com/dalapeyre/status/857260571061882885 …
Happy that history/event will likely happen in #jats 1.2d1…supports our recs on article versioning http://jats4r.org/article-publication-and-history-dates … https://twitter.com/robindunford/status/857330591552987136 …
XSweet, a toolkit under development by the Coko Foundation, takes a novel approach to converting .docx (MS Word) data. Instead of trying to produce a correct and full-fledged representation of the source data in a canonical form such as JATS, XSweet attempts a less ambitious task: to produce a faithful rendering of a Word document’s appearance (conceived of as a “typescript”), translated into a vernacular HTML/CSS. What comes out of such a process, and what doesn’t, is interesting. And while the results are barely adequate for reviewing in your browser, they might be “good enough to improve” using other applications.
One such application would produce JATS. Indeed, it might be easier to produce clean, descriptive JATS or BITS from such HTML than to wrestle into shape whatever nominal JATS came back from a conversion processor that aimed to do more. This idea is tested with a real-world example.
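To make the HTML-to-JATS step concrete, here is a small Python sketch under stated assumptions. It is not the process used in the talk. It assumes the HTML has already been cleaned up into a well-formed fragment in which headings really are headings, and it maps only a handful of tags onto the JATS elements `<sec>`, `<title>`, `<p>`, `<italic>` and `<bold>`. The function names `copy_inline` and `html_to_jats_body` are hypothetical, and sections are kept flat rather than nested by heading level.

```python
import xml.etree.ElementTree as ET

# HTML inline tags mapped to their JATS counterparts (a deliberately small set)
INLINE = {"i": "italic", "em": "italic", "b": "bold", "strong": "bold"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def copy_inline(src, dest):
    """Copy text and inline children of src into dest, renaming known tags."""
    dest.text = src.text or ""
    last = None
    for child in src:
        if child.tag in INLINE:
            last = ET.SubElement(dest, INLINE[child.tag])
            copy_inline(child, last)
            last.tail = child.tail or ""
        else:
            # Unknown inline markup is flattened: keep its text, drop the tag.
            text = "".join(child.itertext()) + (child.tail or "")
            if last is None:
                dest.text += text
            else:
                last.tail += text


def html_to_jats_body(html_fragment):
    """Turn a well-formed HTML fragment into a flat JATS <body> string."""
    root = ET.fromstring("<div>" + html_fragment + "</div>")
    body = ET.Element("body")
    target = body  # paragraphs before the first heading go straight into <body>
    for node in root:
        if node.tag in HEADINGS:
            # Every heading opens a new flat <sec>; levels are not nested here.
            target = ET.SubElement(body, "sec")
            copy_inline(node, ET.SubElement(target, "title"))
        elif node.tag == "p":
            copy_inline(node, ET.SubElement(target, "p"))
    return ET.tostring(body, encoding="unicode")
```

Because the input is already plain, descriptive HTML, the mapping is little more than renaming elements. The hard editorial decisions, such as which bold paragraph is really a heading, are assumed to have been made earlier while the document was still in HTML.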