I’m showing my age here, but having observed the evolution of web development since its early days, it’s fascinating to look at how it has become increasingly open to contributions from an ever-widening set of participants. From the mid-90s, when internet technologies were often built in isolation in a garage, through to the birth of user-centered design and agile, there has been steady growth in attempts to bring the voices of those who will actually use the technology into the process.
We are seeking one or more talented Node.js / React developer(s) to work with us on improving scholarly / scientific publishing forever. Continue reading “Coko is seeking one or more Node.js / React Developers”
With 2017 drawing to a close, it seems like the right time to reflect on what the year brought us at Hindawi. Continue reading “2017 in review: 12 months of new initiatives”
On October 23rd 2017 in Berlin, we held a one-day FORCE pre-meeting open source bazaar in partnership with Hypothesis. The day opened with a discussion about replacing our currently siloed scholarly communications platforms and tools with a new ecosystem of open source technologies. This is the only way to transform the sector at scale, which is a job too large for any one organization or company.
Kristen opened the day with the key themes:
3 key themes of #osbazaar – community, diverse economy system, reuse
— Alice Meadows (@alicejmeadows) October 24, 2017
- Coko’s own Jure and Yannis kicked it off with overviews of the PubSweet and Editoria Projects
- Dan Whaley followed-up with a discussion of Hypothes.is’s recent milestones
- Michael Aufreiter and Oliver Buchtala demo-ed Substance and the Texture editor
- Nokome Butler presented on Stenci.la and how to build reproducible documents (with embedded code!)
- Richard Smith-Unna showed how you can use Science Fair to do peer-to-peer sharing of articles
- Max Ogden discussed dat and distributed approaches to data sharing
- Karen Yook walked through Wormbase’s micropublication.org project
- Peter Kraker showed us the world with Open Knowledge Maps
- Adam Hyde discussed the future of journals with the INK and xPub projects
- Daniel Mietchen showed how all of our projects are interconnected through Wikidata
Oh, and don’t forget: we also had a celebration and product showcase during lunch from groups across the annotation community (SciLite, Hypothesis/SciBot, Pundit, Xpansa, PaperHive, eLife, Profeza and others). All in one day!
The tweets are all at #osbazaar. One tweet captured the essence of the day:
— Juan Pablo Alperin (@juancommander) October 24, 2017
Kristen summarized the day at a session at FORCE with this presentation:
Was the inaugural OS bazaar a rousing success? Was a good time had by all? Yes and yes. Any opportunity to come together and hear updates from the community about the incredible projects under development is one worth taking. We’re already looking forward to OS Bazaar #2!
If you would like to consider getting involved with the code that we produce, then you might find the following information helpful.
First, we have our own chatroom. It runs on the wonderful Mattermost platform (open source). You can find our version running here:
The account creation page is linked from there or you can jump direct to it from here:
The main room ‘Townsquare’ is where the general chitter-chatter takes place. Feel free to jump in and introduce yourself. We are a pretty friendly bunch, so don’t be scared to yell out with any issues you run into, help you might need, or ideas you might have.
Next, you may wish to have a look around our code. We have a few places for you to check out, depending on your interest. As with our chat room, we host our code on an open source platform, GitLab (GitHub itself is closed source). You can find our GitLab here:
You can create a new account from the Sign In link, or access it directly here:
You don’t need an account to access any of the code, but you do need one to make any merge requests (the equivalent of a pull request on GitHub).
As for the code…we have much for you to look at!
INK – INK is our framework for managing file conversion, entity extraction, content enrichment, and so on. It consists of an API (written in Rails) and a client (written in JS; the client is generally used for admin purposes). You can find the API and the client here:
INK steps can be found here:
XSweet – XSweet is our set of file conversion scripts for MS Word to HTML, written in XSLT. You can find them here:
PubSweet – and lastly, our decoupled CMS, the app that enables us to build platforms and reuse all these juicy components, is to be found here:
As you can see, we have a lot going on! Many products in play. If you would like to learn more please jump into Mattermost and say hi! We welcome code contribs, ideas to improve the technologies, questions about what we are trying to do – or anything else you have to say!
Editoria, the book production platform we’re building with the University of California Press, is getting ever closer to a 1.0 release. Check out the latest blog post on the Editoria website for an overview of the system and the latest features:
There, we’ll walk you through all the major interfaces, including…
…the Dashboard, wherein the editor sees a list of all current books…
…the Book Builder, wherein a user creates the book structure, uploads content into it, and manages the editing workflow…
…and the newly redesigned Editor, wherein users edit and collaborate on the content in a clean and appealing editor that supports import from Word, styling, formatting, track changes, notes, and commenting.
There are also demo videos so you can see Editoria in action.
For the past 8 months we have been building INK – the open source file conversion and transformation engine for publishing.
INK is now nearing 1.0, ready in the next few weeks. In anticipation of the first major release, we thought you might like to know a little more about what INK does and why.
INK has been built with two major use cases in mind:
- Publishers – publishers need to automate all manner of operations on files (conversion, enrichment, format validation, etc). INK does all this and can be integrated with any current technology stack the publisher uses.
- File conversion pros and production staff – the people who love staying up all night perfecting file transformations. INK is a job management framework into which you can plug any action you want taken on files, create recipes, generate reports and more.
Let’s look at these needs a little more closely.
INK and Publishers
Publishers need to do all sorts of things to files. The highest-value need right now is to automate file conversion from one format to another. Most publishers currently ‘automate’ file conversion by sending MS Word documents to external vendors, which is both costly and slow. Adding to these inefficiencies, correcting the errors a conversion vendor introduces requires its own painful workflow.
We built INK so that publishers could automate these conversions and generate reports to measure accuracy and speed. INK supports the publisher’s workflow by acting as an ‘invisible’ file conversion service: you push a button and get a result. INK can be integrated into your current workflow with minimal hassle since it uses APIs. Because INK is open source, publishers can either set up their own instance of INK or use a hosted version for a small fee (we are currently talking to some service providers to make this kind of hosted version available). Several smaller publishers could also set up a shared instance of INK to lower costs even further.
As mentioned above, integration with existing software is easy. We have, for example, integrated INK with the open access monograph production platform – Editoria – as you can see below. The integration comes in the form of a button that says ‘Upload Word’. Uploading a Word doc in this instance sends the document to INK, which returns beautifully formatted and structured HTML to Editoria and ‘automagically’ loads it into a chapter. All done without the user knowing a thing about file conversion.
In other contexts you may also require production tools to QA conversions. In this case it is very simple to set up a tightly integrated production environment connecting INK to, for example, a QA editing environment – everything you need to make your production staff happy (see below for how INK helps troubleshoot file conversions).
INK and File Conversion Pros / Production Staff
It is a simple truth that you cannot have good file conversions without some file conversion pro, somewhere, doing the initial hard work for you. This is because file conversion is not just a science, it is an undocumented art!
INK helps these talented artists help you in three critical ways:
- Easy-to-build conversion pipelines – INK enables production staff to construct file conversion pipelines through a simple UI. This means they can assemble a new pipeline, reusing previously constructed conversion steps, in (literally) a matter of minutes. This flexibility hasn’t been available in the publishing industry until now. Most file conversion pipelines are hard-coded, which makes them very difficult to optimize and makes it very difficult to reuse any part of the pipeline for other conversions.
- Reusable steps – INK’s pipelines are built up of discrete reusable steps. This is the magic behind INK’s philosophy of reuse. File conversion specialists can build these steps very easily (we have clear example documentation) and then use them in as many pipelines as they wish. Steps can be wholly new code in any language, leverage existing services via APIs, or run system processes. These steps, once built, can be shared with the world or kept private. Our hope is to build up a shared repository of reusable steps for every need a publisher may have. This would reduce the possibility of duplicating effort and enable us as a community to spend time optimizing conversion steps rather than building the same old hard-coded conversion pipelines over and over again.
- Troubleshooting conversions – INK has a very sophisticated way of managing file conversions and exposing the pipeline results through a clean open API. INK also logs and displays errors to assist in troubleshooting. That means file conversion specialists or production staff can inspect any given conversion and work out exactly where a problem may have occurred and why.
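To make “reusable step” concrete, here is a minimal sketch in Ruby (the language INK’s steps are written in). The class name and method are our own illustration, not INK’s actual step API:

```ruby
# Illustrative only: the heart of a reusable step is a small, self-contained
# transformation. This hypothetical step does a common post-conversion
# clean-up pass on HTML coming out of a Word converter.
class CleanWhitespaceStep
  def run(html)
    html.gsub("\u00A0", " ")   # replace non-breaking spaces
        .gsub(/[ \t]+/, " ")   # collapse runs of spaces and tabs
        .strip
  end
end
```

A step this small can then be dropped into any pipeline that produces HTML, which is exactly the kind of reuse described above.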
Currently we have developed INK steps to achieve the following:
- Docx to xHTML (a very sophisticated conversion that we have been working on for over 6 months)
- HTML to PDF
- EPUB to InDesign XML (ICML)
- Docx to PDF
- HTML to print-ready, journal-grade PDF
In the works are the following:
- Docx to JATS
- LaTeX to PDF
- HTML to JATS
- R Markdown to Docx
- Markdown to HTML
- HTML to Markdown
- EPUB to print ready book formatted PDF
- HTML to DITA XML
- EPUB to Mobi
- Docx to DocBook XML
and more! INK itself, and all steps we produce, are open source (MIT license).
It’s not all about conversions
INK isn’t only about conversions. Reusable steps can be written to mine data from articles, automatically register DOIs, automate plagiarism checks, normalize data, validate formats and data, link identifiers, syndicate, and a whole lot more. One of the most important use cases ahead of us, we think, is to start parsing and normalizing metadata out of manuscripts at submission time and then disseminating to third parties – reducing the time and effort for processing research and improving early discovery of preprints or articles. A perfect job for INK. We will be moving quickly on to these use cases after our initial file conversions are in place. You should see rapid progress on these other file operations within the next month or so!
There is a lot to the INK universe, as it is a sophisticated piece of software. Here is a short breakdown for the technically minded:
INK (API SERVICE)
- HTTP Service API
- Resource management
- Async request management
- Multi-tenant service architecture
- JWT authentication
- Step abstraction (leveraging Ruby gems)
- Recipe management
- Web Socket support
- Event subscription during recipe execution, meaning any client using the INK API can update their users on the progress of execution in real time.
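Here is a hedged sketch of what kicking off a recipe over that HTTP API might look like from Ruby. The endpoint path, payload shape, and token are placeholders of our own, not INK’s documented API:

```ruby
require "json"
require "net/http"
require "uri"

# Build (but don't send) a hypothetical "execute recipe" request.
# JWT auth goes in the Authorization header; since execution is asynchronous,
# the response would simply acknowledge that the job has been queued.
def build_execute_request(base_url, recipe_id, jwt)
  uri = URI.join(base_url, "/api/recipes/#{recipe_id}/executions")
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{jwt}"
  request["Content-Type"]  = "application/json"
  request.body = { input_file_ids: [1] }.to_json
  request
end
```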
INK (DEMO CLIENT)
- UI Recipe creation (including selecting the steps from an automatically populated, searchable dropdown of available steps on that INK instance)
- Public and private recipes
- Editing a recipe from the UI
- An updated recipe view with clearer step names, and with descriptions
- Users can immediately see the file list belonging to each step as it completes.
- Users can download each file individually or together as a .zip file.
- Administrators can get a status report of services INK uses, so it’s easy to spot potential issues that may affect users.
- A list of user accounts – it’s basic at the moment, and will evolve to account management.
- A list of available steps. In the future, administrators will be able to enable and disable execution of these steps from this panel.
As you can see, INK has come a long way from a proof-of-concept and we’re excited about what it can bring to the domain.
We are currently working on the following features:
- downloadable log and report generation
- single step execution (currently steps are nested in recipes)
- synchronous execution
- http recipe parameters
- http step parameters
- semantic tagging of outputs
Please get in touch if you’re interested in finding out more or working with us to improve INK, implement it, or build and share steps! INK 1.0 due by the end of June!
There has been much attention recently given to preprints, the early versions of journal articles that haven’t yet been peer-reviewed. While preprints have been around since before arXiv.org launched in 1991, fields outside of physics are starting to push for more early sharing of research data, results and conclusions. This will undoubtedly speed up research and make more of it available under open and reusable licenses.
We are seeing the beginning of a proliferation in the number of preprint publishing services. This is a good thing. We know from the organic growth of journals that researchers often choose to publish in places that serve their own community. This will no doubt be true of preprint services as well, and offering researchers choices makes sense. Even within the very large arXiv preprint server, there are many different community channels where researchers look for their colleagues’ work.
Last year ASAPbio formed with the goal of increasing preprint posting in the life sciences. There was agreement that preprints should be searchable, discoverable, mineable, linked, and reliably archived. These are all steps that the online journal publishing industry needed to take 20 years ago, and there are well-understood mechanisms in place. This is how cross-journal databases such as PubMed came to be, best practices such as assigning DOIs evolved, standards such as COUNTER were developed to ensure consistent reporting on usage, and integration with research databases such as GenBank was worked out.
These same efforts will be needed across the different preprint services to ensure that preprints are taken seriously as research artifacts. As more preprint channels arise, this infrastructure and these operating standards will only become more important. A research communication service is not necessarily the same as its underlying technology and, though people tend to equate the two, shared preprint infrastructure is actually the best way to ensure costs are kept down and standards are applied.
As the ASAPbio conversation evolved, so did the discussion of whether a central service was needed for aggregation of preprints. I believe that what is needed is a collection of services that are centralized in some way to ensure a low cost and easy path to preprint services and that they work together as effectively as possible.
Several of the needed services include:
- Consistent standards applied to preprints (identifiers, formats, metadata)
- Reliable archiving for long term preservation
- A record of all preprints in one place for further research purposes (text and data mining, informatics, etc)
- Version recording and control
- Best practices in preprint publishing applied across all services
- Sustainability mechanisms for existing and new preprint services
Comparable services for journals have helped to make journal literature reliable and persistent. If we want preprints to turn into first class research artifacts in the life sciences and other fields outside of physics, we need to apply some degree of the same treatment for them – and at this early stage, now is the time to plan for these services.
A centralized set of services could ensure that, for preprint services that already exist, their efforts are tracked and a record is kept. If they don’t have DOIs, they can affordably get them. If they have DOIs, those DOIs are tracked and searchable through a central API. If the preprints are PDF-only, a version could be converted to structured data and held in a minable database.
The sooner that the research communication community gets out in front of these support services for preprints, the less chance there is for loss of data and an incomplete record of this growing segment of research literature.
When scaling great heights, sometimes you need a place to rest before moving on.
That’s one analogy for XSweet, a toolkit under development by the Coko Foundation. It offers a set of stylesheets for extraction and refinement of data from MS Office Open XML (.docx) format, producing HTML for editorial workflows.
XSweet developer Wendell Piez offered that parallel in a recent presentation at JATS-Con 2017. The two-day conference centers around Journal Article Tag Suite (JATS), an XML format for marking up and exchanging journal content.
The toolkit offers a new path to document conversion — instead of heading first to a format like JATS, XSweet delivers the document into HTML, the lingua franca of the web. Once the document is in HTML, it can be processed in a web-based workflow, progressively improved using browser tools, and easily output to other formats from there. What was once a tedious trek becomes a journey where collaborators focus on what matters — editing and determining the details of publishing. Details of his talk are available as part of the conference proceedings.
XSweet offers “refuge” from the slog of conversion because instead of immediately trying to produce structured JATS from unstructured Docx, it produces a faithful rendering of a Word document’s appearance translated into a vernacular HTML/CSS.
In a 45-minute session titled “HTML First? Testing an alternative approach to producing JATS from arbitrary (unconstrained or “wild”) .docx (WordML) format,” Piez walked the audience through a mini-editorial process: taking a Word docx file sent by an author and pushing it through XSweet to produce an HTML file. “The few hours it took me to produce BITS from the docx original, that was both faithful and also better for further editing and application, were minimal in comparison to the time we were then able to spend on things that really mattered,” Piez said.
Piez is pleased about how the talk went. “A number of audience members approached me afterwards, many of whom had themselves looked this problem in the face before and were willing to confirm the sense of the problem and approaches to it.”
XSweet, a toolkit under development by the Coko Foundation, takes a novel approach to data conversion from .docx (MS Word) data. Instead of trying to produce a correct and full-fledged representation of the source data in a canonical form such as JATS, XSweet attempts a less ambitious task: to produce a faithful rendering of a Word document’s appearance (conceived of as a “typescript”), translated into a vernacular HTML/CSS. It is interesting what comes out from such a process, and what doesn’t. And while the results are barely adequate for reviewing in your browser, they might be “good enough to improve” using other applications.
One such application would produce JATS. Indeed it might be easier to produce clean, descriptive JATS or BITS from such HTML, than to wrestle into shape whatever nominal JATS came back from a conversion processor that aimed to do more. This idea is tested with a real-world example.
The promise of open science to improve the speed, transparency and completeness of research sharing has attracted a lot of innovators and developers creating new, open source technology solutions. All too often, though, technologies are built by organizations that see themselves as competitive with one another and work at cross purposes.
We’re focusing on changing this culture. That may seem a strange statement from a Foundation whose initial work has already launched open infrastructure projects such as PubSweet and INK, but bear with us. Coko is working to seed a new ecosystem of open source projects, tools and platforms that work together.
We envision building an evolving network of modular, interoperable, flexible and reusable open source projects that facilitate rapid, transparent and reproducible research and research communication for the public good. Rather than remaining independent and siloed, these projects will share resources and learn from each other, creating an open science infrastructure. Coko is striving to create a healthy ecosystem of projects that can thrive and work with each other to solve the many problems and opportunities that face STEM publishing today.
Our first small step in this direction — which we see as a giant leap — is pulling together complementary projects to create an Open Source Alliance for Open Science. This federation will actively work together to form the ecosystem, agreeing on best practices that emphasize generosity and openness. The idea is to create a common pool of resources whose development is driven by community needs. Code is shared, and so are tips for funding applications, report writing and outreach.
An apt analogy is a community garden: plants that grow well together in common soil are seeded, grown, harvested, shared and plowed back into the land. Individual “plots” may be tended by the gardeners who are most adept at cultivating the seedlings, yet cross-pollination and resource sharing where appropriate are encouraged. The gardeners work in a common space, find territorial solutions and share fruits of the “harvest.”
One example of how we are prototyping this process is the Substance Consortium, which we helped found along with the Public Knowledge Project (PKP), SciELO and Érudit in 2016. Consortium members all use (or intend to use) the open source Texture editor, which helps publishers improve structured documents without having to mess with the underlying XML (Extensible Markup Language) markup. The Consortium started as a way to recognize that organizations using these tools as critical infrastructure have a responsibility to contribute to their upkeep. To that end, Coko has played a foundational role in establishing the consortium, as well as contributing energy and funds to the sustainability of Substance and its codebase.
As another example, we introduced the innovative new project, Stencila, to funders — and then stepped aside. Typically, in a competitive environment, smaller projects that are desperate for initial funds may be co-opted by larger ones who overshadow the smaller organization and take a large cut of the funding. The larger project may vacuum up the credit without adhering to attribution best practices. Instead, in a demonstration of good faith, we coached Stenci.la through the funding process and made direct introductions to funders, then stepped aside to let Stenci.la operate as they need to, with the funds they need, and receive the recognition they duly deserve.
Our efforts to cultivate these projects differ from the typical competitive model where organizations see what others are doing, then throw shade on the newcomers by claiming to be building the exact same thing. This land grab results in whoever has the superior budget, PR and grant-writing staff, and stronger name recognition “winning,” whether or not they intend to actually create the product, build it well, or share it in a meaningful way. This highly competitive landscape discourages healthy open source communities forming around projects and meaningful, productive, inter-project collaboration.
The garden model will give smaller projects a chance to thrive and grow so as to avoid being co-opted or plowed under. This will create a more diverse and rich ecosystem, since many of these projects arise out of specific expertise that larger projects may not have.
To lay the groundwork for this Alliance, we’re planning a meeting May 1 in Portland, along with founding partners DAT, the Code for Science & Society (CSS) and The California Digital Library (CDL). By meeting in person, discussing initiatives and directly collaborating, we seek to generate buy-in on shared goals and open direct lines of communication between organizations. The initial meeting will garner support for shared goals and values and establish a self-sustaining community with firm attendee commitments to continue the conversation. If you’d like to participate email us at email@example.com
The first event towards building an alliance of good faith actors working in open source for open science. Request an invite now! https://www.eventbrite.com/e/open-source-alliance-for-open-science-forum-tickets-32172984262 – Travel funds available.
INK is Coko’s ingestion, conversion and syndication environment that converts content and data from one format to another, tags with identifiers and normalizes metadata.
When an author or group of authors creates content, there is a fair bit of processing that needs to be done on the content in order to prepare it for publishing.
Typical use cases include converting Word and other proprietary formats into highly structured formats such as HTML5, XML, and ePub, and outputting to syndicated services, the web and PDF. Additionally INK can add common identifiers such as DOIs and geolocation IDs and ensure compliance with standards for content and metadata.
Frameworks similar to INK have been created and re-created in both open and proprietary domains, but INK takes it further and does it better. One of the big advantages of INK is that it is an open source framework for chaining custom processing steps together to automate some of these processes. We encourage (but do not require!) the creation and sharing of steps and recipes – ordered collections of processing steps – so communities, organisations and individuals can help each other. It’s all about sharing and collaboration, which is pretty much what Coko is about.
In this post, I detail how INK works, using cake as an analogy. Don’t worry, if you’re a pie person, you can still follow along as you dream of the perfect raspberry chiffon…
What does INK do?
What a great question – glad you asked.
INK is an open-ended, extensible, modular service that allows processing of files (e.g. documents) via execution of Steps. A user feeds in one or more files, usually a document, the step/s do something with the file/s in sequence, and the user gets the result. It sounds very general, and admittedly a bit abstract, because INK is meant to be flexible and customisable by anyone. Let’s break it down a bit.
Each Step contains a bit of logic that can do something to one or more files. For example:
- convert from one format to another, such as converting an HTML document to PDF
- clean up HTML
- modify the images in a document (resize them, make them greyscale…)
- translate a document to another language
- analyse the contents of a document and generate a summary
This is just a small number of examples. Steps are intentionally open-ended.
INK and its steps are released open source, so anyone can set up their own server and run their own customised INK service. They can install whichever steps satisfy parts of their own publishing process. If there’s something they need to do to a document that’s not covered by an existing step, they can write their own and add it to their instance of INK.
Often with publishing toolchains, there are several things that need doing to a raw document before it’s ready to publish. INK lets you chain steps together into a recipe – a pipeline of steps, all in a row, executed in sequence.
Think of a recipe just like you’d think about making a cake. A recipe details how one might turn raw materials (sugar, flour, etc) into a cake — but you don’t have an actual cake until you get your ingredients together, put on your apron and follow each step, one after the other.
As you’ll know, a recipe involves more than throwing everything together!
INK can execute the recipe given some files, and when all the steps are done, the user can see the results each step passed to the next one. They can see if something went wrong, or check whether some intermediate step in the recipe didn’t behave as expected. They may need to tweak the step logic itself, or make sure they provide the right kind/s of file/s.
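The chaining behaviour can be sketched as a toy pipeline in Ruby (our illustration, not INK’s actual engine): each step receives the previous step’s output, and every intermediate result is kept so the user can see exactly where something went wrong.

```ruby
# Run an ordered list of steps (here, plain callables), feeding each one
# the previous step's output and recording every intermediate result.
def run_recipe(steps, input)
  results = []
  steps.each do |step|
    input = step.call(input)
    results << input
  end
  results
end

pipeline = [->(doc) { doc.upcase }, ->(doc) { "#{doc}!" }]
run_recipe(pipeline, "draft")  # => ["DRAFT", "DRAFT!"]
```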
How does INK work?
You might be thinking – ask a technical person to explain how some of their software works… and the answer is usually jargon-riddled and aimed towards other developers as an audience. Fortunately, I’ve been teaching developers and non-developers alike for long enough that I can manage to explain something in language that suits a wide range of audiences. Hopefully the following is clear!
INK has three main parts.
In the Ruby programming language, people can write standalone code libraries that other Ruby programs can use. These are called ‘gems’. INK uses INK step gems to detail what each step does.
A step gem contains one or more steps. An INK server might have any combination of step gems installed on it. If a step gem is installed on the server (by the system administrator), then recipes using steps contained within that step gem can be executed on that server. It’s designed this way so that whoever runs an INK server has control over which steps users can use.
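The availability check this implies can be sketched as follows (a toy registry of our own, not INK’s implementation): a recipe is only runnable on a server whose installed step gems cover every step it names.

```ruby
# Illustrative only: which steps exist depends on which step gems the
# administrator has installed on this particular INK server.
module StepRegistry
  INSTALLED = ["docx_to_html", "clean_html"].freeze

  # A recipe can run here only if all the steps it asks for are installed.
  def self.recipe_runnable?(step_names)
    (step_names - INSTALLED).empty?
  end
end
```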
The recipe engine is a Rails web app that keeps track of users, their recipes, and which steps are in which recipe, and in what order. It also tracks which recipes have been executed by whom, and where to find the resulting file/s for each step in the pipeline. When a user decides to execute a recipe, they provide at least one file, and the recipe engine hands it off to the execution engine.
The execution engine performs the logic in the steps in the order specified by the recipe. The results of each step are provided to the following steps in sequence (more about this in a bit).
In order to use INK, users interact with the client. Since INK is an API (a web-based service that doesn’t have a graphical interface of its own), there are other programs, such as ink-pubsweet or the INK client, that people can use to tell the INK system what to do.
Example: Docx to SHOUTY HTML.
Let’s take an example recipe and see what happens when it is executed.
The user has a recipe called “Docx file to SHOUTY HTML”, which has the following RecipeSteps:
- Docx to HTML
- SHOUTIFIER (a silly step that makes every letter CAPS and replaces all periods next to a letter with three exclamation marks!!! Not immediately useful, but makes for a GREAT DEMONSTRATION!!!)
- The user asks the system to execute the recipe, and provides a file (let’s call it the totally unoriginal name example.docx)
- INK checks that the recipe can be executed.
– it’s been given at least one input file
– all the steps the recipe asks for are available. Different installations of INK on separate servers might have different steps available, depending on what step gems the system administrator has put on that server. It’s a bit like kitchens having different equipment in them – for example, a pâtisserie kitchen would have quite different equipment than one for charcuterie. Anyone can spin up their own INK server, so it’s really up to them what step gems will deliver the most value to them or their organisation.
- The recipe engine queues the execution and immediately lets the user know that it’s in progress. We use an asynchronous process here so that the user gets immediate feedback, and they can do other things while INK takes care of the processing.
- The execution engine takes the recipe execution request off the queue, creates a Process Chain from the recipe, and starts the execution. The execution engine is always checking the queue for things to process, so normally this happens instantly. If some process chains are still running, the execution engine might wait until they are done (this depends on the pool size – how many such processes the system administrator has told INK it can run at once).
- The execution engine starts at the first step and executes it. It copies the input file/s into the step’s “personal” execution directory and executes whatever logic is in there against some or all of the files. In the example above, the execution engine creates the folders it needs, copies example.docx into the directory for the first step (Docx to HTML), then calls the step logic in Docx To HTML. The latter involves calling the system utility Pandoc on the docx file to convert it into HTML. The resulting HTML is written to the same sandbox directory.
So the directory for the Docx To HTML process step will contain the original docx file (unless the step logic includes cleanup of unneeded older files, which is ideal but not mandatory) and the resulting HTML output from the Pandoc call. The step logic then tells the framework that it’s done, and done successfully (i.e. without an error).
If the user had provided a file that the step wasn’t expecting – e.g. a text file, or an image file – the step raises an error to say “I can’t work with this – I need a docx file please” and signals the execution engine to halt the process chain with an error. There’s no point in continuing this particular recipe if a step spectacularly fails.
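The check-then-halt behaviour might look something like this sketch (the method and error names are invented for illustration; the real logic lives in INK’s step gems, and the Pandoc call is indicated only as a comment):

```ruby
# Illustrative sketch; names are invented, not INK's actual code.
class UnexpectedInputError < StandardError; end

# A Docx To HTML step first checks that it was given a .docx file;
# anything else raises an error so the execution engine can halt the
# process chain.
def docx_to_html(path)
  unless File.extname(path) == ".docx"
    raise UnexpectedInputError,
          "I can't work with this – I need a docx file (got #{path})"
  end
  # This is roughly where the real step would shell out to Pandoc, e.g.:
  #   system("pandoc", path, "-o", path.sub(/\.docx\z/, ".html"))
  path.sub(/\.docx\z/, ".html") # return the output file's name
end

docx_to_html("example.docx") # => "example.html"
# docx_to_html("photo.png") would raise UnexpectedInputError
```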
- The execution engine continues to the next step, and repeats until there are no more steps to execute. Again, it copies the files from the previous step into the personal execution directory of the current step, executes the logic against them, and writes the result into the current step’s output directory. And so on.
In our example, the execution engine copies the .docx and .html files from the Docx To HTML process step into the personal execution directory of the SHOUTIFIER process step, executes the logic, and changes the .html file so the content is ALL IN CAPS!!!
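The SHOUTIFIER transformation described above fits in a couple of lines – this is a sketch of the behaviour, not necessarily the step gem’s actual implementation:

```ruby
# Sketch of the SHOUTIFIER behaviour: upcase every letter and replace
# a period that directly follows a letter with three exclamation marks.
def shoutify(html)
  html.upcase.gsub(/(?<=[A-Z])\./, "!!!")
end

shoutify("Hello there. General Kenobi.")
# => "HELLO THERE!!! GENERAL KENOBI!!!"
```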
- When the pipeline has come to an end, the execution engine notifies the caller via callback (if they provided one). Callbacks are like leaving your phone number and saying “Here are the ingredients and the recipe. Call me on this number when the cake is done.” Meanwhile, you don’t have to sit by the phone and wait – you go do something else and get notified when it’s all done… and then you get to have cake! (Figurative cake in this case. INK can do a lot of things, but it can’t make literal cake. Sorry.)
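In plain Ruby the callback idea looks like this (a toy model – in INK the callback is typically a URL the server notifies, and all names here are invented):

```ruby
# Toy model of callback notification; invented names, not INK's code.
# The caller leaves a callback ("a phone number"), goes off to do other
# things, and the pipeline invokes it when execution finishes.
class Pipeline
  def initialize(&callback)
    @callback = callback
  end

  def run(steps, files)
    result = steps.reduce(files) { |current, step| step.call(current) }
    @callback&.call(result) # notify the caller, if a callback was left
    result
  end
end

notified = nil
pipeline = Pipeline.new { |result| notified = result }
pipeline.run([->(f) { f.map(&:upcase) }], ["cake is done."])
notified # => ["CAKE IS DONE."]
```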
If there was some sort of issue during step execution, INK keeps track of any errors raised and logs them.
INK makes the result files of any process step you own available for download, either together as a zip file or individually. You can download the input files, or the HTML output of Docx To HTML, just to make sure it looks right.
INK provides an extensible step-based pipeline framework to help make great content into a publishable format for distribution. Recipes and steps are totally customisable and can be made by organisations and individuals to suit their own requirements.
What really makes INK awesome is that it can be adapted to a wide range of processes. We look forward to hearing what delivers value to your organisation. Give it a try and let us know how you get on.
Open Source (MIT):
We would like some feedback from you!
Their basic requirements were clean and simple:
- If the user adds text to the document: mark the text as an addition, and give it some color.
- If the user deletes existing text: annotate it as a deletion, and put a strikethrough line over it to show that this part has been deleted.
You can see the above 2 requirements implemented in the following animation:
A third requirement is that users must also be able to resolve these changes by “accepting” or “rejecting” the suggested change.
We looked into a lot of different ways to meet these requirements under one additional design constraint: the interface for resolving changes should be in close proximity to the change itself, to reduce the amount of cursor movement required to accept or reject a change.
In the image below you can see our first attempt to solve this problem:
As you can see, each change has a small area underneath it with two options: “accept” and “reject.” These areas are always displayed and serve as a visual cue that something has changed, allowing the user’s eyes to quickly identify all changes in a document. A possible disadvantage is that this may be overwhelming to the eye. More importantly, the ‘accept/reject’ texts tend to dominate the interface, and the space they need demands that the document’s line spacing be doubled, which may look a bit awkward to some.
Our second approach did not stray far from the first idea, but rather aimed to ‘tone it down’ a little as you can see below.
There is still an “accept / reject” area underneath the change, but it is only visible for the selected change. With this approach, the interface is cleaner and the line height only needs to change for the line with the active change, keeping things tidier.
But both versions above have a possible limitation. It is possible that future versions will need to display more information about the change. We can easily see why the user would like to see who made the change and when, as this is information that could influence the decision on how to resolve the edit. So, perhaps, we need to think ahead a little. In our next attempt, below, we tried keeping all actions and information in a tooltip.
Even though some complexity is introduced in the above prototype (especially regarding the potential display of multiple tooltips), it introduces an important new possibility: ‘information space’ where we can display additional options or information related to the change. We can also make the tooltip as large or as small as needed and adjust with minimal effort in future iterations.
The fourth and final version was more consistent with the way annotated comments work in the Editoria editor. This breaks the proximity-to-the-change constraint, but perhaps not by too much.
As you can see, instead of displaying the buttons right under the change, we display a tool with “accept” and “reject” icons at the right edge of the document. This is the least intrusive approach we could imagine. The problem, however, is again space: there is minimal area to add new elements in the future without using up the area to the right of the document, which is normally reserved for easily readable comments.
Of course, designing in anticipation of possible future features is also, perhaps, a problematic approach, but we enjoyed exploring this a little. We would love to hear your thoughts on all of the above. Please join the new Editoria list below and have your say!
This post was written by Yannis Barlas. The track changes prototypes were created by the Coko Athens team: Yannis, Christos Kokosias, and Alexis Georgantas. UI in the prototypes by Julien Taquet.
Many thanks to Adam and Alex, whose work with UCP is what brought this feature to the table in the first place.