The robots are coming — the promise and peril of AI, some questions

I’m at the Charleston conference, my first time, and we had a panel discussion this morning about AI.

On the panel were:

Heather Staines, Director of Partnerships, Hypothes.is

Peter Brantley, Director of Online Strategy, UC Davis

Elizabeth Caley, Chief of Staff, Meta, Chan Zuckerberg Initiative

Ruth Pickering, Co-founder and Chief Strategy Officer, Yewno

and myself. It was a pleasure to be on a panel with these amazing people.

I’m hiring.

So, I’m hiring a product manager. We are working on developing new greenfield products. We have a small, really fun team, and we are using a lot of lean tools to try to get to product-market fit with relatively low risk.

To apply, submit a CV using our hiring portal.

Here are some more details about the role:

What we are doing

SAGE is innovating around how we might support social science researchers engaging with big data and new technology. We are creating new services and products that can provide value to the research community while at the same time offering new business opportunities for SAGE. This is an opportunity to work on stuff that matters in a fast-paced team focused on using lean principles to rapidly understand the needs of researchers, and to use those insights to develop products and services that will best serve their needs.

Our first product will roll out over the summer, and we are now looking to expand the team to allow us to move faster with the testing and creation of further products. You will play a pivotal role in this effort.

WHAT WILL YOU BE DOING?

How do you move fast in a large organisation? How do you make the right bets to make sure that what you are building is going to be useful to people? We have been experimenting with a set of tools from lean product development to rapidly iterate on individual products while at the same time balancing all of the ideas that crop up across the entire space of opportunities.

We have proven that we can get things done quickly and at low risk, and we now want to scale this approach out to allow us to look at a wider set of product opportunities.

You will be responsible for creating experiments to test our thinking around an immediate set of three possible products, and beyond that to help us prioritise and test a wide range of other product ideas. Each week you will construct tests that allow us to decide whether to continue working on a product idea, as well as helping us refine the ideas that we believe have real potential. You will be working with internal teams as well as partners from some of the world’s most prestigious universities.

We have had success using tools such as the lean product canvas and the lean value tree, and supporting our analysis of risks and opportunities by applying pirate metrics to different business models. You will be expected to pick up these tools and become fluent in leading their use. We are open to any ideas that bring us rapid insight, and if you find better ways to validate or reject our hypotheses we will be interested in trying any approach that can help us get to scale.

The ultimate goal of the role is to help us get to a point where we can bring new products to market.

RESPONSIBILITIES

  • Lead on driving experiments around ideas in current development
  • Help the team make go / no go decisions on these ideas
  • Coordinate creation of testable prototypes
  • Run user tests
  • Be responsible for reporting on outcome of tests, both qualitative and quantitative
  • Help the team create a strategy for how to build MVPs for successful product ideas
  • Help the team prioritise further opportunities
  • Develop business cases for product ideas
  • Coordinate design sprints where appropriate
  • Coordinate with other SAGE teams such as marketing, IT and Design

Skills

Successful Product Launch Experience — You will have worked on launching products to market from early ideation through to onboarding customers. You will understand how to translate the needs of users into product features, and how to prioritise those features. You will have worked across teams on successful product launches and made significant contributions to their success.

Lean Product Development Experience — At the heart of this role is lean experimentation. You must have experience of putting the spirit of lean or agile methodologies into practice. We have adopted a specific set of tools, but underpinning all of them is a mindset of experimentation and data-driven decision making. We are looking for someone who exemplifies this spirit and who can bring creativity to bear when faced with uncertainty.

Interpersonal Communication Skills — you will be communicating daily with people both inside and outside of SAGE. Being able to hold courteous, persuasive conversations (by email or phone) and respond quickly to queries are key requirements of the role. You should be able and willing to share your thoughts and experience with the team and feel confident to speak your mind and ask questions.

Prioritizing Workloads — you will be managing a varied workload involving many different tasks. You will need to prioritize effectively and allocate appropriate amounts of time to each task.

Research and Analysis Skills — you will be required to conduct desk research as well as expert and user interviews. Knowledge of social science research methods (both qualitative and quantitative) would be an advantage.

Curiosity and enthusiasm — you will be interested in working with social scientists, excited about the opportunities that digital publishing offers, enthused about research methods and eager to learn about SAGE and the markets we serve.

Futurepub10

This week I attended futurepub10. I love these events; I’ve been to a bunch, and the format of short talks with lots of time to catch up with people is just great.

A new Cartography of Collaboration — Daniel Hook, CEO Digital Science (work with Ian Calvert).

Digital Science have produced a report on collaboration, and this talk covered one of the chapters from it.

I was interested to see what key takeaways you can describe in a five-minute talk. This one looked at what could be inferred about collaboration by looking at co-authors actually using the Overleaf writing tool. It’s clear that there is an increasing amount of information available, and it’s also clear that if you have a collaborative authoring tool you are going to get information that was not previously available by just looking at the publication record.

Daniel confirmed they can look at the likely journals for submission (based on the article templates), how much effort, in time and content, each author contributes to the collaboration, how long it takes to go from initial draft to completed manuscript, and which manuscripts end up not being completed. There is a real treasure trove of information here. (I wonder if you can call the documents that don’t get completed the dark collaboration graph.)

In addition to these pieces of metadata there are the more standard ones, institute, country, subject matter.

In spite of all the interesting real-time and fine-grained data they have, for the first pass they looked at country-to-country relations. A quick eyeballing shows that the US does not collaborate across country boundaries as much as the EU does. The US is highly collaborative within the US.

Looking at the country-to-country collaboration stats for countries in the EU, I’d love to see what that looks like scaled per researcher rather than weighted by researchers per country. Are there any countries that are punching above their weight per capita?
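
Something like this toy calculation is what I have in mind (all of the numbers below are made up, purely to illustrate the normalisation):

```python
# Toy per-capita scaling: divide each country's cross-border collaboration
# count by its researcher headcount. All numbers are invented for illustration.
collaborations = {"NL": 5200, "DE": 14800, "PT": 1900}      # hypothetical collaboration counts
researchers = {"NL": 76000, "DE": 400000, "PT": 44000}      # hypothetical researcher headcounts

per_capita = {c: collaborations[c] / researchers[c] for c in collaborations}
for country, rate in sorted(per_capita.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {rate:.3f} cross-border collaborations per researcher")
```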

In the US, when you look at the state-to-state relations, California represents a superstate in terms of collaboration. South Carolina is the least collaborative!

The measures of centrality in the report are based on document numbers related to collaborations.

Question Time!

The data that generates the report is updated in real time, but it seems like they don’t track it in real time yet. (It seems to me that this would really come down to a cost-benefit analysis; until you have a key set of things that you want to know about this data you probably don’t need to look at real-time updates.) Daniel mentions that they might be able to begin to look at the characteristic time scale to complete a collaboration within different disciplines.

In terms of surprises, there was the expectation that collaboration in the US would be more regional than they saw (my guess is that a lot of the national-level collaboration is determined by centres of excellence for different research areas, much of it driven by the Ivy League).

Someone asks if these maps can be broken out by subject area. It seems probable that they can get this data, but the fields will be biased towards the core fields that use Overleaf.

This leads to an interesting question: how many users within a discipline do you need to get representative coverage of a field? (When I was at Mendeley I recall we were excited to find that the number might be in the single-digit percentages, but I can’t recall if that still holds any more, nor why it might.)

Someone asks about the collaboration quality of individual authors. Daniel mentions that this is a tricky question, owing to user privacy. They were clear that they had to create a report that didn’t expose any personally identifiable information.

Comment

I think that they are sitting on a really interesting source of information, and for any organisation to have information at this level, especially with the promise of real-time updates, is quite exciting. However, I’m not convinced that there is much extra information here beyond what you would get by just looking at the collaboration graphs based on the published literature. This is what I’d love to see: can you show that the information you get from looking at real-time authoring is substantively different from what you would get by mining the open literature? Doing this kind of real-time analysis is probably only going to happen if Overleaf see a direct need to understand their user base in that way, and doing that is always going to need to be traded off against other development opportunities. Perhaps if they can find a way to cleanly anonymise some of this info, they could put it into the public domain and allow other researchers to have a shot at finding interesting trends?

The other chapters in the report also look interesting and I’m looking forward to reading through them. The network visualisations are stunning, and I’m guessing that they used Gephi to produce them.

Open Engagement and Quality Incentives in Peer Review, Janne Tuomas Seppänen, founder of Peerage of Science. @JanneSeppanen

Peerage of Science provides a platform that allows researchers to get feedback on their manuscripts from others (reviewing) before submission, and to get feedback on how useful their reviews are to others. A number of journals participate, allowing easy submission of a manuscript along with its reviews for consideration for publication.

Janne is emphasising that the quality of the peer review generated in his system is high. These reviews are also peer evaluated, on a section-by-section basis.

Reviewers need to provide feedback to each other. This is a new element of the system, and according to Janne the introduction of this new section has not significantly increased the time it takes to complete a review.

75% of manuscripts submitted to their system end up eventually published. 32% are published directly in the journals that are part of the system. 27% are exported to non-participating journals.

Questions

The reason people take part in reviewing is that they get a profile of how good their reviews are, as judged by their colleagues, building up their reviewing reputation.

Is there any evidence that the reviews actually improve the paper? The process always involves revisions on the paper, but there is no suggestion that there is direct evidence that this improves the paper.

Comment

Really, anything that helps to improve the nature of peer review has to be welcomed. I remember when this service first launched, and I was skeptical back then, but they are still going, and that’s great. In the talk I didn’t catch how much volume they are processing. I’m keen to see many experiments like this one come to fruition.

Discover what’s been missing, Vicky Hampshire, Yewno

Yewno uses machine learning to extract concepts from a corpus, and then provides a nifty interface to show people the correlations between concepts. These correlations are presented as a concept graph, and the suggestion is that this is a nice way to explore a space. Specific snippets of content are returned to the searcher, so this can be used as a literature review tool.
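
Yewno’s actual pipeline is proprietary and certainly far more sophisticated, but as a toy sketch of the general idea of a concept co-occurrence graph (the documents and concepts below are invented, and this is not their algorithm):

```python
# Build a toy concept graph: treat each document as a set of extracted
# concepts and weight an edge between two concepts by how often they co-occur.
from collections import Counter
from itertools import combinations

documents = [
    {"machine learning", "peer review", "bibliometrics"},
    {"machine learning", "knowledge graph"},
    {"knowledge graph", "bibliometrics", "peer review"},
]  # stand-ins for concepts extracted from a real corpus

edge_weights = Counter()
for concepts in documents:
    for a, b in combinations(sorted(concepts), 2):
        edge_weights[(a, b)] += 1

# The heaviest edges are the correlations a concept-graph interface would surface first.
for (a, b), weight in edge_weights.most_common(5):
    print(f"{a} -- {b}: co-occurs in {weight} document(s)")
```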

I had the pleasure of spending an hour last week at their headquarters in Redwood City, California, having a look at the system in detail, and I’ll throw in some general thoughts at the bottom of this section. It was nice to see it all presented in a five-minute pitch too. They do no human curation of the content.

They incorporated in 2014 and are now based in California, but the technology was created at King’s in London. As I understand it the core technology was originally used in the drug discovery realm, and one of their early advisors, Mike Keller, had a role in alerting them to the potential for this technology in the academic search space.

The service is available through institutional subscription and it’s been deployed at a number of institutions such as Berkeley, Stanford and the Bavarian State Library (where you can try it out for yourself).

To date they have indexed 100M items of text and they have extracted about 30M concepts.

Questions

Are they looking at institutions and authors? These are on their roadmap, but they have other languages higher up in their priorities. The system won’t do translation, but they are looking at cross-language concept identification. They are also interested in using the technology to identify images and videos.

They do capture search queries, and they have a real-time dashboard for their customers to see what searches are being made. They also make this available to publishing partners. This information is not yet available to researchers who are searching.

They are also working on auto-tagging content with concepts, and there is a product in development for publishers to help them auto-categorise their corpus.

They are asked what graph database they are using. They are using DynamoDB and Elasticsearch, but Vicky mentioned that the underlying infrastructure is mostly off the shelf, and the key things are the algorithms that they are applying.

At the moment there is no API; the interface is only available to subscribing institutions. The publisher system that they are developing is planned to have an API.

Comment

There is a lot to unpack here. The Scholarly Kitchen recently had a nice overview of services that are assembling all of the scholarly content, and I think there is something here of great importance for the future of the industry, but what that is is not totally clear to me yet.

I’m aware of conversations that have been going on for some years now about wanting to see the value of open access proved through the development of great tools on top of open content, and as we get more and more open access content, the collection of all of that content into one location for further analysis should become easier and easier. However Yewno, along with other services like Meta and Google Scholar, has been building out by working on access agreements with publishers. It’s clear that the creation of tools built on top of everything is not dependent on all of the content being open; it depends on the service you are providing not being perceived as a threat to the business model of publishers.

That puts limits on the nature of the services that we can construct from this strategy of content partnerships. It’s also the case that every organisation that wants to create a service like this has to go through the process of setting up agreements individually, and this probably creates a barrier to innovation.

Up until now many of the services that have been built in this way have been discovery or search services, and I think publishers are quite comfortable with that approach. But as we start to integrate machine learning, and increase the sophistication of what can be accomplished on top of the literature, will that have the potential to erode the perceived value of the publisher as a destination? Will that be a driver to accelerate the unbundling of the services that publishers provide? In the current world I may use an intermediate search service to find the content that interests me, and then engage with that content at the publisher site. In a near-future world, if I create a natural language interface into the concept map, perhaps I’ll just ask the search engine for my answer directly. Indeed I may ask the search engine to tell me what I ought to be asking for. Because I don’t have a full overview of the literature I’m not in a position to know what to ask for myself, so I’ll rely on being told. In those scenarios we continue to disrupt the already tenuous relationship between reader and publisher.

There are some other interesting things to think about too. How many different AI representations of the literature should we hope for? Would one be just too black-boxed to be reliable? How may we determine reproducibility of search results? How can we ensure representation of correlations that are not just defined by the implicit biases of the algorithm? Should we give the reader algorithmic choice? Should there be algorithmic accountability? Will query results be dependent on the order in which the AI reads the literature? Many, many interesting questions.

The move to do this without any human curation is a bold one. Other people in this space hold the opinion that this approach currently has natural limits, but it’s clear that the Yewno folk don’t see it that way. I don’t know how to test that, but maybe as searches on the platform become more focussed, that’s the moment where those differences could come to light.

I do have some comments on the product itself. I spent a little time today using the demo site available from the Bavarian State Library. It strikes me that I would quite like to be able to choose my own relevance criteria so that I can have a more exploratory relationship with the results. I did find a few interesting connections through querying against some topics that I was recently interested in, but I had the itch to peel back the algorithm to try to understand how the concepts were generated. It’s possible that this kind of search angst was something I experienced years ago with keyword search, and that years of practice have beaten the inquisitiveness out of me, but for now it is definitely something I noticed while using this concept map: almost a desire to know what lies in the spaces between the connections.

At the moment they are looking to sell a subscription into libraries. It’s almost certain that this won’t totally replace current search interfaces (that sentence might come back to haunt me!). The challenge they face in this space is that they are Yet Another Discovery Interface, and people using these tools probably don’t invest a huge amount of time learning their intricacies. On the other hand the subscription model can be monetized immediately, and you don’t have to compete with Google head to head.

On a minor note, looking at their interface there is an option to sign in, but it’s not clear to me why I should. I imagine that it might save my searches, or that it might provide the opportunity to subscribe to some kind of updating service, but I just can’t tell from the sign-up page.

CrossRef Event Data — Joe Wass — @JoeWass

By this stage in the evening the heat was rising in the room, and the jet lag was beginning to kick in, so my notes start to thin out a lot. Joe presented some updates on the CrossRef Event Data service. It was great to see it live, and I’d love to see it incorporated into things like Altmetric. Perhaps they need a bounty for encouraging people to build some apps on top of this data store?

At the moment they are generating about 10k events per day. They have about 0.5M events in total.

They provide the data as CC0, and for every event in the data store they give a full audit trail.
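
For anyone who wants to poke at it, the events are available from a public query API. Here is a minimal sketch; the endpoint, parameter names and response shape reflect my reading of the Event Data documentation, so treat them as assumptions and check the current docs:

```python
# Minimal sketch of pulling Crossref Event Data events for one DOI.
# Endpoint, parameters, and response shape are assumptions based on the
# public Event Data documentation; verify against the current docs.
import requests

EVENT_DATA_API = "https://api.eventdata.crossref.org/v1/events"

def events_for_doi(doi, rows=100):
    """Fetch up to `rows` events whose object is the given DOI."""
    response = requests.get(EVENT_DATA_API, params={"obj-id": doi, "rows": rows}, timeout=30)
    response.raise_for_status()
    return response.json()["message"]["events"]

# Hypothetical DOI, used purely for illustration.
for event in events_for_doi("10.1371/journal.pone.0160617"):
    print(event["source_id"], event["occurred_at"], event["subj_id"])
```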

Musicians and Scientists — Eva Amson — @easternblot

Eva gave a beautiful little talk about the relationship between scientists and musicians, and the fact that a disproportionately high number of scientists play instruments compared with the general population. She has been collecting stories for a number of years now, and the overlap between these two activities is striking. You can read more about the project on her site, and you can catch Eva playing with http://www.londoneuphonia.com on Saturday at St Paul’s Church Knightsbridge.

PLOS are looking for a new CEO

So I hear that PLOS are looking for a new CEO. They are making the process fairly open, so if you are interested you can read more here.

I got to thinking over the weekend about some of the challenges and opportunities facing PLOS. Over the years I’ve gotten to know a lot of PLOS folk, and I think it’s an amazing organisation. It has proved the viability of open access, and its business model is being copied by a lot of other publishers. At the same time they have had a fairly high turnover of senior staff in the last couple of years. So what are the likely challenges that a new CEO will face, and what should they do about them? (Time for some armchair CEO’ing.)

The condensed view of PLOS’s mission is that they want to accelerate progress in science and medicine. At the heart of their mission is the belief that knowledge is a public good, and, leading on from that, that the means for transmitting that knowledge should also be a public good (specifically research papers).

It was founded in 2001 by three visionaries, and it was configured to be a transformational organisation that could catalyse radical change in the way that knowledge is created and disseminated, initially in particular in contrast to the subscription model for distributing scholarly content.

Since launching, PLOS has found massive success with the introduction of PLOS ONE, currently the largest journal in the world. That rapid growth led to a period of significant scaling and adjustment for the organisation, where it had to keep running at full pace in order to stay just about on top of the flood of manuscripts coming its way. This also created a big revenue driver for the organisation, and PLOS ONE has become the engine that drives the rest of PLOS.

So now we have the strategic crux facing any incoming CEO. The organisation has an obligation to be radical in its approach to furthering its mission, but at the same time the engine that drives the organisation operates at such a scale that changes to the way it works introduce systemic risks to the whole organisation. You also have to factor in that the basic business model of PLOS ONE is not defensible, and market share is being eroded by new entrants, in particular Nature Communications, so making no changes is also a risky strategy.

So what to do?

There are probably many routes to take, and there are certainly a large number of ongoing activities that PLOS is engaged in as part of the natural practice of any organisation. I think the following perspectives might have some bearing on where to go. As with any advice, it’s much easier to throw ideas across the wall when you don’t have any responsibility for them, but I’m going to do it anyway in the full awareness that much of what I say below might not actually be useful at all.

Changing PLOS does not change scientists

PLOS has shown that open access can succeed, and its existence has been critical in confirming the desire of researchers who want research conducted as an open enterprise. That has allowed those researchers to advocate for something real, rather than something imagined. However, there remain a large number of researchers for whom the constraints of the rewards system they operate under outweigh any interest they may have in open science. I think it is important to recognise that no matter what changes PLOS introduces, those changes on their own will not be sufficient to change the behaviour of all (or even a majority) of researchers. Being able to show plausible alternatives to the existing system is important, but it is also important to continue to work closely with other key actors in the ecosystem to try to advance systemic change. What that tells me is that the bets PLOS takes on to create change have to be weighed against their likelihood of affecting all researchers, and the risks they introduce to the current business model of PLOS.

On the other hand you do want to progressively make it possible for people to be more open in how they conduct science. We talked a lot at eLife about supporting good behaviours, and you could imagine using pricing or speed mechanisms as a way of driving that change (e.g. lower costs for publishing articles that have been placed on a preprint server). One does have to be careful with pricing in academic circles, as costs of publication are rarely a factor in an academic’s decision about where to publish, but generally I’m in favour of providing potentially different routes through a product to different users, and making the routes that promote the behaviours I support easier or cheaper. (GitHub do this brilliantly by making open code repositories free to host, and only making you pay if you want to keep your code private.)

How do you balance risk?

One of the things that is consistent in innovation is that we mostly don’t know what is going to succeed. I expect that the success of PLOS ONE probably took PLOS by surprise. It was a small change to an existing process, but it had a dramatic effect on the organisation.

It seems to me that what you want to do is to have a fair number of bets in play. If we accept that we mostly won’t know what is going to succeed in the first place, then the key thing is to have a sufficient number of bets in place that you get coverage over the landscape of possibilities, and you iterate and iterate and iterate on the ones that start working well, and you have the resolve to close down the ones that are either making no progress or are getting stuck in local minima.

Product Horizons

I like the idea of creating a portfolio of product ideas around the three horizons principle. There are lots of ways of determining if your bets are paying off. One of the things that I think PLOS needs to do is to ensure that at least a certain minimum of its financial base is being directed towards this level of innovation.

I don’t think that is a problem at all for the organisation in terms of creating tools like ALM and their new submissions and peer review system, but I’m not clear on whether they have been doing this strategically across all of the areas where they want to have an impact. That’s not an easy thing to do: balancing ongoing work and new ideas, being disciplined enough to move on, and being disciplined enough to keep going, in the realisation that real success sometimes takes you by surprise.

PLOS may need diversification

As I referred to above, the business model of PLOS, as it’s currently configured, is not easily defensible. Many other publishers have created open access journals with publishing criteria based on the solidity of the science rather than impact. The Nature-branded version of this is now attracting a huge number of papers (one imagines driven by the brand overflow from the main Nature titles). This suggests to me that there is some value in looking at diversifying the revenue streams that PLOS generates. This could be around further services to authors, to funders or to other actors in the current scholarly ecosystem. Here are three ways to potentially look at the market.

One: what will the future flow of research papers look like, and how does one capture an increasing share of it? Will increased efficiencies in time to publication, and improved services around the manuscript, be sufficient? How might the peer review system be modified to make authors happier?

Two: how will funding flow to support data and code publishing, and will there be funding for creating new systems of assessment? Can any services that benefit PLOS be extended to benefit others in the same way?

Three: if you are creating platforms and systems that can be flexible and support the existing scale of PLOS, what might the marginal investment be to extend those platforms so that others could use them (societies, small groups of academics that want to self-publish, national bodies, or organisations from emerging research markets)?

The key here is not to suggest that PLOS has to change for its own sake, but rather to be clear about exploring these kinds of options strategically. It might be that you can create streams of revenue that make innovation self-supporting; it might be that you hit on a way to upend the APC model. These efforts could be seen as investment in case the existing driver of revenue continues to come under increasing pressure in the future.

Ultimately you want to build a sustainable engine for innovation.

Who does all of the work?

In the end all of the work is done by real people, and the key thing any new CEO is going to have to do is to bring a clarity of purpose, and to support the staff who are in the thick of things. What I’ve seen cause the most dissatisfaction in staff (aside from micromanagement — a plague on the houses of all micro-managers) is a lack of ability to ship. This usually comes down to one of two causes: either priorities change too quickly, or unrealistic deadlines are set that lead to the introduction of technical debt, which causes delays in shipping. It’s key to try to identify bottlenecks in the organisation, and (as contradictory as it might sound) to try to create slack in people’s schedules to allow for true creative work to happen.

If everyone is going open access, why should PLOS exist? Has it now succeeded in some way?

Given that almost all new journal launches are now open access journal launches, has PLOS effectively won? Could PLOS as it currently exists essentially go away? I think within one area of how we get to an open research ecosystem that might actually be true, however that only speaks to access to the published literature. Open science requires so much more than that. It needs transparency around review, efficiency in getting results into the hands of those who need them, data and code that are actionable and reusable, a funding system that abandons its search for the chimera of impact, and an authoring system that is immediately interoperable with how we read on the web today.

So what to do with PLOS as it’s currently configured? I see the current PLOS, with its success, as an opportunity to generate the revenues to continue to explore and innovate in these other areas, but I think that the current system should be protected to ensure that this is possible.

At the end of the day, what does a CEO do?

I can’t remember where I read it now, but one post from a few years back struck me as quite insightful. It said that a CEO has three jobs:

  • make sure the lights stay on
  • set the vision for the organisation
  • ensure that the best people are being hired, and supported

PLOS is in a great position at the moment. It has a business model that is working right now, and it is operating at a scale that gives any incoming CEO a good bit of room to work with. It’s a truly vision-led organisation, whose ultimate goal is one that can benefit all of society. It has great, great people working for it.

I don’t think that the job is in any way going to be a gimme, but it’s got to be one of the most interesting challenges out there in the publishing / open science landscape at the moment.

what we mean when we talk about preprints

Cameron Neylon, Damian Pattinson, Geoffrey Bilder, and Jennifer Lin have just posted a cracker of a preprint onto bioRxiv.

On the origin of nonequivalent states: how we can talk about preprints

Increasingly, preprints are at the center of conversations across the research ecosystem. But disagreements remain about the role they play. Do they “count” for research assessment? Is it ok to post preprints in more than one place? In this paper, we argue that these discussions often conflate two separate issues, the history of the manuscript and the status granted it by different communities. In this paper, we propose a new model that distinguishes the characteristics of the object, its “state”, from the subjective “standing” granted to it by different communities. This provides a way to discuss the difference in practices between communities, which will deliver more productive conversations and facilitate negotiation on how to collectively improve the process of scholarly communications not only for preprints but other forms of scholarly contributions.

The opening paragraphs are a treat to read, and provide a simple illustration of a complex issue. They offer a model of state and standing, that provides a clean way of talking about what we mean when we talk about preprints.

There are a couple of illustrations in the paper of how this model applies to different fields, in particular, physics, biology, and economics.

I think it would be wonderful to extend this work to look at transitions in the state/standing model within disciplines over time. I suspect that we are in the middle of a transition in biology at the moment.

Data, What is it Good For? Success…that’s what

Our world is driven by data. You may THINK that you decided to buy a new blue sweater at the mall last Thursday, but in fact the retailer analyzed data that drove marketing that ultimately led you to purchase that sweater at the specific time and place that you bought it. Like it or not, it’s a fact of (business) life.

Data drives every aspect of other businesses, and of course we use it in ours as well; after all, scholarly publishing is as much a business as it is a service to science. But are we using data enough, do we have the right data, and are we looking at the results in a meaningful way?

Last week I attended the ALPSP conference in London, where I heard some really great talks on the topic of data – not only in terms of how we as publishers manage the published research data, but data ABOUT the research we publish. One panel in particular spoke at length on the topic, making points such as: publishers should have a clear line of sight on the data surrounding every aspect of their program, otherwise they are doing a disservice to the science they serve. Bingo!


The entire time I had to restrain myself from leaping out of my chair and shouting, “this is what I’ve been trying to tell you people!” But it was a scholarly publishing meeting, so I didn’t.

Then earlier this week I spoke at the PSP journals seminar/webinar on the topic of data and the importance of understanding the performance of journal content now, while it’s relevant. A publisher that bases editorial decisions on their Journal Impact Factor is acting too late. Predictive analysis from current, relevant, accurate data gives the insight necessary to make fact-based decisions now to change the course of impending performance and ultimately affect the Impact Factor.

Goodhart’s Law tells us, “When a measure becomes a target, it ceases to be a good measure.”

Hooray! That’s what we want. The Impact Factor is shamefully based on old data. It’s almost taunting us to affect it.

And of course, publishers are slaves to the measure. Impact Factor. All of us scurry to see the results every summer. Are we up? How much? Down? Oh no. The Impact Factor drives so many important things about our publishing business. Institutions require faculty to publish in journals with a minimum IF, therefore journals that are at or near that threshold (usually five) are keenly attuned to the importance of staying above that very important line. Dip below it and they won’t attract as many highly citable papers, which means they’ll publish papers that won’t be as highly cited, which means their IF will continue to drop. It’s a spiral effect that’s hard to escape.
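
For reference, the two-year Impact Factor is, roughly, citations received this year to items published in the previous two years, divided by the number of citable items published in those two years, which is why it always lags. A toy example with made-up numbers:

```python
# Toy Impact Factor arithmetic with invented numbers: the 2016 IF is built
# entirely from citations to papers published in 2014 and 2015.
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Two-year JIF: this year's citations to the previous two years' items,
    divided by the number of citable items published in those two years."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# Hypothetical journal: 1,200 citations in 2016 to its 2014-15 papers,
# of which there were 150 + 160 citable items.
print(round(impact_factor(1200, 150 + 160), 2))  # 3.87
```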

Think of it another way: historical data describes the likelihood of certain outcomes, but each individual outcome merely has a probability of fitting the model, or not. So the data doesn’t drive the decision, it describes it, and what businesses can do is understand the aggregate and then anticipate it. Using the IF is a little like looking at last year’s fashion sales to predict next year’s. The years probably DO have something to do with each other, but a model based on last year’s sales might only predict 40% of next year’s outcome. Suddenly people wear acid-washed denim again. Whatever did I do with my prairie skirts?!

Look at the data analytics to observe trends. Monitor the citation, sharing, and usage performance of the content you published in recent months. Use this insight to make critical determinations about how to best adjust your publishing program to maximize citations. Besides the fact that you should know all of this information, and have a tool that organizes it any way you want and is flexible enough to show the data sliced and diced in the precise manner you need, the other reason you should use rich data analytics is … your competitor is doing it, and they’re going to beat you in the market for papers if you don’t.

A final word to remember: A publishing program will only flourish if it is run with an eye towards success – both in terms of editorial and revenue performance. Our market is experiencing many unexpected changes:

  • Hosting platform vendors are being acquired by their customers’ competitor.
  • Self-publishing associations are unexpectedly moving to commercial publishing.
  • OA journals are rising in Impact Factors and new challengers are rapidly taking over the long-time leaders.

What these changes tell us is that decisions about your publishing business need to be made using the richest depth of information possible. Ignore this at your peril.


The Conversation: Research and Scholarly Publishing in the Age of Big Data #alpsp16

Ziyad Marar, Global Publishing Director at SAGE Publishing, chaired the opening plenary at the 2016 ALPSP Conference. He was joined by his colleague Ian Mulvany, who is SAGE’s Head of Product Innovation, and Francine Bennett, CEO and co-founder of big data consultancy Mastodon C. They discussed how data is to the 21st century what oil was to the 20th century, and how this has major implications for researchers and publishers alike. Information of all kinds is now being produced, collected, and analyzed at unprecedented speed, breadth, depth and scale. The big data revolution promises to ask and answer fundamental questions about individuals and collectives, but large datasets alone will not solve major social or scientific problems.
