Data are out. Start analyzing. But beware.


Now that we have released our data set on research tool usage and shared the graphical dashboard, let the analysis start! We hope people around the world will find the data interesting and useful.

If you are going to do in-depth analyses, make sure to read our article on the survey background and methods. It helps you understand the type of sampling we used and the resulting response distributions. It also explains the differences between the raw and cleaned data sets.

For more user-friendly insights, you can use the graphical dashboard made in Silk. It is easy to use, but still allows for quite sophisticated filtering and even supports filtering answers to one question by answers given to another question. Please be patient with Silk: it crunches a lot of data and may sometimes need a few seconds to render the charts.

Example chart that also shows filter options in the dashboard

When looking at the charts and when carrying out your analyses, please note two things.

First, whatever you do, keep in mind the fundamental difference between results from preset answers (entered by simply clicking an image) and those from specifications of other tools used (entered by typing the tool names manually). The latter are quite probably an underestimate and thus cannot readily be compared with the former. [Update 20160501: This is inherent to the differences between open and closed questions, of which ease of answering the question is one aspect. Specifications of ‘others’ can be seen as an open question.] This is why we present them separately in the dashboard. Integrated lists of these two types of results, if made at all, should be accompanied by the necessary caveats.

Frequency distribution of 7 preset answers (dark blue) and the first 7 ‘other’ tools (light blue) per survey question
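
If you tally answers yourself, the same separation can be kept in code. The following is only a minimal sketch: the `preset` and `other` field names, the tool names and the normalisation of typed-in names are all illustrative assumptions, not the actual layout of the data set.

```python
from collections import Counter

# Hypothetical responses: field names and tool names are illustrative only.
responses = [
    {"preset": ["Tool A", "Tool B"], "other": ["Tool X"]},
    {"preset": ["Tool A"], "other": []},
    {"preset": ["Tool B"], "other": ["tool x ", "Tool Y"]},
]

def tally(responses):
    """Count preset and typed-in answers in two separate counters."""
    preset, other = Counter(), Counter()
    for r in responses:
        preset.update(r["preset"])
        # Typed-in names vary in case and spacing; normalise before counting
        # (an assumption for this sketch, not the cleaning we applied).
        other.update(name.strip().lower() for name in r["other"])
    return preset, other

preset_counts, other_counts = tally(responses)
```

Keeping two counters rather than one makes it harder to rank a typed-in tool against a preset one by accident.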

Second, basic statistics tells us that when you apply filters, the absolute numbers can in some cases become so low that the results are unfit for any generalization. Conversely, when not filtering, please note that usage patterns will vary according to research role, field, country etc. Also, our sample was self-selected and thus not necessarily representative.
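
One way to guard against the small-numbers problem is to report the subgroup size alongside every share and to withhold the share when the subgroup is too small. This is a sketch under stated assumptions: the row layout (`role`, `country`, `tools`) is hypothetical, and the threshold of 30 is an arbitrary illustrative value, not a recommendation.

```python
def share_with_guard(rows, tool, min_n=30, **filters):
    """Return (n, share) for `tool` among the rows matching `filters`.

    share is None when fewer than `min_n` rows remain after filtering,
    signalling that the subgroup is too small to generalise from.
    """
    subset = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    n = len(subset)
    if n < min_n:
        return n, None
    used = sum(1 for r in subset if tool in r["tools"])
    return n, used / n

# Hypothetical rows: 40 librarians and 5 postdocs.
rows = (
    [{"role": "librarian", "country": "NL", "tools": {"Tool A"}}] * 40
    + [{"role": "postdoc", "country": "NL", "tools": {"Tool B"}}] * 5
)

large_group = share_with_guard(rows, "Tool A", role="librarian")
small_group = share_with_guard(rows, "Tool B", role="postdoc")
```

Here `small_group` comes back with a share of `None`, because only 5 rows survive the filter.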

Now that we are aware of these two limitations, nothing stops you (and us) from diving in.

Our own priorities, time permitting, are to look at which tools are used together across research activities and why that is, at concentration ratios of tools used for the various research activities, and at combining these usage data with data on the tools themselves, like age, origin, business model etc. More generally, we want to investigate what tool usage says about the way researchers shape their workflow: do they choose tools to make their work more efficient, open and/or reproducible? We also plan to do a more qualitative analysis of the thousands of answers people gave to the question of what they see as the most important development in scholarly communication.
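
For readers who want to try similar measures, here is a minimal sketch of two of them. It assumes one common reading of a concentration ratio (the share of all mentions accounted for by the top n tools) and counts co-usage as the number of respondents who mention both tools of a pair. The function names and example data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def concentration_ratio(counts, top_n=4):
    """Share of all mentions accounted for by the `top_n` most used tools."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(c for _, c in counts.most_common(top_n))
    return top / total

def co_usage(tool_sets):
    """Count how many respondents mention each unordered pair of tools."""
    pairs = Counter()
    for tools in tool_sets:
        # Sorting gives each pair a single canonical order.
        pairs.update(combinations(sorted(tools), 2))
    return pairs

# Hypothetical mention counts for one research activity.
counts = Counter({"A": 50, "B": 30, "C": 15, "D": 5})
cr2 = concentration_ratio(counts, top_n=2)

# Hypothetical tool sets for two respondents.
pairs = co_usage([{"A", "B"}, {"A", "B", "C"}])
```

Given the first caveat above, such measures are best computed for preset answers and 'other' tools separately.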

By the way, we’d love to get your feedback and learn what you are using these data for, whether it is research, evaluation and planning of services or something else still. Just mail us, or leave your reply here or on any open commenting/peer review platform!
