Presentation, visualization & data manipulation tools – add machine learning & stir?



I haven’t done a “tools” post in a while, so this blog post will just be a whirlwind introductory tour of tools and applications I have explored recently and my thoughts on them.

The interesting trend that links many (but not all) of these apps and services below together is that a lot of them are starting to embed machine learning into their feature sets.

From the automatic slide designs that PowerPoint 2016 recommends when you drag in photos, to Google Sheets automatically suggesting visualizations based on your dataset, to data wrangling/cleaning tools trying to guess what transformations you want to make based on what you highlight, machine learning is indeed coming!

The tools I will cover will fall into the following categories.

 

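Presentation tools
  • PowerPoint 2016 – Designer and Morph features
  • Office Sway
  • Office Mix

Visualization tools
  • Tableau
  • Microsoft Power BI Desktop
  • Qlik Sense Desktop
  • Raw Graphs
  • Gephi
  • VOSviewer
  • Google Sheets – Explore function
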
Data cleaning tools
  • Openrefine add-ons – (VIB-BITS diff )
  • Trifacta Wrangler
  • Talend data preparation
  • Excel add-on (fuzzy match and power query) + Google doc (add-ins)
  • Bash/sed

 

Machine learning tools
  • R rattle
  • Rapid Miner
  • Weka
  • Knime
  • Orange

Due to the length of the post, I will split it into two, with this first one focusing on presentation tools and visualization tools. A later post will consider data cleaning (or data wrangling) tools and machine learning tools.

 

Presentation tools

Powerpoint 2016 – needs Office 365 subscription

Like many academic institutions, we currently have an Office 365 subscription, and this gives us access to the latest Office apps. Still, I had given up keeping up with the new features in upgraded versions of Office because I believed Microsoft was just adding increasingly exotic features that normal users like me would never use.
Things seem to be changing though, and Microsoft has recently been adding features that look pretty amazing to me, thanks to machine learning.
Ned Potter’s 5 easy ways to create fabulous slides suggests there are 5 different styles of PowerPoint slides. Styles 4 and 5, where a background image is overlaid with a text area (with either an opaque or a transparent background), probably look the most professional but are pretty tiresome to do in PowerPoint.
You have to drag in the background picture and resize it to the size of your slide. Then you drag in a text area or shape, figure out the best position for it as well as the background and text colors that best complement the picture, and move it to the foreground.
Doable? Definitely, but quite a few clicks.
What if you could just drag a photo into PowerPoint and it could figure out a couple of reasonable designs and let you pick? This is exactly what PowerPoint 2016 can do if you have Office 365.
In the example below, I chose a blank template (it works with some but not all templates), copied and pasted a photo of my library building into the slide and then went to Design -> Design Ideas and, lo and behold, the system automatically generated suggested layouts.
Drag one photo into blank template and design ideas appear on the right
Pretty good, right? You can also try adding more than one photo to see what happens and/or change the layout to “Title slide”, “Section header” etc.
For additional photos you can also insert online pictures using Bing (a nice touch is that by default it shows only Creative Commons photos), OneDrive, Facebook and, surprisingly, Flickr.
Auto-generated designs with 2 photos
You can add even more photos, but so far I’ve only seen designs generated when there are at most 4 photos.
After dragging in 4 photos into blank slide – this is a nice suggested design 

It’s interesting to note that the competition, Google Slides, offers a similar feature through its “Explore” function.

 

Google Slides – “explore” – Automatically creates layouts

The other nice feature that caught my eye in the new PowerPoint is a new type of transition called “Morph”. It allows you to add smooth animations by duplicating a slide and making slight changes to the copy; Morph will then smoothly transition to the altered slide.

 

There are many more nice features like new chart visualizations for sunburst, waterfall, box and whisker, combo charts etc but these are just bonuses.

Sway

Tired of normal PowerPoint? The next two tools are also from Microsoft. The first is Microsoft Sway.

There has been no shortage of apps and services trying to disrupt PowerPoint-style presentations. Some try to be radically different, like Prezi, while most are more like web-based versions of PowerPoint, with collaboration and easier importing of content from web services like YouTube and Twitter to generate web-based presentations. Google Slides is of course the paradigm example.

These days, though, native PowerPoint has most of these features already, so it’s curious that Microsoft has experimented with a tool called Microsoft Sway that lies somewhere in between those two types.

Remix button automatically creates new layouts based on what you have

Microsoft Sway is described as a “digital storytelling tool”, but it is closer to the latter type (a web-based PowerPoint) than the former (a radically different tool like Prezi).

In terms of layout it’s more flexible than traditional PowerPoint, allowing you to create presentations that scroll horizontally, vertically or in slideshow mode. You can search for images, videos or other content via OneDrive, Bing, Flickr, YouTube etc.

Instead of slides, Sway has “cards”, which can be grouped for display in many ways, such as stacking them, comparing them, or showing them in a grid or slideshow.

 

Some ways to display/arrange cards in Sway

So far, none of this is really groundbreaking.

The somewhat more interesting part is the ability to click “Remix”, after which Sway will try to remix your presentation into other styles. It makes a good attempt, but I prefer PowerPoint 2016’s “Design Ideas” feature.

Similar to the latest PowerPoint, you can also type in a topic and it will try to generate a presentation for you. It seems to depend heavily on Wikipedia to do so. Below is an outline created from a topic search for my institution.

Notice that Sway gives me suggestions for what to search, e.g. it knows the president’s name and suggests searching for it, and it also finds suitable images and videos to use.

Not super groundbreaking, but nifty if you are new at research.

Office Mix

This is a traditional PowerPoint add-on that allows you to add voiceovers, quizzes, videos etc. You can then upload the presentation online and obtain analytics on users viewing the presentation. Probably a good bet if you want a less steep learning curve.

 

Visualization tools

Visualization or “Dataviz” is all the rage these days. I understand the experts tend to use the d3.js JavaScript library or R libraries to generate flexible and fancy visualizations, but this may be a bit too much if you aren’t into coding (but also see the later post about GUIs for R).

Hence the rise of desktop data visualization tools, which are reaching a level where you can create quite amazing visualizations with little or no coding skills.

A lot of these are also starting to embed machine smarts…

 

Tableau

I’ve spent quite a lot of time in the past two years looking at visualization tools. This has become a booming industry in recent years but the one that is perhaps most well known and established is Tableau.
I have to admit, it’s currently my favorite as I spend so much time with it. Many students I work with also favour it probably because of its generous licensing to students.
Some visualizations with Tableau 

Tableau is an extremely powerful tool, and I probably only use the bare minimum of what it can do and I’m constantly surprised by the tool’s growing capabilities.

Tableau touts “self-service analytics”, which allows users to extract data and “self-serve” it in any way they like. The “Show Me” function is perhaps one of the nicer functions in Tableau, as it suggests visualizations based on the types of data fields you have selected. I admit I tend to explore data by semi-randomly selecting data fields and looking at what “Show Me” recommends; it also reminds me which types of data fields I need to generate, say, a “packed bubble” visualization or a “heat map”.

Show me feature – recommends visualizations

It works with a wide variety of file formats as well as, it seems, pretty much any commonly used server.

Servers & Services supported by Tableau 
While Tableau is mostly for visualization and not statistical analysis, I was surprised to realise you could use it to draw trend lines (regression with p-values and betas calculated). Recently, version 10.0 added the ability to do forecasting, and at the time of writing I’m waiting for 10.3 to be released, which will allow you to connect to PDFs to extract data, and to Dropbox.

Want to do more statistical analysis? Tableau integrates with R (a popular programming language for data analysis and Machine learning), which is helpful for me because I have started to learn R lately.

With such a large user base, the Tableau community is very active and I find my questions posted to the forums tend to get answered in a day or less.

If you are just playing with Tableau, there’s a free public version (Tableau Public). Be careful though: this version requires that you store your data on the web, so if it’s sensitive data you don’t want out on the net, don’t use it.

If you work in an academic institution, you may have access to Tableau desktop via an academic option. Like SAS and SPSS, they may have realized the best way to build brand loyalty is to give away their software to students at educational institutions.

Microsoft Power BI Desktop

Microsoft’s entry in the data visualization software market is named Power BI Desktop. It’s the newest of the 3 big names (the others are Tableau and Qlik), and as such Power BI Desktop has a lot of ground to make up.

However, Microsoft does have quite a few advantages. When I tried it a year ago it was quite raw, but it is ramping up quickly.

Compared to Tableau’s free Tableau Public, Microsoft’s Power BI desktop is more generous. You can save the data on your desktop and do not need to save it on the web.

Also, some may find Power BI Desktop more familiar to use given its Microsoft roots. Over time, though, I found Tableau Desktop seemingly more flexible and more powerful in what it could do, but this may reflect my lack of experience with Power BI. Also, with Microsoft adding new features every month, the feature gap is closing fast.

One nice thing about Power BI is that you can easily add snazzy new visualizations by going to the custom visualizations gallery and downloading them.

For example, have you heard of or played with the impressive Sand Dance visualization? Want that in Power BI? You can.

Just go to the Office Store for Power BI visualizations, search for the Sand Dance visualization and add it.

 

 Custom Sand Dance visualizations in Power BI

Qlik Sense desktop

Completing the big 3 is Qlik’s Qlik Sense Desktop. I have the least “sense” of this one, though my institution does use QlikView, also by Qlik, for our dashboards. What is the difference between the two products?
It seems that in the field of visual analytics tools, there are two classes of products. The first, older type of product focused on creating dashboards, also known as business intelligence. These tended to require coding to create, and the visualizations in the dashboard were first agreed upon and then coded. Typically used to track KPIs, this class of product did not encourage ordinary end-users to explore data beyond the predetermined visualizations.
The second type of product focused on helping end-users explore data on their own. Typically employing point and click / drag and drop to create visualizations, they enabled ordinary end-users to explore the data in whatever way they wanted.
The lines between the two can be blurred, of course; for example, Tableau, which is best known for empowering end-users, can also be used for company-wide dashboards. Typically the former might be made free (or a lite version might be), but a much more pricey server enterprise version would also be sold.

As I said, my experience with this tool is extremely limited. But it does seem capable and could be a worthy tool.

Raw Graphs

Two years ago, I was on a mailing list on LibQual and I saw an email from Chandler Christoffel sharing his interesting visualization of comments received in LibQual.

This was my first introduction to the free open source tool – Raw Graphs, and with the kind help of Chandler, I managed to do something similar for my own comments.

 

Visualization using Raw Graphs of Libqual+ comments

So what is Raw Graphs? It is a free, web-based, open source tool that is capable of generating 21 less commonly seen visualizations (read: not doable in default Excel).

Default visualizations available in RAWgraphs



It’s a pretty simple tool: upload your data, select the visualization you want and then map the correct fields to labels, colors, size etc. as needed to generate the visualization.

For the visualization I selected, the system knows the fields I need to specify

You can do further customization in the tool if you like; otherwise, it generates visualizations in SVG format that you can then edit in Photoshop.

Its main strength is that it’s open source and free; it is hosted on GitHub, so you can easily fork it to make further changes. Even though it is web-based, there are no server-side operations: everything is done in your browser, so it is suitable for sensitive data. (You can test this by turning off your internet connection, or you can install your own version locally.)
Advanced users who know d3.js and coding can even add new custom graphs.

Gephi – network visualization tool

I am far less familiar with network visualization tools than with other tools, but I have long been aware that Gephi is a very popular one. For Excel there is NodeXL, but I haven’t tried that yet.

For librarians, when we think about network visualizations, the obvious use is visualizing bibliometric networks from data we extract from Web of Science or Scopus.

It isn’t particularly easy to do this with Gephi, but following the instructions here, I managed to create my first author-keyword network using papers by authors in my institution.

 

Obviously, this needs a lot of work but it does show nodes of popular keywords (red) and authors who publish papers with those keywords (green)

The key seems to be this, though: while it’s possible to use Gephi directly on Scopus data (see this tutorial for example), it is perhaps easier to use the free online tools at Sciencescape to generate the network file to be used in Gephi.

 

You will still need to learn to use Gephi to set up labels, colors, size of nodes etc., but at least you get the network all set up without needing to wrangle the data fields into what Gephi needs.
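
To give a feel for the data wrangling involved if you do it yourself, here is a minimal Python sketch that turns a handful of made-up paper records into an author-keyword edge list. The records and field names are my own assumptions for illustration and do not follow the Scopus export format; the Source, Target and Weight column headers are what Gephi's spreadsheet import looks for in an edge table.

```python
import csv
import sys
from collections import Counter

# Hypothetical records; a real Scopus or Web of Science export has different fields
papers = [
    {"authors": ["Tan, A.", "Lee, B."], "keywords": ["open access", "Bibliometrics"]},
    {"authors": ["Tan, A."], "keywords": ["bibliometrics"]},
    {"authors": ["Lim, C."], "keywords": ["open access"]},
]


def author_keyword_edges(records):
    """Count how often each (author, keyword) pair appears across papers."""
    weights = Counter()
    for record in records:
        for author in record["authors"]:
            for keyword in record["keywords"]:
                # Lowercase keywords so "Bibliometrics" and "bibliometrics" merge
                weights[(author, keyword.lower())] += 1
    return weights


def write_edge_table(weights, stream):
    """Write a CSV edge table with Source, Target and Weight columns."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (author, keyword), weight in sorted(weights.items()):
        writer.writerow([author, keyword, weight])


edges = author_keyword_edges(papers)
write_edge_table(edges, sys.stdout)
```

Redirecting the output to a .csv file gives you something you can import into Gephi as an edges table; the weight column can then drive edge thickness.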

VOSviewer

While Gephi is a very flexible tool for visualization of all sorts of networks, is there a specific tool that is designed for citation analysis?
In fact, there are several options for handling such analysis, including CitNetExplorer by CWTS (to show changes across time) and CNS’s Sci2, which, while not just for citation analysis, is “specifically designed for the study of science” and lists bibliometric network workflows among its sample workflows. You can also use R scripts to handle Web of Science data, or use the NAILS interface (Network Analysis Interface for Literature Studies).

Still, I find that if what you want is something simple, CWTS’s VOSviewer seems to be the ticket.

I personally find VOSviewer fairly easy to use, as easy as such tools can be anyway.

Obviously you will need to know what terms like bibliographic coupling, link strength etc mean, but VOSviewer keeps things as simple as possible.
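
As a rough illustration of two of those terms, the sketch below computes bibliographic coupling (two papers are coupled when they cite the same references) and co-citation (two references are linked when the same paper cites both) from made-up reference lists. The link strength here is simply a count of shared items; VOSviewer offers its own counting and normalization options, which this sketch does not try to reproduce.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citing papers and the set of references each one cites
references = {
    "Paper A": {"R1", "R2", "R3"},
    "Paper B": {"R2", "R3", "R4"},
    "Paper C": {"R4"},
}


def bibliographic_coupling(cited_by_paper):
    """Link strength between two papers = number of references they share."""
    strengths = {}
    for first, second in combinations(sorted(cited_by_paper), 2):
        shared = cited_by_paper[first] & cited_by_paper[second]
        if shared:
            strengths[(first, second)] = len(shared)
    return strengths


def co_citation(cited_by_paper):
    """Link strength between two references = number of papers citing both."""
    strengths = Counter()
    for cited in cited_by_paper.values():
        for pair in combinations(sorted(cited), 2):
            strengths[pair] += 1
    return strengths


coupling = bibliographic_coupling(references)
cocited = co_citation(references)
print(coupling)
print(dict(cocited))
```

In this toy example Paper A and Paper B are the most strongly coupled pair because they share two references, while Paper A and Paper C are not linked at all.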

It first asks which mapping approach you are going to take.

Most of the time you will choose the second option to create a co-authorship, keyword co-occurrence, citation, bibliographic coupling, or co-citation map. You are unlikely to use the first option unless you already have citation data converted to a network file through other means.
Some types of maps you can create using VOSviewer
Less often, you will take the third option, to create a map based on text data (e.g. analysis of abstracts and titles). The tool is flexible enough to accept WOS/Scopus/RIS/PubMed data.
Co-occurrence map of keywords from Scopus data generated from my institution

I didn’t have much trouble learning and experimenting by changing options (colors, labels, size of nodes etc.) to see what it could do, so it seems a very good mix of flexibility and ease of use if all you want to do is bibliometric networks.

Google Sheets and Explore function

Google, which has redefined itself as a machine learning company and is snapping up AI startups left and right, is obviously a big player in machine learning. As such, it has started to embed the fruits of machine learning into its services, with Microsoft, Facebook and Amazon not far behind.
It would probably require a couple of posts to show all the places where Google has started embedding machine learning, but for today’s purposes let me just show you one interesting trick in Google Sheets.
According to Gartner, next-generation visualization tools will start to offer natural language queries that automatically generate analysis and visualizations, as well as auto-narration, where the system automatically generates reports about interesting data trends with no human intervention.

Would it surprise you to know both features are in Google Sheets? You need to invoke them using the easily missed “Explore” function, hidden at the bottom right.

 

 

The easily missed “explore” function in Google Sheets
It will then help you explore the data set by automatically generating visualizations.

Another nice feature is that you can almost type in natural language to get the analysis you want.

 

 

In the example above, I uploaded a Google Analytics data file and, after I clicked on Explore, besides offering me some graphs it also let me type questions in natural language to get answers. Some suggested types of questions you can try are already shown, e.g. finding correlations, medians etc.

It’s still not very smart at interpreting what I want but it’s possibly going to get better as more people use it.

This Explore function is not just in Google Sheets but also in Google Docs and, as you have already seen, Google Slides.

Conclusion

This was a really long post, with a diverse mix of tools, services and apps.

A lot of the machine learning type features, where the system provides recommendations and advice based on the data you have, seem to have started appearing in 2016, so things are still very fluid right now and we should expect more and more new services and tools to have such features. God knows when library-provided software will follow suit. 🙂

Hopefully, my post gave you some ideas on what can be done and inspiration to try out some of these tools that you think might be useful.

 
