2018 will be the year chatbot conversations get real

As we enter 2018, it’s clear the expectations of intelligent assistance providers and enterprise practitioners are more in balance than ever before. It’s also clear that the number of “known knowns” is growing rapidly.

Here are five examples of how and why we will learn more about the complex workings of conversational commerce in the coming year.

5 best practices for implementing voice marketing in 2018

Hey Alexa, play some music.
Ok, Google, turn on the lights.

Five years ago, these commands would have made no sense. But for the past two and a half years, voice-enabled speakers have steadily gained traction, introducing the world to voice-activated technologies. As we approach 2018, there's no sign of the smart speaker revolution slowing down.

SEO is not enough in the age of voice

With so many technological innovations now transforming our lives, it's worth noting that the ideas behind them have existed for decades in science fiction novels and television. The capacity to talk to a computer (and have it talk back) was a staple of Gene Roddenberry's Star Trek, where the Starfleet computer was voiced by Roddenberry's wife, Majel. The 1970 movie Colossus: The Forbin Project featured a supercomputer that was intended to prevent war and proclaimed itself "the voice of World Control." And before Google's self-driving cars, the 1980s brought us KITT, an advanced artificially intelligent, self-aware, and nearly indestructible car from the TV show Knight Rider.

Why the best approach to voice marketing might be nothing at all


Seemingly overnight, capable voice recognition joined forces with artificial intelligence and machine learning to push voice-enabled experiences to the forefront of business thinking. But before brands leap into the fray, they need to recognize where and when to invest in the new technology. For some, that means investing elsewhere, for now.

Voice interfaces will revolutionize patient engagement


The healthcare industry is abuzz over consumer engagement and empowerment, spurred by a strong belief that when patients become more engaged in their own care, better outcomes and reduced costs will result.

Nevertheless, from the perspective of many patients, navigating the healthcare ecosystem is anything but easy.

Consider the familiar use case of booking a doctor’s appointment. The vast majority of appointments are still scheduled by phone. Booking the appointment takes on average ten minutes, and the patient can be on hold for nearly half of that time.

These are the kinds of inefficiencies that compound one another across the healthcare system, resulting in discouraged patients who aren’t optimally engaged with their care. For example, the system’s outdated infrastructure and engagement mechanisms also contribute to last-minute cancellations and appointment no-shows—challenges to operational efficiency that cost U.S. providers alone as much as $150 billion annually.

Similarly, long waits for appointments and the convoluted process of finding a doctor are among the biggest aggravations for U.S. patients seeking care. A recent report by healthcare consulting firm Merritt Hawkins found that appointment wait times in large U.S. cities have increased 30 percent since 2014.

It’s time for this to change. Many healthcare providers are beginning to modernize, but moving from phone systems to online scheduling, though important, is only the tip of the iceberg. Thanks to new platforms and improved approaches to integration of electronic medical records (EMR), the potential for rapid transformation has arguably never been greater.

This transformation will take many shapes—but one particularly excites me: voice. While scheduling and keeping a doctor’s appointment might be challenging today, it’s not far-fetched to envision a near future in which finding a doctor may be as simple as telling your favorite voice-controlled digital assistant, “Find me a dermatologist within 15 miles of my office who has morning availability in the next two weeks and schedule me an appointment.”

How voice has evolved in healthcare: The rise of technology platforms

Voice technologies have been generating excitement in the healthcare space for years. Because doctors can speak more quickly than they can type or write, for example, the industry has been tantalized by the promise of natural language processing services that translate spoken doctors’ notes into electronic text.

No single company or healthcare provider holds all the keys to this revolution. Rather, it hinges on a variety of players leveraging technology platforms to create ecosystems of patient care. These ecosystems are possible because, in contrast to even a few years ago, it's far more feasible to make software interoperate—and thus to combine software into richer services.

For example, developers can leverage application programming interfaces (APIs) that provide access to natural language processing, image analysis, and other services, enabling them to build these capabilities into their apps without creating the underlying machine learning infrastructure.
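To make that concrete, here is a minimal sketch of what calling such a hosted NLP service can look like from a developer's point of view. The endpoint URL, token, and response shape below are illustrative assumptions rather than any particular vendor's API.

```python
# Minimal sketch: calling a hosted natural language processing service rather than
# building the machine learning infrastructure in-house. The endpoint, token, and
# response shape are hypothetical placeholders, not a specific vendor's API.
import requests

NLP_ENDPOINT = "https://nlp.example-cloud.com/v1/analyze"  # hypothetical endpoint
API_TOKEN = "YOUR_API_TOKEN"

def extract_entities(transcribed_note: str) -> list:
    """Send a transcribed doctor's note to the NLP service and return entities."""
    response = requests.post(
        NLP_ENDPOINT,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"text": transcribed_note, "features": ["entities"]},
        timeout=10,
    )
    response.raise_for_status()
    return response.json().get("entities", [])

if __name__ == "__main__":
    note = "Patient reports a persistent rash on the left forearm for two weeks."
    for entity in extract_entities(note):
        print(entity)
```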

These apps can also leverage other APIs to connect disparate systems, data, and applications: anything from a simple microservice that surfaces inventory for medical supplies to FHIR-compliant APIs that allow access to patient data in new, more useful contexts. Connecting these modern interfaces to EMR systems, which generally do not easily support modern interoperability, may be one of the biggest obstacles. Well over a quarter-million health apps exist, but only a fraction of these can connect to provider data. If voice-enabled health apps follow the same course, flooding the market without an approach to EMR interoperability, it could undermine the potential of these voice experiences to improve care.

Fortunately, as more providers both move from inflexible, aging software development techniques such as SOA to modern API-first approaches and adopt the FHIR standard, these obstacles should diminish. FHIR APIs allow providers to focus on predictable programming interfaces instead of underlying systems complexity, empowering them to replace many strained doctor-patient interactions with new paradigms.
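As a hedged sketch of what "predictable programming interfaces" means in practice, the snippet below queries a FHIR-style REST API for a patient record and for free scheduling slots. The Patient and Slot resource types and the status and service-type search parameters come from the FHIR standard; the base URL and the assumption that a given server supports these searches are illustrative.

```python
# Rough sketch: reading from a FHIR-compliant API. Patient and Slot are standard
# FHIR resource types; the base URL and the server's support for these particular
# searches are assumptions made for illustration.
import requests

FHIR_BASE = "https://fhir.example-hospital.org/r4"  # hypothetical FHIR server

def get_patient(patient_id: str) -> dict:
    """Fetch a Patient resource as JSON."""
    resp = requests.get(
        f"{FHIR_BASE}/Patient/{patient_id}",
        headers={"Accept": "application/fhir+json"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

def find_free_slots(service_type: str) -> list:
    """Search for free scheduling slots, e.g. for a dermatology clinic."""
    resp = requests.get(
        f"{FHIR_BASE}/Slot",
        params={"status": "free", "service-type": service_type},
        headers={"Accept": "application/fhir+json"},
        timeout=10,
    )
    resp.raise_for_status()
    bundle = resp.json()
    return [entry["resource"] for entry in bundle.get("entry", [])]
```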

As it becomes simpler for developers to work with EMR systems alongside voice interfaces and other modern platforms, the breadth and depth of new healthcare services could dramatically increase. Because developers can work with widely adopted voice assistants such as Google Assistant, Apple’s Siri, and Amazon’s Alexa, these new services won’t need to be confined to standalone apps. Instead, they can seamlessly integrate care and healthier activity into a user’s day-to-day routines.
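To illustrate how such a service might surface through a voice assistant, here is a simplified, Alexa-style intent handler. The intent name, slot name, and the scheduling lookup it alludes to are invented for this sketch; a production skill would use the assistant vendor's SDK and documented request format.

```python
# Simplified sketch of a voice-skill backend following Alexa's JSON request and
# response shape. The FindDoctorIntent name, the Specialty slot, and the call
# into a scheduling lookup are invented for illustration.
def lambda_handler(event, context):
    request = event.get("request", {})
    intent = request.get("intent", {})
    if request.get("type") == "IntentRequest" and intent.get("name") == "FindDoctorIntent":
        slots = intent.get("slots", {})
        specialty = slots.get("Specialty", {}).get("value", "a doctor")
        # A real skill would call a FHIR search (like the sketch above) here.
        speech = f"I found three {specialty} appointments near you this week."
    else:
        speech = "Sorry, I can't help with that yet."

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```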

Many of us already talk to our devices when we want information on things like traffic conditions, movie times, and weather forecasts. Likewise, many of us are already accustomed to taking advice from our digital assistants, such as when they point out conflicts on our calendars or advise us to leave in order to make it to a meeting on time. It’s natural these interfaces will expand to include new approaches to care: encouraging patients to exercise, reminding them to take medications, accelerating diagnoses by making medical records more digestible and complete, facilitating easier scheduling, etc.

Indeed, research firm Gartner’s recent “Top 10 Strategic Technology Trends for 2018” speaks to the potential of voice and other conversational interaction models: “These platforms will continue to evolve to even more complex actions, such as collecting oral testimony from crime witnesses and acting on that information by creating a sketch of the suspect’s head based on the testimony.”

As voice and other interfaces continue to evolve from scripted answers to more sophisticated understandings of user intent and more extemporaneous, context-aware ways of providing service, the nature of daily routines will change. For example, whereas many patients today feel anxiety over finding the time and focus to pursue better care, in the near future, this stress will likely diminish as more healthcare capabilities are built into platforms and interaction models consumers already use.

What comes next?

It's clear that providers feel the urgency to improve patient engagement and operational efficiency. Consulting firm Accenture, for example, predicts that by the end of 2019, two-thirds of U.S. health systems will offer self-service digital scheduling, producing $3.2 billion in value. That's a start, but there's much more to do.

More capabilities will need to be developed and made available via productized APIs, platforms will need to continue to grow and evolve, and providers must adopt operational approaches that allow them to innovate at a breakneck pace while still complying with safety and privacy regulations.

But even though work remains, voice platforms and new approaches to IT architecture are already changing how patients and doctors interact. As more interoperability challenges are overcome, the opportunities for voice to be a meaningful healthcare interface are remarkable.

For the biggest changes, the question likely isn’t if they will happen but how quickly.

Aashima Gupta is the global head of healthcare solutions for Google Cloud Platform where she spearheads healthcare solutions for Google Cloud.

BBC is launching an interactive radio show for Echo


The future of entertainment is here. The BBC, in collaboration with Rosina Sound, is working on an interactive radio play for AI-powered smart speakers like Amazon's Echo and Google Home.

The production will be the first of its kind to use this technology in this way. The BBC plans to release the futuristic, high-tech play by the end of the year.




The play

The story, called the Inspection Chamber, will work similarly to choose-your-own-adventure books and games, in which users can influence the direction of the story by the choices they make.

The creators of the Inspection Chamber, though, are seeking to take that idea a bit further and make listeners really feel like they’re in the story.

The story’s narrator will ask you, the listener, questions throughout the story. Your answers to those questions will change the outcome of the narrative.

The questions are designed so the listener doesn’t have to step out of the story to consider their decision, but instead feels like they’re a character in the story. It’s meant to feel like you’re interacting with the other characters in the play.

The creators of the play said they took inspiration from games like The Stanley Parable and Papa Sangre, and authors such as Franz Kafka and Douglas Adams. The story became, in the creators’ own words, “a comedy science-fiction audio drama.”

The technology

The sci-fi elements fit well with the medium through which the story will be presented. The show’s creators say they’ve built a “story engine” that lets the story work on a variety of different voice devices.

First, the Inspection Chamber will come out on Amazon Echo and Google Home, but the BBC is looking into other devices, like Apple’s HomePod and Microsoft & Harman Kardon’s Invoke speaker, as well.

The project comes out of a wider BBC initiative called Talking With Machines that is exploring spoken interfaces. It’s looking at ways to share content through these technologies and improve interactive audio interfaces. It also aims to create a platform for these interfaces that works across devices, instead of relying on one particular device.

Merging art and technology

In some ways, the plot of the Inspection Chamber had to conform to the limitations of the technology used to share it. For example, Amazon’s Alexa requires users to speak every 90 seconds, and these devices only understand a limited number of phrases. The story’s writers had to come up with a way to incorporate these phrases and time requirements into the story, without making it feel forced.
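The BBC hasn't published how its story engine works, but conceptually a branching audio scene can be modelled as a piece of audio plus the small set of phrases the device is able to match, with a re-prompt when the listener says something unexpected. The sketch below is purely illustrative:

```python
# Conceptual sketch only; the BBC's actual story engine has not been published.
# Each scene pairs an audio clip with the limited phrases a device can recognise,
# and an unrecognised reply keeps the listener in place so the narrator can re-ask.
SCENES = {
    "intro": {
        "audio": "intro.mp3",
        "prompt": "Do you step into the chamber, or wait outside?",
        "choices": {"step in": "chamber", "wait": "corridor"},
    },
    "chamber": {"audio": "chamber.mp3", "prompt": None, "choices": {}},
    "corridor": {"audio": "corridor.mp3", "prompt": None, "choices": {}},
}

def next_scene(current: str, listener_reply: str) -> str:
    """Pick the next scene from the listener's reply; stay put if nothing matches."""
    for phrase, target in SCENES[current]["choices"].items():
        if phrase in listener_reply.lower():
            return target
    return current  # unrecognised reply: narrator asks the question again
```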

The use of this technology to tell a story may be experimental now, but as the technology improves, this type of content will likely become easier to create with fewer limitations on creativity. This presents some interesting ideas about the future of creative fields and technology. Rather than shy away from tech in favor of the traditional, the BBC is going full force into it.

Physical books and theater productions may never go completely out of style due to their many virtues, but using new technologies creates new possibilities with a plot, user experience, and more.

Kayla Matthews is a technology writer interested in AI, chatbots, and tech news. She writes for VentureBeat, MakeUseOf, The Week, and TechnoBuffalo.

Voice technology will change your relationship with customers


Artificial intelligence is at the root of several entirely new platforms on which customers and companies can interact. Voice, augmented reality, and chatbots are powered by natural language processing, computer vision, and machine learning algorithms. Each technology offers considerable opportunities for companies to deliver a more personal, useful, and relevant service to their customers.

Conversational interfaces are already here

Voice-controlled user interfaces have been around since 1952 when Bell Labs produced Audrey, a machine that could understand spoken numbers. But the current wave of voice technology was started by Amazon just a couple of years ago.

In 2015, Amazon launched the Echo, which introduced its AI-powered voice service, Alexa. At the time, the general response was one of confusion and frustration. As Farhad Manjoo, The New York Times’ tech columnist, wrote at the time, “If Alexa were a human assistant, you’d fire her, if not have her committed.”

But in the past two years, a lot has changed. Today, the Echo is recognized as a product that is leading a major shift in how humans engage with technology — and, by extension, how customers engage with brands.

It's taken more than six decades, but increasing processing power and advances in AI now have technology giants locked in an arms race to create the dominant voice-based assistant. Key areas of focus include machine learning, self-improving algorithms, and speech recognition and synthesis for developing conversational voice interfaces.

Voice can deliver better customer experiences

As the technology improves, the opportunity for companies to use voice to improve customer relationships grows.

Via an Alexa skill (Amazon’s term for an Alexa app), home cooks can ask for advice from Campbell’s Soup, shoppers can pay their Capital One credit card bills, and BMW drivers can check fuel levels remotely. Alexa, of course, is not alone. Apple Siri, Microsoft Cortana, Google Assistant, and other voice-enabled platforms are vying for attention.

For example, Xfinity’s latest TV remote is voice-enabled; Samsung Bixby controls a phone with voice commands; and Ikea is considering integrating voice-enabled AI services into its furniture.

Customer-focused companies must consider three areas in which voice can have an impact on their relationship with their customers.

  • More personality leads to deeper relationships: By its very nature, voice technology allows brands to move from text-based interactions with customers to something that feels more human. However, there is a high bar to meet. If customers feel they’re engaging with something closer to a “real person,” their expectations will change. If a conversational voice assistant makes a mistake or loses the context, it will be important for human backup to intercede. In addition, injecting an ambient conversational intelligence into people’s lives and homes will require deeper levels of trust that an individual’s privacy won’t be violated.
  • More engagement leads to more data, which gives companies further opportunities to understand their customers: Customers now expect omnichannel service, meaning they take for granted that companies will interact with and respond to them across any and all channels, including voice. From a company’s perspective, those voice interfaces can provide a rich additional set of data on its customer interactions. Companies will be able to use phrasing, tone, accent, and speed of delivery to learn far more about their customers than ever before. More data means companies can get better at understanding customer intent and attitude, such that they can take proactive steps to optimize the customer experience.
  • Voice presents opportunities for new types of engagement: Customers increasingly expect companies to respond to their queries immediately, whether during business hours or not. Voice and AI-powered conversational technology can help companies measure up to those expectations.

Intelligent conversational interfaces allow companies to scale up their capacity to engage with customers. The result is shorter customer service hold times, faster resolution of simple issues, and triage of complex questions before they are directed to the appropriate department. Intelligent, personalized voice-enabled assistants could also help health care companies scale "virtual medicine" and in-home care, and they could give financial services companies the capacity to handle customer service and provide financial advice at scale.

Voice is the most natural interface for humans. As conversational interfaces continuously learn, become smarter, and grow more aware of each individual’s preference, they will become more valuable in augmenting the customer experience and building deeper relationships with brands.

Clement Tussiot is director of product management at Salesforce Service Cloud, which delivers customer service software in the cloud.

The future of journalism is not all doom and gloom. Here’s why

The Reuters Institute for the Study of Journalism will be unveiling its 2017 Digital News: Essential Data on the Future of News report at the GEN Summit in Vienna, on 22 June. We talked to Nic Newman, author of the report, to get an early glimpse of what we can expect from it and what is going to shape the media industry in the foreseeable future.

David Levy, Nic Newman and Matt Kelly at the GEN Summit 2016

GEN: What trends are emerging this year and will be more prominent in the upcoming months for news?

Nic Newman: It's been an extraordinary year for the news industry because of this perfect storm of fake news (and how to define it), business models, and the growing realisation that platforms are not just platforms. Those three things together condition how we create journalism and how we distribute journalism. They show that we are really at an inflection point as an industry.

One of the things we do in the 2017 Digital News: Essential Data on the Future of News report, which we will be revealing at the GEN Summit, is gather country reports from every country on the supply side, giving us insights into what is going on in terms of journalistic jobs and business models. The responses show enormous strain. In Australia, for instance, Fairfax is losing 25 percent of editorial jobs. There are a lot of job cuts in journalism in quite a few countries, but on the more positive side we are also seeing real innovation in business models. We are really starting to see a change there.

Last year, I felt it was all quite depressing; it was a very depressing report to read. This year, there will be a few more moments of optimism, particularly around the inventiveness and creativity people are showing. Some of the content and innovation we are seeing from some of our partners is extremely impressive.

So it is still a storm, and it is still somewhat depressing, but there are definitely more moments of "home run" in this year's report.

What is going to be key for the future of news publications?

What is happening with platforms in general is key. Again, I am not going to give away the details of the report, because we want to reveal those in June, but we are seeing a lot of change within the social networks space that is perhaps a bit hidden, and we will explore this. For the past five years we have seen the growth and increasing power of Facebook specifically. We are now reaching a point of saturation within developed markets for the "old style" social networks; they are getting disrupted.

It is really about the role of platforms and algorithms. In terms of how people discover news, over the last five years we’ve seen this shift from the majority of people going directly to a news site and getting stories selected by an editor to many more people coming across (and then selecting) news via an algorithm.

Across all countries, editorial selection (direct, email, and notifications linked to an app) is now only just in the lead, at 52 percent. But for under-35s, who use more social media, we are already in a world where the majority of content is selected by an algorithm (55 percent, compared with 43 percent for direct).

This is why the issue of who programmes the algorithms, the transparency of those algorithms and what kind of content they surface matter so much. We will be discussing this at the GEN Summit.

We will also expand a lot on messaging apps in the report, as in some of the new countries we are looking at this year, messaging is already more powerful than traditional social networks. We used to look at the US to understand what was happening and what was rising; now that would not make sense. Asia and Latin America are better examples for these emerging trends.

About the evolution of business models, do you see this affecting legacy media and digital pure players alike?

Absolutely. It is affecting everybody. What is interesting is that many of the pure players, many of them only a few years old, are already being disrupted by changes in distribution models. They started out with one approach, which was very successful for a while, but they cannot assume this approach is going to continue to be successful, as every year something new changes the landscape, particularly in this disrupted world. If you look at what a few of the New York companies are doing with distributed models, you will see this is the year when the greatest change is happening.

From the Journalism, Media and Technology Trends 2017, Reuters Institute

On the legacy side, what we are seeing is a major refocusing away from pure reach and pure numbers towards more subtle questions: "How do we create value? How do we get people to come back to our site more often?" One of the big elements in our report this year comes from research we have conducted with focus groups, especially in Europe, on how consumers think about different new models for paying for news. Not just paywalls, but also some of the emerging models like micropayments and bundles of different propositions, the aggregated "Spotify for news" kind of ideas. We have been talking to consumers about how they would feel about some of those models.

What we found is that we have gone from a world where everyone in the media assumed publishing was going to be funded by advertising to one where everyone senses that no publication can survive without a paywall. I believe that neither of those is true. When we talked to consumers, it became obvious they cannot afford to pay for four or five digital subscriptions. What they really want is to keep doing what they already do in the digital world: navigating from one site to another. They love that, and they do not want to go back to a world where they are forced, from a financial standpoint, to get news from only one provider.

So I don't think that single-stack paywalls are going to be the answer either. Ultimately, publishers are going to have to think much more radically about how they combine those models into something that fits what the consumer wants.

What will be the solution for publishers?

Everyone is realising that a single business model is not going to be enough. Essentially what is needed is three, four, or even five different approaches, meaning that publishers will be protected, to some degree, from a sudden downturn: a sudden change in an algorithm by Facebook, for instance, or a consequent drop in display advertising.

Having a more distributed model would help. Over the last year we have increasingly seen positive signs, with publications starting to develop income streams around events, sponsored content, or data. These make for different business models, and every publisher will have a different approach.

To some extent these help, but above the business models, publications have to have real clarity about what they are about. The most successful media companies are very clear about their fundamental strategy. Beneath that, publishers need to iterate very quickly, particularly if the fundamental strategy is, for example, about making money out of branded content and building distribution networks. If that were your strategy, you would have to keep changing what you did, how your formats worked, and which networks you were using.

You were mentioning a more optimistic outlook for the future of the news industry. What is behind it?

I suppose this is what we will be discussing in June at the conference, which is why everyone needs to be at the GEN Summit. As I described earlier, what we are seeing is partly a consequence of the desperate economic situation, with publications now really counting on business model innovation to move forward.

A very positive element, and a source of hope, is actually the fake news phenomenon. Before, people everywhere felt that, from a consumer point of view, there wasn't really a problem with news. It was fine, it was all free, and a lot of people thought they should never have to pay for it, which was part of the problem. What we are getting now, because of fake news, is that the general public has come to the realisation that journalism doesn't come for free. There is good journalism, there is bad journalism, and there is quality journalism, which cannot be found everywhere; it is actually quite scarce. It might be something people need to pay for. The increasing pollution of our news environments, which is what I think is going on, is creating a situation where quality news brands, or brands that have something to say, have an opportunity to charge for their journalism, either directly or through a creative approach to advertising in the marketplace. That is the ray of hope I take from the whole "fake news" debacle.

What is the next big disrupter, technology-wise, for news? How are immersive and other technologies faring in newsrooms?

For the first time this year for the report, we have asked about ‘voice’ – voice-activated devices such as Amazon’s Alexa, Echo or Echo Dot. People at the GEN Summit are going to be very surprised at some of the results we got around voice. Voice is going to be an incredibly important platform, and in the short term it is more important than wearables – glasses, watches etc. – for example which everyone got very excited about. Media companies need to take ‘voice’ seriously, as an emerging platform that is developing quite quickly.

The Amazon Echo Dot

So far, voice-activated devices are sold in only four countries, whereas the report looks at 36. In most countries they aren't relevant yet, but in the countries where we surveyed voice for the first time, the extent of its usage, not only in general but for news specifically, is impressive.

The obvious implication here is the rise of audio, beyond radio programmes or even the popularity of podcasts. "Voice" as a new platform for news is only just emerging; it has only just launched. But on a five-year horizon, it is going to be a major disrupter.

The VR for News report, authored by Zillah Watson from the BBC, looks very widely at best practices in VR. Virtual reality, and any type of technology for immersive journalism, is a much longer-term opportunity; it is going to take a long time to build. While we see major potential in 360 journalism, there is a lot of experimentation going on around it at the moment. But there are quite a few barriers standing between the technology and innovations in 360 and implementation in newsrooms, and it is going to take some time.

It could be years before we see the potential of VR, which I think is going to be very strong, but it is going to be different for "voice", which will be easier to implement and really disruptive. Amazon is going to disrupt all kinds of business models, including Google's, since Google mainly built its business on display advertising adjacent to search. AR also offers a lot of opportunities, with a huge bundle of technologies bringing disruption to almost every stage of the news value chain.

From the Reuters Institute VR for News report, by Zillah Watson, BBC

About Nic Newman

Nic Newman is a journalist and digital strategist who played a key role in shaping the BBC’s internet services over more than a decade. He was a founding member of the BBC News Website, leading international coverage as World Editor (1997–2001). As Head of Product Development for BBC News he helped introduce innovations such as blogs, podcasting and on-demand video. He has played an important part in the development of social media strategies and guidelines for the wider BBC. Nic is currently a Visiting Fellow at the Reuters Institute for the Study of Journalism and a consultant on digital media.

Rich Harris—The Guardian US

“There’s been this trend in recent years towards trying to make journalism as digestible as possible, as easy to share as you can make it. And we wanted to do something that explicitly went against that trend.” (Journalism.co.uk, 18 August 2016)

Ben Kreimer—Buzzfeed

“Let’s try things out. Even if virtual reality journalism is not exploding in terms of hits right now, it pays to be a part of it. VR news is going…well, somewhere.” (The Media Online, 7 March 2017)

Amy Webb—Future Today Institute

“This isn’t to say that every single journalist needs to become a coder overnight, but I do think it’s important that news organisations understand what AI can and cannot do. I see a fairly big disconnect right now, with some organisations thinking that AI will eliminate all the reporters and others thinking that it will somehow magically allow them to write millions of stories.” (Journalism.co.uk, 13 December 2016)

Quotes brought to you by Storyzy


The future of journalism is not all doom and gloom. Here’s why was originally published in Global Editors Network on Medium.

What voice UI is good for (and what it isn’t)

“Bill Buxton introduced the concept of a “place-ona”, adapting the concept of a persona (which we all love to hate) to show how a location can place limits on the type of interactions that makes sense. There is no “one best input” or “one best output”. It all depends on where you are, which in turn defines what you have free to use.

At a very simple level, humans have hands, eyes, ears and a voice. (Let’s ignore the ability to ‘feel’ vibrations as that’s alert-only for the moment). Let’s look at some real world scenarios:

  • The “in a library wearing headphones” placeona is “hands free, eyes free, voice restricted, ears free”.
  • The “cooking” placeona is “hands dirty, eyes free, ears free, voice free”.
  • The “nightclub” placeona is “hands free, eyes free, ears busy (you can’t hear), voice busy (you likely can’t speak/can’t be heard)”.
  • The “driving” placeona is “hands busy, eyes busy, ears free, voice free”.

Based on the above, you can see which scenarios voice UI are useful in and in general the role of voice as an input mechanism.”
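One way to make the placeona idea concrete (my own illustration, not Buxton's formulation) is to model each scenario as the channels it leaves free and then check whether a voice interaction actually fits:

```python
# Illustrative sketch of the "placeona" idea quoted above: model each location as
# the channels it leaves free, then test whether voice input plus audio output fits.
PLACEONAS = {
    "library (headphones)": {"hands": True, "eyes": True, "voice": False, "ears": True},
    "cooking":              {"hands": False, "eyes": True, "voice": True,  "ears": True},
    "nightclub":            {"hands": True, "eyes": True, "voice": False, "ears": False},
    "driving":              {"hands": False, "eyes": False, "voice": True, "ears": True},
}

def voice_ui_fits(place: str) -> bool:
    """Voice UI needs a free voice for input and free ears for output."""
    p = PLACEONAS[place]
    return p["voice"] and p["ears"]

for place in PLACEONAS:
    print(f"{place}: voice UI {'works' if voice_ui_fits(place) else 'is a poor fit'}")
```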


Why Publishers Need to Capitalize on Voice Computing

The following is a guest post from SpokenLayer.


Humans develop language skills long before they read or write. It's no wonder that using voice to interact with technology has been a sci-fi goal for decades. With years of practice under the belt of Amazon Alexa, Google Assistant, Apple's Siri, and Microsoft's Cortana, we've finally come to a point where voice assistants are widely used. Now we see voice assistants not only as a smartphone feature, but as always-on, in-home devices that become part of someone's day-to-day life.

In the next year, it’s expected that 33 million of these devices will be out in the wild. That is an incredible opportunity for publishers and brands to hop on board and capitalize on this technology.

graph showing voice-first device footprint year over year

What happened to make these devices emerge?

Improvements in technology. After years of mobile phone assistants, the data and capabilities of these assistants have finally gotten to the point where they’re good. The commands are also clear enough for the assistant to quickly train you on what you can and can’t ask for.

The cost of this technology has gone down. Economies of scale for smartphones have made the device components cheap enough to mass produce consumer devices for under $50 like the Echo Dot.

The proliferation of APIs and content made the hardware easier to produce. This has ensured that device manufacturers have little heavy lifting to do to make these devices useful. Using external APIs, a simple voice command can order you a pizza, replace your laundry detergent, tell you the weather or play your favorite song.

 

chart showing volume of apps in the Alexa app store

Publishers can meet their audiences everywhere they go

Just as content marketing has exploded as everyone’s go-to marketing tactic, it’s also exploded as the number one app category on voice computing devices. Content producers are pumping out skills to meet their audience where they’re at with audio that’s delivered straight to the living room, kitchen, or bedroom.

voice-first device in home, compared to an app

For content publishers, the home presents a huge opportunity to build a daily habit with your users. Rather than reaching someone on a screen, where they're distracted by other apps (Facebook, email, Netflix, texts, and so on), devices like the Echo and Google Home make it easy to capture someone's full audio attention. And as content producers build out this experience, we're beginning to think of it as Radio 3.0.

What is Radio 3.0?

Radio’s evolved quickly over the past 15 years. Here’s a rundown of its evolution:

  • Radio 1.0: 1 broadcaster to many listeners – Terrestrial Radio that’s controlled by the broadcaster. Users are stuck with whatever the broadcast delivers.
  • Radio 2.0: 1 broadcaster to 1 listener – Online Radio controlled by the user, but locked into an ecosystem. It’s personalized, but only for that platform like iTunes, Pandora or Spotify.
  • Radio 3.0: Many broadcasters and 1 listener – A blending of sources that are controlled by the user. Provides the opportunity for someone to listen to a mixture of content through one device.

Radio 3.0 is interactive, personalized, and habit-forming. Unlike traditional radio, which is controlled by the broadcaster, Radio 3.0 is controlled by the user’s voice so interactions are quick and easy, no matter where they left their phone. Personalization on voice assistants brings listeners the content they want, whether it’s a song on Spotify, the weather, local sports scores, a news story from Slate.com or a Hardcore History podcast (this guy is incredible!). All of this ladders up to create a daily habit that users make their own.

How publishers can design voice experiences for their audience

In designing experiences for this type of environment, think of the interactions you have at a restaurant.

photos demonstrating the restaurant analogy for voice tech experience design

  • Less than 60 seconds spent with the hostess. This is a short interaction centered around quick answers.
  • Less than 5 minutes of small talk with the bartender. Short snippets of information like the flash briefing which includes the top stories of the day along with the weather and sports scores.
  • More than 5 minutes at the table. You sit down, order, and eat. This dedication to staying engaged with something for more than 5 minutes is the equivalent to longer listening sessions. Think about listeners using their voice assistant to listen to multiple stories, music, podcasts, TED talks, etc. and engaging like you would in a conversation.

The time is now

It’s clear that Amazon’s Echo and Google’s Home devices have struck a chord with consumers. The other major tech companies will soon follow suit with devices of their own.

For brands and publishers deciding whether to jump in or stay on the sidelines, it's time to jump in. Consumers are creating their consumption habits now, and it's much harder to break a habit than to create a new one. Starting now will also provide insights into how audiences want to use this new medium. Armed with that knowledge, brands can advocate for advanced features with manufacturers, who are themselves still looking to understand how these devices will be used.

At SpokenLayer, we transform written content into human-read audio at scale. We work with premium publishers to bring their stories to life every day, sharing them with the world through Alexa, Google Home, iTunes, Spotify, Soundcloud and more.

The post Why Publishers Need to Capitalize on Voice Computing appeared first on Parse.ly.

Voice and the uncanny valley of AI

Voice is a Big Deal in tech this year. Amazon has probably sold 10m Echos, you couldn’t move for Alexa partnerships at CES, Google has made its own and, it seems, this is the new platform. There are a couple of different causes for this explosion, and, also, a couple of problems. To begin, the causes.

First, voice is a big deal because voice input now works in a way that it did not until very recently. The advances in machine learning in the past couple of years mean (to simplify hugely) that computers are getting much better at recognizing what people are saying. Technically, there are two different fields here: voice recognition and natural language processing. Voice recognition is the transcribing of audio to text, and natural language processing is taking that text and working out what command might be in it. Since 2012, error rates for these tasks have gone from perhaps a third to under 5%. In other words, this works, mostly, when in the past it didn't. This isn't perfect yet – with normal use a 5% error rate can be something you run into every day or two, and Twitter is full of people posting examples of voice assistants not understanding at all. But this is continuing to improve – we know how to do this now.
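As a rough sketch of those two separate steps, the snippet below uses the third-party SpeechRecognition package for the transcription half and a deliberately crude keyword match standing in for the natural language processing half:

```python
# Sketch of the two fields described above. Assumes the SpeechRecognition package
# is installed (pip install SpeechRecognition); the keyword matcher is a crude
# stand-in for real natural language processing.
import speech_recognition as sr

def transcribe(wav_path: str) -> str:
    """Step 1: voice recognition, audio waveform to text."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)  # may raise if the audio is unclear

def parse_command(text: str) -> str:
    """Step 2: natural language processing, text to a structured command."""
    text = text.lower()
    if "timer" in text:
        return "SET_TIMER"
    if "weather" in text:
        return "GET_WEATHER"
    return "UNKNOWN"

if __name__ == "__main__":
    text = transcribe("request.wav")
    print(text, "->", parse_command(text))
```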

Second, the smartphone supply chain means that making a box with microphones, a fast-enough CPU and a wireless chip is much easier – with 1.5bn smartphones sold last year, there’s a firehose of ever-better, ever-cheaper components of every kind being created for that market at massive scale but available for anything else. In parallel, the ecosystem of experts and contract manufacturers around smartphones and consumer electronics that is broadly centred on Shenzhen means not only that  you can get the parts but that you can also get someone to put them together for you. Hardware is still hard, but it’s not as hard as it was. So, if you want a magic voice box, that you plan to light up from the cloud, you can make one.

Third, the major internet platform companies collectively (Google, Apple, Facebook and Amazon, or GAFA) have perhaps 10 times the revenue that Wintel had in the 1990s, when they were the companies changing the world and terrifying the small fry. So, there’s a lot more money (and people, and distribution) available for interesting side projects.

Fourth, a smartphone is not a neutral platform in the way that the desktop web browser (mostly) was – Apple and Google have control over what is possible on the mobile internet in ways that Microsoft did not over the desktop internet. This makes internet companies nervous – it makes Google nervous of Apple (and this is one reason why it bought Android) and Amazon and Facebook nervous of both. They want their own consumer platforms, but don't have them. This is a significant driver behind the Kindle Fire, Alexa, Facebook Messenger bots and all sorts of other projects.

All of this adds up to motive and opportunity. However, this doesn’t necessarily mean that voice ‘works’ – or rather, we need to be a lot more specific about what ‘works’ means.

So, when I said that voice input ‘works’, what this means is that you can now use an audio wave-form to fill in a dialogue box – you can turn sound into text and text (from audio or, of course, from chatbots, which were last year’s Next Big Thing) into a structured query, and you can work out where to send that query. The problem is that you might not actually have anywhere to send it. You can use voice to fill in a dialogue box, but the dialogue box has to exist – you need to have built it first. You have to build a flight-booking system, and a restaurant booking system, and a scheduling system, and a concert booking system – and anything else a user might want to do, before you can connect voice to them. Otherwise, if the user asks for any of those, you will accurately turn their voice into text, but not be able to do anything with it – all you have is a transcription system. And hence the problem – how many of these queries can you build? How many do you need? Can you just dump them to a web search or do you need (much) more?

Machine learning (simplifying hugely) means that we use data at massive scale to generate models for understanding speech and natural language, instead of the old technique of trying to write speech and language rules by hand. But we have no corresponding way to use data to build all the queries that you want to connect to – all the dialogue boxes. You still have to do that by hand. You’ve used machine learning to make a front-end to an expert system, but the expert system is still a pre-data, hand-crafted model. And though you might be able to use APIs and a developer ecosystem to get from answering 0.1% of possible questions to answering 1% (rhetorically speaking), that’s still a 99% error rate. This does not scale – fundamentally, you can’t create answers to all possible questions that any human might ever ask by hand, and we have no way to do it by machine. If we did, we would have general AI, pretty much by definition, and that’s decades away.
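Put in code, the point looks something like this sketch: the models can hand you a clean intent, but every intent still needs a hand-built handler, and anything outside that table falls through to a shrug (or a generic web search).

```python
# Sketch of the argument above: each supported query type needs its own hand-built
# handler ("dialogue box"); everything else gets a shrug, however well the speech
# and language models understood the request.
HANDLERS = {
    "book_flight": lambda q: f"Searching flights to {q.get('destination')}...",
    "book_table":  lambda q: f"Booking a table for {q.get('party_size')}...",
    "set_timer":   lambda q: f"Timer set for {q.get('minutes')} minutes.",
}

def handle(intent: str, query: dict) -> str:
    handler = HANDLERS.get(intent)
    if handler is None:
        return "Sorry, I can't help with that."  # no dialogue box was ever built
    return handler(query)

print(handle("book_flight", {"destination": "Lisbon"}))
print(handle("book_concert_tickets", {"artist": "anyone"}))  # the shrug
```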

In other words, the trap that some voice UIs fall into is that you pretend the users are talking to HAL 9000 when actually, you’ve just built a better IVR, and have no idea how to get from the IVR to HAL.

Given that you cannot answer any question, there is a second scaling problem – does the user know what they can ask? I suspect that the ideal number of functions for a voice UI actually follows a U-shaped curve: one command is great and ten is probably OK, but 50 or 100 is terrible, because you still can't ask just anything, yet you also can't remember what you can ask. The other end of the curve comes as you get closer and closer to a system that really can answer anything, but, again, that would be 'general AI'.

The interesting implication here is that though with enough money and enough developers you might be able to build a system that can answer hundreds or thousands of different queries, this could actually be counterproductive.

The counter-argument to this is that some big platform companies (i.e. Google, Amazon and perhaps Facebook) already have a huge volume of people typing natural language queries in as search requests. Today they answer these by returning a page of search results, but they can take the head of that curve and build structured responses for (say) the top 100 or 500 most common types of request – this is Google's knowledge graph. So it's not that the user has to know which 50 things they can ask, but that for the top 50 (or 500) types of question they'll now get a much better response than just a page of links. Obviously, this can work well on a screen but fails on an audio-only device. But more broadly, how well this works in practice is a distribution problem – it may be that half of all questions asked fall into the top 500 types that Google (say) has built a structured response to, but how many of the questions that I myself ask Google Home each day will be in that top 500, and how often will I get a shrug?

This tends to point to the conclusion that for most companies, for voice to work really well you need a narrow and predictable domain. You need to know what the user might ask and the user needs to know what they can ask. This was the structural problem with Siri – no matter how well the voice recognition part worked, there were still only 20 things that you could ask, yet Apple managed to give people the impression that you could ask anything, so you were bound to ask something that wasn't on the list and get a computerized shrug. Conversely, Amazon's Alexa seems to have done a much better job at communicating what you can and cannot ask. Other narrow domains (hotel rooms, music, maps) also seem to work well, again, because you know what you can ask. You have to pick a field where it doesn't matter that you can't scale.

Meanwhile, voice is not necessarily the right UI for some tasks even if we actually did have HAL 9000, and all of these scaling problems were solved. Asking even an actual human being to rebook your flight or book a hotel over the phone is the wrong UI. You want to see the options. Buying clothes over an IVR would also be a pretty bad experience. So, perhaps one problem with voice is not just that the AI part isn’t good enough yet but that even human voice is too limited. You can solve some of this by adding a screen, as is rumored for the Amazon Echo – but then, you could also add a touch screen, and some icons for different services. You could call it a ‘Graphical User Interface’, perhaps, and make the voice part optional…

As I circle around this question of awareness, it seems to me that it's useful to compare Alexa with the Apple Watch. Neither of them does anything that you couldn't do on your phone, but they move it to a different context and they do it with less friction – so long as you remember. It's less friction to, say, set a timer or do a weight conversion with Alexa or a smart watch, as you stand in the kitchen, but more friction to remember that you can do it. You have to make a change in your mental model of how you'd achieve something, and that something is a simple, almost reflexive task where you already have the muscle memory to pull out your phone, so can this new device break the habit and form a new one? Once the habit or the awareness is there, then for some things a voice assistant or a watch (or a voice assistant on a watch, of course) is much better than pulling out your phone, but the habit does somehow have to be created first.

By extension, there may be a set of behaviors that fit better with a voice UI not because they’re easier to build or because the command is statistically more likely to be used but because the mental model works better – turning on lights, music (a key use case for the Echo) or a timer more than handling appointments, perhaps. That is, a device that does one thing and has one command may be the best fit for voice even though it’s theoretically completely open-ended.

There's a set of contradictions here, I think. Voice UIs look, conceptually, like much more unrestricted and general purpose interfaces than a smartphone, but they're actually narrower and more single-purpose. They look like less friction than pulling out your phone, unlocking it, loading an app and so on, and they are – but only if you've shifted your mental model. They look like the future beyond smartphones, but in their (necessarily) closed, locked-down nature they also look a lot like feature phones or carrier decks. And they're a platform, but one that might get worse the bigger the developer ecosystem. This is captured pretty well by the 'uncanny valley' concept from computer animation: as a rendering of a person goes from 'cartoon' to 'real person' there's a point where increased realism makes it look less rather than more real – making the tech better produces a worse user experience at first.

All of this takes me back to my opening point – that there are a set of reasons why people want voice to be the new thing. One more that I didn’t mention is that, now that Mobile is no longer the hyper-growth sector, the tech industry is casting around looking for the Next Big Thing. I suspect that voice is certainly a big thing, but we’ll have to wait a bit longer for the next platform shift.
