Synergies between voicebots and chatbots

When embarking on a project using voice technology, how much of your existing chatbot knowledge is reusable?

As we move further into the “new normal” for customer service forged by the ongoing pandemic, businesses are increasingly looking towards omnichannel strategies to deliver improved customer experiences. Whether this takes the form of chat, phone support or a hybrid model, it’s important to understand the synergies between these technologies and their benefits.

Data from Salesforce shows that 72% of users placed a higher degree of trust in a business after having a positive experience with a voice assistant. This signals that consumers want meaningful, productive interactions via voice with the brands they care about. But is the technology up to the task?

I believe that in 2022 the answer is yes, more so than ever before. In this article, I outline what is important to consider when embarking on a project using voice technology if you already have experience with chatbots (or vice versa), and how much existing knowledge can be reused to speed up the process and reduce friction.

A shift in voice market share

Both voicebots and chatbots, powered by Natural Language Understanding (NLU), allow users to be understood, complete tasks and get on with their day. NLU makes these bots smarter: it removes the need to wait in long queues or explain oneself multiple times, and generally improves the experience of speaking with customer service. Combined with a seamless transfer to human agents at the right time, this can be hugely beneficial for both a business and its customers.

The adoption of voice-related technology keeps increasing. In 2022, however, I expect that we will see the market share of voice project deployments shift from smart speakers, such as Google Home and Amazon’s Alexa, towards other channels, including conversational IVR, voice-in-the-car services and digital avatars/humans, primarily because of the value they bring to contact centers.

Voicebots deployed through Interactive Voice Response (IVR) systems, in particular, are becoming more capable thanks to technological advances and better interoperability, made possible largely by a growing number of acquisitions and partnerships.

Benefits of voicebots

Voicebots can be deployed in the contact center through conversational IVR. The benefits of the technology in this setting include improving call transfer rates through smart routing, automating informational questions such as “where do I find my latest invoice?”, and performing transactions on a customer’s behalf, such as blocking a card or tracking an order.

Similarities in voice and chat

A common question I hear from companies with existing chatbot projects is how much of their existing content is reusable when adding voice as a channel.

The simple answer is that a lot of existing work can be utilised, including not just the content itself, but also training and test data, as well as multi-language support.

Companies expect a level of enterprise-grade scalability when it comes to these kinds of projects, so it wouldn’t make sense (or be cost-effective) for this work to have to be done again. Let’s take a look at what exactly can be transferred over:

Training data

Training data is used to teach virtual agents or voicebots to understand user requests. Although a bot can be trained successfully on a minimal amount of data, organisations with thousands of intents (the topics and questions that conversational AI can respond to) benefit from re-using them when adding a new channel.

With advanced NLU and the right speech recognition technology, avoiding duplicate work can save a significant amount of time and money.

We have seen this successfully implemented by the Roskilde municipality in Denmark, whose existing chatbot with over 6,000 intents was converted into a fully-featured voicebot helping citizens with inquiries over the phone.

Klaus Bjørn Larsen, a project leader on Roskilde’s chatbot, had this to say about the process: “We saved a lot of work reusing our training data from our existing chatbot. Initially, we had concerns of how the users phrase questions differently using chat versus voice and didn’t know how the NLU would manage that, but we took a chance and our AI trainers were positively surprised. As long as the speech-to-text service is robust enough it seems like the NLU isn’t affected much by the origin of the input.”

Intents

How much can be reused here depends somewhat on the scope of the bot and on whether the use cases for chat and voice are the same. Voice and chat projects may share the same reason for being implemented, but they often serve different purposes in how they assist users.

Re-using intents is a good place to start. Although it has been suggested that 80% of requests may be for just 20% of available content, a smarter virtual agent can understand the user even if a request is outside of its scope.

During the move from chat to voice by the Finnish municipality of Porvoo, Sami Hiitti, a Senior Analyst at Accenture Finland, shared some findings: “New answers for voice assistants can be easily created to existing [chat] conversation flows, by using separate filters which are then connected to phone numbers provided for voice. This also allows multiple voice assistants to exist on the same instance since each filter can be attached to different numbers.”

This illustrates that enterprises can build voice, chat and additional channels on the same content, with each answer tailored to the channel of interaction.

Content

The reason responses or actions performed differ between channels is to enhance the customer experience. For instance, we don’t want a voicebot reading a long URL, or trying to display an image if the user is calling on the phone.

Some chat responses can be reused, and often serve as a foundation for voice. Best practice, however, is to remove visual content, shorten responses so that users can better absorb the information, and focus a little more on how the content will sound rather than how it looks.

In addition, voice design brings its own considerations, such as Speech Synthesis Markup Language (SSML), whether buttons should be read out loud or hidden, and often a greater need for conversation recovery.

Klaus from Roskilde says: “We put most of our effort into “voicifying” our chat answers. Smileys, long texts and complex links don’t match good VUX. Still, the core message content was there, so even though it required quite an effort, it wasn’t nearly comparable to the kind of resources it would require to start from scratch.”
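
As a rough illustration of what a first automated pass at “voicifying” might look like, the sketch below strips links and emoji from a chat answer and keeps only its first few sentences. The function name, the replacement phrase and the sentence limit are assumptions made for this example, not a description of how Roskilde did it. A pass like this only produces a draft; a person still needs to review how each answer sounds.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
# Covers common emoji and symbol blocks; not an exhaustive emoji matcher.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def voicify(chat_answer: str, max_sentences: int = 3) -> str:
    """Produce a rough voice draft of a chat answer."""
    # Replace links with a spoken reference instead of reading them out.
    text = URL_PATTERN.sub("our website", chat_answer)
    # Drop emoji, which have no spoken equivalent.
    text = EMOJI_PATTERN.sub("", text)
    # Collapse whitespace and remove any space left before punctuation.
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"\s+([.,!?])", r"\1", text)
    # Keep only the first few sentences so the caller can absorb them.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])


chat_answer = (
    "Good news \U0001F642! Your latest invoice is ready. "  # slightly smiling face emoji
    "You can download it at https://example.com/account/invoices or ask us to email it. "
    "Invoices are issued on the first day of each month."
)

print(voicify(chat_answer))
```

Note that the URL pattern swallows any punctuation attached directly to a link, so answers that end a sentence with a URL would need extra handling.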

Voice-specific considerations

While there are many synergies in using the same foundation to build a chatbot and voicebot in parallel, there are also some voice-specific things to consider:

Testing

Good practice for any conversational AI project includes Wizard of Oz testing, reading prompts aloud and collaboration. This is especially true for voice: it’s crucial to test, test and then test some more.

Speech Synthesis Markup Language

SSML gives you finer control over the synthesized (text-to-speech) response. There is a lot of great content to check out in this area: Falene Mckenna has a popular SSML University series explaining some nice tips, Kane Simms shares when to use SSML in his blog, and, ultimately, I would recommend the W3C Recommendation on SSML for further reading.
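
For readers who have not seen SSML before, here is a minimal sketch that wraps a list of sentences in a `speak` element and places a short `break` between them, using Python’s standard XML library so that special characters are escaped properly. The `build_ssml` helper and the 300 ms pause are assumptions for illustration. The namespace and version attributes that the W3C specification defines for the root element are left out for brevity, and text-to-speech providers differ in which SSML elements they support, so check your provider’s documentation before relying on any of this.

```python
import xml.etree.ElementTree as ET


def build_ssml(sentences: list[str], pause_ms: int = 300) -> str:
    """Wrap sentences in <speak>, separated by <break> pauses."""
    speak = ET.Element("speak")
    for index, sentence in enumerate(sentences):
        if index == 0:
            # Text before the first child element lives on the parent.
            speak.text = sentence
        else:
            # Text after a child element is stored as that element's tail.
            pause = ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
            pause.tail = sentence
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml([
    "Your card has been blocked.",
    "A replacement will arrive within five working days.",
])
print(ssml)
```

Building the markup with an XML library rather than string concatenation means characters such as `&` in an answer cannot produce invalid SSML.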

Speech services

The primary difference between chat and voice is the addition of speech services. It is vital to choose the providers that best fit your needs, particularly when it comes to transcription. Good NLU is also crucial, and in some cases can compensate for poor transcription, but poorly transcribed speech will almost certainly decrease the quality of the experience. I have dug deeper into speech-to-text selection in a previous blog.

Closing thoughts

In our work at boost.ai deploying a number of high-profile voice projects over the past 18 months, I have seen many enterprise-level organisations move smoothly from an existing chatbot project to adding voice as a new channel, with minimal friction and comparable functionality.

Not only that, but with today’s advances in technology, moving the other way (adding chat to voice) is equally seamless, as is embarking on a brand new voice project from scratch.

The technology to achieve once-unthinkable levels of automation and customer service via voice is already here today and I’m excited to see what the future holds as we continue to make waves in this channel.