Conversational IVR and the dawn of intelligent automated phone support

July 31, 2020

By Samuel Danby, Project Manager Voice, boost.ai

Thanks to advances in artificial intelligence, businesses can finally say ‘goodbye’ to frustrating legacy IVR systems

When Interactive Voice Response (IVR) systems became popular in the ‘90s, they were set to revolutionize customer service. They promised consumers a Jetsons-like experience of using nothing but one’s own voice to get fast, efficient answers to their questions. The truth, as anyone who has ever gotten lost in the maze-like menus of an IVR system will tell you, is far more frustrating. The trouble with these legacy IVR systems is that they ultimately rely on specific keywords and prompts in order to advance through their (already complex) menu systems, with very little room for error. It’s no wonder that other customer service channels such as email, social media and instant messaging have become increasingly popular amongst consumers when reaching out to the brands they care about.

While the idea behind IVR is solid, it was perhaps a little ahead of its time. With recent advances in artificial intelligence, however, we can finally start to make the self-service dream of a fully automated, voice-controlled telephony system a reality. At boost.ai, we have been working on a new technology that we are calling Conversational IVR - a combination of the proprietary natural language technologies that power our chat-focused virtual agents and various third-party text-to-speech and speech-to-text services. Conversational IVR gives users the flexibility to speak naturally to an automated system when they contact a brand by phone, without having to guess at keywords to perform a transaction or becoming frustrated when trying to get transferred to a human agent. When a customer calls a company for help, it is usually a last resort after having already searched the website for information, so it’s essential to get the interaction right.

Successful automation requires the right engine

Understanding the nuances of language is not an easy task for a computer to handle. It requires complex algorithms, and the team at boost.ai has spent countless hours perfecting our natural language understanding (NLU) and natural language processing (NLP) engine to ensure that our virtual agents can consistently achieve resolution rates of around 90% when interacting with end-users via chat. The human voice, however, is even trickier to get right. Not only are you dealing with different accents and dialects, but there’s also the issue of background noise to contend with. Humans don’t always understand each other when speaking face-to-face, so fine-tuning a conversational flow for voice is highly important.

When interacting with a business (whether it’s a bank or a florist), customers tend to ask the same sorts of questions. Varied accents, dialects and mispronunciations, however, mean that keywords and basic prediction scoring alone aren’t enough to accurately parse a customer’s intent, especially if you hope to handle large volumes of interactions. It’s easy enough to design a demo that shows off an IVR system answering only two or three questions, but as complexity increases (think hundreds of topics), the system can quickly begin to fall apart if the underlying technology isn’t robust enough.

Customization is key

Due to the more complex nature of voice, implementing a conversational IVR solution requires more testing than a chat project. We have established a number of best practices to follow:

  • Start by identifying your use-cases. Selecting which challenges you would like your IVR system to solve can give the project focus. Are you replacing a legacy system or are you brand new to the technology? Which queries do customers call about most often?
  • Once you have defined your use-cases, it is vital to do some testing. We recommend recording audio files of some of the most common questions your customers ask, focusing on tone of voice, the different languages used and other ways in which they might interact with your brand over the phone.
  • Remember that every project is unique. We use benchmarking tools to transcribe audio files when deciding which speech-to-text (STT) service is best. We upload multiple audio files that clients (or partners) have recorded and, depending on the language, select which of the available services the tool should use to transcribe what it hears.
  • During testing, expect to change the conversation flows continuously as you begin to get data on how customers interact with your IVR system. Having selected the STT and TTS (text-to-speech) services that are best for your project, you can then decide which IVR partner is most appropriate. There is a substantial benefit in using a vendor that offers flexible STT and TTS; otherwise, the analysis of audio files may provide little value. A simple integration between the IVR service provider and the platform you are building your conversations on makes for smooth sailing. You can then begin full testing, whether it is a proof-of-concept or a live pilot with customers.
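
As a rough illustration of what a benchmark like this might compute, the sketch below scores each STT service's transcript against a human-written reference using word error rate (WER). WER is a common metric that we have chosen for this example; the article does not say which metric the benchmarking tool uses. The service names and sentences are hypothetical placeholders.

```python
from typing import Dict, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def rank_services(reference: str, transcripts: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (service, WER) pairs, most accurate service first."""
    scores = [(name, word_error_rate(reference, text))
              for name, text in transcripts.items()]
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical example: what the caller said vs. what each service heard.
reference = "i would like to block my credit card"
transcripts = {
    "service_a": "i would like to block my credit card",
    "service_b": "i would like to lock my credit cart",
}
for name, wer in rank_services(reference, transcripts):
    print(f"{name}: {wer:.2f}")
```

A lower score means fewer misheard words. In practice the transcripts would come from each vendor's own API, and the comparison would be run over many recordings rather than a single sentence.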

Our benchmarking results

Once you have established a benchmarking tool, you can use it to create a spreadsheet as a basis for analyzing which STT service is suitable for a particular project. Results can vary depending on the information required and, as complexity increases, unsuitable candidates are often eliminated.
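
One possible way to build such a spreadsheet is sketched below: it averages per-recording error rates into one row per service and one column per test condition, then emits CSV text that any spreadsheet application can open. The function name, the condition labels and the numbers are placeholders invented for illustration; they are not measurements from our tests.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Iterable, Tuple


def summarize(results: Iterable[Tuple[str, str, float]]) -> str:
    """Turn (service, condition, error_rate) rows into a CSV summary table."""
    scores = defaultdict(lambda: defaultdict(list))
    conditions = []
    for service, condition, error_rate in results:
        scores[service][condition].append(error_rate)
        if condition not in conditions:
            conditions.append(condition)

    rows = []
    for service, per_condition in scores.items():
        all_scores = [s for values in per_condition.values() for s in values]
        overall = mean(all_scores)
        # Leave a cell blank when a service was not tested under a condition.
        cells = [f"{mean(per_condition[c]):.2f}" if c in per_condition else ""
                 for c in conditions]
        rows.append((overall, [service] + cells + [f"{overall:.2f}"]))
    rows.sort(key=lambda row: row[0])  # lowest average error rate first

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["service"] + conditions + ["overall"])
    for _, row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Placeholder values only, chosen to show the layout of the table.
placeholder_results = [
    ("service_a", "dialect_1", 0.10),
    ("service_a", "dialect_2", 0.30),
    ("service_b", "dialect_1", 0.20),
    ("service_b", "dialect_2", 0.10),
]
print(summarize(placeholder_results))
```

Sorting by the overall average puts the strongest candidate at the top, while the per-condition columns show where a service that looks good on average still struggles.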

Below are examples of testing we have performed on popular STT platforms. We have purposely kept their names anonymous.

[Figure: conversational-ivr-benchmarking-1]

The first looks at how well STT services understand languages that have more than one dialect. In this case, Bokmål and Nynorsk are Norway’s two official written languages; however, they are spoken differently.

[Figure: conversational-ivr-benchmarking-2]

The second assesses how the accuracy of STT transcription changes as background noise increases, showing how well users can be understood in noisy environments such as a train station or restaurant.

Differences in the accuracy of STT services become evident when testing with audio files. This highlights the importance of customizing each project so that every organization gets the most value out of conversational IVR.

Note: Decibel values for background testing have not been noted. This article is designed purely to highlight the importance of carrying out effective benchmarking, not to suggest a maximum background noise level that stops speech-to-text from functioning.
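
For teams that want to repeat a background-noise test at controlled levels, one option is to mix a noise recording into clean speech at a chosen signal-to-noise ratio (SNR) before sending the result to each STT service. The sketch below shows the arithmetic on plain lists of samples. It is an assumption about how such a test could be set up, not a description of how our recordings were made, and the synthetic sine wave and random noise stand in for real audio.

```python
import math
import random
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a clip."""
    if not samples:
        raise ValueError("clip must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Scale the noise so that speech RMS / noise RMS matches snr_db, then add."""
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    noise = noise[: len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise clip is silent")
    # SNR in dB is 20 * log10(speech_rms / noise_rms), so solve for noise_rms.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins for real recordings: one second at 8 kHz.
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(8000)]

# Produce progressively noisier versions of the same utterance.
noisy_versions = {snr: mix_at_snr(speech, noise, snr) for snr in (20, 10, 0)}
```

Each version can then be transcribed and scored in the same way as the clean recording, which makes it possible to see at roughly what noise level a given service starts to degrade.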

Final thoughts

In our experience, using a platform flexible enough to support every channel an organization may wish to implement has been essential to building functioning conversational IVR flows. Without that flexibility, the ongoing changes these flows require over time would be labor-intensive and would make it harder to scale the solution effectively as the knowledge base expands.

In the following video, you can see a brief demonstration of the capabilities of conversational IVR: