By Samuel Danby, Project Manager Voice, boost.ai
Thanks to advances in artificial intelligence, businesses can finally say ‘goodbye’ to frustrating legacy IVR systems
When Interactive Voice Response (IVR) systems became popular in the ’90s, they were set to revolutionize customer service. They promised consumers a Jetsons-like experience of using nothing but one’s own voice to get fast, efficient answers to their questions. The truth, as anyone who has ever gotten lost in the maze-like menus of an IVR system will tell you, is far more frustrating. The trouble with these legacy IVR systems is that they ultimately rely on specific keywords and prompts to advance through their (already complex) menus, with very little room for error. It’s no wonder that other customer service channels such as email, social media and instant messaging have become increasingly popular amongst consumers reaching out to the brands they care about.
While the idea behind IVR is solid, it was perhaps a little ahead of its time. With recent advances in artificial intelligence, however, we can finally start to make the self-service dream of a fully-automated, voice-controlled telephony system a reality. At boost.ai, we have been working on a new technology that we call Conversational IVR: a combination of the proprietary natural language technologies that power our chat-focused virtual agents and various third-party text-to-speech and speech-to-text services. Conversational IVR gives users the flexibility to speak naturally to an automated system when they contact a brand by phone, rather than having to guess at keywords to perform a transaction or becoming frustrated when trying to get transferred to a human agent. When a customer calls a company for help, it is usually as a last resort after they have already searched the website for information, so it’s essential to get the interaction right.
Understanding the nuances of language is not an easy task for a computer. It requires complex algorithms, and the team at boost.ai has spent countless hours perfecting our natural language understanding (NLU) and natural language processing (NLP) engine to ensure that our virtual agents consistently achieve resolution rates of around 90% when interacting with end-users via chat. The human voice, however, is even trickier to get right. Not only are you dealing with different accents and dialects, there is also background noise to contend with. Humans don’t always understand each other even when speaking face-to-face, so careful fine-tuning of conversational flows for voice is essential.
When interacting with a business (whether it’s a bank or a florist), customers tend to ask the same sorts of questions. Varied accents, dialects and mispronunciations, however, mean that keywords and basic prediction scoring alone aren’t enough to accurately parse a customer’s intent, especially if you hope to handle large volumes of interactions. It’s easy enough to design a demo in which an IVR system answers only two or three questions, but as complexity increases (think hundreds of topics), things can quickly fall apart if the underlying technology isn’t robust enough.
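The limitation of exact keyword matching can be illustrated with a short sketch. The trigger word and utterances below are invented for illustration, and Python’s standard-library difflib stands in for real intent scoring: an exact lookup misses a plausible STT mis-transcription that a fuzzy comparison still catches.

```python
from difflib import SequenceMatcher

KEYWORD = "balance"  # hypothetical trigger word for a "check balance" intent

def keyword_match(utterance: str) -> bool:
    """Legacy-IVR-style exact keyword lookup."""
    return KEYWORD in utterance.lower().split()

def fuzzy_match(utterance: str, threshold: float = 0.8) -> bool:
    """Tolerate near-misses such as STT mis-transcriptions."""
    return any(SequenceMatcher(None, KEYWORD, word).ratio() >= threshold
               for word in utterance.lower().split())

# "ballance" is a plausible mis-transcription of "balance":
print(keyword_match("check my ballance"))  # exact match misses it
print(fuzzy_match("check my ballance"))    # fuzzy match still catches it
```

A production system would replace the similarity ratio with proper intent prediction, but the gap between the two functions is exactly where rigid keyword-driven IVR breaks down.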
Due to the more complex nature of voice, implementing a conversational IVR solution requires more testing than a chat project, and we have established a number of best practices to follow.
Once you have established a benchmarking tool, you can use it to build a spreadsheet comparing candidate speech-to-text (STT) services and determine which is suitable for a particular project. Results can vary depending on the information required and, as complexity increases, this often eliminates unsuitable candidates.
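As a rough sketch of what such a benchmarking tool might look like, one can score each service’s transcript against a human reference using word error rate (WER) and write the results out as CSV rows for the spreadsheet. The service names and transcripts below are invented; a real harness would call each STT API on recorded audio.

```python
import csv
import io

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words (classic Levenshtein table)
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / max(len(ref), 1)

# Hypothetical benchmark data: one human reference transcript, plus the
# transcript each (anonymized) STT service returned for the same clip.
reference = "what is my account balance"
transcripts = {
    "service_a": "what is my account balance",
    "service_b": "what is my count balance",
}

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["service", "wer"])
for service, hypothesis in transcripts.items():
    writer.writerow([service, round(wer(reference, hypothesis), 3)])
print(out.getvalue())
```

Running the same reference clips through every candidate service and comparing the resulting WER columns makes the elimination step described above mechanical rather than anecdotal.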
Below are examples of tests we have performed on popular STT platforms. We have purposely kept the platforms anonymous.
The first looks at how STT services understand languages that have more than one dialect. In this case, Bokmål and Nynorsk are Norway’s two official written languages; however, they are spoken differently.
The second assesses how the accuracy of STT transcription holds up as background noise increases, i.e. how well users can be understood in noisy environments such as a train station or restaurant.
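One way to run such a test, sketched below under the assumption that audio is available as raw sample buffers, is to mix a noise recording into a clean utterance at progressively lower signal-to-noise ratios and re-transcribe each mix. The sine waves here are toy stand-ins for real recordings.

```python
import math

def rms(samples):
    """Root-mean-square level of a sample buffer."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mix sits at the target signal-to-noise
    ratio (in dB) relative to `clean`, then sum the two buffers."""
    gain = rms(clean) / (rms(noise) * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, noise)]

# Toy signals standing in for a recorded utterance and station noise,
# sampled at a nominal 8 kHz:
clean = [math.sin(2 * math.pi * 440.0 * t / 8000) for t in range(8000)]
noise = [math.sin(2 * math.pi * 1234.5 * t / 8000 + 0.7) for t in range(8000)]

for snr_db in (20, 10, 0):  # progressively noisier mixes
    mixed = mix_at_snr(clean, noise, snr_db)
    # each `mixed` buffer would then be sent to the STT service under test
```

Transcribing the same utterance at each SNR step and plotting WER against SNR gives a degradation curve per service, which is far more informative than a single pass/fail listen.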
Differences in the accuracy of STT services become evident when testing with audio files. This highlights the importance of customizing projects so that each organization gets the most value out of using conversational IVR.
Note: Decibel values for the background-noise testing have not been noted. This article is designed purely to highlight the importance of carrying out effective benchmarking, not to suggest a maximum background-noise level beyond which speech-to-text stops functioning.
In our experience, a platform flexible enough to support every channel an organization may wish to implement is essential to building functioning conversational IVR flows. Without that flexibility, the ongoing changes these flows require over time become labor-intensive and make the solution harder to scale as the knowledge base expands.
In the following video, you can see a brief demonstration of the capabilities of conversational IVR: