Thanks to advances in natural language understanding, conversational AI-powered virtual agents are now able to tackle difficult languages such as Finnish with ease.
Look up any list of the world’s hardest languages to learn and you will invariably find Finnish nestled somewhere near the top. While the written form of this unique Nordic language is (somewhat) similar to English, grammatically it takes things to a whole new level.
Due to a mixture of compound words and tricky conjugation, learning Finnish is a herculean task for any non-native speaker. For artificial intelligence, it is equally taxing to decode, requiring a range of complex processes in order to have a virtual agent arrive at a correct response.
At boost.ai, our conversational AI uses a variety of natural language understanding techniques (some common and others proprietary) to make sense of the many permutations that words can take in a language as grammatically complex as Finnish. Below we outline some of the steps our algorithms go through in order to make languages that were once considered impossible, possible.
Finnish words have a staggering number of possible conjugations. For something as simple (at least in English) as the words ‘car’, ‘insurance’ and ‘invoice’, Finnish has a long list of alternative forms. Here are just a few examples:
In order to parse such a large number of possibilities, conversational AI goes through a process called ‘stemming’ which essentially reduces the conjugated word down to its root form so the algorithm doesn’t have to be taught each variation. This is illustrated above by the words highlighted in black.
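To make the idea concrete, here is a minimal sketch of suffix-stripping stemming in Python. It is a toy illustration, not boost.ai's implementation: the Finnish words used (auto for 'car', lasku for 'invoice'), the suffix list and the `stem` function are all assumptions chosen for the example.

```python
# Toy stemmer: strips a handful of common Finnish case endings.
# The suffix list is illustrative and far from complete (assumption).
CASE_SUFFIXES = ("ssa", "sta", "lla", "lta", "lle", "n", "t")


def stem(word: str, min_stem_len: int = 3) -> str:
    """Return the word with one known ending removed, if any matches."""
    word = word.lower()
    # Try longer endings first so "ssa" wins over a shorter overlap.
    for suffix in sorted(CASE_SUFFIXES, key=len, reverse=True):
        stem_len = len(word) - len(suffix)
        if word.endswith(suffix) and stem_len >= min_stem_len:
            return word[:stem_len]
    return word


for form in ("autossa", "autolla", "auton", "laskun", "laskut"):
    print(form, "->", stem(form))
```

A production stemmer also has to cope with stems that change shape when inflected (vakuutus becomes vakuutuksen in the genitive, for instance), which simple suffix stripping like this does not handle.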
Another common difficulty with Finnish is the language’s high number of compound words. Rather than writing each word individually, it’s often the case that multiple words are joined together to form one longer word. Below we can see just some of the many variations of the compound words for ‘car insurance’ and ‘insurance invoice’:
Compound words are common in many European languages such as German or Norwegian, but combining them with such a high number of conjugation possibilities makes Finnish especially difficult. You would first need to teach the algorithm each individual word with all its conjugations, and then move on to the corresponding compound words with all of their different conjugations. The result is an enormous amount of work just for the algorithm to learn every permutation.
To solve this, conversational AI is able to perform a process called compound splitting. This allows the algorithm to disassemble compound words into their composite parts so that it only needs the base words (in their stemmed form) in order to accurately interpret user input.
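A simple way to picture compound splitting is a dictionary lookup that tries to cover the whole word with known base words. The sketch below assumes a tiny vocabulary of three Finnish stems (auto, vakuutus, lasku) and a hypothetical `split_compound` helper; it illustrates the general technique rather than the algorithm our product uses.

```python
# Hypothetical vocabulary of stemmed base words (assumption for the example).
VOCAB = frozenset({"auto", "vakuutus", "lasku"})


def split_compound(word, vocab=VOCAB):
    """Split a compound into known base words, or return None if impossible."""
    word = word.lower()
    if not word:
        return []
    # Prefer the longest known prefix, then recurse on the remainder.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in vocab:
            rest = split_compound(word[end:], vocab)
            if rest is not None:
                return [head] + rest
    return None


print(split_compound("autovakuutus"))       # car insurance
print(split_compound("vakuutuslasku"))      # insurance invoice
print(split_compound("autovakuutuslasku"))  # car insurance invoice
```

Because every compound is reduced to the same three entries, adding a new base word to the vocabulary automatically covers the compounds it appears in.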
Another important piece of the puzzle is conversational AI’s ability to perform advanced spelling correction. Our solution is able to identify and repair mistakes in the spelling of complex compound words, greatly reducing the chances of error.
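As a rough illustration of spelling correction, the sketch below snaps a misspelled word to the closest entry in a known vocabulary using Python's standard `difflib` module. The vocabulary, the `correct` function and the 0.8 similarity cutoff are assumptions made for the example; our own spelling correction is more advanced and works differently.

```python
import difflib

# Hypothetical vocabulary of known base words (assumption for the example).
VOCAB = ("auto", "vakuutus", "lasku")


def correct(word, vocab=VOCAB, cutoff=0.8):
    """Return the closest known word, or the input unchanged if none is close."""
    word = word.lower()
    if word in vocab:
        return word
    # get_close_matches ranks candidates by similarity ratio (0.0 to 1.0).
    matches = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word


print(correct("vakutus"))  # missing letter
print(correct("autto"))    # doubled letter
print(correct("xyz"))      # nothing similar, left as typed
```

A high cutoff keeps the corrector conservative: words that are not close to anything known are passed through untouched rather than forced onto the wrong entry.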
This combination of stemming, compound splitting and spelling correction is the key to decreasing the workload needed to crack Finnish and other similarly complex languages. Thanks to these processes, we only need to feed the algorithm a total of eight words (instead of over 150!) in order to teach it to understand the Finnish for ‘car’, ‘insurance’, ‘invoice’, ‘car insurance’, ‘insurance invoice’ and ‘car insurance invoice’.
Automatic semantic understanding
Simplifying a language down to this level then allows us to layer our proprietary Automatic Semantic Understanding (ASU) technology on top for an even deeper level of understanding. ASU can supercharge a virtual agent to the point where it can handle multiple intents in the same request, understand insanely complex queries that would trip up lesser solutions and even reduce false positives by up to 90% in some cases.
For humans, learning a new language will always be difficult. But for conversational AI, once you have dealt with these core challenges, even the most complex language can be simplified, becoming no more difficult to decode than English or Norwegian.