A hybrid LLM chat experience

Lars Ropeid Selsås, Co-Founder of Boost.ai
Last updated 28 November 2023
Technology

Boost.ai's Hybrid NLU revolutionizes enterprise virtual agent development by combining large language models with its intent engine for unparalleled accuracy, flexibility, and cost-effectiveness.

In chatbot development, achieving the right balance between efficiency, accuracy and cost-effectiveness is essential. I believe that with the recent launch of GPT-4, it is possible to harness the generative capabilities of large language models (LLMs) and combine them with an intent hierarchy to create a powerful and versatile hybrid solution for enterprise virtual agents.

At boost.ai, we have developed an innovative system that capitalizes on both LLM intent detection and our own state-of-the-art intent engine. We are calling this Hybrid NLU, and it works by first querying the intent engine to determine if it has a suitable response to the user’s question. If not, the LLM is consulted for an answer. This approach leverages the strengths of both technologies, ensuring greater accuracy and customization while minimizing costs.
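The routing described above can be sketched in a few lines. This is an illustrative toy, not boost.ai's actual API: the function names, the confidence threshold, and the stand-in predictors are all assumptions.

```python
# Hypothetical sketch of the two-layer Hybrid NLU routing described above.
# `intent_engine_predict`, `llm_predict`, and the 0.8 threshold are
# illustrative stand-ins, not the real boost.ai services.

CONFIDENCE_THRESHOLD = 0.8

def intent_engine_predict(message):
    # Toy stand-in for a trained intent model returning (intent, confidence).
    known = {"card not working": ("card_not_working", 0.95)}
    return known.get(message, (None, 0.0))

def llm_predict(message):
    # Toy stand-in for an LLM fallback with broad general understanding.
    return "llm_generated_answer"

def hybrid_route(message):
    intent, confidence = intent_engine_predict(message)
    if intent is not None and confidence >= CONFIDENCE_THRESHOLD:
        return ("intent_engine", intent)   # fast, cheap, fine-tunable path
    return ("llm", llm_predict(message))   # fallback for out-of-scope questions

print(hybrid_route("card not working"))   # handled by the intent engine
print(hybrid_route("I lost my wallet"))   # falls through to the LLM
```

The key design point is that the intent engine is only trusted above a confidence threshold; everything below it falls through to the LLM, which is what keeps false positives in check.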

LLMs vs. boost.ai intent models: advantages and challenges

LLMs offer several benefits, such as generating predictions without requiring training data or model training, leading to faster development with less effort. However, standalone LLMs tend to be slower, costlier and less accurate compared to our intent-based models. Fine-tuning an LLM can also be challenging when it fails to function correctly. Despite these challenges, LLMs excel in certain situations, like addressing questions that significantly deviate from a virtual agent’s trained scope.

At boost.ai, our intent models have numerous advantages, including scalability, high accuracy and easy fine-tuning. They still require training data, however, which can slow down the building of larger models and limit their general understanding compared to LLMs trained on very large datasets.

So, what if instead of having to choose between boost.ai or LLMs, we made it possible to harness the power of both technologies together?

Synergy of a hybrid approach

One of the significant advantages of Hybrid NLU is the flexibility it provides for both new and existing customers. They can seamlessly integrate it without having to choose between one technology or the other. This eliminates the risk of implementing a single solution only to find it falls short and having to start the process again. By using both technologies in parallel, customers can take advantage of the strengths of each system and adapt to their unique needs.

Fig. 1: The benefits of boost.ai and LLM intent prediction

Hybrid NLU enables enterprises to take advantage of the rapid development and general understanding provided by LLMs while also leveraging the fine-tuning capabilities of our platform. This combination optimizes high-traffic areas – such as a company’s most frequently asked questions – by utilizing the intent model. It has numerous benefits (see Fig. 1), including faster response times and reduced costs due to a decrease in LLM requests.

If the boost.ai intent engine manages, for instance, 80% of the traffic, you would only incur LLM costs for the remaining 20%. This means that for every dollar spent on a solely LLM-based model, the expense would be reduced to just 20 cents.
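The cost arithmetic is simple enough to verify directly. The 80% coverage figure comes from the example above; the per-request LLM cost below is made up purely for illustration.

```python
# Back-of-the-envelope cost model from the paragraph above: if the intent
# engine absorbs a share of traffic, only the remainder incurs LLM cost.
# The 80% coverage is from the text; the per-request price is invented.

def hybrid_llm_cost(total_requests, intent_coverage, llm_cost_per_request):
    llm_requests = total_requests * (1 - intent_coverage)
    return llm_requests * llm_cost_per_request

pure_llm = hybrid_llm_cost(10_000, 0.0, 0.01)   # every request hits the LLM
hybrid   = hybrid_llm_cost(10_000, 0.8, 0.01)   # intent engine handles 80%

print(pure_llm)           # 100.0
print(round(hybrid, 2))   # 20.0, i.e. 20 cents per dollar of pure-LLM spend
```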

Research and experimentation

Our research on a challenging dataset* showed that our engine correctly identifies intents about 94% of the time, while the large language model achieves roughly 79% accuracy. When combining the two, we can surpass the performance of either model on its own, opening up the possibility for even higher accuracy levels. Note that in practice we deliberately trade away some of the intent engine's maximum accuracy to reduce false positives, since they are very damaging to the user experience, so the real-world figure is slightly lower than this. Dealing with false positives seems significantly more difficult with the LLM-based approach.

We also experimented with using embeddings, which convert large inputs into compact numerical vectors that are easier for machine learning systems to compare and process. With this approach, we found that performance was not up to our standards. Specifically, we achieved an accuracy of only 57% with a one-shot attempt, which improved slightly to 62% when using all of the training data. These accuracy levels are too low for us, and they highlight how difficult it is to improve embedding performance by simply adding more example data.

Despite these challenges, embeddings do have some benefits. They appear to be effective at narrowing down a large list of intents to a smaller one containing only the top intents. For example, when we selected the top 50 intents from the embedding, we found that the correct intent was included 93% of the time with a one-shot attempt, and 98% of the time when using all of the training data. When we increased the number of top intents to 100, the accuracy improved to 95% for one-shot attempts and, impressively, over 99% when using all of the training data. This shows that embeddings can be useful for reducing the number of intents that need to be sent to the language model, which can lower the overall cost of the LLM approach.
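The shortlisting idea above can be illustrated with a minimal sketch. The vectors here are hand-made toy values (a real system would embed text with a trained model) and the intent names are hypothetical; cosine similarity does the ranking.

```python
# Illustrative sketch of using embeddings to shortlist intents before
# querying the LLM. Vectors and intent names are toy assumptions.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy intent embeddings (in practice, derived from each intent's training data).
intent_vectors = {
    "block_card":      [0.9, 0.1, 0.0],
    "order_card":      [0.7, 0.6, 0.1],
    "account_balance": [0.0, 0.2, 0.9],
}

def top_k_intents(query_vector, k):
    # Rank all intents by similarity to the query and keep only the top k.
    ranked = sorted(intent_vectors,
                    key=lambda name: cosine(query_vector, intent_vectors[name]),
                    reverse=True)
    return ranked[:k]

# A card-related query lands near the card intents, so even a shortlist of
# two already contains the correct candidate for the LLM to choose from.
print(top_k_intents([0.8, 0.2, 0.1], k=2))
```

Only the shortlist, rather than the full intent hierarchy, then needs to be passed to the language model, which is where the cost saving comes from.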

Minimizing false positives and integrating search

To further illustrate the strengths of both boost.ai and LLM intent prediction, we can look at the following example: If a user says “I am not able to replace my card,” it can be hard to fine-tune an LLM to recognize that it should point to a ‘problems ordering card’ intent, instead of a ‘card not working’ one. In this case, our intent engine is highly effective in guiding the model’s behavior. On the other hand, if the user states “I lost my wallet,” – something the virtual agent has not been trained on – and the appropriate intent is ‘block card’, an LLM-powered bot with a broader understanding may be better at connecting the inquiry to the correct intent.

A key aspect of the success of Hybrid NLU lies in our conversational AI’s ability to greatly minimize false positives. With fewer false positives, the LLM has more chances to help when the intent engine fails to provide a sufficiently confident response. The result is more flexible intent management, with some fully trained parts and others that rely on the LLM’s general understanding without any training data.

Fig. 2: Adding a third layer to the hybrid approach with the Boost Automator

In addition to this two-layer approach, we also offer a third layer to our hybrid solution that doesn’t rely on intents at all. By analyzing a website, knowledge base, or other sources with our Boost Automator, we can leverage a combination of search technology and LLMs to locate the correct information. We then merge this information with the question in the LLM to generate a response. This enables us to cover large amounts of knowledge by simply directing our solution to the right data.
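A minimal sketch of this third, intent-free layer might look as follows. The documents and the naive word-overlap scoring are stand-ins for what a crawler such as the Boost Automator would actually index and how a production search layer would rank passages.

```python
# Hypothetical sketch of the search-plus-LLM layer: retrieve the
# best-matching passage, then merge it with the question into a prompt.
# The knowledge base and scoring are toy assumptions.
import re

knowledge_base = [
    "To block a lost or stolen card, call customer service or use the app.",
    "Replacement cards are delivered within five business days.",
    "Your account balance is shown on the overview page after login.",
]

def tokens(text):
    # Lowercased word set, ignoring punctuation.
    return set(re.findall(r"\w+", text.lower()))

def search(question):
    # Naive word-overlap ranking; real systems combine search and embeddings.
    question_words = tokens(question)
    return max(knowledge_base,
               key=lambda doc: len(question_words & tokens(doc)))

def build_prompt(question):
    passage = search(question)
    return f"Answer using this passage:\n{passage}\n\nQuestion: {question}"

print(build_prompt("How do I block my card?"))
```

The prompt, passage included, would then be sent to the LLM to generate the final response, so coverage grows simply by pointing the crawler at more data.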

This zero-shot method excels at rapidly addressing a broad spectrum of information, even if its precision might not match that of the intent-driven strategy. For managing transactions or intricate, structured dialogue sequences, intent-focused layers remain essential.

In essence, virtual agents built on our platform can now be seamlessly upgraded with generative AI, boosting accuracy and filling knowledge voids.

By combining LLMs with our existing intent engine, Hybrid NLU gives you the best of both worlds: the ability to build quickly without training data, while also fine-tuning specific areas when needed so the model behaves exactly as desired, with increased accuracy.

*Research conducted against an existing dataset of a large banking client