How Vectara’s New Boomerang Model Takes Retrieval Augmented Generation to the Next Level via Grounded Generation
Learn about the benefits of Vectara’s new embedding model “Boomerang,” including how our version of Retrieval Augmented Generation (RAG), called Grounded Generation, is smarter than other systems
September 28, 2023 by Shane Connelly
We’re happy to announce the release of Vectara’s new embedding model, Boomerang. In this blog post, we’re going to dive into what that means, and why it’s important: both from a traditional search perspective as well as a Grounded Generation (or Retrieval Augmented Generation) perspective.
(Note: If you want a more technical deep dive into Boomerang with performance metrics, check out “Introducing Boomerang – Vectara’s New and Improved Retrieval Model”)
What is an embedding model?
When we talk about search (the “retrieval” in retrieval augmented generation), it’s important that the retrieval engine finds the right things. Traditional keyword-based systems do this by applying language-dependent rules like stemming and synonym expansion, and then using an inverted index to retrieve the related text. For example, let’s say we were to index the following quote:
“You may grow old and trembling in your anatomies, you may lie awake at night listening to the disorder of your veins, you may miss your only love, you may see the world about you devastated by evil lunatics, or know your honour trampled in the sewers of baser minds. There is only one thing for it then—to learn. Learn why the world wags and what wags it. That is the only thing which the mind can never exhaust, never alienate, never be tortured by, never fear or distrust, and never dream of regretting.”
The terms and phrases to pay closest attention to in this example are “learn,” “wag,” “honour,” “the world,” and “evil lunatics.” Traditional keyword systems could let you find this quote if you searched for “learn” or “wag,” but a search for “honor” may miss it because “honour” uses the British spelling, and a search for “what should I do if Earth is overrun by madmen?” would very likely miss it because “Earth” isn’t keyword-equivalent to “the world” and “madmen” isn’t keyword-equivalent to “evil lunatics.”
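To make the keyword limitation concrete, here’s a toy sketch of an inverted index (not Vectara’s actual implementation): each lowercased term maps to the set of documents containing it, so an exact term matches but a spelling variant misses entirely.

```python
# Toy inverted index: term -> set of document IDs containing that term.
from collections import defaultdict

def build_inverted_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {"quote": "know your honour trampled in the sewers of baser minds"}
index = build_inverted_index(docs)

print(index.get("honour"))  # exact term matches: {'quote'}
print(index.get("honor"))   # American spelling misses entirely: None
```

Real keyword engines soften this with stemming and synonym lists, but those are hand-maintained, language-specific rules rather than learned semantic understanding.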
This is where a high-quality embedding model can really shine.
An embedding model is a deep neural network (“model”) that is trained to map concepts to a vector representation so that similar concepts can be found by looking for similar vectors. Those concepts don’t have to have any keyword overlap at all: by training these models on a wide variety of data, they can actually learn that “honor” and “honour” are semantically equivalent when used in a similar context, that a phrase like “evil lunatics” is semantically similar to “madmen,” and even allow these mappings to happen across languages. This makes them particularly well suited to semantic search.
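The geometric idea behind this can be sketched in a few lines. The embeddings below are made-up toy values purely for illustration; a real model like Boomerang produces high-dimensional vectors learned from data. The point is that similarity is measured between vectors, not shared keywords.

```python
# Semantic search in miniature: similar meanings -> nearby vectors.
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-d embeddings (a real model emits hundreds of dimensions).
embeddings = {
    "evil lunatics": np.array([0.9, 0.1, 0.4, 0.0]),
    "madmen":        np.array([0.8, 0.2, 0.5, 0.1]),
    "veins":         np.array([0.0, 0.9, 0.0, 0.7]),
}

query = embeddings["madmen"]
for text, vec in embeddings.items():
    print(text, round(cosine_similarity(query, vec), 3))
```

With well-trained embeddings, “madmen” lands much closer to “evil lunatics” than to an unrelated term, even though the phrases share no keywords.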
When we talk about the embedding model in Vectara – and now Boomerang specifically – we’re talking about the part of the system that is capable of this semantic understanding.
Why does it matter? Smarter systems
The quality of the embedding model has wide-ranging effects. Think of it as how “smart” the system is at understanding the task at hand and handling variations in human language. For example, it can affect:
- How many – and which – languages can be ingested and retrieved
- How tolerant the system is to typos and spelling variations
- How well the system understands synonyms and phrases
- The ability to understand and stay up to date on both idioms and popular culture
- How well any scores can be interpreted, which can help answer questions like: “Is a score of 0.2 good or bad? Should I show it at all?”
In the context of Grounded Generation or Retrieval Augmented Generation, the system first performs the retrieval step by using the embedding model and then hands the information it retrieved off to the generative AI model for summarization or other actions. This means that the accuracy of the summarization is fully dependent on how good the retrieval is.
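The retrieve-then-generate flow described above can be sketched as follows. Here `embed`, `vector_search`, and `generate` are hypothetical stand-ins for an embedding model, a vector index, and a generative model; this is not Vectara’s API, just the shape of the pipeline.

```python
# Minimal sketch of Grounded Generation / RAG: retrieval quality
# directly bounds the quality of the generated answer.
def grounded_generation(question, corpus, embed, vector_search, generate):
    query_vector = embed(question)                  # 1. embed the user query
    passages = vector_search(query_vector, corpus)  # 2. retrieve nearest passages
    prompt = (
        "Answer using ONLY these passages:\n"
        + "\n".join(passages)
        + f"\n\nQuestion: {question}"
    )
    return generate(prompt)                         # 3. grounded generation step
```

Because the generative model only sees what step 2 hands it, a better embedding model in step 1 surfaces more relevant passages, which in turn produces more accurate, less hallucination-prone answers.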
What does Boomerang allow me to do that Vectara didn’t support before?
We’re really proud of Boomerang because of how much it has been shown to improve both search-focused use cases and generative AI capabilities through Grounded Generation. A few things you’ll notice when using Boomerang:
- Your results, in general, should be much higher quality. If you’re using Vectara to provide search capabilities to your application, the top results should answer user questions even better.
- You’ll have accurate results in many more languages, including the ability to perform cross-lingual searches. Want to allow your Greek-speaking users to find information that was written in Welsh and summarize it back to them in their language of choice? Boomerang can do that. With Boomerang, Vectara now supports hundreds of languages and dialects, whereas the original model only supported about 15.
- While we’ve always been proud of having few hallucinations on the Vectara platform, Boomerang’s ability to more directly surface relevant answers to the generative model means even fewer hallucinations.
- Vectara’s generative responses should say it “does not know” the answer to a user question even less often, because the more accurate retrieval surfaces better information to the generative response system.
- The structure of the model means that we’ll soon be able to provide guidance to our users on what the score of any individual result actually means. Even better, we’ll have the ability to automatically cut off and stop returning irrelevant results.
Boomerang is highly tuned for retrieval tasks, and based on our testing, as well as the testing of several of our customers, it outperforms existing publicly available models for search and grounded generation tasks.
How can I try Boomerang?
We’ve rolled out Boomerang to all new and existing accounts. If you’re new to the Vectara platform, there’s nothing for you to do to take advantage of Boomerang: just create a new corpus and it will automatically use Boomerang. If you already had an account, we’re maintaining our legacy encoder for the time being, so you will need to manually select Boomerang as the encoder when creating a new corpus. With Vectara, we handle the vector database, indexing, embedding, and hosting end-to-end, so all you need to do is upload your data and start using it!