Introducing Vectara’s Chain Rerankers

Today, we’re excited to release a new feature in Vectara: the ability to chain different rerankers. For example, you can now take your standard retrieval results from Boomerang, increase their diversity with our MMR reranker, rerank them with our state-of-the-art multilingual reranker, and then bias the newest results toward the top using Vectara’s user-defined functions. To explain why this matters, it’s worth first recapping the different types of rerankers Vectara offers and then looking at how chaining them adds even more value and flexibility.

Rankers Overview

Vectara’s default and first neural ranking system is Boomerang. Boomerang is an extremely fast embedding model focused on natural language questions and retrieval, often producing results in tens of milliseconds.
We then combine our neural retrieval with a traditional keyword retrieval system through our Hybrid Search capability. We do this because Boomerang, like all dense vector models, is incapable of understanding words and phrases it was not trained on. Many vendors try to close this gap with expensive and time-consuming “fine-tuning” of their models on new data, which means the models must be fine-tuned again every time a new product ID, barcode, or similar term is created. A keyword system doesn’t have these problems: it’s fast and cheap to index (less than a few seconds), and blending it with Boomerang gives you strong results for both natural language questions and exact terms such as IDs.
We also provide a cross-attentional neural reranker, which we call “Slingshot” internally. Slingshot is much slower than Boomerang but also much more accurate, so we recommend running it on just the top ~100 results from Boomerang. This strikes a good performance balance: Boomerang selects very good candidates very quickly, and Slingshot then produces the best possible ordering while spending its time only on those high-quality candidates.
Our MMR (Maximal Marginal Relevance) reranker increases the diversity of results so that you don’t see the same type of information (e.g. duplicates) over and over near the top, even if it’s the most relevant. It also lets the generative LLM consider more diverse viewpoints and answers to a question.
Finally, our user-defined functions (UDF) reranker lets you insert your own custom business logic into the ranking system. For example, some customers want to remove results that are out of stock, promote special products, or bias toward newer or higher-margin products, and you can do all of that with this reranker.
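The Hybrid Search blending described above can be illustrated as a weighted interpolation between a dense (neural) score and a keyword score. This is a minimal, hypothetical sketch rather than Vectara’s implementation: it assumes both scores are already normalized to the range 0 to 1, and the function name blend_scores, the weight lam, and the toy scores are all made up for illustration.

```python
from typing import Dict, List, Tuple


def blend_scores(
    dense: Dict[str, float], keyword: Dict[str, float], lam: float
) -> List[Tuple[str, float]]:
    """Blend per-document scores as (1 - lam) * dense + lam * keyword.

    Assumes both score sets are normalized to [0, 1]. A document missing from
    one retriever contributes 0.0 for that component. Returns (doc_id, score)
    pairs sorted from best to worst.
    """
    doc_ids = set(dense) | set(keyword)
    blended = {
        d: (1 - lam) * dense.get(d, 0.0) + lam * keyword.get(d, 0.0)
        for d in doc_ids
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: the dense model slightly prefers doc-a, but only doc-b contains
# the exact (hypothetical) product ID from the query, so keyword search finds it.
dense_scores = {"doc-a": 0.82, "doc-b": 0.80}
keyword_scores = {"doc-b": 1.0}
ranking = blend_scores(dense_scores, keyword_scores, lam=0.1)
print([doc for doc, _ in ranking])  # ['doc-b', 'doc-a']
```

Even a small keyword weight lifts the exact match to the top here; the right weight depends on your data.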
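The Boomerang-then-Slingshot pattern described above is a classic two-stage retrieve-then-rerank pipeline. The sketch below is illustrative only: toy_fast and toy_slow are hypothetical stand-ins (a keyword count and a length-normalized overlap), not the real models, and a counter records how many documents the expensive stage actually scores.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    fast_score: Scorer,
    slow_score: Scorer,
    candidates: int = 100,
) -> List[Tuple[str, float]]:
    """Score everything cheaply, keep the top candidates, rescore only those."""
    # Stage 1: the cheap scorer sees every document; keep the best `candidates`.
    shortlist = sorted(docs, key=lambda d: fast_score(query, d), reverse=True)[:candidates]
    # Stage 2: the expensive scorer only sees the shortlist.
    rescored = [(d, slow_score(query, d)) for d in shortlist]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


slow_calls = 0


def toy_fast(query: str, doc: str) -> float:
    # Counts query-term occurrences, so keyword stuffing inflates the score.
    terms = set(query.split())
    return float(sum(1 for w in doc.split() if w in terms))


def toy_slow(query: str, doc: str) -> float:
    # Distinct query terms matched, divided by document length (a toy "accurate" scorer).
    global slow_calls
    slow_calls += 1
    words = doc.split()
    return len(set(words) & set(query.split())) / len(words)


docs = [
    "reset password reset password tips and more tips",
    "how to reset password",
    "holiday schedule",
]
results = retrieve_then_rerank("reset password", docs, toy_fast, toy_slow, candidates=2)
print([d for d, _ in results])  # ['how to reset password', 'reset password reset password tips and more tips']
print(slow_calls)  # 2
```

The cheap stage ranks the keyword-stuffed document first, the expensive stage corrects the order, and it only had to score two documents instead of all three.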
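The MMR reranker described above is named after Maximal Marginal Relevance, which greedily picks the result that best trades off relevance to the query against similarity to the results already picked. The sketch below is the textbook greedy formulation on toy vectors; the weight lam, the use of cosine similarity, and the example vectors are assumptions for illustration, and Vectara’s own parameters may be exposed differently.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query: Sequence[float], docs: List[Sequence[float]], k: int, lam: float = 0.7) -> List[int]:
    """Greedy Maximal Marginal Relevance; returns indices of the selected docs.

    score(d) = lam * sim(query, d) - (1 - lam) * max(sim(d, s) for s in selected)
    """
    selected: List[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def marginal(i: int) -> float:
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.6, 0.4]
docs = [
    [1.0, 1.0, 0.0],  # most relevant
    [1.0, 1.0, 0.0],  # exact duplicate of doc 0
    [1.0, 0.0, 1.0],  # a bit less relevant, but different
]
print(mmr(query, docs, k=2))  # [0, 2]
```

Ranking purely by relevance would return the duplicate as the second result; MMR skips it in favor of the more diverse document.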

All of the above is about giving you control over how your data is retrieved. In a retrieval augmented generation (RAG) system, the LLM can be much better (faster, cheaper, higher quality, more thorough, less biased, and more) when you give it hints about the best information to look at before it generates a response.

Our out-of-the-box prompts take this into account: we tell the LLM to give more weight to the top results.

Chain Reranking

Each of the rerankers we provide is useful on its own, but there are many times when it’s useful to apply more than one. For example, some users want to run our cross-attentional reranker first and then the UDF reranker to push recent documents even further toward the top. Others want the opposite order: eliminate out-of-stock results before they ever reach the cross-attentional reranker. Now you can do these and many more with the chain reranker.
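The effect of chaining can be sketched locally by treating each reranker as a function from a list of results to a new list and composing them in order. Everything here is a hypothetical stand-in, not the Vectara API: slow_rerank plays the role of the cross-attentional reranker (and counts how many results it scores), and drop_out_of_stock plays the role of a UDF stage enforcing a business rule.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Result = Dict[str, Any]
Reranker = Callable[[List[Result]], List[Result]]

scored = {"count": 0}


def chain(*stages: Reranker) -> Reranker:
    """Compose rerankers left to right; each stage sees only the previous stage's output."""
    def run(results: List[Result]) -> List[Result]:
        return reduce(lambda acc, stage: stage(acc), stages, results)
    return run


def slow_rerank(results: List[Result]) -> List[Result]:
    # Stand-in for an expensive cross-attentional reranker.
    scored["count"] += len(results)
    return sorted(results, key=lambda r: r["relevance"], reverse=True)


def drop_out_of_stock(results: List[Result]) -> List[Result]:
    # Stand-in for a UDF stage that removes results failing a business rule.
    return [r for r in results if r["in_stock"]]


results = [
    {"id": "a", "relevance": 0.9, "in_stock": False},
    {"id": "b", "relevance": 0.7, "in_stock": True},
    {"id": "c", "relevance": 0.8, "in_stock": True},
]

for name, pipeline in [
    ("filter first", chain(drop_out_of_stock, slow_rerank)),
    ("rerank first", chain(slow_rerank, drop_out_of_stock)),
]:
    scored["count"] = 0
    ids = [r["id"] for r in pipeline(results)]
    print(name, ids, scored["count"])
# filter first ['c', 'b'] 2
# rerank first ['c', 'b'] 3
```

Both orders return the same list here, but filtering first means the expensive stage scores two results instead of three.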

To use the chain reranker, call the Advanced Query API and specify a chain type for the reranker. For example, the configuration shown in our docs on the Chain Reranker uses our multilingual cross-attentional reranker first (which has reranker_id rnk_272725719) and then forces a popularity bias on top of the score.
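As a rough illustration, a chain like this might be expressed as a JSON request fragment built in Python. Treat the field names as assumptions to verify against the current Chain Reranker documentation: only the reranker ID rnk_272725719 comes from this post, while "type", "rerankers", "customer_reranker", "userfn", "user_function", the popularity metadata field, and the function expression are illustrative.

```python
import json
from typing import Any, Dict


def chain_reranker_config(popularity_field: str = "popularity") -> Dict[str, Any]:
    """Build a hypothetical "reranker" section for an Advanced Query request.

    All field names except the reranker ID are assumptions; check the
    Chain Reranker docs for the exact schema and UDF expression syntax.
    """
    return {
        "type": "chain",
        "rerankers": [
            # Step 1: multilingual cross-attentional reranker (ID from this post).
            {"type": "customer_reranker", "reranker_id": "rnk_272725719"},
            # Step 2: user-defined function that biases the score by popularity.
            # The expression and metadata path below are hypothetical.
            {
                "type": "userfn",
                "user_function": (
                    f"get('$.score') * get('$.document_metadata.{popularity_field}')"
                ),
            },
        ],
    }


print(json.dumps(chain_reranker_config(), indent=2))
```

The order of the list is the order in which the rerankers run, so swapping the two entries would apply the popularity bias before the cross-attentional reranker instead of after it.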

Limiting Results in a Chain

Along with the new Chain Reranker, we’ve also added the ability to limit the number of results that come out of each reranker in the chain. Using this capability, you can tell Vectara to “completely eliminate this result from the result set” or “do not allow more than the following number of results to be passed to the next reranker.” This can be useful for a variety of reasons, including:

You can improve the performance of your query by making sure rerankers aren’t looking at results you want to eliminate anyway
You can provide even more strict filtering from your user-defined functions: not allowing “bad” results according to your business logic to be passed along to the next step at all
You can force certain results to be considered by the generative LLM by matching the final reranker’s limit to max_used_search_results, so every result that survives the chain is passed to the LLM

To use the new limit feature, pass a “limit” to each reranker in the reranking chain. For example, you can add limits to both steps of the multilingual-then-popularity chain described earlier.
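A hedged sketch of a two-step chain with limits added is below. It assumes each stage accepts an integer "limit" and that generation settings sit next to search settings in the request; apart from rnk_272725719 and max_used_search_results, which appear in this post, the field names, the UDF expression, and the values are illustrative. Setting the last stage’s limit equal to max_used_search_results reflects the third use case listed above.

```python
import json
from typing import Any, Dict


def chain_with_limits(candidates: int = 100, final: int = 10) -> Dict[str, Any]:
    """Hypothetical Advanced Query fragment: a two-stage chain with per-stage limits."""
    return {
        "search": {
            "reranker": {
                "type": "chain",
                "rerankers": [
                    # Cross-attentional reranker passes at most `candidates` results on.
                    {"type": "customer_reranker", "reranker_id": "rnk_272725719", "limit": candidates},
                    # Popularity UDF (hypothetical expression) keeps at most `final` results.
                    {
                        "type": "userfn",
                        "user_function": "get('$.score') * get('$.document_metadata.popularity')",
                        "limit": final,
                    },
                ],
            }
        },
        # Match the LLM's input size to the final limit so every surviving result is used.
        "generation": {"max_used_search_results": final},
    }


print(json.dumps(chain_with_limits(), indent=2))
```

Lowering the first limit reduces how many results the later stage has to process, which is the performance benefit described above.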

Conclusion

In this blog, we introduced Vectara’s new chain reranking functionality. Chaining rerankers lets you tailor Vectara’s ranking to your application’s needs by giving you control over which ranking functions run, in what order, and how many results each one passes along.

For the latest documentation on the chain reranker and how to use it, see the Chain Reranker page in the Vectara docs.

As always, we’d love to hear your feedback! Connect with us on our forums, on Discord, or in our community. If you’d like to see what Vectara can offer you for retrieval augmented generation in your application or website, sign up for an account!