February 9, 2023 by Shane Connelly | 6 min read
Neural rerankers are powerful models for fine-tuning search results and making the top results even better, no matter what the underlying ranking algorithm is.
September 6, 2022 by Shane Connelly
When most users get started with search, they tend to think of a single relevance model or algorithm. For example, you might hear that a system uses TF-IDF, BM25, or some particular dense-vector-based model. In general, no single algorithm or model is perfect: there are tradeoffs between different approaches. Some are considerably faster while others are much slower. Keyword-based systems tend to require a lot of human intervention to handle synonyms and phrases, while some vector-based systems understand human language better but may trade away some precision. Even within a specific algorithm, the way the configuration is tuned can make a big difference to factors such as speed and relevance.
The idea of reranking is to take the best characteristics of each of these models and apply them at an appropriate stage in the query pipeline to get the fastest, most relevant results in a single go.
For most text search use cases, first and foremost you need to make sure potentially relevant results are even showing up in the result set. Without this high recall, users stand no chance of finding the most relevant documents unless they’re looking for an exact word or phrase that’s guaranteed to be in the index.
If you’ve used a keyword-based search system before, this concept might already be familiar to you. Synonyms, stemming, decomposition, n-gram tokenization, character normalization (e.g. lowercasing or removing accents), concatenating multiple terms together (sometimes called “shingles”), spelling correction based on measures like Levenshtein distance, and phonetic algorithms such as Soundex and Metaphone are all common approaches that you may have used on a keyword system to increase recall. For example, when your users search for “resume” and your documents have “Résumés,” “CVs,” or “Curriculum Vitae,” some combination of stemming, synonyms, shingles, and normalization might help the document to match even without an exact phrase match.
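To make the “resume” example concrete, here is a minimal sketch of a keyword analysis chain in Python. Everything in it is an illustrative assumption rather than any particular search engine's analyzer: the `PHRASE_SYNONYMS` and `SYNONYMS` tables are hypothetical, and `naive_stem` is a toy plural stripper standing in for a real stemmer.

```python
import unicodedata

# Hypothetical synonym tables; a real deployment curates these per language and use case.
PHRASE_SYNONYMS = {"curriculum vitae": "resume"}  # multi-word (shingle-like) synonyms
SYNONYMS = {"cv": "resume", "cvs": "resume"}      # single-token synonyms


def strip_accents(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(token: str) -> str:
    """Toy stemmer: strips a trailing plural 's'. Not a real stemming algorithm."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text: str) -> list[str]:
    """Normalize text into index/query terms: accents, case, synonyms, stemming."""
    normalized = strip_accents(text).lower()
    for phrase, replacement in PHRASE_SYNONYMS.items():
        normalized = normalized.replace(phrase, replacement)
    terms = []
    for raw in normalized.split():
        token = raw.strip(".,;:!?\"'()")
        if not token:
            continue
        token = SYNONYMS.get(token, token)
        terms.append(naive_stem(token))
    return terms
```

With these assumed rules, `analyze("Résumés")`, `analyze("CVs")`, and `analyze("Curriculum Vitae")` all produce `["resume"]`, so each document would match a search for “resume.” Note how much of the behavior lives in hand-maintained tables and rules.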
All of these keyword algorithms are intended to work around the fundamental limitation of keyword systems, which is that they don’t have any innate understanding of human language. They rely on a search administrator to understand and provide the rules for each language and use case. By using cutting-edge neural networks that don’t rely on language-specific algorithms, Vectara has a built-in understanding of human language and by default achieves very high recall across a wide set of languages, even in the face of common user mistakes like typos and word variations. In fact, high recall without any configuration is a huge advantage of using a zero-shot vector model (like the one Vectara has created and builds in) for the initial retrieval step.
Precision is the flip side of the search coin from recall. For most use cases, just as it’s a bad experience when no good result shows up anywhere in the list, it’s also a bad experience if the best result is buried on the 100th page. For many use cases, having good precision means a very good result shows up “above the fold” – typically in the first 6-15 results depending on how results are shown to users, the type of documents, the quality of the highlighting/snippet extraction, and a few other variables.
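One common way to put numbers on these two ideas is to measure recall over the whole result set and precision over just the top k results. The sketch below assumes you have a ranked list of document IDs and a hand-labeled set of relevant IDs; the function names and the sample IDs are illustrative, not part of any product API.

```python
from typing import List, Optional, Set


def recall(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Fraction of all relevant documents that appear anywhere in the results."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids) & relevant_ids) / len(relevant_ids)


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant (k approximates 'above the fold')."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def first_relevant_rank(ranked_ids: List[str], relevant_ids: Set[str]) -> Optional[int]:
    """1-based position of the first relevant result, or None if there is none."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return position
    return None
```

For example, with results `["d7", "d2", "d9", "d4", "d1"]` and relevant documents `{"d1", "d2", "d3"}`, recall is 2/3 (`d3` never shows up), precision at k=3 is 1/3, and the first relevant result sits at rank 2.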
Making sure you have very precise results comes down to giving each document in the candidate set a high-quality final ranking or result score. These scores can vary from the initial high-recall algorithms by a variety of factors. For example, you might give a higher score to an exact match – or even better, an exact phrase match – than to just a set of candidate keywords matching. Continuing with our original keyword example and algorithms, “résumés” is Soundex-equivalent to “risen,” has >85% n-gram overlap with “presume,” and has the same stem as “resumed” (after character normalization), all of which are far less likely to relate to the user’s search of “resume.” So it’s common in keyword systems for operators to boost exact phrase matches, limit which languages can go into a single index/corpus to maintain consistent term weights, boost specific terms, run multiple text analyzers to produce different weightings to combine, or, in some recent cases, boost with the scores from LLM-powered search platforms like Vectara.
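As a rough illustration of the “boost exact phrase matches” idea, here is a toy scorer in Python. It is a simplified sketch, not how any specific engine computes scores: the base score is just the fraction of distinct query terms found in the document, and `phrase_boost` is a hypothetical multiplier applied when the query appears as a contiguous phrase.

```python
def phrase_boosted_score(query: str, doc: str, phrase_boost: float = 2.0) -> float:
    """Toy relevance score: term coverage, multiplied by a boost on an exact phrase match."""
    q_tokens = query.lower().split()
    d_tokens = doc.lower().split()
    if not q_tokens:
        return 0.0

    # Base score: fraction of distinct query terms present anywhere in the document.
    distinct_terms = set(q_tokens)
    coverage = sum(1 for term in distinct_terms if term in d_tokens) / len(distinct_terms)

    # Phrase check: does the full query appear as a contiguous run of tokens?
    n = len(q_tokens)
    has_phrase = any(d_tokens[i:i + n] == q_tokens for i in range(len(d_tokens) - n + 1))

    return coverage * phrase_boost if has_phrase else coverage
```

With the default boost, the query “machine learning” scores 2.0 against “intro to machine learning” but only 1.0 against “learning about machine tools,” even though both documents contain both terms.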
As was the case with recall, neural search platforms can sidestep all of this manual configuration by simply having a natural understanding of human language through the use of deep neural nets. There is an important additional element at this step, though: Vectara has an even more precise model in the form of a “reranker” that can be used to fine-tune the results and make the top results even better. At the time of this writing, this is available only for English language documents, but check back in the future and let us know if you’re looking for other languages. The accuracy of this reranker comes at a cost: it’s slower. If it had to operate on a very large document set, Vectara would feel slow – with latencies in the hundreds of milliseconds or worse. However, because it’s reranking “just” the top documents to give the highest precision, Vectara is able to offer extremely precise results and still maintain
A natural question would be “why not score all documents with the most precise model if it’s available?” The answer to that is performance. While it’s possible to have both high precision and high recall with a single algorithm, high precision tends to be computationally expensive, so there tends to be a tradeoff between high precision (over a large set of documents) and low latency. This is true for the vast majority of search systems, both dense-vector-based and traditional/keyword. By making a very fast initial selection of “likely good” candidate documents and then computing the final score for only those plausibly relevant documents, you’re able to get the best of both worlds: search latencies that can be measured in milliseconds while also having very high recall and high precision.
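The two-stage idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not Vectara's implementation: `term_count_score` stands in for a fast first-stage retriever, `phrase_aware_score` stands in for a slower, more precise reranker, and the `precise_calls` counter exists only to show how few documents the expensive scorer has to touch.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]

precise_calls = {"n": 0}  # counts invocations of the (pretend) expensive scorer


def term_count_score(query: str, doc: str) -> float:
    """Stand-in for a cheap first stage: counts occurrences of query terms."""
    q_terms = set(query.lower().split())
    return float(sum(1 for token in doc.lower().split() if token in q_terms))


def phrase_aware_score(query: str, doc: str) -> float:
    """Stand-in for a precise reranker: term coverage plus a bonus for the exact phrase."""
    precise_calls["n"] += 1
    q, d = query.lower().split(), doc.lower().split()
    if not q:
        return 0.0
    coverage = len(set(q) & set(d)) / len(set(q))
    has_phrase = any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))
    return coverage + (1.0 if has_phrase else 0.0)


def retrieve_then_rerank(query: str, docs: List[str], cheap_score: Scorer,
                         precise_score: Scorer, candidates: int, top_k: int) -> List[str]:
    """Stage 1 scores every document cheaply; stage 2 rescores only the shortlist."""
    shortlist = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:candidates]
    reranked = sorted(shortlist, key=lambda d: precise_score(query, d), reverse=True)
    return reranked[:top_k]


DOCS = [
    "resume tips: a resume is a resume",
    "how to write a resume",
    "the meeting will resume after lunch",
    "gardening tips for spring",
    "train schedules and delays",
    "write a letter to a friend",
]
```

Calling `retrieve_then_rerank("write a resume", DOCS, term_count_score, phrase_aware_score, candidates=3, top_k=2)` returns “how to write a resume” first, even though the cheap stage ranked the repetitive “resume tips” document higher. The precise scorer runs on only 3 of the 6 documents; on a real corpus that ratio is what keeps latency low.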
In order to have both high precision and high recall, the search industry is starting to accept that vector-based systems are needed across the entire search pipeline: at the initial high-recall stage to handle the variations in the ways humans speak and write, and at the high-precision stage to accurately weigh relevance signals such as phrases. Many systems only allow dense-vector models at the reranking stage, and thus miss out on their very important recall benefits.
To achieve both, and to maintain the millisecond-order performance search users have come to love and expect, you can start searching with high-quality zero-shot models straight away and easily enable reranking via a single API parameter. Have a look at our docs for more info!