Introducing Knee Reranking: smart result filtering for better results

Today, we’re excited to announce the release of our new Knee Reranking capability in Vectara. This feature lets Vectara automatically filter irrelevant or low-quality results out of query responses and keep them from being sent to the generative LLM. Only the most relevant results make it to the final output, which improves the quality of your results while reducing unnecessary latency, cost, and hallucinations.

In this blog post, we’ll dive into how it works and how you can easily integrate it into your Vectara applications.

Why result filtering is critical

Retrieval Augmented Generation (RAG) systems – and more generally most search/retrieval systems – struggle with determining optimal cutoff points for query results. However, it’s important that these systems can cut off irrelevant results for a number of reasons:

Users get confused about why they're seeing some results, especially with neural embedding models, where a result may have little to no keyword overlap with what they searched for.
The more data you send to a generative LLM, the slower and more expensive its response becomes, and the more likely it is to get “confused” by the irrelevant information and hallucinate.

While traditional methods like fixed score thresholds (“cut off results below a score of 0.5”) provide a simple solution, they fail to adapt to score distributions that vary from query to query, and they fundamentally can’t work in a hybrid search system where, for example, BM25 scores can be unbounded. Vectara's Knee Reranking addresses these challenges by automatically detecting natural boundaries between relevant and irrelevant results.

This feature is called Knee Reranking because, for many real queries, the score curve looks a bit like a “knee”: the scores of the “good” results descend gently, and then at some point the scores of the “bad” results either fall away faster or drop off sharply from the good ones. The following is a slightly exaggerated view of what the result score progression may look like:
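As a rough numeric version of that shape, the sketch below uses made-up scores (invented purely for illustration, not taken from a real query) and reports where the largest drop between neighboring results falls. Picking the single largest gap is only a toy stand-in for real knee detection.

```python
# Illustrative only: these scores are invented to show the "knee" shape.
scores = [0.95, 0.93, 0.91, 0.90, 0.88, 0.52, 0.35, 0.21, 0.10]


def largest_drop(scores):
    """Return (index, size) of the biggest drop between neighboring scores.

    The index is the 0-based position of the last result before the drop.
    """
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    index = max(range(len(drops)), key=drops.__getitem__)
    return index, drops[index]


index, size = largest_drop(scores)
kept = scores[: index + 1]  # everything before the sharp drop
print(f"largest drop of {size:.2f} after result {index + 1}; keeping {len(kept)} results")
```

Here the first five scores descend gently, and the sharp drop after the fifth result is where the "knee" sits.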

This new addition to Vectara's chain reranking combines statistical analysis with configurable parameters to identify this natural boundary and provide intelligent, adaptive filtering. Designed specifically to work after the Slingshot reranker (Rerank_Multilingual_v1), it analyzes score patterns to identify significant drops in relevance while maintaining safeguards against over-aggressive filtering.

Knee Reranking represents a significant advancement in RAG result filtering, offering improved precision (making sure the top results are very good) without sacrificing recall (making sure there are enough good results). By automatically adapting to each query's unique characteristics, it helps deliver more focused and relevant results while maintaining high performance.

Core concepts and parameters

Vectara's Knee Reranking employs a dual-analysis approach that combines global regression analysis with local pattern detection. At its core, it uses the L-Method, enhanced with quality ratio comparisons and decay factors.

Two parameters control how Vectara detects potential knees in the data:

sensitivity

This parameter controls the knee detection threshold: how dramatic the “knee” (change in relevance scores) needs to be in order to be detected. Valid values range from 0 to 1:

0: Detects subtle changes in relevance
1: Only detects dramatic shifts

By default, sensitivity has a value of 0.5, which balances these two.

early_bias

This parameter controls whether Vectara is more likely to detect the "knee" at the higher-ranked results or if it should treat all result positions equally. Valid values range from 0 to 1:

0: Equal consideration to all results
1: Strong preference for earlier (higher-ranked) results

By default, early_bias has a value of 0.2, meaning it slightly prefers a knee towards higher results, but doesn't ignore lower-ranked results entirely.

Both parameters work alongside the score cutoff feature when it’s present: results are cut at whichever comes first, the knee point or the cutoff position.

Enable Knee Reranking in Vectara

By default, the knee() function in Vectara uses a sensitivity of 0.5 and an early_bias of 0.2. However, you can pass different values. For example, to set sensitivity to 0.4 and early_bias to 0.3:
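As a hedged sketch of what such a configuration might look like, the helper below builds a chain reranker that runs Rerank_Multilingual_v1 first and knee() second. The helper name is hypothetical, and the JSON field names (`type`, `rerankers`, `reranker_name`, `user_function`) as well as the keyword-argument syntax inside knee() are assumptions that should be checked against the Vectara documentation before use.

```python
import json


def knee_chain_reranker(sensitivity=0.5, early_bias=0.2):
    """Build a chain-reranker config: Slingshot first, then knee().

    Hypothetical helper; field names are assumptions, not confirmed API.
    """
    for name, value in (("sensitivity", sensitivity), ("early_bias", early_bias)):
        if not 0 <= value <= 1:  # both parameters range from 0 to 1
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {
        "type": "chain",
        "rerankers": [
            # Slingshot goes first so knee() sees normalized scores.
            {"type": "customer_reranker", "reranker_name": "Rerank_Multilingual_v1"},
            {
                "type": "userfn",  # assumed type name for function rerankers
                "user_function": f"knee(sensitivity={sensitivity}, early_bias={early_bias})",
            },
        ],
    }


print(json.dumps(knee_chain_reranker(sensitivity=0.4, early_bias=0.3), indent=2))
```

The resulting dictionary would be sent as the reranker portion of a query request. Calling the helper with no arguments produces the default values of 0.5 and 0.2.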

Best practices and usage guidelines

Knee Reranking should follow Rerank_Multilingual_v1 (aka Slingshot) in the reranking chain, as this helps ensure the scores are normalized.
Knee Reranking only looks for knees when there are more than 4 results, since it needs enough data to identify one.
Knee Reranking works best when the upstream Rerank_Multilingual_v1 doesn’t have a limit.
If Knee Reranking can’t find any “knees” in the results, it falls back to the “cutoff” value if you’ve provided one. 0.5 is typically a good fallback cutoff, but you can configure this.
For most use cases, knee() with its default parameters is a good starting point and often needs no adjustments.
If precision is extremely critical to your use case, consider setting a higher sensitivity value.
If conciseness is a priority, consider setting a higher early_bias value.
Always test parameter changes against representative queries!
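The filtering rules described above (a knee is only sought over more than 4 results, the cutoff acts as a fallback when no knee is found, and the earlier of the two boundaries wins when both apply) can be sketched roughly as follows. This illustrates the described behavior and is not Vectara's code: `filter_results` and `detect_knee` are hypothetical names, and treating a score equal to the cutoff as kept is an assumption.

```python
def filter_results(scores, detect_knee, cutoff=None):
    """Filter `scores` (sorted high to low) by knee and optional cutoff.

    `detect_knee` is any callable that returns how many leading results
    to keep, or None when it finds no knee.
    """
    kept = list(scores)
    if len(kept) > 4:  # a knee is only looked for over more than 4 results
        knee = detect_knee(kept)
        if knee is not None:
            kept = kept[:knee]
    if cutoff is not None:
        # Applies as a fallback when no knee was found, and otherwise
        # trims further only if the cutoff position comes earlier.
        kept = [s for s in kept if s >= cutoff]
    return kept


# Invented scores and stub detectors, for illustration only.
scores = [0.95, 0.93, 0.91, 0.90, 0.88, 0.52, 0.35, 0.21, 0.10]
print(filter_results(scores, lambda s: 5))                  # knee found
print(filter_results(scores, lambda s: None, cutoff=0.5))   # fallback to cutoff
print(filter_results(scores, lambda s: 5, cutoff=0.92))     # cutoff comes first
```

Swapping in different detectors or cutoff values against a set of representative score lists is one lightweight way to see how a parameter change would affect what gets kept.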

Conclusion

The combination of Slingshot reranking, knee detection, and score cutoff provides optimal filtering across diverse query patterns. Enable the feature today and start improving your result quality and reducing latency in your AI assistants and agents. To read more about this new feature and learn how to start using it, check out our documentation.

As always, we’d love to hear your feedback! Connect with us on our forums, on our Discord, or on our community. If you’d like to see what Vectara can offer you for retrieval augmented generation on your application or website, sign up for an account!