Retrieval Augmented Generation Buyer's Guide
Regardless of whether you are considering building or buying a RAG solution, you need to understand the players, their offerings, where they excel, and where they are still trying to catch up.
Introduction
Wow, what a difference a day makes! When Vectara was founded, the Retrieval Augmented Generation (RAG) space was very lonely. Or, to be more precise, the RAG space didn’t exist! Starting in 2020, Vectara implemented its product following an architectural approach called “Grounded Generation,” which the market now refers to as RAG.
However, the explosion of LLMs in 2022 also led to a rapid increase in the number of solutions that offer a RAG component. With this post, we aim to identify the major players in this space and compare the options available to application builders, along with their strengths and weaknesses.
The Players
Like the introduction of the tributes in The Hunger Games, here are the contenders. Note: our CEO Amr would have been great representing District 12!
The rules for this year’s tournament: we only include options that are GA as of the initial publication date. Some products and capabilities are in Tech Preview or Beta at the time of publication; where applicable, we discuss those briefly, but we won’t score those non-GA items.
Vectara
Founded in 2020, Vectara was designed from the ground up as a focused RAG solution for application builders, with easy-to-use APIs. Vectara takes a page from the Snowflake playbook – focus on ease of use so developers can concentrate on implementing their own business requirements instead of building underlying infrastructure.
The developer only needs to think about their data, queries, and ACLs… instead of worrying about which chunking algorithm, embeddings model, vector storage, retrieval algorithm, or generative LLM to use, and how to integrate them all.
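To make that contrast concrete, here is a purely illustrative sketch of what a single-call managed RAG query looks like. The endpoint, payload, and field names below are hypothetical placeholders, not Vectara’s actual API schema.

```python
import requests

# Hypothetical managed-RAG call: the URL and JSON fields are illustrative
# placeholders, not a real provider's API schema.
def ask_managed_rag(question: str, corpus_id: str, api_key: str) -> str:
    resp = requests.post(
        "https://api.example-rag-provider.com/v1/query",  # placeholder URL
        headers={"Authorization": f"Bearer {api_key}"},
        json={"corpus": corpus_id, "query": question, "max_results": 5},
    )
    resp.raise_for_status()
    # The provider handles chunking, embedding, storage, retrieval,
    # prompting, and LLM execution behind this one call.
    return resp.json()["answer"]
```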
Cohere
This GenAI company was founded in 2019 by multiple researchers from Google Brain, one of whom (Aidan Gomez) co-authored the famous “Attention Is All You Need” paper, which introduced the Transformer architecture and laid the groundwork for other heavyweights below. Cohere partnered with Oracle to bring its technology into Oracle’s Fusion Cloud. There are two areas of relevance for our discussion:
- Their Co-Embed and Co-Command models can be used as individual components within a do-it-yourself RAG implementation, e.g. via LangChain or LlamaIndex (a minimal sketch follows this list).
- Cohere’s Coral (currently in beta) is similar to Vectara in that it is a hosted RAG-based approach targeting Conversational AI use cases.
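As a quick illustration of the first, component-style usage, here is a minimal sketch using Cohere’s Python SDK. The model names and the `input_type` argument reflect Cohere’s conventions at the time of writing; treat them as assumptions to verify against current docs.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # assumes the official cohere Python SDK

# Embed document chunks for the retrieval side of a DIY RAG pipeline.
# Model name and input_type follow Cohere's v3 embed convention (an
# assumption to verify against current docs).
doc_embeddings = co.embed(
    texts=["Vectara was founded in 2020.", "RAG grounds answers in retrieved data."],
    model="embed-english-v3.0",
    input_type="search_document",
).embeddings

# Generate an answer with a Command model, given retrieved context.
response = co.generate(
    model="command",
    prompt=(
        "Context: Vectara was founded in 2020.\n"
        "Question: When was Vectara founded?\nAnswer:"
    ),
)
print(response.generations[0].text)
```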
OpenAI
OpenAI is still the juggernaut in the LLM space, with GPT-4 still showing the best overall performance and the fewest hallucinations of any LLM.
They definitely have great brand recognition, though the name is somewhat ironic: whilst they set out to be an open, non-profit provider, they have evolved in the other direction.
Their current offering in the RAG space is as a provider of the Ada embedding model, supporting the retrieval component within a RAG process, and of course GPT-3.5 and GPT-4 for the generation component.
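As a minimal sketch of those two roles – Ada for retrieval-side encoding, a GPT chat model for generation – assuming the v1-style OpenAI Python SDK and an `OPENAI_API_KEY` in the environment:

```python
from openai import OpenAI  # assumes the openai Python SDK, v1+ interface

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Retrieval side: encode a chunk with the Ada embedding model.
embedding = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Vectara implemented Grounded Generation starting in 2020.",
).data[0].embedding

# Generation side: answer from retrieved context with a GPT chat model.
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": "Context: Vectara was founded in 2020.\n\nQuestion: When was Vectara founded?"},
    ],
)
print(completion.choices[0].message.content)
```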
OpenAI is beginning to delve into the RAG architecture, recognizing that foundation models and fine-tuning are not enough for “chat with your own data” use cases, and that RAG is required to get the most trustworthy results.
Azure AI Search
Azure AI Search is halfway between a “batteries included” RAG option like Vectara and a complete do-it-yourself option.
It is definitely more low-level, and not as easy to use or reason about as more complete end-to-end options. Some elements are included but not all, and to get to a fully functional RAG-based application, you have to integrate multiple other Azure services.
Google Vertex AI
Google’s Vertex AI is a broad GenAI suite that encompasses LLMs, MLOps, and more. Specifically, it includes “Vertex AI Search and Conversation,” Google’s managed RAG solution, which builds on the company’s strength in the cloud and integration with its product suite.
LangChain
LangChain is a popular open-source framework for orchestrating various components into a fully functional LLM-based application. LangChain supports different use cases that leverage the RAG pattern and provides many essential components, such as document parsers, chunking strategies, wrappers around various embedding models and LLMs, and even agents. With an increasing focus on multi-language support and regulatory compliance, LangChain is steadily maturing its offering.
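To show what that orchestration looks like, here is a minimal RAG sketch with LangChain. Import paths move between LangChain versions (these follow the classic `langchain` 0.0.x layout), and the OpenAI components are stand-ins for whichever models you choose.

```python
# Minimal LangChain RAG sketch; import paths follow the classic 0.0.x
# layout and may differ in newer releases.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

raw_text = "Vectara was founded in 2020. RAG grounds LLM answers in retrieved data."

# Chunk the source text, embed the chunks, and index them in a vector store.
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_text(raw_text)
store = FAISS.from_texts(chunks, OpenAIEmbeddings())

# Wire retrieval and generation together into a single chain.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo"),
    retriever=store.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("When was Vectara founded?"))
```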
LlamaIndex
LlamaIndex is another open-source orchestration framework, focused on connecting data with LLMs. RAG is certainly a strong focus for LlamaIndex, and like LangChain, it provides various components to build RAG pipelines – data loaders, chunking, wrappers for embedding models and LLMs, and advanced query algorithms. It is maturing fast, shifting from a pure library to a solution that also offers enterprise support.
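For comparison, the equivalent LlamaIndex sketch is even more compact. The imports below match 0.9-era LlamaIndex APIs (newer versions moved them under `llama_index.core`), and `./my_docs` is a placeholder directory.

```python
# Minimal LlamaIndex RAG sketch (0.9-era imports; newer versions use
# llama_index.core). The directory path is a placeholder.
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./my_docs").load_data()  # data loader
index = VectorStoreIndex.from_documents(documents)          # chunking + embeddings + storage

query_engine = index.as_query_engine(similarity_top_k=4)    # retrieval + generation
print(query_engine.query("Summarize the key points."))
```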
Databricks
A leader in the analytics platform space, Databricks has upped its game to allow Data Engineering and Data Science teams to collaborate and bring the benefits of LLMs to the Lakehouse. We are including Databricks’ “Public Preview” vector storage/serving, as it carries formal support and an SLA for Databricks customers, demonstrating strong maturity.
Our Analysis
In the analysis below, we’ll be primarily looking at how well each of these contenders is suited to provide a solution for a developer building a RAG-based application.
Comparison Criteria
The following criteria compare each organization’s ability to serve an operational, user-facing application workload. We will explore the following factors:
- Completeness: We will assess RAG from the perspective of an application builder, who would be looking at RAG as a complete end-to-end pipeline to use in their solution. Which tasks does each option handle for the developer (each gets a point; a toy sketch of these stages follows the full criteria list):
- Parsing & Chunking
- Encoding chunks as embeddings
- Vector Storage
- Retrieval – including whether only neural search or also hybrid search is supported
- Prompt Engineering
- LLM Execution
- Flexibility
- Deployment Mode (Library, PaaS, or SaaS): How much of the end-to-end RAG solution is the application builder responsible for? SaaS scores best for builders.
- Abstraction: How aligned to “Don’t make me think” is this for an Enterprise? For example, do I reason about things at the level of Corpus or Document, or at the level of Embeddings?
- Total Cost of Ownership
- Software Cost
- Infrastructure Cost – compute, storage
- Integration Cost
- People Cost – hire and retain adequately skilled team members
- Operational cost – uptime, performance, release management, etc.
- Trust
- Confidence in Search Results (best Retrieval Model)
- Minimization of Hallucination in Responses
- Reduction of Bias in Responses
- Explainability
- Data Security
- Separation of Duties
- Advanced RAG Features
- Cross-Language Retrieval
- Query Flexibility – runtime filters, confidence score tuning, tenant isolation scoping
- ACLs – at the corpus or data level
- Automatic Index Optimization
- Memory for Conversational AI
- Others
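To ground the Completeness criterion, here is the self-contained toy sketch of the pipeline stages promised above. Every piece is a deliberately naive stand-in (paragraph splitting for chunking, bag-of-words counts for embeddings, an in-memory list for storage) that a real pipeline would replace with production components.

```python
import math
from collections import Counter

# Toy stand-ins for the Completeness stages; a real pipeline swaps in a
# document parser, a learned embedding model, a vector DB, and a real LLM.

def parse(raw: bytes) -> str:                       # 1. parsing...
    return raw.decode("utf-8")

def chunk(text: str) -> list:                       # ...and chunking
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def embed(text: str) -> Counter:                    # 2. encoding chunks as "embeddings"
    return Counter(text.lower().split())            # bag-of-words as a toy vector

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(chunks: list, query: str, k: int = 2) -> list:  # 3 + 4. storage + retrieval
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]

def build_prompt(query: str, hits: list) -> str:    # 5. prompt engineering
    return "Answer only from this context:\n" + "\n".join(hits) + f"\n\nQuestion: {query}"

doc = b"Vectara was founded in 2020.\n\nRAG grounds LLM answers in retrieved data."
prompt = build_prompt("When was Vectara founded?", retrieve(chunk(parse(doc)), "When was Vectara founded?"))
print(prompt)  # 6. LLM execution would take this prompt (e.g., a GPT or Command call)
```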
Scoring
Completeness
Given that any end-user application that leverages RAG requires both the underlying RAG infrastructure and the actual business application, how much of the entire solution is provided to the builder?
Is the full infrastructure provided so the builder can focus only on implementing their business application requirements… or is the builder responsible for wiring up elements of the infrastructure as well?
Vectara scores well here, as do Google and OpenAI. On the flip side, Cohere is mature, but you have to do it yourself to accomplish the major parts of RAG (parsing, encoding, vector storage, and search).
Table 1. RAG Option Scoring for Completeness Criterion
Note: Partial scores relate either to situations where you “need to build/maintain it”, or where “you can get it by integrating other parts of the stack,” e.g. Azure AI Search requires OpenAI for vector encoding and LLM execution. In both cases, the burden of the pipeline is on the developer, exemplified by the Azure RAG approaches.
Deployment Mode
This category was scored by treating PaaS as 5 and SaaS as 10.
In the cases of LangChain and LlamaIndex, both are primarily libraries but are evolving toward SaaS as they commercialize, so we gave them a 7.
Abstraction
Why is abstraction important? We think that a good RAG system should embody the “don’t make me think” principle.
In this fast-moving vertical, this category is like looking at Databricks or Snowflake as a Data Warehouse (circa 2021). You could achieve data warehousing with Databricks in 2021, but there was a lot of banging, tears, and a fair few sharp corners to get something equivalent to Snowflake, which kept its users focused on their tables and queries instead of the underlying infrastructure. Whilst this may no longer be true – Databricks now has many “make it easy” features and is serverless – it was a key issue back then.
We think some of the RAG providers, Vectara included, take an approach similar to Snowflake’s – let users operate at higher orders of abstraction so they can remain closer to their business requirements.
Total Cost of Ownership
How many cooks does it take to get your RAG pipeline working…?
To continue working…?
And to continue to evolve and add new capabilities in this fast-moving space?
And how much do these cooks cost to hire and retain?
Trust
We scored this category on how much you can trust the provider to return relevant results in the retrieval phase and accurate, trustworthy responses in the generation phase.
If the onus is entirely on the developer to tune the retrieval for optimal results, this scores poorly amongst “application builders” who just want something that works … and works well.
Similarly, if the burden is on the builder to reduce hallucinations by choosing the right LLM and engineering the prompt correctly, then this option will score poorly, as this is a hard thing to do.
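One common mitigation – a generic sketch, not any particular vendor’s prompt – is to constrain the LLM to the retrieved context and give it an explicit way to abstain:

```python
# A widely used hallucination-mitigation pattern: instruct the model to
# answer strictly from retrieved context and to abstain when it can't.
GROUNDED_PROMPT = """You are a helpful assistant. Answer the question using
ONLY the context below. If the answer is not in the context, reply
"I don't know" instead of guessing.

Context:
{context}

Question: {question}
Answer:"""

prompt = GROUNDED_PROMPT.format(
    context="Vectara was founded in 2020.",
    question="When was Vectara founded?",
)
```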
We also gave additional consideration to the extent to which the provider allows for robust access controls, so the builder can trust that end users can only see the data they are permitted to see.
Advanced RAG Features
What additional features does the vendor support specifically for RAG, beyond the basic vector search and LLM generation? This is a fast-moving space, but some notable examples are below:
- Azure AI Search has strong roots in keyword search, so its hybrid retrieval has many advanced capabilities (see the fusion sketch after this list)
- OpenAI is able to use Bing to search public information
- Databricks allows clients to use their own models (but they also need to wire everything together)
- Vectara automatically rebuilds the search index to optimize it if retrieval metrics begin to fall
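To make “hybrid” concrete, here is a sketch of reciprocal rank fusion (RRF), one common, vendor-neutral way to merge a keyword ranking with a vector-similarity ranking. The document IDs are made up for illustration.

```python
# Reciprocal rank fusion (RRF): merge a keyword (e.g., BM25) ranking with
# a vector-similarity ranking into one hybrid result list.
def rrf(rankings: list, k: int = 60) -> list:
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # from a keyword/BM25 index
vector_hits = ["doc1", "doc9", "doc3"]   # from nearest-neighbor search
print(rrf([keyword_hits, vector_hits]))  # documents found by both rank first
```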
Final Scores & Summary
The table below shows the complete set of scores across all categories.
Table 2 – RAG Option Scoring Totals
The final scores, in ranked order, are included below. We have also included a final statement for each option, which summarizes the main takeaway.
- Vectara (Score: 50.4) – Takeaway: An end-to-end, optimized SaaS platform that gives the developer a production-grade RAG service on which to build and deploy their business applications.
- Google Vertex AI (Score: 46.7) – Takeaway: A comprehensive SaaS suite offering easy-to-use tools to build and run RAG-based applications.
- OpenAI (Score: 40.7) – Takeaway: Broad suite of LLM-oriented tools that is moving up the stack to offer more managed offerings and solutions, now including a RAG option.
- LangChain (Score: 37.9) – Takeaway: Open-source LLM application orchestrator that was the first to gain widespread usage in the market, making it easier to build and run your own RAG infrastructure.
- LlamaIndex (Score: 34.4) – Takeaway: Open-source framework using the RAG approach to bring your data to your LLM applications.
- Cohere (Score: 26.9) – Takeaway: LLM-focused company that is moving towards a RAG-based managed offering to cater to developers who do not want to own the infrastructure.
- Azure AI Search (Score: 26.0) – Takeaway: Keyword-based search platform that is iteratively adding vectors and LLMs – primarily via the OpenAI partnership – to offer a RAG platform with particular emphasis on security and governance.
- Databricks (Score: 24.0) – Takeaway: Data platform company with deep roots in data engineering, data science, and data warehousing that is now moving towards the LLM world with a nascent RAG offering.
Conclusion
As RAG continues to drive the majority of LLM-based enterprise applications, developers are gaining a better understanding of what they need to stand up a production-grade, secure, and performant RAG solution. At the same time, more and more startups and large companies are offering their own RAG solutions for developers to evaluate.
In this blog post, we reviewed some of the leading and more mature RAG solutions, both commercial and open-source, and scored them against a set of criteria that are important to consider when building RAG applications.
A skeptic might remark that we’re tooting our own horn because Vectara came out first – yet it cannot be denied that Vectara’s RAG features are on another level, and the focused delivery of a platform for application builders has resulted in an incredibly easy path to building GenAI applications. Customers have commented that skeptics should first assemble their own RAG pipeline before trying Vectara so that they can discover how elegant it truly is.
It also cannot be denied that every option on this list has a unique approach and some compelling features. Furthermore, all are moving very rapidly to capture as much market share (and mindshare) as possible. Ultimately, this is good for users as this space evolves, and these RAG solutions let users more easily build GenAI applications that are ever more powerful.
If you want to quickly get started with RAG and add a Q&A, semantic search, or conversational AI capability to your application, sign up for Vectara – you can have something up and running in an hour. Also, consider joining our forums or Discord server to share your thoughts and any experiences you have had with various RAG platforms and solutions.