Vector Database: Do You Really Need One?

RAG Platforms’ Silent Takeover of Vector Databases

Introduction

In 2023, a new vector database product was released almost every week – or at least it certainly felt like it!

Enterprise architects were told they needed to become experts in this space and pick a winning vector database product to modernize their stack, all in the midst of rapid innovation on LLMs, ANN algorithms, and the utility features that make vector databases usable in a “real” enterprise context. There were comparison charts, pros/cons articles, bake-offs – you name it.

But is that really needed?

It’s true that RAG (Retrieval Augmented Generation) is currently the most popular methodology for building trusted LLM-based applications with your own data, and that you do need strong semantic search as part of your overall retrieval capability (the “R” in “RAG”). But a vector database is just one piece of that stack, and likely not even the most important one.

The stage is set for this to dramatically change in 2024.

First, major database products like Snowflake, Redshift, MongoDB, DataStax, and many others are on a fast track to provide a native “vector” data type (as pgvector already does for Postgres) that is fast, scalable, and integrates well with the existing database system. Developers who opt to implement their own RAG pipelines DIY-style will then likely use the database systems they already know.

In parallel, we see the accelerated adoption of RAG-as-a-Service platforms like Vectara, which provide the full set of capabilities needed for building RAG applications (such as chunking, embedding, vector store, and summarization) – all built to work in tandem. With Vectara, developers can now implement a RAG application via purpose-built, easy-to-use APIs and do not need to become RAG experts.

RAG, Similarity Search and Vector Databases

RAG is based on a simple but powerful idea – given a user query, retrieve the most relevant text chunks from a large corpus of documents (we refer to those as “facts”), and then ask a generative LLM to respond to the user query by summarizing those facts as they relate to the query. 

A key to the success of RAG is the ability to extract the most relevant facts from a large corpus of text documents. One of the key breakthroughs in solving this information retrieval problem is the ability to retrieve text by its semantic meaning. This is also known as “semantic search”, “neural search” or “vector search”. Algorithms for semantic search rely on translating each “chunk of text” (usually a sentence or two) into a vector of numbers (often 768 or 1024 such numbers):

Image 1: Text chunk encoded as an embedding vector

These so-called “embedding” vectors are computed in such a way as to capture the semantic meaning of the text.
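
To make this concrete, here is a minimal sketch of turning text chunks into embedding vectors. It assumes the open-source sentence-transformers library and the all-mpnet-base-v2 model (this post doesn’t prescribe either – any embedding model works the same way):

```python
# Minimal sketch: encode text chunks as embedding vectors.
# Assumes the sentence-transformers library; "all-mpnet-base-v2" is just one
# example model that produces 768-dimensional vectors.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

chunks = [
    "RAG retrieves relevant facts and asks an LLM to summarize them.",
    "Vector databases store embeddings for fast similarity search.",
]

# Each chunk becomes a vector of 768 floats capturing its semantic meaning.
embeddings = model.encode(chunks, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768)
```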

During data ingestion, the embedding vectors for all text chunks are computed and stored in the vector database. When a user issues a query, the query embedding is matched against all the embeddings (text chunks) stored in the vector database, and the closest matches are retrieved:

Image 2: Similarity between a document embedding and a query embedding (image taken from the SBERT docs).

This is simply a nearest-neighbor search algorithm in the multi-dimensional embedding vector space. If you think of the query as a point in this multi-dimensional embedding space, we are looking for the K closest points (each point represents a text chunk) in this space.
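
As a toy illustration (continuing the sketch above, and assuming all chunk embeddings fit in memory), a brute-force K-nearest-neighbor search is just a dot product followed by a sort:

```python
# Brute-force nearest-neighbor search in embedding space.
# Reuses `model` and `embeddings` from the previous sketch.
import numpy as np

query_vec = model.encode(
    ["How does RAG use retrieved facts?"], normalize_embeddings=True
)[0]

# With normalized vectors, cosine similarity reduces to a dot product.
scores = embeddings @ query_vec

# Indices of the K closest chunks, highest similarity first.
k = 2
top_k = np.argsort(-scores)[:k]
print(top_k, scores[top_k])
```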

Executing a nearest-neighbor search over millions or even billions of data points is not easy if done naively; it simply takes too long. Thankfully, approximate nearest neighbor (ANN) algorithms like HNSW make this approach practical, and they power the underlying retrieval mechanism in virtually all semantic search systems today.
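
To show what using an ANN index looks like in practice, here is a small sketch with the hnswlib library – one of several HNSW implementations, and not one this post specifically endorses. The random vectors stand in for real embeddings:

```python
# Sketch: approximate nearest-neighbor search with an HNSW index (hnswlib).
import numpy as np
import hnswlib

dim, num_elements = 768, 100_000
data = np.random.rand(num_elements, dim).astype(np.float32)  # stand-in embeddings

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))

index.set_ef(50)  # query-time recall/speed trade-off
labels, distances = index.knn_query(data[:1], k=5)  # approximate top-5 neighbors
```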

This is what vector databases promised – to solve this vector space search problem efficiently and at scale. However, the vector database is just one item to deal with in your full end-to-end RAG pipeline. It’s the tip of the iceberg and much of the work (and complexity) you face when implementing RAG lies with the other parts.

Beyond Vector Databases

2023 can certainly be called the “Year of RAG”, and this realization – that RAG is the most effective methodology to apply the power and magic of LLMs to your own data – resulted in the high popularity of vector databases. At the time of writing, LangChain’s list of vector stores includes 65 individual products. Why do we need so many? How are they actually different? And should you even care?

For a DIY RAG developer, it is truly overwhelming to grasp the differences between all of these products and select one to use. More importantly, as this functionality becomes part of existing database systems (relational databases, data warehouses, and document databases), enterprise architects are starting to question whether they actually need a new component in their architecture diagram, or whether their existing database can simply add a new “vector” data type and make all this extra complexity go away. At the end of the day, it’s not just the vectors – the text itself has to be stored as well, so having a single database that handles both makes everything easier.

Many developers are discovering that developing RAG applications DIY-style works great for small-scale demos but quickly becomes overwhelming at enterprise scale. Part of the problem is the assumption that you need a separate vector database – and the realization that, for an enterprise application, you now have two databases to manage.

But it’s not just the additional vector database; it’s much more than that.

Building a RAG application requires dealing with chunking, embedding, advanced retrieval (like hybrid search or MMR), crafting the right prompt, and using the LLM to summarize the final set of facts – all while ensuring data privacy, security, low latency, and high availability. Doing this correctly requires quite a bit of expertise in machine learning, information retrieval, MLOps, DevOps, and even what people now call PromptOps.
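
To illustrate how many moving parts that is, here is a deliberately simplified sketch of a DIY pipeline covering just chunking, embedding, retrieval, and prompting. It reuses the sentence-transformers assumption from earlier, and generate() is a hypothetical stand-in for whatever LLM client you choose; hybrid search, MMR, security, latency, and availability are all left out – and they are exactly where the hard work lives:

```python
# A deliberately simplified DIY RAG pipeline: chunk -> embed -> retrieve -> prompt.
# `generate()` is a hypothetical stand-in for your LLM client of choice.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

def chunk(text: str, size: int = 200) -> list[str]:
    # Naive fixed-size chunking; real pipelines use sentence-aware splitting.
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(docs: list[str]):
    chunks = [c for doc in docs for c in chunk(doc)]
    vecs = model.encode(chunks, normalize_embeddings=True)
    return chunks, vecs

def retrieve(query: str, chunks: list[str], vecs, k: int = 5) -> list[str]:
    qv = model.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(-(vecs @ qv))[:k]
    return [chunks[i] for i in top]

def answer(query: str, chunks: list[str], vecs) -> str:
    facts = retrieve(query, chunks, vecs)
    prompt = (
        "Answer the question using only these facts:\n"
        + "\n".join(f"- {f}" for f in facts)
        + f"\n\nQuestion: {query}"
    )
    return generate(prompt)  # hypothetical call to your LLM of choice
```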

And if you build chatbots, you need to worry about additional aspects like chat history management and tracking, integrating chat history into the query flow, and advanced multi-turn prompting.
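
As a rough sketch of just one of these concerns – folding chat history into the query flow – continuing the pipeline above, where condense() is a hypothetical helper that asks an LLM to rewrite a follow-up question as a standalone query:

```python
# Sketch: rewrite a follow-up question into a standalone query before retrieval.
# `condense()` is a hypothetical LLM-backed helper; `answer()` comes from the
# pipeline sketch above. Real systems also persist and truncate this history.
history: list[tuple[str, str]] = []  # (role, message) pairs

def chat_turn(user_message: str, chunks, vecs) -> str:
    transcript = "\n".join(f"{role}: {msg}" for role, msg in history)
    # Turn "what about its pricing?" into a self-contained question
    # so that retrieval fetches the right facts.
    standalone_query = condense(transcript, user_message)  # hypothetical
    response = answer(standalone_query, chunks, vecs)
    history.append(("user", user_message))
    history.append(("assistant", response))
    return response
```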

And then there are ongoing maintenance, upgrades (e.g., what do you do when GPT-5 comes out?), data refreshes, and many other duties and responsibilities.

This is why Vectara’s RAG-as-a-Service is so compelling. It addresses the broad spectrum of complexity inherent in RAG application development, provides specialized support for chatbot applications, and lets developers focus on how to make their applications shine. 

Summary

Only a few months ago, everyone was writing about how vector databases are the shiny new component we all need in our enterprise stack. Already, this is shifting in two ways: existing database vendors are implementing vector search as part of their products, and RAG platforms are taking the place of vector databases by providing a RAG-in-a-box API so that enterprise teams don’t have to become experts in running RAG at scale.

In fact, if you look closely, you will see that vector databases are working to add some RAG features (like chunking or integrated embedding models), which helps them inch closer to the RAG platform vision, as offered by Vectara.

So, for all the RAG developers out there: you don’t need to go through the difficult DIY journey – you can jump directly to the end.

To learn more about Vectara – sign up for an account (our free “Growth plan” includes enough storage for 50MB of text and 5000 queries/month, which is quite a lot, and will support most application experiments), and experience RAG-as-a-Service at its best.

Recommended Content

Vectara's API documentation

Check out all the powerful RAG APIs provided by Vectara

Vectara docs