Application Development
Introducing Vectara-Agentic
Enabling advanced RAG applications with Vectara Agentic
August 14, 2024 by Ofer Mendelevitch & David Oplatka
Introduction
Computer scientists have been thinking about autonomous software agents since the 1980s. In his book, “Software Agents,” Jeffrey Bradshaw writes:
“The idea of an agent originated with John McCarthy in the mid-1950s, and the term was coined by Oliver G. Selfridge a few years later when they were both at the Massachusetts Institute of Technology. They had in view a system that, when given a goal, could carry out the details of the appropriate computer operations and could ask for and receive advice, offered in human terms, when it was stuck. An agent would be a ‘soft robot’ living and doing its business within the computer’s world.” (Kay 1984).
We have always been fascinated with the idea of non-human entities with their own agency, and this has inspired many science fiction books like “I, Robot” (by Isaac Asimov) and “The Moon is a Harsh Mistress” (by Robert Heinlein) as well as movies such as “Blade Runner”, “2001: A Space Odyssey” and “Her”.
Fast forward to 2024, and there is quite a lot of buzz in the world of LLMs about AI Assistants, AI Agents, and “Agentic RAG”.
What is that all about?
Are we closer than ever before to building our own HAL or Samantha?
In this article, we will describe what “Agentic RAG” means, how it works, and introduce the beta version of vectara-agentic, a new Python package from Vectara that provides a quick and easy way to create AI Assistants and AI Agents by leveraging the power of Agentic RAG.
Let’s dive in.
The Rise of Agentic RAG
When ChatGPT launched in late 2022, it took the world of technology by storm, and since then, LLMs have continued to improve dramatically every few months.
The initial applications were quite simple: you ask the LLM a question and it provides a response. For most enterprise applications, direct LLM responses contained too many hallucinations, which made them challenging to use in production.
Then came retrieval augmented generation (RAG), which allowed the LLM to use relevant facts provided by an accurate retrieval engine, reducing hallucinations and improving the trust users have in the response.
Agentic RAG is the next step in this evolution, and powers two types of GenAI applications:
- AI Assistants: these are LLM-based applications that answer a user’s question by forming and executing a query plan, calling various tools and formulating a final response.
- AI Agents: these extend the functionality of AI assistants beyond the query/response paradigm, allowing an Agentic RAG application to safely act on the user’s behalf to achieve a desired outcome, such as sending an email or booking a flight.
Whether it’s an AI Assistant or an AI Agent, the underlying idea is to use LLM technology to automate complex tasks that traditionally required human intelligence, ranging from research and analysis to creative writing and problem-solving.
Many believe that LLM-based agents represent a significant step towards artificial general intelligence (AGI). As we will see later in some examples, the quality of the output in some use-cases is so good that it does feel that way sometimes.
LLM Reasoning and Tool Use
The need for LLMs to reason and plan drove a lot of research in the field, including Chain-of-Thought prompting, ReAct, Tree of Thoughts, Chain of Density, and much more.
These techniques attempt to emulate human-like reasoning processes within the LLM by encouraging the model to break down complex problems into smaller, manageable steps. In each step, the model can use tools via a recent LLM capability called “function calling”: the ability to interact with external tools, APIs, or functions by generating structured outputs that can be interpreted as function calls.
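As a concrete illustration, here is a minimal sketch of function calling using OpenAI’s chat-completions tool interface; the `get_weather_forecast` tool and its schema are hypothetical, invented just for this example:

```python
from openai import OpenAI

client = OpenAI()

# Describe a (hypothetical) tool to the model as a JSON schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather_forecast",
        "description": "Get the weather forecast for a destination and month.",
        "parameters": {
            "type": "object",
            "properties": {
                "destination": {"type": "string"},
                "month": {"type": "string"},
            },
            "required": ["destination", "month"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather like in the Maldives in January?"}],
    tools=tools,
)

# Instead of free text, the model may return a structured call such as
# get_weather_forecast(destination="Maldives", month="January"), which the
# application executes, feeding the result back to the model.
print(response.choices[0].message.tool_calls)
```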
Consider, for example, an AI agent for booking a vacation. The agent has access to the following tools:
- A tool to access my calendar
- A tool to search and book flights
- A tool to search and book hotels
- A weather forecast tool
Now let’s say I, as the user, ask the agent to “book me a vacation in the Maldives”. Here is how the conversation might proceed:
Me: “Please book me a vacation in the Maldives.”
Agent: “When would you like to go?”
Me: “Anytime in January 2025.”
Agent: “And for how long?”
Me: “Let’s say 7-10 days.”
Here the agent will use the calendar tool to review my calendar for any holidays or special company functions to determine the possible dates for the trip. It will also call the weather forecast tool to determine the best match in terms of comfortable weather.
Having all this information, the agent can now search for flights and hotels. But first, it may ask:
Agent: “Do you have a budget in mind?”
Me: “The total cost must be less than $5000.”
The agent now uses the flight tool and finds a nice flight from San Francisco to Velana International Airport. It may know from its memory that I have a liking for Singapore Airlines and try to pick a flight from that airline that matches my criteria.
The agent then uses the hotel tool to find a few good hotels, again based on my historical preference from past interactions.
The agent can then present me with a few possible itineraries, ask additional questions, and finally when I approve, it will book the hotel and flights for me.
As you can see, through tool use, the LLM that runs the agentic engine is imbued with “superpowers”: it can check my calendar, book flights, check the weather, and much more.
Challenges with Agentic RAG
This framework of LLM-based reasoning along with tool calling and structured outputs is quite powerful and flexible, but are there any challenges?
As you might imagine – yes, there are.
First of all, LLMs themselves are not perfect. They can make mistakes, and when they do, this might result in calling the wrong tool, supplying the wrong arguments to the tool, or simply hallucinating when creating the final response from the tool output.
Even worse, these mistakes may compound: if the plan created by the LLM includes 5 steps and a mistake occurs in step 2, steps 3-5 may execute in a completely wrong way. Just imagine an AI Agent that books you a flight to the wrong destination or on the wrong dates.
This is why reducing hallucinations is critical for success with Agentic RAG applications, an area where we at Vectara invest a lot of effort, including advanced retrieval algorithms, specialized LLMs like Mockingbird, and hallucination evaluation models like HHEM (whose output is also called the Factual Consistency Score, or FCS).
Vectara-Agentic
So what is vectara-agentic?
It’s a Python library that helps you build safe and trusted Agentic RAG applications with Vectara. It builds on the LlamaIndex open-source package and its support for Agentic RAG, while abstracting away most of the nitty-gritty details.
Creating an Agentic RAG application with vectara-agentic comes down to two simple steps:
- Define the set of tools you want to provide to your agent.
- Define custom instructions to give your agent use-case-specific guidance.
Vectara-Agentic Tools
It’s super easy to define tools for your agent with vectara-agentic:
- You can create a query tool that uses Vectara with the `create_rag_tool()` function. You simply point it at your Vectara account and corpus, configure the query parameters, and you have an agentic tool ready to go.
- vectara-agentic also provides a library of industry-specific tools that are ready to use right away (e.g., for finance or legal), and of course, you can turn any Python function into an agentic tool.
- vectara-agentic provides an easy way to take advantage of many existing tools from the LlamaIndex Agent Tools repository.
Please see the vectara-agentic docs for more details about using tools.
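To make this concrete, here is a minimal sketch of two of these patterns: a Vectara query tool created with `create_rag_tool()`, and a plain Python function wrapped as a tool. The `VectaraToolFactory` constructor arguments and the `ToolsFactory.create_tool()` helper reflect our reading of the docs; the tool names, the `QueryDocsArgs` schema, and the `multiply` function are made up for illustration, so check the docs for exact signatures.

```python
import os
from pydantic import BaseModel, Field
from vectara_agentic.tools import VectaraToolFactory, ToolsFactory

# A tool factory pointed at your Vectara account and corpus.
vec_factory = VectaraToolFactory(
    vectara_api_key=os.environ["VECTARA_API_KEY"],
    vectara_customer_id=os.environ["VECTARA_CUSTOMER_ID"],
    vectara_corpus_id=os.environ["VECTARA_CORPUS_ID"],
)

# The arguments schema tells the agent how to call the tool.
class QueryDocsArgs(BaseModel):
    query: str = Field(..., description="The user query.")

ask_docs = vec_factory.create_rag_tool(
    tool_name="ask_docs",  # hypothetical tool name
    tool_description="Answers questions about the documents in the corpus.",
    tool_args_schema=QueryDocsArgs,
)

# Any Python function can become an agentic tool as well.
def multiply(x: float, y: float) -> float:
    """Multiply two numbers."""
    return x * y

multiply_tool = ToolsFactory().create_tool(multiply)
```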
Vectara-Agentic Configuration
There are a few ways to configure how vectara-agentic works, using environment variables (or by putting them in a `.env` file).
First, vectara-agentic supports two agent types via the `VECTARA_AGENTIC_AGENT_TYPE` variable: `REACT` and `OPENAI`. Both of these work great, and we’ll keep adding new ones to vectara-agentic as additional reasoning methods become available.
Second, you can configure the LLM used by the main reasoning loop (`VECTARA_AGENTIC_MAIN_LLM_PROVIDER`) as well as the LLM used by tools (`VECTARA_AGENTIC_TOOL_LLM_PROVIDER`).
For this beta launch, we are excited to support a variety of LLMs from inference vendors such as OpenAI, Together.AI, Groq, Fireworks AI, and Anthropic. Note, however, that OpenAI must be used if you choose the `OPENAI` agent type.
Each LLM provider comes with a default model (e.g., GPT-4o for OpenAI and mixtral-8x7b-32768 for Groq), and you can modify those as needed using the `VECTARA_AGENTIC_TOOL_MODEL_NAME` and `VECTARA_AGENTIC_MAIN_MODEL_NAME` variables.
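For example, a configuration might look like the following (set directly in Python here for illustration; the same names work in a `.env` file). The variable names come from the text above, while the specific provider and model strings are assumptions, so consult the docs for the exact accepted values:

```python
import os

# Use the ReAct agent type, with OpenAI driving the main reasoning loop
# and Groq powering the tools' LLM. All values shown are illustrative.
os.environ["VECTARA_AGENTIC_AGENT_TYPE"] = "REACT"
os.environ["VECTARA_AGENTIC_MAIN_LLM_PROVIDER"] = "OPENAI"
os.environ["VECTARA_AGENTIC_MAIN_MODEL_NAME"] = "gpt-4o"
os.environ["VECTARA_AGENTIC_TOOL_LLM_PROVIDER"] = "GROQ"
os.environ["VECTARA_AGENTIC_TOOL_MODEL_NAME"] = "mixtral-8x7b-32768"
```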
Example: A Legal AI Assistant Using Vectara-Agentic
Let’s look at a cool example.
We build an AI assistant that helps with case law. Lawyers and paralegals can use this assistant to ask questions about case law and quickly get the information they need when researching a specific legal precedent or preparing for a trial.
For this example, we used vectara-ingest to crawl the Caselaw Access Project dataset for the state of Alaska into a Vectara corpus.
To build the actual legal assistant, we first need to design the set of tools that will be available to the AI assistant.
The first tool, of course, is a query tool that uses Vectara to answer questions based on the data ingested into the corpus. We use vectara-agentic’s `create_rag_tool()` as follows:
```python
ask_caselaw = vectara_tool_factory.create_rag_tool(
    tool_name="ask_caselaw",
    tool_description="""
    Returns a response (str) to the user query about case law in the state of Alaska.
    If a citation is provided, filters the response based on information from that case.
    The response might include metadata about the case, such as the title/name of the
    ruling, the court, the decision date, and the judges.
    This tool is designed to answer questions based on the semantic meaning of the query.
    """,
    tool_args_schema=QueryCaselawArgs,
    reranker="multilingual_reranker_v1",
    rerank_k=100,
    n_sentences_before=2,
    n_sentences_after=2,
    lambda_val=0.0,
    summary_num_results=10,
    vectara_summarizer="vectara-summary-ext-24-05-med-omni",
    include_citations=True,
)
```
As you can see, creating a tool that automatically calls Vectara is pretty much a single function call. You can customize the Vectara query by specifying a summarizer, hybrid search, using Vectara’s reranker, and various other retrieval options that can help achieve the best and most accurate responses.
For this legal assistant, we’ve added a few other tools that are helpful:
- A tool called `get_opinion_text()` which, given a case citation, provides the full opinion text or its summary.
- A tool called `get_case_document_pdf()` which, given a case citation, returns a valid web URL to the PDF of the case record.
- vectara-agentic’s `critique_as_judge()` tool, which helps further strengthen an argument or legal brief.
The `create_assistant_tools()` function in our app creates all these tools, and we then define the agent object as follows:
```python
agent = Agent(
    tools=create_assistant_tools(),
    topic="Case law in Alaska",
    custom_instructions="""
    - You are a helpful legal assistant, with expertise in case law for the state of Alaska.
    … (more instructions)
    - Never discuss politics, and always respond politely.
    """,
)
```
In addition to the tools, we provide a “topic” string (think of this as the area of specialty for the assistant) and a set of custom instructions.
Putting it all together, we developed a Streamlit app that manages the UI and enables interactions between the assistant and the end user. When a user sends a message in the Streamlit chat box, the app displays the message and passes it to the assistant via the `chat()` method. The assistant then decides which tools to call to answer the user’s question and returns a response, which is displayed in the chat box.
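As a rough illustration of that wiring (not the demo’s exact code), a minimal Streamlit chat loop around the agent might look like this, where `agent` is the object created above and the session-state handling is a simplified assumption:

```python
import streamlit as st

# Keep the conversation in session state so it survives Streamlit reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

# On a new user message, call the agent and display its response.
if prompt := st.chat_input("Ask about Alaska case law"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

    response = agent.chat(prompt)  # the agent decides which tools to call
    st.session_state.messages.append({"role": "assistant", "content": response})
    with st.chat_message("assistant"):
        st.write(response)
```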
To see the full code for this example, check out the legal-assistant demo on Hugging Face.
What’s Next?
RAG is an extremely powerful methodology for building AI Assistants that continues to gain popularity amongst enterprise developers due to increased trust, higher-quality responses, and reduced hallucinations.
Agentic RAG can be used to enhance AI Assistants with new superpowers through the use of reasoning and tools, providing better handling of user queries, and integration of additional real-time data sources via tools.
We are excited to announce vectara-agentic (beta), a Python package that makes building AI assistants powered by Agentic RAG easy and straightforward. It abstracts away many of the details and minutiae required to build Agentic RAG, while providing functional flexibility, a broad set of tools out of the box, and a tie-back to Vectara’s scalable and secure enterprise RAG platform.
The vectara-agentic docs provide more details on how to use the package, and we have created a few Agentic RAG demos using vectara-agentic, such as the Financial-assistant, Hacker-News-assistant, and Legal-assistant.
You can try it yourself too! Sign up to get your Vectara account set up, ingest your data into a corpus (with the API, or vectara-ingest), and build your own Agentic RAG application.
And please share your Agentic RAG creations with our community. We would love to hear what you build, and what features you would like us to add to vectara-agentic.