Hacker guide
Introduction
Welcome Hackers! The purpose of this guide is to give you a quick start as well as a safe haven to help you maximize your time during the hackathon.
As we all know, time flies during a hackathon!
Vectara Overview
Vectara is the trusted platform for enterprise conversational AI and agentic RAG. As an end-to-end AI agent platform that can be deployed on-prem, in a VPC, or consumed as SaaS, Vectara delivers the shortest path to a correct answer or action while mitigating and correcting hallucinations and providing high-precision results.
Vectara is an API-based service, with support for data ingest, file upload, query, chat and agentic capabilities.
Vectara provides RAG-as-a-service or Agent-as-a-service. What does that mean?
It means we take care of all the heavy lifting during both data ingest (text extraction, chunking, embedding and storing in the vector store) as well as during query and retrieval (embedding of the query, neural search or hybrid search, reranking and calling the LLM) – so you don’t have to. Instead you can focus your efforts on building your RAG or AI Agent application.
Use-cases include:
- Conversational AI: provide direct answers to specific questions based on your data, and build agentic chatbots that can hold full back-and-forth conversations with humans. Here is a demo of conversational AI using Vectara.
- Document Generation: leverage your enterprise data to generate deeply specific responses in your organization's voice, improving both productivity and accuracy.
- Semantic (Neural) Search: build applications powered by fast and powerful semantic search that finds documents that match the user’s intent.
Getting Started with Vectara
You can get to know Vectara's features in about 5 minutes, and we have more resources to help you build apps with Vectara:
- The Quick Start guide shows you how to use the Vectara Console.
- Read the API recipes for common patterns with our APIs.
- Our API playground shows you how Vectara’s API requests and responses are structured.
- See “Additional resources” below for a comprehensive list.
General guidelines:
- Vectara offers a 30-day free trial complete with nearly all of the enterprise features of the platform.
- Use the Indexing API to ingest text data into a Vectara corpus or the File Upload API to upload files such as PDF, PPT, or DOC.
- Use the Query API to run queries against the ingested data. A query can be used to retrieve search results, generate a summary, or add a new message to a multi-turn chat.
- To build AI Agents, use the Agentic APIs.
- The Console provides a unified view of information about your Vectara account and corpora. You can also run example queries in the console for quick experimentation.
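To make the ingest-then-query flow above concrete, here is a minimal sketch in Python. The payload structures and endpoint paths are assumptions based on Vectara's v2 REST API; check the Indexing and Query API references for the exact schema.

```python
API_BASE = "https://api.vectara.io/v2"  # assumed v2 REST base URL

def build_document(doc_id, text):
    """Build a minimal document payload for the Indexing API (structure assumed)."""
    return {
        "id": doc_id,
        "type": "core",
        "document_parts": [{"text": text}],
    }

def build_query(question, corpus_key):
    """Build a minimal query payload: retrieval plus a generated summary (structure assumed)."""
    return {
        "query": question,
        "search": {
            "corpora": [{"corpus_key": corpus_key}],
            "limit": 10,
        },
        "generation": {
            "max_used_search_results": 5,
        },
    }

# Sending the requests (requires an API key; not executed here):
# import requests
# headers = {"x-api-key": "<YOUR_API_KEY>", "Content-Type": "application/json"}
# requests.post(f"{API_BASE}/corpora/my-corpus/documents",
#               headers=headers, json=build_document("doc-1", "Hello Vectara"))
# requests.post(f"{API_BASE}/query",
#               headers=headers, json=build_query("What is Vectara?", "my-corpus"))
```

Keeping payload construction separate from the HTTP call makes it easy to unit-test your request bodies before you spend hackathon time debugging live calls.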
Common Questions
What is HHEM?
- HHEM stands for “Hughes Hallucination Evaluation Model” and is an open-source model created by Vectara that evaluates hallucinations in LLM-generated responses.
- You can find the model card and the LLM leaderboard on Hugging Face.
- Blog posts:
- Introducing HHEM
- New and improved HHEM-2.1
A commercial-strength version of HHEM is integrated into the Vectara platform (called FCS, or Factual Consistency Score) and can be returned automatically with every Query API request, or called directly as an independent API call.
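As a sketch, a query payload that asks for the Factual Consistency Score alongside the generated answer might look like the following. The flag name below is an assumption, not the documented schema; confirm it against the Query API reference.

```python
def build_query_with_fcs(question, corpus_key):
    """Query payload requesting the Factual Consistency Score (FCS) with the answer.
    The flag name is an assumption; check the Query API reference for the real field."""
    return {
        "query": question,
        "search": {"corpora": [{"corpus_key": corpus_key}]},
        "generation": {
            "enable_factual_consistency_score": True,  # assumed flag name
        },
    }

# The response's summary is then expected to carry a factual-consistency score
# you can threshold on, e.g. flag answers whose score falls below some cutoff.
```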
What is the Vectara Hallucination Corrector?
The Vectara API provides a service to correct hallucinations. If a response from RAG or an AI Agent is suspected to be hallucinated, you can use the Vectara Hallucination Corrector (VHC) to correct it.
Here is the API, and you can also see this example notebook for usage examples.
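Conceptually, a correction request pairs the suspect output with the source documents it should be grounded in. The field names in this sketch are illustrative assumptions only; use the VHC API reference and example notebook for the real schema.

```python
def build_vhc_request(generated_text, source_documents):
    """Hypothetical payload for the Vectara Hallucination Corrector.
    Field names here are illustrative assumptions, not the documented schema."""
    return {
        "generated_text": generated_text,  # the suspect LLM output (assumed field)
        "documents": [{"text": d} for d in source_documents],  # grounding sources (assumed field)
    }
```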
How can I use Vectara Agents?
You can create agents using our Agents API. See this blog post for the launch announcement.
What is the Boomerang embedding model?
Boomerang is the name of Vectara’s state-of-the-art (SOTA) embedding model. It encodes text from your data as “vector embeddings” and powers the high-performance retrieval step of the RAG pipeline.
Read more about Boomerang, and the importance of using a good retrieval model for getting best results from RAG.
To learn more about embedding models, please see this short course.
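To build intuition for how vector embeddings power retrieval, here is a toy illustration. The 3-dimensional vectors are made up for the example; real models like Boomerang produce vectors with hundreds of dimensions, but the idea is the same: the retriever returns the documents whose embeddings are closest to the query's embedding.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" (numbers invented for illustration).
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "doc_about_cats": [0.8, 0.2, 0.1],   # semantically close to the query
    "doc_about_taxes": [0.0, 0.1, 0.9],  # semantically distant
}

best = max(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]))
# best == "doc_about_cats": retrieval returns the semantically closest document.
```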
What is Mockingbird?
Mockingbird is Vectara’s large language model (LLM), trained specifically to excel at RAG-related tasks. Because generation happens inside Vectara’s end-to-end pipeline, your data is never sent to a third-party LLM provider, which also improves data security.
Check out our blog posts to learn more about Mockingbird’s features and evaluation metrics.
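If you want to select Mockingbird explicitly as the generator for a query, the payload might look like this sketch. Both the preset field name and the preset value are assumptions; check the Query docs for the current model names.

```python
def build_query_with_mockingbird(question, corpus_key):
    """Query payload selecting Mockingbird as the generative model.
    The preset field and value are assumptions; see the Query docs for current names."""
    return {
        "query": question,
        "search": {"corpora": [{"corpus_key": corpus_key}]},
        "generation": {
            "generation_preset_name": "mockingbird-2.0",  # assumed preset name
        },
    }
```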
What is reranking and why is it important?
Reranking is the process of reordering the retrieved search results, and it happens between the retrieval and generation steps. Reranking can serve several goals; for example, a diversity reranker reduces redundancy among the search results fed to the LLM, so the generated response draws on a broader set of information and is more comprehensive.
To learn more about reranking and how to add a reranker to your application, check out this blog post and our reranker documentation page.
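One well-known diversity reranking technique is Maximal Marginal Relevance (MMR), which trades off relevance to the query against similarity to results already selected. This is a generic sketch of the technique, not Vectara's implementation; the similarity numbers are made up.

```python
def mmr_rerank(query_sim, pairwise_sim, k, lam=0.7):
    """Maximal Marginal Relevance reranking.
    query_sim[i]       - relevance of result i to the query
    pairwise_sim[i][j] - similarity between results i and j
    lam                - trade-off: 1.0 = pure relevance, 0.0 = pure diversity
    Returns the indices of the top-k results in reranked order."""
    candidates = list(range(len(query_sim)))
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Penalize results that are too similar to anything already picked.
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Results 0 and 1 are near-duplicates; MMR picks 0, then skips 1 in favor of 2.
relevance = [0.95, 0.94, 0.60]
sim = [[1.00, 0.98, 0.10],
       [0.98, 1.00, 0.12],
       [0.10, 0.12, 1.00]]
order = mmr_rerank(relevance, sim, k=2)
# order == [0, 2]
```

Without the redundancy penalty the top two results would be the two near-duplicates, and the generated answer would repeat itself instead of covering more ground.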
Can I stream the RAG response?
Yes, Vectara supports streaming output. You can stream the response simply by adding “stream_response”: true to your query body.
Check out the API reference or our Query docs to learn more.
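A minimal streaming sketch follows. The “stream_response”: true flag comes straight from the docs above; the endpoint path and the assumption that chunks arrive as line-delimited server-sent events should be verified against the Query API reference.

```python
def build_streaming_query(question, corpus_key):
    """Query payload with streaming enabled ("stream_response": true per the docs)."""
    return {
        "query": question,
        "search": {"corpora": [{"corpus_key": corpus_key}]},
        "generation": {},
        "stream_response": True,  # ask Vectara to stream the response
    }

# Consuming the stream (requires an API key; not executed here).
# Assuming line-delimited event chunks:
# import requests
# with requests.post("https://api.vectara.io/v2/query",
#                    headers={"x-api-key": "<YOUR_API_KEY>"},
#                    json=build_streaming_query("What is RAG?", "my-corpus"),
#                    stream=True) as resp:
#     for line in resp.iter_lines():
#         if line:
#             print(line.decode())
```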
Does Vectara provide an SDK?
Yes, we have both Python and TypeScript SDKs; both are currently in beta.