[Comparison table: Vectara vs. Pinecone, rated on Complete Search Pipeline, Overall Ease of Use, Set Up and Maintenance, Data Ingesting and Processing, Relevance, Cost, and Build for the Future.]
Pinecone.io is a cloud-based vector database as a service, designed to be included within semantic search applications and data pipelines. Pinecone supports the storage of vector embeddings produced by third-party models, such as those hosted on HuggingFace or delivered via APIs from Cohere or OpenAI. At its core, the vector database provides fast, scalable approximate nearest-neighbor search (in the embedding space) that computes similarity scores between queries and relevant content.
Pinecone is designed for search and data engineers who want to assemble their own search pipeline for use cases like similarity search, selecting (or implementing) each element of the pipeline, such as the vector database and the embedding model, and building and maintaining the end-to-end search solution themselves. They can use embedding providers like OpenAI or Cohere to create embeddings for queries and content, store those embeddings in the Pinecone platform, and use its scalable similarity search to build the application.
Developers without experience in building similarity search applications are likely to face a steep learning curve. Users of Pinecone should also expect to implement the other elements required to build the search function, and to invest substantial time and effort in ongoing maintenance to keep the search pipeline efficient and performing well as they add or update data over time.
Building an end-to-end LLM-powered application is often more complex than it initially appears. Pinecone is just one component of the overall architecture, and the developer building the search application will have to pay careful attention to the integration of all the components (and their respective APIs) and how they interact with each other. Furthermore, after the initial build, maintaining that application and hosting it requires additional resources and investment.
A typical architecture using Pinecone is shown in figure 1 below:
1. Documents and/or other content are converted into embedding vectors. This is done for each document in turn (using an embedding model provider such as HuggingFace, Cohere, or OpenAI), and the embedding vectors are then stored in Pinecone along with the documents they represent.
2. When the user presents a query, it is converted into a query embedding vector (using the same embedding provider); Pinecone then performs a similarity search to match the query embedding to the closest document embeddings using nearest-neighbor search algorithms.
3. Once the top results are identified by Pinecone in step 2, the result set is sent back to the user for presentation, and potentially for additional generative processing such as summarization.
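To make the moving parts concrete, here is a minimal sketch of the three steps above in Python. It assumes the pre-1.0 `openai` and classic `pinecone-client` packages; the index name, embedding model, sample documents, and credentials are illustrative, and the exact client calls may differ between library versions.

```python
import openai
import pinecone

openai.api_key = "OPENAI_API_KEY"                                       # placeholder
pinecone.init(api_key="PINECONE_API_KEY", environment="us-west1-gcp")   # placeholder

# Step 1: embed each document and store the vectors (plus the text) in a Pinecone index.
pinecone.create_index("docs", dimension=1536, metric="cosine")  # ada-002 embeddings are 1536-dim
index = pinecone.Index("docs")

documents = {
    "doc-1": "Quarterly revenue grew 12% year over year.",
    "doc-2": "The new model cuts inference latency in half.",
}
for doc_id, text in documents.items():
    emb = openai.Embedding.create(model="text-embedding-ada-002", input=text)["data"][0]["embedding"]
    index.upsert(vectors=[(doc_id, emb, {"text": text})])

# Step 2: embed the query with the same model and run nearest-neighbor search in Pinecone.
query = "How fast is the new model?"
q_emb = openai.Embedding.create(model="text-embedding-ada-002", input=query)["data"][0]["embedding"]
results = index.query(vector=q_emb, top_k=3, include_metadata=True)

# Step 3: hand the matched documents back to the application for presentation
# (or further generative processing such as summarization).
for match in results.matches:
    print(match.id, match.score, match.metadata["text"])
```

Even in this simplified form, the developer owns the glue between the embedding provider and the database, plus document parsing, hosting, and keeping the index in sync with the source data.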
In summary, setting up and using Pinecone is itself not trivial and requires learning the platform and gaining real-world experience with it. On top of that, designing and implementing the interaction with the embeddings provider, implementing document parsing and segmentation, connecting the user interface, and hosting everything in the cloud all add to the effort.
Cost is another consideration: Pinecone’s pay-as-you-go pricing model can become surprisingly expensive as you scale up, especially as you introduce multiple collections (document indexes). In addition, when you use the Pinecone database, you need to factor in the additional costs of development, maintenance, hosting, and operation, as well as the per-token, per-API-call, or hosted-inference cost of an embeddings provider like OpenAI or Cohere.
Finally, Pinecone is still an early-stage company, and its technology infrastructure has been prone to outages. In March of 2023, Pinecone experienced a partial database outage that affected some of their customers’ indexes.
Vectara is LLM-powered search-as-a-service. The platform provides a complete ML search pipeline that includes extract, encode, index, retrieve, rerank, and calibrate functions, and it is fully API-addressable, so developers can efficiently embed an NLP model for app and site search. It is a cloud-native, LLM-powered search platform built to serve developers at companies of all sizes and to enable them to build or improve search functions in their sites and applications that operate at market-leading speeds. Drawing on advanced AI research, Vectara applies large language models to perform information retrieval (rather than relying on keywords) and deliver highly relevant results.
Vectara’s features include:
Vectara uses zero-shot models in its LLM-powered search: a multi-model, neural-network-based information retrieval pipeline built on Vectara-created LLMs for fast, cost-effective retrieval with high precision and recall.
Vectara is API-first. It features quick setup and easy-to-use APIs in a platform that enables developers to easily build, debug, and test semantic search applications. This is a unified API set, with associated documentation and a playground, that allows full control over the entire pipeline, not just the database, the embedding model, the reranker, or the text extractor.
API-based features include:
Vectara’s InstantIndex feature allows developers to ingest and process new data through the full search pipeline in sub-second time. Likewise, its File Upload API enables automated file extraction and processing. (Both are sketched in the example after this feature list.)
Vectara can index most types of files and data. It automatically extracts text from documents of nearly any type, with auto-detection of file formats and multi-stage extraction routines. Vectara can accurately extract text, index it, and create vector embeddings from documents in formats including PDF, Microsoft Word, Microsoft PowerPoint, OpenOffice, HTML, JSON, XML, RFC 822 email, plain text, RTF, EPUB, and CommonMark. Vectara also extracts text from tables, images, and other document elements automatically.
Vectara’s LLM-powered re-ranking is an embedded feature. It is part of Vectara’s multi-model AI architecture and allows users to re-rank retrieved documents for further precision around a given query.
Another customization feature, Rules-based AI, allows you to define and control the responses you provide to users.
Vectara also provides generative AI features like its LLM-powered summarization.
Vectara is language agnostic. It enables multi-language search and cross-language search. A user on the same site can search in multiple languages to find results in each of those languages. Developers can also use Vectara to provide users with the ability to search in one language for content written in another language.
Vectara’s security features extend across the entire pipeline at all times.
Security features include:
Finally, Vectara’s admin console UI provides users and administrators with access to manage user accounts, API keys, corpora, index data, and queries. An administrator has visibility into all the elements, users, and activities across all components of the pipeline within a single UI.
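As a rough illustration of the API-first workflow, the sketch below indexes content through the InstantIndex and File Upload APIs from Python. It assumes Vectara’s v1 REST endpoints (/v1/index and /v1/upload) and an index-capable API key; the customer ID, corpus ID, document fields, and file name are placeholders, so consult the current API reference before relying on exact field names.

```python
import requests

CUSTOMER_ID = 123456789   # placeholder Vectara customer ID
CORPUS_ID = 1             # placeholder corpus ID
API_KEY = "zqt_xxxxx"     # placeholder index-capable API key
HEADERS = {"customer-id": str(CUSTOMER_ID), "x-api-key": API_KEY}

# InstantIndex: push a structured document straight into the corpus.
doc_request = {
    "customerId": CUSTOMER_ID,
    "corpusId": CORPUS_ID,
    "document": {
        "documentId": "report-2023-q1",
        "title": "Q1 2023 Report",
        "section": [{"text": "Revenue grew 12% year over year in Q1 2023."}],
    },
}
resp = requests.post("https://api.vectara.io/v1/index", json=doc_request, headers=HEADERS)
resp.raise_for_status()

# File Upload API: let Vectara extract, segment, and index a file (PDF, Word, HTML, ...).
with open("q1-report.pdf", "rb") as f:
    resp = requests.post(
        f"https://api.vectara.io/v1/upload?c={CUSTOMER_ID}&o={CORPUS_ID}",
        files={"file": f},
        headers=HEADERS,
    )
resp.raise_for_status()
```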
Some of the most common Vectara use cases for supporting marketing or enhancing your customer experience include:
Use Vectara to build a chatbot that understands questions no matter how they are asked and provides relevant answers, or empower your support team to quickly find answers to the most complex questions customers are asking.
Use Vectara to enable your website visitors to find what they are looking for no matter how they ask. Understand what they are asking for and provide it to them right away. Users can search across site content in many formats, including HTML, JSON, and PDF. Build loyalty and improve conversion rates by dramatically improving your customer experience with LLM-powered search.
Use Vectara to provide an eCommerce search function across all the products in your online store, and increase conversion rates and transactions. Allow shoppers to find what they are looking for as well as related products and products that other users like them purchased.
Use Vectara to improve your customer experience by helping users find related content and discover new ideas that are relevant to their question.
Common Vectara IT use cases include:
Use Vectara to enable employees in their workplace to search across documents of all types - files, emails, and other important data - to efficiently find the information they need to do their jobs.
Use Vectara to enable users to search in one language across content written in other languages and get accurate, relevant results.
Use Vectara to find more relevant and accurate information in your research. See an example of using Vectara to conduct financial research and analysis based on a company’s quarterly financial reports.
Use Vectara to enable your team to search across your Slack application and find relevant information with great accuracy. See an example of neural search Vectara built for its Slack application.
Vectara has developed solutions for these common Developer use cases as well:
Use Vectara to build a content discovery function across your applications that allows users to find the content they are looking for by better understanding the query and providing answers based on concepts, not keywords.
Use Vectara to answer semantic questions with concise, accurate answers. Vectara first uses LLMs to understand what the user is looking for and return a relevant set of information, then uses another LLM to summarize that information into a single answer (see the sketch after this list).
Use Vectara to create a real-time reporting database that is separate from your production database, and use that reporting DB to run your reporting queries and yield highly accurate results.
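The retrieve-then-summarize flow described above can be expressed as a single query request. The hedged sketch below uses Vectara’s /v1/query endpoint with a summary block; the field names ("summary", "maxSummarizedResults", "responseLang") and the response structure follow Vectara’s documentation at the time of writing and may have changed, and the credentials are placeholders.

```python
import requests

CUSTOMER_ID = 123456789   # placeholder
CORPUS_ID = 1             # placeholder
HEADERS = {"customer-id": str(CUSTOMER_ID), "x-api-key": "zqt_xxxxx"}  # query-capable key

query_request = {
    "query": [{
        "query": "What drove revenue growth last quarter?",
        "numResults": 10,
        "corpusKey": [{"customerId": CUSTOMER_ID, "corpusId": CORPUS_ID}],
        # Ask Vectara to run a second LLM pass that condenses the top results
        # into a single answer (field names assumed from the docs).
        "summary": [{"maxSummarizedResults": 5, "responseLang": "en"}],
    }]
}
resp = requests.post("https://api.vectara.io/v1/query", json=query_request, headers=HEADERS)
resp.raise_for_status()
data = resp.json()

answer = data["responseSet"][0]["summary"][0]["text"]            # the generated answer
passages = [r["text"] for r in data["responseSet"][0]["response"]]  # the retrieved passages
print(answer)
```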
Vectara’s LLM-powered search-as-a-service offers a complete search pipeline that delivers unparalleled relevance. With Vectara you can build applications with cutting-edge, neural-network-powered large language models without having to fine-tune, scale, or manage any infrastructure. Vectara’s LLMs provide semantic and contextual understanding of prompts and queries. Vectara also has a full metadata engine, including the ability to automatically assign metadata such as detected language and snippet identification within the document, as well as user-defined metadata, which might include user review scores on products in an ecommerce context, or source, author, or references in a research context.
Vectara is a Search-as-a-Service platform that allows even a team of one to easily operate a highly available, scalable, enterprise-grade service. No specialized search engineering or AI/ML knowledge is required to use the most advanced search available anywhere in your sites or applications. You can start instantly by connecting via simple REST or gRPC API endpoints. No language configuration, synonym management, stop words, or typo handling is required. Vectara is unequivocally fast, at both ingestion and query time.
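For example, a plain query is a single REST call, and the metadata engine can narrow results to documents matching user-defined attributes (such as only highly rated products). In the short sketch below, the metadataFilter expression syntax and the requirement that "rating" be configured as a filterable attribute on the corpus are assumptions based on Vectara’s documentation; the credentials are placeholders.

```python
import requests

CUSTOMER_ID = 123456789   # placeholder
CORPUS_ID = 1             # placeholder
HEADERS = {"customer-id": str(CUSTOMER_ID), "x-api-key": "zqt_xxxxx"}

query_request = {
    "query": [{
        "query": "lightweight waterproof hiking jacket",
        "numResults": 5,
        "corpusKey": [{
            "customerId": CUSTOMER_ID,
            "corpusId": CORPUS_ID,
            # Restrict results using user-defined metadata; "rating" is assumed to be
            # a filterable document attribute configured on the corpus.
            "metadataFilter": "doc.rating > 4",
        }],
    }]
}
data = requests.post("https://api.vectara.io/v1/query", json=query_request, headers=HEADERS).json()
for hit in data["responseSet"][0]["response"]:
    print(hit["score"], hit["text"])
```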
Vectara offers a very generous free version of its service: users can upload a large amount of data (50 MB) into their indexes and run a high volume of queries (15,000) each month without needing to move to a paid plan. The paid plan is pay-as-you-go, scales with usage, and is also cost efficient. Vectara supports near-infinite logical data separation: if you want 500k buckets/indexes/corpora of data, Vectara supports that without any additional charge. In contrast, Pinecone offers a very limited free plan, and, more importantly, the additional costs of hosting, using an embedding provider, and a higher level of development effort for setup and maintenance must be considered when using Pinecone.
Vectara’s LLM-powered platform will enable you to take advantage of many future capabilities as Vectara expands its platform to include continuously improved models for information retrieval and generative AI. Examples of API-addressable services include: related content, recommendations, classification, entity extraction, summarization, sentiment detection, form filling, alerting, action triggers, and iterative conversations.
You can get started using Vectara for free. Just open an account by signing up, logging in, and creating a corpus to start indexing your data.
Sign Up