Proprietary LLM Architecture
Vectara uses zero-shot models in its LLM-powered search: a multi-model, neural-network-based information retrieval pipeline built on Vectara-created LLMs for fast, cost-effective retrieval with high precision and recall.
Complete Search Pipeline
|Vectara’s search-as-a-service platform provides a complete ML search pipeline. Its internally developed LLM architecture is trained to provide a higher level of relevance. Vectara maintains the entire search pipeline.||Pinecone’s database-as-a-service does not provide everything required to build full search functionality. You will need to use other technologies to build a full search function, and the burden is on the user to assemble, build, maintain, and operate the pipeline.|
Overall Ease of Use
|Using Vectara requires no specialized search engineering or AI/ML knowledge. You can index your first set of data and be up and running with Vectara in 5 minutes.||Without experience in building a search pipeline, there is a steep learning curve associated with using Pinecone along with the other technologies needed to build search.|
Set Up and Maintenance
Data Ingesting and Processing
|Vectara’s InstantIndex makes content more discoverable and easily processed. Developers ingest and process new data through a full-service search pipeline in under 1 second.||Once data is collected, it needs to be pre-processed, cleaned up, and segmented appropriately before it is indexed.|
||Embedding vectors are not part of the Pinecone product offering, and developers will need to use models from embedding providers. They must ensure such models remain consistent with the intent of the application and with the embedding providers, or relevance will be negatively affected.|
|Vectara offers a generous free version of its service. Users can upload 50 MB of data into their indexes and issue 15,000 queries each month for free. Upgrading to the paid plan is inexpensive.||Pinecone also offers a free plan, but it is very limited (single pod, single project, single environment). More importantly, Pinecone’s pricing represents only part of the cost of building a search function with it. Fees paid for hosting and to an embedding provider have to be considered as well. The cost of the higher development effort for set-up and maintenance should also be considered.|
Build for the Future
|Vectara’s LLM-powered platform that offers a complete, integrated information retrieval pipeline will enable you to take advantage of a continuously improving, fully integrated service.||To take advantage of features Pinecone adds in the future, users will have to build out and expand the other components of their search model as well.|
Pinecone.io is a cloud-based vector database-as-a-service for use within semantic search applications and data pipelines. Pinecone stores vector embeddings that are output from third-party models, such as those hosted at Hugging Face or delivered via APIs such as those offered by Cohere or OpenAI. At its core, the vector database provides fast and scalable approximate nearest neighbor search functionality (in the embedding space) that computes similarity scores between queries and relevant content.
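To make the idea of similarity scoring in an embedding space concrete, here is a minimal sketch in plain Python. It performs an exhaustive (exact) comparison using cosine similarity, not the approximate nearest neighbor indexing that a vector database uses at scale, and the vectors and document ids are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=3):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings" (hypothetical); real ones have far more dimensions.
vectors = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], vectors, k=2)
print([vec_id for vec_id, _ in results])
```

Exhaustive scoring like this grows linearly with the number of stored vectors, which is why large deployments rely on approximate indexes instead.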
Pinecone’s features include:
This allows users to represent queries and documents as vector embeddings, and use that representation to build semantic search applications that can handle large volumes of data and find the most similar vectors for a given query.
Pinecone supports indexing of billions of documents, and provides fast response times for similarity searches at massive scale. Pinecone can automatically scale up or down the database based on users’ needs. As the workload grows, Pinecone can allocate additional resources to handle the increased traffic. Conversely, if user traffic decreases, Pinecone can release resources to reduce costs.
The platform enables analytics and monitoring, giving insights into search performance, tracking system performance and user behavior, and identifying areas for optimization. The analytics and monitoring features of Pinecone.io include metrics, logging, alerts, and visualization.
This feature allows the user to personalize search results based on user-specific preferences or behavior. The personalization feature includes user profiles, dynamic ranking based on user-specific data (for example, you can prioritize items that a user has previously purchased or viewed), recommender systems that provide personalized recommendations to each user, and A/B testing to evaluate the effectiveness of different search ranking strategies.
The Pinecone database allows users to manage and update vector data using grouping, tagging, pattern matching and deduplication.
Pinecone can be integrated with a variety of languages and frameworks, including Python, TensorFlow, PyTorch, Scikit-learn, and Apache Spark. It also offers a range of SDKs for common programming languages, which makes it easy to integrate into your applications.
Pinecone provides secure access and storage for data, with features including identity verification, robot detection, fraud detection, and anomaly detection.
Pinecone is designed for use by search and data engineers who wish to assemble their own search pipeline for use cases like similarity search and wish to select (or implement) each element of the search pipeline, like the vector database and the embeddings model, and build and maintain the end-to-end search solution. They can use embeddings providers like OpenAI or Cohere to create embeddings for queries and content, store those in the Pinecone platform, and use its scalable similarity search to build the application.
Developers without experience in developing similarity search applications are likely to face a steep learning curve. Users of Pinecone might also expect to invest in implementing other elements required to build the search function and invest substantial time and effort in ongoing maintenance to keep the search pipeline efficient and performing well as they add data or update their data over time.
Some of the most common Pinecone search use cases are:
Use Pinecone to build a semantic text search application, convert text data into vector embeddings using an NLP transformer such as a sentence embedding model, then index and search through those vectors using Pinecone.
Use Pinecone to generate product recommendations for eCommerce based on vector embeddings.
Use Pinecone to build a search application that matches a query to multi-modal content including images or videos.
Use Pinecone to build a generative question-answering application which will retrieve relevant contexts for queries and pass these to a generative model to generate an answer backed by data sources.
Use Pinecone to build a chatbot. It provides a user-friendly interface and a set of tools for creating and customizing chatbots, as well as integrating them with different messaging platforms.
Use Pinecone to build text generation applications. The text generation model can be fine-tuned and customized for specific use cases, such as generating product descriptions, writing articles, or composing emails.
Use Pinecone for image generation for various applications, such as creating customized product images for e-commerce sites, generating new artwork or designs.
Building an end-to-end LLM-powered application is often more complex than it initially appears. Pinecone is just one component of the overall architecture, and the developer building the search application will have to pay careful attention to the integration of all the components (and their respective APIs) and how they interact with each other. Furthermore, after the initial build, maintaining and hosting that application requires additional resources.
A typical architecture using Pinecone is shown in figure 1 below:
1. Documents and/or other content need to be converted into embedding vectors. This is done for each document in turn (using an embedding model from a provider such as Hugging Face, Cohere or OpenAI), and the embedding vectors are then stored in Pinecone along with the documents they represent.
2. When a query is presented by the user, it is converted into a query embedding vector (using the same embedding provider); subsequently a similarity search is performed by Pinecone to match the query embedding to the closest document embeddings using nearest-neighbor search algorithms.
3. Once the top results are identified by Pinecone in step 2, the result set is sent back to the user for presentation, and potentially for additional generative processing like summarization.
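The three steps above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: `toy_embed` is a hypothetical stand-in for a real embedding provider (it just counts words from a tiny fixed vocabulary), and `InMemoryIndex` is a hypothetical stand-in for the vector database, not Pinecone's API.

```python
import math

VOCAB = ["password", "reset", "shipping", "rates", "refund"]  # hypothetical toy vocabulary

def toy_embed(text):
    """Stand-in for an embedding provider: counts of known vocabulary words."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class InMemoryIndex:
    """Stand-in for the vector database: keeps (embedding, document) per id."""

    def __init__(self, embed):
        self.embed = embed
        self.records = {}

    def upsert(self, doc_id, text):
        # Step 1: embed each document and store the vector with its text.
        self.records[doc_id] = (self.embed(text), text)

    def query(self, text, top_k=2):
        # Step 2: embed the query with the same function, then rank by similarity.
        q = self.embed(text)
        scored = [(cosine(q, vec), doc_id, doc)
                  for doc_id, (vec, doc) in self.records.items()]
        scored.sort(key=lambda item: item[0], reverse=True)
        # Step 3: hand back the top results for presentation or summarization.
        return [(doc_id, doc, score) for score, doc_id, doc in scored[:top_k]]

index = InMemoryIndex(toy_embed)
index.upsert("faq-1", "how to reset your password")
index.upsert("faq-2", "shipping rates for europe")
index.upsert("faq-3", "refund policy and shipping times")
hits = index.query("password reset", top_k=2)
print(hits[0][0])
```

Even in this toy form, the embedding function, the index, and the ranking logic are separate pieces that have to agree with each other, which is the integration burden discussed in this section.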
Pinecone.io provides a fast and scalable database for use in building similarity search; however, the developer using Pinecone must ensure that all the other components work in harmony, from both a systems engineering perspective and a modeling perspective.
A few key potential hurdles to keep in mind are:
The developer must make sure that all the components above are hosted properly in a high availability cloud environment, are secure, and that the communication interfaces between them are fast enough to maintain a quick response time to user queries. It may be easy to set up this kind of environment initially, but maintaining it and keeping it up-to-date often requires ongoing investment in time and resources.
Embedding vectors are not part of the Pinecone product offering, and developers will often use either open source models (e.g. from Hugging Face) or models from embedding API providers like Cohere or OpenAI, each with its own per-token, per-API-call, or hosted-inference cost. Importantly, care must be taken to ensure such models remain consistent with the intent of the application and with the embedding providers. For example, if a developer chooses Cohere for their embeddings model, adjustments to the system may be required when Cohere changes how its model works or upgrades it.
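One way to guard against silently mixing vectors from different model versions is to record which model produced the stored vectors and reject anything else. The sketch below is an assumption about how a developer might do this in their own code; the class, the error type, and the model labels such as `provider-x/embed-v2` are all hypothetical.

```python
class ModelMismatchError(ValueError):
    """Raised when a vector was produced by a different embedding model."""

class VersionedIndex:
    """Toy index that remembers which embedding model produced its vectors."""

    def __init__(self, model_id):
        self.model_id = model_id  # hypothetical label for the embedding model
        self.vectors = {}

    def upsert(self, doc_id, vector, model_id):
        # Refuse vectors from another model: they live in a different space.
        if model_id != self.model_id:
            raise ModelMismatchError(
                f"index built with {self.model_id!r}, got {model_id!r}; "
                "re-embed the corpus before mixing"
            )
        self.vectors[doc_id] = vector

index = VersionedIndex("provider-x/embed-v2")
index.upsert("doc-1", [0.1, 0.2], model_id="provider-x/embed-v2")
try:
    index.upsert("doc-2", [0.3, 0.4], model_id="provider-x/embed-v3")
    mismatch_detected = False
except ModelMismatchError:
    mismatch_detected = True
print(mismatch_detected)
```

A check like this does not remove the re-embedding work after a provider upgrade, but it turns a silent relevance regression into an explicit error.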
The developer should pay careful attention to which embedding models are used for the query and for the document: these are often not the same model, and using the same model for both may harm the relevance of results.
An often underestimated complexity is the need to pre-process, clean up, and segment documents into appropriately sized pieces before they are indexed. Proper segmentation of documents ensures search applications can return relevant results with the right granularity. This again falls to the developer and requires know-how and expertise to do right and to maintain over time.
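As a rough illustration of segmentation, the sketch below splits text into overlapping word windows. This is a deliberately simple technique chosen for the example; production pipelines often split on sentences, headings, or tokens instead, and the `max_words` and `overlap` values are arbitrary assumptions.

```python
def segment(text, max_words=50, overlap=10):
    """Split text into windows of at most max_words words sharing `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(12))
pieces = segment(sample, max_words=5, overlap=2)
for piece in pieces:
    print(piece)
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of indexing some text twice.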
Depending on how you plan to use Pinecone, you may need to build a front-end interface that allows users to interact with the search functionality. This could involve building a web application or integrating with an existing front-end framework.
The maximum metadata size per vector is 40KB, and the size of the vectors themselves is also limited. These limits can affect the type of data that can be stored and searched using Pinecone.io. For example, if your vectors are too large, you will need to reduce the dimensionality of your data before you can use Pinecone.io. This could involve techniques such as principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality of your vectors while preserving their similarity relationships.
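A pre-flight check on metadata size can catch oversized records before they are sent to the database. The sketch below assumes the 40KB figure means 40 × 1024 bytes and that size is measured on the compact UTF-8 JSON form; both are assumptions for illustration, since the exact accounting is not stated here.

```python
import json

# Assumption: "40KB" is read as 40 * 1024 bytes of compact UTF-8 JSON.
MAX_METADATA_BYTES = 40 * 1024

def metadata_size(metadata):
    """Approximate metadata size as UTF-8 bytes of its compact JSON encoding."""
    encoded = json.dumps(metadata, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))

def fits_limit(metadata, limit=MAX_METADATA_BYTES):
    """True when the metadata is within the assumed size limit."""
    return metadata_size(metadata) <= limit

small = {"title": "Q3 report", "lang": "en"}
large = {"body": "x" * 50_000}  # storing full document text as metadata
print(fits_limit(small), fits_limit(large))
```

Records that fail the check usually need their large fields moved to a separate store, with only a reference kept in the metadata.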
In summary, setting up and using Pinecone is not trivial in itself: it requires learning the platform and gaining real-world experience with it. On top of that, the developer must design and implement the interaction with the embeddings provider, implement document parsing and segmentation, connect the user interface, and host everything in the cloud.
Cost is another consideration; Pinecone’s pay-as-you-go pricing model can become surprisingly expensive as you scale up, especially as you introduce multiple collections (document indexes). In addition, when you use the Pinecone database, you need to factor in the additional cost of development, maintenance, hosting, and operation, as well as the per-token, per-API-call, or hosted-inference cost of using an embeddings provider like OpenAI or Cohere.
Finally, Pinecone is still an early-stage company, and its technology infrastructure has been prone to outages. In March 2023, Pinecone experienced a partial database outage that affected some of its customers’ indexes.
Vectara is LLM-powered search-as-a-service. The platform provides a complete ML search pipeline that includes extract, encode, index, retrieve, rerank, and calibrate functions. The platform is API-addressable. Developers can efficiently embed an NLP model for app and site search. It is a cloud-native, LLM-powered search platform built to serve developers at companies of all sizes and enable them to build or improve search functions in their sites and applications that operate at market-leading speeds. Using advanced research in AI, Vectara applies large language models to perform information retrieval (rather than using keywords) and deliver highly relevant results.
Vectara’s features include:
Vectara uses zero-shot models in its LLM-powered search: a multi-model, neural-network-based information retrieval pipeline built on Vectara-created LLMs for fast, cost-effective retrieval with high precision and recall.
Vectara is API-first. It features quick set-up and easy-to-use APIs in a platform that enables developers to easily build, debug, and test semantic search applications. This is a unified API set, with associated documentation and a playground, that allows full control over the entire pipeline, not just the database, the embeddings, the reranker, or the text extractor.
API-based features include:
Vectara’s InstantIndex feature allows developers to ingest and process new data through a full service search pipeline in sub-second time. Likewise, its File Upload API enables automated file extraction and processing.
Vectara can index most types of files and data. Vectara automatically extracts text from documents of nearly any type, with auto-detection of file formats and multi-stage extraction routines. Vectara can accurately extract text, index it, and create vector embeddings from documents in formats including PDF, Microsoft Word, Microsoft PowerPoint, OpenOffice, HTML, JSON, XML, email in RFC822, text, RTF, ePUB, or CommonMark. Vectara extracts text from tables, images and other document elements automatically.
Vectara’s LLM-powered re-ranking is an embedded feature. It is part of Vectara’s multi-model AI architecture and allows users to re-rank retrieved documents for further precision around a given query.
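The general retrieve-then-re-rank pattern can be sketched as follows. This is not Vectara's implementation: `overlap_score` is a hypothetical stand-in for a re-ranking model (it uses plain Jaccard word overlap), and the candidate list stands in for the output of a first-stage retriever.

```python
def overlap_score(query, text):
    """Hypothetical stand-in for a re-ranking model: Jaccard word overlap."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0

def rerank(query, candidates, scorer=overlap_score, top_n=3):
    """Re-order first-stage candidates (id, text) by the scorer, best first."""
    rescored = [(doc_id, text, scorer(query, text)) for doc_id, text in candidates]
    rescored.sort(key=lambda item: item[2], reverse=True)
    return rescored[:top_n]

# Candidates in the order a first-stage retriever might have returned them.
candidates = [
    ("d1", "pricing plans for enterprise customers"),
    ("d2", "reset a forgotten account password"),
    ("d3", "password rules and account security"),
]
reranked = rerank("reset account password", candidates, top_n=2)
print([doc_id for doc_id, _, _ in reranked])
```

The second stage only sees a short candidate list, so it can afford a more expensive scoring function than the first-stage retrieval.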
Another customization feature, Rules-based AI, allows you to define and control the responses you provide to users.
Vectara also provides generative AI features like its LLM-powered summarization.
Vectara is language agnostic. It enables multi-language search and cross-language search. A user on the same site can search in multiple languages to find results in each of those languages. Developers can also use Vectara to provide users with the ability to search in one language for content written in another language.
Vectara security features extend across the entirety of the full pipeline at all times.
Security features include:
Finally, Vectara’s admin console UI provides users and administrators with access to manage user accounts, API keys, corpora, index data, and queries. An administrator has visibility to all the elements, users, and activities across all components of the pipeline within a single UI.
Unlike Pinecone, which provides a database-as-a-service and must be used in conjunction with other search technologies, Vectara’s search-as-a-service platform provides a complete search platform. Its LLM architecture allows it to respond to complex queries and provide a higher level of relevance than traditional keyword algorithms. It offers unparalleled ease of setup and use, and it is completely managed by Vectara.
Vectara’s integrated solution means you don’t have to worry about connecting the relevance model with the embeddings yourself; you don’t have to worry about correct document segmentation; Vectara’s query and document embeddings are always consistent with each other and are designed to provide the best overall search experience with highly relevant neural search results.
Vectara supports many file-types and formats out of the box, whereas with an independent database product like Pinecone you need to pre-process these different formats as part of your development effort. This again requires additional development work and maintenance.
Vectara’s easy-to-use, API-based platform allows application developers and data engineers to get started quickly and easily on building their search function. Using Pinecone’s vector search database to build your search function requires more up-front development for set-up and more maintenance. It typically requires more experienced search engineers to develop the search function, and using it may involve a steep learning curve.
Vectara offers a free version of its service, and users can upload a large amount of data (50 MB) and issue a high volume of queries (15,000) each month without needing to move to a paid plan. The paid plan is a pay-as-you-go plan that scales based on usage and is also cost efficient. Pinecone also offers a free plan, but it is designed for trying Pinecone out or using it on a very limited basis (single pod, single project, single environment). More importantly, Pinecone’s pricing represents only part of the cost of building a search function with it. Fees paid for hosting and to an embedding provider have to be considered as well. The cost of development effort for set-up, operations, and maintenance should also be considered.
Some of the most common Vectara use cases for supporting marketing or enhancing your customer experience include:
Use Vectara to build a chatbot that understands questions no matter how they are asked and provides relevant answers, or empower your support team to quickly find answers to the most complex questions customers are asking.
Use Vectara to enable your website visitors to find what they are looking for no matter how they ask. Understand what they are asking for and provide it to them right away. Users can search across site content in many formats, including HTML, JSON, and PDF. Build loyalty and improve conversion rates by dramatically improving your customer experience with an LLM-powered search.
Use Vectara to provide an eCommerce search function across all the products in your online store, and increase conversion rates and transactions. Allow shoppers to find what they are looking for as well as related products and products that other users like them purchased.
Use Vectara to improve your customer experience by helping users find related content and discover new ideas that are relevant to their question.
Common Vectara IT use cases include:
Use Vectara to enable employees in their workplace to search across documents of all types - files, emails, and other important data - to efficiently find the information they need to do their jobs.
Use Vectara to enable users to search in one language across content written in other languages and get accurate, relevant results.
Use Vectara to find more relevant and accurate information in your research. See an example of using Vectara to conduct financial research and analysis based on a company’s quarterly financial reports.
Use Vectara to enable your team to search across your Slack application and find relevant information with great accuracy. See an example of neural search Vectara built for its Slack application.
Vectara has developed solutions for these common Developer use cases as well:
Use Vectara to build a content discovery function across your applications that allows users to find the content they are looking for by better understanding the query and providing answers based on concepts, not keywords.
Use Vectara to answer semantic questions with concise, accurate answers. Vectara first uses LLMs to understand what the user is looking for and return a relevant set of information, then uses another LLM to summarize that information into a single answer.
Use Vectara to create a real-time reporting database that is separate from your production database, and use that reporting DB to run your reporting queries and yield highly accurate results.
Vectara’s LLM-powered search-as-a-service offers a complete search pipeline that delivers unparalleled relevance. With Vectara you can build applications with cutting-edge, neural-network-powered large language models without having to fine-tune, scale, or manage any infrastructure. Vectara’s LLMs provide semantic and contextual understanding of prompts and queries. Vectara also has a full metadata engine, including the ability to automatically assign metadata such as detected language and snippet identification within the document, as well as user-defined metadata that might include user review scores on products in an eCommerce context, or source, author, or references in a research context.
Vectara is a search-as-a-service platform that allows even a team of one to easily operate a highly available, scalable, enterprise-grade service. Using Vectara requires no specialized search engineering or AI/ML knowledge to bring the most advanced search available to your site or applications. You can start instantly by connecting via simple REST or gRPC API endpoints. No language configuration, synonym management, stop-word lists, or typo handling is required. Vectara is fast at both ingest and query time.
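To illustrate how metadata such as detected language or review scores can narrow a result set, here is a generic sketch in Python. It does not use Vectara's filter syntax or API; the result records, the field names `lang` and `review_score`, and the helper functions are all hypothetical.

```python
def matches(metadata, filters):
    """True when every filter holds; a filter is an exact value or a predicate."""
    for key, expected in filters.items():
        if key not in metadata:
            return False
        value = metadata[key]
        ok = expected(value) if callable(expected) else value == expected
        if not ok:
            return False
    return True

def filter_results(results, filters):
    """Keep only the results whose metadata satisfies all filters."""
    return [r for r in results if matches(r["metadata"], filters)]

# Hypothetical search results carrying automatic and user-defined metadata.
results = [
    {"id": "p1", "metadata": {"lang": "en", "review_score": 4.6}},
    {"id": "p2", "metadata": {"lang": "de", "review_score": 4.9}},
    {"id": "p3", "metadata": {"lang": "en", "review_score": 3.1}},
]
kept = filter_results(results, {"lang": "en", "review_score": lambda s: s >= 4.0})
print([r["id"] for r in kept])
```

In a hosted service this kind of filtering is normally expressed in the query itself and applied server-side, so that fewer results need to be transferred.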
Vectara offers a very generous free version of its service, and users can upload a large amount of data (50 MB) into their indexes and use a high volume of queries (15,000) each month without needing to move to a paid plan. The paid plan is a pay-as-you-go plan that scales based on usage and is also cost efficient. Vectara supports near infinite logical data separation. If you want 500k buckets/indexes/corpora of data, Vectara supports that without any additional charge. In contrast, Pinecone offers a very limited free plan, but more importantly, the additional costs of hosting, using an embedding provider, and a higher level of development for set up and maintenance must be considered when using Pinecone as well.
Vectara’s LLM-powered platform will enable you to take advantage of many future capabilities as Vectara expands its platform to include continuously improved models for information retrieval and generative AI. Examples of API-addressable services include: related content, recommendations, classification, entity extraction, summarization, sentiment detection, form filling, alerting, action triggers, and iterative conversations.
You can get started using Vectara for free. Just open an account by signing up, log in, and create a corpus to start indexing your data.