July 11, 2023 by Vivek Sourabh | 6 min read
For app developers and product managers, implementing hybrid search into your application can be a game-changer for findability and improved user experiences. We break down the process of implementation, best practices, and considerations for when you want to bring hybrid search to your products and users.
September 19, 2023 by Paul Wozniczka
In today’s age of data-driven decision making, businesses want to embed advanced semantic search capabilities within their applications. Traditional keyword-based search methods might not be enough to deliver accurate and relevant results for complex queries. That’s where hybrid search comes in.
Hybrid search combines conventional keyword searches with sophisticated NLP methods to grasp the context and intent of search queries, revolutionizing and improving the user experience by understanding the meaning of a user’s search.
Implementing hybrid search has the potential to give users an enhanced personalized experience by understanding not just what they type, but what they mean. So we’re going to explain how exactly your organization can implement hybrid search into your product to empower users to find exactly what they want.
Let’s look at what the term hybrid search actually means. Hybrid search blends the efficiency of traditional keyword search with advanced NLP techniques to understand the context and underlying intent, that is, the semantic meaning, of search queries. This augmentation moves search closer to how a human reads a question, by picking up on nuances and the context in which a search query is made.
Traditional search methods rely on Boolean logic and exact keyword matching, or close variations of it. These methods work well for straightforward searches on product SKUs, serial numbers, and other fixed identifiers. But what about homonyms, or asking a question within a particular context?
Keyword search needs some help to answer those more complex queries. With that help, searching starts to feel like a conversation with a search engine that understands the intent behind your query.
That’s hybrid search – empowering users to find information by providing keywords, context, and nuances to a query.
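One common way to picture this blend is a weighted sum of a keyword score and a semantic score. The sketch below is an illustration under stated assumptions, not a description of how Vectara scores results: `keyword_score`, `hybrid_rank`, the `alpha` weight, the document ids, and the semantic scores are all hypothetical, and the semantic scores stand in for what an embedding model would produce.

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear verbatim in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & text_terms) / len(query_terms)


def hybrid_rank(query, docs, semantic_scores, alpha=0.3):
    """Rank document ids by alpha * keyword score + (1 - alpha) * semantic score.

    alpha=1.0 is pure keyword matching; alpha=0.0 is pure semantic matching.
    """
    scored = []
    for doc_id, text in docs.items():
        blended = (alpha * keyword_score(query, text)
                   + (1 - alpha) * semantic_scores[doc_id])
        scored.append((blended, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical mini-corpus and hypothetical semantic similarity scores.
docs = {
    "goalkeeper_rule": "goalkeeper substitution during playoff games",
    "overtime_rule": "playoff overtime format",
    "stick_rule": "stick measurement",
}
semantic_scores = {"goalkeeper_rule": 0.9, "overtime_rule": 0.4, "stick_rule": 0.1}

ranking = hybrid_rank("playoff goalie", docs, semantic_scores)
print(ranking)
```

Both playoff documents match the keyword "playoff" equally, so the semantic score is what lifts the goalkeeper rule to the top for a query that says "goalie" rather than "goalkeeper".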
Hockey season is just around the corner, so let’s look at a real-world example with the NHL rulebook and show how hybrid search works in Vectara. The NHL has complex rules that differ during preseason, regular season, and the playoffs. These nuances often regulate policies for skaters and goalkeepers but also how suspensions and certain misconduct penalties are handled.
Let’s upload the NHL rulebook into Vectara, query the data, and ask the question: What are differences between preseason, season, and playoff games?
In this example, Vectara uses hybrid search to match keywords and also capture intent. The top of the results page also provides a summarized answer to the question, with several references to specific parts of the NHL rulebook.
The next two search results continue showing more nuanced answers.
This time, the search found a rule regarding stick infractions during the regular season. It then showed additional information about these stick infractions during the playoffs. Each result is related to rules and guidelines that differ based on the type of season.
If you continued reading through the results, you would learn about other nuances in the complex NHL rules, and that’s just the first page of results!
So, how can you implement hybrid search to achieve the same outcomes?
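The journey to implement hybrid search has four typical phases: data collection and preparation, knowledge graphs, text analysis and comprehension, and machine learning for semantic understanding.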
Taking an agile approach in each phase allows for continuous refinement as you move forward.
The hybrid search engine is only as good as the data that it can access. Gathering comprehensive and high-quality data is crucial to ensure that you have a clean and well-structured database. Why? Because poor data quality will negatively impact the search outcomes. Since hybrid search focuses on augmenting keywords with understanding context and intent, well-structured data helps enable good search outcomes for your users. Depending on the application you want to build, you can source data from product descriptions, user reviews, metadata, question-answer pairs, and so on.
An important consideration is that not all of the collected data is useful, so you need to determine how to clean and remove irrelevant data. Check for consistency and uniformity when dealing with dates and other measurement units. Also, decide how to handle null or missing values. Basically, consider all the sources of your data and how to ensure alignment to minimize confusion.
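As a minimal sketch of these cleaning steps, the snippet below normalizes dates to one format, drops records with missing text, and removes duplicates. The record fields (`id`, `text`, `updated`) and the list of accepted date formats are assumptions made for illustration; your own data will need its own rules.

```python
from datetime import datetime

# Assumed input formats; extend this tuple to match your own sources.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y")


def normalize_date(raw):
    """Return the date as an ISO 8601 string, or None if it cannot be parsed."""
    if not raw:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_record(record):
    """Return a cleaned copy of the record, or None if it has no usable text."""
    text = (record.get("text") or "").strip()
    if not text:
        return None
    return {
        "id": record["id"],
        "text": " ".join(text.split()),  # collapse runs of whitespace
        "updated": normalize_date(record.get("updated")),
    }


def clean_dataset(records):
    """Clean every record, skipping empty ones and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        item = clean_record(record)
        if item is None or item["text"].lower() in seen:
            continue
        seen.add(item["text"].lower())
        cleaned.append(item)
    return cleaned


raw_records = [
    {"id": 1, "text": "Rule 10:  Sticks ", "updated": "09/19/2023"},
    {"id": 2, "text": None, "updated": "2023-09-19"},
    {"id": 3, "text": "rule 10: sticks", "updated": "19 September 2023"},
]
print(clean_dataset(raw_records))
```

Here the second record is dropped because its text is missing, and the third is dropped as a duplicate of the first once whitespace and case are normalized.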
Building a knowledge graph, or utilizing an existing one, assists you in creating a structured view of the data. A knowledge graph specifically helps you represent relationships between different data types. For example, entities like products and categories are connected by relationships like “is related to” and “is a type of.” Knowledge graphs can also help the search engine make connections, like understanding that “apple” could refer to a fruit or a company, depending on context.
For more information about knowledge graphs, see An Introduction to Knowledge Graphs by Stanford AI.
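To make the "apple" example concrete, here is a toy knowledge graph stored as subject, relation, object triples, with a naive disambiguation step that picks the entity whose neighbors overlap most with the other words in the query. The `KnowledgeGraph` class, the `disambiguate` helper, and the entity names are hypothetical; production knowledge graphs are far richer than this sketch.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny in-memory graph of (subject, relation, object) triples."""

    def __init__(self):
        self._edges = defaultdict(set)  # subject -> {(relation, object)}

    def add(self, subject, relation, obj):
        self._edges[subject].add((relation, obj))

    def objects(self, subject, relation):
        """All objects linked to the subject by the given relation."""
        return {o for r, o in self._edges.get(subject, set()) if r == relation}

    def neighbors(self, subject):
        """All objects linked to the subject by any relation."""
        return {o for _, o in self._edges.get(subject, set())}


def disambiguate(graph, candidates, context_terms):
    """Pick the candidate entity whose neighbors overlap most with the context."""
    context = {term.lower() for term in context_terms}

    def overlap(entity):
        return len({n.lower() for n in graph.neighbors(entity)} & context)

    return max(candidates, key=overlap)


kg = KnowledgeGraph()
kg.add("Apple (fruit)", "is a type of", "fruit")
kg.add("Apple (fruit)", "is related to", "pie")
kg.add("Apple (company)", "is a type of", "company")
kg.add("Apple (company)", "is related to", "iphone")

candidates = ["Apple (fruit)", "Apple (company)"]
print(disambiguate(kg, candidates, "apple iphone price".split()))
print(disambiguate(kg, candidates, "apple pie recipe".split()))
```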
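The current tools used for text analysis and comprehension include NLP techniques like tokenization, lemmatization, and named entity recognition. Let’s break these down individually. Tokenization splits text into smaller units such as words or subwords. Lemmatization reduces each word to its dictionary form, so that “games” and “game” are treated as the same term. Named entity recognition identifies and labels mentions of things like people, organizations, and places.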
For more information about these techniques, see What is natural language processing? by IBM.
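The toy pipeline below shows the shape of these three steps on a single sentence. It is deliberately simplified: the `LEMMAS` and `ENTITIES` lookup tables are hypothetical hand-written stand-ins, whereas real systems rely on trained models or full linguistic resources rather than small dictionaries.

```python
import re

# Hypothetical lookup tables, standing in for a real lemmatizer and NER model.
LEMMAS = {"penalties": "penalty", "are": "be", "handled": "handle", "games": "game"}
ENTITIES = {"nhl": "ORGANIZATION", "vectara": "ORGANIZATION"}


def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text)


def lemmatize(token):
    """Map a token to its dictionary form (lowercased if no entry is known)."""
    lowered = token.lower()
    return LEMMAS.get(lowered, lowered)


def tag_entities(tokens):
    """Return (token, label) pairs for tokens found in the entity table."""
    return [(t, ENTITIES[t.lower()]) for t in tokens if t.lower() in ENTITIES]


tokens = tokenize("NHL penalties are handled.")
lemmas = [lemmatize(t) for t in tokens]
entities = tag_entities(tokens)
print(tokens)
print(lemmas)
print(entities)
```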
Each of these techniques has a role to play in understanding the context and intent of user queries. With Vectara, we handle all of these techniques for you behind the scenes. We’ll do the heavy lifting and you just upload your data.
The final phase of the hybrid search implementation process involves implementing machine learning algorithms that understand relationships between words and context, even if some words were not explicitly stated in the user query.
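A common way to capture relationships between words that never co-occur literally is to represent them as vectors (embeddings) and compare those vectors with cosine similarity. The sketch below uses hand-made three-dimensional vectors purely for illustration; the `EMBEDDINGS` values are invented, and a real system would obtain vectors from a trained model with hundreds of dimensions.

```python
import math

# Invented toy embeddings; dimensions loosely mean
# [goaltending, discipline, scheduling].
EMBEDDINGS = {
    "goalie": [0.9, 0.1, 0.0],
    "goalkeeper": [0.85, 0.15, 0.05],
    "suspension": [0.05, 0.9, 0.1],
    "preseason": [0.0, 0.1, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(word, k=1):
    """Return the k vocabulary words closest to the given word."""
    query = EMBEDDINGS[word]
    scored = [
        (cosine_similarity(query, vector), other)
        for other, vector in EMBEDDINGS.items()
        if other != word
    ]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


print(most_similar("goalie"))
```

The query word "goalie" shares no characters-based match with "goalkeeper" that a keyword engine would rely on, yet the two vectors point in nearly the same direction, which is the kind of relationship this phase is meant to capture.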
This journey is complex. We began with tips about cleaning and preparing data. We moved to talking about knowledge graphs and how they help you create a structure from that prepared data. The next leap was into multiple techniques in text analysis and comprehension that break down the data even further. The final stage was using machine learning to understand the semantics between the broken-down words and text. Let’s look at some of the tools.
Now that you know about the overall journey, we’re going to talk about how you can start building a hybrid search engine. It begins with selecting the right tools and libraries. Let’s look at some popular options along with some pros and cons:
The right tools depend on whether you want to build a chatbot or buy a platform that requires less development effort on your end. When you build a chatbot from scratch, you need a comprehensive set of libraries and frameworks that cover every aspect of hybrid search. On top of that, you need engineering talent with the knowledge and skills to work with those elements for a successful hybrid search implementation.
On the other hand, buying a pre-built platform whose functionalities already integrate well, like Vectara’s robust hybrid search, may be the optimal choice for companies that do not want to deal with the complexity of building from scratch, the cost of assembling a team of experts, or long lead times.
Vectara is optimized for retrieval, summarization, and generation from end to end. Instead of only having one Large Language Model, we optimize different neural systems at different stages. In fact, we pick the best option for you at every stage. You just upload your data into a corpus, like a container, and then ask questions about your data. It can be that simple.
Data quality is the linchpin that holds everything together. Bad quality data leads to incorrect and irrelevant search results, which affects the user experience and ultimately your bottom line. You need a rigorous multi-step process to gather, clean, and validate datasets as part of your regular data maintenance cycle. Use available tools to help remove inconsistencies and errors to ensure that only refined data gets ingested.
Ongoing fine-tuning sets apart a functional search engine from an amazing search engine. As the application and data both continue to grow, the model should learn and adapt as necessary. The Vectara advantage is that we select the best models for you behind the scenes, and we never train on your data, so it remains private and secure. Vectara continues learning when you provide updated data.
Hybrid search can interpret ambiguous or poorly structured queries using techniques like query expansion and query disambiguation. Query expansion involves broadening the search to include synonyms and related terminology. Query disambiguation takes an ambiguous question – which can have many meanings – and transforms it into a standalone question.
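Query expansion in its simplest form can be sketched as a synonym lookup. The `SYNONYMS` table and `expand_query` function below are hypothetical and hand-written for illustration; real systems typically derive related terms from a thesaurus, a knowledge graph, or a language model.

```python
# Hypothetical synonym table for the hockey example.
SYNONYMS = {
    "goalie": ["goalkeeper", "goaltender"],
    "ban": ["suspension"],
}


def expand_query(query):
    """Return the original query terms followed by any known synonyms."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


print(expand_query("goalie ban"))
```

The original terms stay first so that a downstream ranker can still weight exact matches more heavily than expanded ones, which is a design choice rather than a requirement.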
Should you opt for an out-of-the-box solution, Vectara utilizes machine learning and natural language processing techniques to decipher user intent, even when the queries are less than clear.
Once the hybrid search system is implemented, measuring the performance of your search becomes important to ensure users keep getting the right answers. Remember that you need to find the disconnect between what users intend to find and what your search engine understands. We recommend utilizing user surveys because they provide direct feedback from users, but you can also look into other relevance feedback mechanisms and testing. If you notice that a particular query yields irrelevant results, then you know this is a prime area for improvement, and you may have a problem with the data.
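For the testing side, two widely used offline metrics are precision at k and mean reciprocal rank. The sketch below assumes you have, for each test query, the ranked result ids your engine returned and a set of ids that a person judged relevant; the function names and the sample data are illustrative only.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Share of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / position of the first relevant result, or 0.0 if none is found."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Illustrative judgments for two test queries.
runs = [
    (["rule_a", "rule_b", "rule_c"], {"rule_a", "rule_c"}),
    (["rule_x", "rule_a"], {"rule_a"}),
]
print(precision_at_k(runs[0][0], runs[0][1], k=2))
print(mean_reciprocal_rank(runs))
```

Queries with a low reciprocal rank are the ones where the first useful answer is buried, which makes them natural candidates for the closer look at data quality described above.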
At Vectara, we distinguish ourselves through a series of hyper-optimized functionalities tailored for different aspects of the search pipeline – retrieval, summarization, generation, all the way from end to end.
We don’t just have one large language model to do everything. Instead, we optimize different neural systems at different stages, which allows us to be faster and more reliable than using one LLM to do everything. Vectara’s hybrid search is engineered to return relevant, semantically matched results that keyword search can’t deliver alone.
Using the Vectara platform involves our straightforward APIs that work with minimal intrusion on your existing architecture. Create a corpus, add some data, and then use the APIs to have conversations with your data. For businesses deciding where to invest in Generative AI, Vectara presents a compelling case for buying an end-to-end platform that helps builders get up and running quickly.
Plus, our solution transcends language barriers, enabling a truly global search experience. When you choose Vectara, you’re not just investing in a search-as-a-service platform – you’re investing in a holistic search ecosystem that adapts, scales, and resonates with the end-user’s needs and elevates your product from simple search to conversations and question answering.
Take the leap with Vectara and transform your in-app search into an unparalleled user experience.
Sign up for a free plan today.