Boosting eCommerce Conversions with Semantic Search

The world of eCommerce search, built on keyword-based retrieval, has been well established and relatively stable for many years. Advances to the state of the art have typically come with operationally expensive activities such as language management, feature generation, and taxonomy management. But recent advances in Large Language Models (LLMs) are making it possible to achieve search relevance metrics, and therefore sales conversions, that were not attainable before.

This is especially pronounced in the case of eCommerce Marketplaces, where data within product catalogs is notoriously inconsistent and incomplete because it comes from an ecosystem of external vendors where data quality is difficult to enforce. Thankfully, Vectara’s ability to understand the meaning of data and the intent of users makes it an ideal hybrid search tool to help eCommerce companies achieve these improved metrics and business outcomes. Even better, because Vectara takes care of the entire infrastructure pipeline – from extraction to embedding generation to vector storage to retrieval to reranking – developers simply need to call the Vectara APIs to add their data and to run their users’ searches. It’s as simple as that.

This blog post explains how Applaudo, an IT Solution Provider with expertise in building AI-powered applications, enhanced an existing eCommerce Marketplace using Vectara’s LLM-based semantic search platform.

Custom Feature Vectors and Keyword Search Approach

We faced a challenge that seemed simple at first: organize and standardize a database for a marketplace where new retailers were constantly joining to offer products. The problem arose when we discovered that the same products were described differently by each retailer.

We initially tried to solve this by developing a conventional keyword-based search engine. The difficulty was that not all products had keywords, and different retailers used varying conventions or classifications for their products. Since we lacked a common “taxonomy” for identical products across different retailers, we decided to train a keyword-based model using feature vectors.

Figure 1: Simple representation of doing eCommerce product search using a keyword-based approach.

Our goal wasn’t to always place the exact product as the first suggestion, but to have it within the top three results. Our preliminary proof of concept achieved an accuracy of 60%, but when we tested it in the production environment, accuracy dropped to 40%. The data in production was messy, and maintaining the model’s effectiveness became an ongoing challenge as new retailers joined with differing keywords and product descriptions.
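To make the "within the top three" goal concrete, here is a minimal sketch of how such a metric could be computed. It assumes that accuracy means the share of test queries whose expected product appears among the first three results; the function name and data layout are hypothetical and are not taken from the Applaudo implementation.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_results: Sequence[Sequence[str]],
    expected: Sequence[str],
    k: int = 3,
) -> float:
    """Fraction of queries whose expected product is in the first k results.

    ranked_results[i] is the ordered list of product IDs returned for query i,
    and expected[i] is the product ID a human judged to be the right answer.
    """
    if len(ranked_results) != len(expected):
        raise ValueError("ranked_results and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(
        1
        for results, target in zip(ranked_results, expected)
        if target in results[:k]
    )
    return hits / len(expected)
```

A metric like this can be tracked on a fixed set of labeled queries, so that the proof of concept and the production system are compared on the same footing.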

For example, assume that a user searches for cherry coke 300ml, and the product catalog has entries for cherry juice 300ml, pepsi wild cherry, sprite 300ml, and others with similar keywords. Since the keyword approach essentially matches as many keywords as possible, the user ends up with products that share many keywords with the query but are not what is actually wanted: a Coke. While a weighted keyword approach is possible (e.g. weighting the brand name more heavily than other keywords), it is very difficult and costly to manage across an entire product catalog, because you end up with hundreds or thousands of different weighting rules.
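The cherry coke example can be reproduced with a few lines of code. This is a simplified sketch that scores products by counting shared whitespace-separated tokens, not the feature-vector model described above. The catalog entry "coca-cola cherry 0.3l" is a hypothetical listing added to stand in for the product the user actually wants.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it on whitespace."""
    return set(text.lower().split())


def keyword_overlap(query: str, product: str) -> int:
    """Count how many tokens the query and the product name share."""
    return len(tokenize(query) & tokenize(product))


def rank_by_overlap(query: str, catalog: List[str]) -> List[Tuple[str, int]]:
    """Order products by shared-token count, highest first (ties keep catalog order)."""
    scored = [(product, keyword_overlap(query, product)) for product in catalog]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


catalog = [
    "cherry juice 300ml",
    "pepsi wild cherry",
    "sprite 300ml",
    "coca-cola cherry 0.3l",  # hypothetical listing for the intended product
]

ranking = rank_by_overlap("cherry coke 300ml", catalog)
for product, score in ranking:
    print(f"{score}  {product}")
```

In this toy setup the juice wins because it shares two tokens with the query, while the intended product shares only one: the retailer wrote "coca-cola" instead of "coke" and "0.3l" instead of "300ml". Fixing this with hand-tuned weights or synonyms is exactly the rule maintenance burden described above.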

Semantic Search using Vector Embeddings Approach

Realizing the need for a better solution, we explored semantic search using vector embeddings. This approach takes advantage of the fact that vector embeddings can represent the meaning inherent in a piece of text. When combined with extraction, storage, and retrieval techniques, this provides a powerful semantic search capability.

Figure 2: A simple representation of how semantic search operates. The user’s query is processed through a semantic search algorithm, which uses a vector space model to determine product suggestions based on vector (i.e. semantic) similarity.
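The vector-similarity step can be sketched in a few lines. The vectors below are hand-made toy values, not the output of any real embedding model, and the five dimensions (cola, cherry flavor, juice, lemon-lime, Coca-Cola brand) are assumptions chosen purely for illustration. Real embeddings have hundreds of dimensions with no such readable meaning.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    product_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Order products by cosine similarity to the query, most similar first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in product_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy dimensions: [cola, cherry flavor, juice, lemon-lime, Coca-Cola brand]
product_vecs = {
    "cherry juice 300ml": [0.0, 1.0, 1.0, 0.0, 0.0],
    "pepsi wild cherry": [1.0, 1.0, 0.0, 0.0, 0.0],
    "sprite 300ml": [0.0, 0.0, 0.0, 1.0, 0.0],
    "coca-cola cherry 0.3l": [1.0, 1.0, 0.0, 0.0, 1.0],  # hypothetical listing
}
query_vec = [1.0, 1.0, 0.0, 0.0, 1.0]  # toy vector for "cherry coke 300ml"

similarity_ranking = rank_by_similarity(query_vec, product_vecs)
for name, score in similarity_ranking:
    print(f"{score:.3f}  {name}")
```

Because the comparison happens in vector space rather than on surface tokens, the hypothetical Coca-Cola listing ranks first in this toy example even though its name shares few exact keywords with the query.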

This approach seemed promising, and we initially considered OpenAI. However, that only provided us with a way to generate vector embeddings, and we would have had to implement and manage the rest of the pipeline ourselves (e.g. the text extraction, vector storage, retrieval algorithm, reranker, etc.). This led us to Vectara, whose semantic search platform includes the entire end-to-end pipeline, which made it fast and easy to bring LLM-based semantic search into our application. With Vectara we improved our search results to 70% accuracy, and avoided the need to construct and manage the entire MLOps infrastructure.

The real success came when we combined Vectara’s semantic search with our existing keyword-based search, incorporating product keywords into Vectara’s catalog. This approach brought our accuracy to 80%, resulting in more consistent and effective results. The experience taught us that even seemingly simple challenges can require innovative solutions, and that combining conventional methods with newer technologies can lead to significant improvements.

Figure 3: This diagram showcases two distinct search methodologies: semantic and keyword-based. While the semantic approach uses vector similarity to match products based on intent of the user, the keyword-based method matches query keywords directly with products. It is possible to further enhance the semantic search by adding the identified keywords for each product into the indexed set of vectors.
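One way to picture "adding the identified keywords for each product" is as a preprocessing step that folds the keywords into the text that gets indexed. The sketch below only builds that text; it makes no calls to Vectara, and the Product fields and the "Keywords:" label are hypothetical choices rather than a documented format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    """Hypothetical catalog record; field names are illustrative only."""
    product_id: str
    title: str
    description: str = ""
    keywords: List[str] = field(default_factory=list)


def build_index_text(product: Product) -> str:
    """Combine title, description, and de-duplicated keywords into one text blob."""
    parts = [product.title.strip(), product.description.strip()]

    # Drop empty and repeated keywords (case-insensitive), keeping first-seen order.
    seen = set()
    unique_keywords = []
    for keyword in product.keywords:
        cleaned = keyword.strip()
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            unique_keywords.append(cleaned)

    if unique_keywords:
        parts.append("Keywords: " + ", ".join(unique_keywords))
    return "\n".join(part for part in parts if part)


example = Product(
    product_id="sku-001",
    title="coca-cola cherry 0.3l",
    description="Cherry flavored cola, single can.",
    keywords=["coke", "cherry", "Coke", "300ml", ""],
)
print(build_index_text(example))
```

The resulting text would then be submitted for indexing in place of the raw product description, so that the semantic index also carries the retailer-independent keywords.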

Positive Outcomes

Upon deploying the enhanced search engine to production, we observed a palpable shift in search relevance and accuracy, underlining the efficacy of our combined approach. In essence, what we ended up with was a blend of search and recommendation. The aim was not merely to find items, but to suggest the most appropriate matches, even when the products’ descriptions and user queries were ambiguously phrased.

While the zenith of 100% accuracy remains an elusive target, we witnessed significant improvements, pushing the accuracy rates to 80%. This isn’t just about the numbers, though. It’s crucial to comprehend and define our metrics of success and align our engineering goals accordingly.

Another vital metric that surfaced was the Total Cost of Ownership. The new Vectara-based search pipeline not only got us to an 80% accuracy rate, it also dramatically lowered our maintenance costs compared to our previous approaches, which were less accurate. The savings in operational expenses, coupled with the heightened accuracy, made the deployment particularly cost-effective.

In reflecting upon this development, it’s clear that while our solution has indeed made great strides in addressing the inconsistencies in product descriptions, the evolving nature of eCommerce and the myriad ways in which products can be described imply that perfection might always be just out of reach.

Next Steps

You can learn more about Applaudo, and the wide range of IT solutions they provide, at https://applaudo.com/.

You can try out Vectara in just 5 minutes by creating a free account at https://console.vectara.com/signup, then simply uploading some data and firing off some queries. Also, check out some demos in our new Demo Gallery.