July 11, 2023 by Vivek Sourabh | 6 min readRead Now
Traditional approaches for eCommerce search have hit their limits, especially when there is a wide variety of products and services and wide variability in how they are described. We explored technical options to tackle this issue, and found that combining traditional keyword-based approaches with modern semantic search techniques yielded the best results.
September 5, 2023 by Justin Hayes and Jose Montaña
The world of eCommerce search, built on keyword-based retrieval, has been well established and relatively stable for many years now. Advances to the state of the art typically come with high-overhead operational activities like language management, feature generation, and taxonomy management. But recent advances related to Large Language Models (LLMs) are making it possible to achieve search relevance metrics, and therefore sales conversions, that were never before possible.
This is especially pronounced in the case of eCommerce Marketplaces, where data within product catalogs is notoriously inconsistent and incomplete because it comes from an ecosystem of external vendors where data quality is difficult to enforce. Thankfully, Vectara’s ability to understand the meaning of data and the intent of users makes it an ideal hybrid search tool to help eCommerce companies achieve these improved metrics and business outcomes. Even better, because Vectara takes care of the entire infrastructure pipeline – from extraction to embedding generation to vector storage to retrieval to reranking – developers simply need to call the Vectara APIs to add their data and to run their users’ searches. It’s as simple as that.
This blog post explains how Applaudo, an IT Solution Provider with expertise in building AI-powered applications, enhanced an existing eCommerce Marketplace using Vectara’s LLM-based semantic search platform.
We faced a challenge that seemed simple at first: organize and standardize a database for a marketplace where new retailers were constantly joining to offer products. The problem arose when we discovered that the same products were described differently by each retailer.
We initially tried to solve this by developing a conventional keyword-based search engine. The difficulty was that not all products had keywords, and different retailers used varying conventions or classifications for their products. Since we lacked a common “taxonomy” for identical products across different retailers, we decided to train a keyword-based model using feature vectors.
Our goal wasn’t to always place the exact product as the first suggestion, but to have it within the top three rankings. Our preliminary proof of concept achieved an accuracy of 60%, but when we tested it within the production environment, our accuracy dropped to 40%. The data in production was messy, and maintaining the model’s effectiveness became an ongoing challenge as new retailers joined with differing keywords and product descriptions.
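The "top three" criterion above can be made concrete with a small sketch. This is only an illustration of how such a metric is commonly computed, not the evaluation code we ran; the function name `top_k_accuracy` and the product ids are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_results: Sequence[Sequence[str]],
    expected: Sequence[str],
    k: int = 3,
) -> float:
    """Fraction of queries whose expected product id appears in the first k results."""
    if len(ranked_results) != len(expected):
        raise ValueError("ranked_results and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(
        1
        for results, target in zip(ranked_results, expected)
        if target in results[:k]
    )
    return hits / len(expected)


# Hypothetical evaluation data: one ranked result list per test query.
ranked = [
    ["sku-1", "sku-7", "sku-3"],           # expected item at rank 1
    ["sku-9", "sku-2", "sku-4"],           # expected item at rank 2
    ["sku-5", "sku-6", "sku-8", "sku-0"],  # expected item at rank 4, a miss for k=3
    ["sku-3", "sku-1", "sku-2"],           # expected item not returned at all
    ["sku-4", "sku-5", "sku-6"],           # expected item at rank 3
]
expected = ["sku-1", "sku-2", "sku-0", "sku-9", "sku-6"]

score = top_k_accuracy(ranked, expected, k=3)
print(f"top-3 accuracy: {score:.0%}")
```

Counting a query as a success whenever the right product is anywhere in the first three positions is more forgiving than requiring it at rank one, which matches the goal described above.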
For example, assume that a user searches for "cherry coke 300ml", and the product catalog has entries for "cherry juice 300ml", "pepsi wild cherry", "sprite 300ml", and others that have similar keywords. Since the keyword approach essentially matches as many keywords as possible, the user ends up with products that have a lot of keyword overlap but are not what is actually wanted, which is a Coke. While it is possible to use a weighted keyword approach (e.g. weighting the brand name more heavily than other keywords), this is very difficult and costly to manage across an entire product catalog, because you end up with hundreds or thousands of different weighting rules.
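A minimal sketch of this failure mode, assuming a naive scorer that simply counts shared tokens (our production keyword engine was more involved). The `coca-cola cherry 300 ml` entry is a hypothetical catalog item added for illustration, standing in for a retailer who describes the wanted product differently.

```python
def tokenize(text: str) -> set:
    """Lowercase the text and split it on whitespace."""
    return set(text.lower().split())


def overlap_score(query: str, product: str) -> int:
    """Count the tokens shared by the query and the product name."""
    return len(tokenize(query) & tokenize(product))


catalog = [
    "cherry juice 300ml",
    "pepsi wild cherry",
    "sprite 300ml",
    "coca-cola cherry 300 ml",  # hypothetical entry: the product the user wants
]
query = "cherry coke 300ml"

# Rank by overlap, highest first; sorted() is stable, so ties keep catalog order.
ranked = sorted(
    ((product, overlap_score(query, product)) for product in catalog),
    key=lambda pair: pair[1],
    reverse=True,
)

for product, score in ranked:
    print(f"{score}  {product}")
```

Here the cherry juice shares two tokens with the query while the actual Coke shares only one, because the retailer wrote "coca-cola" and "300 ml" instead of "coke" and "300ml". Fixing this with per-brand weights or synonym rules is exactly the kind of rule maintenance that does not scale across a catalog.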
Realizing the need for a better solution, we explored semantic search using vector embeddings. This approach takes advantage of the fact that vector embeddings can represent the meaning inherent to a piece of text. When combined with extraction, storage, and retrieval techniques this provides a powerful semantic search capability.
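To illustrate the idea, here is a toy sketch of ranking by cosine similarity between embeddings. The three-dimensional vectors are made up by hand for illustration (roughly "cola-ness", "cherry-ness", "juice-ness"); real embeddings come from a model and have hundreds of dimensions. The `coca-cola cherry 300 ml` entry is a hypothetical catalog item.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hand-made toy vectors, NOT real model output.
product_embeddings = {
    "coca-cola cherry 300 ml": [0.95, 0.70, 0.05],
    "cherry juice 300ml": [0.05, 0.80, 0.90],
    "sprite 300ml": [0.30, 0.00, 0.10],
}
query_embedding = [0.90, 0.80, 0.10]  # stands in for "cherry coke 300ml"

similarities = {
    name: cosine_similarity(query_embedding, vector)
    for name, vector in product_embeddings.items()
}
best_match = max(similarities, key=similarities.get)
print(best_match)
```

Because the comparison happens in embedding space rather than on literal tokens, a product can match the query even when the retailer's wording shares few keywords with it.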
This approach seemed promising, and we initially considered OpenAI. However, that only provided us with a way to generate vector embeddings, and we would have had to implement and manage the rest of the pipeline ourselves (e.g. the text extraction, vector storage, retrieval algorithm, reranker, etc.). This led us to Vectara, whose semantic search platform includes the entire end-to-end pipeline, which made it fast and easy to bring LLM-based semantic search into our application. With Vectara we improved our search results to 70% accuracy, and avoided the need to construct and manage the entire MLOps infrastructure.
The real success came when we combined Vectara’s semantic search with our existing keyword-based search, incorporating product keywords into Vectara’s catalog. This approach brought our accuracy to 80%, resulting in more consistent and effective results. The experience taught us that even seemingly simple challenges can require innovative solutions, and that combining conventional methods with newer technologies can lead to significant improvements.
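One common way to combine keyword and semantic signals is to normalize each score set and blend them with a weight. The sketch below shows that general technique only; it is not a description of how Vectara combines signals internally or of our exact integration, and the scores, the `weight` parameter, and the `coca-cola cherry 300 ml` entry are all hypothetical.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to the range [0, 1] so different scorers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def blend(
    keyword_scores: Dict[str, float],
    semantic_scores: Dict[str, float],
    weight: float = 0.3,
) -> Dict[str, float]:
    """Weighted sum: `weight` goes to keyword scores, the rest to semantic scores."""
    kw = min_max_normalize(keyword_scores)
    sem = min_max_normalize(semantic_scores)
    products = set(kw) | set(sem)
    return {
        p: weight * kw.get(p, 0.0) + (1.0 - weight) * sem.get(p, 0.0)
        for p in products
    }


# Hypothetical scores for the query "cherry coke 300ml".
keyword_scores = {
    "cherry juice 300ml": 2.0,
    "pepsi wild cherry": 1.0,
    "sprite 300ml": 1.0,
    "coca-cola cherry 300 ml": 1.0,
}
semantic_scores = {
    "cherry juice 300ml": 0.53,
    "pepsi wild cherry": 0.85,
    "sprite 300ml": 0.73,
    "coca-cola cherry 300 ml": 0.99,
}

blended = blend(keyword_scores, semantic_scores, weight=0.3)
top_result = max(blended, key=blended.get)
print(top_result)
```

With most of the weight on the semantic side, the Coke outranks the cherry juice despite its weaker keyword overlap, while the keyword signal still contributes where retailers do supply good keywords.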
Upon deploying the enhanced search engine to production, we observed a clear improvement in search relevance and accuracy, underlining the efficacy of our combined approach. In essence, what we ended up with was a blend of search and recommendation. The aim was not merely to find items, but to suggest the most appropriate matches, even when the products’ descriptions and user queries were ambiguously phrased.
While 100% accuracy remains an elusive target, we saw significant improvement, with accuracy reaching 80%. This isn’t just about the numbers, though. It’s crucial to understand and define our metrics of success and align our engineering goals accordingly.
Another vital metric that surfaced was the Total Cost of Ownership. The new Vectara-based search pipeline not only got us to an 80% accuracy rate, it also dramatically lowered maintenance costs compared to our previous approaches (which had lower accuracy). The savings in operational expenses, coupled with the higher accuracy, made the deployment particularly cost-effective.
In reflecting upon this development, it’s clear that while our solution has indeed made great strides in addressing the inconsistencies in product descriptions, the evolving nature of eCommerce and the myriad ways in which products can be described imply that perfection might always be just out of reach.
You can learn more about Applaudo, and the wide range of IT solutions they provide, at https://applaudo.com/.