Introducing Vectara InstantIndex

Indexing Challenges

Considering the scale of data today, most data processing systems are designed to work efficiently in batch mode. This was the original premise of systems like Apache Hadoop and Apache Spark. While these systems are efficient at processing data in batches, many applications depend on near real-time processing capabilities. This need led to systems like Apache Kafka and Spark Streaming.

In a search system, adding large amounts of data to an index at once – also known as batch indexing – enables systems to implement several performance and cost optimizations. At the same time, several search use cases require new documents to be added or existing documents to be updated in the index frequently.

For example, a chat application can add new chat messages to the index or you may want to add/remove products from a searchable product catalog very quickly. Adding a stream of incoming documents to an existing index is called incremental indexing.

The challenge in incremental indexing is to support small updates to large indexes in a performant and cheap manner while also minimizing the delay between the time the document was requested to be added to the index and the time it is available to serve queries.

Indexing Modes in Vectara

Vectara supports both batch indexing and incremental indexing. This is important because some use cases have data pre-processing steps that are themselves batch jobs, while others need indexing to happen in near real-time. Consider Google’s search engine before the advent of Caffeine, which Google reported as providing 50% fresher search results than its previous index. Now, with common use cases around speech-to-text and unstructured text, instant indexing with immediate results is key to meeting user demand.

Within incremental indexing, Vectara supports two modes: regular incremental indexing, which can take a few minutes from indexing to serving, and instant indexing, which takes a few seconds in most cases. All modes are available to customers without any configuration, although not every mode is supported on every pricing plan.

How Vectara Achieves Incremental Indexing

In incremental indexing, Vectara batches the stream of incoming documents as much as possible. It breaks down the incoming stream of documents into chunks called journal entries. Each journal entry contains one or more document parts. Thus, a single document can span multiple journal entries. Similarly, document parts belonging to different documents can be part of the same journal entry. These journal entries are then applied to an existing index. The journals act as small batch updates and serve the purpose of batching as many updates as possible.
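The grouping described above can be illustrated with a minimal sketch. This is not Vectara's implementation: the names DocumentPart, JournalEntry, build_journal_entries, and the max_parts limit are hypothetical, chosen only to show how one document can span several journal entries while one entry can hold parts of several documents.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class DocumentPart:
    """One piece of a document (hypothetical shape, for illustration)."""
    doc_id: str
    part_no: int
    text: str


@dataclass
class JournalEntry:
    """A small batch of document parts that is applied to an index as a unit."""
    parts: List[DocumentPart] = field(default_factory=list)


def build_journal_entries(
    parts: Iterable[DocumentPart], max_parts: int
) -> Iterator[JournalEntry]:
    """Group a stream of document parts into entries of at most max_parts parts."""
    if max_parts < 1:
        raise ValueError("max_parts must be at least 1")
    current = JournalEntry()
    for part in parts:
        current.parts.append(part)
        if len(current.parts) == max_parts:
            yield current
            current = JournalEntry()
    if current.parts:
        # Flush the final, possibly smaller, entry.
        yield current


# Document "a" has three parts and document "b" has one.
stream = [
    DocumentPart("a", 0, "first part of a"),
    DocumentPart("a", 1, "second part of a"),
    DocumentPart("a", 2, "third part of a"),
    DocumentPart("b", 0, "only part of b"),
]
entries = list(build_journal_entries(stream, max_parts=2))
```

With max_parts=2, document "a" spans both entries, and the second entry mixes parts from "a" and "b". A production system would more likely cut entries by size or elapsed time rather than by a fixed part count.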

Our search nodes are designed to add data from journal entries to an existing index. Special attention is paid during journal application so that reads on the index are not blocked while the index is being updated. If reads were blocked, user queries could not run on an index while a journal was being applied to it, so they would either be rejected or incur long latencies while waiting for the journal writes to finish. Vectara makes sure user queries are not impacted by index updates.
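One common way to keep reads unblocked during updates is a copy-on-write snapshot swap, sketched below. The article does not say which technique Vectara uses, so treat this purely as an illustration; SnapshotIndex and its methods are hypothetical names, and the index is reduced to a term-to-document-IDs mapping.

```python
import threading
from types import MappingProxyType
from typing import List, Mapping, Tuple


class SnapshotIndex:
    """Toy index where readers never take a lock (illustrative assumption)."""

    def __init__(self) -> None:
        # Readers only ever see a read-only view of a dict that is never mutated.
        self._snapshot: Mapping[str, Tuple[str, ...]] = MappingProxyType({})
        self._write_lock = threading.Lock()  # serializes writers only

    def snapshot(self) -> Mapping[str, Tuple[str, ...]]:
        """Return the current read-only view of the index."""
        return self._snapshot

    def apply_journal(self, journal: List[Tuple[str, str]]) -> None:
        """Apply (term, doc_id) pairs by building a new snapshot and swapping it in."""
        with self._write_lock:
            updated = dict(self._snapshot)  # copy; the old snapshot stays untouched
            for term, doc_id in journal:
                updated[term] = updated.get(term, ()) + (doc_id,)
            # Rebinding one attribute publishes the new snapshot to later readers.
            self._snapshot = MappingProxyType(updated)

    def search(self, term: str) -> Tuple[str, ...]:
        """Look up a term without acquiring any lock."""
        return self._snapshot.get(term, ())


index = SnapshotIndex()
old_view = index.snapshot()  # a reader that started before the update
index.apply_journal([("ketchup", "post-1"), ("ketchup", "post-2")])
hits = index.search("ketchup")
```

A query that began before the journal was applied keeps reading old_view, which is unchanged, while new queries see the updated snapshot. Copying the whole mapping on every journal is only practical for a toy; real search engines typically get the same effect by adding immutable segments instead.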

You may notice in the journal creation and application process, there is a delay introduced by buffering data and by search nodes taking their time to notice new journals and applying them to the index. This can result in a time delay of a few minutes between when a document is sent to Vectara for indexing, and when it is available in the search node to be part of query results. Vectara solves this problem by introducing InstantIndex.

InstantIndex

InstantIndex is a mode where documents sent by users are instantly available to be queried. Vectara achieves this by short-circuiting the incremental indexing process: bypassing the event streaming system and journaling [see figure 1 above]. When a document is received for indexing, indexing servers create in-memory data structures that use the same format as journal entries. We call these in-memory journal entries. These in-memory journals do not wait for multiple documents to arrive. As soon as the document has been processed, its data is compiled into the in-memory journal and sent to the search node. The search node immediately applies the journal to the corresponding index. Note that Vectara is a multi-tenant system, which has several search nodes and each user is assigned to a subset of the search nodes (based on the replication degree requested for the user). The indexing server is aware of which search nodes a customer is assigned to, and it multicasts the in-memory journal to the exact replica set of search nodes that are hosting the customer.
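The flow above can be sketched as follows. This is a simplified model under stated assumptions, not Vectara's code: SearchNode, IndexingServer, instant_index, and the assignment map are hypothetical names, the in-memory journal is reduced to a list of strings, and the network multicast is replaced by direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SearchNode:
    """A search node holding one toy index per customer (illustrative only)."""
    name: str
    indexes: Dict[str, List[str]] = field(default_factory=dict)

    def apply_journal(self, customer_id: str, journal: List[str]) -> None:
        # Apply the journal to the customer's index as soon as it arrives.
        self.indexes.setdefault(customer_id, []).extend(journal)


class IndexingServer:
    """Knows which search nodes host each customer's replicas."""

    def __init__(self, assignments: Dict[str, List[SearchNode]]) -> None:
        self._assignments = assignments

    def instant_index(self, customer_id: str, document: str) -> int:
        """Build an in-memory journal for one document and send it to every replica."""
        # No waiting for more documents: the journal holds just this one.
        journal = [document]
        replicas = self._assignments[customer_id]
        for node in replicas:
            node.apply_journal(customer_id, journal)
        return len(replicas)


n1, n2, n3 = SearchNode("n1"), SearchNode("n2"), SearchNode("n3")
server = IndexingServer({"acme": [n1, n2], "globex": [n3]})
sent = server.instant_index("acme", "new chat message")
```

Here the customer "acme" has a replication degree of two, so the journal reaches n1 and n2 only, and n3 never sees that customer's data.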

Why InstantIndex?

InstantIndex reduces the time between when a document is submitted for indexing and when it can be returned in query results. There are several business reasons why this may be needed, some of which are listed below.

Documents or notes uploaded or created by customer success and sales teams in different business applications need to be indexed and immediately searchable to guide decision-making and customer interactions. When multiple team members contribute such content in a SaaS platform, it is difficult to validate and confirm that indexing has fully completed after each indexing operation.
Large volumes of call center transcripts generated in real-time need to be indexed in a very short period of time, with minimal errors and minimal latency variability in making these transcripts searchable.
When launching a new product, you want it to be available in the index at an exact time, not before it should be available. For example, an iPhone gets released at midnight, so don’t show it in search results until then.
When an employee is terminated, you need to immediately remove their access to search results by updating document access, because even search results can be used to exfiltrate data.
After posting content, you want to validate that the data is correct. Consider a blog or other content site with a search bar powered by Vectara. An author posts a new blog, which gets added to Vectara. It is a bad user experience if the author has to set a reminder to come back and check that Vectara has actually indexed the new blog and that it is available for search. It might be acceptable to have an API that provides the current status of outstanding indexing requests, but it is best if the content is indexed immediately, so that this can be confirmed by issuing a search as soon as the index operation completes.
When multiple back-end systems are connected together, it can be very useful to filter by when documents were indexed. For example, imagine a system where a forum owner wants to email all users that have posts on the forum that relate to ketchup (for some reason). The system runs a background search that says “find all posts relating to ketchup from the past 24-hour period and get the e-mail addresses of each poster”. In a scenario where a user posts something at 11:58 PM, are they going to be in day 1 or day 2? What you might do in this scenario is attach post-time metadata, but sometimes the index-time metadata can be just as important. When this kind of machine-driven, time-bounded query matters, it’s easy to end up missing data if indexing is not synchronous.
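The 11:58 PM scenario in the last item can be made concrete with a short sketch. The field names posted_at and indexed_at and the five-minute indexing delay are assumptions made for illustration, not part of any Vectara API; the point is only that post time and index time can fall on different sides of a daily window boundary when indexing is delayed.

```python
from datetime import datetime, timedelta, timezone


def in_window(ts: datetime, start: datetime, end: datetime) -> bool:
    """Half-open window [start, end): each timestamp lands in exactly one day."""
    return start <= ts < end


day1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
day2 = day1 + timedelta(days=1)
day3 = day2 + timedelta(days=1)

# A post written at 11:58 PM on day 1 that became searchable five minutes later.
post = {
    "posted_at": datetime(2024, 1, 1, 23, 58, tzinfo=timezone.utc),
    "indexed_at": datetime(2024, 1, 2, 0, 3, tzinfo=timezone.utc),
}

posted_on_day1 = in_window(post["posted_at"], day1, day2)    # True
indexed_on_day1 = in_window(post["indexed_at"], day1, day2)  # False
indexed_on_day2 = in_window(post["indexed_at"], day2, day3)  # True
```

A job that runs at midnight and filters on post time for day 1 would not find this post, because it is not yet in the index, and the day 2 run would skip it because its post time belongs to day 1. Filtering on index time, or indexing within seconds, closes that gap.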

Conclusion

In today’s fast-paced world, waiting is never the ideal option. Companies want to leverage the power of instant to create strategic differentiation and get ahead of their predictions. In the world of indexing, context only becomes real once the relevant data has been indexed, and for many use cases, that indexing needs to happen immediately. Vectara supports both batch and incremental indexing, allowing users to leverage common stream processing engines or bypass them to speed delivery. Now, with the introduction of instant indexing, the time between when a document is submitted for indexing and when it becomes searchable can be matched to your use case requirements.

Want to learn more about Vectara? Create your free trial account today.