How Sankofa was Built with Vectara

Introduction

In today’s digital age, navigating the vast landscape of the Internet can often feel like searching for a needle in a haystack. With countless web pages visited daily, keeping track of valuable information can be a daunting task.

Enter Sankofa, a GenAI browser extension built on Vectara’s RAG-as-a-service platform. It helps users keep track of the pages they visit on the web and ask questions, ChatGPT-style, based on that content.

In this blog post, we explain how Sankofa works and how we built it using Vectara.

What is Sankofa?

Sankofa, which means “to retrieve” in the Twi language of Ghana, embodies the idea of “going back and fetching it”: it lets users effortlessly access the content of web pages they’ve previously visited.

Unlike traditional browser history features that merely store URLs, Sankofa remembers the actual content of visited pages and uses retrieval-augmented generation (RAG) to answer questions based on that content. Whether you’re asking specific questions in a ChatGPT-style interaction, searching for pages on a particular topic, or looking for content similar to what you’re currently viewing, Sankofa responds based on the web pages you visited in the past.

Sankofa is currently available for Chrome, Firefox, and Edge. Check out this blog post for more details on how to install and set up Sankofa for your browser.

Key Features of Sankofa

To use Sankofa, you first have to sign up for a Vectara account and create a corpus. You then configure Sankofa to use that corpus for indexing web content and, later, for answering questions based on that content.

Sankofa, as a browser extension, performs the following functions:

Indexing: as you visit a webpage, you can opt to index its content in your Vectara corpus. You can click the index button in the extension drop-down or index selected text from the context menu. You can also configure “auto-indexing”, in which case pages are indexed automatically in the background, either as soon as you visit them or after a configurable delay (such as 5 seconds).
Searching: you can search your indexed content to display matching web pages or to answer questions (ChatGPT-style) based on that content.
Finding Similar Pages: this identifies pages related to the one you are currently viewing. The extension collects all the text content from the current webpage and uses Vectara to search for similar pages based on that text.
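
To make the auto-indexing option more concrete, here is a minimal sketch of how a delay setting could decide when a visited page gets indexed. It is an illustration only, not code from the Sankofa repository: the `AutoIndexSettings` shape and the `handlePageVisit` function are hypothetical names, and the timer is passed in as a parameter so the sketch runs outside a browser.

```typescript
// Hypothetical settings shape; Sankofa's real option names may differ.
interface AutoIndexSettings {
  enabled: boolean;
  delayMs: number; // 0 means "index as soon as the page is visited"
}

// The timer is injected so the sketch runs outside a browser.
// In an extension you would typically wrap setTimeout here.
type Schedule = (task: () => void, delayMs: number) => void;

// Decides whether and when to index a visited page.
// Returns true if indexing was triggered or scheduled.
function handlePageVisit(
  url: string,
  settings: AutoIndexSettings,
  indexPage: (url: string) => void,
  schedule: Schedule
): boolean {
  if (!settings.enabled) {
    return false; // auto-indexing is off; the user indexes manually
  }
  if (settings.delayMs <= 0) {
    indexPage(url); // index immediately
  } else {
    schedule(() => indexPage(url), settings.delayMs); // index after the delay
  }
  return true;
}
```

In a real extension, `schedule` could be `(task, ms) => setTimeout(task, ms)`, and `indexPage` would hand the page over to the indexing flow described later in this post.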

To learn more about indexing and searching using Sankofa in your browser please see this video tutorial.

Understanding Sankofa’s Architecture – The Technical Details

The landscape of generative AI is vast and rapidly growing. If you choose to build RAG on your own (DIY), you quickly realize that choosing the right tools is challenging. You also have to become an expert in DevOps, MLOps, PromptOps, and the other disciplines required to build GenAI yourself.

With Vectara, all that complexity goes away, because it is hidden behind our API – as a developer, all you have to do is provide the content for RAG and use the Indexing and Query APIs to implement a fully functional, scalable, and secure RAG system.

We have open-sourced Sankofa (repo) to demonstrate how to integrate Vectara into a browser extension and how easy it is to integrate into any application. Let’s dive into the details.

Sankofa Architecture

Sankofa is a cross-browser extension built using Plasmo, which allows it to work on different browsers without having to customize the code for each browser type. This is demonstrated in Figure 1 below:

Figure 1: High-level architecture of the Sankofa browser extension

The UI component, which you probably think of as the extension itself, sits in the navigation bar of the browser. It appears when you click the extension button and behaves like a typical website in a small window.

This popup includes buttons to explicitly index the page you are currently on, find similar pages, or issue a search query. The configuration icon opens a separate window where you can configure Sankofa.

Content Script

Some functions need to run in the context of the web pages themselves, and these live in content scripts. Through DOM access and event listeners, content scripts can read the details of a webpage, for example the full content of a page that needs to be indexed.

All the other components of an extension run in different contexts and thus have no access to the DOM.

Since content scripts run in a different context, browsers provide Messaging APIs for communication. In Sankofa, we are using the Plasmo messaging API. Here is the workflow for passing information:

You register an onMessage event listener in your content script
In your popup, you send a message with a callback to your content script
The content script will use the callback to send a response to your popup

Figure 3: How content scripts communicate with the extension popup
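
The three steps above can be sketched as follows. This is an illustration of the pattern rather than Sankofa's actual code or the Plasmo messaging API: `MessageChannelStub` is an in-memory stand-in for the browser's messaging channel so the example runs anywhere, and the message type `GET_PAGE_CONTENT`, the `PageView` shape, and the function names are all hypothetical.

```typescript
// Message shapes are hypothetical; Sankofa's real message names may differ.
interface GetPageContentRequest {
  type: "GET_PAGE_CONTENT";
}
interface PageContentResponse {
  url: string;
  title: string;
  text: string;
}

type SendResponse = (response: PageContentResponse) => void;
type Listener = (message: GetPageContentRequest, sendResponse: SendResponse) => void;

// In-memory stand-in for the browser's messaging channel (an assumption for
// illustration; a real extension would use the browser or Plasmo messaging API).
class MessageChannelStub {
  private listeners: Listener[] = [];

  onMessage(listener: Listener): void {
    this.listeners.push(listener);
  }

  sendMessage(message: GetPageContentRequest, callback: SendResponse): void {
    for (const listener of this.listeners) {
      listener(message, callback);
    }
  }
}

// Minimal view of what a content script can read from the page's DOM.
interface PageView {
  url: string;
  title: string;
  bodyText: string;
}

// Step 1: the content script registers an onMessage listener.
function registerContentScript(channel: MessageChannelStub, page: PageView): void {
  channel.onMessage((message, sendResponse) => {
    if (message.type === "GET_PAGE_CONTENT") {
      // Step 3: reply through the callback supplied by the popup.
      sendResponse({ url: page.url, title: page.title, text: page.bodyText.trim() });
    }
  });
}

// Step 2: the popup sends a message together with a callback.
function requestPageContent(channel: MessageChannelStub, onContent: SendResponse): void {
  channel.sendMessage({ type: "GET_PAGE_CONTENT" }, onContent);
}
```

The popup never touches the DOM itself; it only receives whatever the content script chooses to send back, which is why the response carries the page text explicitly.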

Background Service Worker

Sankofa implements a background service worker, which runs as a persistent or event-driven background process independently of the user interface. It handles tasks such as adding the indexing and search options to the context menus, performing indexing, and interacting with web pages. This is shown in Figure 4 below:

Indexing Web Content

When Sankofa needs to index a webpage, or a particular snippet from it, it sends a message to the content script. The content script processes the content of the webpage and then sends a message to the background service worker. The worker calls Vectara’s Indexing API, which indexes a document with just a few lines of code. Below is an example code snippet from Sankofa, illustrating this process.

Code snippet from Sankofa showing the payload used to create documents in Vectara.

Code snippet from Sankofa showing how to call the Vectara Indexing API.
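
As a rough illustration of what such a payload and call can look like, here is a minimal sketch. It is not the code from the Sankofa repository. The endpoint URL, the `x-api-key` and `customer-id` headers, and the field names (`customerId`, `corpusId`, `documentId`, `metadataJson`, `section`) assume the v1 REST Indexing API and should be checked against the current API reference. Using the page URL as the document ID, the `PageContent` and `VectaraConfig` shapes, and the function names are assumptions made for this example. The HTTP function is passed in as a parameter so the sketch runs outside a browser.

```typescript
// Page data as collected by the content script (hypothetical shape).
interface PageContent {
  url: string;
  title: string;
  text: string;
}

// Connection details for a Vectara corpus (hypothetical config shape).
interface VectaraConfig {
  customerId: number;
  corpusId: number;
  apiKey: string;
}

// Request body, assuming the v1 Indexing API field names.
interface IndexPayload {
  customerId: number;
  corpusId: number;
  document: {
    documentId: string;
    title: string;
    metadataJson: string;
    section: Array<{ text: string }>;
  };
}

// Minimal fetch-like function type; the global fetch fits this signature.
type HttpPost = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number }>;

// Builds the indexing payload for one page.
function buildIndexPayload(page: PageContent, config: VectaraConfig): IndexPayload {
  return {
    customerId: config.customerId,
    corpusId: config.corpusId,
    document: {
      documentId: page.url, // assumption: the URL doubles as the document ID
      title: page.title,
      metadataJson: JSON.stringify({ url: page.url }),
      section: [{ text: page.text }],
    },
  };
}

// Sends the payload to the (assumed) v1 index endpoint.
async function indexPage(page: PageContent, config: VectaraConfig, post: HttpPost): Promise<void> {
  const response = await post("https://api.vectara.io/v1/index", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": config.apiKey,
      "customer-id": String(config.customerId),
    },
    body: JSON.stringify(buildIndexPayload(page, config)),
  });
  if (!response.ok) {
    throw new Error(`Indexing failed with HTTP status ${response.status}`);
  }
}
```

In the background service worker, `post` would simply be the global `fetch`. Keeping the payload builder separate from the network call makes the payload easy to inspect and test.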

How Sankofa Responds to User Search Queries

The Query API lets you run a query and define its parameters: the query text, metadata filters, and other search and summarization settings. These parameters let application builders tailor their queries to specific use cases.

Below is an example code snippet from Sankofa, illustrating this process.

Making the call to Vectara:
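
For illustration, a query request and call might be put together along the following lines. This is a minimal sketch, not the code from the Sankofa repository. The endpoint URL, headers, and field names (`query`, `start`, `numResults`, `corpusKey`, `metadataFilter`, `summary`, `maxSummarizedResults`, `responseLang`) assume the v1 REST Query API and should be verified against the current API reference. The `QueryConfig` shape, the function names, and the default values are assumptions made for this example.

```typescript
// Connection details for a Vectara corpus (hypothetical config shape).
interface QueryConfig {
  customerId: number;
  corpusId: number;
  apiKey: string;
}

// Request body, assuming the v1 Query API field names.
interface QueryRequest {
  query: Array<{
    query: string;
    start: number;
    numResults: number;
    corpusKey: Array<{ customerId: number; corpusId: number; metadataFilter: string }>;
    summary: Array<{ maxSummarizedResults: number; responseLang: string }>;
  }>;
}

// Minimal fetch-like function type; the global fetch fits this signature.
type HttpPostJson = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds a query that returns matching results plus a generated summary.
function buildQueryRequest(
  question: string,
  config: QueryConfig,
  numResults: number = 10,
  metadataFilter: string = ""
): QueryRequest {
  return {
    query: [
      {
        query: question,
        start: 0,
        numResults,
        corpusKey: [
          { customerId: config.customerId, corpusId: config.corpusId, metadataFilter },
        ],
        // Summarize at most 5 results, never more than were requested.
        summary: [{ maxSummarizedResults: Math.min(5, numResults), responseLang: "en" }],
      },
    ],
  };
}

// Sends the query to the (assumed) v1 query endpoint and returns the raw JSON.
async function askVectara(question: string, config: QueryConfig, post: HttpPostJson): Promise<unknown> {
  const response = await post("https://api.vectara.io/v1/query", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": config.apiKey,
      "customer-id": String(config.customerId),
    },
    body: JSON.stringify(buildQueryRequest(question, config)),
  });
  if (!response.ok) {
    throw new Error(`Query failed with HTTP status ${response.status}`);
  }
  return response.json();
}
```

The same request shape could serve the “find similar pages” feature by passing the current page's text as `question`, while a non-empty `metadataFilter` narrows the search to a subset of indexed pages.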

Conclusion

Sankofa demonstrates how easy it is to use Vectara’s RAG-as-a-service to add GenAI functionality to a browser extension.

We are working on improving Sankofa with features such as full chat functionality. Please let us know what other features you would like to see in the Issues section or in our Discord server.