Searching Hacker News
The Hacker News startup community website deserves better search
July 02, 2024 by Ofer Mendelevitch
Introduction
Hacker News is a social news website focused on computer science, technology, and entrepreneurship. It was created by Paul Graham as part of Y Combinator (YC), the startup accelerator he co-founded. The site works much like Reddit: users submit links to interesting articles, blog posts, or their own projects (“Show HN” posts), which the community can then upvote or comment on.
Hacker News quickly became popular among tech professionals, programmers, and startup founders as a place to discover and discuss cutting-edge developments in technology and business.
Its minimalist design and focus on high-quality, intellectual discussions have helped maintain Hacker News’ reputation as a valuable resource for those interested in the tech industry.
I use it all the time.
Hacker News search (powered by Algolia) is fast and supports type-ahead, but in my experience it often fails to surface the stories or comments I am looking for.
For example, if I search for “is AMD faster than Nvidia?”, I get a single result from 12 years ago:
Figure 1: Example query using the current Hacker News search
And there are other stories that are more relevant, like this one.
Can we make it better?
Several projects in the last few years have set out to tackle this problem and improve the HN search experience, such as HackerSearch.net (which also adds summarization), HN by LixiaSearch, and SearchHacker.News.
I decided to try doing this using Vectara.
Hacker News Search with Vectara
Vectara enables developers to build retrieval-augmented generation (RAG) applications, such as chatbots or question-answering systems.
At the core of the platform is a powerful semantic search and retrieval engine (the “R” in RAG).
Concretely, that means a state-of-the-art embedding model (Boomerang), a robust implementation of hybrid search, and an accurate multilingual reranking model.
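To make “hybrid search” concrete, here is a minimal, generic sketch of one common approach: linearly interpolating a dense semantic similarity score with a keyword-match score. This is not Vectara’s implementation. The term-overlap score is a crude stand-in for a real lexical scorer such as BM25, the embeddings are assumed to come from some embedding model, and the function names and the `lam` default are illustrative assumptions.

```python
import math
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors (the semantic signal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms found in the text (a simple stand-in for BM25)."""
    q = tokenize(query)
    return len(q & tokenize(text)) / len(q) if q else 0.0


def hybrid_rank(query: str, query_vec: list[float], docs, lam: float = 0.1) -> list[str]:
    """Rank (text, embedding) pairs by (1 - lam) * semantic + lam * keyword score.

    lam = 0 is pure semantic search; lam = 1 is pure keyword search.
    """
    scored = [
        ((1 - lam) * cosine(query_vec, vec) + lam * keyword_score(query, text), text)
        for text, vec in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Toy 2-d "embeddings" stand in for real model output.
docs = [
    ("Cooking pasta at home", [0.0, 1.0]),
    ("AMD vs Nvidia GPU benchmarks", [0.9, 0.1]),
]
print(hybrid_rank("is AMD faster than Nvidia?", [1.0, 0.0], docs))
# prints ['AMD vs Nvidia GPU benchmarks', 'Cooking pasta at home']
```

A small keyword weight lets exact terms like “AMD” break ties between semantically similar documents without overriding the embedding model’s sense of meaning.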
So I decided to build a new search experience for Hacker News, powered by Vectara’s retrieval engine, to see whether it provides better results. It is available for anyone to use here:
https://hackernews.demo.vectara.com/
Let’s run the same example from above, this time using the new semantic search:
Figure 2: Hacker News search example with Vectara
This is much better: the results are more recent and on topic.
Creating this with Vectara was quite quick and easy.
First, I created a corpus and used vectara-ingest, with its hackernews crawler, to ingest data into it. The crawl is limited to roughly the last 6 months, and it re-runs daily to add new stories and comments as they become available. It can also be configured to re-crawl hourly for more frequent updates.
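The vectara-ingest hackernews crawler handles this for you, and the sketch below neither uses nor reproduces its code. It only illustrates the windowed, incremental selection logic behind “limited to roughly the last 6 months” and “re-runs daily”, assuming items shaped like those returned by the official Hacker News API (`https://hacker-news.firebaseio.com/v0/item/<id>.json`, with fields such as `id`, `type`, `time`, `deleted`, and `dead`). The function name, the `indexed_ids` bookkeeping, and the 180-day default are hypothetical.

```python
import time

SECONDS_PER_DAY = 86_400


def select_items_to_index(items, indexed_ids, days_back=180, now=None):
    """Pick stories and comments inside the crawl window that are not yet indexed.

    items: dicts shaped like Hacker News API items ("time" is Unix seconds).
    indexed_ids: IDs already ingested on earlier runs (hypothetical bookkeeping).
    days_back: size of the crawl window; 180 days is roughly 6 months.
    """
    now = time.time() if now is None else now
    cutoff = now - days_back * SECONDS_PER_DAY
    selected = []
    for item in items:
        if item.get("deleted") or item.get("dead"):
            continue  # skip removed or flagged content
        if item.get("type") not in ("story", "comment"):
            continue  # ignore jobs, polls, etc.
        if item.get("time", 0) < cutoff:
            continue  # older than the crawl window
        if item["id"] in indexed_ids:
            continue  # already ingested on a previous run
        selected.append(item)
    return selected
```

Running this selection daily or hourly only changes how often it is scheduled; the window and the de-duplication against already-indexed IDs keep each run incremental.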
I used Vectara’s Create-UI tool to generate the base search user interface, then customized the generated codebase to add features such as grouping comments by story and showing user names and dates.
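The actual customization lives in the Create-UI web codebase; the following Python sketch only illustrates the grouping idea. It assumes each search hit carries metadata fields named `story_id`, `story_title`, `by` (user name), `time` (Unix seconds), and `text`; these field names and the function are hypothetical, not the schema used by the demo.

```python
from datetime import datetime, timezone


def group_comments_by_story(results):
    """Group ranked comment hits under their story, keeping rank order.

    Stories appear in the order of their best-ranked hit, because dicts
    preserve insertion order.
    """
    groups = {}
    for hit in results:
        group = groups.setdefault(
            hit["story_id"], {"title": hit["story_title"], "comments": []}
        )
        # Render the Unix timestamp as a UTC calendar date.
        date = datetime.fromtimestamp(hit["time"], tz=timezone.utc).strftime("%Y-%m-%d")
        group["comments"].append(f'{hit["by"]} on {date}: {hit["text"]}')
    return groups
```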
I hope this new search experience for Hacker News is valuable to the HN community.
Conclusion
The Hacker News community is an active and opinionated community that helps technologists discover, share and discuss cutting-edge developments in technology.
I believe having an upgraded search experience can help this community become even better.
Using Vectara’s semantic search and open tools like vectara-ingest and Create-UI, I built a new search experience for Hacker News.
You can do the same for your own website, community, or enterprise knowledge base.
To get started, sign up for a free Vectara account and check out our documentation.
Feel free to share your best uses of Vectara semantic search in the “#built-with-vectara” channel on our Discord server.