New HHEM and OpenAI chat completions endpoints
We’ve created a trusted platform for building safe and reliable AI applications and continue to invest in features that improve reliability, accuracy, and flexibility. Today, we're taking another step forward by introducing two powerful new capabilities: the Hallucination Evaluation Model (HHEM) and OpenAI Chat Completions endpoints.
4-minute read time
We are excited to announce two new API endpoints designed to enhance your AI development experience. Our Hallucination Evaluation Model (HHEM) endpoint helps ensure the trustworthiness of generated content, while our new OpenAI Chat Completions compatible endpoint provides flexible, generation-only capabilities using familiar standards. These additions further our commitment to providing a trusted, extensible platform for building powerful AI applications.
Why these capabilities matter
Businesses building AI applications face critical challenges. Ensuring the reliability and factual accuracy of large language model (LLM) outputs is paramount; hallucinations or inaccurate information can erode user trust and undermine an application's value. This need for trust is a cornerstone of deploying AI responsibly and effectively.
Furthermore, developers often require more granular control and flexibility than a full retrieval augmented generation (RAG) pipeline provides for every task. Sometimes, you just need powerful generation capabilities for specific use cases, like summarizing chat conversations, without the retrieval overhead. Offering components of the AI workflow allows for greater integration possibilities and architectural freedom, especially when these components adhere to familiar standards.
Introducing new HHEM and OpenAI chat completions endpoints
To address these needs directly, we are launching two specialized API endpoints.
HHEM endpoint
The problem we’re solving: Until now, hallucination evaluation was tied to our full RAG pipeline. Developers need a standalone, flexible way to check AI-generated text for factual grounding against its source material.
Why it's important: Trust is non-negotiable in business applications. Users and stakeholders need assurance that the information provided by AI systems is reliable and based on the provided source material. Evaluating for hallucinations is key to building this trust and deploying AI responsibly.
How it works: The HHEM endpoint provides direct access to our proprietary Hughes Hallucination Evaluation Model. You can send generated text along with the source context it was based on. HHEM analyzes the generated text against the source context and returns a Factual Consistency Score (FCS), helping you quantify the faithfulness of the output. This allows you to programmatically flag potentially problematic content, implement quality control checks, and ultimately increase user confidence in your AI application's responses.
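As a rough sketch of that quality-control flow, the snippet below pairs a generated answer with its source context and retrieves the Factual Consistency Score. Note that the endpoint path, field names, and auth header here are assumptions for illustration; the HHEM documentation page has the authoritative request schema.

```python
import json
import urllib.request

# Illustrative endpoint path -- check the HHEM docs for the exact URL and schema.
HHEM_URL = "https://api.vectara.io/v2/evaluate_factual_consistency"

def build_hhem_request(generated_text, source_texts):
    """Pair the generated text with the source context it should be grounded in."""
    return {
        "generated_text": generated_text,
        "source_texts": source_texts,
    }

def factual_consistency_score(api_key, generated_text, source_texts):
    """POST the pair to HHEM and return the Factual Consistency Score (FCS)."""
    body = json.dumps(build_hhem_request(generated_text, source_texts)).encode("utf-8")
    req = urllib.request.Request(
        HHEM_URL,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["score"]
```

A typical check then flags any response whose score falls below a threshold you choose, before it ever reaches the user.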
OpenAI chat completions endpoint
The problem we’re solving: Until now, Vectara’s generation capabilities were tied to the full RAG pipeline. Developers need standalone, generation-only access that fits a wider range of use cases.
Why it's important: Many workflows benefit from standalone generation tasks like summarizing existing chat logs, reformatting content, or creative writing, where context is already present or not needed via retrieval. Developers need efficient ways to integrate these capabilities using tools and standards they already know. Furthermore, using our platform as a managed gateway can simplify security, compliance, and administration for accessing various LLMs.
How it works: This new endpoint provides direct generation access to the large language models registered within your Vectara account. Crucially, it is designed to be fully compliant with the OpenAI Chat Completions API specification, the de facto standard used by developers globally. This intentional compatibility means you can leverage the vast ecosystem of existing tools, libraries (like openai-python), and tutorials built for the OpenAI API. Often, you can use your existing integration code with minimal changes – simply point your tools to the Vectara endpoint URL and authenticate using your Vectara API keys.
This dramatically lowers the barrier to entry, accelerates adoption, and allows you to use our platform as a secure, managed proxy for your generative AI calls. This is particularly useful in on-premises or regulated environments where direct external access might be restricted or needs consolidation through a trusted intermediary.
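To make the compatibility concrete, here is a minimal generation-only sketch for the chat-summarization use case mentioned above, built with only the standard library so the wire format is visible. The base URL and model name are assumptions; check the OpenAI Chat Completions documentation page for the exact values for your account.

```python
import json
import urllib.request

# Assumed base URL -- see the documentation for the exact value for your account.
BASE_URL = "https://api.vectara.io/v2"

def build_chat_request(model, messages):
    # The body follows the OpenAI Chat Completions schema, which is why
    # openai-python pointed at BASE_URL produces an equivalent request.
    return {"model": model, "messages": messages}

def summarize_chat(api_key, model, chat_log):
    """Generation-only call: summarize an existing chat log, no retrieval step."""
    messages = [
        {"role": "system", "content": "Summarize this conversation in one sentence."},
        {"role": "user", "content": chat_log},
    ]
    body = json.dumps(build_chat_request(model, messages)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

If you already use openai-python, the same call needs no hand-rolled HTTP at all: construct the client with your Vectara base URL and API key, and your existing `client.chat.completions.create(...)` code should work unchanged.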
Building a trusted and flexible platform for AI
These two new capabilities significantly enhance the Vectara platform, reinforcing our commitment to being a truly trusted and flexible solution for AI development. The HHEM endpoint provides crucial tools for building reliable applications, directly supporting the trusted pillar of our vision. The OpenAI Chat Completions endpoint offers increased modularity and efficiency by adhering to a familiar standard, making our platform more flexible and significantly easier to integrate into existing developer workflows and diverse application architectures, whether using our SaaS or on-premises versions. We believe in meeting you where you are, providing the components you need to build innovative and dependable AI solutions.
For the latest documentation about these two new features, have a look at the HHEM and OpenAI Chat Completions documentation pages.
As always, we’d love to hear your feedback! Connect with us in our community forums or on our Discord. If you’d like to see what Vectara can offer you for retrieval augmented generation on your application or website, sign up for an account!
