HHEM | Flash Update: Anthropic Claude 3

See how Anthropic’s new Claude 3 LLM hallucinates compared to other foundation models in the Hughes Hallucination Evaluation Model (HHEM)

On March 4, 2024, Anthropic, a prominent competitor of OpenAI, unveiled the Claude 3 suite of AI models. The suite comprises three models, Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, each with its own positioning: Haiku is described as “light and fast,” Sonnet as “hard-working,” and Opus as “powerful.”

According to Anthropic’s reported benchmarks, its “powerful” model, Claude 3 Opus, matches or surpasses OpenAI’s GPT-4, a model previously considered the pinnacle of AI technology.

Our team used the Hughes Hallucination Evaluation Model (HHEM) to assess how often the Claude 3 Opus and Sonnet models generate factually inconsistent summaries. We plan to extend this analysis to Claude 3 Haiku once it becomes available.

On our updated leaderboard, both Claude 3 Opus and Claude 3 Sonnet show higher factual consistency rates than Google’s recently introduced Gemma model. Notably, although Opus is positioned as the more “powerful” model, it ranks slightly below Claude 3 Sonnet on our leaderboard. This ranking is based on a limited evaluation set and should not be read as a definitive measure of which model hallucinates less.
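To make the idea of a factual consistency rate concrete, here is a minimal sketch of how such a rate could be derived from per-summary scores, and why a small evaluation set calls for caution. It rests on assumptions: each summary is taken to receive a score between 0 and 1, a cutoff of 0.5 is assumed to mark a summary as consistent, and the scores and helper names (`factual_consistency_rate`, `rate_standard_error`) are hypothetical. It does not call HHEM itself and is not necessarily how the leaderboard is computed.

```python
import math

# Assumed cutoff: a summary counts as factually consistent when its score is >= 0.5.
CONSISTENCY_THRESHOLD = 0.5


def factual_consistency_rate(scores, threshold=CONSISTENCY_THRESHOLD):
    """Return the fraction of summaries whose score meets the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    consistent = sum(1 for s in scores if s >= threshold)
    return consistent / len(scores)


def rate_standard_error(rate, n):
    """Binomial standard error of a rate estimated from n summaries."""
    return math.sqrt(rate * (1.0 - rate) / n)


# Hypothetical scores for two models on the same five source documents.
model_a_scores = [0.91, 0.12, 0.77, 0.64, 0.98]
model_b_scores = [0.88, 0.55, 0.31, 0.42, 0.97]

rate_a = factual_consistency_rate(model_a_scores)
rate_b = factual_consistency_rate(model_b_scores)
se_a = rate_standard_error(rate_a, len(model_a_scores))
se_b = rate_standard_error(rate_b, len(model_b_scores))

print(f"model A: {rate_a:.2f} +/- {se_a:.2f}")
print(f"model B: {rate_b:.2f} +/- {se_b:.2f}")
```

With so few summaries the standard errors are large relative to the gap between the two rates, which illustrates why a small difference in rank on a limited evaluation set may not reflect a real difference between models.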

The leaderboard as of March 6, 2024 is shown below, comparing these models with other foundation models:

[Figure: HHEM update]

The results show notable improvement over Claude 2 and reflect the overall progress across the industry, in both open and closed models, on our benchmark. Nevertheless, the claim that Claude 3 models surpass GPT-4 warrants cautious examination, particularly with regard to factual consistency.

Unlike many recently released open-source models, the Claude 3 models are not open source; they are accessible through the Anthropic API.

Get the HHEM on HuggingFace