HHEM | Flash Update: Fast. But Are They Furious?

GPT-4o and Gemini 1.5 Flash are fast and cheap, but they hallucinate more

What a week it’s been. And it’s only Wednesday.

On Monday, OpenAI launched GPT-4o, a new model that is faster and cheaper than GPT-4 Turbo (half the price and twice the speed) and also “omni-modal”. It came with a very compelling demo showing how fast it is and what being omni-modal means in practice.

On Tuesday, Google announced (among many other cool announcements) the general availability of Gemini 1.5 Flash. This model is also faster and cheaper, while keeping the same 1M-token context window we saw in the full Gemini 1.5 Pro model.

Our team was able to quickly evaluate both models for their tendency to hallucinate using the Hughes Hallucination Evaluation Model (HHEM).
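For readers unfamiliar with how a leaderboard number like this is produced: each model summarizes a set of source documents, HHEM scores every summary for factual consistency with its source, and the hallucination rate is the share of summaries that fall below a cutoff. The minimal sketch below assumes consistency scores in [0, 1] and a 0.5 cutoff, which is our reading of the public leaderboard's methodology. The `hallucination_rate` helper and the example scores are hypothetical, not Vectara's code or real HHEM output.

```python
from typing import Sequence

# Assumption: HHEM gives each (source, summary) pair a factual-consistency
# score in [0, 1], and a summary counts as hallucinated when its score is
# strictly below 0.5.
THRESHOLD = 0.5


def hallucination_rate(scores: Sequence[float], threshold: float = THRESHOLD) -> float:
    """Return the percentage of summaries scoring below the threshold."""
    if not scores:
        raise ValueError("need at least one score")
    flagged = sum(1 for s in scores if s < threshold)
    return 100.0 * flagged / len(scores)


# Hypothetical scores for five summaries (illustrative only).
example_scores = [0.92, 0.31, 0.88, 0.75, 0.47]
print(hallucination_rate(example_scores))  # 40.0
```

Two of the five hypothetical summaries score below 0.5, so this toy model would show a 40% hallucination rate.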

The results?

Well, the models are certainly fast. But they are not as furious.

As our updated leaderboard shows, both models hallucinate more than their slower, more expensive counterparts. GPT-4 Turbo sported an impressive 2.5% hallucination rate, while GPT-4o worsened to 3.7%. Google's Gemini 1.5 Pro shows a hallucination rate of 4.6%, whereas the Flash model worsened to 5.3%.
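To put those numbers side by side, the short Python sketch below computes the absolute change (in percentage points) and the relative change for each pair, using only the rates quoted above. The `RATES` table and `compare` helper are illustrative names, not part of any Vectara tool.

```python
# Hallucination rates (%) as quoted from the HHEM leaderboard above.
RATES = {
    "GPT-4 Turbo": 2.5,
    "GPT-4o": 3.7,
    "Gemini 1.5 Pro": 4.6,
    "Gemini 1.5 Flash": 5.3,
}


def compare(baseline: str, fast: str) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in %)."""
    before, after = RATES[baseline], RATES[fast]
    absolute = round(after - before, 1)
    relative = round((after - before) / before * 100, 1)
    return absolute, relative


for baseline, fast in [("GPT-4 Turbo", "GPT-4o"), ("Gemini 1.5 Pro", "Gemini 1.5 Flash")]:
    absolute, relative = compare(baseline, fast)
    # Prints "+1.2 points (48.0% relative)" and "+0.7 points (15.2% relative)".
    print(f"{baseline} -> {fast}: +{absolute} points ({relative}% relative)")
```

Measured this way, GPT-4o's increase is larger in relative terms (about 48%) than Gemini 1.5 Flash's (about 15%), with absolute gaps of 1.2 and 0.7 percentage points respectively.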

Reflecting on this, it actually makes sense that models optimized for speed and cost would lose some of their capabilities. Of course, we'd rather that weren't the case, but alas, it is a common engineering trade-off, and it seems to apply here too.


Figure 1: GPT-4 Turbo vs. GPT-4o hallucination rates


Figure 2: Gemini 1.5 Pro vs. Gemini 1.5 Flash hallucination rates

And finally, news just broke that Ilya Sutskever is leaving OpenAI. As many have said, he is one of the most remarkable innovators in the field, and I think I can safely say we are all extremely grateful for the contributions that helped bring us into this age of AI.
