HHEM | Flash Update: Google Gemma

Following the recent release of Gemini 1.5, Google has just unveiled its latest contribution to the landscape of large language models (LLMs): Gemma, an open-source model available in 2B and 7B sizes, each with an instruction fine-tuned variant. Using the open-source Hughes Hallucination Evaluation Model (HHEM), we quantify Gemma's tendency to hallucinate when summarizing a set of facts, a key measure for applications built on the Retrieval Augmented Generation (RAG) architecture.

Our updated leaderboard positions Gemma, with a hallucination rate of 7.5%, on par with Cohere’s Chat model, right below Llama2 13B. This is significantly better than Mistral 7B’s 9.4% rate, yet falls short of Llama2 7B’s 5.6%. Gemma’s answer rate of 100.0% also stands out, highlighting its suitability as a summarizer.
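To make the two figures above concrete, here is a minimal sketch of how a hallucination rate and an answer rate could be derived from per-summary consistency scores. It is not the leaderboard's actual code: the function name `leaderboard_metrics`, the 0.5 cutoff, and the convention that `None` marks a document the model declined to summarize are all assumptions made for illustration. In practice the scores would come from running HHEM on each (source document, summary) pair.

```python
from typing import Dict, Optional, Sequence


def leaderboard_metrics(
    scores: Sequence[Optional[float]],
    threshold: float = 0.5,  # assumed cutoff: scores below it count as hallucinations
) -> Dict[str, float]:
    """Compute answer rate and hallucination rate as percentages.

    `scores` holds one factual-consistency score in [0, 1] per source
    document, or None where the model produced no summary (hypothetical
    convention for this sketch).
    """
    total = len(scores)
    if total == 0:
        raise ValueError("scores must not be empty")

    # Only summaries that were actually produced can be judged.
    answered = [s for s in scores if s is not None]
    if not answered:
        raise ValueError("no answered documents to evaluate")

    hallucinated = sum(1 for s in answered if s < threshold)

    return {
        "answer_rate": 100.0 * len(answered) / total,
        "hallucination_rate": 100.0 * hallucinated / len(answered),
    }


# Made-up scores for five documents; the third was not summarized.
example = leaderboard_metrics([0.9, 0.2, None, 0.7, 0.5])
print(example)
```

Under these assumptions, a model that answers every document has an answer rate of 100%, and its hallucination rate is simply the share of its summaries that score below the cutoff.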

Moreover, the release of Gemma under the liberal “Gemma Terms of Use” is a strategic move by Google to facilitate easier integration into commercial and enterprise systems. This decision mirrors Microsoft’s earlier move with Phi 2, underscoring a significant shift towards open licensing in the LLM domain.

The table below, which reproduces the leaderboard as of February 21, 2024, shows Gemma’s performance in relation to other foundation models:

[Image: HHEM (Hughes Hallucination Evaluation Model) leaderboard table, including Gemma]

The implications of Gemma's release echo the sentiment of our previous analysis: proprietary LLM vendors are under increasing pressure to innovate and adjust their pricing strategies. With Gemma, the message is clear: the race for efficiency, performance, and accessibility in LLMs is heating up, and end users stand to gain the most.