August 22nd, 2024

Gemma explained: What's new in Gemma 2

Gemma 2 introduces open models in 2B, 9B, and 27B parameter sizes, advancing conversational AI with innovations such as grouped-query attention (GQA) and logit soft-capping; a future post will cover the RecurrentGemma model.

Gemma 2 has been introduced as a new suite of open models that improves performance and accessibility in conversational AI. It is available in three parameter sizes: 2B, 9B, and 27B. The 27B model has quickly gained recognition, outperforming much larger models in real-world conversations, while the 2B model excels on edge devices, surpassing all GPT-3.5 models. Key innovations in Gemma 2 include alternating local sliding-window and global attention layers, logit soft-capping to keep logits within a bounded range and stabilize training, and RMSNorm for stable training. Grouped-Query Attention (GQA) replaces traditional multi-head attention, letting groups of query heads share key/value heads and cutting the memory cost of inference over long texts. The smaller models were trained with knowledge distillation from a larger teacher, leading to significant performance improvements. The findings also suggest that, at the same parameter count, deeper models perform slightly better than wider ones. Future posts will explore the RecurrentGemma model, which is based on the Griffin architecture.
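
The soft-capping trick is compact enough to show directly: a tanh squashes each logit back toward a fixed range instead of clipping it hard, so gradients stay nonzero everywhere. A minimal NumPy sketch, assuming the tanh formulation described for Gemma 2 (caps of 50.0 for attention logits and 30.0 for final logits are the values reported in the Gemma 2 paper):

```python
import numpy as np

def soft_cap(logits: np.ndarray, cap: float) -> np.ndarray:
    # Squash logits smoothly into (-cap, cap): roughly the identity for
    # small values, saturating instead of growing without bound.
    return cap * np.tanh(logits / cap)

# Caps of 50.0 (attention logits) and 30.0 (final logits) are the values
# reported for Gemma 2.
attn_logits = np.array([-200.0, -10.0, 0.0, 10.0, 200.0])
print(soft_cap(attn_logits, cap=50.0))  # extremes pulled back toward +/-50
```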
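
RMSNorm itself is nearly a one-liner; Gemma 2 is reported to apply it both before and after each transformer sub-layer. A sketch of the standard formulation (how the learned gain is parameterized is an implementation detail that varies between codebases):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Rescale by the root-mean-square of the features (no mean subtraction,
    # unlike LayerNorm), then apply a learned per-feature gain.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

hidden = np.random.randn(4, 16)             # (tokens, features), toy sizes
gain = np.ones(16)                          # learned gain, initialized to 1
print(rms_norm(hidden, gain).std(axis=-1))  # per-token scale near 1
```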
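
Grouped-query attention is easiest to see in code: the model keeps its full complement of query heads, but several of them share one key/value head, shrinking the KV cache that must be kept per generated token. A toy sketch, assuming the 2:1 query-to-KV-head ratio reported for the Gemma 2 models; shapes and sizes here are illustrative only:

```python
import numpy as np

def grouped_query_attention(q: np.ndarray, k: np.ndarray,
                            v: np.ndarray) -> np.ndarray:
    # q: (num_q_heads, seq, head_dim); k, v: (num_kv_heads, seq, head_dim).
    # Each group of query heads attends using one shared key/value head.
    num_q_heads, _, head_dim = q.shape
    num_kv_heads = k.shape[0]
    group = num_q_heads // num_kv_heads  # query heads per shared KV head
    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv = h // group                  # the KV head this query head shares
        scores = q[h] @ k[kv].T / np.sqrt(head_dim)   # (seq, seq)
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out

# Illustrative shapes only: 8 query heads sharing 4 KV heads (a 2:1 grouping).
q = np.random.randn(8, 16, 32)
k = np.random.randn(4, 16, 32)
v = np.random.randn(4, 16, 32)
print(grouped_query_attention(q, k, v).shape)  # (8, 16, 32)
```

With half as many KV heads, the per-token cache halves as well, which is where the inference savings over standard multi-head attention come from.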

- Gemma 2 is available in 2B, 9B, and 27B parameter sizes.

- The 27B model outperforms larger models in conversational tasks.

- Key innovations include GQA, logit soft-capping, and RMSNorm for faster inference and more stable training.

- Knowledge distillation from larger models enhances the performance of smaller models (a sketch of the objective follows this list).

- Future updates will focus on the RecurrentGemma model.
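
The distillation objective is also simple to sketch: instead of training on one-hot next-token labels, the student minimizes cross-entropy against the full probability distribution the teacher assigns at each position. A minimal NumPy version; the toy shapes and random logits below are placeholders, not real model outputs:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def log_softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray) -> float:
    # Cross-entropy of the student against the teacher's full distribution
    # over the vocabulary, averaged over sequence positions, rather than
    # against one-hot next-token labels.
    p_teacher = softmax(teacher_logits)
    return float(-np.mean(np.sum(p_teacher * log_softmax(student_logits),
                                 axis=-1)))

# Hypothetical toy sizes; random logits stand in for real model outputs.
seq_len, vocab = 10, 256
teacher_logits = np.random.randn(seq_len, vocab)  # e.g. from a larger teacher
student_logits = np.random.randn(seq_len, vocab)  # e.g. from the 2B student
print(distillation_loss(student_logits, teacher_logits))
```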
