Hunyuan-Large: An Open-Source MoE Model with 52B Activated Parameters
Hunyuan-Large, Tencent's largest open-source MoE model with 389 billion parameters, excels in language tasks, outperforms Llama 3.1-70B, and features innovations such as synthetic data and advanced routing strategies.
Hunyuan-Large is introduced as the largest open-source mixture-of-experts (MoE) model developed by Tencent, with 389 billion total parameters of which 52 billion are activated per token. The model supports a context window of up to 256,000 tokens and has been evaluated across a wide range of benchmarks, showing strong performance in language understanding, generation, logical reasoning, mathematical problem-solving, coding, and long-context tasks. Hunyuan-Large outperforms the Llama 3.1-70B model and achieves results comparable to the much larger Llama 3.1-405B. Key innovations include large-scale synthetic training data, a mixed expert routing strategy, key-value (KV) cache compression, and an expert-specific learning rate strategy. The work also explores scaling laws and learning rate schedules for MoE models, offering insights for future development. Code and model checkpoints have been released to support further research and applications.
- Hunyuan-Large is the largest open-source MoE model, with 389 billion total parameters and 52 billion activated.
- It excels across benchmarks, outperforming Llama 3.1-70B and matching Llama 3.1-405B.
- Key innovations include large-scale synthetic data and an advanced routing strategy (see the sketch after this list).
- The model can handle up to 256,000 tokens.
- Code and checkpoints are available for public use to encourage further research.
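The mixed expert routing strategy mentioned above typically pairs an always-on shared expert with a router that sends each token to a small number of specialized experts, which is how a 389-billion-parameter model can activate only about 52 billion parameters per token. The snippet below is a minimal PyTorch-style sketch of that general idea under stated assumptions; the class name SharedRoutedMoE, the top-1 routing, and all dimensions are illustrative and do not reproduce Hunyuan-Large's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedRoutedMoE(nn.Module):
    """Illustrative MoE layer: every token passes through a shared expert,
    and a router additionally dispatches it to its top-1 specialized expert.
    This is a sketch of the general technique, not Hunyuan-Large's code."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        shared_out = self.shared_expert(x)

        # Router picks one specialized expert per token (top-1 here;
        # the general recipe is top-k).
        gate_logits = self.router(x)                  # (num_tokens, num_experts)
        gate_probs = F.softmax(gate_logits, dim=-1)
        top_prob, top_idx = gate_probs.max(dim=-1)    # (num_tokens,)

        routed_out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                routed_out[mask] = expert(x[mask])

        # Combine the always-on shared path with the gated specialized path.
        return shared_out + top_prob.unsqueeze(-1) * routed_out


# Tiny smoke test with toy dimensions.
layer = SharedRoutedMoE(d_model=64, d_ff=256, num_experts=4)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

In this arrangement the shared expert gives every token a common dense path while the routed experts add specialized capacity, so compute per token stays roughly constant even as the total number of experts, and hence total parameters, grows.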
Related
Llama 3 Secrets Every Engineer Must Know
Llama 3 is an advanced open-source language model trained on 15 trillion multilingual tokens, featuring 405 billion parameters, improved reasoning, and multilingual capabilities, while exploring practical applications and limitations.
Mixture of a Million Experts
The paper "Mixture of A Million Experts" introduces a sparse MoE architecture, PEER, which improves transformer efficiency by enabling retrieval from over a million experts, enhancing performance without high computational costs.
SmolLM2
SmolLM2 is a new family of lightweight language models from Hugging Face, available in three sizes, trained on 11 trillion tokens, and designed for on-device operation with accessible model weights.
Hunyuan-Large: An Open-Source MoE Model with 52B Activated Parameters
Hunyuan-Large, Tencent's largest open-source MoE model with 389 billion parameters, excels in benchmarks, outperforms Llama 3.1-70B, and supports 256,000 tokens, with code available for research.
HuggingFace - Tencent launches Hunyuan Large which outperforms Llama 3.1 405B
Tencent has launched Hunyuan-Large, the largest open-source MoE model with 389 billion parameters, releasing it openly to encourage collaboration and further advances in the field.