November 11th, 2024

Hunyuan-Large: An Open-Source MoE Model with 52B Activated Parameters

Hunyuan-Large, Tencent's largest open-source MoE model with 389 billion total parameters (52 billion activated), excels in language tasks, outperforms Llama 3.1-70B, and features innovations such as large-scale synthetic data and advanced routing strategies.

Hunyuan-Large is introduced as the largest open-source mixture of experts (MoE) model developed by Tencent, with 389 billion total parameters, of which 52 billion are activated per token. The model supports context lengths of up to 256,000 tokens and has been evaluated across a wide range of benchmarks, demonstrating strong performance in language understanding, generation, logical reasoning, mathematical problem-solving, coding, and long-context tasks. Hunyuan-Large outperforms the Llama 3.1-70B model and shows comparable results to the much larger Llama 3.1-405B model. Key innovations include the use of large-scale synthetic training data, a mixed expert routing strategy, key-value cache compression, and an expert-specific learning rate strategy. The work also explores scaling laws and learning rate schedules for MoE models, providing guidance for future development. The code and model checkpoints have been released to support further research and applications.
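The summary only names the mixed expert routing strategy; to make the idea concrete, here is a minimal sketch in plain Python with NumPy of a routing layer in which a shared expert processes every token while a router sends each token to one specialized expert. All shapes, the top-1 choice, and the helper names below are illustrative assumptions, not Hunyuan-Large's actual configuration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mixed_moe_layer(tokens, shared_expert, specialized_experts, router_weights):
    """Toy mixed routing: every token passes through the shared expert,
    plus the single highest-scoring specialized expert (top-1 routing)."""
    scores = softmax(tokens @ router_weights)   # (batch, num_experts) router probabilities
    chosen = scores.argmax(axis=-1)             # index of the top-1 specialized expert per token
    out = shared_expert(tokens)                 # the shared expert sees all tokens
    for idx, expert in enumerate(specialized_experts):
        mask = chosen == idx
        if mask.any():
            # Weight the routed expert's output by its router probability.
            out[mask] += scores[mask, idx, None] * expert(tokens[mask])
    return out

# Tiny usage example with random linear "experts".
rng = np.random.default_rng(0)
hidden, num_experts = 16, 4

def make_expert():
    W = rng.normal(size=(hidden, hidden)) * 0.1
    return lambda x: x @ W

shared = make_expert()
specialized = [make_expert() for _ in range(num_experts)]
router = rng.normal(size=(hidden, num_experts))

y = mixed_moe_layer(rng.normal(size=(8, hidden)), shared, specialized, router)
print(y.shape)  # (8, 16): same shape as the input tokens
```

The usual motivation for this kind of mixed routing is that an always-on shared expert can learn common features once, leaving the routed experts free to specialize.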

- Hunyuan-Large is the largest open-source MoE model, with 389 billion total parameters and 52 billion activated per token.

- It excels across benchmarks, outperforming Llama 3.1-70B and matching the much larger Llama 3.1-405B.

- Key innovations include large-scale synthetic data and advanced routing strategies.

- The model supports context lengths of up to 256,000 tokens.

- Code and checkpoints are available for public use to encourage further research.
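Key-value cache compression is what makes a 256,000-token context affordable at inference time. As a back-of-the-envelope illustration only (the layer, head, and dimension counts below are placeholders, not Hunyuan-Large's published architecture), the sketch shows how sharing KV heads across query heads, in the style of grouped-query attention, shrinks the cache:

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Approximate per-sequence KV-cache size: keys and values (factor 2),
    stored for every layer and every KV head, at fp16 precision by default."""
    return 2 * seq_len * num_layers * num_kv_heads * head_dim * bytes_per_value

# Placeholder shape: 64 layers, head dimension 128, 256K-token context.
seq_len, layers, head_dim = 256_000, 64, 128

full    = kv_cache_bytes(seq_len, layers, num_kv_heads=64, head_dim=head_dim)  # one KV head per query head
grouped = kv_cache_bytes(seq_len, layers, num_kv_heads=8,  head_dim=head_dim)  # 8 KV heads shared by all query heads

print(f"one KV head per query head: {full / 2**30:.1f} GiB")
print(f"8 grouped KV heads:         {grouped / 2**30:.1f} GiB")
```

Whatever the exact mechanism the model uses, the dominant terms are the same, so any reduction in cached KV heads or layers translates directly into memory savings at 256K tokens.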
