HuggingFace - Tencent launches Hunyuan-Large, which outperforms Llama 3.1 405B
Tencent has launched Hunyuan-Large, the largest open-source MoE model at 389 billion total parameters; it leads multiple AI benchmarks, and Tencent is inviting open-source collaboration on further improvements.
Tencent has introduced Hunyuan-Large, the largest open-source Transformer-based Mixture of Experts (MoE) model, with 389 billion total parameters and 52 billion active parameters. The model aims to keep resource consumption low while maintaining high performance across AI applications, including natural language processing and computer vision. Key technical choices include high-quality synthetic data for enhanced learning, KV cache compression to reduce memory usage, and expert-specific learning rate scaling for improved training. Hunyuan-Large supports long-context processing of sequences up to 256K tokens and outperforms competitors on benchmarks covering commonsense reasoning, reading comprehension, and mathematics. Notably, the Hunyuan-Large-Instruct variant shows significant improvements on the MMLU and MATH datasets, indicating advanced understanding and reasoning capabilities. The model's efficiency stands out: it achieves high accuracy with fewer activated parameters than other large models. Tencent encourages collaboration within the open-source community to further explore and optimize AI technologies.
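Neither the article nor this summary includes code, so the following is only a minimal PyTorch sketch of top-k sparse MoE routing, with made-up sizes rather than Hunyuan-Large's real hyperparameters. It shows why a model can hold a very large total parameter count while activating only a fraction of it per token.

```python
# Minimal sketch of top-k sparse MoE routing (sizes are illustrative, not
# Hunyuan-Large's real hyperparameters). Each token is routed to only k of the
# experts, so the parameters *activated* per token are a small fraction of the
# model's total parameter count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                    # x: (num_tokens, d_model)
        gate_logits = self.router(x)                         # (num_tokens, num_experts)
        topk_vals, topk_idx = gate_logits.topk(self.top_k, dim=-1)
        topk_weights = F.softmax(topk_vals, dim=-1)          # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += topk_weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)  # torch.Size([8, 512]); only 2 of the 16 experts ran per token
```

At Hunyuan-Large's scale, the same total-versus-active distinction is what lets a 389B-parameter model run with roughly 52B activated parameters per token.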
- Hunyuan-Large is the largest open-source MoE model with 389 billion parameters.
- It uses KV cache compression and expert-specific learning rate scaling for memory and computational efficiency (a rough memory estimate follows this list).
- The model excels in long-context processing and various AI benchmarks.
- Hunyuan-Large-Instruct shows significant improvements in language understanding tasks.
- Tencent promotes open-source collaboration for future AI advancements.
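The article credits KV cache compression for the memory savings but this summary does not spell out the exact mechanism, so the arithmetic below is only a back-of-the-envelope estimate of per-sequence KV cache size at a 256K-token context, comparing plain multi-head attention with a grouped-KV-head layout (a common compression approach; all layer and head counts here are invented for illustration, not Hunyuan-Large's real configuration).

```python
# Back-of-the-envelope KV cache size for one sequence, in GiB.
# Layer and head counts are illustrative, NOT Hunyuan-Large's real configuration.
def kv_cache_gib(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    # 2x for keys and values, stored at every layer for every cached token (fp16 = 2 bytes).
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value / 2**30

SEQ = 256_000                       # long-context window mentioned in the article
full_mha = kv_cache_gib(SEQ, n_layers=64, n_kv_heads=64, head_dim=128)  # one KV head per query head
grouped  = kv_cache_gib(SEQ, n_layers=64, n_kv_heads=8,  head_dim=128)  # shared (grouped) KV heads

print(f"full multi-head KV cache: {full_mha:.1f} GiB")  # 500.0 GiB
print(f"grouped KV heads:         {grouped:.1f} GiB")   # 62.5 GiB, an 8x reduction
```

Whatever the exact technique, the point is that at 256K tokens the KV cache, not the weights, dominates inference memory, which is why compression matters for long-context serving.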
Related
Llama 3 Secrets Every Engineer Must Know
Llama 3 is an advanced open-source language model trained on 15 trillion multilingual tokens, featuring 405 billion parameters, improved reasoning, and multilingual capabilities; the article also explores practical applications and limitations.
Mixture of a Million Experts
The paper "Mixture of A Million Experts" introduces a sparse MoE architecture, PEER, which improves transformer efficiency by enabling retrieval from over a million experts, enhancing performance without high computational costs.
New Phi-3.5 Models from Microsoft, including new MoE
Phi-3.5-MoE-instruct is a Microsoft model for text generation and reasoning, featuring 6.6 billion active parameters, a 128,000-token context window, multilingual support, and rigorous safety evaluations, available in Azure AI Studio.
Open-source 70B model surpasses GPT-4o and Claude 3.5 on Arena Hard
NVIDIA's Llama-3.1-Nemotron-70B-Instruct model, optimized for helpfulness, ranks first in alignment benchmarks, utilizes advanced RLHF techniques, requires significant hardware, and emphasizes ethical usage guidelines for responsible application.
Nvidia releases weights for Llama-3.1-Nemotron-70B-Instruct
NVIDIA launched the Llama-3.1-Nemotron-70B-Instruct model, ranking first in three benchmarks, utilizing RLHF techniques, requiring significant hardware, and emphasizing ethical AI development and responsible usage.