HuggingFace - Tencent launches Hunyuan-Large, which outperforms Llama 3.1 405B
Tencent has launched Hunyuan-Large, the largest open-source MoE model at 389 billion total parameters; it leads multiple AI benchmarks, and Tencent is inviting open-source collaboration on further improvements.
Tencent has introduced Hunyuan-Large, the largest open-source Transformer-based Mixture of Experts (MoE) model, with 389 billion total parameters and 52 billion active parameters. The model aims to keep resource consumption low while maintaining high performance across AI applications, including natural language processing and computer vision. Key technical choices include high-quality synthetic data for enhanced learning, KV cache compression to reduce memory usage, and expert-specific learning rate scaling for improved training. Hunyuan-Large supports long-context processing of sequences up to 256K tokens and outperforms competitors on benchmarks covering commonsense reasoning, reading comprehension, and mathematics. Notably, the Hunyuan-Large-Instruct variant shows significant improvements on the MMLU and MATH datasets, indicating advanced understanding and reasoning capabilities. The model's efficiency stands out: it achieves high accuracy with fewer activated parameters than other large models. Tencent encourages collaboration within the open-source community to further explore and optimize AI technologies.
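Neither the article nor this summary includes code, so the following is only a minimal PyTorch sketch of top-k sparse MoE routing, with made-up sizes rather than Hunyuan-Large's real hyperparameters. It shows why a model can hold a very large total parameter count while activating only a fraction of it per token.

```python
# Minimal sketch of top-k sparse MoE routing (sizes are illustrative, not
# Hunyuan-Large's real hyperparameters). Each token is routed to only k of the
# experts, so the parameters *activated* per token are a small fraction of the
# model's total parameter count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                    # x: (num_tokens, d_model)
        gate_logits = self.router(x)                         # (num_tokens, num_experts)
        topk_vals, topk_idx = gate_logits.topk(self.top_k, dim=-1)
        topk_weights = F.softmax(topk_vals, dim=-1)          # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += topk_weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)  # torch.Size([8, 512]); only 2 of the 16 experts ran per token
```

At Hunyuan-Large's scale, the same total-versus-active distinction is what lets a 389B-parameter model run with roughly 52B activated parameters per token.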
- Hunyuan-Large is the largest open-source MoE model with 389 billion parameters.
- It uses KV cache compression and expert-specific learning rate scaling for memory and computational efficiency (a rough memory estimate follows this list).
- The model excels in long-context processing and various AI benchmarks.
- Hunyuan-Large-Instruct shows significant improvements in language understanding tasks.
- Tencent promotes open-source collaboration for future AI advancements.
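The article credits KV cache compression for the memory savings but this summary does not spell out the exact mechanism, so the arithmetic below is only a back-of-the-envelope estimate of per-sequence KV cache size at a 256K-token context, comparing plain multi-head attention with a grouped-KV-head layout (a common compression approach; all layer and head counts here are invented for illustration, not Hunyuan-Large's real configuration).

```python
# Back-of-the-envelope KV cache size for one sequence, in GiB.
# Layer and head counts are illustrative, NOT Hunyuan-Large's real configuration.
def kv_cache_gib(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    # 2x for keys and values, stored at every layer for every cached token (fp16 = 2 bytes).
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value / 2**30

SEQ = 256_000                       # long-context window mentioned in the article
full_mha = kv_cache_gib(SEQ, n_layers=64, n_kv_heads=64, head_dim=128)  # one KV head per query head
grouped  = kv_cache_gib(SEQ, n_layers=64, n_kv_heads=8,  head_dim=128)  # shared (grouped) KV heads

print(f"full multi-head KV cache: {full_mha:.1f} GiB")  # 500.0 GiB
print(f"grouped KV heads:         {grouped:.1f} GiB")   # 62.5 GiB, an 8x reduction
```

Whatever the exact technique, the point is that at 256K tokens the KV cache, not the weights, dominates inference memory, which is why compression matters for long-context serving.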
Related
Llama 3 Secrets Every Engineer Must Know
Llama 3 is an advanced open-source language model trained on 15 trillion multilingual tokens, featuring 405 billion parameters, improved reasoning, and multilingual capabilities; the article also explores practical applications and limitations.
Mixture of a Million Experts
The paper "Mixture of A Million Experts" introduces a sparse MoE architecture, PEER, which improves transformer efficiency by enabling retrieval from over a million experts, enhancing performance without high computational costs.
New Phi-3.5 Models from Microsoft, including new MoE
Phi-3.5-MoE-instruct is a Microsoft model for text generation and reasoning, featuring 6.6 billion active parameters, a 128,000-token context window, multilingual support, and rigorous safety evaluations, available in Azure AI Studio.
Open-source 70B model surpasses GPT-4o and Claude 3.5 on Arena Hard
NVIDIA's Llama-3.1-Nemotron-70B-Instruct model, optimized for helpfulness, ranks first in alignment benchmarks, utilizes advanced RLHF techniques, requires significant hardware, and emphasizes ethical usage guidelines for responsible application.
Nvidia releases weights for Llama-3.1-Nemotron-70B-Instruct
NVIDIA launched the Llama-3.1-Nemotron-70B-Instruct model, ranking first in three benchmarks, utilizing RLHF techniques, requiring significant hardware, and emphasizing ethical AI development and responsible usage.