DeepSeek v3 beats Claude Sonnet 3.5 and is way cheaper
DeepSeek-V3 is a 671-billion-parameter language model that excels in benchmarks, particularly math and coding tasks. It relies on advanced training strategies and supports a range of hardware for local deployment.
DeepSeek-V3 is a state-of-the-art Mixture-of-Experts (MoE) language model with 671 billion parameters, of which 37 billion are activated for each token. It employs architectures validated in its predecessor, DeepSeek-V2, including Multi-head Latent Attention (MLA) and DeepSeekMoE, and introduces an auxiliary-loss-free strategy for load balancing along with a multi-token prediction training objective.

Pre-training was conducted on 14.8 trillion tokens, followed by supervised fine-tuning and reinforcement learning, producing a model that outperforms other open-source models and rivals leading closed-source models. Training was efficient, requiring only 2.788 million GPU hours, and remained stable without significant loss spikes.

DeepSeek-V3 excels across benchmarks, particularly math and coding tasks, and supports context lengths up to 128K. It can be deployed locally on a variety of platforms and hardware configurations, including AMD GPUs and Huawei Ascend NPUs. Knowledge distillation from previous models further improves its reasoning capabilities while maintaining control over output style and length.
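To make the "37 billion activated per token" figure concrete: in an MoE layer, a small router scores all experts for each token and only the top-k experts actually run, so most of the model's parameters sit idle for any given token. The sketch below is a toy illustration of that routing idea only, not DeepSeek's implementation (which additionally uses MLA, DeepSeekMoE's expert design, and the auxiliary-loss-free balancing strategy); all class names and sizes are invented for the example.

```python
import torch
import torch.nn as nn

class ToyTopKMoE(nn.Module):
    """Toy Mixture-of-Experts layer: a router picks k of n experts per token,
    so only a fraction of the layer's parameters is active for each token."""

    def __init__(self, d_model=64, d_ff=256, n_experts=16, k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(d_model, n_experts)  # one score per expert, per token
        self.k = k

    def forward(self, x):                        # x: (n_tokens, d_model)
        scores = self.router(x)                  # (n_tokens, n_experts)
        weights, chosen = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)        # normalise over the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e      # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
layer = ToyTopKMoE()
print(layer(tokens).shape)  # torch.Size([8, 64]); each token used only 2 of 16 experts
```

With k=2 of 16 experts, roughly an eighth of the expert parameters run per token, which is the same principle that lets DeepSeek-V3 activate only 37B of its 671B parameters.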
- DeepSeek-V3 features 671 billion parameters, with 37 billion activated per token.
- Its training strategies kept the run efficient (2.788 million GPU hours) and stable, with no major loss spikes.
- The model outperforms many existing open-source and closed-source models in benchmarks.
- It supports extensive context lengths and is deployable on various hardware (see the usage sketch after this list).
- Knowledge distillation enhances its reasoning capabilities while controlling output characteristics.
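For readers who just want to try the model rather than host 671 billion parameters themselves, a hosted endpoint is the low-effort route, and its pricing is presumably what the "way cheaper" comparison in the title refers to. A minimal sketch, assuming DeepSeek exposes an OpenAI-compatible endpoint at https://api.deepseek.com with a `deepseek-chat` model identifier and that your key is in the `DEEPSEEK_API_KEY` environment variable (verify these names against the current API docs):

```python
# Minimal sketch of calling a hosted DeepSeek-V3 endpoint via the OpenAI-compatible
# Python client. Base URL, model name, and env-var name are assumptions; check
# DeepSeek's API documentation before relying on them.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],   # assumed environment variable
    base_url="https://api.deepseek.com",      # assumed OpenAI-compatible base URL
)

resp = client.chat.completions.create(
    model="deepseek-chat",                    # assumed identifier for DeepSeek-V3
    messages=[{
        "role": "user",
        "content": "Write a Python one-liner that squares the even numbers in range(10).",
    }],
)
print(resp.choices[0].message.content)
```

Local deployment is also supported across various platforms and hardware (including AMD GPUs and Huawei Ascend NPUs), but at 671B total parameters it calls for multi-GPU, server-class machines rather than a typical desktop.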
Related
DeepSeek: Advancing theorem proving in LLMs through large-scale synthetic data
The paper presents a method to enhance theorem proving in large language models by generating synthetic proof data. The DeepSeekMath 7B model outperformed GPT-4, proving five benchmark problems.
DeepSeek v2.5 – open-source LLM comparable to GPT-4o, but 95% less expensive
DeepSeek launched DeepSeek-V2.5, an advanced open-source model with a 128K context length, excelling in math and coding tasks, and offering competitive API pricing for developers.
Hunyuan-Large: An Open-Source Moe Model with 52B Activated Parameters
Hunyuan-Large, Tencent's largest open-source MoE model with 389 billion parameters, excels in benchmarks, outperforms Llama 3.1-70B, and supports 256,000 tokens, with code available for research.
HuggingFace - Tencent launches Hunyuan Large which outperforms Llama 3.1 405B
Tencent has launched the Hunyuan-Large model, the largest open-source MoE model with 389 billion parameters, excelling in AI applications and promoting collaboration for further advancements in technology.
Hunyuan-Large: An Open-Source Moe Model with 52B Activated Parameters
Hunyuan-Large, Tencent's largest open-source MoE model with 389 billion parameters, excels in language tasks, outperforms Llama 3.1-70B, and features innovations like synthetic data and advanced routing strategies.
As someone who just follows this stuff from afar, it is hard for me to tell whether this is a SaaS-only model, or whether it means we are getting to the point where you can run an AI model on a local machine.