DeepSeek v3 beats Claude Sonnet 3.5 and is way cheaper
DeepSeek-V3 is a 671-billion-parameter language model that excels in benchmarks, particularly math and coding tasks. It relies on advanced training strategies and supports a range of hardware for local deployment.
DeepSeek-V3 is a state-of-the-art Mixture-of-Experts (MoE) language model with 671 billion parameters, of which 37 billion are activated for each token. It employs architectures validated in its predecessor, DeepSeek-V2, including Multi-head Latent Attention (MLA) and DeepSeekMoE, and introduces an auxiliary-loss-free strategy for load balancing along with a multi-token prediction training objective.

Pre-training was conducted on 14.8 trillion tokens, followed by supervised fine-tuning and reinforcement learning, producing a model that outperforms other open-source models and rivals leading closed-source models. Training was efficient, requiring only 2.788 million GPU hours, and remained stable without significant loss spikes.

DeepSeek-V3 excels across benchmarks, particularly math and coding tasks, and supports context lengths up to 128K. It can be deployed locally on a variety of platforms and hardware configurations, including AMD GPUs and Huawei Ascend NPUs. Knowledge distillation from previous models further improves its reasoning capabilities while maintaining control over output style and length.
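To make the "37 billion activated per token" figure concrete: in an MoE layer, a small router scores all experts for each token and only the top-k experts actually run, so most of the model's parameters sit idle for any given token. The sketch below is a toy illustration of that routing idea only, not DeepSeek's implementation (which additionally uses MLA, DeepSeekMoE's expert design, and the auxiliary-loss-free balancing strategy); all class names and sizes are invented for the example.

```python
import torch
import torch.nn as nn

class ToyTopKMoE(nn.Module):
    """Toy Mixture-of-Experts layer: a router picks k of n experts per token,
    so only a fraction of the layer's parameters is active for each token."""

    def __init__(self, d_model=64, d_ff=256, n_experts=16, k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(d_model, n_experts)  # one score per expert, per token
        self.k = k

    def forward(self, x):                        # x: (n_tokens, d_model)
        scores = self.router(x)                  # (n_tokens, n_experts)
        weights, chosen = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)        # normalise over the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e      # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
layer = ToyTopKMoE()
print(layer(tokens).shape)  # torch.Size([8, 64]); each token used only 2 of 16 experts
```

With k=2 of 16 experts, roughly an eighth of the expert parameters run per token, which is the same principle that lets DeepSeek-V3 activate only 37B of its 671B parameters.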
- DeepSeek-V3 features 671 billion parameters, with 37 billion activated per token.
- Its training strategies kept the run efficient (2.788 million GPU hours) and stable, with no major loss spikes.
- The model outperforms many existing open-source and closed-source models in benchmarks.
- It supports extensive context lengths and is deployable on various hardware (see the usage sketch after this list).
- Knowledge distillation enhances its reasoning capabilities while controlling output characteristics.
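For readers who just want to try the model rather than host 671 billion parameters themselves, a hosted endpoint is the low-effort route, and its pricing is presumably what the "way cheaper" comparison in the title refers to. A minimal sketch, assuming DeepSeek exposes an OpenAI-compatible endpoint at https://api.deepseek.com with a `deepseek-chat` model identifier and that your key is in the `DEEPSEEK_API_KEY` environment variable (verify these names against the current API docs):

```python
# Minimal sketch of calling a hosted DeepSeek-V3 endpoint via the OpenAI-compatible
# Python client. Base URL, model name, and env-var name are assumptions; check
# DeepSeek's API documentation before relying on them.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],   # assumed environment variable
    base_url="https://api.deepseek.com",      # assumed OpenAI-compatible base URL
)

resp = client.chat.completions.create(
    model="deepseek-chat",                    # assumed identifier for DeepSeek-V3
    messages=[{
        "role": "user",
        "content": "Write a Python one-liner that squares the even numbers in range(10).",
    }],
)
print(resp.choices[0].message.content)
```

Local deployment is also supported across various platforms and hardware (including AMD GPUs and Huawei Ascend NPUs), but at 671B total parameters it calls for multi-GPU, server-class machines rather than a typical desktop.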
Related
DeepSeek: Advancing theorem proving in LLMs through large-scale synthetic data
The paper presents a method to enhance theorem proving in large language models by generating synthetic proof data. The DeepSeekMath 7B model outperformed GPT-4, proving five benchmark problems.
DeepSeek v2.5 – open-source LLM comparable to GPT-4o, but 95% less expensive
DeepSeek launched DeepSeek-V2.5, an advanced open-source model with a 128K context length, excelling in math and coding tasks, and offering competitive API pricing for developers.
Hunyuan-Large: An Open-Source Moe Model with 52B Activated Parameters
Hunyuan-Large, Tencent's largest open-source MoE model with 389 billion parameters, excels in benchmarks, outperforms Llama 3.1-70B, and supports 256,000 tokens, with code available for research.
HuggingFace - Tencent launches Hunyuan Large which outperforms Llama 3.1 405B
Tencent has launched the Hunyuan-Large model, the largest open-source MoE model with 389 billion parameters, excelling in AI applications and promoting collaboration for further advancements in technology.
Hunyuan-Large: An Open-Source Moe Model with 52B Activated Parameters
Hunyuan-Large, Tencent's largest open-source MoE model with 389 billion parameters, excels in language tasks, outperforms Llama 3.1-70B, and features innovations like synthetic data and advanced routing strategies.
As someone who just follows this stuff from afar, it is hard for me to tell whether this is a SaaS-only model, or whether it means we are getting to the point where you can run an AI model on a local machine.