Notes on the New Deepseek v3
DeepSeek v3, a leading open-source model with 671 billion parameters, excels in reasoning and math tasks, outperforming competitors while remaining cost-effective: it was trained on 14.8 trillion tokens for roughly $6 million.
DeepSeek has launched its latest model, DeepSeek v3, which features a 671-billion-parameter mixture-of-experts architecture with 37 billion parameters active per token. It has been recognized as the best open-source model, outperforming competitors like Llama 3.1 and Mistral, and is comparable to OpenAI's GPT-4o and Claude 3.5 Sonnet on various benchmarks. DeepSeek v3 was trained on 14.8 trillion high-quality tokens, using 2,788,000 GPU hours at a cost of approximately $6 million (at roughly $2 per GPU hour), significantly less than its competitors. The model's efficiency is attributed to its engineering: a mixture-of-experts architecture, FP8 mixed-precision training, and a custom training framework. DeepSeek v3 excels in reasoning and mathematical tasks, surpassing GPT-4o and Claude 3.5 Sonnet, although it lags slightly in writing and coding tasks. A deep-thinking feature further enhances its reasoning capabilities. Overall, DeepSeek v3 is positioned as a cost-effective alternative to high-end models, providing substantial performance at a lower price point.
- DeepSeek v3 is the leading open-source model, outperforming major competitors.
- The model was trained efficiently, at a cost of around $6 million.
- It excels in reasoning and math tasks compared to GPT-4o and Claude 3.5 Sonnet.
- A new deep-thinking feature improves its reasoning abilities.
- DeepSeek v3 offers significant value to AI developers at a lower cost.
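The "37 billion active parameters" figure above refers to mixture-of-experts routing: a gating network picks a few experts per token, so only a fraction of the total weights participate in each forward pass. Below is a minimal top-k routing sketch in PyTorch with toy dimensions; DeepSeek's actual design (fine-grained and shared experts, its own load-balancing scheme) is considerably more elaborate, so treat this as an illustration of the principle, not their implementation.

```python
# Minimal top-k mixture-of-experts layer: only k of n_experts run per token,
# so per-token compute scales with active parameters, not total parameters.
# All sizes here are toy values, not DeepSeek v3's real configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)        # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64]); only 2 of 8 experts ran per token
```

With k=2 of 8 experts, each token touches roughly a quarter of the expert weights; scale the same idea up and a 671B-parameter model can run with ~37B parameters per token.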
Related
DeepSeek v3 beats Claude sonnet 3.5 and way cheaper
DeepSeek-V3 is a 671 billion parameter language model that excels in benchmarks, particularly math and coding tasks, utilizing advanced training strategies and supporting various hardware for local deployment.
DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch
Chinese AI startup DeepSeek launched DeepSeek-V3, a 671 billion parameter model outperforming major competitors. It features cost-effective training, innovative architecture, and is available for testing and commercial use.
DeepSeek's new AI model appears to be one of the best 'open' challengers yet
DeepSeek, a Chinese AI firm, launched DeepSeek V3, an open-source model with 671 billion parameters, excelling in text tasks and outperforming competitors, though limited by regulatory constraints.
Show HN: DeepSeek v3 – A 671B parameter AI Language Model
DeepSeek v3 is an advanced AI language model with 671 billion parameters, pre-trained on 14.8 trillion tokens, supporting a 128K context window, and available for commercial use across various hardware platforms.
DeepSeek v3: The Six Million Dollar Model
DeepSeek v3 is an affordable AI model with 37 billion active parameters, showing competitive benchmarks but underperforming in output diversity and coherence. Its real-world effectiveness remains to be evaluated.
Also, don't just focus on this model; check out DeepSeek's mission and the CEO's words in the recently released interview. They want to be the DJI / Bambu Lab of AI, basically: leaders, not followers, and after V3 it's hard to say they don't have the right brains to do that.
- How many 'r's are in Strawberry?
- Finding the fourth word of the response
These tests are at odds with the tokenizer and the next-word prediction model: the model operates on subword tokens, not characters, so it never directly sees the letters it is asked to count. They do not accurately represent an LLM's capabilities; it's akin to asking a blind person to identify colors.
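A quick way to see the mismatch: the model receives token IDs, not letters. The snippet below uses OpenAI's tiktoken as a stand-in tokenizer (an assumption for illustration; DeepSeek v3 ships its own tokenizer, but the principle is the same), and the exact splits shown in the comments are indicative, not guaranteed.

```python
# Why letter-counting trips up LLMs: the model consumes token IDs, not characters.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # example encoding, not DeepSeek's
ids = enc.encode("Strawberry")
print(ids)                                   # token IDs, e.g. a handful of integers
print([enc.decode([i]) for i in ids])        # e.g. ['Str', 'aw', 'berry']
# The three r's never appear as separate symbols in the input, so answering
# "how many r's?" requires the model to have memorized each token's spelling.
```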
> They probably trained the model on a synthetic dataset generated by GPT-4o.
This seems to be the case. I'll speculate further: they trained on copyrighted material that OpenAI did not.
It remains to be seen what the pricing will be when the model is run by non-DeepSeek providers; DeepSeek itself might be loss-leading.
The comparison for cheap models should also include Gemini 2.0 Flash Exp. I could see it being even cheaper when it stops being free, if it ever does. There's definitely a scenario where Google keeps it free-ish for a long time with relatively high limits.
AI slop; I don't trust any of this article, especially the bullets on what made DeepSeek "win".
Like with all of these models, we don't know what's in them.