December 27th, 2024

Show HN: DeepSeek v3 – A 671B parameter AI Language Model

DeepSeek v3 is an advanced AI language model with 671 billion parameters, pre-trained on 14.8 trillion tokens, supporting a 128K context window, and available for commercial use across various hardware platforms.


DeepSeek v3 is a cutting-edge AI language model with 671 billion total parameters, of which 37 billion are activated for each token. It employs a Mixture-of-Experts (MoE) architecture, which lets it reach state-of-the-art results across various benchmarks while keeping inference efficient. The model was pre-trained on 14.8 trillion high-quality tokens, giving it broad coverage of mathematics, coding, and multilingual tasks. DeepSeek v3 supports a long context window of 128K tokens, enabling it to process extensive input sequences, and incorporates Multi-Token Prediction, which improves output quality and speeds up inference.

The model is accessible through an online demo platform and API services, and it is available for commercial use under specific licensing terms. DeepSeek v3 can be deployed on various hardware platforms, including NVIDIA and AMD GPUs, and is supported by multiple inference frameworks. Its training process was efficient, using FP8 mixed precision and reportedly requiring only about 2.8 million H800 GPU hours in total. Overall, DeepSeek v3 sets new standards in AI language modeling, outperforming many existing models and providing high-quality responses across a range of tasks.
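
To make the "37 billion activated per token" figure concrete, here is a toy top-k expert-routing sketch in PyTorch. It illustrates the general MoE idea only; it is not DeepSeek's actual routing code, and the expert count, hidden size, and top-k value are made-up small numbers.

```python
# Toy Mixture-of-Experts layer: a router scores experts per token and only
# the top-k experts actually run, so most parameters stay untouched per token.
# Sizes here are tiny and hypothetical, nothing like DeepSeek v3's real config.
import torch
import torch.nn.functional as F

num_experts, top_k, d_model = 8, 2, 16
experts = [torch.nn.Linear(d_model, d_model) for _ in range(num_experts)]
router = torch.nn.Linear(d_model, num_experts)

def moe_forward(x):                              # x: (tokens, d_model)
    scores = F.softmax(router(x), dim=-1)        # routing probabilities
    weights, idx = scores.topk(top_k, dim=-1)    # keep only the top-k experts
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):                  # explicit loops for clarity
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[e](x[t])       # only top-k experts execute
    return out

print(moe_forward(torch.randn(4, d_model)).shape)  # torch.Size([4, 16])
```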

- DeepSeek v3 features 671 billion parameters and utilizes a Mixture-of-Experts architecture.

- It has been pre-trained on 14.8 trillion tokens, excelling in mathematics, coding, and multilingual tasks.

- The model supports a 128K context window for processing extensive input sequences.

- DeepSeek v3 is available for commercial use, accessible via API services, and can be deployed on various hardware platforms (a minimal API-call sketch follows this list).

- It incorporates advanced features like Multi-Token Prediction for enhanced performance.
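
Since the summary mentions API access, here is a minimal sketch of one way to call the model, assuming an OpenAI-compatible endpoint at api.deepseek.com and a "deepseek-chat" model identifier; both are assumptions to verify against DeepSeek's current documentation.

```python
# Minimal chat-completion call, assuming DeepSeek exposes an OpenAI-compatible
# API. The base URL and model name below are assumptions, not confirmed values.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",         # placeholder, not a real key
    base_url="https://api.deepseek.com",     # assumed endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                   # assumed identifier for DeepSeek v3
    messages=[{"role": "user",
               "content": "Explain Mixture-of-Experts in two sentences."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```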

4 comments
By @yorwba - about 2 months
DeepSeek's actual website: https://www.deepseek.com/

('yangxiaobo regularly submits copycat landing pages of AI products developed by other people as "Show HN," as evidenced by the posting history: https://news.ycombinator.com/submitted?id=yangxiaobo)

By @wolfgangK - about 2 months
Most interesting! Amazing job at optimizing the various parts of the task. Being an MoE with 'only' 37B active params per token would seem to put it within reach of CPU-and-RAM inference for the lucky hobbyist with an Epyc homelab and 8 or 16 memory channels on a second-hand single- or dual-socket Gen 2 motherboard (around $2,500 used). Any idea how hard it would be (or whether it will happen?) to support the new architecture in llama.cpp?
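
A quick back-of-envelope check of that intuition: on a CPU box, decoding speed is roughly bounded by how fast RAM can stream the ~37B active parameters for each token. The per-channel bandwidth and 4-bit quantization below are assumptions, not measurements, and the result is only a theoretical upper bound.

```python
# Rough memory-bandwidth bound on CPU decoding speed for a 37B-active-param MoE.
# Assumptions: 4-bit weights, DDR4-3200 at ~25.6 GB/s per channel, and that
# weight streaming dominates; real throughput will be noticeably lower.
active_params = 37e9                 # parameters touched per token
bytes_per_param = 0.5                # assumed 4-bit quantization
bytes_per_token = active_params * bytes_per_param

channel_bw = 25.6e9                  # assumed DDR4-3200 bytes/s per channel
for channels in (8, 16):
    bandwidth = channels * channel_bw
    print(f"{channels} channels: ~{bandwidth / bytes_per_token:.1f} tokens/s upper bound")
# Prints roughly 11 and 22 tokens/s for 8 and 16 channels respectively.
```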

I must confess that my interest in LLMs is in grounded RAG, as I consider any intrinsic knowledge of the LLM to be unreliable overfitting. Is DeepSeek able to perform grounded RAG like Command R and Nous-Hermes 3, for instance?

Thanks for this amazing model and all the insights in your report!
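
For context on the question above, "grounded RAG" means answering strictly from retrieved documents supplied in the prompt, ideally with citations. The sketch below is a generic, hypothetical prompt pattern for that style; it is not the specific grounding template used by Command R, Nous-Hermes 3, or DeepSeek.

```python
# Generic grounded-RAG prompt: retrieved snippets go into the prompt and the
# model is asked to answer only from them, citing document ids. The documents
# and wording here are illustrative, not any model's official template.
docs = [
    {"id": "doc1", "text": "DeepSeek v3 has 671B total parameters."},
    {"id": "doc2", "text": "It activates 37B parameters per token."},
]
context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
prompt = (
    "Answer using only the documents below and cite their ids.\n\n"
    f"{context}\n\n"
    "Question: How many parameters does DeepSeek v3 activate per token?"
)
print(prompt)  # this string would be sent as the user message via the API above
```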

By @qup - about 2 months
Great score on the aider leaderboards: https://aider.chat/docs/leaderboards/