December 27th, 2024

Show HN: DeepSeek v3 – A 671B parameter AI Language Model

DeepSeek v3 is an advanced AI language model with 671 billion parameters, pre-trained on 14.8 trillion tokens, supporting a 128K context window, and available for commercial use across various hardware platforms.


DeepSeek v3 is a cutting-edge AI language model with 671 billion total parameters, of which 37 billion are activated for each token. It employs a Mixture-of-Experts (MoE) architecture, which lets it reach state-of-the-art results across various benchmarks while keeping inference efficient. The model was pre-trained on 14.8 trillion high-quality tokens, giving it broad coverage of mathematics, coding, and multilingual tasks. DeepSeek v3 supports a long context window of 128K tokens, enabling it to process extensive input sequences, and incorporates Multi-Token Prediction, which improves output quality and speeds up inference.

The model is accessible through an online demo platform and API services, and it is available for commercial use under specific licensing terms. DeepSeek v3 can be deployed on various hardware platforms, including NVIDIA and AMD GPUs, and is supported by multiple inference frameworks. Its training process was efficient, using FP8 mixed precision and reportedly requiring only about 2.8 million H800 GPU hours in total. Overall, DeepSeek v3 sets new standards in AI language modeling, outperforming many existing models and providing high-quality responses across a range of tasks.
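
To make the "37 billion activated per token" figure concrete, here is a toy top-k expert-routing sketch in PyTorch. It illustrates the general MoE idea only; it is not DeepSeek's actual routing code, and the expert count, hidden size, and top-k value are made-up small numbers.

```python
# Toy Mixture-of-Experts layer: a router scores experts per token and only
# the top-k experts actually run, so most parameters stay untouched per token.
# Sizes here are tiny and hypothetical, nothing like DeepSeek v3's real config.
import torch
import torch.nn.functional as F

num_experts, top_k, d_model = 8, 2, 16
experts = [torch.nn.Linear(d_model, d_model) for _ in range(num_experts)]
router = torch.nn.Linear(d_model, num_experts)

def moe_forward(x):                              # x: (tokens, d_model)
    scores = F.softmax(router(x), dim=-1)        # routing probabilities
    weights, idx = scores.topk(top_k, dim=-1)    # keep only the top-k experts
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):                  # explicit loops for clarity
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[e](x[t])       # only top-k experts execute
    return out

print(moe_forward(torch.randn(4, d_model)).shape)  # torch.Size([4, 16])
```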

- DeepSeek v3 features 671 billion parameters and utilizes a Mixture-of-Experts architecture.

- It has been pre-trained on 14.8 trillion tokens, excelling in mathematics, coding, and multilingual tasks.

- The model supports a 128K context window for processing extensive input sequences.

- DeepSeek v3 is available for commercial use, accessible via API services, and can be deployed on various hardware platforms (a minimal API-call sketch follows this list).

- It incorporates advanced features like Multi-Token Prediction for enhanced performance.
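
Since the summary mentions API access, here is a minimal sketch of one way to call the model, assuming an OpenAI-compatible endpoint at api.deepseek.com and a "deepseek-chat" model identifier; both are assumptions to verify against DeepSeek's current documentation.

```python
# Minimal chat-completion call, assuming DeepSeek exposes an OpenAI-compatible
# API. The base URL and model name below are assumptions, not confirmed values.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",         # placeholder, not a real key
    base_url="https://api.deepseek.com",     # assumed endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                   # assumed identifier for DeepSeek v3
    messages=[{"role": "user",
               "content": "Explain Mixture-of-Experts in two sentences."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```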

4 comments
By @yorwba - about 2 months
DeepSeek's actual website: https://www.deepseek.com/

('yangxiaobo regularly submits copycat landing pages of AI products developed by other people as "Show HN," as evidenced by the posting history: https://news.ycombinator.com/submitted?id=yangxiaobo)

By @wolfgangK - about 2 months
Most interesting! Amazing job at optimizing the various parts of the task. Being an MoE with 'only' 37B active params per token would seem to put it within reach of CPU-and-RAM inference for the lucky hobbyist with an Epyc homelab and 8 or 16 memory channels on a second-hand single- or dual-socket Gen 2 motherboard (around $2,500 used). Any idea how hard it would be (or whether it will happen?) to support the new architecture in llama.cpp?
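
A quick back-of-envelope check of that intuition: on a CPU box, decoding speed is roughly bounded by how fast RAM can stream the ~37B active parameters for each token. The per-channel bandwidth and 4-bit quantization below are assumptions, not measurements, and the result is only a theoretical upper bound.

```python
# Rough memory-bandwidth bound on CPU decoding speed for a 37B-active-param MoE.
# Assumptions: 4-bit weights, DDR4-3200 at ~25.6 GB/s per channel, and that
# weight streaming dominates; real throughput will be noticeably lower.
active_params = 37e9                 # parameters touched per token
bytes_per_param = 0.5                # assumed 4-bit quantization
bytes_per_token = active_params * bytes_per_param

channel_bw = 25.6e9                  # assumed DDR4-3200 bytes/s per channel
for channels in (8, 16):
    bandwidth = channels * channel_bw
    print(f"{channels} channels: ~{bandwidth / bytes_per_token:.1f} tokens/s upper bound")
# Prints roughly 11 and 22 tokens/s for 8 and 16 channels respectively.
```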

I must confess that my interest in LLMs is in grounded RAG, as I consider any intrinsic knowledge of the LLM to be unreliable overfitting. Is DeepSeek able to perform grounded RAG like Command R and Nous-Hermes 3, for instance?

Thanks for this amazing model and all the insights in your report!
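
For context on the question above, "grounded RAG" means answering strictly from retrieved documents supplied in the prompt, ideally with citations. The sketch below is a generic, hypothetical prompt pattern for that style; it is not the specific grounding template used by Command R, Nous-Hermes 3, or DeepSeek.

```python
# Generic grounded-RAG prompt: retrieved snippets go into the prompt and the
# model is asked to answer only from them, citing document ids. The documents
# and wording here are illustrative, not any model's official template.
docs = [
    {"id": "doc1", "text": "DeepSeek v3 has 671B total parameters."},
    {"id": "doc2", "text": "It activates 37B parameters per token."},
]
context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
prompt = (
    "Answer using only the documents below and cite their ids.\n\n"
    f"{context}\n\n"
    "Question: How many parameters does DeepSeek v3 activate per token?"
)
print(prompt)  # this string would be sent as the user message via the API above
```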

By @qup - about 2 months
Great score on the aider leaderboards: https://aider.chat/docs/leaderboards/