December 26th, 2024

DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch

Chinese AI startup DeepSeek has launched DeepSeek-V3, a 671-billion-parameter model that reportedly outperforms major open-source competitors. It features cost-effective training and an innovative architecture, and is available for testing and commercial use.

Chinese AI startup DeepSeek has launched its new ultra-large open-source model, DeepSeek-V3. The model has 671 billion parameters and uses a mixture-of-experts architecture, which routes each token to a small subset of expert sub-networks rather than running the full model. It has reportedly outperformed leading open-source models such as Meta's Llama 3.1-405B and closely matches the performance of closed models from Anthropic and OpenAI. DeepSeek aims to bridge the gap between open-source and closed-source AI, with the long-term goal of artificial general intelligence (AGI).

The model introduces two innovations, an auxiliary-loss-free load-balancing strategy and multi-token prediction, which are intended to improve training efficiency and speed. DeepSeek-V3 was trained on 14.8 trillion tokens, and a two-stage context-length extension brought its context window up to 128K tokens. Training was notably cost-effective: the total came to approximately $5.57 million, far below the hundreds of millions of dollars reportedly spent on training other large language models. DeepSeek-V3 has excelled in benchmarks, particularly on Chinese-language and math-centric tasks, although Anthropic's Claude 3.5 Sonnet outperformed it in some areas. The model is available on Hugging Face and GitHub, and an API for commercial use is offered at competitive pricing.
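To make the routing and load-balancing ideas above more concrete, here is a minimal sketch in plain Python. It is not DeepSeek's implementation: the function names (`route_top_k`, `update_bias`), the softmax gating over the selected experts, and the fixed `step` size are all assumptions chosen for illustration. The idea it tries to show is that a per-expert bias can steer which experts are selected, without adding an extra balancing term to the training loss.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(scores, k, bias=None):
    """Pick k experts for one token (hypothetical helper).

    `scores` are the token's affinity scores, one per expert.
    `bias` only influences WHICH experts are selected; the gate
    weights are computed from the unbiased scores (an assumption
    of this sketch).
    """
    n = len(scores)
    if bias is None:
        bias = [0.0] * n
    ranked = sorted(range(n), key=lambda i: scores[i] + bias[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([scores[i] for i in chosen])
    return dict(zip(chosen, weights))


def update_bias(bias, load, step=0.01):
    """Nudge biases toward balance (hypothetical helper).

    Experts that received more tokens than average get a lower bias,
    experts that received fewer get a higher one.
    """
    mean = sum(load) / len(load)
    new_bias = []
    for b, tokens in zip(bias, load):
        if tokens > mean:
            new_bias.append(b - step)
        elif tokens < mean:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias


if __name__ == "__main__":
    token_scores = [0.1, 2.0, 1.0, -1.0]
    print(route_top_k(token_scores, k=2))
    print(update_bias([0.0, 0.0, 0.0], load=[5, 1, 3], step=0.1))
```

In this sketch, an expert that keeps receiving too many tokens sees its bias drift downward until other experts start winning the top-k selection, which is one way to spread load without a separate balancing loss.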

- DeepSeek-V3 reportedly outperforms major open-source models such as Llama 3.1 and Qwen 2.5.

- The model uses a mixture-of-experts architecture, so only a subset of its parameters handles each token.

- Training costs for DeepSeek-V3 were significantly lower than typical large models.

- Innovations include an auxiliary-loss-free load-balancing strategy and multi-token prediction, aimed at training efficiency and speed.

- The model is available for testing and commercial use via DeepSeek Chat and API.
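The training-cost figure of roughly $5.57 million can be sanity-checked with simple arithmetic. The sketch below assumes the inputs that DeepSeek's technical report is understood to give: about 2.788 million H800 GPU hours at an assumed rental price of $2 per GPU hour. Neither number appears in this summary, so treat both as assumptions; `training_cost` is a hypothetical helper.

```python
# Assumed inputs (not stated in this summary): GPU hours and hourly rental
# price as understood from DeepSeek's technical report.
GPU_HOURS = 2_788_000          # assumed total H800 GPU hours
PRICE_PER_GPU_HOUR = 2.0       # assumed rental price in USD


def training_cost(gpu_hours, price_per_hour):
    """Return the estimated compute cost in USD (hypothetical helper)."""
    return gpu_hours * price_per_hour


if __name__ == "__main__":
    cost = training_cost(GPU_HOURS, PRICE_PER_GPU_HOUR)
    print(f"Estimated training cost: ${cost / 1e6:.3f} million")
```

An estimate of this kind covers only the rented compute for the final training run; it would not include research, ablation experiments, data, or staff costs.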
