January 31st, 2025

DeepSeek-R1 at 3,872 tokens / second on a single Nvidia HGX H200

The DeepSeek-R1 model, with 671 billion parameters, enhances reasoning through test-time scaling and reaches 3,872 tokens per second on a single NVIDIA HGX H200. NVIDIA offers easy deployment via the NIM microservice and promises future performance improvements.


DeepSeek-R1, a new open model designed for advanced reasoning and now available as an NVIDIA NIM microservice, uses a method called test-time scaling to improve the quality of its inference. The model performs multiple inference passes over a query, employing techniques such as chain-of-thought and consensus to converge on the best answer. With 671 billion parameters, DeepSeek-R1 delivers high accuracy in logical reasoning, math, coding, and language understanding, outperforming many existing models. It can process up to 3,872 tokens per second on a single NVIDIA HGX H200 system, making it suitable for real-time applications. The architecture is a large mixture-of-experts (MoE) design in which each layer has 256 experts, allowing tokens to be evaluated across experts in parallel.

The NVIDIA NIM microservice makes DeepSeek-R1 easy to deploy and customize while preserving security and data privacy for enterprises. Future hardware, including the upcoming NVIDIA Blackwell architecture, is expected to further improve performance, particularly for real-time inference. Developers can access the DeepSeek-R1 NIM microservice on NVIDIA's platform to experiment and build specialized AI agents.
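The "consensus" pass mentioned above is commonly implemented as self-consistency voting: sample several chain-of-thought completions and keep the most frequent final answer. Below is a minimal sketch of that idea; the `generate` callable and the answer-extraction rule (last non-empty line) are assumptions for illustration, not NVIDIA's or DeepSeek's actual implementation.

```python
from collections import Counter
from typing import Callable

def consensus_answer(generate: Callable[[str], str], prompt: str, num_samples: int = 8) -> str:
    """Sample several chain-of-thought completions and majority-vote on the
    final answers -- one common form of test-time scaling ("consensus")."""
    answers = []
    for _ in range(num_samples):
        completion = generate(prompt)
        # Assumption: the final answer is the last non-empty line of the completion.
        lines = [line.strip() for line in completion.splitlines() if line.strip()]
        answers.append(lines[-1] if lines else "")
    # The most frequent final answer wins the vote.
    answer, _count = Counter(answers).most_common(1)[0]
    return answer

if __name__ == "__main__":
    # Toy stand-in for a real model call, just to make the sketch executable.
    import random
    def toy_generate(prompt: str) -> str:
        return random.choice(["Reasoning...\n42", "Reasoning...\n42", "Reasoning...\n41"])
    print(consensus_answer(toy_generate, "What is 6 * 7?"))
```

In practice `generate` would call the deployed model with a nonzero sampling temperature so that the samples differ; the vote then smooths out occasional reasoning errors.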

- DeepSeek-R1 features 671 billion parameters, enhancing its reasoning capabilities.

- The model utilizes test-time scaling for improved inference quality.

- It can deliver up to 3,872 tokens per second on NVIDIA's hardware.

- The NVIDIA NIM microservice simplifies deployment and customization for enterprises (see the sketch after this list).

- Future NVIDIA architectures are expected to boost performance for real-time applications.
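
NIM microservices typically expose an OpenAI-compatible HTTP endpoint, so a deployed DeepSeek-R1 NIM can be queried with the standard OpenAI client. The sketch below assumes such an endpoint; the base URL, API key, and model identifier are placeholders to adapt to your deployment rather than confirmed values.

```python
# Minimal sketch of querying a deployed DeepSeek-R1 NIM, assuming the
# OpenAI-compatible chat endpoint that NIM microservices typically expose.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder: your NIM endpoint
    api_key="not-needed-for-local-nim",   # placeholder credential
)

response = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",      # placeholder model identifier
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
    temperature=0.6,
    max_tokens=1024,
)
print(response.choices[0].message.content)
```

The same pattern should work against NVIDIA's hosted API catalog by pointing `base_url` at the hosted endpoint and supplying an API key.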

2 comments
By @billconan - 3 months
https://news.ycombinator.com/item?id=42879864

This is Cerebras' 70B number, 1,600 tokens/sec; not sure about the costs.