March 19th, 2025

NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance

NVIDIA's Blackwell architecture achieves record deep learning inference performance, processing over 30,000 tokens per second, with enhancements in the TensorRT ecosystem and improved image generation efficiency.

NVIDIA has announced a significant breakthrough in deep learning inference performance with its Blackwell architecture, achieving world-record results with the DeepSeek-R1 model. At the NVIDIA GTC 2025 event, it was revealed that a single DGX system equipped with eight Blackwell GPUs can process over 250 tokens per second per user, reaching a maximum throughput of over 30,000 tokens per second. This performance leap is attributed to enhancements in NVIDIA's open ecosystem of inference tools, optimized for the Blackwell architecture. The improvements include up to five times more AI compute power, increased NVLink bandwidth, and scalability for larger data center applications. The TensorRT ecosystem, which includes various libraries for model optimization and deployment, has also been updated to support Blackwell, enabling developers to achieve high-performance inference. Notably, the latest TensorRT Model Optimizer supports FP4 precision, which enhances efficiency while maintaining accuracy. Additionally, NVIDIA's advancements extend to image generation, where the Blackwell architecture allows for improved performance in generating images with lower VRAM usage. Overall, these developments position NVIDIA's Blackwell architecture as a leader in the AI inference landscape, promising enhanced performance for both large language models and image generation tasks.

- NVIDIA's Blackwell architecture achieves record inference performance with DeepSeek-R1.

- A single DGX system with eight Blackwell GPUs reaches a maximum throughput of over 30,000 tokens per second.

- The TensorRT ecosystem has been optimized for Blackwell, enhancing model deployment.

- FP4 precision in TensorRT Model Optimizer improves efficiency while maintaining accuracy.

- Blackwell architecture enhances image generation performance while reducing VRAM usage.
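
To give a rough sense of what FP4 precision means in practice, the sketch below simulates 4-bit floating-point quantization in plain Python. It assumes the E2M1 layout (whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6) and a single scale factor per block of weights. The function names and the scaling scheme are illustrative assumptions, not the TensorRT Model Optimizer API or NVIDIA's actual recipe.

```python
# Illustrative sketch only: simulated FP4 (E2M1) quantization with one scale
# per block. This is NOT the TensorRT Model Optimizer API; every name here
# is hypothetical.

# The eight non-negative magnitudes representable in the E2M1 4-bit format.
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block_fp4(block):
    """Return (scale, quantized), where each quantized value lies on the FP4 grid."""
    max_abs = max((abs(x) for x in block), default=0.0)
    # Map the largest magnitude in the block onto the largest FP4 value (6.0).
    scale = max_abs / 6.0 if max_abs > 0 else 1.0
    quantized = []
    for x in block:
        magnitude = abs(x) / scale
        # Round to the nearest representable FP4 magnitude.
        nearest = min(FP4_E2M1_VALUES, key=lambda v: abs(v - magnitude))
        quantized.append(nearest if x >= 0 else -nearest)
    return scale, quantized


def dequantize_block(scale, quantized):
    """Recover approximate original values from the scale and FP4 grid points."""
    return [scale * q for q in quantized]


weights = [0.12, -0.03, 0.6, -0.45]
scale, q = quantize_block_fp4(weights)
restored = dequantize_block(scale, q)
print("FP4 grid values:", q)
print("restored weights:", restored)
```

Each weight is stored as one of only sixteen signed values plus a shared scale, which is where the memory and bandwidth savings come from. The rounding error visible in the restored weights is the accuracy cost that calibration and optimizer tooling aim to keep small.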
