Nvidia Blackwell Delivers World-Record DeepSeek-R1 Inference Performance
NVIDIA's Blackwell architecture sets a world record for DeepSeek-R1 inference, with a single eight-GPU DGX system exceeding 30,000 tokens per second of aggregate throughput, alongside updates across the TensorRT ecosystem and gains in image-generation efficiency.
NVIDIA has announced a significant breakthrough in deep learning inference performance with its Blackwell architecture, achieving world-record results with the DeepSeek-R1 model. At NVIDIA GTC 2025, the company revealed that a single DGX system equipped with eight Blackwell GPUs can serve over 250 tokens per second per user, reaching a maximum aggregate throughput of more than 30,000 tokens per second (roughly 120 concurrent user streams at that per-user rate).

This performance leap is attributed to enhancements in NVIDIA's open ecosystem of inference tools, optimized for the Blackwell architecture: up to five times more AI compute, increased NVLink bandwidth, and scalability for larger data-center deployments. The TensorRT ecosystem, which includes libraries for model optimization and deployment, has been updated to support Blackwell, enabling developers to achieve high-performance inference. Notably, the latest TensorRT Model Optimizer supports FP4 precision, which improves efficiency while maintaining accuracy.

NVIDIA's advancements extend to image generation as well, where Blackwell enables faster generation with lower VRAM usage. Together, these developments position Blackwell as a leader in the AI inference landscape for both large language models and image-generation tasks.
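The announcement itself contains no code, but as a rough illustration of the serving path involved, the sketch below uses TensorRT-LLM's high-level Python `LLM` API. The model name, tensor-parallel setting, and sampling values are illustrative assumptions, not the configuration NVIDIA benchmarked:

```python
# pip install tensorrt_llm  (Linux with NVIDIA GPUs required)
from tensorrt_llm import LLM, SamplingParams

# Illustrative setup: DeepSeek-R1 sharded across 8 GPUs, mirroring the
# eight-Blackwell-GPU DGX described in the article. Not the exact
# benchmark configuration, which NVIDIA has not published here.
llm = LLM(model="deepseek-ai/DeepSeek-R1", tensor_parallel_size=8)

prompts = ["Explain test-time scaling in one paragraph."]
params = SamplingParams(max_tokens=256, temperature=0.6)

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

In practice, per-user tokens per second is measured on a single stream like this one, while the 30,000 tokens-per-second figure comes from batching many such streams on one system.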
- NVIDIA's Blackwell architecture achieves record inference performance with DeepSeek-R1.
- A single DGX system with eight Blackwell GPUs can process over 30,000 tokens per second.
- The TensorRT ecosystem has been optimized for Blackwell, enhancing model deployment.
- FP4 precision in TensorRT Model Optimizer improves efficiency while maintaining accuracy (see the quantization sketch after this list).
- Blackwell architecture enhances image generation performance while reducing VRAM usage.
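To make the FP4 point concrete, here is a minimal post-training quantization sketch using the TensorRT Model Optimizer (`nvidia-modelopt`) Python package. The config name `NVFP4_DEFAULT_CFG`, the small distilled model, and the one-sample calibration loop are assumptions for illustration; real workflows use larger calibration sets, and names may differ across library versions:

```python
# pip install nvidia-modelopt transformers
import torch
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice: the full 671B DeepSeek-R1 is far too large
# for a single-GPU sketch, so we use a small distilled variant instead.
name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16).cuda()
tok = AutoTokenizer.from_pretrained(name)

def forward_loop(m):
    # Tiny calibration pass; production workflows stream hundreds of samples.
    batch = tok("Calibration text for post-training quantization.", return_tensors="pt").to("cuda")
    m(**batch)

# Assumed FP4 config name in recent modelopt releases; check the
# TensorRT Model Optimizer docs for the exact identifier in your version.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```

The quantized model can then be exported to a TensorRT-LLM checkpoint for deployment; FP4 weights roughly quarter the memory footprint relative to FP16, which is where the efficiency gain comes from.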
Related
Nvidia's Blackwell Reworked – Shipment Delays and GB200A Reworked Platforms
Nvidia's Blackwell family faces production challenges causing shipment delays, impacting targets for 2024-2025. The company is extending Hopper product lifespans and shifting focus to new systems and simpler packaging solutions.
Nvidia NVLink and Nvidia NVSwitch Supercharge Large Language Model Inference
NVIDIA's NVLink and NVSwitch technologies enhance multi-GPU performance for large language model inference, enabling efficient communication and real-time processing, while future innovations aim to improve bandwidth and scalability.
Nvidia's Christmas Present: GB300 and B300 – Reasoning Inference, Amazon, Memory
Nvidia launched the GB300 and B300 GPUs, enhancing reasoning model performance with a 50% increase in FLOPS, upgraded memory, and a restructured supply chain benefiting OEMs and hyperscalers.
DeepSeek-R1 at 3,872 tokens / second on a single Nvidia HGX H200
The DeepSeek-R1 model, with 671 billion parameters, enhances reasoning through test-time scaling and achieves 3,872 tokens per second on a single HGX H200. NVIDIA offers easy deployment via its NIM microservice and promises future performance improvements.
Nvidia's RTX Pro 6000 has 96GB of VRAM and 600W of power
Nvidia has launched the RTX Pro Blackwell series of GPUs for professional workstations, featuring the RTX Pro 6000 with 96GB VRAM, 600W power, and advanced support for AI and gaming tasks.