Nvidia GPU roadmap confirms it: Moore's Law is dead and buried
Nvidia's GTC announcements reveal challenges in scaling compute power, with future GPU architectures requiring more power and increased GPU density, posing significant thermal and power management issues for data centers.
Nvidia's recent announcements at the GTC event highlight significant challenges in the semiconductor industry, particularly regarding the limitations of Moore's Law. CEO Jensen Huang revealed plans for future GPU architectures, including the Blackwell Ultra processors and a new family of GPUs named after Richard Feynman, expected in 2028. The company is facing hurdles in scaling compute power, as advancements in process technology have slowed. Nvidia's strategy involves increasing the number of GPUs per rack, with plans to expand from 72 to 576 GPUs, while also enhancing memory capacity and bandwidth. However, this approach leads to higher power consumption, with future racks projected to require up to 600kW. This poses substantial challenges for data center operators, who must manage the thermal and power demands of these ultra-dense systems. Nvidia's roadmap aims to prepare infrastructure partners for these changes, but the broader industry, including competitors like AMD and Intel, will face similar obstacles. The need for advanced cooling and power delivery systems is critical, as the demand for AI-driven computing continues to surge. Nvidia's proactive communication of its roadmap is intended to help partners align their capabilities with future requirements.
- Nvidia's roadmap reveals challenges in scaling compute power and the limitations of Moore's Law.
- Future GPU architectures will require significantly more power, with racks expected to reach 600kW.
- The company plans to increase GPU density in racks from 72 to 576.
- Data center operators face significant thermal and power management challenges.
- Nvidia's strategy aims to prepare infrastructure partners for upcoming demands in AI computing.
Related
Nvidia's Blackwell Reworked – Shipment Delays and GB200A Reworked Platforms
Nvidia's Blackwell family faces production challenges causing shipment delays, impacting targets for 2024-2025. The company is extending Hopper product lifespans and shifting focus to new systems and simpler packaging solutions.
Interview: Post-Earnings Insight with Nvidia CFO Colette Kress
Nvidia CFO Colette Kress highlighted strong demand for existing GPUs despite Blackwell delays, emphasizing software's role in enterprise transitions and potential increases in corporate IT budgets due to generative AI.
Nvidia's Blackwell GPUs are sold out for the next 12 months
Nvidia's Blackwell GPUs are sold out for the next year due to high demand from major clients. Analysts predict increased market share in AI hardware by 2025, despite memory supply concerns.
Nvidia CEO says his AI chips are improving faster than Moore's Law
Nvidia CEO Jensen Huang announced that the company's AI chips are advancing faster than Moore's Law, with the latest superchip being over 30 times faster for AI inference than its predecessor.
Nvidia GTC 2025 – Built for Reasoning, Vera Rubin, Kyber, Jensen Math, Feynman
NVIDIA's GTC 2025 showcased advancements in AI models and hardware, projecting a 35x reduction in inference costs. New architectures, Blackwell Ultra B300 and Rubin, promise significant performance improvements and efficiency.
TL;DW:
* The improvements don't come from transistor density but from other tricks: putting two chips together, using the smaller 4-bit number format, more on-chip memory, and 2x the memory bandwidth
* And Nvidia is packing more GPUs into the same rack and consuming unheard-of amounts of power, taking a huge toll on datacenter infrastructure
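To make the first bullet concrete, here is a toy sketch of how these non-density gains stack multiplicatively into a big headline number. The factor values are illustrative assumptions drawn from the bullet above, not Nvidia's official figures:

```python
# Illustrative "Jensen math": generational gains multiplied together,
# none of which come from transistor density. Factor values are rough
# assumptions for illustration only.
factors = {
    "dual-die package": 2.0,       # two chips presented as one GPU
    "FP8 -> FP4 inference": 2.0,   # halving the number-format width
    "memory bandwidth bump": 2.0,  # next HBM generation
}

total = 1.0
for name, gain in factors.items():
    total *= gain
    print(f"{name}: x{gain:.0f}  (cumulative: x{total:.0f})")
```

Each trick on its own is a one-time 2x; stacked, they read like an 8x generational leap, but none of them can be repeated indefinitely, which is the growth limit the comment below points at.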
And it seems all these have growth limits. No more 2x every year, or even every other year. Of course it's much more complex than that. The nature of the problems tech now has to solve is different, and as the article notes, Nvidia hit many roadblocks. But I still think that if it had healthy competition, other brands would step in and a creative solution would be more likely to emerge.
With solid engineering and nothing fundamentally new, one could build properly integrated water-cooled racks with very high compute density and power draw. Then put those racks in a building close to a nuclear power plant (outside the security zone, perhaps). Or a wind or solar park plus batteries. Could you circumvent the costs of using the public power grid that way?
Extremely simplified, you only need fiber in and out of this power+computing facility. Ultimately you could do this in space with solar power and laser up/downlink. Cooling might be problematic though.
France at least used to have a dedicated nuclear plant just for uranium enrichment. Before electricity, a lot of industry clustered in places with running water: sawmills, flour mills, metal hammering and so on. Electricity decoupled most of that. But for workloads that get energy-intensive enough and need neither material transport nor local labor, it might make sense to locate close to energy sources again.
Ouch, I grew up where Meta created its first data center, and while it is very efficient (https://engineering.fb.com/2011/04/14/core-infra/designing-a...) I remember a more recent article about how they are net-zero emissions (https://tech.facebook.com/engineering/2021/11/10-years-world...). I wonder whether they have kept that up in this AI rush (along with most of the other datacenter companies).
EDIT: a direct link to net zero for Meta: https://tech.facebook.com/ideas/2020/9/facebooks-path-to-net...
So was Moore's law dead 20 years ago? Clearly not.
Right now Nvidia's focus is on making bigger silicon with faster interconnects. But pretty soon that will stop working and then the focus will shift again. It's still early days for AI. People have only just started working on making dedicated AI chips. Presuming that performance per watt or per mm^2 of silicon cannot go up from here seems silly.
This seems wild to me. I used to warm myself next to 4kW racks of telco gear when I was still doing overnight work (20 ish years ago) and thought this was a lot of power to be using in such a small space…
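For scale, a quick back-of-envelope comparison. The 4 kW figure is from the comment above; the 600 kW and 576-GPU numbers are the article's projections, so the per-GPU split below includes shared rack overhead like networking and power delivery:

```python
# Rack power density: ~2005 telco rack vs. Nvidia's projected racks.
TELCO_RACK_KW = 4          # commenter's telco rack, ~20 years ago
FUTURE_RACK_KW = 600       # projected rack power (from the article)
FUTURE_GPUS_PER_RACK = 576 # projected GPU count (from the article)

power_jump = FUTURE_RACK_KW / TELCO_RACK_KW
watts_per_gpu = FUTURE_RACK_KW / FUTURE_GPUS_PER_RACK * 1000

print(f"power jump: {power_jump:.0f}x")                 # 150x
print(f"~{watts_per_gpu:.0f} W per GPU slot, incl. shared rack overhead")
```

A 150x jump in per-rack power is why the article frames this as a datacenter cooling and power-delivery problem, not just a chip problem.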
That’s why Jensen is always saying Moore's Law is dead - so you buy less of his sand for more bucks.
The reason it's all about flops/watt these days is that the demand for "intelligence" is potentially limitless.
Tech leaders love this narrative:
2005: Intel CEO https://hothardware.com/news/moores-law-is-dead-says-gordon-...
2009: Sandisk CEO https://archive.nytimes.com/bits.blogs.nytimes.com/2009/05/2...
2010: NVIDIA VP https://finance.yahoo.com/news/2010-05-03-nvidia-vp-says-moo...
2016: Intel CEO https://www.nytimes.com/2016/05/05/technology/moores-law-run...
2017: NVIDIA CEO https://www.extremetech.com/cars/256558-nvidias-ceo-declares...
2019: NVIDIA CEO https://www.cnet.com/tech/computing/moores-law-is-dead-nvidi...
2022: NVIDIA CEO https://www.barrons.com/articles/nvidia-graphic-card-prices-...
The only time this felt true was for desktop (Windows/Linux) CPUs, when Intel had an uncontested monopoly from 2009 to 2016:
https://cdn.arstechnica.net/wp-content/uploads/2020/11/CPU-p...
But it was broken by AMD's Ryzen architecture on one side and Apple's M series on the other.
That said, in terms of horizontal expansion (tripling or quadrupling the number of computational cores), surely the main challenge must be memory bandwidth: there simply isn't enough of it to feed such insane data transfer rates.
You could of course dedicate some amount of memory to each compute node; the problem then becomes quickly splitting the workload into chunks and sending each on its separate way, with no need to move data around. In a way, LLMs and neural networks already function like that. Maybe it is time to build hardware that maps 1:1 to the software architecture.
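The bandwidth-bound intuition above can be made concrete with a roofline-style estimate: a kernel's attainable throughput is capped by either peak compute or peak bandwidth, whichever it hits first. The hardware numbers below are round illustrative values, not any specific GPU's datasheet:

```python
# Roofline-style estimate: is a workload compute-bound or bandwidth-bound?
# Round illustrative numbers, not a real GPU's spec sheet.
PEAK_FLOPS = 2e15   # 2 PFLOP/s peak compute
PEAK_BW = 8e12      # 8 TB/s memory bandwidth

# Arithmetic intensity (FLOPs per byte moved) where the two limits meet.
ridge_point = PEAK_FLOPS / PEAK_BW  # FLOPs/byte

def attainable_flops(intensity_flops_per_byte: float) -> float:
    """Attainable throughput for a kernel with the given arithmetic intensity."""
    return min(PEAK_FLOPS, PEAK_BW * intensity_flops_per_byte)

# LLM inference reads each weight roughly once per token: ~2 FLOPs
# (a multiply and an add) per weight byte at 8-bit, far below the ridge.
llm_intensity = 2.0
print(f"ridge point: {ridge_point:.0f} FLOPs/byte")
print(f"LLM inference: {attainable_flops(llm_intensity) / 1e12:.0f} TFLOP/s "
      f"of {PEAK_FLOPS / 1e12:.0f} TFLOP/s peak")
```

Under these assumed numbers, low-intensity workloads like single-stream LLM inference use only a tiny fraction of peak compute, which is why the parent comment's "bring the data and compute together" idea keeps coming up.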