Grace Hopper, Nvidia's Halfway APU
Nvidia's Grace Hopper architecture integrates a CPU and GPU for high-performance computing, offering high memory bandwidth but facing significant latency issues, particularly in comparison to AMD's solutions.
Nvidia's Grace Hopper, a new high-performance computing architecture, combines a CPU and a GPU into a single unit, aiming to compete with AMD's integrated solutions. The Grace CPU provides 72 Neoverse V2 cores running at 3.44 GHz with 114 MB of L3 cache, paired with an H100 GPU carrying 96 GB of HBM3 memory. The two are linked by Nvidia's NVLink C2C interconnect, which provides high bandwidth and hardware coherency and lets the CPU access GPU memory directly. However, the system exhibits high latency, particularly when accessing HBM3 memory, which can hinder performance in some applications. The architecture targets parallel compute tasks, with a focus on high memory bandwidth and efficient data sharing between CPU and GPU. Despite its bandwidth advantages, Grace Hopper's latency and the poor system responsiveness observed during testing raise concerns about its practical performance in real-world applications. Comparisons with AMD's offerings indicate that while Grace Hopper excels in certain areas, it may not outperform AMD's solutions in all aspects, particularly in latency-sensitive tasks.
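The headline feature here is that NVLink C2C's hardware coherency puts the CPU and GPU in one shared address space. As a minimal sketch of what that enables (assuming a coherent GH200-class system; the kernel name and sizes are made up for illustration), the CUDA program below hands the GPU a plain malloc'd pointer:

```cuda
#include <cstdio>
#include <cstdlib>

// Increment every element of a buffer on the GPU.
__global__ void bump(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;
    // Plain system allocation. On a coherent Grace Hopper system the GPU
    // can dereference this pointer over NVLink C2C, with no cudaMalloc or
    // cudaMemcpy staging; on a conventional discrete-GPU setup this would
    // typically fault or require managed memory instead.
    int *data = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; i++) data[i] = i;

    bump<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();

    // The CPU sees the GPU's writes through the same coherent address space.
    printf("data[42] = %d (expected 43)\n", data[42]);
    free(data);
    return 0;
}
```

The catch, and the core complaint above, is that coherent is not the same as fast: CPU-side accesses to HBM3-backed data in this shared space are exactly where the high latency shows up.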
- Nvidia's Grace Hopper architecture combines a CPU and GPU for high-performance computing.
- The system features high memory bandwidth but suffers from significant latency issues (see the pointer-chasing sketch after this list).
- NVLink C2C interconnect allows direct memory access between CPU and GPU.
- Grace Hopper is designed for parallel compute applications, targeting high bandwidth needs.
- Performance comparisons with AMD highlight both strengths and weaknesses in the architecture.
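How do latency claims like the one above get measured? The standard tool is a pointer-chasing microbenchmark: every load depends on the previous one, so each iteration pays a full memory round trip that no prefetcher or out-of-order window can hide. The sketch below is a generic illustration of the technique, not the article's actual harness; the array size and iteration count are made-up numbers.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>

int main() {
    // 256 MB of pointer-sized slots: far larger than any CPU cache, so the
    // chase mostly measures DRAM (or, on Grace Hopper, HBM3) latency.
    const size_t n = 256UL * 1024 * 1024 / sizeof(size_t);
    size_t *chain = (size_t *)malloc(n * sizeof(size_t));

    // Sattolo's algorithm: build one random cycle over all slots so the
    // hardware prefetcher cannot predict the next address.
    for (size_t i = 0; i < n; i++) chain[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   // j < i guarantees a single cycle
        size_t t = chain[i]; chain[i] = chain[j]; chain[j] = t;
    }

    const size_t iters = 20 * 1000 * 1000;
    size_t p = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (size_t k = 0; k < iters; k++) p = chain[p];  // dependent loads
    auto t1 = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count() / iters;
    // Print p so the compiler cannot optimize the chase away.
    printf("~%.1f ns per load (sink=%zu)\n", ns, p);
    free(chain);
    return 0;
}
```

Pointing a chase like this at LPDDR5X-backed versus HBM3-backed allocations is how a CPU-to-HBM3 latency gap like the one described above would show up.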
Related
Testing AMD's Giant MI300X
AMD introduces the Instinct MI300X to challenge NVIDIA in the GPU compute market. The MI300X features a chiplet setup, Infinity Cache, and the CDNA 3 architecture; it delivers competitive performance against NVIDIA's H100 and excels in local memory bandwidth tests.
AMD MI300X performance compared with Nvidia H100
The AMD MI300X AI GPU outperforms Nvidia's H100 in cache and latency benchmarks and offers strong compute throughput, though AI inference performance varies by workload. Real-world performance and ecosystem support remain essential.
Nvidia NVLink Switch Chips Change to the HGX B200
NVIDIA introduced the HGX B200 board at Computex 2024, featuring two NVLink Switch chips instead of four, aiming to enhance performance and efficiency in high-performance computing applications by optimizing GPU configurations.
AMD's Long and Winding Road to the Hybrid CPU-GPU Instinct MI300A
AMD's journey from 2012 led to the development of the powerful Instinct MI300A compute engine, used in the "El Capitan" supercomputer. Key researchers detailed AMD's evolution, funding, and technology advancements, impacting future server offerings.
Nvidia's Blackwell Reworked – Shipment Delays and GB200A Reworked Platforms
Nvidia's Blackwell family faces production challenges causing shipment delays, impacting targets for 2024-2025. The company is extending Hopper product lifespans and shifting focus to new systems and simpler packaging solutions.
- Concerns about high memory latency in Nvidia's design compared to AMD's solutions.
- Speculation on the future of AI computing, with some believing AMD's APUs could dominate if AI becomes more self-hosted.
- Criticism of Nvidia's focus on datacenters at the expense of consumer markets.
- Discussion on the potential benefits of integrating CPU and GPU into a single chip for simplicity and efficiency.
- Mixed feelings about Nvidia's corporate culture and leadership, with some expressing frustration over its market strategies.
Maybe that’s all far enough afield to make the current state of things irrelevant?
It’s interesting to me that they’ve settled on using standard Neoverse cores, when almost everything else is custom designed and tuned for the expected workloads.
I wonder if AMD could license the IBM Telum cache implementation where one core complex could offer unused cache lines to other cores, increasing overall occupancy.
Would be quite neat; even if cross-complex bandwidth and latency are not awesome, it should still be better than hitting DRAM.
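To make the Telum idea concrete, here is a deliberately tiny toy model (hypothetical capacities, nothing like the real hardware mechanism, which tracks ownership and saturation in silicon): on eviction, the local complex parks the line in a neighbor's spare capacity, so a later re-reference pays a cross-complex hop instead of a DRAM trip.

```cuda
#include <cstdio>
#include <unordered_set>

// Toy model of Telum-style cache-line lending between two core complexes.
struct Complex {
    size_t capacity;
    std::unordered_set<long> lines;   // resident line addresses
    bool full() const { return lines.size() >= capacity; }
};

int main() {
    Complex local{4, {}}, neighbor{8, {}};
    long hits_local = 0, hits_neighbor = 0, misses = 0;

    // Access pattern that overflows the local cache: lines 0..7, twice.
    long pattern[] = {0,1,2,3,4,5,6,7, 0,1,2,3,4,5,6,7};

    for (long addr : pattern) {
        if (local.lines.count(addr)) { hits_local++; continue; }
        if (neighbor.lines.count(addr)) {          // lent line found
            hits_neighbor++;
            neighbor.lines.erase(addr);            // pull it back home
        } else {
            misses++;                              // fetch from "DRAM"
        }
        if (local.full()) {
            long victim = *local.lines.begin();    // arbitrary victim
            local.lines.erase(victim);
            if (!neighbor.full()) neighbor.lines.insert(victim); // lend it
        }
        local.lines.insert(addr);
    }
    printf("local hits=%ld, neighbor hits=%ld, DRAM misses=%ld\n",
           hits_local, hits_neighbor, misses);
    return 0;
}
```

With these toy numbers the second pass over lines 0..7 turns four would-be DRAM misses into neighbor hits, which is the whole appeal: cross-complex latency is worse than local cache but far better than memory.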
Can it run vi?
That might make things much simpler for people who write kernels, drivers, and video games.
The history of CPUs and GPUs prevented that; it was always more profitable for CPU and GPU vendors to sell them separately.
Having two specialized chips makes more sense because it's flexible, but since frequencies are stagnating, having more cores makes sense, and AI means massively parallel workloads are no longer just for graphics.
Smartphones are much more modern in that regard. Nobody upgrades their GPU or CPU anymore, so you might as well have a single, soldered product that lasts a long time.
That may not be the end of building your own computer, but I just hope it will make things simpler and come in a smaller package.
But this bullshit with Jensen signing girls’ breasts like he’s Robert Plant and telling young people to learn prompt engineering instead of C++ and generally pulling a pump-and-dump shamelessly while wearing a leather jacket?
Fuck that: if LLMs could write cuDNN-caliber kernels that’s how you would do it.
It’s ok in my book to live the rockstar life for the 15 minutes until someone other than Lisa Su ships an FMA unit.
The 3T cap and the forward PE and the market manipulation and the dated signature apparel are still cringe and if I had the capital and trading facility to LEAP DOOM the stock? I’d want as much as there is.
The fact that your CPU sucks ass just proves this isn’t about real competition just now.