September 3rd, 2024

Zen, CUDA, and Tensor Cores, Part I: The Silicon

The article compares Zen, CUDA, and Tensor cores, highlighting their physical structures and complexities. Zen 4 cores are larger and more intricate than CUDA and Tensor cores, with measurement challenges noted.

This article is the first part of a miniseries exploring the differences between Zen cores, CUDA cores, and Tensor cores. It examines the physical structures of these core types on silicon, comparing an AMD Ryzen 5 7600 CPU with an NVIDIA RTX 4090 GPU. The author notes that although Zen 4 cores and CUDA cores serve different functions, they share similarities in layout and design. Identifying specific cores within the chips is difficult, but the Zen 4 core is clearly much larger than a CUDA core. At the chip level, the GPU has the more layered organization, with components such as Graphics Processing Clusters (GPCs) and Texture Processing Clusters (TPCs) housing the CUDA and Tensor cores. The author emphasizes that core sizes are hard to measure accurately because die shots have limited resolution and manufacturers publish little detail. The article concludes that an individual Zen 4 core is more complex, while individual CUDA and Tensor cores are smaller and less intricate, setting the stage for the later parts of the series.

- The article compares Zen, CUDA, and Tensor cores in modern CPUs and GPUs.

- It highlights the physical structure and layout differences between these core types.

- Zen 4 cores are larger and more complex than CUDA and Tensor cores.

- The analysis faces challenges due to limited resolution in die shots.

- The series aims to deepen understanding of core designs and their functions.
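The resolution problem mentioned above can be illustrated with a rough sketch. It converts a pixel measurement taken from a die shot into an area and shows how a one-pixel uncertainty on each edge affects large and small features differently. All numbers, the scale factor, and the function names are hypothetical and are not taken from the article.

```python
def area_mm2_from_pixels(width_px, height_px, mm_per_px):
    """Convert a rectangle measured in image pixels to an area in mm^2."""
    return (width_px * mm_per_px) * (height_px * mm_per_px)


def area_bounds_mm2(width_px, height_px, mm_per_px, edge_error_px=1.0):
    """Lower and upper area bounds if each dimension is off by edge_error_px."""
    low = area_mm2_from_pixels(
        max(width_px - edge_error_px, 0.0),
        max(height_px - edge_error_px, 0.0),
        mm_per_px,
    )
    high = area_mm2_from_pixels(
        width_px + edge_error_px, height_px + edge_error_px, mm_per_px
    )
    return low, high


def relative_spread(width_px, height_px, mm_per_px):
    """Width of the uncertainty band relative to the nominal area."""
    nominal = area_mm2_from_pixels(width_px, height_px, mm_per_px)
    low, high = area_bounds_mm2(width_px, height_px, mm_per_px)
    return (high - low) / nominal


# Hypothetical die shot: 10 mm of silicon spans 2000 pixels.
MM_PER_PX = 10.0 / 2000

# A large block (CPU-core-sized, made-up dimensions) vs. a tiny one.
large_spread = relative_spread(400, 300, MM_PER_PX)
small_spread = relative_spread(8, 6, MM_PER_PX)

print(f"large feature uncertainty: {large_spread:.1%}")
print(f"small feature uncertainty: {small_spread:.1%}")
```

Under these assumed numbers, the same one-pixel error is negligible for the large block but dominates the estimate for the small one, which is why small cores are harder to size from a die shot.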

Related

The AMD Zen 5 Microarchitecture

AMD introduced Zen 5 CPU microarchitecture at Computex 2024, launching Ryzen AI 300 for mobile and Ryzen 9000 for desktops. Zen 5 offers improved IPC, dual-pipe fetch, and advanced branch prediction. Ryzen AI 300 includes XDNA 2 NPU and RDNA 3.5 graphics, while Ryzen 9000 supports up to 16 cores and 5.7 GHz boost clock.

A Video Interview with Mike Clark, Chief Architect of Zen at AMD

The interview with AMD's Chief Architect covers Zen 5 enhancements such as an improved branch predictor and improved schedulers. The design optimizes both single-threaded and multi-threaded performance, with a focus on compute capability and efficiency.

AMD Ryzen AI 300 Series Launched – ServeTheHome

AMD released the Ryzen AI 300 series with Zen 5 CPU, RDNA 3.5 GPU, and XDNA 2 NPU for enhanced AI performance. The processors target content creation and gaming, competing with Intel and Qualcomm. AMD aims to advance AI computing with a balanced CPU, GPU, and NPU approach.

The AMD Zen 5 Microarchitecture

AMD revealed Zen 5 microarchitecture at Computex 2024, launching Ryzen AI 300 series for mobile and Ryzen 9000 series for desktop. Zen 5 brings enhanced performance with XDNA 2 NPU, RDNA 3.5 graphics, and 16% better IPC than Zen 4.

An interview with AMD's Mike Clark, 'Zen Daddy' says 3nm Zen 5 is coming fast

AMD's Mike Clark discusses Zen 5 architecture, covering 4nm and 3nm nodes. 4nm chips launch soon, with 3nm to follow. Zen 'c' cores may integrate into desktop processors. Zen 5 enhances Ryzen CPUs with full AVX-512 acceleration, emphasizing design balance for optimal performance.

6 comments
By @fulafel - 8 months
The answer to the leading question "What’s the difference between a Zen core, a CUDA core, and a Tensor core?" is not covered in Part 1, so you may want to wait if this interests you more than chip layouts.
By @paulmd - 8 months
You can calculate the area of the tensor and raytracing units by measuring and comparing die sizes between the nearest 20-series and 16-series chips. Contrary to the assumptions a lot of people made from the cartoon diagrams, it's actually relatively small: together they make up approximately 18% of the cluster area and less than 10% of the chip as a whole. The area is roughly 2/3 tensor unit and 1/3 raytracing unit, so RT is around 3% of total chip area and tensor is around 6%.

https://old.reddit.com/r/hardware/comments/baajes/rtx_adds_1...

This could have changed somewhat in newer releases, but probably not too drastically, since NVIDIA has never really increased raw ray performance since the 20-series launch. And while there have been a few raytracing features added around the edges, raster and cache have been bumped significantly too (notably, Ampere got dual-issue fp32 pipelines... which didn't really work out for NVIDIA that well either!), so honestly there's a reasonable chance the share is slightly smaller in subsequent architectures.
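
The split quoted in this comment can be checked with a few lines of arithmetic. This is only a sketch: the comment says "below 10%" without giving an exact figure, so the 9% used here is an assumption chosen for illustration, and the function name is hypothetical.

```python
# Combined tensor + raytracing share of the whole chip.
# The comment only says "below 10%"; 0.09 is an assumed value.
COMBINED_SHARE_OF_CHIP = 0.09

# Roughly 2/3 of that area is tensor units, per the comment.
TENSOR_FRACTION = 2 / 3


def split_combined_share(combined_share, tensor_fraction=TENSOR_FRACTION):
    """Split the combined share into (tensor share, raytracing share)."""
    tensor = combined_share * tensor_fraction
    raytracing = combined_share * (1 - tensor_fraction)
    return tensor, raytracing


tensor_share, rt_share = split_combined_share(COMBINED_SHARE_OF_CHIP)
print(f"tensor: {tensor_share:.1%}, raytracing: {rt_share:.1%}")
```

With the assumed 9% combined share, this gives about 6% for tensor units and about 3% for raytracing units, matching the figures in the comment.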

By @kvemkon - 8 months
> Each of the tiles on the CPU side is actually a Zen 4 core, complete with its dedicated L2 cache.

Perhaps, it could be more interesting to compare without L2 cache.

By @diabllicseagull - 8 months
It was a good read. I wonder what hot takes he'll have in the second part if any.
By @downvotetruth - 8 months
I refused to buy the chips deemed defective, even if they represented better value, because if the intent were truly to maximize yield, then Ryzen, for example, should have good 7-core versions in which only the 1 core found to be defective is disabled. Since no 7-core Zens exist, at least some of the CPUs with 6-core CCDs have intentionally had 1 of their cores destroyed for reasons unknown, which could be to meet volume targets. If this is because Ryzen cores can only be disabled in pairs, then it boggles my mind that, given the price difference of tens to hundreds of dollars between the 6- and 8-core versions, it would not be economical to add the circuits that let each core be fused off individually and allow further product differentiation, especially considering how much effort and how many SKUs have gone into frequency binning on AM4 (5700x, 5800, 5800x, 5800xt, etc.) rather than into bigger market segmentation jumps.