November 24th, 2024

Pushing AMD's Infinity Fabric to Its Limit

AMD's Infinity Fabric architecture shows improved memory bandwidth and latency management in Zen 5 compared to Zen 4, holding up better under load and benefiting from faster DDR5 memory.

AMD's Infinity Fabric architecture has been tested for memory latency and bandwidth across the Zen CPU generations by running latency-sensitive applications alongside bandwidth-hungry threads and observing how they interact under load. The results show that certain AMD chips are sensitive to thread placement, with core affinity choices producing significant latency spikes. The Infinity Fabric connects Core Complex Dies (CCDs) to an I/O die, enabling high core counts and modular system designs, but as bandwidth demands grow, latency can rise sharply, particularly when multiple threads contend for the same resources. Zen 5 manages bandwidth and latency better than Zen 4 under high load, and isolating bandwidth-intensive tasks to one CCD helps maintain lower latency for sensitive applications running on another CCD. Overall, the findings suggest that AMD's architecture has evolved to deliver more memory bandwidth while limiting latency impacts, helped by faster DDR5 memory and improved traffic-management policies in Zen 5.

- AMD's Infinity Fabric architecture allows for high core counts but can lead to latency spikes under heavy load.

- Thread placement and core affinity significantly affect latency performance in AMD chips.

- Zen 5 shows improved bandwidth and latency management compared to Zen 4, especially under high load.

- Isolating bandwidth-heavy tasks to one CCD can help maintain lower latency for sensitive applications (see the affinity sketch after this list).

- Faster DDR5 memory enhances overall performance in the latest AMD architectures.
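To make the CCD-isolation point concrete, the sketch below pins a latency-sensitive thread and a bandwidth-hungry thread to different CCDs using Linux CPU affinity. This is a minimal illustration, not the article's test harness: the core ranges (0-7 for CCD0, 8-15 for CCD1) are an assumption for a 16-core, two-CCD part and should be verified with `lscpu -e` on the actual machine.

```c
/* Minimal affinity sketch (Linux, glibc). Pins a latency-sensitive thread to
 * CCD0 and a bandwidth-hungry thread to CCD1 on an assumed 16-core, two-CCD
 * part where cores 0-7 sit on CCD0 and 8-15 on CCD1. Verify the mapping with
 * `lscpu -e` before relying on it. Build with: gcc -O2 -pthread affinity.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static void pin_to_cores(pthread_t thread, int first, int last) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = first; cpu <= last; cpu++)
        CPU_SET(cpu, &set);
    int err = pthread_setaffinity_np(thread, sizeof(set), &set);
    if (err)
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
}

/* Placeholder workloads: a real test would pointer-chase in one thread
 * and stream through a large buffer in the other. */
static void *latency_sensitive(void *arg) { (void)arg; return NULL; }
static void *bandwidth_hog(void *arg)     { (void)arg; return NULL; }

int main(void) {
    pthread_t lat, bw;
    pthread_create(&lat, NULL, latency_sensitive, NULL);
    pthread_create(&bw,  NULL, bandwidth_hog,     NULL);
    pin_to_cores(lat, 0, 7);   /* keep the latency probe on CCD0      */
    pin_to_cores(bw,  8, 15);  /* confine the streaming load to CCD1  */
    pthread_join(lat, NULL);
    pthread_join(bw,  NULL);
    return 0;
}
```

The same isolation can be done from the shell, e.g. `taskset -c 0-7` for the latency-sensitive process and `taskset -c 8-15` for the bandwidth load.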

5 comments
By @majke - 5 months
This has puzzled me for a while. The cited system has 2x89.6 GB/s bandwidth. But a single CCD can do at most 64GB/s of sequential reads. Are claims like "Apple Silicon having 400GB/s" meaningless? I understand a typical single logical CPU can't do more than 50-70GB/s, and it seems like a group of CPUs typically shares a memory controller which is similarly limited.

To rephrase: is it possible to reach 100% memory bandwidth utilization with only 1 or 2 CPUs doing the work per CCD?
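One way to get a feel for that question is a crude streaming-read probe that reports bandwidth as a function of thread count. The sketch below is only illustrative, not the article's methodology: buffer size, pass count, and the absence of affinity control are simplifying assumptions, and the timed region includes the initial page-touch pass, so treat the number as a lower bound.

```c
/* Crude read-bandwidth probe: ./probe <threads> spawns N threads, each of
 * which streams through its own 512 MiB buffer four times. Buffer size and
 * pass count are arbitrary; no affinity or NUMA control is applied.
 * Build with: gcc -O2 -pthread probe.c -o probe */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUF_BYTES (512ULL << 20)   /* 512 MiB per thread */
#define PASSES    4

static void *stream_read(void *arg) {
    (void)arg;
    size_t n = BUF_BYTES / sizeof(uint64_t);
    uint64_t *buf = malloc(BUF_BYTES);
    volatile uint64_t sink = 0;                  /* keep loads from being optimized away */
    for (size_t i = 0; i < n; i++) buf[i] = i;   /* touch pages (included in the timing) */
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < n; i++) sink += buf[i];
    free(buf);
    return NULL;
}

int main(int argc, char **argv) {
    int threads = argc > 1 ? atoi(argv[1]) : 2;
    pthread_t *t = malloc(sizeof(*t) * threads);
    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int i = 0; i < threads; i++) pthread_create(&t[i], NULL, stream_read, NULL);
    for (int i = 0; i < threads; i++) pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &b);
    double secs = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    double gib  = (double)threads * PASSES * BUF_BYTES / (1ULL << 30);
    printf("%d thread(s): ~%.1f GiB/s sequential read (lower bound)\n", threads, gib / secs);
    free(t);
    return 0;
}
```

Running it with 1, 2, 4, ... threads confined to a single CCD (e.g. `taskset -c 0-7 ./probe 2`) would show where per-CCD read bandwidth stops scaling with thread count.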

By @cebert - 5 months
George’s detailed analysis always impresses me. I’m amazed by his attention to detail.
By @Agingcoder - 5 months
Proper thread placement and NUMA handling do have a massive impact on modern AMD CPUs - significantly more so than on Xeon systems. This might be anecdotal, but I’ve seen performance improve by 50% on some real-world workloads.
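For the NUMA side of that point, a small libnuma sketch: allocate a thread's working set on the memory node local to the core it runs on, so accesses stay local rather than crossing to a remote node. This is only relevant on systems that expose multiple NUMA nodes (e.g. multi-socket or NPS2/NPS4-configured Epyc; desktop Ryzen parts report a single node), and it is a hedged example under those assumptions, not something taken from the article.

```c
/* Sketch: NUMA-local allocation with libnuma (compile with -lnuma).
 * Only meaningful on systems exposing more than one NUMA node. */
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma reports NUMA is unavailable here\n");
        return 1;
    }
    int cpu  = sched_getcpu();            /* core this thread is currently on    */
    int node = numa_node_of_cpu(cpu);     /* memory node attached to that core   */
    size_t bytes = 64ULL << 20;           /* 64 MiB working set (arbitrary size) */
    void *buf = numa_alloc_onnode(bytes, node);   /* allocate on the local node  */
    if (!buf) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }
    printf("cpu %d -> node %d, local buffer at %p\n", cpu, node, buf);
    /* ... latency-sensitive work against buf would go here ... */
    numa_free(buf, bytes);
    return 0;
}
```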
By @AbuAssar - 5 months
Great deep dive into AMD's Infinity Fabric! The balance between bandwidth, latency, and clock speeds shows both clever engineering and limits under pressure. Makes me wonder how these trade-offs will evolve in future designs. Thoughts?