RDNA 4's “Out-of-Order” Memory Accesses
AMD's RDNA 4 architecture enhances memory handling with out-of-order accesses and multiple queues, improving performance, especially in ray tracing, though its advancements are considered evolutionary compared to competitors.
AMD's RDNA 4 architecture reworks its memory subsystem to support out-of-order memory accesses, so requests from different shader waves can be processed independently. On RDNA 3, memory requests were handled in strict order, which created false dependencies between waves: a wave could be held up by another wave's earlier request even though the two had nothing to do with each other. Testing showed that RDNA 4 eliminates these cross-wave delays, which helps workloads like ray tracing in particular. Within a wave, the architecture also provides multiple out-of-order queues for memory requests, so threads can interleave different types of memory requests and improve overall throughput. These improvements are seen as evolutionary rather than revolutionary, since GPU architectures from Intel and Nvidia already implement similar techniques. Even so, they are a significant step forward for AMD's GPU memory subsystem.
- RDNA 4 allows out-of-order memory accesses, improving performance by eliminating false dependencies.
- The architecture introduces multiple out-of-order queues for memory requests, enhancing efficiency.
- Improvements are particularly beneficial for ray tracing workloads, allowing simultaneous traversal and result handling.
- While significant, RDNA 4's enhancements are seen as evolutionary, with similar features present in other GPU architectures.
- The changes mark the most substantial update to AMD's GPU memory subsystem since the launch of RDNA in 2019.
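The false-dependency point above can be illustrated with a toy model. The sketch below is not a description of AMD's hardware: the `Request` type, the wave names, and the cycle counts are all hypothetical, chosen only to show how strict in-order data return makes a fast request wait behind an unrelated slow one, while out-of-order return does not.

```python
from dataclasses import dataclass


@dataclass
class Request:
    wave: str        # hypothetical wave label, one request per wave here
    issue_time: int  # cycle at which the request is issued
    latency: int     # cycles the memory system needs to produce the data


def in_order_delivery(requests):
    """Deliver data in issue order: no request returns before an earlier one."""
    delivered = {}
    previous = 0
    for req in sorted(requests, key=lambda r: r.issue_time):
        ready = req.issue_time + req.latency
        # A request whose data is ready still waits for every earlier request.
        previous = max(previous, ready)
        delivered[req.wave] = previous
    return delivered


def out_of_order_delivery(requests):
    """Deliver each request as soon as its own data is ready."""
    return {req.wave: req.issue_time + req.latency for req in requests}


# Assumed scenario: wave_a misses in cache (slow), wave_b hits (fast).
requests = [
    Request(wave="wave_a", issue_time=0, latency=500),
    Request(wave="wave_b", issue_time=1, latency=50),
]

print("in-order:    ", in_order_delivery(requests))
print("out-of-order:", out_of_order_delivery(requests))
```

With these made-up numbers, wave_b receives its data at cycle 500 under in-order return but at cycle 51 under out-of-order return, while wave_a is unaffected in both cases.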
Related
AMD announces unified UDNA GPU architecture – bringing RDNA and CDNA together
AMD has introduced UDNA, a unified GPU architecture merging RDNA and CDNA to simplify development and enhance competitiveness against Nvidia. Timelines for implementation remain unclear, with improved AI capabilities expected.
Pushing AMD's Infinity Fabric to Its Limit
AMD's Infinity Fabric architecture has improved memory bandwidth and latency management in Zen 5 compared to Zen 4, with better performance under load and benefits from faster DDR5 memory.
Pushing AMD's Infinity Fabric to Its Limits
AMD's Infinity Fabric effectively manages memory latency in high-core-count processors. Zen 5 architecture improves performance and memory management, reducing latency spikes compared to Zen 4 under heavy bandwidth loads.
AMD Radeon RX 9070 Series Technical Deep Dive
AMD launched the Radeon RX 9070 series with RDNA 4 architecture, priced at $600 and $550, focusing on performance and value to compete with NVIDIA, enhancing 4K gaming and AI capabilities.
AMD Radeon RX 9070 and 9070 XT review: RDNA 4 fixes a lot of AMD's problems
AMD has launched the Radeon RX 9070 and 9070 XT graphics cards featuring RDNA 4 architecture, enhancing performance and efficiency for 1440p and entry-level 4K gaming, competing with Nvidia's RTX 5070 series.
I kind of thought this was just gonna be some kind of deferred texture loading thing, help with streaming assets.
If it actually allows inter-warp sequencing, it sounds like it might possibly solve the chief complaint supreme GUI master Raph Levien recently had in "I want a good parallel computer": even though we can dynamically add shaders and construct a dynamic work graph (largely thanks to VK_AMDX_shader_enqueue?), there isn't any sequencing/fencing/barrier-ing between the sections. https://raphlinus.github.io/gpu/2025/03/21/good-parallel-com... https://news.ycombinator.com/item?id=43440174
Not applicable to GPUs, but since I ran into it recently, it's interesting to see how io_uring handles sequenced submissions. Here's Lord of io_uring's write-up, https://unixism.net/loti/tutorial/link_liburing.html#link-li...
Edit: having read the article more fully, I'm not sure this is about waves depending on each other. Maybe more about them trying to access memory. Apologies. Hopefully someday!
Ah yeah he says that at the end. Doesn't really matter for rasterisation but might make more of a difference for ray tracing.