July 2nd, 2024

A quick introduction to DirectX workgraphs

Workgraphs are a new DirectX 12 feature, supported by Nvidia and AMD, that let the GPU generate and schedule its own work independently of the CPU. Work is expressed as a graph of nodes, each backed by a shader that consumes and produces records, making data handling and interaction between stages more efficient.

Workgraphs are a new feature in DirectX 12, supported by Nvidia and AMD, that allow the GPU to generate and schedule work without CPU intervention. The author experimented with workgraphs by implementing a shadow raytracer in three steps: isolating backfacing pixels, raymarching the surviving pixels towards the light, and raytracing the rest against an acceleration structure. A workgraph is a graph of nodes, where each node is a shader that consumes records produced by other nodes. Currently only compute shaders and inline raytracing are supported, with other shader types planned. Nodes are launched in broadcasting or thread mode, which determines whether an input record feeds a whole grid of thread groups or a single thread.

The article walks through a detailed example: a first node checks for backfacing pixels within an 8x8 tile and spawns new work accordingly, relying on barriers and groupshared memory to synchronize and share data between the threads in the group. A second node then performs per-pixel raymarching towards the light direction, testing for collisions against the depth buffer. The implementation demonstrates a practical application of workgraphs in graphics programming, with the emphasis on passing data efficiently between nodes.
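
To make that structure concrete, here is a minimal HLSL sketch of the first two stages as work graph nodes. This is not the article's actual code: the record layouts, resource bindings, and names (TileRecord, PixelRecord, ClassifyTile, Raymarch) are assumptions, and the groupshared-memory compaction the article describes is left out. A broadcasting node runs one 8x8 thread group per tile and culls pixels facing away from the light, feeding a thread-launch node that raymarches each surviving pixel.

// Minimal sketch of a two-node work graph in HLSL (Shader Model 6.8).
// Record layouts, resource bindings and names are illustrative assumptions.

struct TileRecord
{
    uint3 dispatchGrid : SV_DispatchGrid; // one 8x8 thread group per tile
};

struct PixelRecord
{
    uint2 pixel; // screen coordinate that still needs shadow work
};

Texture2D<float4> gNormals : register(t0); // assumed G-buffer normals
Texture2D<float>  gDepth   : register(t1); // assumed depth buffer
cbuffer Params : register(b0) { float3 gLightDir; }

// Broadcasting node: the input record spawns a grid of 8x8 thread groups.
// Each thread tests its pixel; pixels facing away from the light are
// trivially in shadow and produce no further work.
[Shader("node")]
[NodeLaunch("broadcasting")]
[NodeMaxDispatchGrid(512, 512, 1)]
[NumThreads(8, 8, 1)]
void ClassifyTile(
    DispatchNodeInputRecord<TileRecord> input,
    [MaxRecords(64)] NodeOutput<PixelRecord> Raymarch, // targets the node below
    uint3 dtid : SV_DispatchThreadID)
{
    float3 n = normalize(gNormals[dtid.xy].xyz);
    bool facesLight = dot(n, -gLightDir) > 0.0f;

    // Each thread asks for 0 or 1 output records; threads that ask for
    // none drop out of the rest of the graph.
    ThreadNodeOutputRecords<PixelRecord> rec =
        Raymarch.GetThreadNodeOutputRecords(facesLight ? 1 : 0);
    if (facesLight)
        rec.Get().pixel = dtid.xy;
    rec.OutputComplete();
}

// Thread-launch node: one thread per surviving pixel, stepping towards the
// light and testing for hits against the depth buffer (loop body omitted).
[Shader("node")]
[NodeLaunch("thread")]
void Raymarch(ThreadNodeInputRecord<PixelRecord> input)
{
    uint2 p = input.Get().pixel;
    // ... march from p along gLightDir in screen space, comparing against gDepth ...
}

On the CPU side the graph is compiled into a D3D12 state object and launched with DispatchGraph; once the initial records are submitted, the hand-off from ClassifyTile to Raymarch happens entirely on the GPU.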

Related

20x Faster Background Removal in the Browser Using ONNX Runtime with WebGPU

Using ONNX Runtime with WebGPU and WebAssembly in the browser achieves a 20x speedup for background removal, reducing server load, enhancing scalability, and improving data security. ONNX models run efficiently with WebGPU support, offering near real-time performance. IMG.LY aims to use this to make its design tools more accessible and efficient.

Eight million pixels and counting: improving texture atlas allocation in Firefox (2021)

Improving texture atlas allocation in WebRender with the guillotiere crate reduces texture memory usage. The guillotine algorithm was replaced due to fragmentation issues, leading to a more efficient allocator. Visualizing the atlas as SVG aids debugging. Rust's simplicity and cargo-fuzz testing are credited for fast development and robustness. Further work on draw call batching and texture uploads aims to boost performance on low-end Intel GPUs by optimizing the texture atlases.

Homegrown Rendering with Rust

Embark Studios is building a creative platform for user-generated content, emphasizing gameplay over graphics. They use Rust for 3D rendering and introduce the experimental "kajiya" renderer as a learning project. The team aims to simplify rendering for user-generated content, building on the Vulkan API and Rust's suitability for GPU programming, and is working to improve Rust's GPU ecosystem.

GPU-Friendly Stroke Expansion

The paper introduces a GPU-friendly technique for stroke expansion in vector graphics, using parallel algorithms and minimal preprocessing to render stroked paths efficiently.

GPU profiling for WebGPU workloads on Windows with Chrome

The article addresses the challenges of GPU profiling for WebGPU workloads in Chrome on Windows. A workaround using a custom DLL enables profiling with tools such as AMD's Radeon GPU Profiler and Nvidia's Nsight, giving WebGPU applications access to detailed performance metrics.

2 comments
By @jsheard - 3 months
In true Vulkan fashion, this feature, which was announced for DirectX a year ago and is now shipping, hasn't even been publicly acknowledged by Khronos yet :(

I'm sure they're working on it, but the slower pace imposed by the much bigger committee including the mobile GPU vendors is a drag for those who only care about high performance hardware, and in the end their version usually ends up mirroring the design of the DirectX version anyway.

By @CooCooCaCha - 3 months
I’m curious what the endgame is for GPUs. It seems like over time they’ve been going down the road of generally programmable parallel processors.

Will there come a time where, like CPUs, we land on a more general programming model and we don’t need graphics APIs to introduce new features? Instead, if there’s a new graphics technique, we don’t have to wait for a new DirectX feature, we can just code it ourselves?