gpu.cpp: A lightweight library for portable low-level GPU computation
The GitHub repository features gpu.cpp, a lightweight C++ library for portable GPU compute using WebGPU. It offers fast compile/run cycles, minimal dependencies, and examples such as a GELU kernel and matrix multiplication for easy integration.
The GitHub repository describes gpu.cpp, a lightweight library for portable GPU compute in C++. It uses the WebGPU specification as a low-level GPU interface and aims for an API with a high power-to-weight ratio, fast compile/run cycles, and minimal dependencies. Developers can integrate GPU computation into their projects with a standard C++ compiler, benefiting from a small API surface area and a prebuilt binary of the Dawn native WebGPU implementation. The library includes examples such as a GELU kernel, matrix multiplication, a physics simulation, and signed distance function rendering. It targets projects that need portable on-device GPU computation with low implementation complexity.
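For readers unfamiliar with the GELU example mentioned above, here is a minimal CPU reference sketch of what such a kernel computes, assuming the common tanh approximation of GELU. It does not use the gpu.cpp API; the names gelu and geluAll are hypothetical and exist only for illustration. On a GPU, each invocation would typically handle one array index instead of looping.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for the tanh approximation of GELU (an assumed variant):
//   gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
// Hypothetical helper for illustration; not part of gpu.cpp.
inline float gelu(float x) {
    const float kScale = 0.7978845608f;  // sqrt(2 / pi)
    const float inner = kScale * (x + 0.044715f * x * x * x);
    return 0.5f * x * (1.0f + std::tanh(inner));
}

// Applies gelu to every element. A GPU kernel would replace this loop
// with one invocation per index i, guarded by a bounds check.
inline std::vector<float> geluAll(const std::vector<float>& input) {
    std::vector<float> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = gelu(input[i]);
    }
    return output;
}
```

A CPU reference like this is mainly useful for checking GPU output element by element within a small tolerance.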
Related
20x Faster Background Removal in the Browser Using ONNX Runtime with WebGPU
Using ONNX Runtime with WebGPU and WebAssembly in browsers achieves 20x speedup for background removal, reducing server load, enhancing scalability, and improving data security. ONNX models run efficiently with WebGPU support, offering near real-time performance. Leveraging modern technology, IMG.LY aims to enhance design tools' accessibility and efficiency.
A portable lightweight C FFI for Lua, based on libffi
A GitHub repository offers a portable lightweight C FFI for Lua, based on libffi. It aims for LuaJIT FFI compatibility, developed in C. Includes features, examples, basic types, build instructions, testing, and acknowledgements.
Show HN: UNet diffusion model in pure CUDA
The GitHub content details optimizing a UNet diffusion model in C++/CUDA to match PyTorch's performance. It covers custom convolution kernels, forward pass improvements, backward pass challenges, and future optimization plans.
GPU profiling for WebGPU workloads on Windows with Chrome
Challenges of GPU profiling for WebGPU in Chrome on Windows are addressed. A workaround using a custom DLL enables GPU profiling with tools like AMD's Radeon GPU Profiler and Nvidia's Nsight, enhancing performance metrics for WebGPU applications.
Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in llm.c
The GitHub repository focuses on the "llm.c" project by Andrej Karpathy, aiming to implement Large Language Models in C/CUDA without extensive libraries. It emphasizes pretraining GPT-2 and GPT-3 models.
I've worked with OpenGL and Direct3D and Metal in the past, but the pure compute side of GPUs is mostly foreign to me. Learning CUDA always felt like a big time investment when I never had an obvious need at hand.
So I'm definitely going to play with this library and try to get up to speed. Thanks for publishing it.
Shameless plug: in case anyone wants to see what doing just compute with Vulkan looks like, I wrote a similar library to compete on SHAllenge [0], which was posted here on HN a few days ago. My library is here: https://github.com/0xf00ff00f/vulkan-compute-playground/
But since this is starting I'm happy to chat. Nice to see the interest here!
What are the benefits, if any, of using gpu.cpp instead of just webgpu.h (webgpu native) directly? Maybe each is tailored for different use cases?
Noo